Code Review Agent with Claude Agent SDK
Code review is a perfect task for an AI agent: it requires reading context, finding related files, and applying different criteria to different parts of code. We break down how to build an agent that reviews code like an experienced engineer — prioritizing findings, searching for context, and posting a structured report directly to GitHub.
Advanced · AI Agents · 35 min · Claude Agent SDK, Python, GitHub API
1. The agent is not a replacement — it is a first filter
Why put AI on code review at all? Not to replace a human — to free them up. The agent reads every line of the diff in 30 seconds, catches null-pointers and forgotten awaits, checks imports. The human spends that time on architectural decisions and business logic — the stuff AI does poorly.
Practical takeaway: design the agent as an assistant with a clear scope. Security, bugs, tests — yes. Architecture suggestions and code style — no.
🤖 Give to the agent
- Check 100% of diff lines
- Common bug patterns
- Security and leaked secrets
- Test coverage gaps
👤 Leave to humans
- Architectural decisions
- Business context and goals
- Mentoring and teaching
- Nuanced trade-offs
An agent that finds 80% of problems in 30 seconds changes the entire team's rhythm. Don't chase 100% — that's a trap.
2. Five tools — and not one more
A good reviewer does not just look at the diff. They open adjacent files, check how a function is used elsewhere, look for tests. The agent does the same — but only if you gave it the right tools.
Each tool answers one question: "what changed?", "how is it used?", "are there tests?". The agent decides which question to ask next — that is what makes it an agent, not a script. And every tool is strictly read-only — the agent analyzes but never touches code.
Read diff → (what to understand?) → Fetch context → (context gathered) → Analyze → (findings ready) → Build report
The agent's 5 tools (read-only!):
what_changed → reads the PR diff
get_context → opens the file around the changes
find_usages → where else this function is called
find_tests → whether the changed logic has tests
get_file_history → who changed this file before, and why

`get_file_history` prevents false alarms. The agent sees that the "suspicious code" was intentionally written three months ago — and stays quiet.
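The five read-only tools can be sketched as plain Python functions behind a dispatch table. How they are registered with the Claude Agent SDK's tool API is an assumption here; the `files` and `history` inputs are hypothetical pre-fetched repository snapshots, not real SDK objects.

```python
# Sketch of the five read-only review tools as plain functions.
# The SDK wiring is an assumption; only the read-only contract matters.
from typing import Callable

def what_changed(diff_text: str) -> str:
    """Return the raw PR diff (fetched elsewhere)."""
    return diff_text

def get_context(files: dict[str, str], path: str, line: int, radius: int = 20) -> str:
    """Return the lines surrounding a change in a file."""
    lines = files[path].splitlines()
    lo, hi = max(0, line - radius), min(len(lines), line + radius)
    return "\n".join(lines[lo:hi])

def find_usages(files: dict[str, str], symbol: str) -> list[str]:
    """Naive grep: every file whose contents mention the symbol."""
    return [p for p, src in files.items() if symbol in src]

def find_tests(files: dict[str, str], symbol: str) -> list[str]:
    """Test files that reference the changed symbol."""
    return [p for p in find_usages(files, symbol) if "test" in p.lower()]

def get_file_history(history: dict[str, list[str]], path: str) -> list[str]:
    """Commit messages that previously touched this file."""
    return history.get(path, [])

# Dispatch table: the agent decides which question to ask next.
TOOLS: dict[str, Callable] = {
    "what_changed": what_changed,
    "get_context": get_context,
    "find_usages": find_usages,
    "find_tests": find_tests,
    "get_file_history": get_file_history,
}
```

Note that none of these functions write anything: the read-only guarantee lives in the tool set itself, not in the prompt.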
3. The system prompt is the reviewer's personality
Claude Agent SDK handles the reasoning loop: calling the model, executing tools, feeding results back. You do not write this loop — the SDK runs it until the agent decides the task is complete.
But the SDK is the engine. The steering wheel is the system prompt. That is where you define what kind of reviewer your agent will be. A pedantic formalist and a pragmatic senior will produce different results on the same code. Write concrete rules into the prompt: which categories to check, what tone to use, what to ignore. The more specific, the more predictable.
`max_turns` is your safeguard against infinite loops. For review, 10-15 iterations are enough: read the diff, fetch context, build the report. If the agent spins longer — the prompt is too vague.
agent = model + tools + system prompt
system prompt = the reviewer's personality:
which categories to check (bugs, security, tests)
what tone (strict / friendly)
what to ignore (style, formatting)
max_turns = 10-15 (a safeguard against infinite loops)
launch: "Review PR: {url}" → the agent itself decides
which files to open and in what order

Start with haiku for testing — verify the agent calls the right tools. Switch to opus when the logic works. The cost difference is 10x.
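Assembling the personality into a prompt can be as simple as string composition. The prompt wording, the `build_reviewer_prompt` helper, and the `options` dict below are illustrative assumptions, not the SDK's required configuration format.

```python
# Sketch: build the reviewer personality as a system prompt.
# Helper name and options shape are assumptions for illustration.
def build_reviewer_prompt(categories: list[str], tone: str, ignore: list[str]) -> str:
    rules = "\n".join(f"- Check: {c}" for c in categories)
    skips = "\n".join(f"- Ignore: {i}" for i in ignore)
    return (
        "You are a pragmatic senior code reviewer.\n"
        f"Tone: {tone}.\n"
        f"{rules}\n{skips}\n"
        "Report at most 3 Critical findings per review."
    )

SYSTEM_PROMPT = build_reviewer_prompt(
    categories=["bugs", "security", "test coverage"],
    tone="friendly but direct",
    ignore=["code style", "formatting"],
)

# max_turns caps the reasoning loop: read diff, fetch context, report.
options = {"system_prompt": SYSTEM_PROMPT, "max_turns": 15}
```

The point of the explicit `Ignore:` lines is predictability: what the prompt does not forbid, the agent will eventually comment on.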
4. If everything is equally important — nothing is
The most common problem with AI review: 20 findings with no priorities. The developer does not know what to fix before merging and what to ignore. The solution is four levels baked into the system prompt: Critical (blocks merge), Important (should fix), Minor (nice to have), Nitpick (skip it).
Only Critical and Important appear as line comments in the PR. Minor and Nitpick go into the summary only. This is not just convenience: developers quickly tire of an agent flooding the PR with trivia and start ignoring everything.
Critical — blocks merge
Bugs, vulnerabilities, data loss
Important — should fix
Performance, tests, duplication
Minor — when possible
Readability, naming, comments
Nitpick — can ignore
Style, formatting, personal taste
Add a limit to the prompt: no more than 3 Critical per review. If the agent finds 10 — the problem is not the code, it is your criteria.
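The four levels and the routing rule (line comments only for Critical and Important) can be sketched as a small data model; the type names here are illustrative.

```python
# Sketch: severity model and routing of findings.
# Critical/Important become line comments; Minor/Nitpick go to summary.
from dataclasses import dataclass
from enum import IntEnum

class Severity(IntEnum):
    NITPICK = 0
    MINOR = 1
    IMPORTANT = 2
    CRITICAL = 3

@dataclass
class Finding:
    path: str
    line: int
    severity: Severity
    message: str

def split_findings(findings: list[Finding]) -> tuple[list[Finding], list[Finding]]:
    """Line comments for severity >= IMPORTANT; the rest summary-only."""
    inline = [f for f in findings if f.severity >= Severity.IMPORTANT]
    summary_only = [f for f in findings if f.severity < Severity.IMPORTANT]
    return inline, summary_only

def over_critical_cap(findings: list[Finding], cap: int = 3) -> bool:
    """More than `cap` Criticals usually means the criteria are too vague."""
    return sum(f.severity == Severity.CRITICAL for f in findings) > cap
```

Using an `IntEnum` makes the threshold a simple comparison, so the routing rule cannot drift out of sync with the level names.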
5. Line comments in the PR — or the agent is useless
An agent that writes reviews to the console is a toy. Real value appears when findings are attached to code lines directly in the PR. The developer sees them in context without switching tools.
GitHub API nuance: you can only attach a comment to a line that exists in the current PR diff. If the line is outside the diff — post as a general comment. A pattern that works: first a summary with the overall count ("2 Critical, 3 Important"), then line comments only for Critical and Important.
Format for each comment: severity in the header, one sentence about the problem, one about the fix. No walls of text. The developer should understand the finding in 5 seconds.
publishing to the PR:
1. summary comment: "2 Critical, 3 Important"
2. for each finding with severity ≥ Important:
→ a comment ON THE LINE in the diff
→ format: [SEVERITY] problem + fix
3. Minor and Nitpick → summary only, not on lines
nuance: a line comment can only be attached
to a line that exists in the current PR's diff

Forbid the agent from commenting on code style. Style comments are the most subjective and the most annoying. Let the linter do its job, and the agent do its own.
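The "line must exist in the diff" nuance can be handled before calling GitHub. The target endpoint `POST /repos/{owner}/{repo}/pulls/{pull_number}/reviews` accepts a body plus a `comments` array of `{path, line, side, body}`; the diff parsing below is a simplified sketch, and the helper names are mine.

```python
# Sketch: keep line comments on lines that exist in the PR diff,
# fold everything else into the summary body.
import re

def commentable_lines(diff_text: str) -> dict[str, set[int]]:
    """New-side line numbers present in the diff, per file (simplified)."""
    lines_by_file: dict[str, set[int]] = {}
    path, new_line = None, 0
    for raw in diff_text.splitlines():
        if raw.startswith("+++ b/"):
            path = raw[6:]
            lines_by_file.setdefault(path, set())
        elif path and raw.startswith("@@"):
            new_line = int(re.search(r"\+(\d+)", raw).group(1))
        elif path and raw.startswith("+") and not raw.startswith("+++"):
            lines_by_file[path].add(new_line)  # added line
            new_line += 1
        elif path and raw.startswith(" "):
            lines_by_file[path].add(new_line)  # context line, also commentable
            new_line += 1
    return lines_by_file

def build_review_payload(summary: str, comments: list[dict], diff_text: str) -> dict:
    """Payload for POST /repos/{owner}/{repo}/pulls/{pull_number}/reviews."""
    ok = commentable_lines(diff_text)
    inline, fallback = [], []
    for c in comments:
        if c["line"] in ok.get(c["path"], set()):
            inline.append({"path": c["path"], "line": c["line"],
                           "side": "RIGHT", "body": c["body"]})
        else:  # off-diff finding: demote to the summary body
            fallback.append(f'{c["path"]}:{c["line"]} {c["body"]}')
    body = summary + ("\n\n" + "\n".join(fallback) if fallback else "")
    return {"body": body, "event": "COMMENT", "comments": inline}
```

Pre-filtering this way avoids a failed API call turning the whole review post into an error.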
Result
An agent built on Claude Agent SDK that reads the PR diff, fetches repository context, classifies findings by four severity levels, and publishes a structured review to GitHub — summary plus line comments for Critical and Important findings.