Code Review Agent with Claude Agent SDK
Code review is a perfect task for an AI agent: it requires reading context, finding related files, and applying different criteria to different parts of code. We break down how to build an agent that reviews code like an experienced engineer — prioritizing findings, searching for context, and posting a structured report directly to GitHub.
Advanced · AI Agents · 35 min · Claude Agent SDK, Python, GitHub API
1. The agent is not a replacement — it is a first filter
Why put AI on code review at all? Not to replace a human — to free them up. The agent reads every line of the diff in 30 seconds, catches null-pointers and forgotten awaits, checks imports. The human spends that time on architectural decisions and business logic — the stuff AI does poorly.
Practical takeaway: design the agent as an assistant with a clear scope. Security, bugs, tests — yes. Architecture suggestions and code style — no.
🤖 Give to the agent
- Check 100% of diff lines
- Common bug patterns
- Security and leaked secrets
- Test coverage gaps
👤 Leave to humans
- Architectural decisions
- Business context and goals
- Mentoring and teaching
- Nuanced trade-offs
An agent that finds 80% of problems in 30 seconds changes the entire team's rhythm. Don't chase 100% — that's a trap.
2. Five tools — and not one more
A good reviewer does not just look at the diff. They open adjacent files, check how a function is used elsewhere, look for tests. The agent does the same — but only if you gave it the right tools.
Each tool answers one question: "what changed?", "how is it used?", "are there tests?". The agent decides which question to ask next — that is what makes it an agent, not a script. And every tool is strictly read-only — the agent analyzes but never touches code.
Read diff → (what to understand?) → Fetch context → (context gathered) → Analyze → (findings ready) → Build report
The agent's 5 tools (read-only!):
what_changed → reads the PR diff
get_context → opens the file around the changes
find_usages → where else this function is called
find_tests → whether the changed logic has tests
get_file_history → who changed this file before, and why

`get_file_history` prevents false alarms. The agent sees that the "suspicious code" was intentionally written three months ago — and stays quiet.
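The five read-only tools can be sketched as plain Python functions behind a dispatch table. How they are registered with the Claude Agent SDK's tool API is an assumption here; the `files` and `history` inputs are hypothetical pre-fetched repository snapshots, not real SDK objects.

```python
# Sketch of the five read-only review tools as plain functions.
# The SDK wiring is an assumption; only the read-only contract matters.
from typing import Callable

def what_changed(diff_text: str) -> str:
    """Return the raw PR diff (fetched elsewhere)."""
    return diff_text

def get_context(files: dict[str, str], path: str, line: int, radius: int = 20) -> str:
    """Return the lines surrounding a change in a file."""
    lines = files[path].splitlines()
    lo, hi = max(0, line - radius), min(len(lines), line + radius)
    return "\n".join(lines[lo:hi])

def find_usages(files: dict[str, str], symbol: str) -> list[str]:
    """Naive grep: every file whose contents mention the symbol."""
    return [p for p, src in files.items() if symbol in src]

def find_tests(files: dict[str, str], symbol: str) -> list[str]:
    """Test files that reference the changed symbol."""
    return [p for p in find_usages(files, symbol) if "test" in p.lower()]

def get_file_history(history: dict[str, list[str]], path: str) -> list[str]:
    """Commit messages that previously touched this file."""
    return history.get(path, [])

# Dispatch table: the agent decides which question to ask next.
TOOLS: dict[str, Callable] = {
    "what_changed": what_changed,
    "get_context": get_context,
    "find_usages": find_usages,
    "find_tests": find_tests,
    "get_file_history": get_file_history,
}
```

Note that none of these functions write anything: the read-only guarantee lives in the tool set itself, not in the prompt.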
3. The system prompt is the reviewer's personality
Claude Agent SDK handles the reasoning loop: calling the model, executing tools, feeding results back. You do not write this loop — the SDK runs it until the agent decides the task is complete.
But the SDK is the engine. The steering wheel is the system prompt. That is where you define what kind of reviewer your agent will be. A pedantic formalist and a pragmatic senior will produce different results on the same code. Write concrete rules into the prompt: which categories to check, what tone to use, what to ignore. The more specific, the more predictable.
`max_turns` is your safeguard against infinite loops. For review, 10-15 iterations are enough: read the diff, fetch context, build the report. If the agent spins longer — the prompt is too vague.
agent = model + tools + system prompt
system prompt = the reviewer's personality:
which categories to check (bugs, security, tests)
what tone (strict / friendly)
what to ignore (style, formatting)
max_turns = 10-15 (a safeguard against infinite loops)
launch: "Review PR: {url}" → the agent itself decides
which files to open and in what order

Start with haiku for testing — verify the agent calls the right tools. Switch to opus when the logic works. The cost difference is 10x.
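Assembling the personality into a prompt can be as simple as string composition. The prompt wording, the `build_reviewer_prompt` helper, and the `options` dict below are illustrative assumptions, not the SDK's required configuration format.

```python
# Sketch: build the reviewer personality as a system prompt.
# Helper name and options shape are assumptions for illustration.
def build_reviewer_prompt(categories: list[str], tone: str, ignore: list[str]) -> str:
    rules = "\n".join(f"- Check: {c}" for c in categories)
    skips = "\n".join(f"- Ignore: {i}" for i in ignore)
    return (
        "You are a pragmatic senior code reviewer.\n"
        f"Tone: {tone}.\n"
        f"{rules}\n{skips}\n"
        "Report at most 3 Critical findings per review."
    )

SYSTEM_PROMPT = build_reviewer_prompt(
    categories=["bugs", "security", "test coverage"],
    tone="friendly but direct",
    ignore=["code style", "formatting"],
)

# max_turns caps the reasoning loop: read diff, fetch context, report.
options = {"system_prompt": SYSTEM_PROMPT, "max_turns": 15}
```

The point of the explicit `Ignore:` lines is predictability: what the prompt does not forbid, the agent will eventually comment on.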
4. If everything is equally important — nothing is
The most common problem with AI review: 20 findings with no priorities. The developer does not know what to fix before merging and what to ignore. The solution is four levels baked into the system prompt: Critical (blocks merge), Important (should fix), Minor (nice to have), Nitpick (skip it).
Only Critical and Important appear as line comments in the PR. Minor and Nitpick go into the summary only. This is not just convenience: developers quickly tire of an agent flooding the PR with trivia and start ignoring everything.
Critical — blocks merge
Bugs, vulnerabilities, data loss
Important — should fix
Performance, tests, duplication
Minor — when possible
Readability, naming, comments
Nitpick — can ignore
Style, formatting, personal taste
Add a limit to the prompt: no more than 3 Critical per review. If the agent finds 10 — the problem is not the code, it is your criteria.
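The four levels and the routing rule (line comments only for Critical and Important) can be sketched as a small data model; the type names here are illustrative.

```python
# Sketch: severity model and routing of findings.
# Critical/Important become line comments; Minor/Nitpick go to summary.
from dataclasses import dataclass
from enum import IntEnum

class Severity(IntEnum):
    NITPICK = 0
    MINOR = 1
    IMPORTANT = 2
    CRITICAL = 3

@dataclass
class Finding:
    path: str
    line: int
    severity: Severity
    message: str

def split_findings(findings: list[Finding]) -> tuple[list[Finding], list[Finding]]:
    """Line comments for severity >= IMPORTANT; the rest summary-only."""
    inline = [f for f in findings if f.severity >= Severity.IMPORTANT]
    summary_only = [f for f in findings if f.severity < Severity.IMPORTANT]
    return inline, summary_only

def over_critical_cap(findings: list[Finding], cap: int = 3) -> bool:
    """More than `cap` Criticals usually means the criteria are too vague."""
    return sum(f.severity == Severity.CRITICAL for f in findings) > cap
```

Using an `IntEnum` makes the threshold a simple comparison, so the routing rule cannot drift out of sync with the level names.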
5. Line comments in the PR — or the agent is useless
An agent that writes reviews to the console is a toy. Real value appears when findings are attached to code lines directly in the PR. The developer sees them in context without switching tools.
GitHub API nuance: you can only attach a comment to a line that exists in the current PR diff. If the line is outside the diff — post as a general comment. A pattern that works: first a summary with the overall count ("2 Critical, 3 Important"), then line comments only for Critical and Important.
Format for each comment: severity in the header, one sentence about the problem, one about the fix. No walls of text. The developer should understand the finding in 5 seconds.
publishing to the PR:
1. summary comment: "2 Critical, 3 Important"
2. for each finding with severity ≥ Important:
→ a comment ON THE LINE in the diff
→ format: [SEVERITY] problem + fix
3. Minor and Nitpick → summary only, not on lines
nuance: a line comment can only be attached
to a line that exists in the current PR's diff

Forbid the agent from commenting on code style. Style comments are the most subjective and the most annoying. Let the linter do its job, and the agent do its own.
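The "line must exist in the diff" nuance can be handled before calling GitHub. The target endpoint `POST /repos/{owner}/{repo}/pulls/{pull_number}/reviews` accepts a body plus a `comments` array of `{path, line, side, body}`; the diff parsing below is a simplified sketch, and the helper names are mine.

```python
# Sketch: keep line comments on lines that exist in the PR diff,
# fold everything else into the summary body.
import re

def commentable_lines(diff_text: str) -> dict[str, set[int]]:
    """New-side line numbers present in the diff, per file (simplified)."""
    lines_by_file: dict[str, set[int]] = {}
    path, new_line = None, 0
    for raw in diff_text.splitlines():
        if raw.startswith("+++ b/"):
            path = raw[6:]
            lines_by_file.setdefault(path, set())
        elif path and raw.startswith("@@"):
            new_line = int(re.search(r"\+(\d+)", raw).group(1))
        elif path and raw.startswith("+") and not raw.startswith("+++"):
            lines_by_file[path].add(new_line)  # added line
            new_line += 1
        elif path and raw.startswith(" "):
            lines_by_file[path].add(new_line)  # context line, also commentable
            new_line += 1
    return lines_by_file

def build_review_payload(summary: str, comments: list[dict], diff_text: str) -> dict:
    """Payload for POST /repos/{owner}/{repo}/pulls/{pull_number}/reviews."""
    ok = commentable_lines(diff_text)
    inline, fallback = [], []
    for c in comments:
        if c["line"] in ok.get(c["path"], set()):
            inline.append({"path": c["path"], "line": c["line"],
                           "side": "RIGHT", "body": c["body"]})
        else:  # off-diff finding: demote to the summary body
            fallback.append(f'{c["path"]}:{c["line"]} {c["body"]}')
    body = summary + ("\n\n" + "\n".join(fallback) if fallback else "")
    return {"body": body, "event": "COMMENT", "comments": inline}
```

Pre-filtering this way avoids a failed API call turning the whole review post into an error.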
Result
An agent built on Claude Agent SDK that reads the PR diff, fetches repository context, classifies findings by four severity levels, and publishes a structured review to GitHub — summary plus line comments for Critical and Important findings.