Reflection: The Agent That Critiques Itself
Agents trust their first answer too much — and fail exactly where a human would reread and fix. Reflection forces the model to draft first, then critique, then revise — and quality jumps significantly.
Intermediate · AI Agents · 20 min · Claude API, Any LLM, Prompt chaining
1
Draft → critic → revise: three chairs of one agent
Picture a writer who first drafts quickly, then swaps to an editor's chair and brutally marks problems, then returns to the desk and rewrites. One person, three roles, three different modes of thinking. Reflection does the same with an LLM: the same model plays generator, critic, and editor in turn.
Why it works: an LLM is far better at spotting problems in finished text than avoiding them upfront. Generation and evaluation are different cognitive tasks, and separating them in the prompt gives the model a chance to see its own work with fresh eyes. The first draft is usually mediocre — and that's a feature, not a bug.
Request → Draft → Critique → Revise → Done
The first draft is always bad — and that's fine. Reflection's job isn't to write perfectly on the first try, but to improve systematically.
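The three chairs map directly onto three prompts over the same model. A minimal sketch of one pass; `call_llm` is a hypothetical placeholder for your actual model call (for example, the Anthropic Messages API) and is stubbed here so the flow is visible and runnable:

```python
def call_llm(system: str, user: str) -> str:
    """Placeholder: replace with a real API call to your model of choice."""
    return f"[{system[:24]}] response to: {user[:40]}"

def reflect_once(task: str) -> str:
    # Chair 1: the generator drafts quickly.
    draft = call_llm("You are a careful writer.", task)
    # Chair 2: the critic marks concrete problems in the draft.
    critique = call_llm(
        "You are a strict reviewer. List concrete problems in the text.",
        draft,
    )
    # Chair 3: the editor rewrites the draft against the critique.
    revised = call_llm(
        "You are an editor. Rewrite the draft to fix every listed problem.",
        f"Draft:\n{draft}\n\nCritique:\n{critique}",
    )
    return revised
```

Note that each role gets its own system prompt: the separation of generation and evaluation happens in the prompts, not in the model weights.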
2
When reflection helps, and when it just burns tokens
Reflection isn't free — every critique-and-revise cycle is extra model calls. It pays off where the 'correct' answer can't be formalized in advance: code, long-form writing, analysis, complex reasoning. For tasks with short or tightly constrained outputs — classification, yes/no, field extraction — reflection just burns money and time.
The most reliable test: can you check the result programmatically? If yes — write an eval or unit test. If 'correct' is defined only by human taste — reflection is your tool. Between these poles lies a grey zone (creative writing) where the critic is subjective, and iterations may not converge to anything better.
Where does reflection pay off?
✅ Generating code with non-trivial logic
✅ Long-form analysis or research report
❌ Yes/no or single-class classification
❌ Field extraction on a fixed schema
❌ One-to-two-sentence summary
⚠️ Creative writing: helps, but the critic is subjective
If a task can be verified programmatically — reflection is overkill, use an eval. Reflection is for tasks where 'correct' can't be formalized.
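When the task does sit on the programmatic side of the line, the cheaper alternative looks like a plain generate-and-check loop rather than an LLM critic. A hedged sketch with hypothetical names, assuming you can supply a `check` function that returns pass/fail:

```python
def eval_loop(generate, check, max_attempts: int = 3):
    """Generate until a programmatic check passes or attempts run out.

    `generate` produces a candidate; `check` returns (ok, feedback).
    No LLM critic needed: the check itself is the critic.
    """
    candidate = None
    for _ in range(max_attempts):
        candidate = generate()
        ok, _feedback = check(candidate)
        if ok:
            return candidate
    return candidate  # best effort after exhausting attempts
```

If you find yourself able to write `check`, you are in eval territory; reserve the reflection loop for outputs where no such function exists.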
3
The critic should be a different model — or at least a different prompt
Counter-intuitive point: the same model with the same context rarely finds its own mistakes. It's committed to its answer and tends to confirm its decisions — pure confirmation bias. The naive 'now check yourself' loop usually returns 'looks fine'.
Two working approaches. First — a different model as the critic (Haiku generates, Sonnet critiques, or vice versa). Fresh perspective, different training data, fewer overlapping blind spots. Second — same model, but in a new session with no history: the prompt isn't 'review your answer' but 'you are a strict reviewer, here is someone else's text, find problems'. Role plus fresh context fools confirmation bias about as well as one model can.
❌ Same prompt, same context
- Model already committed to the answer
- Confirmation bias — "looks fine"
- Finds cosmetics, misses substance
✅ Fresh context, critic role
- No attachment to the prior answer
- Explicit mandate to find problems
- Finds real bugs and logical gaps
4
The infinite loop is reflection's biggest enemy
Without a hard limit the loop turns into a spiral: the critic finds three problems, the editor fixes them, a new critic finds three more in the fresh text, and so on. Every iteration looks meaningful — the critic will always say something if you ask. But quality stops rising after 2-3 passes: the curve flatlines while the token counter keeps climbing.
Empirical rule: 2-3 iterations for code and summarization, up to 4 for deep analysis. Beyond that you're paying for stylistic rearrangement. Three stopping mechanisms work together. Hard cap — a fixed iteration limit, no exceptions. Quality threshold — the critic returns a structured response with an issue counter, and when high-severity issues = 0, the loop stops. Time/token budget — an overall limit per task so one hard question doesn't eat your daily quota.
A separate trap: if the critic is still finding 'problems' on the third pass, the cause is usually an over-strict critic prompt, not genuinely bad text. Log every iteration: what the critic found, what the editor changed, and whether the result actually improved.
Log every iteration. If the critic is still finding 'problems' on the 3rd pass, the critic prompt is too strict, not the code actually bad.
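The three stopping mechanisms compose naturally in one loop. A sketch with hypothetical callables; it assumes `critique` returns a dict like `{"issues": [{"severity": "high", ...}], "tokens_used": n}`, which is an assumption of this example, not a fixed API:

```python
def reflection_loop(draft, critique, revise,
                    max_iters: int = 3, token_budget: int = 20_000):
    text = draft()
    spent = 0
    for _ in range(max_iters):                      # 1. hard cap, no exceptions
        report = critique(text)
        spent += report.get("tokens_used", 0)
        high = [i for i in report["issues"] if i["severity"] == "high"]
        if not high:                                # 2. quality threshold
            break                                   #    high-severity == 0
        if spent >= token_budget:                   # 3. budget guard
            break
        text = revise(text, high)
    return text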
5
The critic prompt isn't "review it" — it's "find exactly 3 problems"
The main lever for reflection quality is the critic prompt, not the editor's. Vague 'review this code' gives a vague answer: 'looks fine', 'seems to work', 'could be more readable'. None of that turns into a fix. A structured requirement — 'find exactly N issues with severity and line number' — forces specificity, and you get an actionable list.
Two techniques work especially well. First — fix the output format upfront (JSON with an issues array, each with severity and a fix suggestion). Second — explicitly allow the critic to say 'no issues found': without that escape hatch the model invents fake problems just to 'complete the task'.
❌ Bad:
"Review this code. Any errors?"
→ Claude: "looks fine"
✅ Good:
"Find up to 3 concrete problems in this code.
For each: [line] [severity: low/med/high] [how to fix].
If there are no problems, return {"done": true, "issues": []}."
→ Claude: a structured list with severity and a fix suggestion
Return the critique as JSON with severity levels. Stop the reflection loop when high-severity issues = 0, not when the critic simply goes quiet.
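In code, the structured prompt pairs with defensive parsing: the critic's reply should be treated as untrusted JSON, and an unparseable reply as "no signal" rather than "no issues". A sketch; the prompt text mirrors the good example above, and all names are illustrative:

```python
import json

CRITIC_PROMPT = """Find up to 3 concrete problems in this code.
For each: [line] [severity: low/med/high] [how to fix].
If there are no problems, return {"done": true, "issues": []}.
Answer with JSON only: {"done": bool, "issues": [...]}."""

def parse_critique(raw: str) -> dict:
    """Parse the critic's JSON reply, tolerating malformed output.

    A parse failure is flagged, not silently treated as a clean bill
    of health, so the loop can retry or log instead of stopping early.
    """
    try:
        report = json.loads(raw)
    except json.JSONDecodeError:
        return {"done": False, "issues": [], "parse_error": True}
    report.setdefault("issues", [])
    return report
```

The explicit `{"done": true, "issues": []}` escape hatch in the prompt is what lets the loop's stop condition key off the data rather than off the critic's tone.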
Result
You understand when reflection pays off and when it's just an expensive way to get the same answer. You know the critic needs a fresh context, the loop needs a hard cap, and the critic prompt needs structure plus an explicit escape hatch.