Reflection: The Agent That Critiques Itself
Agents trust their first answer too much — and fail exactly where a human would reread and fix. Reflection forces the model to draft first, then critique, then revise — and quality jumps significantly.
Intermediate · AI Agents · 20 min · Claude API, Any LLM, Prompt chaining
1
Draft → critic → revise: three chairs of one agent
Picture a writer who first drafts quickly, then swaps to an editor's chair and brutally marks problems, then returns to the desk and rewrites. One person, three roles, three different modes of thinking. Reflection does the same with an LLM: the same model plays generator, critic, and editor in turn.
Why it works: an LLM is far better at spotting problems in finished text than avoiding them upfront. Generation and evaluation are different cognitive tasks, and separating them in the prompt gives the model a chance to see its own work with fresh eyes. The first draft is usually mediocre — and that's a feature, not a bug.
Request → Draft → Critique → Revise → Done
The first draft is always bad — and that's fine. Reflection's job isn't to write perfectly on the first try, but to improve systematically.
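The three chairs map directly onto three prompts over the same model. A minimal sketch of one pass; `call_llm` is a hypothetical placeholder for your actual model call (for example, the Anthropic Messages API) and is stubbed here so the flow is visible and runnable:

```python
def call_llm(system: str, user: str) -> str:
    """Placeholder: replace with a real API call to your model of choice."""
    return f"[{system[:24]}] response to: {user[:40]}"

def reflect_once(task: str) -> str:
    # Chair 1: the generator drafts quickly.
    draft = call_llm("You are a careful writer.", task)
    # Chair 2: the critic marks concrete problems in the draft.
    critique = call_llm(
        "You are a strict reviewer. List concrete problems in the text.",
        draft,
    )
    # Chair 3: the editor rewrites the draft against the critique.
    revised = call_llm(
        "You are an editor. Rewrite the draft to fix every listed problem.",
        f"Draft:\n{draft}\n\nCritique:\n{critique}",
    )
    return revised
```

Note that each role gets its own system prompt: the separation of generation and evaluation happens in the prompts, not in the model weights.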
2
When reflection helps, and when it just burns tokens
Reflection isn't free — every critique-and-revise cycle is extra model calls. It pays off where the 'correct' answer can't be formalized in advance: code, long-form writing, analysis, complex reasoning. For tasks with short or tightly constrained outputs — classification, yes/no, field extraction — reflection just burns money and time.
The most reliable test: can you check the result programmatically? If yes — write an eval or unit test. If 'correct' is defined only by human taste — reflection is your tool. Between these poles lies a grey zone (creative writing) where the critic is subjective, and iterations may not converge to anything better.
Where does reflection pay off?
✅ Generating code with non-trivial logic
✅ Long-form analysis or research report
❌ Yes/no or single-class classification
❌ Field extraction on a fixed schema
❌ One-to-two-sentence summary
⚠️ Creative writing: helps, but the critic is subjective
If a task can be verified programmatically — reflection is overkill, use an eval. Reflection is for tasks where 'correct' can't be formalized.
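When the task does sit on the programmatic side of the line, the cheaper alternative looks like a plain generate-and-check loop rather than an LLM critic. A hedged sketch with hypothetical names, assuming you can supply a `check` function that returns pass/fail:

```python
def eval_loop(generate, check, max_attempts: int = 3):
    """Generate until a programmatic check passes or attempts run out.

    `generate` produces a candidate; `check` returns (ok, feedback).
    No LLM critic needed: the check itself is the critic.
    """
    candidate = None
    for _ in range(max_attempts):
        candidate = generate()
        ok, _feedback = check(candidate)
        if ok:
            return candidate
    return candidate  # best effort after exhausting attempts
```

If you find yourself able to write `check`, you are in eval territory; reserve the reflection loop for outputs where no such function exists.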
3
The critic should be a different model — or at least a different prompt
Counter-intuitive point: the same model with the same context rarely finds its own mistakes. It's committed to its answer and tends to confirm its decisions — pure confirmation bias. The naive 'now check yourself' loop usually returns 'looks fine'.
Two working approaches. First — a different model as the critic (Haiku generates, Sonnet critiques, or vice versa). Fresh perspective, different training data, fewer overlapping blind spots. Second — same model, but in a new session with no history: the prompt isn't 'review your answer' but 'you are a strict reviewer, here is someone else's text, find problems'. Role plus fresh context fools confirmation bias about as well as one model can.
❌ Same prompt, same context
- Model already committed to the answer
- Confirmation bias — "looks fine"
- Finds cosmetics, misses substance
✅ Fresh context, critic role
- No attachment to the prior answer
- Explicit mandate to find problems
- Finds real bugs and logical gaps
4
The infinite loop is reflection's biggest enemy
Without a hard limit the loop turns into a spiral: the critic finds three problems, the editor fixes them, a new critic finds three more in the fresh text, and so on. Every iteration looks meaningful — the critic will always say something if you ask. But quality stops rising after 2-3 passes: the curve flatlines while the token counter keeps climbing.
Empirical rule: 2-3 iterations for code and summarization, up to 4 for deep analysis. Beyond that you're paying for stylistic rearrangement. Three stopping mechanisms work together. Hard cap — a fixed iteration limit, no exceptions. Quality threshold — the critic returns a structured response with an issue counter, and when high-severity issues = 0, the loop stops. Time/token budget — an overall limit per task so one hard question doesn't eat your daily quota.
A separate trap: if the critic is still finding 'problems' on the third pass, the cause is usually an over-strict critic prompt, not genuinely bad text. Log every iteration: what the critic found, what the editor changed, and whether the result actually improved.
Log every iteration. If the critic is still finding 'problems' on the 3rd pass, the critic prompt is too strict, not the code actually bad.
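The three stopping mechanisms compose naturally in one loop. A sketch with hypothetical callables; it assumes `critique` returns a dict like `{"issues": [{"severity": "high", ...}], "tokens_used": n}`, which is an assumption of this example, not a fixed API:

```python
def reflection_loop(draft, critique, revise,
                    max_iters: int = 3, token_budget: int = 20_000):
    text = draft()
    spent = 0
    for _ in range(max_iters):                      # 1. hard cap, no exceptions
        report = critique(text)
        spent += report.get("tokens_used", 0)
        high = [i for i in report["issues"] if i["severity"] == "high"]
        if not high:                                # 2. quality threshold
            break                                   #    high-severity == 0
        if spent >= token_budget:                   # 3. budget guard
            break
        text = revise(text, high)
    return text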
5
The critic prompt isn't "review it" — it's "find exactly 3 problems"
The main lever for reflection quality is the critic prompt, not the editor's. Vague 'review this code' gives a vague answer: 'looks fine', 'seems to work', 'could be more readable'. None of that turns into a fix. A structured requirement — 'find exactly N issues with severity and line number' — forces specificity, and you get an actionable list.
Two techniques work especially well. First — fix the output format upfront (JSON with an issues array, each with severity and a fix suggestion). Second — explicitly allow the critic to say 'no issues found': without that escape hatch the model invents fake problems just to 'complete the task'.
❌ Bad:
"Review this code. Any errors?"
→ Claude: "looks fine"
✅ Good:
"Find up to 3 concrete problems in this code.
For each: [line] [severity: low/med/high] [how to fix].
If there are no problems, return {"done": true, "issues": []}."
→ Claude: a structured list with severity and a fix suggestion
Return the critique as JSON with severity levels. Stop the reflection loop when high-severity issues = 0, not when the critic simply goes quiet.
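In code, the structured prompt pairs with defensive parsing: the critic's reply should be treated as untrusted JSON, and an unparseable reply as "no signal" rather than "no issues". A sketch; the prompt text mirrors the good example above, and all names are illustrative:

```python
import json

CRITIC_PROMPT = """Find up to 3 concrete problems in this code.
For each: [line] [severity: low/med/high] [how to fix].
If there are no problems, return {"done": true, "issues": []}.
Answer with JSON only: {"done": bool, "issues": [...]}."""

def parse_critique(raw: str) -> dict:
    """Parse the critic's JSON reply, tolerating malformed output.

    A parse failure is flagged, not silently treated as a clean bill
    of health, so the loop can retry or log instead of stopping early.
    """
    try:
        report = json.loads(raw)
    except json.JSONDecodeError:
        return {"done": False, "issues": [], "parse_error": True}
    report.setdefault("issues", [])
    return report
```

The explicit `{"done": true, "issues": []}` escape hatch in the prompt is what lets the loop's stop condition key off the data rather than off the critic's tone.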
Result
You understand when reflection pays off and when it's just an expensive way to get the same answer. You know the critic needs a fresh context, the loop needs a hard cap, and the critic prompt needs structure plus an explicit escape hatch.