Self-Refine
Generate → self-critique → revise
The Problem: A model's first draft is rarely its best. How can the same model improve its own output — with no extra training and no human feedback?
The Solution: Edit Your Own Draft
Self-Refine is a prompting technique where the same model generates an answer, critiques its own output, and revises it, looping until the result is good enough. One model plays three roles on the same task: the generator writes a draft, the critic produces specific, actionable self-feedback, and the reviser rewrites the draft using that feedback. It needs no extra training and no human feedback, yet the original paper reports an average improvement of roughly 20% (absolute) across the seven tasks it evaluates.
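Because one model plays all three roles, what changes between roles is the prompt. The sketch below shows three hypothetical prompt templates. The wording, the `NO_ISSUES` stop token and the `build_prompts` helper are illustrative assumptions. They are not the prompts used in the paper.

```python
# Hypothetical prompt templates: the same model receives a different prompt per role.
# Wording and the NO_ISSUES stop token are illustrative, not from the paper.

GENERATE = "Task: {task}\nWrite your best answer."

CRITIQUE = (
    "Task: {task}\nDraft:\n{draft}\n\n"
    "List specific, actionable problems with this draft "
    "(e.g. 'fails on an empty list', 'paragraph 2 repeats paragraph 1'). "
    "If nothing substantive is wrong, reply exactly: NO_ISSUES"
)

REFINE = (
    "Task: {task}\nDraft:\n{draft}\n\nFeedback:\n{feedback}\n\n"
    "Rewrite the draft so that it addresses every point in the feedback."
)


def build_prompts(task: str, draft: str, feedback: str) -> dict[str, str]:
    """Return the prompt each role (generator, critic, reviser) would send."""
    return {
        "generator": GENERATE.format(task=task),
        "critic": CRITIQUE.format(task=task, draft=draft),
        "reviser": REFINE.format(task=task, draft=draft, feedback=feedback),
    }


prompts = build_prompts(
    task="Write a one-line tagline for a note-taking app.",
    draft="A good app for your notes.",
    feedback="Generic: it names no benefit for the reader.",
)
print(prompts["reviser"])
```

The critic prompt asks for concrete problems and gives an explicit "nothing left" answer. That explicit answer gives the loop something to stop on.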
How the loop works
The cycle is simple: generate → self-feedback → refine, repeated. First the model produces an initial output. Then — and this is the key step — it writes concrete, specific self-critique rather than a vague "try again": "the function doesn't handle an empty list" or "the second paragraph repeats the first." Explicit, targeted feedback is what makes the next revision actually better; a bare retry just resamples and tends to reproduce the same flaws. Finally the model revises using that feedback, and the loop repeats.
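Here is a minimal sketch of the loop. It assumes the model is any function that maps a prompt string to a response string. `self_refine`, the role tags, the `NO_ISSUES` stop token and `stub_model` are hypothetical. The stub stands in for an LLM so the control flow runs offline. The key line is the refine call: the reviser receives the critic's specific feedback, which is what separates this loop from a bare retry.

```python
from typing import Callable

# Any function from prompt text to response text can act as the model.
Model = Callable[[str], str]


def self_refine(task: str, model: Model, max_iters: int = 3) -> list[str]:
    """Generate a draft, then alternate self-feedback and refinement.

    Returns every draft, oldest first. Stops early when the critic
    replies NO_ISSUES (an assumed stop token) or after max_iters rounds.
    """
    draft = model(f"GENERATE\nTask: {task}")
    drafts = [draft]
    for _ in range(max_iters):
        feedback = model(f"CRITIQUE\nTask: {task}\nDraft: {draft}")
        if feedback.strip() == "NO_ISSUES":
            break
        # The reviser sees the draft AND the specific feedback, unlike a bare retry.
        draft = model(f"REFINE\nTask: {task}\nDraft: {draft}\nFeedback: {feedback}")
        drafts.append(draft)
    return drafts


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM, keyed on the role tag in the prompt."""
    if prompt.startswith("GENERATE"):
        return "def mean(xs): return sum(xs) / len(xs)"
    if prompt.startswith("CRITIQUE"):
        if "if xs else" in prompt:  # the empty-list guard is present
            return "NO_ISSUES"
        return "mean() raises ZeroDivisionError on an empty list."
    # REFINE: apply the fix the feedback asked for.
    return "def mean(xs): return sum(xs) / len(xs) if xs else 0.0"


drafts = self_refine("Write mean(xs) for a list of numbers.", stub_model)
for i, d in enumerate(drafts, start=1):
    print(f"Draft {i}: {d}")
```

To use a real model, replace `stub_model` with a call to a model client. The loop itself does not change.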
When to stop — and how it relates to other techniques
The loop ends at a stopping criterion: a maximum number of iterations (usually 2–4) or a quality plateau where the critique stops finding meaningful issues. Over-refining can hurt. Once an answer is already correct, extra rounds may add verbosity or second-guess a right answer into a wrong one, so bound the loop. Self-Refine differs from its cousins. Reflexion keeps verbal memory across episodes to learn from failures over time, while Chain-of-Verification (CoVe) narrowly fact-checks claims. Self-Refine instead polishes the quality of a single output within one episode.
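The stopping rules can be written as a single predicate. This sketch rests on several assumptions. The quality scores come from a scorer you supply; one option is the model rating its own draft. The `NO_ISSUES` token, the budget of 3 and the `min_gain` threshold are illustrative values, not recommendations from the paper.

```python
def should_stop(iteration: int, feedback: str, scores: list[float],
                max_iters: int = 3, min_gain: float = 0.02) -> bool:
    """Decide whether to end the Self-Refine loop (hypothetical helper).

    Stops when (a) the iteration budget is spent, (b) the critic reports
    nothing substantive left, or (c) the latest revision raised the quality
    score by less than min_gain, which is treated as a plateau.
    """
    if iteration >= max_iters:
        return True
    if feedback.strip().upper() == "NO_ISSUES":
        return True
    # A gain below the threshold, or an outright drop, counts as a plateau.
    if len(scores) >= 2 and scores[-1] - scores[-2] < min_gain:
        return True
    return False


print(should_stop(1, "Line 4 fails on an empty input.", [0.45, 0.70]))
print(should_stop(2, "Could be punchier.", [0.70, 0.71]))
```

A score drop also counts as a plateau, so the loop halts when a revision makes things worse. You can add a further guard against second-guessing a correct answer into a wrong one: return the best-scoring draft rather than the last one.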
Think of it like a writer editing their own draft — write, read it critically, rewrite, repeat:
1. Generate: Produce an initial output
2. Self-feedback: Give specific, concrete feedback on your own output
3. Refine: Revise the output using that feedback
4. Repeat or stop: Loop until a stopping criterion or quality plateau
Where Is This Used?
- Code Fixing & Refactoring: The model reviews its own code, finds bugs and smells, and rewrites it
- Writing Polish & Tone: Critiquing clarity, structure, and tone, then revising the draft
- Math Correction: Re-checking each step and fixing arithmetic or logic slips
- Structured Output Repair: Validating JSON/schema, then repairing fields that fail the rules
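For structured output, part of the critique can come from a deterministic check. A check like this guarantees the feedback is as specific as Self-Refine needs. This is a variant, not the paper's setup: in Self-Refine the model writes its own feedback. Here, a hypothetical `critique_json` helper and `SCHEMA` turn validation failures into feedback lines. Those lines would then be passed to the reviser prompt, alone or alongside the model's own critique.

```python
import json

# Hypothetical schema: required field name -> expected Python type after json.loads.
SCHEMA = {"name": str, "price": float, "tags": list}


def critique_json(text: str) -> list[str]:
    """Turn JSON/schema violations into specific, actionable feedback lines."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"Output is not valid JSON: {err.msg} at line {err.lineno}, column {err.colno}."]
    if not isinstance(data, dict):
        return ["Top-level value must be a JSON object."]
    feedback = []
    for field, expected in SCHEMA.items():
        if field not in data:
            feedback.append(f"Missing required field '{field}'.")
        elif not isinstance(data[field], expected):
            got = type(data[field]).__name__
            feedback.append(f"Field '{field}' should be {expected.__name__}, got {got}.")
    return feedback


for line in critique_json('{"name": "Pen", "price": "1.50"}'):
    print(line)
```

An empty list means every rule passed, so it can double as the stop signal for the repair loop.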
Fun Fact: Self-Refine works because the critique is explicit: a vague "improve this" barely helps, but specific feedback like "line 4 fails on an empty input" gives the reviser something concrete to fix — which is why a self-critique loop beats simply sampling the answer again.
Try It Yourself!
The tagline example in this lesson follows one draft through successive self-refine iterations, with its quality score rising each round.
Specific, actionable self-feedback is what makes each revision better. In this illustrative run, the single-shot tagline scores 45% and the refined one scores 92%. Those numbers belong to the demo only. Across tasks, the original paper reports an average gain of about 20%, with no extra training.
Frequently asked questions
What is Self-Refine and how does the loop work?
Self-Refine is a prompting technique where the same model generates an answer, writes specific self-feedback (self-critique) on its own output, then revises it using that feedback. It repeats this generate → self-feedback → refine loop. One model plays three roles: generator, critic, and reviser. It needs no extra training and no human feedback, yet the original paper reports an average improvement of roughly 20% across tasks.
How is Self-Refine different from Reflexion and Chain-of-Verification?
All three add a self-improvement loop, but they target different things. Self-Refine polishes the quality of a single output through self-critique and revision within one episode. Reflexion keeps verbal memory across multiple episodes or attempts, learning from failures over time. Chain-of-Verification (CoVe) focuses narrowly on fact-checking: it generates verification questions and answers them independently. Use Self-Refine for iterative quality improvement, Reflexion for multi-attempt learning, and CoVe for factual accuracy.
When should you stop refining, and can it ever hurt?
Stop at a clear stopping criterion: a maximum number of iterations (usually 2–4) or a quality plateau where feedback stops finding meaningful issues. Yes, over-refining can hurt — once the answer is already correct, extra rounds may add verbosity, second-guess a right answer into a wrong one, or drift from the original requirements. Bound the loop and stop when the critique returns 'no substantive changes needed.'
Worked example
Task: Write a one-line tagline for a note-taking app.
Draft 1: "A good app for your notes."
🔍 Self-critique: "A good app" is generic. No benefit or emotion for the reader.
Draft 2: "Capture every idea before it slips away."
🔍 Self-critique: There is a benefit now, but it can be tighter and punchier.
Final: "Never lose a thought again."
✅ The critique finds no meaningful issues — stopping.
Specific self-critique ("generic, no benefit") turns a bland draft into a punchy tagline in 2 iterations — and the loop stops at the plateau instead of running forever.