Lesson 22

Self-Refine

Generate → self-critique → revise

The Problem: A model's first draft is rarely its best. How can the same model improve its own output — with no extra training and no human feedback?

The Solution: Edit Your Own Draft

Self-Refine is a prompting technique where the same model generates an answer, critiques its own output, and revises it, looping until the result is good enough. One model plays three roles on one prompt: the generator writes a draft, the critic produces specific, actionable self-feedback, and the reviser rewrites the draft using that feedback. It needs no extra training and no human feedback, yet the original paper reports roughly 20% absolute improvement on average across seven diverse tasks.
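The three roles can be sketched as three prompt templates sent to the same model. This is a minimal illustration, not the paper's prompts: the function names (`generator_prompt`, `critic_prompt`, `reviser_prompt`) and the `NO ISSUES` stop phrase are hypothetical, chosen here so the critic has an explicit way to say nothing is left to fix.

```python
# Hypothetical prompt templates: one model, three roles.
# The wording is illustrative, not the prompts used in the Self-Refine paper.

STOP_PHRASE = "NO ISSUES"  # assumed signal that the critic found nothing to fix


def generator_prompt(task: str) -> str:
    """Role 1 (generator): ask for a first draft."""
    return f"Task: {task}\nWrite your best first attempt."


def critic_prompt(task: str, draft: str) -> str:
    """Role 2 (critic): ask the same model for specific, actionable feedback."""
    return (
        f"Task: {task}\nDraft:\n{draft}\n\n"
        "List concrete problems with this draft: say exactly what is wrong and where. "
        f"If nothing meaningful is wrong, reply exactly: {STOP_PHRASE}"
    )


def reviser_prompt(task: str, draft: str, feedback: str) -> str:
    """Role 3 (reviser): ask for a rewrite that addresses every feedback point."""
    return (
        f"Task: {task}\nDraft:\n{draft}\n\nFeedback:\n{feedback}\n\n"
        "Rewrite the draft so it fixes each point in the feedback. "
        "Return only the revised draft."
    )


if __name__ == "__main__":
    task = "Write a one-line tagline for a note-taking app."
    print(critic_prompt(task, "A good app for your notes."))
```

Only the prompt changes between roles; the model and its weights stay the same, which is why no extra training is needed.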

How the loop works

The cycle is simple: generate → self-feedback → refine, repeated. First the model produces an initial output. Then — and this is the key step — it writes concrete, specific self-critique rather than a vague "try again": "the function doesn't handle an empty list" or "the second paragraph repeats the first." Explicit, targeted feedback is what makes the next revision actually better; a bare retry just resamples and tends to reproduce the same flaws. Finally the model revises using that feedback, and the loop repeats.

When to stop — and how it relates to other techniques

You stop at a stopping criterion: a maximum number of iterations (usually 2–4) or a quality plateau where the critique stops finding meaningful issues. Over-refining can hurt: once an answer is already correct, extra rounds may add verbosity or second-guess a right answer into a wrong one, so bound the loop. Self-Refine differs from its cousins: Reflexion keeps verbal memory across episodes to learn from failures over time, while Chain of Verification narrowly fact-checks claims. Self-Refine instead polishes the quality of a single output within one episode.
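Both stopping rules can be combined into one check. This is a sketch under stated assumptions: `should_stop` is a hypothetical helper, the critique is assumed to be parsed into a list of issues, and `scores` assumes you have some quality scorer (a rubric, tests, or a judge model) that returns a number per round. The defaults `max_iters=3` and `min_gain=0.02` are illustrative choices, not values from the paper.

```python
from typing import Sequence


def should_stop(
    iteration: int,
    issues: Sequence[str],
    scores: Sequence[float],
    max_iters: int = 3,
    min_gain: float = 0.02,
) -> bool:
    """Decide whether to end a Self-Refine loop.

    iteration: number of refine rounds completed so far.
    issues:    problems listed by the latest self-critique (empty = nothing to fix).
    scores:    quality score after each round, from a hypothetical scorer.
    """
    if iteration >= max_iters:   # hard cap: always bound the loop
        return True
    if not issues:               # critique found nothing meaningful to fix
        return True
    if len(scores) >= 2 and scores[-1] - scores[-2] < min_gain:
        return True              # plateau (or a drop): the last round barely helped
    return False


if __name__ == "__main__":
    print(should_stop(1, ["too generic"], [0.45, 0.70]))  # still improving
    print(should_stop(1, ["too generic"], [0.70, 0.60]))  # got worse: stop
```

Treating a score drop as a plateau (a negative gain is below `min_gain`) is one cheap guard against refining a correct answer into a worse one.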

Think of it like a writer editing their own draft — write, read it critically, rewrite, repeat:

1. Generate: Produce an initial output
2. Self-feedback: Give specific, concrete feedback on your own output
3. Refine: Revise the output using that feedback
4. Repeat or stop: Loop until a stopping criterion or quality plateau
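The four steps map onto a short loop. The sketch below assumes the model is any function from a prompt string to a completion; `scripted_model` is a hypothetical stand-in that replays this lesson's tagline demo so the loop runs offline, and the `GENERATE`/`CRITIQUE`/`REFINE` prompt prefixes and `NO ISSUES` stop token are illustrative conventions, not a standard.

```python
from typing import Callable, List, Tuple

# Any function that maps a prompt string to a completion string (an assumption;
# swap in your own LLM client call).
Model = Callable[[str], str]


def self_refine(task: str, model: Model, max_iters: int = 3,
                stop_token: str = "NO ISSUES") -> Tuple[str, List[str]]:
    """Generate, then alternate self-feedback and refine with ONE model."""
    draft = model(f"GENERATE\nTask: {task}")            # 1. Generate
    feedback_log: List[str] = []
    for _ in range(max_iters):                          # 4. Repeat, but bounded
        feedback = model(                               # 2. Self-feedback
            f"CRITIQUE\nTask: {task}\nDraft: {draft}\n"
            f"List specific problems, or reply {stop_token}."
        )
        feedback_log.append(feedback)
        if stop_token in feedback:                      # 4. Stop at plateau
            break
        draft = model(                                  # 3. Refine
            f"REFINE\nTask: {task}\nDraft: {draft}\nFeedback: {feedback}"
        )
    return draft, feedback_log


def scripted_model(prompt: str) -> str:
    """Hypothetical stand-in that replays this lesson's tagline demo offline."""
    if prompt.startswith("GENERATE"):
        return "A good app for your notes."
    if prompt.startswith("CRITIQUE"):
        if "Draft: A good app" in prompt:
            return "Generic: no benefit or emotion for the reader."
        if "Draft: Capture every idea" in prompt:
            return "Has a benefit now, but it can be tighter and punchier."
        return "NO ISSUES"
    # Otherwise this is a REFINE prompt.
    if "Draft: A good app" in prompt:
        return "Capture every idea before it slips away."
    return "Never lose a thought again."


if __name__ == "__main__":
    final, log = self_refine("Write a one-line tagline for a note-taking app.",
                             scripted_model)
    for i, note in enumerate(log, start=1):
        print(f"critique {i}: {note}")
    print("final:", final)
```

With the default `max_iters=3` the loop reproduces the demo: two rounds of critique and revision, then a third critique that finds nothing to fix. If the cap is hit first, the last revision is returned without a final critique, which is the price of bounding the loop.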

Where Is This Used?

Code Fixing & Refactoring: The model reviews its own code, finds bugs and smells, and rewrites it
Writing Polish & Tone: Critiquing clarity, structure, and tone, then revising the draft
Math Correction: Re-checking each step and fixing arithmetic or logic slips
Structured Output Repair: Validating JSON/schema, then repairing fields that fail the rules
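For structured output repair, part of the critique can be produced mechanically: parse the draft and turn each rule violation into a specific message that is passed to the reviser as feedback. This is a sketch assuming a deliberately simple schema format (required field name mapped to a Python type); `json_feedback` is a hypothetical helper, and checking with code supplements the model's own self-critique rather than replacing it.

```python
import json
from typing import Dict, List


def json_feedback(text: str, schema: Dict[str, type]) -> List[str]:
    """Return specific problems with a JSON draft; an empty list means it passes.

    `schema` is a hypothetical minimal format: required field name -> Python type.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"Not valid JSON: {err.msg} (line {err.lineno}, column {err.colno})."]
    if not isinstance(data, dict):
        return ["Top-level value must be a JSON object."]

    issues: List[str] = []
    for field, expected in schema.items():
        if field not in data:
            issues.append(f"Missing required field '{field}'.")
            continue
        value = data[field]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        wrong_type = not isinstance(value, expected) or (
            expected is int and isinstance(value, bool)
        )
        if wrong_type:
            issues.append(
                f"Field '{field}' should be {expected.__name__}, "
                f"got {type(value).__name__}."
            )
    return issues


if __name__ == "__main__":
    schema = {"name": str, "age": int}
    print(json_feedback('{"name": "Ada", "age": "36"}', schema))
```

Messages like "Missing required field 'age'." name the exact field to fix, which is the kind of concrete feedback the reviser needs; an empty list can serve as the stop signal.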

Fun Fact: Self-Refine works because the critique is explicit: a vague "improve this" barely helps, but specific feedback like "line 4 fails on an empty input" gives the reviser something concrete to fix — which is why a self-critique loop beats simply sampling the answer again.

Try It Yourself!


Write a one-line product tagline for a note-taking app.


🔄 The Self-Refine loop: Generate → Self-feedback → Refine → Stop at plateau

Key Insight

Self-feedback that is specific and actionable is what makes the revision better. In the tagline example, the single-shot draft scores 45% on the demo's quality meter, while the self-refine loop reaches 92%. Separately, across benchmark tasks the original paper reports roughly 20% average gains, with no extra training.

Frequently asked questions

What is Self-Refine and how does the loop work?

Self-Refine is a prompting technique where the same model generates an answer, writes specific self-feedback (self-critique) on its own output, then revises it using that feedback, repeating the generate → self-feedback → refine loop. One model plays three roles: generator, critic, and reviser. It needs no extra training and no human feedback, and the original paper reports roughly 20% average quality gains across tasks.

How is Self-Refine different from Reflexion and Chain-of-Verification?

All three add a self-improvement loop, but they target different things. Self-Refine polishes the quality of a single output through self-critique and revision within one episode. Reflexion keeps verbal memory across multiple episodes or attempts, learning from failures over time. Chain-of-Verification (CoVe) focuses narrowly on fact-checking: it generates verification questions and answers them independently. Use Self-Refine for iterative quality improvement, Reflexion for multi-attempt learning, and CoVe for factual accuracy.

When should you stop refining, and can it ever hurt?

Stop at a clear stopping criterion: a maximum number of iterations (usually 2–4) or a quality plateau where feedback stops finding meaningful issues. Yes, over-refining can hurt — once the answer is already correct, extra rounds may add verbosity, second-guess a right answer into a wrong one, or drift from the original requirements. Bound the loop and stop when the critique returns 'no substantive changes needed.'
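One way to detect the "no substantive changes needed" signal is to instruct the critic to use a fixed phrase and match only that phrase after normalizing case and punctuation. This is a sketch; `critique_says_done` and `DONE_PHRASES` are hypothetical. Exact matching is deliberate: a critique such as "No issues with the intro, but line 4 fails on empty input" still contains a real problem, so a substring check would end the loop too early.

```python
import re

# Hypothetical phrases the critic prompt tells the model to use when it is done.
DONE_PHRASES = {"no substantive changes needed", "no meaningful issues", "no issues"}


def critique_says_done(critique: str) -> bool:
    """True only if the whole critique is one of the agreed 'done' phrases."""
    letters_only = re.sub(r"[^a-z]+", " ", critique.lower())  # drop punctuation and digits
    normalized = " ".join(letters_only.split())               # collapse whitespace
    return normalized in DONE_PHRASES


if __name__ == "__main__":
    print(critique_says_done("No substantive changes needed."))
    print(critique_says_done("No issues with the intro, but line 4 fails on empty input."))
```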


Technique Comparison (pre-recorded responses)

Task (Intermediate, Writing):

Write a one-line tagline for a note-taking app.

Without technique

Prompt

Write a one-line tagline for a note-taking app.

Response

"A good app for your notes."

Tokens:18/12

Time:280ms


With technique (Self-Refine)

Prompt

Write a one-line tagline for a note-taking app. Then apply self-refine (max 2 iterations):
1. Generator: write a draft
2. Critic: give specific self-critique (what exactly is weak)
3. Reviser: rewrite using the critique
Stop when the critique finds no meaningful issues.

Response

Draft 1: "A good app for your notes."

🔍 Self-critique: "A good app" is generic. No benefit or emotion for the reader.

Draft 2: "Capture every idea before it slips away."

🔍 Self-critique: There is a benefit now, but it can be tighter and punchier.

Final: "Never lose a thought again."

✅ The critique finds no meaningful issues — stopping.

👁️Generate: generic tagline "A good app…"

🧠Self-critique: generic, no benefit

✏️Refine: added a benefit, then tightened to a punchy line

✅Plateau: critique finds no issues — stop

Tokens:78/130

Time:760ms


Why this works

Specific self-critique ("generic, no benefit") turns a bland draft into a punchy tagline in 2 iterations — and the loop stops at the plateau instead of running forever.
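The prompt in this demo runs all three roles inside a single model call. A small template makes the task and iteration cap reusable; `single_call_self_refine_prompt` is a hypothetical name, and the wording is copied from the demo prompt. In a single call the iteration cap is only an instruction to the model, not something your code enforces, whereas a multi-call loop can enforce it.

```python
def single_call_self_refine_prompt(task: str, max_iters: int = 2) -> str:
    """Pack generator, critic, and reviser roles into one prompt (demo wording)."""
    return (
        f"{task} Then apply self-refine (max {max_iters} iterations):\n"
        "1. Generator: write a draft\n"
        "2. Critic: give specific self-critique (what exactly is weak)\n"
        "3. Reviser: rewrite using the critique\n"
        "Stop when the critique finds no meaningful issues."
    )


if __name__ == "__main__":
    print(single_call_self_refine_prompt("Write a one-line tagline for a note-taking app."))
```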


Related lessons: Reflexion, Chain of Verification

This lesson is part of a structured LLM course.
