Self-Consistency — Improving Accuracy via Voting
Vote for the best answer
The Problem: AI can be inconsistent — ask the same question twice and you might get different answers. How can we increase confidence in the result?
The Solution: Ask Multiple Experts
Self-Consistency means generating multiple reasoning paths and picking the most common answer. Instead of trusting one response, you ask the AI to solve the problem several times and take a "vote" on the final answer. It builds on Chain-of-Thought by sampling multiple reasoning chains rather than relying on a single one.
Think of it like consulting multiple experts:
1. Expert 1: "I think the answer is 42, because..."
2. Expert 2: "I calculated 42 using a different method..."
3. Expert 3: "My approach gives 38, here's why..."
4. Consensus: Two out of three say 42, so that's our answer!
Where Is This Used?
- Math Problems: Complex calculations where mistakes are likely
- Medical Diagnosis: Getting second and third opinions
- Code Review: Multiple analyses of potential bugs
- High-Stakes Decisions: Any task where accuracy is critical
Fun Fact: Self-consistency can boost accuracy by 5-15% on reasoning tasks! The key is to use "temperature" (randomness) so each attempt takes a slightly different path. Usually 5-10 samples are enough.
Try It Yourself!
Use the interactive example below to see how multiple reasoning paths can lead to more reliable answers through majority voting.
Instead of relying on a single answer, generate multiple reasoning paths (5-40 samples) at high temperature, then pick the most common final answer via majority voting.
Each sample produces a reasoning chain → final answer. Answers are grouped by value. The answer appearing in the most samples wins. Ties are broken by confidence or the first occurrence.
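The grouping-and-voting step described above can be sketched with Python's `collections.Counter`; conveniently, `most_common` keeps first-encountered order for equal counts (documented behavior since Python 3.7), which matches the first-occurrence tie-break. The `majority_vote` name is our own for illustration:

```python
from collections import Counter

def majority_vote(answers):
    """Group answers by value and return the most frequent one.

    Ties are broken by first occurrence: Counter.most_common preserves
    first-encountered order for elements with equal counts.
    """
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]

print(majority_vote(["42", "42", "38"]))        # "42" (clear majority)
print(majority_vote(["38", "42", "38", "42"]))  # "38" (tie: first seen wins)
```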
5 samples: ~5x cost, moderate improvement. 10 samples: sweet spot for most tasks. 40 samples: marginal gains. Set temperature 0.7-1.0 for diverse paths.
Best for: math, logic puzzles, commonsense reasoning, coding challenges. Not worth it for: creative writing, open-ended questions, tasks where there's no single correct answer.
🗳️ Self-Consistency — an improvement over Chain of Thought! Generate multiple different reasoning paths and choose the most frequent answer through voting. This helps avoid random errors!
A store had 12 apples and 8 oranges. They sold 5 fruits. If 3 of the sold fruits were apples, how many oranges are left?
Total fruits: 12 + 8 = 20. Sold 5, left 20 - 5 = 15. There were 8 oranges, so... about 6?
6 oranges
⚠️ Error in reasoning!
Create 5+ different reasoning paths with temperature > 0
Extract final answer from each path
Choose the most frequent answer (majority vote)
- Multi-step math problems
- Logic reasoning tasks
- When high accuracy is critical (medicine, finance)
- Questions where one error changes the whole answer
Self-Consistency works because even if one reasoning path contains an error, the correct answer appears more often across the other paths. It's "wisdom of crowds" for LLMs! The downside: it costs more tokens (5× the calls for 5 samples), but on complex reasoning problems accuracy typically improves by roughly 10-18 percentage points.
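The "wisdom of crowds" effect can be quantified with a simplified model: if each reasoning path is independently correct with probability p > 0.5, a majority vote over n paths is correct more often than any single path. This is only a sketch; real paths come from the same model and are correlated, so actual gains are smaller:

```python
from math import comb

def majority_accuracy(p, n):
    """P(majority of n independent paths is correct), each correct w.p. p.

    A binomial toy model: assumes a binary right/wrong outcome and fully
    independent paths, which real LLM samples are not.
    """
    need = n // 2 + 1  # votes needed for a strict majority
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(need, n + 1))

print(round(majority_accuracy(0.7, 1), 3))  # 0.7   (single path)
print(round(majority_accuracy(0.7, 5), 3))  # 0.837 (5 paths)
print(round(majority_accuracy(0.7, 9), 3))  # 0.901 (9 paths)
```

This also explains the diminishing returns noted above: going from 1 to 5 samples helps a lot more than going from 5 to 9.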
How to implement Self-Consistency
Self-Consistency is NOT a special prompt! It's a method for aggregating multiple responses:
- Run the same prompt multiple times
- Use temperature > 0 for diversity
- Collect answers and choose the most common
Step 1: Base prompt with CoT
Solve the task step by step:
{task}
Show your reasoning and give the answer.

Regular Chain-of-Thought prompt. Nothing special yet.
Step 2: Generate multiple responses
Call the LLM 5 (or more) times with the same prompt, but with temperature > 0 (e.g., 0.7).
Each time you'll get different reasoning and possibly different answers. This is normal!
Step 3: Aggregation (code)
from collections import Counter

# sample 5 reasoning paths with temperature 0.7
responses = [call_llm(prompt, temp=0.7) for _ in range(5)]
# pull the final answer out of each response
answers = [extract_answer(r) for r in responses]
# majority vote: the most frequent answer wins
final_answer = Counter(answers).most_common(1)[0][0]

Use Counter to count the votes. The most frequent answer wins!
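The `extract_answer` helper used above is left undefined; here is a minimal sketch, assuming numeric answers and that the last number in a CoT response is the final answer (a common but fallible heuristic):

```python
import re

def extract_answer(response):
    """Return the last number in the response, or None if there is none.

    Heuristic: CoT responses usually end with the final answer,
    so the last number found is taken as the answer.
    """
    numbers = re.findall(r"-?\d+(?:\.\d+)?", response)
    return numbers[-1] if numbers else None

print(extract_answer("17 x 24 = 340 + 68 = 408"))  # "408"
print(extract_answer("I am not sure."))            # None
```

In practice it is more robust to instruct the model to end with a fixed marker (e.g., "Answer: X") and parse that marker instead.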
Concrete prompt example
Task: What is 17 × 24?
Solve step by step:
1. Break down into simple operations
2. Calculate each one
3. Give the final answer
Answer:

Run this prompt 5 times with temp=0.7. Collect the answers. Choose the most frequent.
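Putting the steps together, here is a self-contained end-to-end sketch for the 17 × 24 task. `call_llm` is stubbed with canned responses (taking a sample index instead of calling a real API) so the voting is reproducible; one path contains a deliberate arithmetic slip:

```python
import re
from collections import Counter

# Stubbed LLM: canned reasoning paths for "What is 17 x 24?".
# Path 3 contains a deliberate subtraction error (398 instead of 408).
CANNED = [
    "17 x 24 = 17 x 20 + 17 x 4 = 340 + 68 = 408. Answer: 408",
    "24 x 17 = 24 x 10 + 24 x 7 = 240 + 168 = 408. Answer: 408",
    "17 x 24 = 17 x 25 - 17 = 425 - 17 = 398. Answer: 398",  # slip!
    "(20 - 3) x 24 = 480 - 72 = 408. Answer: 408",
    "17 x 24 = 17 x 8 x 3 = 136 x 3 = 408. Answer: 408",
]

def call_llm(prompt, sample_id, temp=0.7):
    # A real implementation would call your LLM API here with temperature > 0.
    return CANNED[sample_id % len(CANNED)]

def extract_answer(response):
    return re.findall(r"\d+", response)[-1]  # last number = final answer

prompt = "Solve step by step: What is 17 x 24?"
responses = [call_llm(prompt, i) for i in range(5)]
answers = [extract_answer(r) for r in responses]
print(Counter(answers))                        # Counter({'408': 4, '398': 1})
final = Counter(answers).most_common(1)[0][0]
print(final)  # 408: the single erroneous path is outvoted
```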
Summary:
- Self-Consistency = regular CoT prompt + multiple calls + voting
- The prompt does NOT change! Only the number of calls and aggregation changes
- Temperature > 0 is needed for answer diversity
- Final answer = most popular through simple counting
The technique is described in the paper "Self-Consistency Improves Chain of Thought Reasoning in Language Models" (Wang et al., ICLR 2023). It reported accuracy gains of roughly 4-18 percentage points on arithmetic and commonsense benchmarks (e.g., +17.9% on GSM8K) compared to single-path CoT.
Try it yourself
Interactive demo of this technique
There are 23 apples in a basket. If you divide them equally among 5 children, how many apples will be left?
23 ÷ 5 = 4.6, so 3 apples will be left.
Method 1 (division with remainder): 23 ÷ 5 = 4 remainder 3 Left over: 3 apples
Method 2 (subtraction): 5×4 = 20 apples distributed 23 - 20 = 3 apples left
Method 3 (verification): 5×4 + 3 = 23 ✓ Left over: 3 apples
Final answer: 3 apples (all three methods gave the same result)
Self-Consistency verifies the answer multiple ways. When all methods give the same result, we're confident it's correct.
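The three-method check from the demo can be reproduced in a few lines of Python; each method computes the remainder independently, and agreement across methods is what gives the confidence:

```python
apples, children = 23, 5

# Method 1: division with remainder
m1 = apples % children

# Method 2: subtract what was distributed
per_child = apples // children  # 4 apples each
m2 = apples - per_child * children

# Method 3: verification that the pieces add back up
assert per_child * children + m1 == apples

print(m1, m2)  # 3 3: both methods agree, so we trust the answer
```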