Reasoning Models
Models that think before answering
The Problem: Regular LLMs generate answers token by token, always moving forward. They can't pause, reconsider, or try a different approach. This works fine for simple questions, but on complex math, coding, and logic problems they often fail on the first attempt.
The Solution: Let the Model Think
Reasoning models are LLMs that spend extra compute “thinking” before answering. Unlike regular models that generate the first plausible response, reasoning models produce internal chain-of-thought tokens — exploring hypotheses, checking their work, and backtracking when needed. These thinking tokens are billed but not shown in the final output. The key insight: accuracy improves logarithmically with the number of thinking tokens allowed — more “scratch paper” = better answers on hard problems.
Think of it like solving a math exam with scratch paper vs in your head:
1. Receive the problem: User sends a complex question — math, code, or multi-step reasoning
2. Generate thinking tokens: Model produces internal reasoning: exploring approaches, checking intermediate results, backtracking on dead ends
3. Self-verification: Model re-reads its own reasoning, catches errors, and corrects them before finalizing — like proofreading a draft
4. Output final answer: Only the clean, verified answer is returned to the user. Thinking tokens are hidden (but billed)
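The four steps above can be sketched as a toy loop. This is purely illustrative — real reasoning models do all of this inside a single generation pass, and every function name here is made up for the sketch:

```python
# Toy sketch of the think-then-answer loop (illustrative only; real
# reasoning models interleave thinking and verification internally).

def verify(draft: str, thinking: list[str]) -> bool:
    # Stand-in for the model re-reading its own reasoning (step 3);
    # here we simply pretend verification succeeds after a few steps.
    return len(thinking) >= 3

def solve_with_thinking(problem: str, max_thinking_steps: int = 2000) -> dict:
    thinking: list[str] = []   # hidden chain-of-thought (never shown)
    draft = ""
    while len(thinking) < max_thinking_steps:
        thinking.append(f"explore approach #{len(thinking) + 1}")  # step 2
        draft = f"candidate answer for {problem!r}"
        if verify(draft, thinking):                                # step 3
            break   # stop thinking once the draft checks out
    return {
        "answer": draft,                          # step 4: clean answer only
        "hidden_thinking_steps": len(thinking),   # billed but not displayed
    }

result = solve_with_thinking("find all primes p such that p^2 + 2 is prime")
print(result["hidden_thinking_steps"])  # → 3
```

The point of the sketch: the caller only ever sees `answer`, while the `thinking` list (which can dwarf the answer in size) stays internal — exactly why costs can surprise you.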
Thinking tokens are billed as output tokens but hidden from the user. A response showing 500 output tokens may have consumed 2,000+ thinking tokens internally — check your costs!
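To see how hidden thinking tokens affect your bill, here is a minimal cost estimate. The price constant is a hypothetical placeholder, not any provider's real rate — check your provider's current pricing:

```python
# Estimate the real cost of a reasoning-model response. The price below
# is a HYPOTHETICAL placeholder; substitute your provider's actual rate.

PRICE_PER_1K_OUTPUT_TOKENS = 0.06  # hypothetical $/1K output tokens

def response_cost(visible_output_tokens: int, thinking_tokens: int) -> float:
    # Thinking tokens are billed at the output-token rate even though
    # they never appear in the response body.
    billed = visible_output_tokens + thinking_tokens
    return billed / 1000 * PRICE_PER_1K_OUTPUT_TOKENS

# A "500-token" answer that secretly consumed 2,000 thinking tokens
# costs 5x what the visible output alone suggests:
print(round(response_cost(500, 2000), 2))  # → 0.15
print(round(response_cost(500, 0), 2))     # → 0.03
```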
Where Reasoning Models Shine
- Complex Math & Science: Competition-level math (AIME, USAMO), physics derivations, formal proofs
- Advanced Coding: Multi-file refactors, algorithm design, debugging complex systems
- Multi-step Planning: Agent workflows, strategic decisions with trade-offs, architecture design
- Analysis & Reasoning: Legal document analysis, scientific paper review, complex data interpretation
Fun Fact: DeepSeek R1 was trained using pure reinforcement learning (GRPO) without any human-written reasoning examples. The model spontaneously developed chain-of-thought behavior — researchers observed "aha moments" where the model learned to re-evaluate and correct itself. The paper was published in Nature.
Reasoning Model Landscape (2025)
OpenAI o1 / o3 / o4-mini
Pioneer of reasoning models. o1 (Sep 2024), o3 (2025), o4-mini (Apr 2025). Up to 200K context.
DeepSeek R1
Open-weights reasoning model trained with pure RL (GRPO). Published in Nature. Shows reasoning emerges without supervised examples.
Claude (Extended Thinking)
Anthropic's approach: adjustable "thinking budget" controls how many tokens Claude spends reasoning. Adaptive thinking in Claude 4.
Gemini 2.5 Flash / Pro (Thinking)
Google's hybrid reasoning: thinking can be toggled on/off with a budget (0-24K tokens). Deep Think mode for Pro on hardest problems.
When NOT to Use Reasoning Models
- Simple Q&A, translation, summarization — regular models are faster and cheaper
- Real-time chat — thinking tokens add latency (seconds to minutes)
- High-volume, low-complexity tasks — costs add up quickly from hidden thinking tokens
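A common way to act on this is a router: send simple requests to a fast, cheap model and reserve the reasoning model for tasks that warrant it. A minimal sketch — the keyword heuristic and model names are illustrative, not a production classifier:

```python
# Route requests between a fast model and a reasoning model using a
# crude task-type heuristic. Keywords and model names are illustrative;
# real routers often use a small classifier model instead.

REASONING_KEYWORDS = {"prove", "derive", "debug", "refactor", "plan", "optimize"}

def pick_model(prompt: str) -> str:
    words = set(prompt.lower().split())
    if words & REASONING_KEYWORDS:
        return "reasoning-model"  # slower, pricier, better on hard tasks
    return "fast-model"           # cheap, low-latency default

print(pick_model("Translate 'hello' to French"))       # → fast-model
print(pick_model("Prove that sqrt(2) is irrational"))  # → reasoning-model
```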
Don't do this
“Think step by step. First analyze the problem, then break it down into parts...”
→ Redundant! The model already thinks internally. Manual CoT can confuse it.
Do this instead
“Solve: find all primes p such that p² + 2 is also prime. Prove your answer.”
→ Direct and clear. Let the model decide HOW to think.
This lesson is part of a structured LLM course.