Reasoning Models
Models that think before answering
The Problem: Regular LLMs generate answers token by token, always moving forward. They can't pause, reconsider, or try a different approach. This works fine for simple questions, but on complex math, coding, and logic problems they often fail on the first attempt.
The Solution: Let the Model Think
Reasoning models are LLMs that spend extra compute “thinking” before answering. Unlike regular models that generate the first plausible response, reasoning models produce internal chain-of-thought tokens — exploring hypotheses, checking their work, and backtracking when needed. These thinking tokens are billed but not shown in the final output. The key insight: accuracy improves logarithmically with the number of thinking tokens allowed — more “scratch paper” = better answers on hard problems.
Think of it like solving a math exam with scratch paper vs in your head:
1. Receive the problem: User sends a complex question — math, code, or multi-step reasoning
2. Generate thinking tokens: Model produces internal reasoning: exploring approaches, checking intermediate results, backtracking on dead ends
3. Self-verification: Model re-reads its own reasoning, catches errors, and corrects them before finalizing — like proofreading a draft
4. Output final answer: Only the clean, verified answer is returned to the user. Thinking tokens are hidden (but billed)
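The four steps above can be sketched as a toy loop. This is purely illustrative — real reasoning models do all of this inside a single generation pass, and every function name here is made up for the sketch:

```python
# Toy sketch of the think-then-answer loop (illustrative only; real
# reasoning models interleave thinking and verification internally).

def verify(draft: str, thinking: list[str]) -> bool:
    # Stand-in for the model re-reading its own reasoning (step 3);
    # here we simply pretend verification succeeds after a few steps.
    return len(thinking) >= 3

def solve_with_thinking(problem: str, max_thinking_steps: int = 2000) -> dict:
    thinking: list[str] = []   # hidden chain-of-thought (never shown)
    draft = ""
    while len(thinking) < max_thinking_steps:
        thinking.append(f"explore approach #{len(thinking) + 1}")  # step 2
        draft = f"candidate answer for {problem!r}"
        if verify(draft, thinking):                                # step 3
            break   # stop thinking once the draft checks out
    return {
        "answer": draft,                          # step 4: clean answer only
        "hidden_thinking_steps": len(thinking),   # billed but not displayed
    }

result = solve_with_thinking("find all primes p such that p^2 + 2 is prime")
print(result["hidden_thinking_steps"])  # → 3
```

The point of the sketch: the caller only ever sees `answer`, while the `thinking` list (which can dwarf the answer in size) stays internal — exactly why costs can surprise you.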
Thinking tokens are billed as output tokens but hidden from the user. A response showing 500 output tokens may have consumed 2,000+ thinking tokens internally — check your costs!
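To see how hidden thinking tokens affect your bill, here is a minimal cost estimate. The price constant is a hypothetical placeholder, not any provider's real rate — check your provider's current pricing:

```python
# Estimate the real cost of a reasoning-model response. The price below
# is a HYPOTHETICAL placeholder; substitute your provider's actual rate.

PRICE_PER_1K_OUTPUT_TOKENS = 0.06  # hypothetical $/1K output tokens

def response_cost(visible_output_tokens: int, thinking_tokens: int) -> float:
    # Thinking tokens are billed at the output-token rate even though
    # they never appear in the response body.
    billed = visible_output_tokens + thinking_tokens
    return billed / 1000 * PRICE_PER_1K_OUTPUT_TOKENS

# A "500-token" answer that secretly consumed 2,000 thinking tokens
# costs 5x what the visible output alone suggests:
print(round(response_cost(500, 2000), 2))  # → 0.15
print(round(response_cost(500, 0), 2))     # → 0.03
```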
Where Reasoning Models Shine
- Complex Math & Science: Competition-level math (AIME, USAMO), physics derivations, formal proofs
- Advanced Coding: Multi-file refactors, algorithm design, debugging complex systems
- Multi-step Planning: Agent workflows, strategic decisions with trade-offs, architecture design
- Analysis & Reasoning: Legal document analysis, scientific paper review, complex data interpretation
Fun Fact: DeepSeek R1 was trained using pure reinforcement learning (GRPO) without any human-written reasoning examples. The model spontaneously developed chain-of-thought behavior — researchers observed "aha moments" where the model learned to re-evaluate and correct itself. The paper was published in Nature.
Reasoning Model Landscape (2025)
OpenAI o1 / o3 / o4-mini
Pioneer of reasoning models. o1 (Sep 2024), o3 (2025), o4-mini (Apr 2025). Up to 200K context.
DeepSeek R1
Open-weights reasoning model trained with pure RL (GRPO). Published in Nature. Shows reasoning emerges without supervised examples.
Claude (Extended Thinking)
Anthropic's approach: adjustable "thinking budget" controls how many tokens Claude spends reasoning. Adaptive thinking in Claude 4.
Gemini 2.5 Flash / Pro (Thinking)
Google's hybrid reasoning: thinking can be toggled on/off with a budget (0-24K tokens). Deep Think mode for Pro on hardest problems.
When NOT to Use Reasoning Models
- Simple Q&A, translation, summarization — regular models are faster and cheaper
- Real-time chat — thinking tokens add latency (seconds to minutes)
- High-volume, low-complexity tasks — costs add up quickly from hidden thinking tokens
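A common way to act on this is a router: send simple requests to a fast, cheap model and reserve the reasoning model for tasks that warrant it. A minimal sketch — the keyword heuristic and model names are illustrative, not a production classifier:

```python
# Route requests between a fast model and a reasoning model using a
# crude task-type heuristic. Keywords and model names are illustrative;
# real routers often use a small classifier model instead.

REASONING_KEYWORDS = {"prove", "derive", "debug", "refactor", "plan", "optimize"}

def pick_model(prompt: str) -> str:
    words = set(prompt.lower().split())
    if words & REASONING_KEYWORDS:
        return "reasoning-model"  # slower, pricier, better on hard tasks
    return "fast-model"           # cheap, low-latency default

print(pick_model("Translate 'hello' to French"))       # → fast-model
print(pick_model("Prove that sqrt(2) is irrational"))  # → reasoning-model
```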
Don't do this
“Think step by step. First analyze the problem, then break it down into parts...”
→ Redundant! The model already thinks internally. Manual CoT can confuse it.
Do this instead
“Solve: find all primes p such that p² + 2 is also prime. Prove your answer.”
→ Direct and clear. Let the model decide HOW to think.
This lesson is part of a structured LLM course.