Model Selection Guide
Choosing the right model
The Problem: There are dozens of LLMs available — GPT-5, Claude, o3, Gemini, Llama, DeepSeek, and more. Plus reasoning models that think before answering. How do you choose the right model for your specific use case?
The Solution: Choose the Right Tool for the Job
Model selection involves matching your requirements (speed, cost, accuracy, capabilities) with the right model. It's like choosing a vehicle — sometimes you need a sports car, sometimes a truck, sometimes a bicycle. Use benchmarks to compare quality, and balance latency against cost.
A quick decision checklist:
1. Latency under 500 ms and quality critical: use GPT-4o or Claude Sonnet, the best balance of speed and intelligence.
2. Cost under $0.01/request and a simple task: use GPT-4o mini or Claude Haiku; they are 10-20x cheaper and great for classification, extraction, and FAQ.
3. Context over 100K tokens: use Claude (200K) or Gemini (1M+); other models require document chunking.
4. Complex math, logic, or hard reasoning: use reasoning models (o3, o4-mini). They spend thinking tokens on step-by-step reasoning but cost more because those hidden tokens are billed.
5. On-premise or data privacy required: use Llama or Mistral, open-weight models you can host yourself.
6. Always test on YOUR data: run 50-100 real examples through each candidate model before committing. Benchmarks lie; your evals don't.
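The checklist above can be sketched as a rule-based chooser. This is a minimal sketch: the priority order, thresholds, and model names are illustrative, not a production policy.

```python
def choose_model(latency_ms=None, cost_per_request=None, context_tokens=0,
                 hard_reasoning=False, on_premise=False):
    """Rule-of-thumb model chooser following the checklist above.
    Thresholds and model names are illustrative."""
    if on_premise:
        return "llama-3.3-70b"      # open weights, self-hostable
    if hard_reasoning:
        return "o3"                 # reasoning model with thinking tokens
    if context_tokens > 100_000:
        return "gemini-2.5-pro"     # 1M-token context window
    if cost_per_request is not None and cost_per_request < 0.01:
        return "gpt-4o-mini"        # 10-20x cheaper for simple tasks
    if latency_ms is not None and latency_ms < 500:
        return "gpt-4o"             # speed/quality balance
    return "claude-sonnet-4"        # general-purpose default
```

Whatever the rules return, the last checklist item still applies: validate the choice on 50-100 real examples before committing.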
Key Selection Criteria
- Quality: Benchmark scores (MMLU, HumanEval) matter less than eval on YOUR data — always test with real examples from your domain
- Model Routing: Use a lightweight classifier to route easy tasks (FAQ, extraction) to cheap models and hard tasks (reasoning, coding) to flagship models — saves 60-80% with minimal quality loss
- Cost vs Latency: Flagship models are 10-30x more expensive and 2-5x slower — justify the upgrade with measurable quality difference on your evals
- Context Window: Need 100K+ tokens? Only Claude (200K) and Gemini (1M+) support it natively — others require chunking strategies
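The routing idea can be sketched in a few lines. Here a keyword-and-length heuristic stands in for the lightweight classifier (real routers usually train a small model or use embeddings), and the model names are illustrative.

```python
# Signals that a request needs a flagship model; a stand-in
# for a trained lightweight classifier.
HARD_SIGNALS = ("prove", "debug", "refactor", "derive", "step by step")

def route(request: str) -> str:
    """Send easy requests to a cheap model, hard ones to a flagship."""
    is_hard = len(request) > 500 or any(s in request.lower() for s in HARD_SIGNALS)
    return "claude-sonnet-4" if is_hard else "claude-3.5-haiku"
```

For example, `route("What are your opening hours?")` goes to the cheap model, while `route("Debug this race condition step by step")` goes to the flagship.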
Fun Fact: A/B testing model routing in production showed that sending 80% of support tickets to Haiku saved 85% of costs with only a 2% quality drop. The remaining 20% of complex cases went to Sonnet — total cost reduction of 70% with near-identical user satisfaction.
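A back-of-the-envelope version of that arithmetic, using the Haiku and Sonnet input prices from the table below. Note this counts input tokens only; real savings also depend on output-token prices and per-request token counts, which is why a production figure can land higher or lower.

```python
# Input-token prices per 1M tokens, taken from the table below.
HAIKU, SONNET = 0.80, 3.00

all_flagship = SONNET                    # every ticket goes to Sonnet
routed = 0.8 * HAIKU + 0.2 * SONNET      # 80% to Haiku, 20% to Sonnet
savings = 1 - routed / all_flagship
print(f"{savings:.0%}")                  # prints 59%
```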
| Model | Provider | Context | Price (in) | Quality | Best for |
|---|---|---|---|---|---|
| GPT-5 | OpenAI | 400K | $1.25/1M | Top | General purpose, agents |
| Claude Opus 4.5 | Anthropic | 200K | $15.00/1M | Top | Research, complex analysis |
| Claude Sonnet 4 | Anthropic | 200K | $3.00/1M | Top | Coding, analysis |
| o3 | OpenAI | 200K | $2.00/1M | Top | Complex reasoning, math |
| GPT-4o | OpenAI | 128K | $2.50/1M | High | Chat, vision |
| Gemini 2.5 Pro | Google | 1M | $1.25/1M | High | Long documents, reasoning |
| DeepSeek V3 (OSS) | DeepSeek | 128K | $0.27/1M | High | Budget projects, coding |
| Qwen 2.5 72B (OSS) | Alibaba | 128K | Self-hosted | High | Asian languages, self-hosted |
| Mistral Large 2 | Mistral | 128K | $2.00/1M | High | EU compliance, cost-effective |
| Llama 3.3 70B (OSS) | Meta | 128K | Self-hosted | High | Privacy-sensitive, fine-tuning |
| o4-mini | OpenAI | 200K | $1.10/1M | High | Budget reasoning, math |
| Gemini 2.5 Flash | Google | 1M | $0.30/1M | Medium | High volume, long documents |
| GPT-4o mini | OpenAI | 128K | $0.15/1M | Medium | High volume, simple tasks |
| Claude 3.5 Haiku | Anthropic | 200K | $0.80/1M | Medium | Classification, simple tasks |
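When a document exceeds a model's context window, it has to be chunked. A quick estimator of how many chunks you need, with room reserved for the prompt and response and some overlap between chunks for continuity; all sizes here are illustrative assumptions:

```python
import math

def num_chunks(doc_tokens: int, window: int,
               reserved: int = 8_000, overlap: int = 500) -> int:
    """Estimate chunks needed to fit a document into a context window.
    `reserved` leaves room for the prompt and response; `overlap` is
    shared between consecutive chunks. Sizes are illustrative."""
    usable = window - reserved
    if doc_tokens <= usable:
        return 1
    step = usable - overlap
    return 1 + math.ceil((doc_tokens - usable) / step)
```

A 500K-token document needs 5 chunks in a 128K window, but fits in one pass in a 1M window, which is the practical difference the Context column captures.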
Quick Decision
- Need the best: Claude Opus 4.5 / GPT-5
- Hard reasoning: o3 / o4-mini
- Best for coding: Claude Sonnet 4
- Save money: DeepSeek V3 / GPT-4o mini / Gemini 2.5 Flash
- Long docs: Gemini 2.5 Pro (1M tokens)
- Privacy: Llama 3.3 / Qwen 2.5 (self-hosted)
Example: Support Ticket Categorization
Task: choose the right model to categorize incoming support tickets.
A flagship model produces a detailed response: "This request belongs to the 'auth' (authentication) category. The user is experiencing difficulty logging in due to a lost password. This is a typical authentication issue, resolved through the password reset mechanism. I recommend sending the user a password recovery link and checking whether the account is locked."
A lightweight model returns just the label: `auth`.
The takeaway: not every task requires the most powerful model. For simple classification, a lightweight model gives the same result 15x faster and 100x cheaper.
This lesson is part of a structured LLM course.