Cost Optimization
Reduce API costs
The Problem: LLM API costs can spiral out of control quickly. A popular app can cost thousands per day. How do you keep AI affordable?
The Solution: Be Energy-Efficient
Cost optimization involves reducing token usage, choosing cheaper models where appropriate, and caching results. It's like managing electricity usage at home — turn off lights in empty rooms, use efficient appliances. Techniques like prompt caching and smart model selection can cut costs dramatically.
Think of it like saving electricity:
1. Audit current costs: Measure first! Log every request with token count and cost — you can't optimize what you don't measure
2. Compress system prompts: Remove filler words, reduce examples from 5 to 2-3, use bullet points instead of paragraphs — target a 40-60% reduction
3. Add semantic cache: 60%+ of FAQ requests are near-duplicates — a semantic cache finds similar questions and returns stored responses without calling the LLM
4. Route by complexity: 80% of tasks don't need the flagship model — use a classifier to route simple tasks to mini/haiku (10-20x cheaper)
5. Monitor and iterate: Set up cost dashboards, track cost-per-conversation, and review weekly — optimization is continuous, not one-time
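Step 1 above (audit current costs) can be sketched as a tiny in-process logger. This is a minimal sketch: `PRICES`, `logCall`, and `totalCost` are illustrative names, not part of any provider SDK, and the rates shown should be checked against your provider's current pricing.

```javascript
// Minimal cost-audit log (sketch). Plug in real token counts from API responses.
const PRICES = {
  // USD per 1M tokens (example rates; verify against current provider pricing)
  'gpt-4o':      { input: 2.50, output: 10.00 },
  'gpt-4o-mini': { input: 0.15, output: 0.60 },
};

const costLog = [];

function logCall(model, inputTokens, outputTokens) {
  const p = PRICES[model];
  const cost = (inputTokens * p.input + outputTokens * p.output) / 1e6;
  costLog.push({ model, inputTokens, outputTokens, cost, at: Date.now() });
  return cost;
}

function totalCost() {
  // Sum of all logged request costs — the number you can't optimize without
  return costLog.reduce((sum, entry) => sum + entry.cost, 0);
}
```

With a log like this in place, the remaining steps (compression, caching, routing) can be measured as before/after deltas instead of guesses.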
Example: System prompt 2,000 tokens + user context 500 tokens × 10,000 requests/day × $3/1M tokens = $75/day ($2,250/month). With caching + routing: $18/day — a 76% saving ($1,710/month).
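The arithmetic can be sketched as a one-line helper (`dailyCost` is a hypothetical name; the $75/day figure implies an input price of about $3 per million tokens, since 2,500 × 10,000 × $3/1M = $75):

```javascript
// Back-of-envelope daily cost: tokens per request x requests per day x price.
function dailyCost(tokensPerRequest, requestsPerDay, pricePer1M) {
  return (tokensPerRequest * requestsPerDay * pricePer1M) / 1e6;
}

dailyCost(2500, 10000, 3);              // 75  ($75/day, $2,250/month)
dailyCost(2500, 10000, 3) * (1 - 0.76); // ~18 (after caching + routing)
```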
Key Strategies
- Prompt Compression: Remove filler words, shorten examples, use structured formats — a 2,000-token system prompt can often be compressed to 800 tokens with zero quality loss
- Prompt Caching: Anthropic prompt caching: first request costs 1.25x, but cached requests cost only 0.1x — a 90% discount for repeated system prompts across conversations
- Model Routing: 80% of tasks (FAQ, extraction, classification) don't need the flagship model — route them to mini/haiku and save 10-20x per request
- Semantic Caching: 60%+ of FAQ requests are near-duplicates — semantic cache matches similar (not identical) questions and returns stored responses instantly
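The semantic caching strategy can be sketched as follows. The `embed` function here is a toy bag-of-words stand-in for a real embedding model (normally an embeddings API call), kept local so the example is self-contained; the 0.8 threshold is an illustrative assumption to tune per workload.

```javascript
// Toy embedding: bag-of-words term counts (stand-in for a real embedding model).
function embed(text) {
  const vec = {};
  for (const word of text.toLowerCase().match(/[a-z']+/g) || []) {
    vec[word] = (vec[word] || 0) + 1;
  }
  return vec;
}

// Cosine similarity between two sparse vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (const k of new Set([...Object.keys(a), ...Object.keys(b)])) {
    const x = a[k] || 0, y = b[k] || 0;
    dot += x * y; na += x * x; nb += y * y;
  }
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

class SemanticCache {
  constructor(threshold = 0.8) {
    this.threshold = threshold;
    this.entries = []; // { vector, response }
  }
  get(question) {
    // Return a stored response for a *similar* (not identical) question.
    const v = embed(question);
    const hit = this.entries.find(e => cosine(e.vector, v) >= this.threshold);
    return hit ? hit.response : null; // null => call the LLM, then set()
  }
  set(question, response) {
    this.entries.push({ vector: embed(question), response });
  }
}
```

In production you would swap `embed` for a real embedding model and back the store with a vector database, but the cache-lookup-before-LLM-call flow stays the same.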
Cost Breakdown (sample calculator output): $0.0033 per request, $3.25/day, $97.50/month, $1,170/year.
Potential Savings: switching to Gemini 2.0 Flash could save ~$94.58/month in this scenario.
Optimization Tips
- Use smaller models (GPT-4o mini, Claude Haiku) for simple tasks
- Cache repeated prompts to avoid redundant API calls
- Batch requests when possible to reduce overhead
- Optimize prompts: shorter = cheaper
- Use streaming to fail fast on bad responses
- Implement token limits to cap costs per request
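The first tip, routing simple tasks to smaller models, can be sketched as a heuristic router. The model names, the regex patterns, and the 200-character cutoff are all illustrative assumptions; in production the classifier is often a small LLM or a trained text classifier rather than regexes.

```javascript
// Heuristic patterns for tasks that rarely need the flagship model.
const SIMPLE_PATTERNS = [
  /^(what|when|where|who)\b/i,          // factual FAQ-style questions
  /\b(classify|extract|summarize)\b/i,  // routine NLP tasks
];

function routeModel(task) {
  const simple =
    task.length < 200 && SIMPLE_PATTERNS.some(re => re.test(task));
  return simple ? 'gpt-4o-mini' : 'gpt-4o'; // roughly 10-20x price difference
}
```

Even a crude router like this captures most of the savings, because the bulk of traffic in FAQ-heavy apps matches the simple patterns.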
Reduce token usage in code generation without quality loss
A verbose, chatty response wastes tokens on filler:
Of course! I'd be happy to help you! Here's a JavaScript function that does exactly what you asked:
// Function to remove duplicates from an array of numbers
function removeDuplicates(numbers) {
// Create a new array to store unique values
const uniqueNumbers = [];
// Iterate through each element of the original array
for (let i = 0; i < numbers.length; i++) {
// Check if the current element is already in the unique array
if (!uniqueNumbers.includes(numbers[i])) {
uniqueNumbers.push(numbers[i]);
}
}
return uniqueNumbers;
}
I hope this helps! If you have any additional questions, don't hesitate to ask!
With a concise prompt and a "code only" instruction, the same task yields a single line:
const unique = (nums) => [...new Set(nums)];
A concise prompt + "code only" instruction saves up to 90% of tokens. In production with thousands of calls, that's tens of dollars per day.
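A rough way to see the saving is the common rule of thumb of ~4 characters per token for English text (an approximation; exact counts require the model's tokenizer, e.g. tiktoken). The prompt strings below are illustrative:

```javascript
// Rough token estimate: ~4 characters per token (English-text rule of thumb).
const estimateTokens = (text) => Math.ceil(text.length / 4);

// A chatty prompt vs. a terse "code only" prompt for the same task:
const chatty = "Could you please help me write a JavaScript function that " +
  "removes duplicate numbers from an array? Please include detailed " +
  "comments explaining every step, and feel free to add any tips!";
const concise = "JS: dedupe number array. Code only.";

estimateTokens(chatty);  // 47
estimateTokens(concise); // 9
```

The same compression applies to the response side: instructing "code only" strips the greeting and sign-off filler shown above, which is where most of the wasted output tokens live.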
This lesson is part of a structured LLM course.