LLM Settings
Temperature, Top-p & more
The Problem: You ask ChatGPT the same question twice, but get different answers. Sometimes creative, sometimes dry. What's going on?
The Solution: Control Knobs on a Mixing Console
Imagine a DJ mixing console with sliders. Each slider affects the sound differently: bass, treble, volume. LLMs have similar "sliders" that control how text is generated during inference!
How to Choose Settings?
The two most important knobs are temperature and Top-P. Different tasks need different settings:
- Math problems: temperature = 0 (need exact answers, no creativity)
- Story writing: temperature = 0.8-1.2 (need imagination)
- Code generation: temperature = 0.2-0.4 (creative but not crazy)
- Brainstorming: temperature = 1.5+ with high presence penalty
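To make the presets above concrete, here is a small sketch of how they might be expressed as request parameters. The field names follow the common OpenAI-style API shape, but they are illustrative — check your provider's documentation for exact names and valid ranges.

```python
# Hypothetical presets mapping the tasks above to sampling parameters.
# Values mirror the recommendations in this lesson; the dict keys and
# parameter names are illustrative, not tied to any specific API.
PRESETS = {
    "math":          {"temperature": 0.0, "top_p": 1.0},
    "story_writing": {"temperature": 1.0, "top_p": 0.95},
    "code":          {"temperature": 0.3, "top_p": 0.9},
    "brainstorming": {"temperature": 1.5, "presence_penalty": 1.0},
}

# Pick the preset for the task at hand and merge it into a request body.
request_body = {"model": "your-model-here", **PRESETS["code"]}
```

Keeping presets in one place like this makes it easy to tune per-task behavior without touching the rest of your request-building code.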
Think of it like a DJ mixing console with control sliders:
- 1. Temperature (0-2): "creativity knob". Low = predictable and focused. High = wild and creative
- 2. Top-P (0-1): "vocabulary width". Low = only the safest words. High = considers rare options too
- 3. Max Tokens: "response length limit". How many tokens (not words — a token is roughly three-quarters of an English word) the model can generate
- 4. Frequency Penalty: "repetition punisher". Makes the model avoid repeating the same words
- 5. Presence Penalty: "new topic encourager". Pushes the model to bring up new topics
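The first two sliders are the easiest to demystify with code. A minimal sketch of how temperature and Top-P act on the model's raw scores (logits) at each generation step might look like this — real inference engines do the same math on tensors, but the logic is identical:

```python
import math

def apply_temperature(logits, temperature):
    """Divide logits by temperature, then softmax into probabilities.
    Lower temperature sharpens the distribution toward the top token;
    higher temperature flattens it. (temperature must be > 0 here --
    real APIs special-case 0 as greedy argmax.)"""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Nucleus sampling: keep the smallest set of tokens whose cumulative
    probability reaches top_p, zero out the rest, and renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = set(), 0.0
    for i in order:
        kept.add(i)
        cum += probs[i]
        if cum >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return [probs[i] / mass if i in kept else 0.0 for i in range(len(probs))]

# Toy vocabulary of 4 tokens with made-up logits.
logits = [2.0, 1.0, 0.5, -1.0]
cold = apply_temperature(logits, 0.1)  # near-deterministic: top token dominates
hot = apply_temperature(logits, 2.0)   # flatter: rare tokens get real probability
nucleus = top_p_filter(apply_temperature(logits, 1.0), 0.5)  # trims the long tail
```

At temperature 0.1 the top token gets essentially all the probability mass, while at 2.0 the distribution flattens — exactly the "predictable vs. wild" behavior described above.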
Where Is This Used?
Fun Fact: Temperature = 0 doesn't guarantee identical answers! There are other sources of randomness in the system. For truly deterministic output, you also need to set a fixed "seed" value.
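Why does a seed matter? Sampling draws random numbers, and a seeded random generator produces the same sequence of draws every time. A small illustration of the principle (this is a toy sampler, not any provider's actual implementation):

```python
import random

def sample_token(probs, rng):
    """Pick a token index from a probability distribution
    using the supplied random generator."""
    r = rng.random()
    cum = 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

probs = [0.5, 0.3, 0.2]

# Two runs with the same seed draw identical random numbers,
# so they pick identical token sequences.
rng_a = random.Random(42)
run_a = [sample_token(probs, rng_a) for _ in range(5)]
rng_b = random.Random(42)
run_b = [sample_token(probs, rng_b) for _ in range(5)]
```

In practice, even a fixed seed may not be enough for bit-identical outputs across runs — batching and floating-point nondeterminism on GPUs can still shift results slightly.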
Try It Yourself!
Move the sliders and watch how the model's behavior changes. Try temperature 0 vs 2 — the difference is dramatic!
```json
{
  "temperature": 0.7,
  "top_p": 0.90,
  "max_tokens": 10
}
```

Temperature and Top-P work together: low temperature (0.1-0.3) for facts and code, medium (0.5-0.7) for most tasks, high (0.8-1.2) for creative work. Avoid temperature > 1.5 — responses become nonsensical.
Prompt: Write a function to calculate the factorial of a number

```python
def factorial(n):
    if n == 0:
        return 1
    return n * factorial(n - 1)
```

```python
def factorial(n: int) -> int:
    """Calculate factorial of n.

    Args:
        n: Non-negative integer

    Returns:
        Factorial of n

    Raises:
        ValueError: If n is negative
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0 or n == 1:
        return 1
    return n * factorial(n - 1)
```
Low temperature (0.1) for code produces more consistent, well-documented, and defensive results, like the second version above.