Decoding Strategies
Token selection methods
The Problem: The model knows that after "The weather today is" several words could fit: "sunny", "rainy", "cold", "nice"... How does it pick just one?
The Solution: Autocomplete on Your Phone
Remember how your phone suggests words when you type a message? It shows 3 options, and you tap one. LLMs work similarly, but they choose automatically! Decoding is the final step of inference where a probability distribution becomes an actual token.
Why Does This Matter?
The decoding strategy determines the "personality" of the model's output:
- Customer support bot: greedy decoding for consistent, predictable answers
- Creative writing: sampling with high temperature for unexpected ideas
- Code generation: beam search to find the most correct solution
- Translation: beam search to maintain meaning and grammar
Think of it like phone autocomplete with different selection strategies:
1. Greedy: always pick the most likely word. Fast but bland; long texts become repetitive.
2. Random (sampling): draw a word at random, weighted by probability, like rolling a loaded die. More creative, but can wander off-topic.
3. Top-K: keep only the K most likely words, then sample among them. A balance between control and creativity.
4. Top-P (nucleus): keep the most likely words until their cumulative probability reaches the threshold P. The cutoff adapts to each context.
5. Beam Search: explore several candidate sequences in parallel and keep the best overall. Used for translation and other tasks where accuracy matters.
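Strategies 1–4 can be sketched in a few lines of Python. The toy distribution below (the words and probabilities for the prefix "The weather today is") is invented for illustration; a real model produces a distribution over tens of thousands of vocabulary tokens. Beam search needs to track whole sequences rather than single tokens, so it is left out of this sketch.

```python
import math
import random

# Toy next-token distribution for "The weather today is".
# Words and probabilities are made up for illustration.
probs = {"sunny": 0.40, "rainy": 0.25, "cold": 0.20, "nice": 0.10, "purple": 0.05}

def greedy(probs):
    """Always return the single most likely token."""
    return max(probs, key=probs.get)

def sample_with_temperature(probs, temperature=1.0):
    """Divide log-probabilities by the temperature, renormalize, then sample.
    Low temperature sharpens the distribution toward greedy; high flattens it."""
    logits = {t: math.log(p) / temperature for t, p in probs.items()}
    z = sum(math.exp(l) for l in logits.values())
    rescaled = {t: math.exp(l) / z for t, l in logits.items()}
    return random.choices(list(rescaled), weights=list(rescaled.values()))[0]

def top_k(probs, k=3):
    """Keep only the k most likely tokens, then sample among them."""
    kept = dict(sorted(probs.items(), key=lambda kv: -kv[1])[:k])
    return random.choices(list(kept), weights=list(kept.values()))[0]

def top_p(probs, p=0.9):
    """Keep the smallest set of tokens whose cumulative probability reaches p."""
    kept, total = {}, 0.0
    for token, prob in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept[token] = prob
        total += prob
        if total >= p:
            break
    return random.choices(list(kept), weights=list(kept.values()))[0]

print(greedy(probs))                        # always "sunny"
print(sample_with_temperature(probs, 0.7))
print(top_k(probs, k=3))                    # never "nice" or "purple"
print(top_p(probs, p=0.9))                  # cutoff adapts to the distribution
```

Note how Top-P's cutoff is data-dependent: with this distribution and p=0.9 it keeps four tokens, but on a sharper distribution it might keep only one.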
Where Is This Used?
Fun Fact: ChatGPT's API defaults to Top-P sampling with P=1, meaning the model samples from essentially the full distribution, so more likely tokens win more often but unlikely ones still get a chance. That's why the same question can get a different answer each time!
Try It Yourself!
See how different decoding strategies affect the generated text. Try greedy vs random — notice the difference!
How does the model choose the next token? Different strategies produce different outputs from the same probability distribution. Greedy decoding, for example, always picks the single most likely token: deterministic, but prone to repetition.
| Strategy | Deterministic | Diversity | Use Case |
|---|---|---|---|
| Greedy | ✓ | Low | Facts, code |
| Beam Search | ✓ | Medium | Translation, summarization |
| Temperature | ✗ | Adjustable | Everything |
| Top-k | ✗ | Medium | Creative |
| Top-p | ✗ | Adaptive | Chat, creative |
Greedy is best for factual queries. Top-p (nucleus) sampling is the most popular for creative tasks — it adapts to each distribution rather than using a fixed cutoff like Top-k.
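The beam search row in the table deserves a concrete sketch: instead of committing to one token at a time, keep the `beam_width` highest-scoring partial sequences and extend them all. The toy "language model" below is just a hand-written table of next-word probabilities, invented for illustration:

```python
import math

# Toy "language model": fixed next-word probabilities per current word.
# All words and numbers are invented for illustration.
NEXT = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "end": 0.2},
    "a":   {"dog": 0.7, "end": 0.3},
    "cat": {"sat": 0.6, "end": 0.4},
    "dog": {"ran": 0.8, "end": 0.2},
    "sat": {"end": 1.0},
    "ran": {"end": 1.0},
}

def beam_search(beam_width=2, max_len=4):
    # Each beam is (sequence, cumulative log-probability).
    beams = [(["<s>"], 0.0)]
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == "end":          # finished sequences carry over
                candidates.append((seq, score))
                continue
            for tok, p in NEXT[seq[-1]].items():
                candidates.append((seq + [tok], score + math.log(p)))
        # Keep only the beam_width highest-scoring candidates.
        beams = sorted(candidates, key=lambda c: -c[1])[:beam_width]
    return beams

for seq, score in beam_search():
    print(" ".join(seq[1:]), round(math.exp(score), 3))
```

On this toy model, greedy decoding would lock onto "the" (p=0.6) and end up with "the cat sat end" (total p=0.18), while beam search keeps the initially weaker "a" branch alive and finds "a dog ran end" (total p=0.224). That is exactly the failure mode beam search exists to avoid: a locally best token leading to a globally worse sequence.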
Prompt: Write the opening of a sci-fi story

Greedy output: The spaceship "Hope" flew through the vast expanse of space. Captain Johnson sat in his chair and looked at the stars. A long journey lay ahead. The crew was preparing to land on a new planet.

Nucleus sampling output: The beacon on Enceladus went silent on Thursday. Not that anyone noticed — automated stations go quiet and reboot all the time. But this beacon had been transmitting the same phrase for 340 years straight. And now it started transmitting a different one. In a language that did not exist in humanity's databases.
Greedy decoding (temperature=0) produces predictable text. Nucleus sampling (temperature=0.9, top_p=0.95) lets the model "take risks" — ideal for creative tasks where cliches are unacceptable.
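The temperature knob behind this difference can be sketched directly. Dividing logits by T (equivalently, raising probabilities to the power 1/T and renormalizing) sharpens the distribution when T < 1 and flattens it when T > 1. The numbers below are illustrative:

```python
def apply_temperature(probs, T):
    """Rescale a distribution: low T sharpens it, high T flattens it.
    p ** (1/T) with renormalization is equivalent to dividing logits by T."""
    scaled = [p ** (1.0 / T) for p in probs]
    z = sum(scaled)
    return [s / z for s in scaled]

probs = [0.5, 0.3, 0.15, 0.05]
print(apply_temperature(probs, 0.2))  # nearly one-hot: close to greedy
print(apply_temperature(probs, 1.5))  # flatter: rare tokens gain probability
```

At T=0.2 the top token's share climbs above 0.9, which is why low temperature feels "predictable"; at T=1.5 it drops below 0.5, which is where the surprising word choices come from.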