Biases in LLMs
Explore biases in language models and methods to detect and mitigate them
The Problem: AI models can perpetuate or amplify societal biases around race, gender, age, and more. Why does this happen and how can we address it?
The Solution: Understand the Blind Spots
AI biases arise because models learn patterns from training data that reflects human biases. It's like a person who grew up in a homogeneous community — they have blind spots about experiences different from their own. Fine-tuning on curated data and red teaming help identify and reduce these biases.
Here is how those blind spots form:
1. Training data: Internet text contains historical biases
2. Pattern learning: AI picks up correlations, even unfair ones
3. Amplification: Small biases can be magnified at scale
4. Feedback loops: Biased outputs create more biased data
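The feedback loop in step 4 can be sketched with a toy simulation. The "sharpened reproduction" model below is an illustrative assumption, not how real training works: a model slightly over-samples the majority pattern, and its outputs are mixed back into the next training corpus.

```python
# Toy simulation of bias amplification through a feedback loop.
# Assumption: the model reproduces the majority label at a sharpened rate.

def model_output_share(train_share: float, sharpening: float = 1.5) -> float:
    """Share of the majority label in model outputs (assumed behavior)."""
    p = train_share ** sharpening
    q = (1 - train_share) ** sharpening
    return p / (p + q)

share = 0.55  # initial majority share in the training data
history = [share]
for _ in range(5):
    generated = model_output_share(share)
    # Next corpus = old data mixed 50/50 with model-generated text
    share = 0.5 * share + 0.5 * generated
    history.append(share)

print([round(s, 3) for s in history])  # the majority share keeps growing
```

Even with a mild sharpening factor, the majority share drifts upward every round, which is the essence of a feedback loop.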
Real-World Bias Incidents
- Hiring & Recruitment: AI resume screeners scored male names higher for engineering roles. Amazon scrapped its hiring AI in 2018 for this exact reason.
- Lending & Credit: Credit scoring models can discriminate by ZIP code (a proxy for race). The Apple Card investigation (2019) found men received higher credit limits.
- Medical Diagnosis: Dermatology AI trained mostly on light skin underperforms on dark skin. Symptom descriptions default to male presentation.
- Criminal Justice: The COMPAS recidivism algorithm was found to be biased against Black defendants, assigning higher risk scores for similar profiles.
Fun Fact: Early image generators would show mostly white men for "CEO" and mostly women for "nurse." Modern models actively try to counter this, but perfect fairness is still an unsolved problem in AI research.
Try It Yourself!
Explore how biases manifest in AI outputs and learn about mitigation strategies.
Biases are systematic deviations in model responses that reflect stereotypes, inequalities, or one-sidedness present in training data. LLMs can amplify existing social biases.
Common sources of bias:
- Imbalanced training data
- Historical stereotypes in texts
- Underrepresentation of certain groups
- Optimization for popular answers
- Cultural specificity of data sources
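One cheap way to surface imbalanced training data is a simple co-occurrence count: how often do gendered pronouns appear in the same sentence as a profession word? The mini-corpus and pronoun sets below are illustrative assumptions, not a real dataset.

```python
from collections import Counter

# Hypothetical mini-corpus standing in for training data.
corpus = [
    "The engineer said he would finish the design.",
    "The engineer explained his approach to the team.",
    "The nurse said she would check the chart.",
    "The nurse finished her shift early.",
    "The engineer noted that she preferred the second option.",
]

def pronoun_counts(sentences, profession):
    """Count sentences where the profession co-occurs with gendered pronouns."""
    male, female = {"he", "his", "him"}, {"she", "her", "hers"}
    counts = Counter()
    for s in sentences:
        words = {w.strip(".,").lower() for w in s.split()}
        if profession in words:
            counts["male"] += bool(words & male)
            counts["female"] += bool(words & female)
    return counts

print(pronoun_counts(corpus, "engineer"))  # skewed toward male pronouns
```

A model trained on text with this skew will tend to reproduce it when completing sentences about engineers.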
Biased LLMs can affect: hiring decisions, credit scoring, medical recommendations, educational content, automated moderation. This can lead to discrimination and reinforcement of inequality.
Mitigation measures:
- Model bias auditing
- Diverse training data
- Debiasing techniques (counterfactual augmentation, adversarial training)
- Documenting limitations
- Including diverse perspectives
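Counterfactual augmentation from the list above can be sketched as a word-swap pass over training sentences: for each example, emit a gender-swapped copy so the model sees both variants equally often. The `SWAPS` table is a minimal assumed mapping; production pipelines handle many more terms and grammatical cases.

```python
import re

# Minimal assumed swap table (real systems cover far more terms).
SWAPS = {"he": "she", "she": "he", "his": "her", "her": "his",
         "him": "her", "man": "woman", "woman": "man"}

def counterfactual(sentence: str) -> str:
    """Return a copy of the sentence with gendered terms swapped."""
    def swap(match):
        word = match.group(0)
        repl = SWAPS.get(word.lower(), word)
        return repl.capitalize() if word[0].isupper() else repl
    pattern = r"\b(" + "|".join(SWAPS) + r")\b"
    return re.sub(pattern, swap, sentence, flags=re.IGNORECASE)

def augment(data):
    """Original data plus one counterfactual copy of each sentence."""
    return data + [counterfactual(s) for s in data]

print(counterfactual("She explained her design to him."))
```

Note the ambiguity this glosses over: English "her" maps to either "his" or "him" depending on role, which is one reason rule-based swaps are only a first approximation.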
Types of Biases
- Gender bias: associating certain professions or traits with specific genders
- Cultural bias: favoring Western perspectives and cultural norms
- Socioeconomic bias: assumptions based on economic status or social class
- Sycophancy: reinforcing user beliefs instead of providing balanced information
Bias Mitigation Approaches
- Diverse data: include underrepresented groups and perspectives
- Regular auditing: testing for biased outputs across demographics
- Debiasing techniques: counterfactual data augmentation, adversarial training
- Transparency: document known limitations and potential biases
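A demographic audit of the kind listed above can be as simple as scoring identical inputs that differ only in a demographic token and flagging any gap. `score_resume` below is a hypothetical stand-in for a model; the names are illustrative.

```python
# Audit sketch: same resume body, different names -> scores must match.

def score_resume(text: str) -> float:
    """Toy scorer: counts skill keywords; ignores the name line by design."""
    skills = {"python", "leadership", "sql", "testing"}
    body = text.split("\n", 1)[1].lower()  # drop the name line
    return sum(word.strip(".,") in skills for word in body.split())

resume = "{name}\nSkills: Python, SQL, leadership, testing."
names = ["James Miller", "Aisha Okafor", "Wei Chen", "Maria Garcia"]
scores = {n: score_resume(resume.format(name=n)) for n in names}

gap = max(scores.values()) - min(scores.values())
assert gap == 0, f"audit failed: score gap {gap} across names"
print("audit passed:", scores)
```

With a real model the scores would rarely be exactly equal, so audits typically set a tolerance threshold and track the gap across many demographic pairs rather than asserting zero.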
Key Insight
Biases in LLMs reflect biases in the data they were trained on. Complete elimination is impossible, but conscious approaches and regular auditing can significantly reduce their impact.
- LLM developers bear responsibility for identifying and mitigating biases in their models.
- Users should critically evaluate responses, especially on sensitive topics.
- Complete elimination of biases is impossible, but their impact can be minimized.
- It's important to include diverse voices in AI development and testing processes.
Demonstrating model bias in candidate evaluation
Biased response (no evaluation criteria): "I recommend Dmitry. With equal qualifications, his age and life experience give him an advantage in team management. Anna is certainly promising, but for a leadership position maturity is an important factor."
Structured response (explicit criteria, demographic factors prohibited): "By objective criteria the candidates are equal:"
- Experience: both — 5 years in PM
- Results: both — 3/3 successful projects (100%)
- Skills: insufficient data for comparison
- Gaps: both resumes lack team sizes, budgets, and methodologies
Recommendation: candidates are indistinguishable based on provided data. Additional information needed: project scale, team sizes, tech stack.
LLMs inherit biases from training data. Structured evaluation criteria with explicit prohibition of demographic factors reduce bias.
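The structured-criteria idea from the demo can be sketched by filtering candidate records down to an allow-list of job-relevant fields before any comparison, so demographic attributes never reach the scoring step. Field names and the scoring formula are assumptions for illustration.

```python
# Only job-relevant fields may influence the evaluation (assumed schema).
ALLOWED = {"experience_years", "projects_done", "projects_successful"}

def evaluate(candidate: dict) -> dict:
    """Score a candidate using only allow-listed, job-relevant facts."""
    facts = {k: v for k, v in candidate.items() if k in ALLOWED}
    score = facts["experience_years"] + 10 * (
        facts["projects_successful"] / facts["projects_done"])
    return {"facts": facts, "score": round(score, 2)}

anna = {"name": "Anna", "age": 28, "gender": "f",
        "experience_years": 5, "projects_done": 3, "projects_successful": 3}
dmitry = {"name": "Dmitry", "age": 45, "gender": "m",
          "experience_years": 5, "projects_done": 3, "projects_successful": 3}

a, d = evaluate(anna), evaluate(dmitry)
print(a["score"], d["score"])  # identical job-relevant inputs, identical scores
```

Because name, age, and gender are stripped before scoring, the two candidates in the demo above necessarily receive the same score, mirroring the structured response.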