RAG — Retrieval-Augmented Generation
The Problem: AI has a knowledge cutoff date and doesn't know your private data. How can we give AI access to current, relevant information without retraining the entire model?
The Solution: Give AI a Library Card
RAG (Retrieval-Augmented Generation) searches a knowledge base for relevant information and includes it in the prompt. It's like a librarian who looks up facts before answering your question. Instead of relying solely on what the model memorized during training (parametric knowledge), RAG grounds the response in real, up-to-date data and can dramatically reduce hallucinations.
How does the search work? RAG uses embeddings — numerical vectors that capture the meaning of text. Both your documents and the user's question are converted into these vectors. Then a similarity search finds the documents closest in meaning to the query. The document vectors are computed ahead of time and stored in a vector database, which makes this search fast even over millions of chunks.
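To make the embedding step concrete, here is a deliberately tiny sketch: a normalized bag-of-words vector stands in for a real learned embedding model (the vocabulary, sample texts, and `embed`/`cosine` helpers are all illustrative), but the vector-plus-cosine-similarity mechanics are the same ones a vector database uses.

```python
import math
import re
from collections import Counter

def embed(text: str, vocab: list[str]) -> list[float]:
    """Toy 'embedding': a unit-length word-count vector over a fixed
    vocabulary. A real system would call a learned embedding model."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two unit vectors: higher = closer in meaning."""
    return sum(x * y for x, y in zip(a, b))

vocab = ["return", "policy", "refund", "shipping", "cost"]
doc = embed("Our return policy allows a refund within 30 days.", vocab)
other = embed("Shipping cost depends on package weight.", vocab)
query = embed("Can I get a refund under the return policy?", vocab)

# The policy document scores higher against the query than the shipping one.
print(cosine(query, doc), cosine(query, other))
```

Real embedding models capture meaning beyond exact word overlap (so "vacation" matches "paid time off"), which is what makes the search semantic rather than keyword-based.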
Why does chunk size matter? Before indexing, documents are split into chunks. This is a key trade-off: chunks that are too large dilute relevance, while chunks that are too small lose important context. For example, a 50-token chunk might capture a single sentence perfectly but miss the surrounding explanation, while a 2000-token chunk gives plenty of context but may match less precisely. Finding the right balance is one of the most impactful decisions in a RAG pipeline.
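As a sketch of this trade-off, here is a minimal fixed-size chunker with overlap; word counts stand in for tokens, and real pipelines usually also respect sentence or paragraph boundaries.

```python
def chunk(text: str, size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into chunks of roughly `size` words, repeating
    `overlap` words between neighbours so no thought is cut off at a
    chunk boundary. Words approximate tokens here; production
    splitters count real model tokens."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

# A 240-word document becomes three overlapping chunks.
parts = chunk(" ".join(f"w{i}" for i in range(240)))
```

The `overlap` parameter is the usual hedge against the "too small" failure mode: a sentence that straddles a boundary still appears whole in at least one chunk.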
Unlike fine-tuning, RAG doesn't change the model at all — it simply provides the right context at inference time. This makes it much cheaper to maintain: when your data changes, you just update the index, not retrain the model. You can also combine RAG with Chain-of-Thought for step-by-step reasoning over retrieved documents.
Think of it like a librarian with a catalog:
1. User asks: "What's our return policy?"
2. Embed the query: convert the question into a numerical vector that captures its meaning
3. Retrieve: find the most similar document chunks in the vector database
4. Augment: add the retrieved chunks to the prompt as context
5. Generate: the AI answers based on actual policy text, not guesses
Before all this, documents are pre-processed: split into chunks, converted to embeddings, and stored in a vector database. This happens once, not on every query.
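Putting the steps together, here is a minimal end-to-end sketch. The toy bag-of-words embedding and in-memory index stand in for a real embedding model and vector database, and the final step stops at the built prompt (step 5 would send it to the LLM); the document texts and helper names are all illustrative.

```python
import math
import re
from collections import Counter

DOCS = [  # pretend knowledge base, already split into chunks
    "Employees are entitled to 25 days of paid vacation per year.",
    "Remote work is allowed up to three days per week.",
]

def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())

VOCAB = sorted({w for d in DOCS for w in tokenize(d)})

def embed(text: str) -> list[float]:
    """Toy embedding; a real pipeline calls an embedding model."""
    counts = Counter(tokenize(text))
    vec = [float(counts[w]) for w in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

# Indexing happens once, not per query (this list is our "vector DB").
INDEX = [(doc, embed(doc)) for doc in DOCS]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Steps 2-3: embed the query, rank chunks by cosine similarity."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda it: -sum(x * y for x, y in zip(q, it[1])))
    return [doc for doc, _ in ranked[:k]]

def augment(query: str) -> str:
    """Step 4: splice the retrieved chunks into the prompt.
    Step 5 would send this prompt to the LLM."""
    context = "\n".join(retrieve(query))
    return (f"Context:\n{context}\n\n"
            f"Question: {query}\nAnswer using only the context above.")

prompt = augment("What's our vacation policy?")
```

Note that the model itself never changes: swapping in new documents only means re-running the indexing step that builds `INDEX`.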
Where Is This Used?
- Customer Support: Answering questions from company knowledge bases with cited sources
- Enterprise Search: Finding and summarizing information across thousands of internal documents
- Legal Research: Searching case law and regulations with semantic understanding
- Technical Docs: Chatbots that answer questions about APIs, code, and product features
Fun Fact: RAG is one of the most widely used techniques in production AI systems today. It's much cheaper than fine-tuning: when your data changes, you just re-index the new documents instead of retraining the whole model. Products such as Notion, Stripe, and GitHub have built retrieval into their AI features.
Try It Yourself!
Use the interactive example below to see how retrieving relevant documents improves the quality and accuracy of AI responses.
RAG allows LLMs to use up-to-date information from external sources. Instead of relying only on knowledge from training, the model receives relevant documents and uses them to answer. Typical knowledge sources include:
- Company documents: policies, guidelines, procedures
- Product documentation: APIs, products, technical specs
- Live data: news, prices, inventory
RAG addresses the main weakness of LLMs, outdated and incomplete knowledge: the model receives current information from your documents and gives accurate answers about your company, products, and data.
Example task: answer a question about internal company policy.

Without RAG: "Standard vacation entitlement varies by country and company. In the US, it's typically 10-15 days for new employees. Please check your employment contract or HR policy."

With RAG: "Employees are entitled to 25 days of paid vacation. An additional 2 days are granted for every 3 years of service at the company."

Without RAG the model can only give general information; with RAG it gives the exact answer for the specific company.
This lesson is part of a structured LLM course.