RAG — Retrieval-Augmented Generation
The Problem: AI has a knowledge cutoff date and doesn't know your private data. How can we give AI access to current, relevant information without retraining the entire model?
The Solution: Give AI a Library Card
RAG (Retrieval-Augmented Generation) searches a knowledge base for relevant information and includes it in the prompt. It's like a librarian who looks up facts before answering your question. Instead of relying solely on what the model memorized during training (parametric knowledge), RAG grounds the response in real, up-to-date data and can dramatically reduce hallucinations.
How does the search work? RAG uses embeddings — numerical vectors that capture the meaning of text. Both your documents and the user's question are converted into these vectors. Then a similarity search finds the documents closest in meaning to the query. The document vectors are computed ahead of time and stored in a vector database, which makes this search fast even over millions of chunks.
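To make the embedding step concrete, here is a deliberately tiny sketch: a normalized bag-of-words vector stands in for a real learned embedding model (the vocabulary, sample texts, and `embed`/`cosine` helpers are all illustrative), but the vector-plus-cosine-similarity mechanics are the same ones a vector database uses.

```python
import math
import re
from collections import Counter

def embed(text: str, vocab: list[str]) -> list[float]:
    """Toy 'embedding': a unit-length word-count vector over a fixed
    vocabulary. A real system would call a learned embedding model."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two unit vectors: higher = closer in meaning."""
    return sum(x * y for x, y in zip(a, b))

vocab = ["return", "policy", "refund", "shipping", "cost"]
doc = embed("Our return policy allows a refund within 30 days.", vocab)
other = embed("Shipping cost depends on package weight.", vocab)
query = embed("Can I get a refund under the return policy?", vocab)

# The policy document scores higher against the query than the shipping one.
print(cosine(query, doc), cosine(query, other))
```

Real embedding models capture meaning beyond exact word overlap (so "vacation" matches "paid time off"), which is what makes the search semantic rather than keyword-based.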
Why does chunk size matter? Before indexing, documents are split into chunks. This is a key trade-off: chunks that are too large dilute relevance, while chunks that are too small lose important context. For example, a 50-token chunk might capture a single sentence perfectly but miss the surrounding explanation, while a 2000-token chunk gives plenty of context but may match less precisely. Finding the right balance is one of the most impactful decisions in a RAG pipeline.
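As a sketch of this trade-off, here is a minimal fixed-size chunker with overlap; word counts stand in for tokens, and real pipelines usually also respect sentence or paragraph boundaries.

```python
def chunk(text: str, size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into chunks of roughly `size` words, repeating
    `overlap` words between neighbours so no thought is cut off at a
    chunk boundary. Words approximate tokens here; production
    splitters count real model tokens."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

# A 240-word document becomes three overlapping chunks.
parts = chunk(" ".join(f"w{i}" for i in range(240)))
```

The `overlap` parameter is the usual hedge against the "too small" failure mode: a sentence that straddles a boundary still appears whole in at least one chunk.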
Unlike fine-tuning, RAG doesn't change the model at all — it simply provides the right context at inference time. This makes it much cheaper to maintain: when your data changes, you just update the index, not retrain the model. You can also combine RAG with Chain-of-Thought for step-by-step reasoning over retrieved documents.
Think of it like a librarian with a catalog:
1. User asks: "What's our return policy?"
2. Embed the query: convert the question into a numerical vector that captures its meaning
3. Retrieve: find the most similar document chunks in the vector database
4. Augment: add the retrieved chunks to the prompt as context
5. Generate: the AI answers based on actual policy text, not guesses
Before all this, documents are pre-processed: split into chunks, converted to embeddings, and stored in a vector database. This happens once, not on every query.
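Putting the steps together, here is a minimal end-to-end sketch. The toy bag-of-words embedding and in-memory index stand in for a real embedding model and vector database, and the final step stops at the built prompt (step 5 would send it to the LLM); the document texts and helper names are all illustrative.

```python
import math
import re
from collections import Counter

DOCS = [  # pretend knowledge base, already split into chunks
    "Employees are entitled to 25 days of paid vacation per year.",
    "Remote work is allowed up to three days per week.",
]

def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())

VOCAB = sorted({w for d in DOCS for w in tokenize(d)})

def embed(text: str) -> list[float]:
    """Toy embedding; a real pipeline calls an embedding model."""
    counts = Counter(tokenize(text))
    vec = [float(counts[w]) for w in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

# Indexing happens once, not per query (this list is our "vector DB").
INDEX = [(doc, embed(doc)) for doc in DOCS]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Steps 2-3: embed the query, rank chunks by cosine similarity."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda it: -sum(x * y for x, y in zip(q, it[1])))
    return [doc for doc, _ in ranked[:k]]

def augment(query: str) -> str:
    """Step 4: splice the retrieved chunks into the prompt.
    Step 5 would send this prompt to the LLM."""
    context = "\n".join(retrieve(query))
    return (f"Context:\n{context}\n\n"
            f"Question: {query}\nAnswer using only the context above.")

prompt = augment("What's our vacation policy?")
```

Note that the model itself never changes: swapping in new documents only means re-running the indexing step that builds `INDEX`.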
Where Is This Used?
- Customer Support: Answering questions from company knowledge bases with cited sources
- Enterprise Search: Finding and summarizing information across thousands of internal documents
- Legal Research: Searching case law and regulations with semantic understanding
- Technical Docs: Chatbots that answer questions about APIs, code, and product features
Fun Fact: RAG is one of the most widely used techniques in production AI systems today. It's much cheaper than fine-tuning: when your data changes, you just re-index the new documents instead of retraining the whole model. Products such as Notion, Stripe, and GitHub have built retrieval into their AI features.
Try It Yourself!
Use the interactive example below to see how retrieving relevant documents improves the quality and accuracy of AI responses.
RAG allows LLMs to use up-to-date information from external sources. Instead of relying only on knowledge from training, the model receives relevant documents and uses them to answer. Typical knowledge sources include:
- Company documents: policies, guidelines, procedures
- Product documentation: APIs, products, technical specs
- Live data: news, prices, inventory
RAG addresses the main weakness of LLMs, outdated and incomplete knowledge: the model receives current information from your documents and gives accurate answers about your company, products, and data.
Example task: answer a question about internal company policy.

Without RAG: "Standard vacation entitlement varies by country and company. In the US, it's typically 10-15 days for new employees. Please check your employment contract or HR policy."

With RAG: "Employees are entitled to 25 days of paid vacation. An additional 2 days are granted for every 3 years of service at the company."

Without RAG the model can only give general information; with RAG it gives the exact answer for the specific company.
This lesson is part of a structured LLM course.