Question Answering
Q&A systems
The Problem: Users have questions, and they expect accurate, helpful answers quickly. How can AI provide reliable question-answering at scale?
The Solution: An Expert Hotline
Question answering (QA) is the task of taking a natural-language question and returning a direct, correct answer instead of a list of links or a long document for the user to read. A large language model does this by encoding the question and any supporting text into vectors, attending over that context, and generating the answer token by token. The crucial design decision is where the facts come from, and that splits QA into two modes that behave very differently in production.
Open-book vs. closed-book
In closed-book QA the model answers from its parameters alone — the knowledge baked in during training. This is fast and needs no extra infrastructure, and it works well for stable general knowledge (“What is the capital of France?”). But it has a hard ceiling: the model cannot know anything after its training cutoff, it has no access to your private data, and when it is unsure it tends to hallucinate — stating a wrong fact with full confidence. In open-book QA you instead use RAG (retrieval-augmented generation): you convert documents into embeddings, store them in a vector index, retrieve the passages most relevant to the question, and paste them into the prompt as context. The model then answers from those passages and cites them — a property called grounding. Open-book is the right choice whenever answers must be current, domain-specific, or verifiable.
Tradeoffs, pitfalls, and a worked example
Each mode has costs. Closed-book is cheap but unreliable for facts; open-book is reliable but only as good as your retrieval — if the right passage is never retrieved, the model either guesses or, ideally, says “I don’t know.” Common pitfalls are noisy chunks that drown the real answer, questions that need information from several documents (multi-hop), and answers that sound grounded but quietly mix in unsupported claims. Mitigations include better chunking, re-ranking retrieved passages, and asking the model to quote its sources. Chain-of-thought prompting helps with multi-step questions in both modes. For example, ask “What were the main causes of the 2008 financial crisis?”: a closed-book model produces a plausible essay you must take on trust, while an open-book system retrieves passages on subprime lending, securitization, and rating-agency failures, and answers with citations you can click and check — which is exactly why production QA over real knowledge almost always uses retrieval.
Think of it like an expert hotline:
- 1. User asks a question: "What were the main causes of the 2008 financial crisis?"
- 2. Decide: open-book or closed-book: Does this need retrieved docs (factual, recent, domain-specific) or model knowledge (general, stable)?
- 3. If open-book: search and retrieve: Search documents, chunk relevant sections, retrieve the best matches
- 4. Generate a grounded answer: Synthesize a clear answer with citations pointing to source documents
- 5. Verify accuracy: Cross-check claims against sources — flag anything unsupported as uncertain
Where Is This Used?
- Customer Support: Answering FAQs automatically
- Search Engines: Direct answers instead of just links
- Education: Tutoring and homework help
- Enterprise: Internal knowledge base queries
- Closed-Book Risk: When the model answers from training data alone, it may hallucinate facts, provide outdated info, or confidently state something wrong — always verify critical answers against sources
Fun Fact: The best Q&A systems combine LLMs with retrieval (RAG)! The LLM understands the question and generates natural answers, while retrieval ensures the information is accurate and up-to-date.
Try It Yourself!
Use the interactive example below to ask questions and see how AI formulates accurate, helpful responses.
Product: CloudSync Pro Version: 3.2.1 Release Date: January 2024 Features: - Automatic file synchronization across devices - End-to-end encryption (AES-256) - Maximum file size: 10GB - Supported formats: All file types - Storage limit: 2TB per account Pricing: - Free tier: 15GB storage, 3 devices - Pro: $9.99/month, 500GB, unlimited devices - Enterprise: Custom pricing, 2TB+, priority support Known Issues: - Large files (>5GB) may take longer to sync - Some antivirus software may interfere with sync agent
What encryption does CloudSync use?
CloudSync Pro uses end-to-end AES-256 encryption.
- • Extractive — copies the answer directly from the text. Most reliable, but limited to what’s stated.
- • Abstractive — reasons from context to form a new answer. Flexible, but risk of hallucination.
- • Multi-hop — combines facts from different parts of the document. Powerful, but hardest for the model.
Frequently asked questions
What's the difference between extractive and abstractive QA?
Extractive QA finds and returns exact text spans from source documents. Abstractive QA synthesizes new answers by reasoning over the source material, potentially combining information from multiple passages.
What is multi-hop reasoning?
Multi-hop requires combining facts from multiple sources or passages to derive an answer. For example: 'Who is older, the CEO of Apple or the CEO of Microsoft?' requires finding both CEOs and their birth dates separately.
How do I reduce hallucinations in QA answers?
Ground answers in retrieved documents (RAG), ask the model to cite sources, use low temperature, and add instructions like 'If the answer isn't in the context, say I don't know'.
When do I need RAG vs in-context learning?
Use RAG when the knowledge base is large (>50 pages) or frequently updated. In-context learning works when all relevant context fits in one prompt and the information is static.
Try it yourself
Interactive demo of this technique
Answer a question strictly based on the provided text
Advantages of solar energy:
- Environmentally clean energy source
- No fuel required
- Quiet operation
- Minimal maintenance
- Reduces grid dependency
- Pays off in 6-8 years in Krasnodar region
- Investment payoff: ["pays off in 6-8 years"]
- Good energy output: ["average output — 1200 kWh/year per 1 kW capacity"]
- Government support: ["the state compensates up to 30% of equipment cost"]
The text does not mention environmental benefits, maintenance costs, or comparison with other energy sources.
Citation requirement and ban on external knowledge transform a hallucinating answer with invented facts into grounded QA anchored to the text.
Create a free account to solve challenges
5 AI-verified challenges for this lesson
This lesson is part of a structured LLM course.
My Learning Path