Agentic RAG — Let the Agent Decide What to Retrieve
Classic RAG is a brainless conveyor: query, retrieve, answer. Agentic RAG is when the agent decides whether to search, what to search for, and whether the results are good enough. We break down how to turn a linear pipeline into a decision loop — using LangGraph and ChromaDB.
Intermediate · RAG & Data · 30 min · LangGraph, Python, ChromaDB
1
A simple question needs a conveyor. A complex one needs a detective
'What is the return policy?' — one search, one answer, RAG handles it. 'Compare the two teams' approaches and decide which is better for our case' — and the conveyor breaks. Not because the model is weak or the database is bad, but because the architecture cannot stop and ask: 'Did I find enough? Did I understand the question correctly?'
Agentic RAG adds exactly these checkpoints. The result — an agent that searches iteratively, not blindly.
❌ Classic RAG
- Query → retrieve → answer (always)
- No evaluation of retrieval quality
- One search for the entire answer
✅ Agentic RAG
- Plan first, then retrieve
- Evaluates results before answering
- Can reformulate and search again
80% of RAG quality problems are not model problems. They are architecture problems: bad chunking, single-pass retrieval where iterative search is needed.
2
Plan → retrieve → evaluate → decide. That is the whole secret
All of Agentic RAG comes down to one loop with four steps. At each step the agent makes a decision: search more, reformulate the query, or answer. LangGraph implements this literally — graph nodes and conditional edges between them.
Plan → Retrieve → Evaluate → if sufficient → Answer; if insufficient → Reformulate → Retrieve again
agent loop:
1. plan → split the question into sub-queries
2. retrieve → find documents for each sub-query
3. evaluate → are the documents relevant? (threshold > 0.7)
4. decide:
   enough → answer
   too little / irrelevant → reformulate → back to step 2
   3 iterations with no result → "no data"
Add a hard iteration limit (e.g. 3 cycles). Without it the agent may search forever for the 'perfect answer' — especially when the knowledge base doesn't cover the topic.
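The loop above can be sketched in plain Python. This is a minimal sketch, not the LangGraph implementation: in LangGraph each step would become a graph node and the decision a conditional edge, but the control flow is the same. All helper functions (`retrieve`, `grade`, `reformulate`, `answer`) are hypothetical and injected as parameters; in a real system `retrieve` would query ChromaDB and `grade` would be an LLM call.

```python
RELEVANCE_THRESHOLD = 0.7
MAX_ITERATIONS = 3  # hard limit so the agent cannot loop forever


def agent_loop(question, retrieve, grade, reformulate, answer):
    """Plan → retrieve → evaluate → decide, with a hard iteration cap."""
    query = question
    for iteration in range(1, MAX_ITERATIONS + 1):
        docs = retrieve(query)  # step 2: search
        # step 3: keep only documents above the relevance threshold
        relevant = [d for d in docs if grade(question, d) > RELEVANCE_THRESHOLD]
        if relevant:  # step 4: decide
            return answer(question, relevant)
        query = reformulate(question, query)  # insufficient → new query
    return "No data in the knowledge base."  # honest exit after 3 tries
```

With toy stand-ins for the helpers, a relevant hit answers on the first iteration, and an empty knowledge base falls through to the honest "no data" exit instead of looping forever.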
3
Document chunking decides everything before the agent even runs
You can build a perfect agent — and it will still fail if documents are chunked wrong. The information is in the database, but split so that search cannot find it. This is the most common and most ignored cause of poor RAG.
There is no universal strategy. Fixed-size breaks semantic units. Semantic chunking is more precise but expensive. Hierarchical (parent-child) is great for long documents but harder to implement. The choice depends on your documents and queries — and this decision should come before writing agent code.
| Strategy | Best for | Weakness |
|---|---|---|
| Fixed-size | Uniform text (logs, articles) | Breaks semantic units |
| By paragraph | Structured documents | Loses cross-section context |
| Hierarchical (parent-child) | Long documents, sections | More complex to implement |
| Semantic | Heterogeneous content | Slower and more expensive |
The "small-to-big" strategy: index small chunks for precise retrieval, but return the full parent block to the agent. Retrieval precision + sufficient context.
4
An agent that can't say 'I don't know' is dangerous
The most dangerous RAG mode is a confident answer based on irrelevant documents. A classic pipeline always returns something, even if it's completely off-topic.
The solution is an explicit grading step before answering. A separate LLM call: 'Does this document answer the question? Yes / No / Partially'. Based on the aggregate, the agent decides: answer, search more, or honestly say 'no data'. Two signals to distinguish: 'found irrelevant content' (reformulate query) vs 'found too little' (expand search). Different problems — different actions.
for each retrieved document:
  question: "Does this document answer the question?"
  answer: yes | partially | no
  no → discard
  yes/partially → keep
if ≥ 2 relevant documents remain → enough to answer
if 0 remain → reformulate the query and search again
Don't merge relevance grading and completeness check into one prompt. Relevance: 'is this on-topic?' Completeness: 'is this enough to answer?' Two tasks — two calls.
5
Enough for a full answer — or only partially?
Relevance grading and completeness assessment are different things. Three documents may be on-topic, but if the key fact is missing — the answer will be incomplete. Self-reflection is the final check before generation: the agent evaluates the full picture, not individual pieces.
If after 2-3 iterations the agent hasn't gathered enough — it's better to answer partially or honestly say 'no data' than to hallucinate.
Context is sufficient when...
≥2 relevant documents on the topic
Documents cover all parts of the question
No contradictions between sources
Only one document was found, but it directly answers the question
After 3 iterations nothing useful — exit with 'I don't know'
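The checklist above can be collapsed into one self-reflection predicate. A sketch under stated assumptions: `covers_all_parts`, `directly_answers`, and `no_contradictions` stand in for LLM judgments that a real system would obtain with separate calls; the function name is illustrative.

```python
def is_sufficient(relevant_docs, covers_all_parts, directly_answers,
                  no_contradictions, iteration, max_iterations=3):
    """Final check before generation: is the gathered context enough?"""
    if iteration > max_iterations:
        return False  # budget exhausted → exit with "I don't know"
    if len(relevant_docs) == 1:
        return directly_answers  # one doc can suffice if it answers directly
    # the general case: ≥2 relevant docs, full coverage, consistent sources
    return len(relevant_docs) >= 2 and covers_all_parts and no_contradictions
```

The iteration cap comes first on purpose: no amount of coverage rescues an agent that has already burned its search budget.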
6
If you can't explain the answer from logs — observability is missing
A black-box agent is not a feature, it's technical debt. Bad answer? You need to understand: was the problem in retrieval, grading, or the final prompt? Without traces it's reading tea leaves.
LangGraph helps: each graph node is a logging point. Save at every step: original question, sub-queries, retrieved documents with grades, the agent's decision and reasoning. Add metadata to the final answer — how many iterations, how many documents were discarded. After a week of real queries, this data will show where the system breaks most often.
at every step, save:
  iteration_number, decision, reason,
  documents_found, documents_relevant
in the final answer, include:
  how many iterations, which sub-queries,
  how many documents were discarded — an "X-ray" for debugging
Store traces in a database, not just logs. After a week you will see patterns: which questions need more iterations, where the agent fails most often. This is data for improving prompts and chunking.
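A minimal sketch of such a trace store, assuming the standard-library `sqlite3` as the database (any database works; the table and field names are illustrative, mirroring the fields listed above).

```python
import sqlite3


def init_trace_db(path=":memory:"):
    """Create the trace table; one row per agent decision step."""
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS traces (
        question TEXT, iteration INTEGER, decision TEXT, reason TEXT,
        docs_found INTEGER, docs_relevant INTEGER)""")
    return conn


def log_step(conn, question, iteration, decision, reason,
             docs_found, docs_relevant):
    """Record one step of the loop so any answer can be reconstructed later."""
    conn.execute("INSERT INTO traces VALUES (?, ?, ?, ?, ?, ?)",
                 (question, iteration, decision, reason,
                  docs_found, docs_relevant))
    conn.commit()
```

With LangGraph, the natural place to call `log_step` is inside each graph node, since every node already has the full state in hand.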
Result
A RAG agent on LangGraph with a plan → retrieve → evaluate → decide loop. It can reformulate queries, grade relevance and completeness separately, honestly say 'I don't know', and explain every decision through traces.