AI Document Processing Pipeline: Classify → Extract → Route

Every company drowns in documents: invoices, contracts, requests, letters. AI can read them, but reliable results come from a pipeline rather than one magic button: first understand what the document is, then extract the needed data, then decide what to do with it. This recipe shows how to build such a pipeline in n8n with the Claude API.

Beginner · Automation · 20 min · Claude API, n8n, Google Sheets

A pipeline beats a generalist — here is why

A single prompt that says "parse this document and decide what to do" mixes three tasks: identify the type, extract data, choose a route. If anything breaks, you cannot tell where. A pipeline separates concerns: each step does one thing and passes the result forward. You can eyeball intermediate output, localize an error in a minute, and swap out one step without touching the rest.

Incoming document → Classification → (type known) → Data extraction → Validation → (data complete) → Routing

A good sign of correct decomposition: the intermediate result can be printed and shown to an accountant. If the classification result makes sense to a human without context — you are on the right track.
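The separation of concerns described above can be sketched in Python. This is a minimal illustration, not the n8n workflow itself: `run_pipeline`, `classify`, and `extract` are hypothetical names, and the two steps are stand-ins for the model calls. The point is only that each step's output is kept in a trace that a person can read.

```python
from typing import Any, Callable

Step = Callable[[dict], dict]


def run_pipeline(text: str, steps: list[tuple[str, Step]]) -> dict[str, Any]:
    """Run named steps in order, keeping each step's output for inspection."""
    state: dict[str, Any] = {"text": text}
    trace: list[tuple[str, dict]] = []
    for name, step in steps:
        output = step(state)          # each step sees everything produced so far
        trace.append((name, output))  # the intermediate result a human can read
        state = {**state, **output}
    return {"state": state, "trace": trace}


# Stand-in steps (hypothetical): a real workflow would call the model here.
def classify(state: dict) -> dict:
    return {"type": "invoice" if "invoice" in state["text"].lower() else "unknown"}


def extract(state: dict) -> dict:
    # Only the field set depends on the type; values stay null in this stub.
    fields = {"invoice": ["amount", "currency", "vendor", "date"]}.get(state["type"], [])
    return {"fields": {name: None for name in fields}}


result = run_pipeline(
    "Invoice No. 17 from Acme",
    [("classify", classify), ("extract", extract)],
)
for name, output in result["trace"]:
    print(name, output)
```

Because every step is a separate function, one of them can be replaced or tested alone without touching the others.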

Classifier: one word — no explanations allowed

Classification is the foundation of the whole pipeline: it determines which extraction prompt to use next. The principle is simple: the classifier returns exactly one word from a closed list — invoice, contract, request, unknown. No summaries, no justifications. This lets n8n switch on the value and branch without any parsing. Add one example per type directly in the prompt — that alone gives +20-30% accuracy for free.

❌ One prompt for everything

Mixes classification and extraction
Cannot debug independently
Hard to tell where error occurred

✅ Prompt pipeline

Classifier → document type only
Extractor → fields for that type only
Each step tested independently

classifier → ONE word from the list:
  invoice | contract | request | unknown

rules for determining the type:
  invoice: invoice, acceptance certificate, or waybill with an amount
  contract: contract, agreement, offer
  request: application, inquiry, request
  unknown: everything else

no explanations, only the type → switch on the value
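One way to enforce the closed list on the receiving side is sketched below. The prompt wording and the names `CLASSIFIER_PROMPT`, `build_classifier_prompt`, and `normalize_label` are assumptions for illustration; the only idea taken from the text is that anything outside the four allowed words is treated as `unknown`.

```python
ALLOWED_TYPES = ("invoice", "contract", "request", "unknown")

# Hypothetical prompt wording; adapt it and add one example per type.
CLASSIFIER_PROMPT = """Classify the document below.
Reply with exactly one word from this list: invoice, contract, request, unknown.
No explanations.

Document:
{document}"""


def build_classifier_prompt(document_text: str) -> str:
    """Fill the document text into the classifier prompt."""
    return CLASSIFIER_PROMPT.format(document=document_text)


def normalize_label(raw_reply: str) -> str:
    """Map the model's reply onto the closed list; anything else becomes 'unknown'."""
    label = raw_reply.strip().strip(".\"'").lower()
    return label if label in ALLOWED_TYPES else "unknown"


print(normalize_label(" Invoice.\n"))
print(normalize_label("This looks like a contract"))
```

A reply that contains an explanation does not match any allowed word, so it falls through to `unknown` and can be escalated instead of silently misrouted.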

Data extraction: JSON schema and honest null

You know the document type, so now you need to extract specific fields. For an invoice: amount, date, vendor. For a contract: parties, subject, term. This is exactly why classification comes first: without the type, you cannot write a precise extraction prompt.

The key to reliability is demanding JSON with a strict schema. Not "tell me about the invoice" but "return JSON with fields amount, currency, vendor, due_date". The critical rule: if a field is not found, return null and do not invent a value. Models tend to hallucinate missing data, and an explicit instruction "null, do not make things up" is the simplest defense.

In n8n the JSON result is immediately available as an object: you access fields directly in subsequent nodes, no regex or string parsing needed. This is another reason to demand a strict schema: automation breaks if the format drifts.

extractor for type "invoice":
  INPUT: document text
  OUTPUT: { amount, currency, vendor, date, number }
  RULE: if a field is not found → null, do not invent

each type gets its own set of fields:
  invoice → amount, currency, vendor, date
  contract → parties, subject, term
  request → subject, author, urgency
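The "strict schema, honest null" rule can also be checked after the model replies. The sketch below assumes the reply is a JSON string and uses the invoice field names from the block above; `parse_extraction` is a hypothetical helper, not part of n8n or the Claude API.

```python
import json

INVOICE_FIELDS = ("amount", "currency", "vendor", "date", "number")


def parse_extraction(raw_reply: str, fields=INVOICE_FIELDS) -> dict:
    """Parse the model's JSON reply and force it onto the expected schema."""
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    # Missing keys become None (JSON null); unexpected keys are dropped.
    return {name: data.get(name) for name in fields}


reply = '{"amount": 1200.5, "currency": "EUR", "vendor": null, "note": "extra"}'
print(parse_extraction(reply))
```

Every downstream step then sees the same keys regardless of what the model returned, so format drift shows up as nulls rather than as a broken workflow.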

Ask the model to add a confidence field (0-1) for each value. If any confidence is below 0.7, the document goes to manual review. This is a cheap filter for edge cases.
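A minimal sketch of that confidence filter follows. It assumes each extracted field arrives as an object with `value` and `confidence` keys; that shape and the name `low_confidence_fields` are illustrative choices, since the text does not fix a format.

```python
REVIEW_THRESHOLD = 0.7  # threshold taken from the text above


def low_confidence_fields(extraction: dict, threshold: float = REVIEW_THRESHOLD) -> list:
    """Return the names of fields whose confidence is missing or below the threshold."""
    flagged = []
    for name, entry in extraction.items():
        confidence = entry.get("confidence") if isinstance(entry, dict) else None
        if not isinstance(confidence, (int, float)) or confidence < threshold:
            flagged.append(name)
    return flagged


# Made-up sample data for illustration only.
sample = {
    "amount": {"value": 1200.5, "confidence": 0.95},
    "vendor": {"value": "Acme", "confidence": 0.55},
    "date": {"value": None},
}
flagged = low_confidence_fields(sample)
print("manual review" if flagged else "auto", flagged)
```

A field with no confidence at all is treated as low confidence here, which errs on the side of sending the document to a person.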

Do not remove the human — remove the boredom

Full automation from day one is a sure way to lose trust in the system after the first mistake. The rule: auto-process what AI gets right in 95%+ of cases. Everything else goes to human review. At the start this will be less than half of the documents; after a month, most of them. Escalation signals: low model confidence, unknown type, missing critical fields. Escalate with a Slack notification or an email that links to the document.

Auto-process or escalate?

Auto-process when:
Document type recognized with confidence (> 0.8)
All required fields extracted (not null)
Amount within expected range

Escalate when:
Type is unknown or confidence < 0.8
Critical fields are null (amount, date, counterparty)

First 2 weeks: log everything and spot-check
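The checklist above can be collapsed into one function that returns the reasons for escalation. This is a sketch under assumptions: the default amount range is a placeholder, `type_confidence` is taken to be supplied by an upstream step, and `escalation_reasons` is a hypothetical name.

```python
TYPE_CONFIDENCE_THRESHOLD = 0.8
CRITICAL_FIELDS = ("amount", "date", "counterparty")


def escalation_reasons(doc_type, type_confidence, fields,
                       amount_range=(0.0, 1_000_000.0),
                       critical_fields=CRITICAL_FIELDS):
    """Return the reasons to escalate; an empty list means auto-process."""
    reasons = []
    if doc_type == "unknown":
        reasons.append("unknown type")
    # The checklist leaves exactly 0.8 unspecified; this sketch escalates it to be safe.
    if type_confidence <= TYPE_CONFIDENCE_THRESHOLD:
        reasons.append("low type confidence")
    for name in critical_fields:
        if fields.get(name) is None:
            reasons.append(f"missing {name}")
    amount = fields.get("amount")
    low, high = amount_range  # assumed range; set it from your own documents
    if isinstance(amount, (int, float)) and not low <= amount <= high:
        reasons.append("amount out of range")
    return reasons


# Made-up sample data for illustration only.
ok = {"amount": 1200.0, "date": "2024-05-01", "counterparty": "Acme"}
print(escalation_reasons("invoice", 0.93, ok))
print(escalation_reasons("unknown", 0.4, {**ok, "amount": None}))
```

Returning reasons rather than a bare boolean gives the reviewer the cause of the escalation and makes the escalation log easier to analyze later.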

Collect all escalation cases in a separate Google Sheets tab. After two weeks, look at patterns: if 60% of escalations are one document type, the extractor for it needs refinement.
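The pattern check described above amounts to counting escalations per document type. A small sketch, assuming the escalation tab has been exported as a list of rows with a `type` column (the row shape and the function name are assumptions):

```python
from collections import Counter


def escalation_share_by_type(escalations: list) -> dict:
    """Return each document type's share of all escalations, largest first."""
    counts = Counter(row["type"] for row in escalations)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {doc_type: count / total for doc_type, count in counts.most_common()}


# Made-up sample rows for illustration only.
rows = [
    {"type": "invoice"}, {"type": "invoice"}, {"type": "invoice"},
    {"type": "contract"}, {"type": "unknown"},
]
for doc_type, share in escalation_share_by_type(rows).items():
    print(f"{doc_type}: {share:.0%}")
```

In this sample, invoices account for three of five escalations, which is the kind of signal that points at the invoice extractor.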

Google Sheets is not a crutch — it is an MVP

The final step is putting data where people can see it. Google Sheets works perfectly as the first storage layer: errors are obvious without special tools, the accountant reviews right in the browser. Routing in n8n is a Switch node by document type: invoices to one sheet, contracts to another. Add a status column (auto / escalated / processed) — after a month you will have real statistics to justify integration with 1C or a CRM.

Integrations (1C, CRM, ERP): next step after MVP
Google Sheets (MVP): visible immediately, easy to review
n8n Switch → route by type: invoice / contract / request / escalate
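The routing rule itself is small enough to write out. The sketch below mirrors what the Switch node does in the text: it picks a target sheet by document type and sets the status column. The sheet names and the `route` function are hypothetical.

```python
# Hypothetical sheet names; use whatever tabs exist in your spreadsheet.
SHEET_BY_TYPE = {"invoice": "Invoices", "contract": "Contracts", "request": "Requests"}
ESCALATION_SHEET = "Escalations"


def route(doc_type: str, escalate: bool, fields: dict) -> dict:
    """Pick the target sheet and build the row, including the status column."""
    if escalate or doc_type not in SHEET_BY_TYPE:
        sheet, status = ESCALATION_SHEET, "escalated"
    else:
        sheet, status = SHEET_BY_TYPE[doc_type], "auto"
    return {"sheet": sheet, "row": {**fields, "type": doc_type, "status": status}}


print(route("invoice", False, {"amount": 1200.5, "vendor": "Acme"}))
print(route("unknown", False, {}))
```

The third status from the text, `processed`, is set later by the person who reviews the row, so it does not appear in this function.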

Do not start with production system integration. For the first two weeks, Google Sheets will give you the data you need to calibrate prompts. Integrating a broken pipeline into a CRM multiplies problems across the entire business.

Result

A working n8n pipeline: incoming document → Claude classification → structured data extraction → automatic routing or human escalation → Google Sheets record. First documents pass through within a day of setup.
