AI Document Processing Pipeline: Classify → Extract → Route
Every company drowns in documents: invoices, contracts, requests, letters. AI can read them — but for this to work reliably, you need not one magic button, but a pipeline: first understand what the document is, then extract the needed data, then decide what to do with it. We break down how to build such a pipeline in n8n with the Claude API.
Beginner | Automation | 20 min | Claude API, n8n, Google Sheets
1
A pipeline beats a generalist — here is why
A single prompt that says "parse this document and decide what to do" mixes three tasks: identify the type, extract data, choose a route. If anything breaks, you cannot tell where. A pipeline separates concerns: each step does one thing and passes the result forward. You can eyeball intermediate output, localize an error in a minute, and swap out one step without touching the rest.
Incoming document → Classification → (type known) → Data extraction → Validation → (data complete) → Routing
A good sign of correct decomposition: the intermediate result can be printed and shown to an accountant. If the classification result makes sense to a human without context — you are on the right track.
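The separation of steps can be sketched in a few lines of Python. This is only an illustration: `classify`, `extract`, and `route` are hypothetical stand-ins for the Claude calls and n8n nodes, and their bodies are hard-coded placeholders. The point is that every intermediate result is kept and can be printed.

```python
# Hypothetical stand-ins for the real steps; in the article these would be
# Claude API calls made from n8n nodes.
def classify(text: str) -> str:
    return "invoice" if "invoice" in text.lower() else "unknown"


def extract(text: str, doc_type: str) -> dict:
    # A real extractor would prompt the model with a per-type schema.
    return {"amount": 1200.0, "currency": "EUR"} if doc_type == "invoice" else {}


def route(doc_type: str, fields: dict) -> str:
    return "escalate" if doc_type == "unknown" or not fields else f"sheet:{doc_type}"


def run_pipeline(text: str) -> list[tuple[str, object]]:
    """Run the steps in order and keep every intermediate result."""
    trace: list[tuple[str, object]] = []
    doc_type = classify(text)
    trace.append(("classification", doc_type))
    fields = extract(text, doc_type)
    trace.append(("extraction", fields))
    trace.append(("routing", route(doc_type, fields)))
    return trace


for step, result in run_pipeline("Invoice #17, total 1200 EUR"):
    print(f"{step}: {result}")
```

Because each step writes to the trace, a wrong route can be traced back to the step that produced the bad input.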
2
Classifier: one word — no explanations allowed
Classification is the foundation of the whole pipeline: it determines which extraction prompt to use next. The principle is simple: the classifier returns exactly one word from a closed list — invoice, contract, request, unknown. No summaries, no justifications. This lets n8n switch on the value and branch without any parsing. Add one example per type directly in the prompt — that alone gives +20-30% accuracy for free.
❌ One prompt for everything
- Mixes classification and extraction
- Cannot debug independently
- Hard to tell where the error occurred
✅ Prompt pipeline
- Classifier → document type only
- Extractor → fields for that type only
- Each step tested independently
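A minimal sketch of the one-word classifier contract, assuming the prompt is assembled in code and the model's reply arrives as a plain string. The example texts and the helper names `build_classifier_prompt` and `normalize_label` are invented for illustration; the actual API call is left out.

```python
ALLOWED_TYPES = ("invoice", "contract", "request", "unknown")

# One illustrative example per type; the wording is invented for this sketch.
EXAMPLES = {
    "invoice": "Invoice No. 104, total due 1,200 EUR",
    "contract": "This agreement is made between Party A and Party B",
    "request": "Please send us a quote for 50 units",
    "unknown": "Happy holidays from the whole team!",
}


def build_classifier_prompt(document_text: str) -> str:
    """Assemble a prompt that asks for exactly one word from the closed list."""
    lines = [
        "Classify the document. Answer with exactly one word from this list:",
        " | ".join(ALLOWED_TYPES),
        "No explanations. Examples:",
    ]
    lines += [f'"{text}" -> {label}' for label, text in EXAMPLES.items()]
    lines += ["Document:", document_text]
    return "\n".join(lines)


def normalize_label(raw_reply: str) -> str:
    """Map the model's reply onto the closed list; anything else is 'unknown'."""
    label = raw_reply.strip().strip(".\"'").lower()
    return label if label in ALLOWED_TYPES else "unknown"


print(normalize_label(" Invoice.\n"))                  # invoice
print(normalize_label("This looks like a contract"))   # unknown
```

Normalizing the reply before branching means that a chatty or malformed answer falls into `unknown` instead of breaking the switch.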
classifier → ONE word from the list:
invoice | contract | request | unknown
rules for determining the type:
invoice: invoice, completion act, waybill with an amount
contract: contract, agreement, offer
request: application, inquiry, request
unknown: everything else
no explanations, type only → switch on the value
3
Data extraction: JSON schema and honest null
You know the document type — now you need to extract specific fields. For an invoice: amount, date, vendor. For a contract: parties, subject, term. This is exactly why classification comes first: without the type, you cannot write a precise extraction prompt.
The key to reliability is demanding JSON with a strict schema. Not "tell me about the invoice" but "return JSON with fields amount, currency, vendor, due_date". The critical rule: if a field is not found, return null and do not invent a value. Models tend to hallucinate missing data, and an explicit instruction "null, do not make things up" is the first line of defense.
In n8n the JSON result is immediately available as an object: you access fields directly in subsequent nodes, no regex or string parsing needed. This is another reason to demand a strict schema — automation breaks if the format drifts.
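Outside n8n, the same strict-schema idea can be sketched as a small validation step. This assumes the model's reply is a JSON string; the function name `parse_extraction` is hypothetical, and the field list follows the invoice example above.

```python
import json

# Field names follow the article's invoice example.
INVOICE_FIELDS = ("amount", "currency", "vendor", "due_date")


def parse_extraction(raw_reply: str, fields=INVOICE_FIELDS) -> dict:
    """Parse the model's JSON reply and force it into the expected schema.

    Missing fields become None (JSON null); unexpected keys are dropped.
    Raises ValueError if the reply is not a JSON object.
    """
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    return {name: data.get(name) for name in fields}


reply = '{"amount": 1200.5, "currency": "EUR", "vendor": null, "note": "extra"}'
print(parse_extraction(reply))
# {'amount': 1200.5, 'currency': 'EUR', 'vendor': None, 'due_date': None}
```

Downstream steps then always see the same keys, so a drifting format fails loudly at this step instead of silently later.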
extractor for type "invoice":
INPUT: document text
OUTPUT: { amount, currency, vendor, date, number }
RULE: if a field is not found → null, do not invent
each type has its own set of fields:
invoice → amount, currency, vendor, date
contract → parties, subject, term
request → subject, author, urgency
Ask the model to add a confidence field (0-1) for each value. If confidence < 0.7, the document goes to manual review. A cheap filter for edge cases.
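The confidence filter might look like the sketch below. The reply shape, one `{"value": ..., "confidence": ...}` object per field, is an assumption made for this example, as is the helper name `low_confidence_fields`; only the 0.7 threshold comes from the text.

```python
REVIEW_THRESHOLD = 0.7  # threshold taken from the article's rule of thumb


def low_confidence_fields(extracted: dict, threshold: float = REVIEW_THRESHOLD) -> list[str]:
    """Return names of fields whose confidence is missing or below the threshold."""
    flagged = []
    for name, item in extracted.items():
        confidence = item.get("confidence")
        if not isinstance(confidence, (int, float)) or confidence < threshold:
            flagged.append(name)
    return flagged


# Assumed reply shape: each field carries its value and a 0-1 confidence.
extracted = {
    "amount": {"value": 1200.5, "confidence": 0.96},
    "vendor": {"value": "ACME GmbH", "confidence": 0.55},
    "due_date": {"value": None, "confidence": 0.9},
}
flagged = low_confidence_fields(extracted)
print("manual review" if flagged else "ok", flagged)
# manual review ['vendor']
```

A missing confidence is treated as low on purpose: if the model skipped the field, a human should look.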
4
Do not remove the human — remove the boredom
Full automation from day one is a sure way to lose trust in the system after the first mistake. The rule: auto-process only what the AI gets right in at least 95% of cases, and send everything else to human review. At the start, auto-processing will cover less than half of the documents; after a month, most of them. Escalation signals: low model confidence, unknown type, missing critical fields. Escalate with a Slack notification or an email that links to the document.
Auto-process or escalate?
✅ Auto-process
- Document type recognized with confidence (> 0.8)
- All required fields extracted (not null)
- Amount within expected range
❌ Escalate
- Type is unknown or confidence < 0.8
- Critical fields are null (amount, date, counterparty)
First 2 weeks: log everything and spot-check
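The checklist above can be expressed as one decision function. This is a sketch under assumptions: the `Decision` type, the function name `decide`, and the default amount range are invented for illustration, and a confidence of exactly 0.8 is treated as passing because the text does not say which way it goes.

```python
from dataclasses import dataclass, field

# Critical fields as listed in the checklist.
CRITICAL_FIELDS = ("amount", "date", "counterparty")


@dataclass
class Decision:
    action: str  # "auto" or "escalate"
    reasons: list[str] = field(default_factory=list)


def decide(doc_type: str, type_confidence: float, fields: dict,
           amount_range: tuple[float, float] = (0.0, 100_000.0)) -> Decision:
    """Escalate if any signal fires; otherwise auto-process."""
    reasons = []
    if doc_type == "unknown":
        reasons.append("unknown type")
    if type_confidence < 0.8:
        reasons.append("low type confidence")
    missing = [name for name in CRITICAL_FIELDS if fields.get(name) is None]
    if missing:
        reasons.append("missing: " + ", ".join(missing))
    amount = fields.get("amount")
    # The expected range is an assumed placeholder; tune it to your documents.
    if amount is not None and not (amount_range[0] <= amount <= amount_range[1]):
        reasons.append("amount out of range")
    return Decision("escalate" if reasons else "auto", reasons)


complete = {"amount": 1200.5, "date": "2024-05-01", "counterparty": "ACME"}
print(decide("invoice", 0.93, complete).action)                       # auto
print(decide("invoice", 0.93, {**complete, "date": None}).reasons)    # ['missing: date']
```

Returning the reasons alongside the action gives the reviewer the "why" in the Slack or email notification.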
Collect all escalation cases in a separate Google Sheets tab. After two weeks, look at patterns: if 60% of escalations are one document type, the extractor for it needs refinement.
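Once the escalation log exists, the pattern check is a short count. The sketch assumes the log has been exported as a list of rows with `type` and `status` keys; those column names and the function name are assumptions, and the sample rows are made up.

```python
from collections import Counter


def escalation_share_by_type(rows: list[dict]) -> dict[str, float]:
    """Share of escalated documents per document type, largest first."""
    escalated = [row["type"] for row in rows if row["status"] == "escalated"]
    if not escalated:
        return {}
    counts = Counter(escalated)
    return {doc_type: n / len(escalated) for doc_type, n in counts.most_common()}


# Made-up sample of logged rows.
rows = [
    {"type": "invoice", "status": "escalated"},
    {"type": "invoice", "status": "escalated"},
    {"type": "invoice", "status": "escalated"},
    {"type": "contract", "status": "escalated"},
    {"type": "request", "status": "escalated"},
    {"type": "invoice", "status": "auto"},
]
for doc_type, share in escalation_share_by_type(rows).items():
    print(f"{doc_type}: {share:.0%}")
# invoice: 60%
# contract: 20%
# request: 20%
```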
5
Google Sheets is not a crutch — it is an MVP
The final step is putting data where people can see it. Google Sheets works perfectly as the first storage layer: errors are obvious without special tools, the accountant reviews right in the browser. Routing in n8n is a Switch node by document type: invoices to one sheet, contracts to another. Add a status column (auto / escalated / processed) — after a month you will have real statistics to justify integration with 1C or a CRM.
Integrations (1C, CRM, ERP): next step after MVP
Google Sheets (MVP): visible immediately, easy to review
n8n Switch → route by type: invoice / contract / request / escalate
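The routing logic behind the Switch node is simple enough to sketch in code. The tab names and the function name `route_to_sheet` are hypothetical; in n8n the same mapping would live in the Switch node's rules, with each branch leading to its own Google Sheets node.

```python
# Hypothetical tab names; use whatever your spreadsheet actually has.
SHEET_BY_TYPE = {
    "invoice": "Invoices",
    "contract": "Contracts",
    "request": "Requests",
}
ESCALATION_SHEET = "Escalations"


def route_to_sheet(doc_type: str, fields: dict, escalate: bool = False) -> tuple[str, dict]:
    """Pick the destination tab and build the row, including a status column."""
    if escalate or doc_type not in SHEET_BY_TYPE:
        return ESCALATION_SHEET, {"type": doc_type, **fields, "status": "escalated"}
    return SHEET_BY_TYPE[doc_type], {"type": doc_type, **fields, "status": "auto"}


sheet, row = route_to_sheet("invoice", {"amount": 1200.5, "vendor": "ACME"})
print(sheet, row["status"])                    # Invoices auto
print(route_to_sheet("unknown", {})[0])        # Escalations
```

The third status from the text, `processed`, would be set later by the person who reviews the row.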
Do not start with production system integration. For the first two weeks, Google Sheets will give you the data you need to calibrate prompts. Integrating a broken pipeline into a CRM multiplies problems across the entire business.
Result
A working n8n pipeline: incoming document → Claude classification → structured data extraction → automatic routing or human escalation → Google Sheets record. First documents pass through within a day of setup.