Structured Output
JSON, schemas & validated responses
The Problem: LLMs produce beautiful prose, but your code needs JSON. Free-text output breaks parsers, varies in format, and requires fragile regex to extract data. How do you get reliable, machine-readable output?
The Solution: Taming Free Text into Structured Data
Structured output is a way of constraining an LLM's response so it follows a specific schema — a JSON object, a typed set of function parameters, or any validated data structure — instead of producing free-form prose. Think of it as giving the model a form to fill out rather than a blank page. A normal prompt might come back as "Sure! The customer's name is Anna and her order total was about $42." — perfectly readable for a human, but a nightmare for your code. Structured output forces that same answer into something like { "name": "Anna", "total": 42.0 }, every single time.
How it works
Under the hood there are two main mechanisms. JSON mode tells the model that its entire reply must be a single valid JSON document, which removes the "helpful" chatter around the data. Function calling (also called tool calling) goes a step further: you describe a function and its parameters as a schema, and the API constrains generation so the model can only emit arguments that match the field names and types you defined. Many providers implement this with constrained decoding — at each step the model is only allowed to pick tokens that keep the output syntactically valid against the schema, so a malformed response becomes structurally impossible rather than merely unlikely.
When to use it — and the catch
Reach for structured output whenever an LLM's answer feeds directly into other code: extraction pipelines, API responses, database writes, agent tool calls, or config generation. The big payoff is that you can drop the fragile regex and string-parsing and trust the shape of the data. The crucial pitfall: valid JSON is not the same as correct data. A schema guarantees the shape (a field called total that is a number) but never the meaning — the model can still hallucinate a wrong number or invert a boolean. For example, ask for an invoice as { "amount": number, "currency": string, "paid": boolean } and you might get perfectly valid JSON where amount is off by a digit or paid is flipped. Always validate values with a tool like Pydantic or Zod, not just the schema, and on failure feed the error back to the model for one or two retries.
Think of it like filling out a form instead of writing an essay:
- 1. Define output schema: Write a JSON Schema, Pydantic model, or TypeScript interface that describes the exact shape
- 2. Enable JSON mode or function calling: Pass the schema to the API; the model is now constrained to output matching that structure
- 3. LLM generates schema-compliant output: Model fills in fields like a form — field names, types, and nesting are guaranteed
- 4. Parse and validate against schema: Deserialise and run Pydantic/Zod validation to catch semantic errors, not just format errors
- 5. Retry with error feedback if invalid: On validation failure, feed the error message back to the model for self-correction (max 2 retries)
Where Is This Used?
- API Response Formatting: Guaranteed JSON responses that parsers can consume without error handling for format issues
- Data Extraction Pipelines: Pulling structured fields (name, date, amount) from unstructured documents at scale
- Form Filling Automation: Converting free-text intake forms or emails into validated database records
- Configuration Generation: Turning natural language feature descriptions into typed config objects or infrastructure-as-code
- Common Pitfall: Semantically Wrong Values: Even with JSON mode, LLMs may produce valid JSON with wrong values (hallucinated dates, inverted booleans) — always validate values, not just schema shape, with Pydantic or Zod
Fun Fact: Without JSON mode, LLMs add "helpful" text around JSON about 30% of the time: "Here's the JSON you requested: {...}". With JSON mode, this drops to 0%. Function calling goes further — it guarantees not just valid JSON, but valid JSON matching your exact schema. The difference in downstream parsing errors: 30% → 0.5%.
Try It Yourself!
Use the interactive demo below to see the difference between free-text and structured output, and build your own schemas to extract data reliably.
Structured Output Explorer
InteractiveInput text
Hi, I'm Sarah Chen from Acme Corp. You can reach me at sarah@acme.com or call 555-0123. I'm based in San Francisco and I'd love to discuss the Q3 partnership proposal.
The sender is Sarah Chen who works at Acme Corp. Her email is sarah@acme.com and her phone number is 555-0123. She's located in San Francisco and wants to discuss a Q3 partnership proposal.
- • JSON mode eliminates 30% of responses that wrap JSON in "helpful" prose.
- • Function calling guarantees YOUR schema compliance — fields, types, required/optional.
- • Always validate + retry: most format errors resolve in 1 retry with error feedback.
Frequently asked questions
What is JSON mode and when should I use it?
JSON mode forces the LLM to output valid JSON. Use it whenever you need machine-readable output — API responses, data extraction, form filling. Most providers support it via a parameter (response_format: json_object).
How does function calling help with structured output?
Function calling lets you define an exact schema (parameters with types, required fields, enums). The LLM fills in the schema rather than generating free text. This guarantees format compliance and enables type-safe integration.
What if the LLM returns invalid structured output?
Implement validation + retry: parse the output, validate against your schema, and if invalid, send the error message back to the LLM asking it to fix. Most failures resolve in 1 retry. Set a max retry limit (2–3).
Can I get nested or complex structures from LLMs?
Yes. Define nested schemas clearly with examples. For very complex structures, break into multiple calls — extract top-level fields first, then details for each. Pydantic models are excellent for defining and validating complex schemas.
Try it yourself
Interactive demo of this technique
Extract contact info as JSON from an email signature
Name: John Smith Title: Marketing Director Email: john@techcorp.com Phone: +1 (555) 123-4567
{ "name": "John Smith", "title": "Marketing Director", "company": "TechCorp LLC", "email": "john@techcorp.com", "phone": "+1 (555) 123-4567", "website": "https://techcorp.com" }
For structured data extraction, give the model an exact JSON schema with types and a null rule for absent fields — this makes the output programmatically parseable without post-processing.
Create a free account to solve challenges
1 AI-verified challenges for this lesson
This lesson is part of a structured LLM course.
My Learning Path