Structured Output

JSON, schemas & validated responses

The Problem: LLMs produce beautiful prose, but your code needs JSON. Free-text output breaks parsers, varies in format, and requires fragile regex to extract data. How do you get reliable, machine-readable output?

The Solution: Taming Free Text into Structured Data

Structured output is a way of constraining an LLM's response so it follows a specific schema — a JSON object, a typed set of function parameters, or any validated data structure — instead of producing free-form prose. Think of it as giving the model a form to fill out rather than a blank page. A normal prompt might come back as "Sure! The customer's name is Anna and her order total was about $42." — perfectly readable for a human, but a nightmare for your code. Structured output forces that same answer into something like { "name": "Anna", "total": 42.0 }, every single time.

How it works

Under the hood there are two main mechanisms. JSON mode tells the model that its entire reply must be a single valid JSON document, which removes the "helpful" chatter around the data but does not by itself enforce any particular set of fields. Function calling (also called tool calling) goes a step further: you describe a function and its parameters as a schema, and the model returns arguments for that function using the field names and types you defined. Many providers enforce this with constrained decoding: at each step the model is only allowed to pick tokens that keep the output syntactically valid against the schema, so a malformed response becomes structurally impossible rather than merely unlikely. Where that enforcement is not enabled, schema compliance is likely but not guaranteed, which is one more reason to validate.
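To make the function-calling idea concrete, here is a minimal sketch of a parameter schema written as a JSON Schema dictionary, together with a small standard-library check that a set of arguments matches it. The tool name `record_order`, its fields, and the helper `matches_schema` are hypothetical and exist only for illustration. Each provider has its own request format for passing such a schema, and the checker covers only a small subset of JSON Schema.

```python
import json

# Hypothetical tool definition: the name and fields are illustrative only.
RECORD_ORDER_TOOL = {
    "name": "record_order",
    "description": "Store a customer's order total.",
    "parameters": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "total": {"type": "number"},
        },
        "required": ["name", "total"],
        "additionalProperties": False,
    },
}

# Map JSON Schema type names to Python types (a small subset).
_JSON_TYPES = {"string": str, "number": (int, float), "boolean": bool}


def matches_schema(arguments_json: str, schema: dict) -> bool:
    """Return True if the JSON text has only declared fields with the right types."""
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError:
        return False  # prose around the JSON, or malformed JSON
    if not isinstance(args, dict):
        return False
    props = schema["properties"]
    if set(schema["required"]) - set(args):
        return False  # a required field is missing
    if set(args) - set(props):
        return False  # an undeclared field is present
    for key, value in args.items():
        type_name = props[key]["type"]
        # bool is a subclass of int in Python, so reject it for "number".
        if isinstance(value, bool) and type_name != "boolean":
            return False
        if not isinstance(value, _JSON_TYPES[type_name]):
            return False
    return True
```

With constrained decoding the provider performs the equivalent of this check while generating, so a non-matching reply is never produced in the first place. A check like this is still useful as a safety net when enforcement is not available.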

When to use it — and the catch

Reach for structured output whenever an LLM's answer feeds directly into other code: extraction pipelines, API responses, database writes, agent tool calls, or config generation. The big payoff is that you can drop the fragile regex and string-parsing and trust the shape of the data. The crucial pitfall: valid JSON is not the same as correct data. A schema guarantees the shape (a field called total that is a number) but never the meaning — the model can still hallucinate a wrong number or invert a boolean. For example, ask for an invoice as { "amount": number, "currency": string, "paid": boolean } and you might get perfectly valid JSON where amount is off by a digit or paid is flipped. Always validate values with a tool like Pydantic or Zod, not just the schema, and on failure feed the error back to the model for one or two retries.
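The gap between valid shape and correct values can be sketched with the invoice example. The code below uses only the standard library to stay dependency-free; in practice Pydantic or Zod would play this role. The function `check_invoice`, the currency list, and the idea of cross-checking against an `expected_total` known from another source are assumptions made for illustration.

```python
import json
from dataclasses import dataclass

KNOWN_CURRENCIES = {"USD", "EUR", "GBP"}  # illustrative subset


@dataclass
class Invoice:
    amount: float
    currency: str
    paid: bool


def check_invoice(raw: str, expected_total: float) -> list[str]:
    """Parse an invoice and return a list of semantic problems (empty if none)."""
    data = json.loads(raw)
    invoice = Invoice(**data)  # raises TypeError on missing or unexpected keys
    problems = []
    if invoice.amount <= 0:
        problems.append("amount must be positive")
    if invoice.currency not in KNOWN_CURRENCIES:
        problems.append(f"unknown currency: {invoice.currency}")
    # Cross-check against a value known from elsewhere, e.g. summed line items.
    if abs(invoice.amount - expected_total) > 0.005:
        problems.append(
            f"amount {invoice.amount} does not match expected {expected_total}"
        )
    return problems
```

A reply such as `{"amount": 420.0, "currency": "USD", "paid": true}` passes any shape check, yet `check_invoice` flags it when the expected total is 42.0. A flipped `paid` flag can only be caught the same way, by comparing against an independent source of truth.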

In practice, the workflow has five steps:

1. Define output schema: Write a JSON Schema, Pydantic model, or TypeScript interface that describes the exact shape
2. Enable JSON mode or function calling: Pass the schema to the API; the model is now constrained to output matching that structure
3. LLM generates schema-compliant output: Model fills in fields like a form — field names, types, and nesting are guaranteed
4. Parse and validate against schema: Deserialise and run Pydantic/Zod validation to catch semantic errors, not just format errors
5. Retry with error feedback if invalid: On validation failure, feed the error message back to the model for self-correction (max 2 retries)
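The five steps above can be sketched as a small validate-and-retry loop. This is a sketch under stated assumptions, not a definitive implementation: `call_model` stands in for whatever client function sends a prompt to your provider and returns its text reply, and the two-field schema is hypothetical.

```python
import json
from typing import Callable

REQUIRED_FIELDS = {"name": str, "total": (int, float)}  # step 1: the schema
MAX_RETRIES = 2


def validate(raw: str) -> dict:
    """Step 4: parse, then check fields; raise ValueError with a readable message."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, expected in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(data[field], bool) or not isinstance(data[field], expected):
            raise ValueError(f"wrong type for field: {field}")
    return data


def extract(prompt: str, call_model: Callable[[str], str]) -> dict:
    """Steps 2 to 5: call the model, validate, and retry with error feedback."""
    current_prompt = prompt
    last_error = None
    for _ in range(MAX_RETRIES + 1):  # one initial attempt plus the retries
        raw = call_model(current_prompt)
        try:
            return validate(raw)
        except ValueError as err:
            last_error = err
            # Step 5: feed the validation error back for self-correction.
            current_prompt = (
                f"{prompt}\n\nYour previous reply was rejected: {err}\n"
                "Return corrected JSON only."
            )
    raise RuntimeError(f"gave up after {MAX_RETRIES} retries: {last_error}")
```

Passing the model call in as a parameter keeps the loop independent of any provider and makes it easy to test with a stub that returns canned replies.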

Where Is This Used?

API Response Formatting: Guaranteed JSON responses that parsers can consume without error handling for format issues
Data Extraction Pipelines: Pulling structured fields (name, date, amount) from unstructured documents at scale
Form Filling Automation: Converting free-text intake forms or emails into validated database records
Configuration Generation: Turning natural language feature descriptions into typed config objects or infrastructure-as-code
Common Pitfall: Semantically Wrong Values: Even with JSON mode, LLMs may produce valid JSON with wrong values (hallucinated dates, inverted booleans) — always validate values, not just schema shape, with Pydantic or Zod

Fun Fact: Without JSON mode, LLMs add "helpful" text around JSON about 30% of the time: "Here's the JSON you requested: {...}". With JSON mode, this drops to 0%. Function calling goes further — it guarantees not just valid JSON, but valid JSON matching your exact schema. The difference in downstream parsing errors: 30% → 0.5%.

Structured Output Explorer

The example below contrasts a free-text response with structured output for the same input.

Input text

Hi, I'm Sarah Chen from Acme Corp. You can reach me at sarah@acme.com or call 555-0123. I'm based in San Francisco and I'd love to discuss the Q3 partnership proposal.

Free Text Mode

The sender is Sarah Chen who works at Acme Corp. Her email is sarah@acme.com and her phone number is 555-0123. She's located in San Francisco and wants to discuss a Q3 partnership proposal.

How do you extract the email? Regex? String splitting?

Field order varies from response to response

Extra prose makes parsing fragile
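For comparison, a structured response to the same input might look like the JSON in the sketch below. The field names are assumptions chosen for illustration, not the output of any particular model. The code contrasts a regex that depends on the reply's wording with a plain key lookup.

```python
import json
import re

free_text = (
    "The sender is Sarah Chen who works at Acme Corp. Her email is "
    "sarah@acme.com and her phone number is 555-0123."
)

# Hypothetical structured response; the field names are illustrative.
structured = """
{
  "name": "Sarah Chen",
  "company": "Acme Corp",
  "email": "sarah@acme.com",
  "phone": "555-0123",
  "location": "San Francisco",
  "topic": "Q3 partnership proposal"
}
"""

# Free text: a pattern that depends on the exact wording of the reply.
EMAIL_PATTERN = re.compile(r"email is (\S+@\S+)")
match = EMAIL_PATTERN.search(free_text)
email_from_text = match.group(1) if match else None

# Structured output: a key lookup that does not depend on phrasing.
contact = json.loads(structured)
email_from_json = contact["email"]
```

Both approaches find the address here, but the regex silently returns nothing as soon as the model rephrases the sentence (for example "You can reach her at ..."), while the key lookup keeps working.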

Key Insight

• JSON mode eliminates the roughly 30% of responses that wrap JSON in "helpful" prose.
• Function calling enforces compliance with your schema: fields, types, required/optional.
• Always validate + retry: most format errors resolve in 1 retry with error feedback.

Frequently asked questions

What is JSON mode and when should I use it?

JSON mode forces the LLM to output valid JSON. Use it whenever you need machine-readable output: API responses, data extraction, form filling. Most providers expose it as a request parameter; in OpenAI's API, for example, it is response_format: { "type": "json_object" }.

How does function calling help with structured output?

Function calling lets you define an exact schema (parameters with types, required fields, enums). The LLM fills in the schema rather than generating free text. When the provider enforces the schema during decoding, this guarantees format compliance and enables type-safe integration.

What if the LLM returns invalid structured output?

Implement validation + retry: parse the output, validate it against your schema, and if it is invalid, send the error message back to the LLM and ask it to fix the problem. Most failures resolve in 1 retry. Set a max retry limit (2–3).

Can I get nested or complex structures from LLMs?

Yes. Define nested schemas clearly with examples. For very complex structures, break into multiple calls — extract top-level fields first, then details for each. Pydantic models are excellent for defining and validating complex schemas.
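As a rough illustration of a nested structure, the sketch below parses an order with a list of line items into typed objects. The `Order` and `LineItem` classes and their fields are hypothetical. Standard-library dataclasses are used to keep the example dependency-free; unlike Pydantic models, they do not enforce the annotated types at runtime, so they only check that the expected keys are present.

```python
import json
from dataclasses import dataclass


@dataclass
class LineItem:
    description: str
    quantity: int
    unit_price: float


@dataclass
class Order:
    customer: str
    items: list[LineItem]

    @property
    def total(self) -> float:
        """Sum of quantity times unit price over all line items."""
        return sum(item.quantity * item.unit_price for item in self.items)


def parse_order(raw: str) -> Order:
    """Build typed objects from nested JSON; bad keys raise TypeError or KeyError."""
    data = json.loads(raw)
    items = [LineItem(**item) for item in data["items"]]
    return Order(customer=data["customer"], items=items)
```

Computing a derived value such as `total` from the parsed items also gives you something to cross-check against any total the model reports on its own.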

Technique Comparison

Task: Extract contact info as JSON from an email signature

Without technique

Prompt

Extract contacts from signature: "John Smith, Marketing Director, TechCorp LLC, john@techcorp.com, +1 (555) 123-4567, techcorp.com"

Response

Name: John Smith
Title: Marketing Director
Email: john@techcorp.com
Phone: +1 (555) 123-4567

Tokens: 40/38

Time: 310 ms

With technique

Prompt

Extract contact information from the email signature and return strictly valid JSON.

**JSON schema (all fields required, null if absent):**
```json
{
  "name": "string",
  "title": "string | null",
  "company": "string | null",
  "email": "string | null",
  "phone": "string | null",
  "website": "string | null"
}
```

**Rules:**
- Phone: keep in original format
- Website: add https:// if scheme is missing
- Return only JSON, no explanation and no markdown code block

**Signature:** "John Smith, Marketing Director, TechCorp LLC, john@techcorp.com, +1 (555) 123-4567, techcorp.com"

Response

{ "name": "John Smith", "title": "Marketing Director", "company": "TechCorp LLC", "email": "john@techcorp.com", "phone": "+1 (555) 123-4567", "website": "https://techcorp.com" }

• An explicit JSON schema with types eliminates format variability: the model knows the exact structure.
• Requiring null for absent fields matters for programmatic processing: every key is always present, which avoids a KeyError on lookup.
• Prohibiting the markdown wrapper (no ```json fence) allows parsing the response directly via JSON.parse().

Tokens: 140/80

Time: 620 ms

Why this works

For structured data extraction, give the model an exact JSON schema with types and a null rule for absent fields — this makes the output programmatically parseable without post-processing.
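On the consuming side, the null rule and the no-markdown rule can be backed up by a small defensive parser. The sketch below is one possible approach, assuming the six-field contact schema from the prompt above; the helper names `strip_code_fence` and `parse_contact` are hypothetical. It tolerates a markdown fence if the model adds one anyway and fills in `None` for any field the reply omits.

```python
import json

CONTACT_FIELDS = ("name", "title", "company", "email", "phone", "website")
FENCE = "`" * 3  # the markdown code-fence marker, built to avoid a literal


def strip_code_fence(text: str) -> str:
    """Remove a surrounding markdown fence if the model added one anyway."""
    text = text.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        text = "\n".join(lines)
    return text


def parse_contact(response: str) -> dict:
    """Parse the reply and make sure every schema field exists (None if absent)."""
    data = json.loads(strip_code_fence(response))
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return {field: data.get(field) for field in CONTACT_FIELDS}
```

Because every key is always present in the returned dictionary, downstream code can use `contact["website"]` directly instead of guarding each lookup.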


This lesson is part of a structured LLM course.
