Text-to-SQL — Natural Language to SQL Queries

Natural language to database queries

The Problem: Business users need data but can't write SQL. Developers become bottlenecks for every "Can you pull the numbers for...?" request. How do you let anyone query a database?

The Solution: Talk to Your Database

Text-to-SQL uses LLMs to convert natural language questions into SQL queries. The model needs the database schema (tables, columns, relationships) as context, then generates valid SQL. It acts like a translator between human questions and database language — letting any team member query data without SQL knowledge.

Think of it like a database expert who speaks plain English:

1. Provide database schema: Include CREATE TABLE statements, column descriptions, and sample values in the prompt
2. User asks in natural language: "What were the top 5 products by revenue last month?" — no SQL knowledge required
3. LLM generates SQL query: Model outputs a valid SELECT statement, including joins, aggregations, and filters
4. Validate and sanitize the SQL: Parse the AST, reject any mutation statements, and enforce row-level security
5. Execute on read-only replica: Run the query safely, return results, and display them in the requesting user's interface
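The five steps above can be sketched end to end in Python. This is a minimal sketch under stated assumptions, not a reference implementation: `fake_llm` is a hypothetical stand-in for a real model call and simply returns a canned query, the in-memory SQLite database stands in for a read-only replica, and `validate` is a simple prefix check rather than a real AST parse. The table contents are invented sample data.

```python
import sqlite3

SCHEMA = """
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES products(id),
    quantity INTEGER,
    total REAL
);
"""

def build_prompt(schema: str, question: str) -> str:
    """Steps 1 and 2: combine the schema and the user's question."""
    return f"Schema:\n{schema}\nQuestion: {question}\nReturn only SQL."

def fake_llm(prompt: str) -> str:
    """Step 3 (hypothetical stub): a real system would call a model here."""
    return (
        "SELECT p.name, SUM(o.total) AS revenue "
        "FROM orders o JOIN products p ON p.id = o.product_id "
        "GROUP BY p.name ORDER BY revenue DESC LIMIT 5"
    )

def validate(sql: str) -> str:
    """Step 4 (simplified): accept a single SELECT statement only."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped or not stripped.upper().startswith("SELECT"):
        raise ValueError("only a single SELECT statement is allowed")
    return stripped

def answer(conn, question):
    """Step 5: run the validated query and return the rows."""
    sql = validate(fake_llm(build_prompt(SCHEMA, question)))
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, "Laptop", 900.0), (2, "Mouse", 20.0)],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 900.0), (2, 2, 3, 60.0), (3, 2, 1, 20.0)],
)
rows = answer(conn, "What are the top 5 products by revenue?")
print(rows)
```

Keeping each step in its own function makes it easy to swap the stub for a real model client or the prefix check for a proper SQL parser without touching the rest of the pipeline.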

Where Is This Used?

Business Analytics: "Show me sales by region this quarter" becomes a valid GROUP BY query instantly
Customer Support Dashboards: Support agents pull live ticket stats without bothering a data engineer
Self-Service Reporting: Marketing and finance teams query their own data directly, removing developer bottlenecks
Data Exploration: Analysts ask follow-up questions in natural language instead of rewriting queries

Common Pitfall: Unvalidated SQL Execution. Never execute LLM-generated SQL on production without validation. Run it against a read-only replica, validate and sanitize the query first, and require human approval for any destructive operation (UPDATE, DELETE).

Fun Fact: The Spider benchmark for Text-to-SQL has over 10,000 questions across 200 databases. Top LLMs achieve 85%+ accuracy on simple queries but drop to around 50% on complex multi-table joins. The trick? Providing column descriptions and sample values alongside the schema can boost accuracy by a reported 15-20%.
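Sample values can be collected automatically rather than typed by hand. The following is a rough sketch, assuming SQLite and a hypothetical `describe_table` helper; the table and its rows are invented for illustration. It lists each column with its declared type and a few distinct values that could be pasted into the prompt next to the schema.

```python
import sqlite3

def describe_table(conn, table, samples=3):
    """List each column with its declared type and a few distinct sample values."""
    # `table` must come from trusted code, not from user input: identifiers
    # cannot be bound as parameters, so they are interpolated here.
    columns = conn.execute(f"PRAGMA table_info({table})").fetchall()
    lines = [f"Table {table}:"]
    for _cid, name, col_type, *_rest in columns:
        rows = conn.execute(
            f'SELECT DISTINCT "{name}" FROM "{table}" '
            f'WHERE "{name}" IS NOT NULL ORDER BY 1 LIMIT ?',
            (samples,),
        ).fetchall()
        lines.append(f"  {name} {col_type} -- e.g. {[row[0] for row in rows]}")
    return "\n".join(lines)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO products (name, category) VALUES (?, ?)",
    [("Laptop", "Computers"), ("Headphones", "Audio"), ("Speaker", "Audio")],
)
context = describe_table(conn, "products")
print(context)
```

Seeing that `category` holds values like 'Audio' tells the model how to spell a filter value, which a bare column name cannot.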

Text-to-SQL Translation

See how natural language questions are converted into SQL queries step by step.

E-commerce Database Schema (3 tables)

products
  id           INT      PK
  name         VARCHAR
  category     VARCHAR
  price        DECIMAL
  rating       DECIMAL
  in_stock     BOOL

orders
  id           INT      PK
  product_id   INT      FK
  customer_id  INT      FK
  quantity     INT
  order_date   DATE
  total        DECIMAL

customers
  id           INT      PK
  name         VARCHAR
  email        VARCHAR
  city         VARCHAR
  joined_date  DATE

Relationships:
  orders.product_id → products.id
  orders.customer_id → customers.id

PK = Primary Key, FK = Foreign Key
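To make the schema concrete, here is a hedged sketch that recreates the three tables in an in-memory SQLite database and runs the kind of query a model would be expected to produce for "top 5 products by revenue" over a date range. The rows, the date range, and the query text are illustrative assumptions, not output from the lesson's demo.

```python
import sqlite3

DDL = """
CREATE TABLE products (
    id INT PRIMARY KEY, name VARCHAR, category VARCHAR,
    price DECIMAL, rating DECIMAL, in_stock BOOL
);
CREATE TABLE customers (
    id INT PRIMARY KEY, name VARCHAR, email VARCHAR,
    city VARCHAR, joined_date DATE
);
CREATE TABLE orders (
    id INT PRIMARY KEY,
    product_id INT REFERENCES products(id),
    customer_id INT REFERENCES customers(id),
    quantity INT, order_date DATE, total DECIMAL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
# Invented sample rows, only here so the query has something to aggregate.
conn.execute("INSERT INTO products VALUES (1, 'Laptop', 'Computers', 900, 4.5, 1)")
conn.execute("INSERT INTO products VALUES (2, 'Mouse', 'Accessories', 20, 4.1, 1)")
conn.execute(
    "INSERT INTO customers VALUES (1, 'Ana', 'ana@example.com', 'Lisbon', '2023-01-10')"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 1, 1, "2024-05-03", 900),
        (2, 2, 1, 2, "2024-05-20", 40),
        (3, 2, 1, 1, "2024-04-28", 20),  # falls outside the date range below
    ],
)

# The join follows the foreign key orders.product_id -> products.id.
TOP_PRODUCTS = """
SELECT p.name, SUM(o.total) AS revenue
FROM orders AS o
JOIN products AS p ON p.id = o.product_id
WHERE o.order_date >= ? AND o.order_date < ?
GROUP BY p.id, p.name
ORDER BY revenue DESC
LIMIT 5
"""
top = conn.execute(TOP_PRODUCTS, ("2024-05-01", "2024-06-01")).fetchall()
print(top)
```

The date bounds are bound as parameters rather than pasted into the SQL text, which keeps user-supplied values out of the query string.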

Key Insight

• Without exact table and column names, the LLM guesses wrong. Always provide the full schema.
• Single-table SELECT queries with a basic WHERE clause achieve 85%+ accuracy. Complex JOINs need human validation.
• NEVER run LLM-generated SQL on production. Use read-only connections. One missed WHERE can wipe a table.

Frequently asked questions

How does Text-to-SQL work with LLMs?

You provide the database schema (table names, columns, types, relationships) and a natural language question. The LLM generates a SQL query that answers the question. Schema context is critical — without it, the LLM guesses table/column names.

How accurate is LLM-generated SQL?

On standard benchmarks, top models achieve 80–90% accuracy on simple queries. Complex queries (multiple JOINs, subqueries, window functions) are less reliable. Always validate generated SQL before executing on production databases.
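One way to measure accuracy on your own data is to compare the rows a generated query returns with the rows from a hand-written reference query. The sketch below assumes that approach; `same_result`, the sample table, and the two candidate queries are hypothetical and only illustrate the idea.

```python
import sqlite3
from collections import Counter

def same_result(conn, generated_sql, reference_sql):
    """True if both queries return the same multiset of rows (order ignored)."""
    got = Counter(conn.execute(generated_sql).fetchall())
    want = Counter(conn.execute(reference_sql).fetchall())
    return got == want

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, city TEXT, is_active INTEGER)"
)
conn.executemany(
    "INSERT INTO users (city, is_active) VALUES (?, ?)",
    [("Moscow", 1), ("Moscow", 0), ("Kazan", 1)],
)

reference = "SELECT id FROM users WHERE city = 'Moscow' AND is_active = 1"
candidates = {
    # Written differently, but returns the same rows as the reference.
    "correct": "SELECT u.id FROM users AS u WHERE u.is_active = 1 AND u.city = 'Moscow'",
    # Valid SQL that silently drops the is_active filter.
    "missing filter": "SELECT id FROM users WHERE city = 'Moscow'",
}
results = {label: same_result(conn, sql, reference) for label, sql in candidates.items()}
accuracy = sum(results.values()) / len(results)
print(results, accuracy)
```

Comparing results instead of query text accepts equivalent queries that are written differently, and it catches queries that run fine but answer the wrong question.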

How do I prevent SQL injection from LLM output?

Never execute raw LLM output directly. Use parameterized queries, validate SQL syntax, restrict to SELECT-only (no DELETE/DROP/UPDATE), and run against a read-only database replica. Add query complexity limits.
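A minimal sketch of the SELECT-only restriction might look like the following. It is keyword-based and deliberately strict: `check_generated_sql` and its keyword list are assumptions for illustration, and the check will also reject harmless queries whose string literals contain a forbidden word. A production system would more likely use a real SQL parser plus a read-only connection.

```python
import re

FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE|TRUNCATE|GRANT|REPLACE|ATTACH|PRAGMA)\b",
    re.IGNORECASE,
)

def check_generated_sql(sql: str, max_rows: int = 1000) -> str:
    """Return a row-capped version of `sql`, or raise ValueError."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        raise ValueError("multiple statements are not allowed")
    if not re.match(r"(?is)^\s*(SELECT|WITH)\b", statement):
        raise ValueError("only SELECT queries are allowed")
    if FORBIDDEN.search(statement):
        raise ValueError("query contains a forbidden keyword")
    # Cap the result size by wrapping the query instead of editing it.
    # Note: on some engines the outer query need not preserve an inner ORDER BY.
    return f"SELECT * FROM ({statement}) AS limited LIMIT {int(max_rows)}"

print(check_generated_sql("SELECT id, name FROM users WHERE city = 'Moscow';"))
for bad in ("DROP TABLE users", "SELECT 1; DELETE FROM users"):
    try:
        check_generated_sql(bad)
    except ValueError as exc:
        print(f"rejected {bad!r}: {exc}")
```

The row cap is one simple form of the complexity limit mentioned above; a statement timeout on the database side is a common complement.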

What schema information should I provide?

Include table names, column names with types, primary/foreign key relationships, and sample values for ambiguous columns. A CREATE TABLE statement is the most efficient format. Add comments for non-obvious column names.
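If the database already exists, the CREATE TABLE text can be read from the catalog instead of being maintained by hand. The sketch below assumes SQLite, whose `sqlite_master` table stores the text of each CREATE statement; `schema_as_ddl` and the example table are hypothetical. Other engines expose similar information through their own catalog views or dump tools.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        city TEXT,         -- city name in English, e.g. 'Moscow'
        is_active INTEGER  -- 1 = active, 0 = deactivated
    );
    CREATE TABLE logins (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id),
        logged_in_at TEXT
    );
    """
)

def schema_as_ddl(conn):
    """Join the stored CREATE TABLE statements into one prompt-ready string."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND sql IS NOT NULL ORDER BY name"
    ).fetchall()
    return ";\n\n".join(row[0] for row in rows) + ";"

ddl = schema_as_ddl(conn)
print(ddl)
```

Reading the schema at request time keeps the prompt in sync with the database after migrations.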

Technique Comparison

Task (Beginner, Coding): Convert a simple natural language question into a SELECT query

Without technique

Prompt

Write SQL: "show all users from Moscow"

Response

SELECT * FROM users WHERE city = "Moscow";

Tokens: 16/18

Time: 210 ms

With technique

Prompt

Convert the question into a SQL query. Use the database schema below.

**Schema:**
```sql
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  name VARCHAR(100),
  email VARCHAR(255) UNIQUE,
  city VARCHAR(100),
  created_at TIMESTAMP,
  is_active BOOLEAN DEFAULT true
);
```

**Generation rules:**
- Use specific column names instead of SELECT *
- Add WHERE is_active = true unless the question explicitly requires inactive users
- Use single quotes for strings (SQL standard)
- Add ORDER BY for deterministic results
- Return only the query code with no explanation

**Question:** "Show all active users from Moscow, sorted by name"

Response

SELECT id, name, email, city, created_at
FROM users
WHERE city = 'Moscow'
  AND is_active = true
ORDER BY name ASC;

👁️ The table schema eliminates the need to guess column names and data types

🧠 The "specific columns instead of *" rule makes the query predictable and index-friendly

🔍 Single quotes and ORDER BY are details the model often skips without explicit instruction

Tokens: 155/52

Time: 640 ms

Why this works

Text-to-SQL without a schema is guesswork. Supplying the schema as DDL, a quoting convention, and a rule against SELECT * turns an approximate query into correct, executable SQL.
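The prompt used in the comparison can be assembled from its three parts with a small template function. This is a sketch: `build_prompt` and `RULES` are hypothetical names, and the layout simply mirrors the prompt shown in this lesson.

```python
FENCE = "`" * 3  # built at runtime so this example stays a single code block

RULES = [
    "Use specific column names instead of SELECT *",
    "Add WHERE is_active = true unless the question explicitly requires inactive users",
    "Use single quotes for strings (SQL standard)",
    "Add ORDER BY for deterministic results",
    "Return only the query code with no explanation",
]

def build_prompt(schema_ddl: str, question: str, rules=RULES) -> str:
    """Combine the schema, the generation rules, and the question into one prompt."""
    rule_lines = "\n".join(f"- {rule}" for rule in rules)
    return (
        "Convert the question into a SQL query. Use the database schema below.\n\n"
        f"**Schema:**\n{FENCE}sql\n{schema_ddl.strip()}\n{FENCE}\n\n"
        f"**Generation rules:**\n{rule_lines}\n\n"
        f'**Question:** "{question}"'
    )

schema = """
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  name VARCHAR(100),
  email VARCHAR(255) UNIQUE,
  city VARCHAR(100),
  created_at TIMESTAMP,
  is_active BOOLEAN DEFAULT true
);
"""
prompt = build_prompt(schema, "Show all active users from Moscow, sorted by name")
print(prompt)
```

Keeping the rules in a list makes it straightforward to add or drop a rule and observe how the generated SQL changes.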

