Lesson 16 · Advanced

DSPy — Programming Language Models

Declarative signatures, modules & compilers

The Problem: You spent hours crafting the perfect prompt. Then the model updated, and it broke. You switched to a different LLM — broken again. How do you build prompts that survive change?

The Solution: Program, Don't Prompt

DSPy (Declarative Self-improving Python) is a framework from the Stanford NLP group that replaces fragile hand-written prompts with declarative programs. Instead of guessing the perfect wording, you describe the shape of the task with a signature (what input goes in and what output comes out), choose a reasoning module (Predict, ChainOfThought, ReAct), and let a compiler generate and tune the actual prompt for your specific data and metric. The core slogan is "program, don't prompt": you write Python that declares intent, and DSPy searches for the prompt text that scores best on your metric with a given model.
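To make "describe the shape of the task" concrete, here is a toy sketch in plain Python (not DSPy's implementation) of what a string signature carries: named input fields and named output fields. In DSPy itself you would pass the same string directly, for example dspy.Predict("context, question -> answer"); the Signature class and parse_signature helper below are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """Toy stand-in for a DSPy signature: named input and output fields."""
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def parse_signature(spec: str) -> Signature:
    """Parse a string like 'context, question -> answer' into field names."""
    if spec.count("->") != 1:
        raise ValueError(f"expected exactly one '->' in {spec!r}")
    left, right = spec.split("->")
    inputs = tuple(f.strip() for f in left.split(",") if f.strip())
    outputs = tuple(f.strip() for f in right.split(",") if f.strip())
    if not inputs or not outputs:
        raise ValueError("a signature needs at least one input and one output field")
    return Signature(inputs, outputs)


sig = parse_signature("context, question -> answer")
print(sig.inputs, sig.outputs)  # ('context', 'question') ('answer',)
```

Nothing in the signature says how to phrase the prompt; that is left to the module and the optimizer.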

How it works

Three layers stack on top of each other. A signature like context, question -> answer declares the input and output fields (types are optional; untyped fields default to strings). A module wraps that signature in a reasoning strategy: for example, dspy.ChainOfThought asks the model to reason step by step before producing the answer. Finally, an optimizer (formerly called a teleprompter), such as BootstrapFewShot or MIPROv2, runs your program over a set of labeled examples, scores the results with a metric you define, and searches for better few-shot demonstrations and, in the case of MIPROv2, better instructions. The output is a compiled program: a concrete, reproducible prompt configuration you can save, version, and rerun. Because the prompt is generated rather than hand-typed, swapping the underlying model means recompiling rather than rewriting prompts by hand.
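The layering can be sketched in plain Python: the signature supplies the field names, the module decides which fields the model must fill (a chain-of-thought strategy adds a reasoning field before the answer), and the optimizer later supplies the instruction and demonstrations. render_prompt is a hypothetical helper; DSPy's real adapters format prompts differently, and the exact format varies between versions.

```python
def render_prompt(inputs, outputs, instruction, demos, query, chain_of_thought=False):
    """Assemble a prompt from a signature's fields, demos, and the current query.

    With chain_of_thought=True a "reasoning" field is placed before the outputs,
    mimicking the idea behind dspy.ChainOfThought (not its real prompt format).
    """
    out_fields = (["reasoning"] if chain_of_thought else []) + list(outputs)

    def fmt(name, value):
        return f"{name.capitalize()}: {value}"

    blocks = [instruction]
    for demo in demos:
        # Each demonstration shows every input and every output the model must produce.
        lines = [fmt(n, demo[n]) for n in inputs]
        lines += [fmt(n, demo[n]) for n in out_fields if n in demo]
        blocks.append("\n".join(lines))

    # The current query: inputs filled in, first output field left open for the model.
    query_lines = [fmt(n, query[n]) for n in inputs]
    query_lines.append(f"{out_fields[0].capitalize()}:")
    blocks.append("\n".join(query_lines))
    return "\n\n---\n\n".join(blocks)


demo = {
    "context": "Marie Curie won the Nobel Prize in Physics in 1903 and in Chemistry in 1911.",
    "question": "How many Nobel Prizes did Marie Curie win?",
    "reasoning": "The context states she won one in 1903 and one in 1911.",
    "answer": "2",
}
query = {
    "context": "Python was first released in 1991.",
    "question": "When was Python first released?",
}
prompt = render_prompt(("context", "question"), ("answer",),
                       "Answer the question using only the context.",
                       [demo], query, chain_of_thought=True)
print(prompt)
```

Switching chain_of_thought off changes only which field the model is asked to fill first; the signature and demos stay the same, which is the separation of concerns the three layers provide.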

When to use it — and the tradeoffs

Reach for DSPy when you have a multi-step pipeline (retrieval + reasoning + extraction), a measurable metric, and a small set of labeled examples: that is exactly where manual prompt tuning becomes a guessing game. It pairs especially well with RAG and multi-hop question answering, where many sub-prompts interact. The tradeoffs are real: there is a learning curve, compilation makes extra LLM calls (which cost time and money), and a good metric is mandatory, because without one the optimizer has nothing to optimize. For a single throwaway prompt, plain prompting is faster.

As a concrete example, imagine a support-ticket classifier. You write the signature ticket -> category, wrap it in dspy.Predict, define a metric that returns 1 when the predicted category matches the label, and hand the optimizer 50 labeled tickets. BootstrapFewShot runs the program on those tickets, keeps the runs whose predictions the metric accepts, and uses them as few-shot demonstrations in the final prompt. There is no manual example-picking, and you can recompile the same program for a cheaper or newer model without rewriting it.
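The metric in the ticket example is ordinary Python. Below is a sketch of it together with an evaluation loop. DSPy metrics follow the same (example, prediction, trace=None) shape, but here examples and predictions are plain dicts, and keyword_classifier is a hypothetical rule-based stand-in for an LLM call so the code runs offline.

```python
def category_match(example: dict, prediction: dict, trace=None) -> int:
    """Return 1 when the predicted category equals the gold label, else 0."""
    return int(prediction["category"].strip().lower() == example["category"].strip().lower())


def evaluate(program, dataset, metric) -> float:
    """Average metric score over the dataset (accuracy, for a 0/1 metric)."""
    scores = [metric(ex, program({"ticket": ex["ticket"]})) for ex in dataset]
    return sum(scores) / len(scores)


def keyword_classifier(inputs: dict) -> dict:
    """Hypothetical stand-in for an LLM-backed classifier: simple keyword rules."""
    text = inputs["ticket"].lower()
    if "refund" in text or "charge" in text:
        return {"category": "billing"}
    if "password" in text or "login" in text:
        return {"category": "account"}
    return {"category": "other"}


tickets = [
    {"ticket": "I was charged twice this month", "category": "billing"},
    {"ticket": "Can't reset my password", "category": "account"},
    {"ticket": "The app crashes on startup", "category": "bug"},
    {"ticket": "Please refund my last order", "category": "billing"},
]
print(evaluate(keyword_classifier, tickets, category_match))  # 0.75
```

The optimizer's only view of "better" is this number, which is why the metric has to be written before anything can be compiled.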

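Think of it like an ORM for databases: you declare what you want, and the framework generates the low-level text (prompts instead of SQL). The workflow has four steps: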

1. Define signature: Declare input -> output fields: "context, question -> answer"
2. Choose module: Pick a reasoning pattern: Predict, ChainOfThought, ReAct, or ProgramOfThought
3. Compile with optimizer: BootstrapFewShot selects few-shot demonstrations, and MIPROv2 also tunes the instructions; both score candidates with your metric
4. Deploy compiled program: Save the optimized program and rerun it reproducibly; to switch LLMs, recompile the same program for the new model
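Steps 3 and 4 rest on the idea that a compiled program is data: a signature, a module choice, an instruction, and the selected demonstrations. The sketch below uses a hypothetical JSON layout, not DSPy's own save format (DSPy programs provide their own save and load methods), to show why that makes results reproducible and versionable: identical content always produces the same hash.

```python
import hashlib
import json
import os
import tempfile

# Hypothetical compiled state for the QA example: everything the optimizer chose.
compiled = {
    "signature": "context, question -> answer",
    "module": "ChainOfThought",
    "instruction": "Answer the question using only the context.",
    "demos": [
        {
            "context": "Marie Curie won the Nobel Prize in Physics in 1903 and in Chemistry in 1911.",
            "question": "How many Nobel Prizes did Marie Curie win?",
            "reasoning": "The context states she won one in 1903 and one in 1911.",
            "answer": "2",
        }
    ],
}


def save_compiled(program: dict, path: str) -> str:
    """Write the compiled state as stable, sorted JSON; return a short content hash."""
    text = json.dumps(program, indent=2, sort_keys=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    # Same content -> same hash, so the hash can serve as a version tag.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]


def load_compiled(path: str) -> dict:
    """Read a compiled state back so the exact same prompt can be rebuilt and rerun."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "qa_compiled.json")
    version = save_compiled(compiled, path)
    restored = load_compiled(path)

print(version, restored == compiled)
```

Because the saved demonstrations and instruction were tuned against one model, moving to another model means re-running the optimizer and saving a new version rather than reusing this file unchanged.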

Where Is This Used?

NLP Pipelines: Chain multiple modules (summarize -> classify -> extract) with auto-optimized prompts at each step
RAG Optimization: Automatically optimize retrieval queries and generation prompts together for better answers
Multi-hop QA: Complex questions requiring multiple reasoning steps — DSPy chains modules and optimizes the full pipeline
Classification: Auto-optimized prompts with best few-shot examples selected by the compiler for your specific data
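A pipeline such as summarize -> classify -> extract is wired together much like layers in a PyTorch forward(): each stage is a module with its own signature, so an optimizer can tune each stage's prompt separately while the metric scores the end-to-end output. The sketch below is plain Python; Step, TicketPipeline, and the fake_* functions are hypothetical stand-ins for dspy.Module subclasses and LLM calls.

```python
class Step:
    """A toy 'module': one signature plus its own demonstrations.

    `fn` is a hypothetical stand-in for an LLM call; an optimizer could fill
    `demos` separately for each step at compile time.
    """

    def __init__(self, signature: str, fn):
        self.signature = signature
        self.fn = fn
        self.demos: list[dict] = []

    def __call__(self, **inputs) -> dict:
        return self.fn(**inputs)


# Rule-based stand-ins so the sketch runs offline.
def fake_summarize(ticket):
    return {"summary": ticket.split(".")[0]}


def fake_classify(summary):
    return {"category": "billing" if "charge" in summary.lower() else "other"}


def fake_extract(ticket, category):
    # category is part of the signature but unused by this stub
    ids = [w.strip("#.,") for w in ticket.split() if w.startswith("#")]
    return {"order_id": ids[0] if ids else None}


class TicketPipeline:
    """summarize -> classify -> extract, wired together like layers in forward()."""

    def __init__(self):
        self.summarize = Step("ticket -> summary", fake_summarize)
        self.classify = Step("summary -> category", fake_classify)
        self.extract = Step("ticket, category -> order_id", fake_extract)

    def steps(self) -> dict:
        # Every tunable step, so an optimizer can visit each one in turn.
        return {"summarize": self.summarize, "classify": self.classify, "extract": self.extract}

    def __call__(self, ticket: str) -> dict:
        summary = self.summarize(ticket=ticket)["summary"]
        category = self.classify(summary=summary)["category"]
        order_id = self.extract(ticket=ticket, category=category)["order_id"]
        return {"summary": summary, "category": category, "order_id": order_id}


pipe = TicketPipeline()
result = pipe("I was charged twice for order #A1234. Please fix it.")
print(result)
```

Keeping the control flow in ordinary Python while each step's prompt stays tunable is what lets the optimizer improve the whole pipeline without you hand-editing any single prompt.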

Fun Fact: DSPy comes from the same Stanford NLP team behind ColBERT and Baleen. The name grew out of the team's earlier DSP (Demonstrate-Search-Predict) framework, while the programming model borrows from PyTorch's nn.Module: signatures play the role of tensor shapes, and modules are like neural network layers.

Try It Yourself!

The pipeline below summarizes how DSPy turns a signature into a compiled, optimized program.

Figure: DSPy pipeline, from signature to compiled program. Signature ("context, question -> answer") -> Module (ChainOfThought(QA)) -> Optimizer (BootstrapFewShot + metric) -> Compiled prompt (auto-optimized with best examples).

Frequently asked questions

What is DSPy and how is it different from manual prompting?

DSPy is a framework by Stanford NLP that treats LLM calls as declarative functions (signatures) instead of hand-crafted string templates. You define WHAT you want (input -> output), choose a reasoning module (Predict, ChainOfThought, ReAct), and a compiler automatically optimizes the prompt with few-shot examples and instructions.

What are DSPy compilers and how do they work?

DSPy compilers (optimizers) like BootstrapFewShot and MIPROv2 automatically optimize prompts. BootstrapFewShot runs your program on training data, scores the outputs with a metric function, and keeps the runs that pass as few-shot demonstrations. MIPROv2 goes further by also proposing and optimizing the instructions themselves.
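The BootstrapFewShot loop can be sketched without any LLM: run a "teacher" program over the training set and keep only the runs the metric accepts, including their intermediate reasoning, as demonstrations. This is a simplification; the real optimizer has more knobs (for example max_bootstrapped_demos and max_labeled_demos) and works on full multi-step traces. bootstrap_demos and year_lookup_program are hypothetical, the latter a deliberately weak rule-based stand-in for an LLM.

```python
import re


def exact_match(example: dict, prediction: dict, trace=None) -> bool:
    """Metric: the predicted answer equals the gold answer (case-insensitive)."""
    return prediction["answer"].strip().lower() == example["answer"].strip().lower()


def year_lookup_program(inputs: dict) -> dict:
    """Hypothetical, deliberately weak 'teacher': answers with the first 4-digit number."""
    match = re.search(r"\b\d{4}\b", inputs["context"])
    answer = match.group(0) if match else "unknown"
    return {"reasoning": f"The context mentions {answer}.", "answer": answer}


def bootstrap_demos(program, trainset, metric, input_keys=("context", "question"), max_demos=4):
    """Toy version of the BootstrapFewShot idea.

    Run the program on each training example; when the metric accepts the
    prediction, keep the inputs plus the program's own outputs (including its
    intermediate reasoning) as a demonstration. Stop after max_demos.
    """
    demos = []
    for example in trainset:
        inputs = {k: example[k] for k in input_keys}
        prediction = program(inputs)
        if metric(example, prediction):
            demos.append({**inputs, **prediction})
            if len(demos) >= max_demos:
                break
    return demos


trainset = [
    {"context": "Python was first released in 1991.",
     "question": "When was Python first released?", "answer": "1991"},
    {"context": "Marie Curie won the Nobel Prize in Physics in 1903 and in Chemistry in 1911.",
     "question": "How many Nobel Prizes did Marie Curie win?", "answer": "2"},
    {"context": "The most well-known sections of the Great Wall were built during the Ming Dynasty (1368-1644).",
     "question": "Which dynasty built the most famous sections?", "answer": "Ming Dynasty"},
    {"context": "The Eiffel Tower was completed in 1889.",
     "question": "When was the Eiffel Tower completed?", "answer": "1889"},
]

demos = bootstrap_demos(year_lookup_program, trainset, exact_match)
for d in demos:
    print(d["question"], "->", d["answer"])
```

Only the two examples the weak program answered correctly survive, so the resulting demonstrations show the model answers in exactly the terse form the metric rewards.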

When should I use DSPy instead of writing prompts manually?

DSPy excels in complex multi-step pipelines, when you need model-agnostic programs that can be recompiled for OpenAI, Anthropic, or open-source models, or when prompt quality is critical and you have labeled data plus a metric function. For simple one-off prompts, manual prompting may be faster to set up.


Technique Comparison

Demo mode: pre-recorded responses

Task (Intermediate · Analysis)

Build a question-answering system that takes a context paragraph and a question, then returns a precise answer

Without technique

Prompt

You are a helpful assistant. Answer the following question based on the context. Context: The Python programming language was created by Guido van Rossum and first released in 1991. It emphasizes code readability with significant whitespace. Question: When was Python first released?

Response

Based on the context provided, Python was first released in 1991. It was created by Guido van Rossum and is known for its emphasis on code readability, particularly through the use of significant whitespace.

Tokens (input/output): 65/42 · Time: 380 ms


With technique

Prompt

DSPy Program (compiled by BootstrapFewShot):
Signature: "context, question -> answer"
Module: ChainOfThought(QA)

--- Auto-selected demonstrations ---
Context: "Marie Curie won the Nobel Prize in Physics in 1903 and in Chemistry in 1911."
Question: "How many Nobel Prizes did Marie Curie win?"
Reasoning: The context states she won one in 1903 and one in 1911.
Answer: 2

Context: "The Great Wall of China was built over many centuries, with the most well-known sections built during the Ming Dynasty (1368-1644)."
Question: "Which dynasty built the most famous sections?"
Reasoning: The context explicitly says the most well-known sections were built during the Ming Dynasty.
Answer: Ming Dynasty

--- Current query ---
Context: "The Python programming language was created by Guido van Rossum and first released in 1991."
Question: "When was Python first released?"

Response

Reasoning: The context directly states that Python was "first released in 1991."
Answer: 1991

👁️DSPy signature declares: context + question -> answer

🧠Compiler auto-selected 2 high-quality demonstrations from training data

🧠ChainOfThought module adds explicit reasoning before answering

✅Result: concise, factual answer with reasoning trace — no filler text

Tokens (input/output): 210/18 · Time: 250 ms

Why this works

The manual prompt produced a correct but verbose answer with unnecessary paraphrasing. The DSPy-compiled prompt produced a concise, focused answer because the compiler selected demonstrations that teach the model the desired output format. The cost is a longer prompt: 210 input tokens instead of 65.


This lesson is part of a structured LLM course.
