DSPy — Programming Language Models
Declarative signatures, modules & compilers
The Problem: You spent hours crafting the perfect prompt. Then the model updated, and it broke. You switched to a different LLM — broken again. How do you build prompts that survive change?
The Solution: Program, Don't Prompt
DSPy (Declarative Self-improving Python) is a framework from the Stanford NLP group that replaces fragile hand-written prompts with declarative programs. Instead of guessing the perfect wording, you describe the shape of the task with a signature — what input goes in and what output comes out — then choose a reasoning module (Predict, ChainOfThought, ReAct), and a compiler automatically generates and tunes the actual prompt for your specific data and metric. The core slogan is "program, don't prompt": you write Python that declares intent, and DSPy figures out the string that makes a given model perform best.
How it works
Three layers stack on top of each other. A signature like context, question -> answer declares the input and output fields (plain strings unless you specify types). A module wraps that signature in a reasoning strategy; for example, dspy.ChainOfThought tells the model to think step by step before answering. Finally, an optimizer (formerly called a teleprompter), such as BootstrapFewShot or MIPROv2, runs your program over a handful of labeled examples, measures the result with a metric you define, and searches for the best combination of few-shot demonstrations and instructions. The output is a compiled program: a concrete, reproducible prompt you can save, version, and rerun. Because the prompt is generated rather than hand-typed, swapping the underlying model means recompiling instead of rewriting everything by hand.
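To make the layering concrete, here is a toy sketch in plain Python with no DSPy dependency. It shows how a signature string could be parsed into fields and rendered into a prompt, with a chain-of-thought variant that adds a reasoning field before the answer. The names Signature, parse_signature, and render_prompt are hypothetical helpers invented for this illustration; they are not DSPy's API, and the prompt text DSPy actually generates looks different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """Toy stand-in for a signature: named input and output fields."""
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def parse_signature(spec: str) -> Signature:
    """Parse 'context, question -> answer' into input and output field names."""
    left, sep, right = spec.partition("->")
    if not sep:
        raise ValueError(f"signature needs '->': {spec!r}")
    inputs = tuple(name.strip() for name in left.split(",") if name.strip())
    outputs = tuple(name.strip() for name in right.split(",") if name.strip())
    if not inputs or not outputs:
        raise ValueError(f"signature needs fields on both sides: {spec!r}")
    return Signature(inputs, outputs)


def render_prompt(sig: Signature, values: dict[str, str], chain_of_thought: bool = False) -> str:
    """Render a prompt from the signature; the wording is generated, not hand-typed."""
    # The chain-of-thought variant asks for a reasoning field before the outputs.
    outputs = (("reasoning",) + sig.outputs) if chain_of_thought else sig.outputs
    lines = [f"Given the fields {', '.join(sig.inputs)}, produce {', '.join(outputs)}.", ""]
    for name in sig.inputs:
        lines.append(f"{name.capitalize()}: {values[name]}")
    # The model is expected to continue from the first output field.
    lines.append(f"{outputs[0].capitalize()}:")
    return "\n".join(lines)


sig = parse_signature("context, question -> answer")
prompt = render_prompt(
    sig,
    {"context": "Python was first released in 1991.", "question": "When was Python released?"},
    chain_of_thought=True,
)
print(prompt)
```

The point of the sketch is the separation of concerns: the signature stays the same while the module (here, the chain_of_thought flag) changes how the prompt is built.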
When to use it — and the tradeoffs
Reach for DSPy when you have a multi-step pipeline (retrieval + reasoning + extraction), a measurable metric, and a small set of labeled examples: that is exactly where manual prompt tuning becomes a guessing game. It pairs especially well with RAG and multi-hop question answering, where many sub-prompts interact. The tradeoffs are real: there is a learning curve, compilation makes extra LLM calls (which cost time and money), and a good metric is mandatory, because without one the optimizer has nothing to optimize. For a single throwaway prompt, plain prompting is faster.
As a concrete example, imagine a support-ticket classifier. You write the signature ticket -> category, wrap it in dspy.Predict, define a metric that returns 1 when the predicted category matches the label, and hand the optimizer 50 labeled tickets. BootstrapFewShot tries different demonstration sets, keeps the ones that push accuracy up, and emits a final prompt. There is no manual example-picking, and you can recompile the same program for a cheaper or newer model without rewriting any prompt text.
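The ticket-classifier idea can be sketched without any LLM at all. The toy below is plain Python under loud assumptions: the tickets are made up, stub_model is a word-overlap stand-in for a real model, and search_demos does an exhaustive search over demonstration subsets, which is a simplification for illustration and not how DSPy's optimizers are implemented. It only shows the core loop: pick demonstrations, score them with a metric, keep the best set.

```python
from itertools import combinations

# Hypothetical labeled tickets: (ticket text, category).
TRAIN = [
    ("I was charged twice this month", "billing"),
    ("refund my last invoice please", "billing"),
    ("the app crashes when I log in", "bug"),
    ("error message on the login page", "bug"),
    ("how do I change my email address", "account"),
    ("hello there", "account"),  # a weak, uninformative demonstration
]
DEV = [
    ("why was my card charged again", "billing"),
    ("app shows an error and crashes", "bug"),
    ("change the email on my profile", "account"),
    ("I need a refund for this invoice", "billing"),
]


def exact_match(label: str, prediction: str) -> int:
    """Metric: 1 when the predicted category matches the label, else 0."""
    return int(label == prediction)


def stub_model(ticket: str, demos) -> str:
    """Stand-in for an LLM: copy the label of the demo sharing the most words."""
    words = set(ticket.lower().split())
    best_label, best_overlap = "unknown", 0
    for demo_text, demo_label in demos:
        overlap = len(words & set(demo_text.lower().split()))
        if overlap > best_overlap:
            best_label, best_overlap = demo_label, overlap
    return best_label


def accuracy(demos, dataset) -> float:
    """Average metric score of the stub model over a dataset."""
    scores = [exact_match(label, stub_model(text, demos)) for text, label in dataset]
    return sum(scores) / len(scores)


def search_demos(train, dev, k: int = 3):
    """Try every k-demo subset and keep the one with the highest dev accuracy."""
    best_demos, best_score = (), -1.0
    for candidate in combinations(train, k):
        score = accuracy(candidate, dev)
        if score > best_score:
            best_demos, best_score = candidate, score
    return list(best_demos), best_score


best_demos, best_score = search_demos(TRAIN, DEV, k=3)
print(f"dev accuracy with selected demos: {best_score:.2f}")
for text, label in best_demos:
    print(f"  demo: {text!r} -> {label}")
```

Note how the metric does all the steering: the search never looks at the demonstrations' wording, only at the score they produce, which is why a poor metric leaves the optimizer with nothing useful to optimize.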
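Think of it like an ORM for databases: you declare what you want and the framework generates the low-level text (SQL there, prompts here). The workflow has four steps: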
- 1. Define signature: Declare input -> output fields: "context, question -> answer"
- 2. Choose module: Pick a reasoning pattern: Predict, ChainOfThought, ReAct, or ProgramOfThought
- 3. Compile with optimizer: BootstrapFewShot selects few-shot examples using your metric; MIPROv2 also tunes the instructions
- 4. Deploy compiled program: The compiled program is a saved, reproducible artifact; to target a different LLM, recompile it rather than rewriting prompts
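The "deploy" step depends on the compiled program being plain data that can be saved, versioned, and reloaded. The sketch below illustrates that idea with a made-up JSON layout; the CompiledProgram class and its fields are hypothetical and chosen only for this example. DSPy modules have their own save and load mechanism with a different format, so treat this as a picture of the concept rather than of the library.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CompiledProgram:
    """Toy container for what a compile step produces: instructions plus demos."""
    signature: str
    instructions: str
    demos: list[dict[str, str]] = field(default_factory=list)

    def to_json(self) -> str:
        # sort_keys keeps the output stable, so diffs in version control stay small.
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "CompiledProgram":
        return cls(**json.loads(text))


program = CompiledProgram(
    signature="ticket -> category",
    instructions="Classify the support ticket into one category.",
    demos=[{"ticket": "I was charged twice", "category": "billing"}],
)
saved = program.to_json()
restored = CompiledProgram.from_json(saved)
print(restored == program)
```

Because the artifact is data rather than a hand-edited string, recompiling for a new model produces a new file that can be diffed against the old one.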
Where Is This Used?
- NLP Pipelines: Chain multiple modules (summarize -> classify -> extract) with auto-optimized prompts at each step
- RAG Optimization: Automatically optimize retrieval queries and generation prompts together for better answers
- Multi-hop QA: Complex questions requiring multiple reasoning steps — DSPy chains modules and optimizes the full pipeline
- Classification: Auto-optimized prompts with best few-shot examples selected by the compiler for your specific data
Fun Fact: DSPy was created by the same Stanford NLP team behind ColBERT and Baleen. The name comes from the earlier DSP (Demonstrate-Search-Predict) framework, and the design borrows from PyTorch's nn.Module approach: signatures are like tensor shapes, and modules are like neural network layers.
Frequently asked questions
What is DSPy and how is it different from manual prompting?
DSPy is a framework by Stanford NLP that treats LLM calls as declarative functions (signatures) instead of hand-crafted string templates. You define WHAT you want (input -> output), choose a reasoning module (Predict, ChainOfThought, ReAct), and a compiler automatically optimizes the prompt with few-shot examples and instructions.
What are DSPy compilers and how do they work?
DSPy compilers (called optimizers in current releases) such as BootstrapFewShot and MIPRO automatically optimize prompts. BootstrapFewShot runs your program on training data, evaluates outputs with a metric function, and keeps the examples that pass as few-shot demonstrations. MIPRO goes further by also optimizing the instructions themselves.
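The bootstrapping step described above can be sketched in a few lines of plain Python. This is a simplified illustration, not DSPy's implementation: teacher is a canned stand-in for an unoptimized program that would normally call an LLM, the training questions are made up, and bootstrap_demos and the two-argument metric are hypothetical names chosen for this example.

```python
def metric(example: dict, prediction: str) -> bool:
    """Pass when the prediction equals the gold answer, ignoring case and spaces."""
    return prediction.strip().lower() == example["answer"].strip().lower()


def teacher(question: str) -> str:
    """Stand-in for an unoptimized program; a real one would call an LLM."""
    canned = {
        "capital of France?": "Paris",
        "2 + 2?": "5",  # a wrong output that the metric should reject
        "largest planet?": "jupiter",
    }
    return canned.get(question, "")


def bootstrap_demos(trainset, program, metric, max_demos: int = 4):
    """Run the program on each training example and keep outputs that pass the metric."""
    demos = []
    for example in trainset:
        prediction = program(example["question"])
        if metric(example, prediction):
            demos.append({"question": example["question"], "answer": prediction})
        if len(demos) >= max_demos:
            break
    return demos


trainset = [
    {"question": "capital of France?", "answer": "Paris"},
    {"question": "2 + 2?", "answer": "4"},
    {"question": "largest planet?", "answer": "Jupiter"},
]
demos = bootstrap_demos(trainset, teacher, metric)
for demo in demos:
    print(demo)
```

The wrong answer to "2 + 2?" never becomes a demonstration, which is the whole purpose of filtering through the metric: only outputs that already meet your definition of correct are shown to the model as examples.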
When should I use DSPy instead of writing prompts manually?
DSPy excels in complex multi-step pipelines, when you need a program you can recompile for different models (OpenAI, Anthropic, open-source), or when prompt quality is critical and you have labeled data plus a metric function. For simple one-off prompts, manual prompting may be faster to set up.
Example: manual prompt vs. compiled program
Task: Build a question-answering system that takes a context paragraph and a question, then returns a precise answer
Manual prompt output: Based on the context provided, Python was first released in 1991. It was created by Guido van Rossum and is known for its emphasis on code readability, particularly through the use of significant whitespace.
DSPy-compiled output: Reasoning: The context directly states that Python was "first released in 1991." Answer: 1991
The manual prompt produced a correct but verbose answer with unnecessary paraphrasing. The DSPy-compiled prompt produced a concise, focused answer because the compiler selected demonstrations that teach the model the desired output format.