🧠 DSPy: Programming Language Models

Declarative Self-Improving Python for Modular AI Systems

Source Code Analysis · Stanford NLP
2026-04-17 | Daily Technical Deep Dive

What is DSPy?

The framework for programming—not prompting—language models

Declarative framework for building modular AI software
Composes natural-language modules with different models
Offers algorithms for optimizing prompts and weights
Creates reliable, maintainable, portable AI systems

Think of DSPy as a higher-level language for AI programming, like the shift from assembly to C

DSPy Core Philosophy

From brittle prompts to structured code

Replace prompt engineering with declarative programming
Focus on behavior rather than implementation details
Automatic prompt compilation and optimization
Model-agnostic AI system design

DSPy shifts focus from tinkering with prompt strings to programming with structured modules

Key Concepts

Building blocks of DSPy

Signatures: Define input/output behavior
Modules: AI system components with strategies
Optimizers: Tune prompts and weights
Programs: Composed modules for complex tasks

Each concept addresses a specific aspect of building robust AI systems

DSPy Architecture Overview

┌─────────────────┐     ┌─────────────────┐
│   Signatures    │────▶│     Modules     │
│ (Behavior Spec) │     │  (Components)   │
└─────────────────┘     └─────────────────┘
         │                       │
         ▼                       ▼
┌─────────────────┐     ┌─────────────────┐
│   Optimizers    │────▶│    Programs     │
│ (Tune/Compile)  │     │ (Composed Logic)│
└─────────────────┘     └─────────────────┘
         │                       │
         └───────────┬───────────┘
                     ▼
            ┌─────────────────┐
            │ Language Models │
            │ (OpenAI, etc.)  │
            └─────────────────┘

Architecture showing how signatures define behavior, modules implement strategies, and optimizers compile programs

Signatures: The Heart of DSPy

Defining AI behavior declaratively

Natural language input/output specifications
Type hints for structured outputs
Field definitions with descriptions
Automatic prompt generation

Signatures separate what the AI should do from how it does it

Signature Definition Example

class MathQuestion(dspy.Signature):
    """Solve mathematical word problems."""
    
    question: str = dspy.InputField(desc="The mathematical problem to solve")
    answer: float = dspy.OutputField(desc="The numerical answer")
    reasoning: str = dspy.OutputField(desc="Step-by-step reasoning")

Signature defines the interface and behavior for a mathematical problem solver

Signature Processing Flow

From signature to executable prompt

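Parse the signature into input and output fields
Attach few-shot demonstrations, if any have been set or learned
Format the final prompt through an adapter
Call the LM and parse the response into typed fields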

DSPy automatically converts signatures into effective prompts
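
The four steps above can be made concrete with a small plain-Python sketch. It does not use DSPy and is not how DSPy's adapters are implemented; `parse_signature`, `render_prompt`, and `parse_response` are hypothetical helpers, and the prompt layout is an assumption chosen for readability.

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    type_name: str = "str"


def parse_signature(spec: str) -> tuple[list[Field], list[Field]]:
    """Split 'a, b -> c: float' into typed input and output fields."""
    def parse_side(side: str) -> list[Field]:
        fields = []
        for part in side.split(","):
            name, _, type_name = part.partition(":")
            fields.append(Field(name.strip(), type_name.strip() or "str"))
        return fields

    inputs, outputs = spec.split("->")
    return parse_side(inputs), parse_side(outputs)


def render_prompt(spec: str, demos: list[dict], **inputs: str) -> str:
    """Render a plain-text prompt: field listing, demos, then the new inputs."""
    in_fields, out_fields = parse_signature(spec)
    lines = [
        "Inputs: " + ", ".join(f.name for f in in_fields),
        "Outputs: " + ", ".join(f"{f.name} ({f.type_name})" for f in out_fields),
        "",
    ]
    for demo in demos:
        for f in in_fields + out_fields:
            lines.append(f"{f.name}: {demo[f.name]}")
        lines.append("")
    for f in in_fields:
        lines.append(f"{f.name}: {inputs[f.name]}")
    lines.append(f"{out_fields[0].name}:")
    return "\n".join(lines)


def parse_response(spec: str, text: str) -> dict:
    """Cast the raw completion to the first output field's declared type."""
    _, out_fields = parse_signature(spec)
    casts = {"str": str, "float": float, "int": int}
    field = out_fields[0]
    return {field.name: casts[field.type_name](text.strip())}


prompt = render_prompt(
    "question -> answer: float",
    demos=[{"question": "What is 1 + 1?", "answer": "2.0"}],
    question="What is 2 + 2?",
)
print(prompt)
print(parse_response("question -> answer: float", " 4 "))
```

The point of the sketch is the separation of concerns: the signature string is the only thing the caller writes, while formatting and parsing live in reusable code.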

Automatic Prompt Generation

# DSPy automatically converts this signature:
class SentimentAnalysis(dspy.Signature):
    text: str = dspy.InputField()
    sentiment: str = dspy.OutputField()

# Into a prompt along these lines (illustrative; the exact text depends on the adapter):
"""Analyze the sentiment of the following text:

Text: {text}

Sentiment (positive/negative/neutral):"""

DSPy handles the low-level prompt engineering automatically

Module Types

Building blocks for AI systems

dspy.Predict: Basic LM calls
dspy.ChainOfThought: Reasoning chains
dspy.ReAct: Tool-using agents
dspy.MultiChainComparison: Ensemble methods
dspy.ProgramOfThought: Generates and runs code to compute the answer

Each module type provides a different strategy for LM interaction

Basic Module Usage

# Simple prediction module
predict = dspy.Predict("question -> answer")
result = predict(question="What is 2+2?")
print(result.answer)  # "4"

# Chain of Thought module
cot = dspy.ChainOfThought("question -> answer: float")
math_result = cot(question="What's the probability of rolling a 7 with two dice?")
print(math_result.reasoning)  # Detailed reasoning
print(math_result.answer)     # 0.166667

Modules provide higher-level abstractions over direct LM calls

Chain of Thought Module

Step-by-step reasoning with LMs

Generates intermediate reasoning steps
Improves accuracy for complex tasks
Supports mathematical and logical reasoning
Preserves reasoning in output

Chain of Thought helps LMs break down complex problems

Chain of Thought Implementation

class MathProblemSolver(dspy.Module):
    def __init__(self):
        self.solve = dspy.ChainOfThought("question -> answer: float")
    
    def forward(self, question):
        result = self.solve(question=question)
        return dspy.Prediction(
            answer=result.answer,
            reasoning=result.reasoning
        )

Chain of Thought modules automatically include reasoning in their output

ReAct Module

Reasoning and Acting with Tools

Combines reasoning with tool usage
Step-by-step execution
Multiple tool calls per query
Tool integration and output parsing

ReAct enables complex problem solving through iterative reasoning and tool use
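
The reason-act-observe loop behind this pattern can be sketched in plain Python. This is a minimal illustration, not `dspy.ReAct` itself: `scripted_policy` is a hypothetical stand-in for the LM that decides the next step, and the tools and their data are toy assumptions.

```python
from typing import Callable


def add(a: float, b: float) -> float:
    """Add two numbers."""
    return a + b


def lookup_year(name: str) -> int:
    """Return a birth year from a tiny hard-coded table (toy data)."""
    return {"ada": 1815}[name.lower()]


def scripted_policy(question: str, trajectory: list[dict]) -> dict:
    """Stand-in for the LM: choose the next action from the steps so far."""
    if not trajectory:
        return {"thought": "I need the birth year.",
                "tool": "lookup_year", "args": {"name": "Ada"}}
    if len(trajectory) == 1:
        year = trajectory[0]["observation"]
        return {"thought": "Now add 100 years.",
                "tool": "add", "args": {"a": year, "b": 100}}
    return {"thought": "I have the result.",
            "tool": "finish", "args": {"answer": trajectory[-1]["observation"]}}


def react(question: str, tools: dict[str, Callable], policy: Callable,
          max_iters: int = 5):
    """Alternate between choosing an action and observing the tool output."""
    trajectory: list[dict] = []
    for _ in range(max_iters):
        step = policy(question, trajectory)
        if step["tool"] == "finish":
            return step["args"]["answer"], trajectory
        observation = tools[step["tool"]](**step["args"])
        trajectory.append({**step, "observation": observation})
    return None, trajectory  # iteration budget exhausted


answer, steps = react(
    "In what year was the centenary of Ada Lovelace's birth?",
    {"add": add, "lookup_year": lookup_year},
    scripted_policy,
)
print(answer)  # 1915
```

Capping the loop with `max_iters` matters in practice: a real LM policy may never choose to finish.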

ReAct Agent Example

def search_wikipedia(query: str) -> list[str]:
    """Search Wikipedia for relevant information"""
    results = dspy.ColBERTv2(url="http://20.102.90.50:2017/wiki17_abstracts")(query, k=3)
    return [x["text"] for x in results]

def calculate(expression: str) -> float:
    """Evaluate mathematical expression"""
    return dspy.PythonInterpreter({}).execute(expression)

# Create ReAct agent with tools
react = dspy.ReAct("question -> answer", tools=[search_wikipedia, calculate])
result = react(question="What is 9362158 divided by the year David Gregory was born?")

ReAct agents can use multiple tools to solve complex problems

Multi-Chain Comparison

Ensemble reasoning for better results

Generate multiple reasoning chains
Compare and rank results
Select best answer
Improve reliability and accuracy

Multiple perspectives lead to more robust answers

Multi-Chain Comparison Flow

┌─────────────────┐      ┌─────────────────┐
│      Input      │─────▶│     Chain 1     │
│   (Question)    │      │    Reasoning    │
└─────────────────┘      └─────────────────┘
         │                        │
         ▼                        ▼
┌─────────────────┐      ┌─────────────────┐
│     Chain 2     │─────▶│     Chain N     │
│    Reasoning    │      │    Reasoning    │
└─────────────────┘      └─────────────────┘
         │                        │
         └────────────┬───────────┘
                      ▼
             ┌─────────────────┐
             │   Comparator    │
             │  (Select Best)  │
             └─────────────────┘
                      ▼
             ┌─────────────────┐
             │  Final Answer   │
             └─────────────────┘

Multiple reasoning chains are compared to select the best result
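
A simple way to see why several chains help is a majority vote over their answers. The sketch below is that simpler stand-in, not what `dspy.MultiChainComparison` does internally (the module asks the LM to compare the attempts); `majority_answer` and the attempt data are hypothetical.

```python
from collections import Counter


def majority_answer(attempts: list[dict]) -> dict:
    """Pick the most frequent answer and one reasoning chain that supports it."""
    counts = Counter(a["answer"] for a in attempts)
    best, votes = counts.most_common(1)[0]
    supporting = [a["reasoning"] for a in attempts if a["answer"] == best]
    return {"answer": best, "votes": votes, "reasoning": supporting[0]}


attempts = [
    {"reasoning": "6 of the 36 outcomes sum to 7.", "answer": "1/6"},
    {"reasoning": "Only (3,4) and (4,3) count.", "answer": "1/18"},
    {"reasoning": "For each first die, one second die works.", "answer": "1/6"},
]
result = majority_answer(attempts)
print(result["answer"], result["votes"])
```

Voting only works when answers can be compared for equality; free-form answers need an LM or a similarity measure as the comparator.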

Program of Thought

Generating and executing programs

LM generates code as reasoning
Code execution for verification
Program synthesis and refinement
Complex task automation

Program of Thought combines code generation with execution

Optimizers: DSPy's Superpower

Automatically improving AI systems

Tune prompts automatically
Fine-tune LM weights
Learn from training data
Cross-model portability

Optimizers replace manual prompt engineering with automatic optimization

Optimizer Types

Different optimization strategies

BootstrapFewShot: Few-shot demonstration bootstrapping
MIPROv2: Joint instruction and few-shot demonstration optimization
BootstrapFinetune: Weight fine-tuning
GEPA: Reflective prompt evolution (Genetic-Pareto)
BetterTogether: Alternating prompt optimization and fine-tuning

Each optimizer addresses different aspects of system optimization

Basic Optimization Example

import dspy
from dspy.datasets import HotPotQA

# Configure language model
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# Define training data
trainset = [x.with_inputs('question') for x in HotPotQA(train_seed=2024, train_size=500).train]

# Create ReAct agent (search_wikipedia is the tool defined in the ReAct Agent Example)
react = dspy.ReAct("question -> answer", tools=[search_wikipedia])

# Optimize with MIPROv2
tp = dspy.MIPROv2(metric=dspy.evaluate.answer_exact_match, auto="light", num_threads=24)
optimized_react = tp.compile(react, trainset=trainset)

In an informal run reported in the DSPy documentation, this kind of MIPROv2 optimization raised ReAct's score on HotPotQA from 24% to 51%

MIPROv2 Optimization Process

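Multiprompt Instruction Proposal Optimizer, version 2

Bootstrapping: Run the program on training inputs and keep traces that pass the metric as candidate demonstrations
Grounded Proposal: Draft candidate instructions informed by the data, the program's code, and the bootstrapped traces
Discrete Search: Use Bayesian optimization over minibatches to pick the best instruction and demonstration combination for each predictor
Iterative improvement: Scores from earlier trials guide which combinations are tried next

MIPROv2 jointly optimizes the instructions and few-shot demonstrations of every predictor in a program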

MIPROv2 Optimization Pipeline

┌─────────────────┐      ┌─────────────────┐
│    Training     │─────▶│  Bootstrapping  │
│      Data       │      │  (Filter Good)  │
└─────────────────┘      └─────────────────┘
         │                        │
         ▼                        ▼
┌─────────────────┐      ┌─────────────────┐
│     Program     │─────▶│    Grounded     │
│    Structure    │      │    Proposal     │
└─────────────────┘      └─────────────────┘
         │                        │
         ┼────────────┬───────────┤
         ▼                        ▼
┌─────────────────┐      ┌─────────────────┐
│  Instructions   │─────▶│    Discrete     │
│   & Examples    │      │     Search      │
└─────────────────┘      └─────────────────┘
                                  │
                                  ▼
                         ┌─────────────────┐
                         │    Optimized    │
                         │     Program     │
                         └─────────────────┘

MIPROv2 uses a sophisticated multi-stage optimization process
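
The final search stage can be pictured as choosing one (instruction, demonstration set) pair from a discrete space. The sketch below scores every candidate on a cheap minibatch, then re-scores the finalists on the full set. It is a deliberately simplified stand-in for whatever search strategy the optimizer really uses; `INSTRUCTIONS`, `DEMO_SETS`, `TRAINSET`, and `toy_metric` are all hypothetical.

```python
import itertools
import random
from statistics import mean

INSTRUCTIONS = [
    "Answer briefly.",
    "Think step by step, then answer.",
    "Answer with a number only.",
]
DEMO_SETS = [(), ("demo_a",), ("demo_a", "demo_b")]
# Hypothetical per-example difficulty on a 0-10 scale.
TRAINSET = [3, 5, 7, 9, 9, 9, 9, 9]


def toy_metric(instruction: str, demos: tuple, difficulty: int) -> float:
    """Stand-in for running the program on one example and scoring it."""
    quality = 4 + (3 if "step by step" in instruction else 0) + len(demos)
    return 1.0 if quality >= difficulty else 0.0


def search(trainset, minibatch_size=4, top_k=3, seed=0):
    """Cheap minibatch scoring for all candidates, full scoring for finalists."""
    rng = random.Random(seed)
    candidates = list(itertools.product(INSTRUCTIONS, DEMO_SETS))
    minibatch_scores = {}
    for cand in candidates:
        batch = rng.sample(trainset, minibatch_size)
        minibatch_scores[cand] = mean(toy_metric(*cand, d) for d in batch)
    finalists = sorted(candidates, key=minibatch_scores.get, reverse=True)[:top_k]
    return max(finalists, key=lambda c: mean(toy_metric(*c, d) for d in trainset))


best_instruction, best_demos = search(TRAINSET)
print(best_instruction, best_demos)
```

Minibatch scoring keeps the number of LM calls low; the full-set pass over a few finalists guards against a candidate that only looked good on a lucky batch.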

BootstrapFewShot with Random Search

Effective few-shot learning

Bootstrap high-quality examples
Random search for best combinations
Validation-set scoring of each candidate program
Scalable to large datasets

BootstrapFewShotWithRandomSearch repeats the bootstrapping several times and keeps the best-scoring candidate
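
The bootstrapping step itself is simple to sketch: run the program on training inputs and keep only the traces whose output passes the metric. This plain-Python illustration does not call DSPy; `teacher`, `exact_match`, and `bootstrap` are hypothetical names, and the teacher is rigged to fail on one input so the filter has something to reject.

```python
def teacher(question: str) -> dict:
    """Stand-in for the unoptimized program; deliberately wrong when a == 7."""
    a, b = (int(t) for t in question.split(" + "))
    answer = a + b if a != 7 else 0  # simulate an occasional failure
    return {"reasoning": f"{a} plus {b} is {answer}", "answer": str(answer)}


def exact_match(example: dict, pred: dict) -> bool:
    """Metric: the predicted answer must equal the labeled answer."""
    return example["answer"] == pred["answer"]


def bootstrap(trainset: list[dict], program, metric, max_demos: int = 2) -> list[dict]:
    """Collect up to max_demos traces that the metric accepts."""
    demos = []
    for example in trainset:
        pred = program(example["question"])
        if metric(example, pred):
            demos.append({**example, **pred})  # label plus generated reasoning
        if len(demos) == max_demos:
            break
    return demos


trainset = [
    {"question": "7 + 1", "answer": "8"},
    {"question": "2 + 3", "answer": "5"},
    {"question": "4 + 4", "answer": "8"},
    {"question": "1 + 1", "answer": "2"},
]
demos = bootstrap(trainset, teacher, exact_match)
print([d["question"] for d in demos])  # ['2 + 3', '4 + 4']
```

A random-search variant would repeat this over shuffled training sets and keep whichever demo set scores best on held-out data.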

BootstrapFinetune

Weight fine-tuning optimization

Train model weights on task data
Combine with prompt optimization
Support various model types
Cross-model compatibility

BootstrapFinetune fine-tunes model weights for better performance

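GEPA: Genetic-Pareto Reflective Prompt Evolution

Reflective prompt optimization

Uses an LM to reflect on execution traces and textual feedback
Proposes revised instructions based on that reflection
Keeps a Pareto frontier of candidates across training examples
Aims for large gains from relatively few rollouts

GEPA improves prompts through reflection on what went wrong rather than by generating examples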

BetterTogether

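Combining prompt optimization with weight fine-tuning

Alternates prompt optimization and fine-tuning steps
Each step starts from the program produced by the previous one
The prompt-optimized program generates better training data for fine-tuning
Improved overall performance

Prompt optimization and fine-tuning can work together for better results than either alone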

Model Configuration

Setting up language models

OpenAI: gpt-4o, gpt-4o-mini
Anthropic: claude-3-sonnet
Gemini: gemini-2.5-flash
Local models: Ollama, SGLang
50+ providers supported

DSPy supports a wide range of language model providers

LM Configuration Examples

# OpenAI configuration
lm_openai = dspy.LM("openai/gpt-5-mini", api_key="YOUR_API_KEY")
dspy.configure(lm=lm_openai)

# Anthropic configuration
lm_anthropic = dspy.LM("anthropic/claude-sonnet-4-5-20250929", api_key="YOUR_API_KEY")
dspy.configure(lm=lm_anthropic)

# Local Ollama configuration
lm_local = dspy.LM("ollama_chat/llama3.2:1b", api_base="http://localhost:11434", api_key="")
dspy.configure(lm=lm_local)

# Databricks configuration
lm_databricks = dspy.LM("databricks/databricks-llama-4-maverick", 
                       api_key="YOUR_TOKEN", 
                       api_base="YOUR_URL")

DSPy provides a unified API for different model providers

Retrieval-Augmented Generation (RAG)

Integrating external knowledge

ColBERTv2 for document retrieval
Vector database integration
Context-aware responses
Reduced hallucination

DSPy makes it easy to build RAG systems

RAG Implementation with DSPy

class RAG(dspy.Module):
    def __init__(self, num_docs=5):
        self.num_docs = num_docs
        self.retrieve = dspy.ColBERTv2(url="http://20.102.90.50:2017/wiki17_abstracts")
        self.respond = dspy.ChainOfThought("context, question -> response")
    
    def forward(self, question):
        # Retrieve relevant documents and keep only their text
        passages = self.retrieve(question, k=self.num_docs)
        context = [p["text"] for p in passages]
        # Generate response with context
        return self.respond(context=context, question=question)

DSPy RAG systems integrate retrieval and generation seamlessly

Multi-Stage Pipeline Example

Complex AI system composition

Outline generation module
Section drafting module
Content optimization
Quality evaluation

DSPy enables building complex multi-stage pipelines

Multi-Stage Article Generation

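# Signatures assumed by the module below (field names inferred from how forward() uses them)
class Outline(dspy.Signature):
    """Outline a thorough overview of a topic."""
    topic: str = dspy.InputField()
    title: str = dspy.OutputField()
    section_subheadings: dict[str, list[str]] = dspy.OutputField(
        desc="mapping from section headings to subheadings"
    )

class DraftSection(dspy.Signature):
    """Draft a top-level section of an article."""
    topic: str = dspy.InputField()
    section_heading: str = dspy.InputField()
    section_subheadings: list[str] = dspy.InputField()
    content: str = dspy.OutputField(desc="markdown-formatted section")

class ArticleGenerator(dspy.Module):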
    def __init__(self):
        self.build_outline = dspy.ChainOfThought(Outline)
        self.draft_section = dspy.ChainOfThought(DraftSection)
    
    def forward(self, topic):
        # Generate article outline
        outline = self.build_outline(topic=topic)
        sections = []
        
        # Draft each section
        for heading, subheadings in outline.section_subheadings.items():
            section = self.draft_section(
                topic=outline.title,
                section_heading=f"## {heading}",
                section_subheadings=[f"### {s}" for s in subheadings]
            )
            sections.append(section.content)
        
        return dspy.Prediction(title=outline.title, sections=sections)

Multi-stage pipelines can be optimized end-to-end

Classification with DSPy

Structured classification tasks

Multi-class classification
Multi-label classification
Probabilistic outputs
Custom metrics

DSPy supports various classification scenarios

Classification Implementation

from typing import Literal

class SentimentClassification(dspy.Signature):
    """Classify sentiment of text with toxicity score."""
    text: str = dspy.InputField()
    sentiment: Literal["positive", "negative", "neutral"] = dspy.OutputField()
    toxicity: float = dspy.OutputField()

class Classifier(dspy.Module):
    def __init__(self):
        self.classify = dspy.Predict(SentimentClassification)
    
    def forward(self, text):
        return self.classify(text=text)

DSPy classifiers can output structured data, including a constrained label and numeric scores such as toxicity
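
Typed output fields imply a validation step between the raw completion and the value the caller sees. The sketch below shows one way such checks could look in plain Python; `parse_sentiment` and `parse_score` are hypothetical helpers, and the 0 to 1 range for the score is an assumption, not something the signature above enforces.

```python
from typing import Literal, get_args

Sentiment = Literal["positive", "negative", "neutral"]


def parse_sentiment(raw: str) -> str:
    """Normalize the label and reject anything outside the Literal's values."""
    label = raw.strip().lower()
    if label not in get_args(Sentiment):
        raise ValueError(f"unexpected label {raw!r}; expected one of {get_args(Sentiment)}")
    return label


def parse_score(raw: str) -> float:
    """Parse a float and check the assumed [0, 1] range."""
    value = float(raw)
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"score {value} is outside [0, 1]")
    return value


print(parse_sentiment(" Positive "), parse_score("0.12"))
```

Failing loudly on an out-of-vocabulary label is usually preferable to passing it downstream, because it gives a retry or fallback mechanism something to react to.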

Information Extraction

Structured data from unstructured text

Named entity recognition
Relation extraction
Event extraction
Custom extraction schemas

DSPy can extract structured information from text

Information Extraction Example

class ExtractInfo(dspy.Signature):
    """Extract structured information from text."""
    
    text: str = dspy.InputField()
    title: str = dspy.OutputField()
    headings: list[str] = dspy.OutputField()
    entities: list[dict[str, str]] = dspy.OutputField(
        desc="a list of entities and their metadata"
    )

extractor = dspy.Predict(ExtractInfo)
text = "Apple Inc. announced its latest iPhone 14 today. The CEO, Tim Cook, highlighted its new features in a press release."
result = extractor(text=text)
print(result.entities)  # [{'name': 'Apple Inc.', 'type': 'Organization'}, ...]

DSPy extractors can produce structured outputs from raw text

Code Generation

AI-powered programming assistance

Function generation
Class generation
Algorithm implementation
Code optimization

DSPy can generate and refine code

Code Generation with a Signature

class CodeGenerator(dspy.Signature):
    """Generate Python code for given specification."""
    
    specification: str = dspy.InputField()
    code: str = dspy.OutputField(desc="Python code implementation")
    explanation: str = dspy.OutputField(desc="Code explanation")

class ProgramGenerator(dspy.Module):
    def __init__(self):
        # ChainOfThought returns the code as an output field.
        # dspy.ProgramOfThought is a different tool: it writes code internally,
        # executes it, and returns the computed answer rather than the code.
        self.generate = dspy.ChainOfThought(CodeGenerator)
    
    def forward(self, spec):
        return self.generate(specification=spec)

A signature with a code output field returns generated code; ProgramOfThought instead executes generated code to compute an answer

Evaluation Metrics

Measuring AI system performance

Answer exact match
Semantic F1 score
Custom evaluation functions
Automated testing

DSPy provides built-in metrics and supports custom evaluation
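
To make the first two bullets concrete, here is a plain-Python sketch of exact match and a token-overlap F1. The normalization (lowercasing, dropping punctuation and English articles) is an assumption about what a match should ignore, and `token_f1` is the classic word-overlap score, a simpler stand-in for an LM-judged semantic F1.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(pred: str, gold: str) -> bool:
    """True when the normalized strings are identical."""
    return normalize(pred) == normalize(gold)


def token_f1(pred: str, gold: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens, gold_tokens = normalize(pred).split(), normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower.", "eiffel tower"))  # True
print(round(token_f1("Paris France", "Paris"), 3))       # 0.667
```

Exact match suits short factual answers; an overlap or semantic score is more forgiving for longer responses where wording varies.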

Evaluation Implementation

import dspy

# Built-in metrics
answer_exact_match = dspy.evaluate.answer_exact_match
semantic_f1 = dspy.evaluate.SemanticF1(decompositional=True)

# Custom evaluation function: DSPy calls metrics as metric(example, pred, trace=None)
def custom_accuracy(example, pred, trace=None):
    """Case- and whitespace-insensitive exact match on the answer field."""
    return pred.answer.lower().strip() == example.answer.lower().strip()

# Evaluate a program (rag_system and devset are assumed to be defined elsewhere)
evaluator = dspy.Evaluate(devset=devset, num_threads=24, display_progress=True)
# Aggregate score; recent releases wrap it in a result object
result = evaluator(rag_system, metric=custom_accuracy)

Evaluation metrics help measure and improve system performance

Cross-Model Portability

Same system, different models

Model-agnostic program design
Easy model switching
Performance comparison
Best model selection

DSPy programs can run on different models without changes

Cross-Model Example

# Define system once
math_system = dspy.ChainOfThought("question -> answer: float")

# Switch between models easily
# OpenAI
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))
result_gpt = math_system(question="What is 15 * 27?")

# Anthropic
dspy.configure(lm=dspy.LM("anthropic/claude-sonnet-4-5-20250929"))
result_claude = math_system(question="What is 15 * 27?")

# Compare results
print(f"GPT-4o-mini: {result_gpt.answer}")
print(f"Claude: {result_claude.answer}")

Same system, different models - easy comparison and optimization

Research and Publications

DSPy academic foundation

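DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines (ICLR 2024)
Demonstrate-Search-Predict: Composing Retrieval and Language Models for Knowledge-Intensive NLP (arXiv 2022)
MIPROv2: Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs (arXiv 2024)
BetterTogether: Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together (arXiv 2024)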

DSPy is built on solid research foundation

Real-World Applications

DSPy in production

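STORM: Writing Wikipedia-like articles from scratch
IReRa: Extreme multi-label classification (Infer-Retrieve-Rank)
PAPILLON: Privacy-preserving use of local and API-based LMs
PATH: Prompt-optimized synthetic data for training IR models
WangLab@MEDIQA: Medical error detection and correction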

DSPy powers many state-of-the-art AI systems

DSPy Ecosystem

Growing community and tools

250+ contributors on GitHub
Tutorials and documentation
Pre-built modules and optimizers
Active Discord community

DSPy has a thriving open-source ecosystem

Installation and Setup

Getting started with DSPy

pip install -U dspy
Simple LM configuration
Available datasets integration
Examples and tutorials

Easy installation and comprehensive documentation

Quick Start Example

# Install DSPy
# pip install -U dspy

import dspy

# Configure language model
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# Create a simple module
math = dspy.ChainOfThought("question -> answer: float")

# Use it
result = math(question="What is the square root of 144?")
print(f"Answer: {result.answer}")
print(f"Reasoning: {result.reasoning}")

Simple example demonstrates DSPy's ease of use

Performance Optimization

Making DSPy systems faster

Caching for repeated queries
Batch processing
Asynchronous execution
Model optimization strategies

DSPy systems can be optimized for production use
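
Two of these ideas, caching and batching, can be shown without any LM at all. In the sketch below `cached_lm` is a hypothetical stand-in for an expensive model call; `functools.lru_cache` and a thread pool are standard-library substitutes for the caching and multi-threading a framework would provide.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache


@lru_cache(maxsize=1024)
def cached_lm(prompt: str) -> str:
    """Stand-in for an LM call; identical prompts are served from the cache."""
    return prompt.upper()


def run_batch(prompts: list[str], num_threads: int = 4) -> list[str]:
    """Fan the prompts out over worker threads, preserving input order."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(cached_lm, prompts))


prompts = ["alpha", "beta", "gamma"]
first = run_batch(prompts)   # three cache misses
second = run_batch(prompts)  # three cache hits, no new "LM" work
info = cached_lm.cache_info()
print(first, info.hits, info.misses)
```

Threads suit this workload because LM calls are I/O-bound; the cache pays off during development, when the same inputs are re-run many times.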

Performance Optimization Techniques

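# Caching is on by default; pass cache=False to disable it
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini", cache=True))

math = dspy.ChainOfThought("question -> answer: float")

# Batch processing: run many inputs across worker threads
def batch_process(questions):
    examples = [dspy.Example(question=q).with_inputs("question") for q in questions]
    return math.batch(examples, num_threads=8)

# Asynchronous execution: wrap the module so it can be awaited
import asyncio
async_math = dspy.asyncify(math)

async def async_process(question):
    return await async_math(question=question)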

Various techniques to improve performance

Debugging and Monitoring

Understanding system behavior

Trace logging
Performance metrics
Error tracking
Output validation

DSPy provides tools for debugging and monitoring
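
Trace logging and error tracking boil down to recording every call with its inputs, output, and timing. The decorator below is a plain-Python sketch of that idea; `traced` and `HISTORY` are hypothetical names and do not correspond to DSPy's own inspection tools.

```python
import functools
import time

HISTORY: list[dict] = []


def traced(fn):
    """Record inputs, output or error, and latency for every call."""
    @functools.wraps(fn)
    def wrapper(**kwargs):
        start = time.perf_counter()
        record = {"module": fn.__name__, "inputs": kwargs,
                  "output": None, "error": None}
        try:
            record["output"] = fn(**kwargs)
            return record["output"]
        except Exception as exc:
            record["error"] = repr(exc)
            raise
        finally:
            record["seconds"] = time.perf_counter() - start
            HISTORY.append(record)
    return wrapper


@traced
def divide(a: float, b: float) -> float:
    return a / b


divide(a=6, b=3)
try:
    divide(a=1, b=0)
except ZeroDivisionError:
    pass  # the failure is still recorded in HISTORY

for entry in HISTORY:
    print(entry["module"], entry["inputs"], entry["output"], entry["error"])
```

Recording failures alongside successes is the useful part: the inputs that triggered an error are exactly what is needed to reproduce it.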

Advanced Features

Powerful DSPy capabilities

Output validation and retries (dspy.Refine and dspy.BestOfN; earlier releases used assertions)
Program templates
Multi-modal support
Custom adapters

DSPy offers many advanced features for complex systems
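
Validation is most useful when a failed check triggers another attempt. The sketch below shows a generic best-of-N loop in plain Python; `best_of_n`, `fake_generate`, and `one_word` are hypothetical names, and the interfaces of DSPy's own validation modules are not reproduced here.

```python
def best_of_n(generate, reward_fn, n: int = 3, threshold: float = 1.0):
    """Sample up to n candidates; stop early once one meets the threshold."""
    best, best_reward = None, float("-inf")
    for attempt in range(n):
        candidate = generate(attempt)
        reward = reward_fn(candidate)
        if reward > best_reward:
            best, best_reward = candidate, reward
        if reward >= threshold:
            break
    return best, best_reward


def fake_generate(attempt: int) -> str:
    """Stand-in for repeated LM sampling: each attempt gives a new candidate."""
    return ["a very long rambling answer indeed", "two words", "four"][attempt]


def one_word(text: str) -> float:
    """Reward 1.0 only for single-word answers."""
    return 1.0 if len(text.split()) == 1 else 0.0


answer, reward = best_of_n(fake_generate, one_word)
print(answer, reward)  # four 1.0
```

Returning the best candidate even when none reaches the threshold lets the caller decide whether a partial result is acceptable.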

DSPy vs Traditional Prompting

Comparison with traditional approaches

Declarative vs imperative programming
Automatic vs manual optimization
Modular vs monolithic design
Maintainable vs fragile systems

DSPy represents a paradigm shift in AI system design

DSPy vs Traditional Approaches

Feature	Traditional Prompting	DSPy
Code Structure	String manipulation	Declarative modules
Optimization	Manual tweaking	Automatic optimization
Portability	Model-specific	Cross-model
Maintenance	Brittle prompts	Structured code
Performance	Manual optimization	Algorithmic optimization

Best Practices

Effective DSPy development

Start with simple signatures
Use appropriate module types
Optimize with relevant metrics
Test with diverse inputs
Iterative improvement

Follow best practices for better results

Future Directions

DSPy evolution

Advanced optimizers
Multi-modal extensions
Real-time optimization
Broader model support
Improved tool integration

DSPy continues to evolve with new capabilities

Conclusion

DSPy's Impact

Revolutionizes AI system development
Democratizes advanced AI capabilities
Improves reliability and maintainability
Enables faster innovation

DSPy represents the future of AI system development

References

DSPy GitHub: https://github.com/stanfordnlp/dspy
Official Documentation: https://dspy.ai/
Tutorials: https://dspy.ai/tutorials/
Discord Community: https://discord.gg/XCGy2WDCQB

Thanks for reading!
Revisit this article at https://atcfu.com/ai-articles/dspy-programming-language-models/