🧠 DSPy: Programming Language Models

Declarative Self-Improving Python for Modular AI Systems

Source-Level Analysis · Stanford NLP
2026-04-17 | Daily Technical Deep Dive

What is DSPy?

The framework for programming—not prompting—language models
  • Declarative framework for building modular AI software
  • Composes natural-language modules with different models
  • Offers algorithms for optimizing prompts and weights
  • Creates reliable, maintainable, portable AI systems

Think of DSPy as a higher-level language for AI programming, like the shift from assembly to C

DSPy Core Philosophy

From brittle prompts to structured code
  • Replace prompt engineering with declarative programming
  • Focus on behavior rather than implementation details
  • Automatic prompt compilation and optimization
  • Model-agnostic AI system design

DSPy shifts focus from tinkering with prompt strings to programming with structured modules

Key Concepts

Building blocks of DSPy
  • Signatures: Define input/output behavior
  • Modules: AI system components with strategies
  • Optimizers: Tune prompts and weights
  • Programs: Composed modules for complex tasks

Each concept addresses a specific aspect of building robust AI systems

DSPy Architecture Overview

┌─────────────────┐     ┌─────────────────┐
│   Signatures    │────▶│     Modules     │
│ (Behavior Spec) │     │  (Components)   │
└─────────────────┘     └─────────────────┘
         │                       │
         ▼                       ▼
┌─────────────────┐     ┌─────────────────┐
│   Optimizers    │────▶│    Programs     │
│ (Tune/Compile)  │     │ (Composed Logic)│
└─────────────────┘     └─────────────────┘
         │                       │
         └───────────┬───────────┘
                     ▼
            ┌─────────────────┐
            │ Language Models │
            │ (OpenAI, etc.)  │
            └─────────────────┘

Architecture showing how signatures define behavior, modules implement strategies, and optimizers compile programs

Signatures: The Heart of DSPy

Defining AI behavior declaratively
  • Natural language input/output specifications
  • Type hints for structured outputs
  • Field definitions with descriptions
  • Automatic prompt generation

Signatures separate what the AI should do from how it does it

Signature Definition Example

class MathQuestion(dspy.Signature):
    """Solve mathematical word problems."""
    
    question: str = dspy.InputField(desc="The mathematical problem to solve")
    answer: float = dspy.OutputField(desc="The numerical answer")
    reasoning: str = dspy.OutputField(desc="Step-by-step reasoning")

Signature defines the interface and behavior for a mathematical problem solver

Signature Processing Flow

From signature to executable prompt
  • Parse signature into template
  • Attach few-shot demonstrations, if any
  • Construct final prompt
  • Execute and parse response

DSPy automatically converts signatures into effective prompts
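
The flow above can be sketched in plain Python. This is a simplified illustration only, not DSPy's adapter code: parse_signature and build_prompt are hypothetical helpers, and the prompt layout is an assumption made for readability.

```python
def parse_signature(spec: str):
    """Split 'in1, in2 -> out1: type' into input names and (name, type) outputs."""
    left, right = spec.split("->")
    inputs = [name.strip() for name in left.split(",") if name.strip()]
    outputs = []
    for field in right.split(","):
        name, _, type_name = field.partition(":")
        outputs.append((name.strip(), type_name.strip() or "str"))
    return inputs, outputs


def build_prompt(spec: str, demos: list[dict], **inputs) -> str:
    """Render demonstrations and the current inputs as labeled fields (toy format)."""
    input_names, outputs = parse_signature(spec)
    output_names = [name for name, _ in outputs]
    lines = [f"Given {', '.join(input_names)}, produce {', '.join(output_names)}.", ""]
    for demo in demos:  # few-shot demonstrations, if any
        for name in input_names + output_names:
            lines.append(f"{name}: {demo[name]}")
        lines.append("")
    for name in input_names:
        lines.append(f"{name}: {inputs[name]}")
    first_output, type_name = outputs[0]
    lines.append(f"{first_output} ({type_name}):")  # the LM completes from here
    return "\n".join(lines)


prompt = build_prompt(
    "question -> answer: float",
    demos=[{"question": "What is 1+1?", "answer": 2.0}],
    question="What is 2+2?",
)
print(prompt)
```

The last step of the flow, parsing the response, would read the text after the final label and cast it to the declared type.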

Automatic Prompt Generation

# DSPy automatically converts this signature:
class SentimentAnalysis(dspy.Signature):
    text: str = dspy.InputField()
    sentiment: str = dspy.OutputField()

# Into a prompt along these lines (simplified; the real adapter output is more structured):
"""Analyze the sentiment of the following text:

Text: {text}

Sentiment (positive/negative/neutral):"""

DSPy handles the low-level prompt engineering automatically

Module Types

Building blocks for AI systems
  • dspy.Predict: Basic LM calls
  • dspy.ChainOfThought: Reasoning chains
  • dspy.ReAct: Tool-using agents
  • dspy.MultiChainComparison: Ensemble methods
  • dspy.ProgramOfThought: Code generation

Each module type provides a different strategy for LM interaction

Basic Module Usage

# Simple prediction module
predict = dspy.Predict("question -> answer")
result = predict(question="What is 2+2?")
print(result.answer)  # "4"

# Chain of Thought module
cot = dspy.ChainOfThought("question -> answer: float")
math_result = cot(question="What's the probability of rolling a 7 with two dice?")
print(math_result.reasoning)  # Detailed reasoning
print(math_result.answer)     # 0.166667

Modules provide higher-level abstractions over direct LM calls
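
The expected answer in the dice example can be checked by brute force. The snippet below is independent of DSPy and simply enumerates the outcomes.

```python
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of two six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))
favorable = [pair for pair in outcomes if sum(pair) == 7]

probability = Fraction(len(favorable), len(outcomes))
print(probability, round(float(probability), 6))  # 1/6 0.166667
```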

Chain of Thought Module

Step-by-step reasoning with LMs
  • Generates intermediate reasoning steps
  • Improves accuracy for complex tasks
  • Supports mathematical and logical reasoning
  • Preserves reasoning in output

Chain of Thought helps LMs break down complex problems

Chain of Thought Implementation

class MathProblemSolver(dspy.Module):
    def __init__(self):
        self.solve = dspy.ChainOfThought("question -> answer: float")
    
    def forward(self, question):
        result = self.solve(question=question)
        return dspy.Prediction(
            answer=result.answer,
            reasoning=result.reasoning
        )

Chain of Thought modules automatically include reasoning in their output

ReAct Module

Reasoning and Acting with Tools
  • Combines reasoning with tool usage
  • Step-by-step execution
  • Multiple tool calls per query
  • Tool integration and output parsing

ReAct enables complex problem solving through iterative reasoning and tool use
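
A minimal sketch of the reason-act-observe loop, assuming a scripted policy in place of a language model. The tools, the policy, and the react function here are hypothetical stand-ins for illustration; dspy.ReAct's actual control flow is not shown.

```python
from typing import Callable


def lookup_birth_year(name: str) -> str:
    """Hypothetical lookup tool backed by a tiny in-memory table."""
    return {"Ada Lovelace": "1815"}.get(name, "unknown")


def divide(expression: str) -> str:
    """Hypothetical calculator tool for 'a / b' expressions."""
    numerator, denominator = (float(part) for part in expression.split("/"))
    return str(numerator / denominator)


TOOLS: dict[str, Callable[[str], str]] = {
    "lookup_birth_year": lookup_birth_year,
    "divide": divide,
}


def scripted_policy(question: str, trajectory: list[tuple[str, str, str]]) -> tuple[str, str]:
    """Stand-in for the LM: pick the next (tool, argument) from the trajectory so far."""
    if not trajectory:
        return "lookup_birth_year", "Ada Lovelace"
    if len(trajectory) == 1:
        year = trajectory[0][2]  # observation from the first step
        return "divide", f"3630 / {year}"
    return "finish", trajectory[-1][2]


def react(question: str, max_iters: int = 5) -> str:
    trajectory: list[tuple[str, str, str]] = []
    for _ in range(max_iters):
        tool, argument = scripted_policy(question, trajectory)
        if tool == "finish":
            return argument
        observation = TOOLS[tool](argument)  # act, then observe
        trajectory.append((tool, argument, observation))
    return "no answer within the iteration budget"


answer = react("What is 3630 divided by the year Ada Lovelace was born?")
print(answer)  # 2.0
```

In a real agent the policy is an LM call that sees the question, the tool descriptions, and the trajectory, which is why docstrings and type hints on tools matter.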

ReAct Agent Example

def search_wikipedia(query: str) -> list[str]:
    """Search Wikipedia for relevant information"""
    results = dspy.ColBERTv2(url="http://20.102.90.50:2017/wiki17_abstracts")(query, k=3)
    return [x["text"] for x in results]

def calculate(expression: str) -> float:
    """Evaluate mathematical expression"""
    return dspy.PythonInterpreter({}).execute(expression)

# Create ReAct agent with tools
react = dspy.ReAct("question -> answer", tools=[search_wikipedia, calculate])
result = react(question="What is 9362158 divided by the year David Gregory was born?")

ReAct agents can use multiple tools to solve complex problems

Multi-Chain Comparison

Ensemble reasoning for better results
  • Generate multiple reasoning chains
  • Compare and rank results
  • Select best answer
  • Improve reliability and accuracy

Multiple perspectives lead to more robust answers

Multi-Chain Comparison Flow

┌─────────────────┐      ┌─────────────────┐
│      Input      │─────▶│     Chain 1     │
│   (Question)    │      │    Reasoning    │
└─────────────────┘      └─────────────────┘
         │                        │
         ▼                        ▼
┌─────────────────┐      ┌─────────────────┐
│     Chain 2     │─────▶│     Chain N     │
│    Reasoning    │      │    Reasoning    │
└─────────────────┘      └─────────────────┘
         │                        │
         └────────────┬───────────┘
                      ▼
             ┌─────────────────┐
             │   Comparator    │
             │  (Select Best)  │
             └─────────────────┘
                      │
                      ▼
             ┌─────────────────┐
             │  Final Answer   │
             └─────────────────┘

Multiple reasoning chains are compared to select the best result
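
One simple way to pick among several chains is a majority vote over their answers. Note that this swaps in a different technique for illustration: a DSPy comparison module would ask an LM to weigh the candidate chains, whereas the hypothetical select_by_majority below only counts answers.

```python
from collections import Counter


def select_by_majority(candidates: list[dict]) -> dict:
    """Return the first candidate whose answer is the most common one."""
    counts = Counter(candidate["answer"] for candidate in candidates)
    best_answer, _ = counts.most_common(1)[0]
    return next(c for c in candidates if c["answer"] == best_answer)


chains = [
    {"reasoning": "6 of 36 outcomes sum to 7", "answer": "1/6"},
    {"reasoning": "miscounted the outcomes", "answer": "1/12"},
    {"reasoning": "for each first die, exactly one second die works", "answer": "1/6"},
]
best = select_by_majority(chains)
print(best["answer"])  # 1/6
```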

Program of Thought

Generating and executing programs
  • LM generates code as reasoning
  • Code execution for verification
  • Program synthesis and refinement
  • Complex task automation

Program of Thought combines code generation with execution
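
The generate-then-execute idea can be sketched without an LM by hard-coding the "generated" program. run_generated_code is a hypothetical helper, and the restricted namespace is only a gesture at isolation, not a real sandbox.

```python
def run_generated_code(code: str, result_name: str = "result"):
    """Execute code in a fresh namespace that exposes only a few builtins."""
    allowed_builtins = {"sum": sum, "range": range, "len": len, "round": round}
    namespace = {"__builtins__": allowed_builtins}
    exec(code, namespace)  # NOT a real sandbox; never run untrusted code this way
    return namespace[result_name]


# Hard-coded stand-in for code an LM might write for the question
# "What is the sum of the squares of 1 through 10?"
generated_code = "result = sum(n * n for n in range(1, 11))"

answer = run_generated_code(generated_code)
print(answer)  # 385
```

The answer comes from executing the code rather than from the LM's own arithmetic, which is the point of the technique.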

Optimizers: DSPy's Superpower

Automatically improving AI systems
  • Tune prompts automatically
  • Fine-tune LM weights
  • Learn from training data
  • Cross-model portability

Optimizers replace manual prompt engineering with automatic optimization

Optimizer Types

Different optimization strategies
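  • BootstrapFewShot: Builds few-shot demonstrations from the program's own successful runs
  • MIPROv2: Jointly optimizes instructions and few-shot demonstrations
  • BootstrapFinetune: Weight fine-tuning
  • GEPA: Reflective prompt evolution (Genetic-Pareto)
  • BetterTogether: Combines prompt optimization with fine-tuning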

Each optimizer addresses different aspects of system optimization

Basic Optimization Example

import dspy
from dspy.datasets import HotPotQA

# Configure language model
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# Define training data
trainset = [x.with_inputs('question') for x in HotPotQA(train_seed=2024, train_size=500).train]

# Create ReAct agent
react = dspy.ReAct("question -> answer", tools=[search_wikipedia])

# Optimize with MIPROv2
tp = dspy.MIPROv2(metric=dspy.evaluate.answer_exact_match, auto="light", num_threads=24)
optimized_react = tp.compile(react, trainset=trainset)

In an informal run reported in the DSPy documentation, this setup raised ReAct's score on HotPotQA from 24% to 51%

MIPROv2 Optimization Process

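Multiprompt Instruction Proposal Optimizer, version 2
  • Bootstrapping: Run the program on training data and keep traces that pass the metric
  • Grounded Proposal: Draft candidate instructions grounded in the data, the program, and the bootstrapped traces
  • Discrete Search: Evaluate instruction and demonstration combinations, guided by Bayesian optimization
  • Iterative improvement

MIPROv2 jointly optimizes the instructions and few-shot demonstrations of every module in a program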

MIPROv2 Optimization Pipeline

┌─────────────────┐      ┌─────────────────┐
│    Training     │─────▶│  Bootstrapping  │
│      Data       │      │  (Filter Good)  │
└─────────────────┘      └─────────────────┘
         │                        │
         ▼                        ▼
┌─────────────────┐      ┌─────────────────┐
│     Program     │─────▶│    Grounded     │
│    Structure    │      │    Proposal     │
└─────────────────┘      └─────────────────┘
         │                        │
         ▼                        ▼
┌─────────────────┐      ┌─────────────────┐
│  Instructions   │─────▶│    Discrete     │
│   & Examples    │      │     Search      │
└─────────────────┘      └─────────────────┘
                                  │
                                  ▼
                         ┌─────────────────┐
                         │    Optimized    │
                         │     Program     │
                         └─────────────────┘

MIPROv2 uses a sophisticated multi-stage optimization process

BootstrapFewShot with Random Search

Effective few-shot learning
  • Bootstrap high-quality examples
  • Random search for best combinations
  • Candidate programs scored on a validation set
  • Scalable to large datasets

BootstrapFewShotWithRandomSearch repeats bootstrapping over random candidate demonstration sets and keeps the best-scoring program
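
The two ideas, bootstrapping demonstrations and randomly searching over subsets of them, can be sketched on a toy task. Everything here is a hypothetical stand-in: the teacher and student are plain functions rather than LM programs, and the task (labeling numbers as small or large) is chosen only so the example runs offline.

```python
import random


def gold_label(x: int) -> str:
    return "large" if x > 50 else "small"


def metric(gold: str, predicted: str) -> bool:
    return gold == predicted


def teacher(x: int) -> str:
    """Hypothetical imperfect teacher: wrong on inputs between 40 and 60."""
    correct = gold_label(x)
    if 40 <= x <= 60:
        return "small" if correct == "large" else "large"
    return correct


def student(demos: list[tuple[int, str]], x: int) -> str:
    """Toy 'program': copy the label of the demonstration with the nearest input."""
    return min(demos, key=lambda demo: abs(demo[0] - x))[1]


trainset = [(x, gold_label(x)) for x in (5, 20, 45, 55, 70, 95)]
valset = [(x, gold_label(x)) for x in (10, 30, 65, 90)]

# Step 1 (bootstrap): keep only teacher outputs that pass the metric.
bootstrapped = [(x, teacher(x)) for x, gold in trainset if metric(gold, teacher(x))]


# Step 2 (random search): sample demonstration subsets, keep the best on valset.
def accuracy(demos: list[tuple[int, str]]) -> float:
    return sum(metric(gold, student(demos, x)) for x, gold in valset) / len(valset)


rng = random.Random(0)
candidates = [rng.sample(bootstrapped, k=2) for _ in range(10)]
best_demos = max(candidates, key=accuracy)
print(bootstrapped)
print(best_demos, accuracy(best_demos))
```

The teacher's two mistakes (45 and 55) never become demonstrations, and the search prefers subsets that cover both labels.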

BootstrapFinetune

Weight fine-tuning optimization
  • Train model weights on task data
  • Combine with prompt optimization
  • Support various model types
  • Cross-model compatibility

BootstrapFinetune fine-tunes model weights for better performance

GEPA: Reflective Prompt Evolution (Genetic-Pareto)

Reflection-driven prompt optimization
  • Reflects on execution traces in natural language
  • Uses textual feedback from the metric, not only scalar scores
  • Proposes revised instructions based on those reflections
  • Keeps a Pareto front of candidate prompts

GEPA evolves prompts by having an LM reflect on what went wrong and why

BetterTogether

Composed optimization strategies
  • Combines prompt optimization with weight fine-tuning
  • Applies the two in sequence
  • Each stage builds on the program produced by the previous one
  • Improved overall performance

Prompt optimization and fine-tuning can work together for better results

Model Configuration

Setting up language models
  • OpenAI: gpt-4o, gpt-4o-mini
  • Anthropic: claude-3-sonnet
  • Gemini: gemini-2.5-flash
  • Local models: Ollama, SGLang
  • Many more providers supported through LiteLLM

DSPy supports a wide range of language model providers

LM Configuration Examples

# OpenAI configuration
lm_openai = dspy.LM("openai/gpt-5-mini", api_key="YOUR_API_KEY")
dspy.configure(lm=lm_openai)

# Anthropic configuration
lm_anthropic = dspy.LM("anthropic/claude-sonnet-4-5-20250929", api_key="YOUR_API_KEY")
dspy.configure(lm=lm_anthropic)

# Local Ollama configuration
lm_local = dspy.LM("ollama_chat/llama3.2:1b", api_base="http://localhost:11434", api_key="")
dspy.configure(lm=lm_local)

# Databricks configuration
lm_databricks = dspy.LM("databricks/databricks-llama-4-maverick", 
                       api_key="YOUR_TOKEN", 
                       api_base="YOUR_URL")

DSPy provides a unified API for different model providers

Retrieval-Augmented Generation (RAG)

Integrating external knowledge
  • ColBERTv2 for document retrieval
  • Vector database integration
  • Context-aware responses
  • Reduced hallucination

DSPy makes it easy to build RAG systems

RAG Implementation with DSPy

class RAG(dspy.Module):
    def __init__(self, num_docs=5):
        self.num_docs = num_docs
        self.retrieve = dspy.ColBERTv2(url="http://20.102.90.50:2017/wiki17_abstracts")
        self.respond = dspy.ChainOfThought("context, question -> response")
    
    def forward(self, question):
        # Retrieve relevant passages and keep only their text
        results = self.retrieve(question, k=self.num_docs)
        context = [x["text"] for x in results]
        # Generate response with context
        return self.respond(context=context, question=question)

DSPy RAG systems integrate retrieval and generation seamlessly
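
The retrieve-then-generate pattern can be sketched with a toy retriever. This is an illustration under stated assumptions: token overlap stands in for ColBERTv2, the corpus is made up for the example, and retrieve and build_rag_prompt are hypothetical helpers.

```python
def tokenize(text: str) -> set[str]:
    return {word.strip(".,?!").lower() for word in text.split()}


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank passages by the number of tokens they share with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(
        corpus,
        key=lambda passage: len(query_tokens & tokenize(passage)),
        reverse=True,
    )
    return ranked[:k]


def build_rag_prompt(question: str, passages: list[str]) -> str:
    """Place the retrieved passages ahead of the question."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return f"Context:\n{context}\n\nQuestion: {question}\nResponse:"


corpus = [
    "ColBERTv2 is a late-interaction neural retrieval model.",
    "HotPotQA is a multi-hop question answering dataset.",
    "Ollama runs language models on a local machine.",
]
question = "What kind of dataset is HotPotQA?"
passages = retrieve(question, corpus, k=1)
rag_prompt = build_rag_prompt(question, passages)
print(rag_prompt)
```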

Multi-Stage Pipeline Example

Complex AI system composition
  • Outline generation module
  • Section drafting module
  • Content optimization
  • Quality evaluation

DSPy enables building complex multi-stage pipelines

Multi-Stage Article Generation

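# Signatures used by the pipeline (field names match the calls below)
class Outline(dspy.Signature):
    """Outline a thorough overview of a topic."""
    topic: str = dspy.InputField()
    title: str = dspy.OutputField()
    sections: list[str] = dspy.OutputField()
    section_subheadings: dict[str, list[str]] = dspy.OutputField(
        desc="mapping from section headings to subheadings"
    )

class DraftSection(dspy.Signature):
    """Draft a top-level section of an article."""
    topic: str = dspy.InputField()
    section_heading: str = dspy.InputField()
    section_subheadings: list[str] = dspy.InputField()
    content: str = dspy.OutputField(desc="markdown-formatted section")

class ArticleGenerator(dspy.Module):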
    def __init__(self):
        self.build_outline = dspy.ChainOfThought(Outline)
        self.draft_section = dspy.ChainOfThought(DraftSection)
    
    def forward(self, topic):
        # Generate article outline
        outline = self.build_outline(topic=topic)
        sections = []
        
        # Draft each section
        for heading, subheadings in outline.section_subheadings.items():
            section = self.draft_section(
                topic=outline.title,
                section_heading=f"## {heading}",
                section_subheadings=[f"### {s}" for s in subheadings]
            )
            sections.append(section.content)
        
        return dspy.Prediction(title=outline.title, sections=sections)

Multi-stage pipelines can be optimized end-to-end

Classification with DSPy

Structured classification tasks
  • Multi-class classification
  • Multi-label classification
  • Probabilistic outputs
  • Custom metrics

DSPy supports various classification scenarios
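
A Literal type hint restricts a field to a fixed label set. The sketch below shows the kind of check such a hint makes possible. parse_sentiment is a hypothetical helper written for this example, not part of DSPy.

```python
from typing import Literal, get_args

Sentiment = Literal["positive", "negative", "neutral"]


def parse_sentiment(raw: str) -> str:
    """Normalize raw LM text and reject labels outside the Literal."""
    label = raw.strip().strip(".").lower()
    allowed = get_args(Sentiment)
    if label not in allowed:
        raise ValueError(f"expected one of {allowed}, got {raw!r}")
    return label


print(parse_sentiment(" Positive. "))  # positive
```

Rejecting an out-of-set label early gives the caller a clear point at which to retry or fall back.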

Classification Implementation

from typing import Literal

class SentimentClassification(dspy.Signature):
    """Classify sentiment of text with toxicity score."""
    text: str = dspy.InputField()
    sentiment: Literal["positive", "negative", "neutral"] = dspy.OutputField()
    toxicity: float = dspy.OutputField()

class Classifier(dspy.Module):
    def __init__(self):
        self.classify = dspy.Predict(SentimentClassification)
    
    def forward(self, text):
        return self.classify(text=text)

DSPy classifiers can return structured fields, here a sentiment label plus a numeric toxicity score

Information Extraction

Structured data from unstructured text
  • Named entity recognition
  • Relation extraction
  • Event extraction
  • Custom extraction schemas

DSPy can extract structured information from text

Information Extraction Example

class ExtractInfo(dspy.Signature):
    """Extract structured information from text."""
    
    text: str = dspy.InputField()
    title: str = dspy.OutputField()
    headings: list[str] = dspy.OutputField()
    entities: list[dict[str, str]] = dspy.OutputField(
        desc="a list of entities and their metadata"
    )

extractor = dspy.Predict(ExtractInfo)
text = "Apple Inc. announced its latest iPhone 14 today. The CEO, Tim Cook, highlighted its new features in a press release."
result = extractor(text=text)
print(result.entities)  # [{'name': 'Apple Inc.', 'type': 'Organization'}, ...]

DSPy extractors can produce structured outputs from raw text

Code Generation

AI-powered programming assistance
  • Function generation
  • Class generation
  • Algorithm implementation
  • Code optimization

DSPy can generate and refine code

Code Generation with ProgramOfThought

class CodeGenerator(dspy.Signature):
    """Generate Python code for given specification."""
    
    specification: str = dspy.InputField()
    code: str = dspy.OutputField(desc="Python code implementation")
    explanation: str = dspy.OutputField(desc="Code explanation")

class ProgramGenerator(dspy.Module):
    def __init__(self):
        self.generate = dspy.ProgramOfThought(CodeGenerator)
    
    def forward(self, spec):
        return self.generate(specification=spec)

ProgramOfThought has the LM write code, executes it, and fills the signature's output fields from the result

Evaluation Metrics

Measuring AI system performance
  • Answer exact match
  • Semantic F1 score
  • Custom evaluation functions
  • Automated testing

DSPy provides built-in metrics and supports custom evaluation
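
Two common answer metrics can be sketched in plain Python. These are simplified illustrations, not DSPy's implementations: the normalization rules are assumptions, and token-level F1 is a lexical measure, unlike an LM-judged semantic F1.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> bool:
    return normalize(prediction) == normalize(gold)


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower.", "eiffel tower"))  # True
print(round(token_f1("Eiffel Tower Paris", "the Eiffel Tower"), 2))  # 0.8
```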

Evaluation Implementation

import dspy

# Built-in metrics
answer_exact_match = dspy.evaluate.answer_exact_match
semantic_f1 = dspy.evaluate.SemanticF1(decompositional=True)

# Custom evaluation function; DSPy calls metrics as metric(example, pred, trace)
def custom_accuracy(example, pred, trace=None):
    """Case-insensitive exact match on the answer field."""
    return pred.answer.lower().strip() == example.answer.lower().strip()

# Evaluate system (devset and rag_system are assumed to be defined elsewhere)
evaluator = dspy.Evaluate(devset=devset, num_threads=24, display_progress=True)
accuracy = evaluator(rag_system, metric=custom_accuracy)

Evaluation metrics help measure and improve system performance

Cross-Model Portability

Same system, different models
  • Model-agnostic program design
  • Easy model switching
  • Performance comparison
  • Best model selection

DSPy programs can run on different models without changes

Cross-Model Example

# Define system once
math_system = dspy.ChainOfThought("question -> answer: float")

# Switch between models easily
# OpenAI
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))
result_gpt = math_system(question="What is 15 * 27?")

# Anthropic
dspy.configure(lm=dspy.LM("anthropic/claude-sonnet-4-5-20250929"))
result_claude = math_system(question="What is 15 * 27?")

# Compare results
print(f"GPT-4o-mini: {result_gpt.answer}")
print(f"Claude: {result_claude.answer}")

Same system, different models - easy comparison and optimization

Research and Publications

DSPy academic foundation
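  • DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines (ICLR 2024)
  • Demonstrate-Search-Predict: Composing Retrieval and Language Models for Knowledge-Intensive NLP (arXiv 2022)
  • MIPRO: Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs (arXiv 2024)
  • BetterTogether: Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together (arXiv 2024)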

DSPy is built on solid research foundation

Real-World Applications

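Systems built with DSPy
  • STORM: Writing Wikipedia-like articles from scratch
  • IReRa: Infer-Retrieve-Rank for extreme multi-label classification
  • PAPILLON: Privacy-preserving delegation between local and hosted LMs
  • PATH: Prompts as auto-optimized training hyperparameters for IR models
  • WangLab@MEDIQA: Medical error detection and correction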

DSPy powers many state-of-the-art AI systems

DSPy Ecosystem

Growing community and tools
  • 250+ contributors on GitHub
  • Tutorials and documentation
  • Pre-built modules and optimizers
  • Active Discord community

DSPy has a thriving open-source ecosystem

Installation and Setup

Getting started with DSPy
  • pip install -U dspy
  • Simple LM configuration
  • Available datasets integration
  • Examples and tutorials

Easy installation and comprehensive documentation

Quick Start Example

# Install DSPy
# pip install -U dspy

import dspy

# Configure language model
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# Create a simple module
math = dspy.ChainOfThought("question -> answer: float")

# Use it
result = math(question="What is the square root of 144?")
print(f"Answer: {result.answer}")
print(f"Reasoning: {result.reasoning}")

Simple example demonstrates DSPy's ease of use

Performance Optimization

Making DSPy systems faster
  • Caching for repeated queries
  • Batch processing
  • Asynchronous execution
  • Model optimization strategies

DSPy systems can be optimized for production use
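
Caching and batching can be illustrated without any LM. In this sketch cached_lm_call is a hypothetical stand-in for an expensive request, and the standard library's lru_cache and thread pool play the roles that a framework's cache and batch runner would.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

call_count = 0
_count_lock = threading.Lock()


@lru_cache(maxsize=1024)
def cached_lm_call(prompt: str) -> str:
    """Hypothetical stand-in for an expensive LM request."""
    global call_count
    with _count_lock:  # keep the counter exact under threads
        call_count += 1
    return prompt.upper()


def batch_call(prompts: list[str], num_threads: int = 4) -> list[str]:
    """Fan prompts out over a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(cached_lm_call, prompts))


first = batch_call(["what is 2+2?", "capital of france?"])
second = batch_call(["what is 2+2?", "capital of france?"])  # served from the cache
print(first == second, call_count)  # True 2
```

Threads suit this workload because LM requests are I/O-bound: the time is spent waiting on the network, not computing.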

Performance Optimization Techniques

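# Caching is on by default; cache=True makes it explicit
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini", cache=True))

# `math` is the ChainOfThought module from the Quick Start example
# Batch processing: Module.batch runs examples across worker threads
def batch_process(questions):
    examples = [dspy.Example(question=q).with_inputs("question") for q in questions]
    return math.batch(examples, num_threads=8)

# Asynchronous execution: acall is the awaitable counterpart of calling the module
import asyncio

async def async_process(question):
    return await math.acall(question=question)

# asyncio.run(async_process("What is 15 * 27?"))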

Various techniques to improve performance

Debugging and Monitoring

Understanding system behavior
  • Trace logging
  • Performance metrics
  • Error tracking
  • Output validation

DSPy provides tools for debugging and monitoring

Advanced Features

Powerful DSPy capabilities
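  • Output validation with retries (dspy.Refine and dspy.BestOfN, which supersede dspy.Assert and dspy.Suggest in recent releases)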
  • Program templates
  • Multi-modal support
  • Custom adapters

DSPy offers many advanced features for complex systems

DSPy vs Traditional Prompting

Comparison with traditional approaches
  • Declarative vs imperative programming
  • Automatic vs manual optimization
  • Modular vs monolithic design
  • Maintainable vs fragile systems

DSPy represents a paradigm shift in AI system design

DSPy vs Traditional Approaches

Feature         | Traditional Prompting | DSPy
----------------+-----------------------+--------------------------
Code Structure  | String manipulation   | Declarative modules
Optimization    | Manual tweaking       | Automatic optimization
Portability     | Model-specific        | Cross-model
Maintenance     | Brittle prompts       | Structured code
Performance     | Manual optimization   | Algorithmic optimization

Best Practices

Effective DSPy development
  • Start with simple signatures
  • Use appropriate module types
  • Optimize with relevant metrics
  • Test with diverse inputs
  • Iterative improvement

Follow best practices for better results

Future Directions

DSPy evolution
  • Advanced optimizers
  • Multi-modal extensions
  • Real-time optimization
  • Broader model support
  • Improved tool integration

DSPy continues to evolve with new capabilities

Conclusion

DSPy's Impact
  • Moves AI system development from prompt strings to structured code
  • Makes prompt and weight optimization available without hand-tuning
  • Improves reliability and maintainability
  • Enables faster iteration

DSPy offers a structured, optimizer-driven approach to AI system development

References

  • DSPy GitHub: https://github.com/stanfordnlp/dspy
  • Official Documentation: https://dspy.ai/
  • Tutorials: https://dspy.ai/tutorials/
  • Discord Community: https://discord.gg/XCGy2WDCQB

Thanks for reading!
Revisit this article at https://atcfu.com/ai-articles/dspy-programming-language-models/