Building Type-Safe AI Agents with PydanticAI — How We Caught 23 Bugs Before Production

Looking back on the first time I deployed an AI agent to production still gives me chills. I plastered try/except blocks all around the JSON-parsing code for LLM responses, and I can't count how many times the entire pipeline collapsed the moment a response came back in a slightly different format. Once, a confidence field came back as "high" instead of 0.87, and it blew up the entire downstream numerical calculation. That's when I thought, "What if we could enforce types on LLM output from the start?" — and PydanticAI hits exactly that point.

PydanticAI is a framework built by the Pydantic team — the same people behind FastAPI — with the philosophy of "doing for AI agent development what FastAPI did for web development." The V1 stable release came out in September 2025, and since then it has surpassed 15 million cumulative downloads and 16,000 GitHub stars. In terms of real-world data, a comparative measurement showed that for equivalent functionality (an analytics agent with structured output, dependency injection, and tool registration), PydanticAI produces 43% less code than LangGraph and caught 23 more type errors during development. If you're still parsing LLM responses as text while reading this, you're likely repeating the same class of bugs over and over.

In this article, we'll walk through PydanticAI's three core mechanisms — type-safe output, dependency injection, and tool registration — with real code examples, and give an honest breakdown of when to choose PydanticAI versus when to reach for another framework.

Core Concepts

Type-Safe Output: Enforcing LLM Responses into Pydantic Models

The biggest differentiator of PydanticAI is that it automatically maps LLM responses to a Pydantic BaseModel. It doesn't just parse JSON — it guarantees data integrity by retrying or raising type exceptions when the format is wrong.

python

from pydantic import BaseModel
from pydantic_ai import Agent
 
class ResponseModel(BaseModel):
    answer: str
    confidence: float
 
agent = Agent(
    model="openai:gpt-4o",
    result_type=ResponseModel,
    system_prompt="You are a helpful assistant."
)
 
result = agent.run_sync("What is the capital of France?")
print(result.data.answer)      # "Paris"
print(result.data.confidence)  # 0.99

In async environments, you can use await agent.run() instead of agent.run_sync(). In this article, we use run_sync() in synchronous contexts and await agent.run() inside async functions. If Python async patterns are unfamiliar to you, run_sync() is more than enough to get started.

All you need to do is pass ResponseModel to result_type. If the LLM returns confidence as a string, PydanticAI will attempt to coerce it automatically, and if that's not possible, it raises a ValidationError and retries up to 3 times by default. I also initially thought, "Can't I just write 'respond in JSON format' in the prompt?" — but the key difference is that prompt instructions can be ignored by the LLM, whereas type validation is enforced at the code level.

Internally, structured output is handled through three paths:

Path	How It Works	Best For
Tool call-based extraction	LLM returns data as a function call	Models with Function Calling support like OpenAI, Anthropic
Provider-managed JSON schema	Uses the model API's `response_format`	Modern models with JSON Mode support
Prompt-injected formatting	Injects schema into the system prompt	Fallback for models without Function Calling support

Pydantic BaseModel is the base class of Pydantic, a Python data validation library. Declare fields with type hints on the class, and type validation and coercion happen automatically at instance creation time.

Dependency Injection: Cleanly Separating Test and Production Code

One reason AI agents are hard to test is that external dependencies like DB connections and API clients get tangled up inside agent logic. Dependency Injection is a pattern where these external objects aren't hardcoded inside the code but are instead passed in from outside at runtime. PydanticAI solves this the same way FastAPI does.

python

from dataclasses import dataclass
from pydantic_ai import Agent, RunContext
 
@dataclass
class Deps:
    db_conn: DatabaseConnection  # Replace with your actual DB client
    api_key: str
 
agent = Agent(
    model="anthropic:claude-3-5-sonnet-latest",
    deps_type=Deps
)
 
@agent.tool
async def get_user_data(ctx: RunContext[Deps], user_id: str) -> dict:
    return await ctx.deps.db_conn.fetch(
        "SELECT * FROM users WHERE id = $1", user_id
    )
 
# Production run
result = await agent.run(
    "Get data for user 123",
    deps=Deps(db_conn=real_db, api_key="prod-key")
)
 
# Test run — just swap in a mock object
result = await agent.run(
    "Get data for user 123",
    deps=Deps(db_conn=mock_db, api_key="test-key")
)

Those who've used it know how convenient it is in practice to run unit tests without a real DB connection. Declare the dependency type with deps_type, and inject the actual object at agent.run(deps=...) time — meaning in a test environment, you can simply swap in a mock object.

Tool Registration: Turning Python Functions into the LLM's Hands and Feet

With just the @agent.tool decorator, you can register an ordinary Python function as a tool the LLM can call. The LLM reads the function's docstring and type hints to decide when to use that tool.

python

@agent.tool
async def query_sales(ctx: RunContext[Deps], region: str) -> list[dict]:
    """Retrieves sales data for the specified region.
    
    Args:
        region: The name of the region to query (e.g., 'seoul', 'busan')
    """
    return await ctx.deps.db_conn.fetch(
        "SELECT * FROM sales WHERE region = $1", region
    )
 
@agent.tool
async def calculate_growth_rate(
    ctx: RunContext[Deps],
    current: float,
    previous: float
) -> float:
    """Calculates the growth rate compared to the previous period."""
    return (current - previous) / previous * 100

Function Calling (tool calling): A feature that instructs the LLM to call a specific function instead of generating text. If the LLM determines that "answering this question requires a DB query," it calls the query_sales function, receives the result, and then generates the final response.

Let's look at how these three mechanisms combine in practice using financial domain code.

Real-World Application

Example 1: Financial Data Analysis Agent

In the financial domain, type safety is not optional — it's required. If revenue comes in as a string or growth_rate is missing as it flows into downstream systems, the consequences are serious. PydanticAI's Pydantic validation acts as that firewall.

python

from pydantic import BaseModel, Field
from pydantic_ai import Agent, RunContext
from dataclasses import dataclass
from typing import Optional
 
class SalesReport(BaseModel):
    region: str
    total_revenue: float = Field(ge=0, description="Total revenue (KRW)")
    growth_rate: float = Field(description="Growth rate vs. previous period (%)")
    top_product: str
    risk_level: str = Field(pattern="^(low|medium|high)$")
    summary: str
    action_items: list[str]
 
@dataclass
class AnalyticsDeps:
    db_conn: DatabaseConnection  # Replace with your actual DB client
    report_date: str
 
agent = Agent(
    model="openai:gpt-4o",
    result_type=SalesReport,
    deps_type=AnalyticsDeps,
    system_prompt="""
    You are a financial analyst. Analyze sales data and provide 
    structured reports with actionable insights.
    """
)
 
@agent.tool
async def get_regional_sales(
    ctx: RunContext[AnalyticsDeps],
    region: str
) -> dict:
    """Retrieves sales data for a specific region."""
    return await ctx.deps.db_conn.fetch(
        "SELECT * FROM sales WHERE region = $1 AND date = $2",
        region, ctx.deps.report_date
    )
 
@agent.tool
async def get_previous_period(
    ctx: RunContext[AnalyticsDeps],
    region: str
) -> Optional[dict]:
    """Retrieves comparison data from the previous period."""
    return await ctx.deps.db_conn.fetch_one(
        "SELECT total_revenue FROM sales WHERE region = $1 AND date < $2 ORDER BY date DESC LIMIT 1",
        region, ctx.deps.report_date
    )
 
# Run
deps = AnalyticsDeps(db_conn=db, report_date="2026-05-01")
result = await agent.run("Analyze May sales performance for the Seoul region", deps=deps)
 
print(result.data.risk_level)    # Guaranteed to be one of "low", "medium", "high"
print(result.data.growth_rate)   # Guaranteed to be float type
print(result.data.action_items)  # Guaranteed to be list[str] type

The validation options on each field are what make this work:

Code Element	Role
`Field(ge=0)`	Blocks negative revenue at the source
`Field(pattern="^(low\|medium\|high)$")`	Only passes allowed values (low, medium, high)
`result_type=SalesReport`	Validates the entire LLM output through Pydantic
`deps_type=AnalyticsDeps`	Decouples DB connection from agent logic

Example 2: Human-in-the-Loop Approval Workflow

For sensitive operations (transfers, deletions, deployments, etc.), when you want to clearly distinguish between "situations where the agent can execute autonomously" and "situations that require human intervention," the structured output pattern provides a clean solution. The approach is to have the model fill in a requires_approval field based on business logic, and handle the branching at the code level.

python

from pydantic import BaseModel
from pydantic_ai import Agent
 
class TransferRequest(BaseModel):
    from_account: str
    to_account: str
    amount: float
    reason: str
    requires_approval: bool
 
agent = Agent(
    model="anthropic:claude-3-5-sonnet-latest",
    result_type=TransferRequest,
    system_prompt="""
    Process transfer requests. For amounts over 1,000,000 KRW,
    set requires_approval to True.
    """
)
 
result = await agent.run(
    "Process a transfer of 1.5 million KRW from account A to account B"
)
 
transfer = result.data
if transfer.requires_approval:
    # The point of human intervention is clearly expressed through types
    await request_human_approval(transfer)
else:
    await execute_transfer(transfer)

A single requires_approval field lets you clearly distinguish "automatic processing" from "cases requiring human review" at the code level. Back when we parsed LLM response text for phrases like "approval required" to branch logic, we'd constantly struggle with ambiguous expressions like "it seems approval may be needed." In this structure, the bool type eliminates that ambiguity.

Pros and Cons Analysis

Advantages

Item	Details
Type safety	23 additional type errors caught during development (compared to equivalent implementations in LangGraph and CrewAI)
Code conciseness	PydanticAI 160 lines vs. LangGraph 280 lines vs. CrewAI 420 lines for equivalent functionality
Testability	Dependency injection structure enables unit testing without real APIs
Model agnostic	Supports all major models including OpenAI, Anthropic, Gemini, DeepSeek
Durability	Durable Agent support that preserves execution state through API failures and restarts
Free open source	MIT license, no additional cost beyond LLM API fees

Disadvantages and Caveats

Item	Details	Mitigation
Multi-agent limitations	No built-in support for role-based multi-agent systems	Mix with CrewAI or LangGraph
Ecosystem size	Relatively smaller third-party ecosystem compared to LangChain's 300+ integrations	Expand external tool connectivity via MCP integration
Complex state management	LangGraph-level checkpointing and workflow state management requires manual implementation	Combine with LangGraph for complex workflows
Community resources	Fewer references and examples compared to LangChain	High-quality official documentation compensates

The multi-agent limitation hit me hardest when I tried to connect and orchestrate three agents with different roles. In the end, I handled just that part with LangGraph and kept the individual agent logic in PydanticAI — a hybrid structure. The two frameworks work well together, so the combination itself isn't difficult.

Durable Agent: A feature that saves the in-progress state to a checkpoint during agent execution, so if an API failure or server restart occurs, execution can resume from where it left off. Particularly useful for long-running batch jobs and complex multi-step agents.

The Most Common Mistakes in Production

Creating an agent without result_type — Type safety is the core of PydanticAI, but omitting result_type means you're just getting a string back, no different from a regular LLM call. It's best to design the BaseModel first.
Writing vague tool docstrings — The docstring is the basis on which the LLM decides which tool to use in which situation. Rather than vague descriptions like "retrieves data," writing something specific like "retrieves sales volume for the Seoul region in Q4 2024 within a given date range" improves tool selection accuracy.
Passing DB connections through global variables instead of using dependency injection — It seems convenient at first, but it makes it impossible to swap out the real DB for a mock in test code. Using deps_type and RunContext from the start saves a lot of pain later.

Closing Thoughts

Instead of "trust and parse" LLM responses, "prove and use" them — this is the shift PydanticAI proposes. If you're in a domain where data integrity matters (finance, healthcare, security) or you're a team that wants to build agents quickly with a testable structure, it's worth trying.

Three steps to get started right now:

pip install pydantic-ai — installation is a single line.
Define one BaseModel and run Agent(result_type=YourModel) to immediately get a feel for LLM responses being bound to types.
If you have existing LangChain code, re-implement the simplest chain and directly compare the difference in code volume and how early type errors are caught.

References

#PydanticAI#Python#타입안전성#AI에이전트#LLM#의존성주입#Pydantic#FunctionCalling#LangGraph#구조화출력

Building Type-Safe AI Agents with PydanticAI — How We Caught 23 Bugs Before Production | DEV BAK - 기술블로그

Building Type-Safe AI Agents with PydanticAI — How We Caught 23 Bugs Before Production

Core Concepts

Type-Safe Output: Enforcing LLM Responses into Pydantic Models

python

from pydantic import BaseModel
from pydantic_ai import Agent
 
class ResponseModel(BaseModel):
    answer: str
    confidence: float
 
agent = Agent(
    model="openai:gpt-4o",
    result_type=ResponseModel,
    system_prompt="You are a helpful assistant."
)
 
result = agent.run_sync("What is the capital of France?")
print(result.data.answer)      # "Paris"
print(result.data.confidence)  # 0.99

In async environments, you can use await agent.run() instead of agent.run_sync(). In this article, we use run_sync() in synchronous contexts and await agent.run() inside async functions. If Python async patterns are unfamiliar to you, run_sync() is more than enough to get started.

Internally, structured output is handled through three paths:

Path	How It Works	Best For
Tool call-based extraction	LLM returns data as a function call	Models with Function Calling support like OpenAI, Anthropic
Provider-managed JSON schema	Uses the model API's `response_format`	Modern models with JSON Mode support
Prompt-injected formatting	Injects schema into the system prompt	Fallback for models without Function Calling support

Pydantic BaseModel is the base class of Pydantic, a Python data validation library. Declare fields with type hints on the class, and type validation and coercion happen automatically at instance creation time.

Dependency Injection: Cleanly Separating Test and Production Code

python

from dataclasses import dataclass
from pydantic_ai import Agent, RunContext
 
@dataclass
class Deps:
    db_conn: DatabaseConnection  # Replace with your actual DB client
    api_key: str
 
agent = Agent(
    model="anthropic:claude-3-5-sonnet-latest",
    deps_type=Deps
)
 
@agent.tool
async def get_user_data(ctx: RunContext[Deps], user_id: str) -> dict:
    return await ctx.deps.db_conn.fetch(
        "SELECT * FROM users WHERE id = $1", user_id
    )
 
# Production run
result = await agent.run(
    "Get data for user 123",
    deps=Deps(db_conn=real_db, api_key="prod-key")
)
 
# Test run — just swap in a mock object
result = await agent.run(
    "Get data for user 123",
    deps=Deps(db_conn=mock_db, api_key="test-key")
)

Tool Registration: Turning Python Functions into the LLM's Hands and Feet

With just the @agent.tool decorator, you can register an ordinary Python function as a tool the LLM can call. The LLM reads the function's docstring and type hints to decide when to use that tool.

python

@agent.tool
async def query_sales(ctx: RunContext[Deps], region: str) -> list[dict]:
    """Retrieves sales data for the specified region.
    
    Args:
        region: The name of the region to query (e.g., 'seoul', 'busan')
    """
    return await ctx.deps.db_conn.fetch(
        "SELECT * FROM sales WHERE region = $1", region
    )
 
@agent.tool
async def calculate_growth_rate(
    ctx: RunContext[Deps],
    current: float,
    previous: float
) -> float:
    """Calculates the growth rate compared to the previous period."""
    return (current - previous) / previous * 100

Function Calling (tool calling): A feature that instructs the LLM to call a specific function instead of generating text. If the LLM determines that "answering this question requires a DB query," it calls the query_sales function, receives the result, and then generates the final response.

Let's look at how these three mechanisms combine in practice using financial domain code.

Real-World Application

Example 1: Financial Data Analysis Agent

python

from pydantic import BaseModel, Field
from pydantic_ai import Agent, RunContext
from dataclasses import dataclass
from typing import Optional
 
class SalesReport(BaseModel):
    region: str
    total_revenue: float = Field(ge=0, description="Total revenue (KRW)")
    growth_rate: float = Field(description="Growth rate vs. previous period (%)")
    top_product: str
    risk_level: str = Field(pattern="^(low|medium|high)$")
    summary: str
    action_items: list[str]
 
@dataclass
class AnalyticsDeps:
    db_conn: DatabaseConnection  # Replace with your actual DB client
    report_date: str
 
agent = Agent(
    model="openai:gpt-4o",
    result_type=SalesReport,
    deps_type=AnalyticsDeps,
    system_prompt="""
    You are a financial analyst. Analyze sales data and provide 
    structured reports with actionable insights.
    """
)
 
@agent.tool
async def get_regional_sales(
    ctx: RunContext[AnalyticsDeps],
    region: str
) -> dict:
    """Retrieves sales data for a specific region."""
    return await ctx.deps.db_conn.fetch(
        "SELECT * FROM sales WHERE region = $1 AND date = $2",
        region, ctx.deps.report_date
    )
 
@agent.tool
async def get_previous_period(
    ctx: RunContext[AnalyticsDeps],
    region: str
) -> Optional[dict]:
    """Retrieves comparison data from the previous period."""
    return await ctx.deps.db_conn.fetch_one(
        "SELECT total_revenue FROM sales WHERE region = $1 AND date < $2 ORDER BY date DESC LIMIT 1",
        region, ctx.deps.report_date
    )
 
# Run
deps = AnalyticsDeps(db_conn=db, report_date="2026-05-01")
result = await agent.run("Analyze May sales performance for the Seoul region", deps=deps)
 
print(result.data.risk_level)    # Guaranteed to be one of "low", "medium", "high"
print(result.data.growth_rate)   # Guaranteed to be float type
print(result.data.action_items)  # Guaranteed to be list[str] type

The validation options on each field are what make this work:

Code Element	Role
`Field(ge=0)`	Blocks negative revenue at the source
`Field(pattern="^(low\|medium\|high)$")`	Only passes allowed values (low, medium, high)
`result_type=SalesReport`	Validates the entire LLM output through Pydantic
`deps_type=AnalyticsDeps`	Decouples DB connection from agent logic

Example 2: Human-in-the-Loop Approval Workflow

python

from pydantic import BaseModel
from pydantic_ai import Agent
 
class TransferRequest(BaseModel):
    from_account: str
    to_account: str
    amount: float
    reason: str
    requires_approval: bool
 
agent = Agent(
    model="anthropic:claude-3-5-sonnet-latest",
    result_type=TransferRequest,
    system_prompt="""
    Process transfer requests. For amounts over 1,000,000 KRW,
    set requires_approval to True.
    """
)
 
result = await agent.run(
    "Process a transfer of 1.5 million KRW from account A to account B"
)
 
transfer = result.data
if transfer.requires_approval:
    # The point of human intervention is clearly expressed through types
    await request_human_approval(transfer)
else:
    await execute_transfer(transfer)

Pros and Cons Analysis

Advantages

Item	Details
Type safety	23 additional type errors caught during development (compared to equivalent implementations in LangGraph and CrewAI)
Code conciseness	PydanticAI 160 lines vs. LangGraph 280 lines vs. CrewAI 420 lines for equivalent functionality
Testability	Dependency injection structure enables unit testing without real APIs
Model agnostic	Supports all major models including OpenAI, Anthropic, Gemini, DeepSeek
Durability	Durable Agent support that preserves execution state through API failures and restarts
Free open source	MIT license, no additional cost beyond LLM API fees

Disadvantages and Caveats

Item	Details	Mitigation
Multi-agent limitations	No built-in support for role-based multi-agent systems	Mix with CrewAI or LangGraph
Ecosystem size	Relatively smaller third-party ecosystem compared to LangChain's 300+ integrations	Expand external tool connectivity via MCP integration
Complex state management	LangGraph-level checkpointing and workflow state management requires manual implementation	Combine with LangGraph for complex workflows
Community resources	Fewer references and examples compared to LangChain	High-quality official documentation compensates

Durable Agent: A feature that saves the in-progress state to a checkpoint during agent execution, so if an API failure or server restart occurs, execution can resume from where it left off. Particularly useful for long-running batch jobs and complex multi-step agents.

The Most Common Mistakes in Production

Creating an agent without result_type — Type safety is the core of PydanticAI, but omitting result_type means you're just getting a string back, no different from a regular LLM call. It's best to design the BaseModel first.
Writing vague tool docstrings — The docstring is the basis on which the LLM decides which tool to use in which situation. Rather than vague descriptions like "retrieves data," writing something specific like "retrieves sales volume for the Seoul region in Q4 2024 within a given date range" improves tool selection accuracy.
Passing DB connections through global variables instead of using dependency injection — It seems convenient at first, but it makes it impossible to swap out the real DB for a mock in test code. Using deps_type and RunContext from the start saves a lot of pain later.

Closing Thoughts

Three steps to get started right now:

pip install pydantic-ai — installation is a single line.
Define one BaseModel and run Agent(result_type=YourModel) to immediately get a feel for LLM responses being bound to types.
If you have existing LangChain code, re-implement the simplest chain and directly compare the difference in code volume and how early type errors are caught.

References

#PydanticAI#Python#타입안전성#AI에이전트#LLM#의존성주입#Pydantic#FunctionCalling#LangGraph#구조화출력

Core Concepts

Type-Safe Output: Enforcing LLM Responses into Pydantic Models

Dependency Injection: Cleanly Separating Test and Production Code

Tool Registration: Turning Python Functions into the LLM's Hands and Feet

Real-World Application

Example 1: Financial Data Analysis Agent

Example 2: Human-in-the-Loop Approval Workflow

Pros and Cons Analysis

Advantages

Disadvantages and Caveats

The Most Common Mistakes in Production

Closing Thoughts

References

Core Concepts

Type-Safe Output: Enforcing LLM Responses into Pydantic Models

Dependency Injection: Cleanly Separating Test and Production Code

Tool Registration: Turning Python Functions into the LLM's Hands and Feet

Real-World Application

Example 1: Financial Data Analysis Agent

Example 2: Human-in-the-Loop Approval Workflow

Pros and Cons Analysis

Advantages

Disadvantages and Caveats

The Most Common Mistakes in Production

Closing Thoughts

References

Recommended Posts

vLLM vs SGLang Performance Comparison — Choosing an Inference Engine Through the Lens of 2026 KV Cache Architecture

Cut LLM API Costs by Up to 80% — 5 Optimization Strategies Proven in GPT-4o & Claude Production

How to Specialize 7B·70B Models on a Single GPU — LoRA·QLoRA·PEFT Principles and Practical Code

MCP (Model Context Protocol) Connects Tools, A2A (Agent-to-Agent Protocol) Connects Agents: Division of Roles and Adoption Criteria in Multi-Agent Architecture

Hermes Agent: A Self-Improving AI Agent That Retains Learning Across Sessions

Building an AI Agent Monitoring & Evaluation System: Catching Quality That Silently Breaks in Production with DeepEval and Langfuse