LangGraph Supervisor Pattern in Production — Orchestrating Multiple AI Agents and Tracking Responsibility
Some complex problems are hard to solve with a single AI agent. When you need to search the web, perform math calculations, and execute code all at once, what happens if you cram everything into one agent? I tried that at first — prompts piled up like a mountain, and I couldn't even trace where things went wrong. That's when I discovered LangGraph's Supervisor pattern.
By the end of this post, you'll be able to split complex requests — the kind a single agent can't handle — into specialized agents by role, and trace their interactions at the code level. Specifically, we'll build a working multi-agent system using create_supervisor and create_handoff_tool, and visually confirm the handoff flow with LangSmith. The core of this post is understanding why LangGraph differs from a plain LangChain chain or ReAct agent, and how the Supervisor pattern leverages that difference.
Prerequisites: Basic Python syntax and some LangChain experience are enough. Async (`asyncio`) only appears in the parallel processing example, and that section is clearly marked.
Core Concepts
How Is This Different from LangChain Chains and ReAct Agents?
The first question that comes to mind when encountering LangGraph is: "Can't I just use a LangChain chain?"
LangChain chains (LCEL) are optimized for sequential pipelines like A → B → C. The flow at each step is fixed, and the code quickly becomes complex when you need conditional branching or loops. ReAct agents have one agent repeatedly calling tools — but when roles need to be separated or work needs to be handed off to another agent, there are structural limitations.
LangGraph addresses both of these with a graph-based workflow. Processing steps are expressed as nodes (functions or agents) and edges (flow direction), making conditional branching, loops, and parallel execution natural to express.
Graph-based workflow: An approach where processing steps are represented as a directed graph. Each node performs a specific task, and edges determine which node to flow to next. Unlike a simple sequential pipeline, the flow can vary depending on the results of previous steps.
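To make the contrast with a sequential chain concrete, here is a minimal pure-Python sketch of conditional routing over a directed graph. This is not the LangGraph API, just the control-flow idea that LangGraph's nodes and conditional edges formalize; the node and function names are invented for illustration.

```python
# Conceptual sketch only -- plain Python, not the LangGraph API.
# Nodes are functions that update a shared state dict; a routing
# function inspects the state and picks the next node, which is
# what LangGraph's conditional edges formalize.

def classify(state):
    # Entry node: decide the route based on the input
    state["route"] = "math" if any(ch.isdigit() for ch in state["input"]) else "chat"
    return state

def math_node(state):
    state["output"] = f"math agent handles: {state['input']}"
    return state

def chat_node(state):
    state["output"] = f"chat agent handles: {state['input']}"
    return state

NODES = {"math": math_node, "chat": chat_node}

def run_graph(user_input):
    state = {"input": user_input}
    state = classify(state)               # entry node
    state = NODES[state["route"]](state)  # conditional edge
    return state["output"]

print(run_graph("what is 2 + 2"))  # routed to math_node
print(run_graph("hello there"))    # routed to chat_node
```

In a plain chain the second step is fixed; here, which node runs next depends on the state produced by the previous node.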
The 3-Layer Structure of the Supervisor Pattern
The Supervisor pattern implements layer separation on top of this graph. It works much like a well-organized development team.
Planner Layer → Supervisor (decides what goes to whom)
Agents Layer → Worker Agents (each handles a specialized domain)
Tooling Layer → Tools / APIs (the actual execution units)

- Supervisor: An orchestrator that analyzes user requests and decides which agent to delegate to. The LLM makes the routing decisions.
- Sub-Agent (Worker): Agents specialized in specific domains such as search, math calculations, or code execution. Each focuses only on its own tasks.
- Shared State: A central state object maintained across the entire graph. Agents exchange information through it.
Understanding Shared State Properly
I want to pause here for a moment. Shared State might seem like "just a dictionary" at first, but there's an important detail in the actual implementation.
```python
from typing import TypedDict, Annotated
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    messages: Annotated[list, add_messages]  # reducer specified
    current_task: str
```

The `Annotated[list, add_messages]` part is key. When multiple nodes update `messages`, the reducer `add_messages` specifies how their updates are merged. If you skip this and declare the field as a plain `list`, a later node's update will overwrite an earlier node's results, a disaster I spent a long time debugging myself.
When using create_supervisor, State definition is handled internally, so you won't need to worry about this directly. But when building a custom graph or adding fields to State, you need to know this rule.
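To see why the reducer matters, here is a minimal sketch contrasting plain overwrite with an append-style merge. The real `add_messages` reducer also matches messages by ID so re-sent messages update in place; this toy version shows only the append behavior relevant here.

```python
# Sketch of why a reducer matters -- not the real add_messages
# implementation (which also merges by message ID), just append
# behavior contrasted with plain overwrite.

def overwrite(old, new):
    return new        # what a plain `list` field does: last write wins

def add_messages_like(old, new):
    return old + new  # reducer: concatenate updates in order

state = {"messages": [{"role": "user", "content": "hi"}]}

node_a = [{"role": "assistant", "content": "from node A"}]
node_b = [{"role": "assistant", "content": "from node B"}]

# Without a reducer, node B's update clobbers node A's:
lost = overwrite(overwrite(state["messages"], node_a), node_b)
assert len(lost) == 1  # node A's message is gone

# With the reducer, both updates survive in order:
kept = add_messages_like(add_messages_like(state["messages"], node_a), node_b)
assert len(kept) == 3
```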
Handoff Tool and Responsibility Tracking
Honestly, my first thought was "can't agents just call each other's functions?" — but the real value of create_handoff_tool lies in automatic audit trails.
```python
from langgraph_supervisor import create_handoff_tool

handoff_to_math = create_handoff_tool(
    agent_name="math_expert",
    description="Use when math calculations are needed"
)
```

Internally, when this tool is called, LangGraph appends a ToolMessage recording the handoff to the `messages` history in the shared State. Because the `tool_call_id` and the agent name are recorded together, the flow is visible at a glance when you inspect it in LangSmith later.
```
[Example message history]
User → "Calculate the square root of 57 and explain the result in Korean"
Supervisor → [ToolMessage] handoff to math_expert (tool_call_id: abc123)
math_expert → [AIMessage] "The square root of 57 is about 7.55"
Supervisor → [ToolMessage] handoff to writer_expert (tool_call_id: def456)
writer_expert → [AIMessage] "The square root of 57 is roughly 7.55. Put simply..."
```

If you connect agents with simple function calls instead, none of this tracking information is recorded, and it becomes very difficult to answer "which agent produced this result?" later.
Audit Trail: A chronologically ordered log of who performed what action and when. In multi-agent systems, it's directly used for debugging and identifying accountability.
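Walking the message history after a run is how you reconstruct this audit trail in practice. The sketch below uses simplified dicts as stand-ins for the real LangChain message objects, whose attribute names differ, but the idea is the same: filter for tool messages and pair each agent name with its `tool_call_id`.

```python
# Sketch: reconstructing "who did what" from a message history.
# The dicts are simplified stand-ins for real LangChain message
# objects (ToolMessage / AIMessage), whose attributes differ.

history = [
    {"type": "tool", "name": "math_expert", "tool_call_id": "abc123"},
    {"type": "ai", "name": "math_expert",
     "content": "The square root of 57 is about 7.55"},
    {"type": "tool", "name": "writer_expert", "tool_call_id": "def456"},
    {"type": "ai", "name": "writer_expert",
     "content": "Put simply, it's roughly 7.55..."},
]

def audit_trail(messages):
    """List each handoff as (agent name, tool_call_id), in order."""
    return [
        (m["name"], m["tool_call_id"])
        for m in messages
        if m["type"] == "tool"
    ]

print(audit_trail(history))
# [('math_expert', 'abc123'), ('writer_expert', 'def456')]
```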
Practical Application
Now that we have the concepts down, let's go straight to code. Example 1 is a minimal, complete example you can copy-paste and run immediately. Example 2 builds on it by adding escalation logic in a progressive way.
Example 1: Expert Panel System — Minimal Runnable Configuration
The fastest way is to look at code that actually works. Here's a structure where a Supervisor coordinates two specialists: one for math and one for writing.
```python
import os
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent
from langgraph_supervisor import create_supervisor

# Dummy tool definitions -- replace with real external API calls in practice
@tool
def calculator(expression: str) -> str:
    """Evaluates a math expression. Example: '3 ** 2 + 1'"""
    try:
        return str(eval(expression))  # use a safe expression parser in production
    except Exception as e:
        return f"Calculation error: {e}"

@tool
def get_weather(city: str) -> str:
    """Returns weather information for a city."""
    # A real implementation would call an external weather API
    return f"Current temperature in {city}: 22°C, sunny."

llm = ChatOpenAI(model="gpt-4o", api_key=os.environ["OPENAI_API_KEY"])

# Math expert agent
math_expert = create_react_agent(
    model=llm,
    tools=[calculator],
    name="math_expert",  # key for handoff tracking; snake_case, no spaces
    prompt="You are a math expert. You handle only calculation problems."
)

# Weather expert agent
weather_expert = create_react_agent(
    model=llm,
    tools=[get_weather],
    name="weather_expert",
    prompt="You are a weather information expert."
)

# Writing expert agent (no tools, LLM only)
writer_expert = create_react_agent(
    model=llm,
    tools=[],
    name="writer_expert",
    prompt="You are a writing expert. You write clear, friendly explanations."
)

# Create and compile the Supervisor
supervisor = create_supervisor(
    agents=[math_expert, weather_expert, writer_expert],
    model=llm,
    prompt=(
        "You are a team manager. "
        "Analyze each request and delegate it to the most suitable expert. "
        "Compound requests may use several experts in sequence."
    )
).compile()

# Run it
result = supervisor.invoke({
    "messages": [
        {"role": "user", "content": "Tell me today's weather in Seoul, and calculate the square root of the temperature"}
    ]
})
print(result["messages"][-1].content)
```

A few points to note while reading the code:
| Component | Role | Key Point |
|---|---|---|
| `create_react_agent` | Creates individual worker agents | The `name` parameter is the key for handoff tracking; snake_case recommended |
| `create_supervisor` | Creates the Supervisor agent | The `agents` list specifies the agents to manage |
| `.compile()` | Compiles into a LangGraph graph | Edges and nodes are finalized at this point |
| `.invoke()` | Synchronous execution | Use `.stream()` if streaming is needed |
When given the compound request "the square root of Seoul's temperature," the Supervisor first delegates to weather_expert, receives the temperature value, then delegates again to math_expert. This entire flow is recorded in the message history.
Example 2: Customer Support System — Adding Escalation
This is a version of Example 1's basic structure evolved for real-world use. It's a situation you encounter often in practice: a system that classifies customer requests and routes them to a billing team, technical team, or escalation team.
The key difference from Example 1 is that explicit routing rules are included in the Supervisor's prompt. The LLM decides which agent to route to based on the rules, and since the handoff history is preserved, you can later trace "why was this case escalated?"
```python
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent
from langgraph_supervisor import create_supervisor

llm = ChatOpenAI(model="gpt-4o")  # same LLM setup as Example 1

# On AWS Bedrock, only the LLM changes -- the rest of the code is identical:
# from langchain_aws import ChatBedrock
# llm = ChatBedrock(model_id="anthropic.claude-3-5-sonnet-20241022-v2:0")

@tool
def get_invoice(customer_id: str) -> str:
    """Looks up a customer's invoice."""
    return f"Latest charge for customer {customer_id}: 99,000 KRW (as of 2025-04)"

@tool
def process_refund(customer_id: str, amount: int) -> str:
    """Processes a refund."""
    return f"Refund of {amount} KRW to customer {customer_id} completed"

@tool
def search_kb(query: str) -> str:
    """Searches the knowledge base for technical documentation."""
    return f"Documents related to '{query}': docs.example.com/troubleshoot"

@tool
def create_priority_ticket(description: str) -> str:
    """Creates an urgent ticket and notifies a human agent."""
    return "Urgent ticket #9999 created. Notification sent to the assignee."

billing_agent = create_react_agent(
    model=llm,
    tools=[get_invoice, process_refund],
    name="billing_agent",
    prompt="You handle billing inquiries only: refunds, invoice lookups, and so on."
)

technical_agent = create_react_agent(
    model=llm,
    tools=[search_kb],
    name="technical_agent",
    prompt="You handle technical troubleshooting. If a problem is hard to resolve, recommend escalation."
)

escalation_agent = create_react_agent(
    model=llm,
    tools=[create_priority_ticket],
    name="escalation_agent",
    prompt="You connect urgent or complex cases to a human agent."
)

support_supervisor = create_supervisor(
    agents=[billing_agent, technical_agent, escalation_agent],
    model=llm,
    prompt="""
You are a customer support manager. Delegate each request to the right agent:
- Billing/payment inquiries → billing_agent
- Technical problems → technical_agent
- Urgent cases, or the same inquiry repeated → escalation_agent
Always respond to customers politely.
"""
).compile()
```

One thing to watch out for here: the value of the `name` parameter must exactly match the name used in the Supervisor prompt for routing to work reliably. `billing_agent` and `billing agent` (with a space) may be treated as different names.
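In the running system it is the LLM that applies these routing rules. For building intuition, and for unit-testing your expectations about the rule wording, a deterministic rule-based twin can help; the keyword lists below are invented for illustration and are not part of the library.

```python
# Deterministic sketch of the routing rules the Supervisor prompt
# describes. In the real system the LLM makes this decision; a
# rule-based twin like this is handy for testing expectations.

BILLING_WORDS = {"refund", "invoice", "billing", "charge"}
TECH_WORDS = {"error", "bug", "crash", "login"}

def route(request: str, previous_requests: list) -> str:
    text = request.lower()
    if request in previous_requests:   # repeated inquiry -> escalate
        return "escalation_agent"
    if any(w in text for w in BILLING_WORDS):
        return "billing_agent"
    if any(w in text for w in TECH_WORDS):
        return "technical_agent"
    return "escalation_agent"          # unclassified -> human review

assert route("I want a refund", []) == "billing_agent"
assert route("The app keeps crashing with an error", []) == "technical_agent"
assert route("I want a refund", ["I want a refund"]) == "escalation_agent"
```

A twin like this also doubles as a regression check: if you reword the Supervisor prompt, you can compare the LLM's actual routing decisions against these expected ones in LangSmith.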
Example 3: Parallel Document Processing — Building Directly with StateGraph (No Supervisor)
This example uses a different pattern from the previous two: it builds directly with StateGraph, without a Supervisor, for cases where you need finer-grained control. Take care not to confuse it with the Supervisor pattern.
This is a scatter-gather pattern that splits a long document into sections, has multiple agents process them simultaneously, and merges the results. This example directly uses asyncio, so it's intended for those comfortable with asynchronous programming.
```python
import asyncio
from typing import TypedDict, List
from langgraph.graph import StateGraph, START, END
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o")

class DocumentState(TypedDict):
    document: str
    sections: List[str]
    summaries: List[str]
    final_report: str

async def summarize_section(section: str) -> str:
    """Summarizes a single section."""
    response = await llm.ainvoke(f"Summarize the following in two sentences:\n\n{section}")
    return response.content

def split_document(state: DocumentState) -> dict:
    sections = [s.strip() for s in state["document"].split("\n\n") if s.strip()]
    return {"sections": sections}

async def process_sections_parallel(state: DocumentState) -> dict:
    # Summarize all sections in parallel with asyncio.gather.
    # LangGraph supports async def nodes, so we can await directly.
    tasks = [summarize_section(section) for section in state["sections"]]
    summaries = await asyncio.gather(*tasks)
    return {"summaries": list(summaries)}

def merge_results(state: DocumentState) -> dict:
    final = "\n\n".join(
        f"[Section {i+1}] {summary}"
        for i, summary in enumerate(state["summaries"])
    )
    return {"final_report": final}

# Build the graph
graph = StateGraph(DocumentState)
graph.add_node("split", split_document)
graph.add_node("process", process_sections_parallel)
graph.add_node("merge", merge_results)
graph.add_edge(START, "split")
graph.add_edge("split", "process")
graph.add_edge("process", "merge")
graph.add_edge("merge", END)
pipeline = graph.compile()

# Run (in an async context)
async def main():
    result = await pipeline.ainvoke({
        "document": "First section content...\n\nSecond section content...\n\nThird section content..."
    })
    print(result["final_report"])

asyncio.run(main())
```

Note that for production-grade parallel processing in LangGraph, the Send API (`from langgraph.types import Send`) enables more precise fan-out control. The `asyncio.gather` approach suits simple parallelization, while the Send API is more flexible when the number of branches changes dynamically.
Pros and Cons
Advantages
| Item | Details |
|---|---|
| Explicit state management | Consistency of context between agents is guaranteed via the shared State object |
| Audit trail | Every Handoff Tool call is recorded in the message history, making debugging and accountability easy |
| Conditional branching | Complex business logic (if/else, loops) can be naturally expressed as a graph |
| Parallel processing | Running independent agents simultaneously reduces overall processing time |
| Ecosystem support | LangChain integration, AWS/Google Cloud connectivity, LangSmith monitoring — all well-established |
Disadvantages and Caveats
| Item | Details | Mitigation |
|---|---|---|
| Steep learning curve | You need to understand graph, State, and Handoff concepts simultaneously | Start with the official tutorial and approach progressively: single agent → 2 agents |
| LangChain dependency | LangChain API changes frequently, creating migration overhead | Use the langgraph-supervisor official package and pin versions in requirements.txt |
| Distributed system complexity | At scale, state synchronization, network latency, and memory spikes become issues | Use LangSmith to identify bottlenecks first, then apply distributed processing only where truly needed |
| Debugging difficulty | When multiple agents are intertwined, finding where things went wrong is hard | Always integrate LangSmith tracing, and assign a clear name to each agent |
| Infrastructure burden | Orchestration, deployment, and governance must be built from scratch | Early on, consider using managed services like LangGraph Cloud or AWS Bedrock |
LangSmith: The official tracing and evaluation platform provided by LangChain. It lets you visually confirm the order in which agents executed and how long each step took. It's essentially a requirement for multi-agent systems.
The Most Common Mistakes in Practice
In my experience, these three trip people up the most.
1. Assigning too many roles to a single agent: many people try to build one "all-purpose agent" and end up throwing away all the benefits of the Supervisor pattern. The more an agent focuses on a single role, the higher the routing accuracy and the easier the debugging.

2. Applying the Supervisor pattern to simple tasks: if you have a sequential three-step pipeline, implementing it as a plain chain is much better. The Supervisor shines when conditional branching and complex routing are needed; applying it to small problems only increases maintenance costs.

3. Deploying to production without LangSmith: you end up digging through log dumps, trying to trace agent interactions by eye. Simply setting `LANGCHAIN_TRACING_V2=true` and `LANGCHAIN_API_KEY` in your `.env` records the execution flow automatically. Setting this up early in development saves a great deal of trouble later.
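For reference, a minimal `.env` sketch for the tracing setup described above; the key value is a placeholder, and `LANGCHAIN_PROJECT` is an optional extra for grouping runs in the LangSmith UI.

```shell
# .env -- minimal LangSmith tracing setup (key value is a placeholder)
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=lsv2_...your_key_here
# Optional: group runs under a named project in the LangSmith UI
LANGCHAIN_PROJECT=supervisor-demo
```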
Closing Thoughts
If I had to name one thing that changed after applying this pattern to a real project, it's that the time spent asking "where did things go wrong?" dropped significantly. The handoff history is right there in the messages.
Here are three steps you can take right now:

1. Start with package installation: set up the environment with `pip install langgraph langgraph-supervisor langchain-openai`, then copy-paste the Example 1 code and run it directly. All you need is an `OPENAI_API_KEY`.

2. Experiment with the `prompt` in `create_supervisor`: try removing or rewriting the routing-rule sentences in the Supervisor prompt and observe how agent assignment changes. That observation is the fastest way to build intuition for the pattern.

3. Integrate LangSmith tracing: setting `LANGCHAIN_TRACING_V2=true` and `LANGCHAIN_API_KEY` in your `.env` records the execution flow automatically. The moment you see how handoffs are stamped into the message history, the structure becomes much clearer.
Next post: How to add Human-in-the-Loop to LangGraph, a production pattern for inserting a human review step into agent decisions using `interrupt_before` and a `checkpointer`.
References
- GitHub - langchain-ai/langgraph-supervisor-py
- LangGraph Reference: create_supervisor API
- langgraph-supervisor · PyPI
- LangGraph: Multi-Agent Workflows | LangChain Blog
- Build multi-agent systems with LangGraph and Amazon Bedrock | AWS Blog
- How Agent Handoffs Work in Multi-Agent Systems | Towards Data Science
- Building Multi-Agent Systems with LangGraph-Supervisor | DEV Community
- How to Continuously Improve Your LangGraph Multi-Agent System | Galileo
- @langchain/langgraph-supervisor | npm