Implementing Multi-Agent Orchestration with LangGraph · A2A — From Shared Memory Design to Production Code
If you've spent enough time with Claude Code, you'll eventually hit a wall. Hand it a complex refactoring job and the context window fills up; try running test writing and documentation in parallel and the agent starts mixing everything together. At first I thought, "Maybe I just need to write better prompts?" — but in practice, as projects scale up, cramming everything into a single agent clearly has its limits.
That's why multi-agent orchestration is getting so much attention lately. It's an architecture where each agent maintains its own memory and role, sharing only the information it needs to collaborate. By the end of this article, you'll have hands-on experience building a 3-agent pipeline with LangGraph and implementing agent capability discovery using the A2A protocol.
To be honest, there are plenty of cases where people adopt multi-agent systems because they "look cool," only to get hit with a massive token bill. So this article doesn't just introduce the technology — it also covers the real-world tradeoffs. If you've used LangChain or LangGraph even once as a backend developer, you can follow right along. If this is your first time, skimming the LangGraph official docs Quick Start first will make things smoother.
Core Concepts
What Is Multi-Agent Orchestration
Multi-agent orchestration is an architecture where multiple AI agents each independently maintain their own memory and context while collaborating toward a shared goal. Think of it like a team project: each team member has their own laptop and area of expertise, and they exchange key information through shared documents like Notion or Confluence.
The three core components of this structure are as follows.
| Concept | Description | Analogy |
|---|---|---|
| Private Memory | Independent memory optimized for each agent's role | A team member's personal laptop |
| Shared Context | A shared store that all agents read from and write to | The team's shared document |
| Orchestrator | A higher-level agent that coordinates the overall flow and spawns agents | The team lead |
Private Memory (Independent Agent Memory): Each agent separately maintains short-term and long-term memory specialized for its own role. It cannot directly access the internal state of other agents, and must communicate only through designated channels.
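The separation described above can be sketched in a few lines of plain Python. This is an illustrative toy (the `Agent` class and channel are hypothetical, no framework assumed): each agent owns a private dict, and the only crossing point is an explicit shared channel.

```python
# Minimal sketch of private vs. shared memory — illustrative names, no framework
class Agent:
    def __init__(self, name: str):
        self.name = name
        self.private_memory: dict = {}  # only this agent reads/writes here

    def send(self, channel: list, payload: dict) -> None:
        # The designated channel is the ONLY way state leaves the agent
        channel.append({"from": self.name, **payload})

shared_channel: list = []  # the team's "shared document"

researcher = Agent("researcher")
researcher.private_memory["notes"] = "raw findings stay local"
researcher.send(shared_channel, {"fact": "API latency regressed after v2.3"})

writer = Agent("writer")
# The writer sees only what was explicitly shared, never the researcher's notes
visible = [m["fact"] for m in shared_channel if "fact" in m]
print(visible)  # → ['API latency regressed after v2.3']
```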
⚠️ Only use this when: Multi-agent shines when you have 2 or more subtasks that can be parallelized independently, and each task requires different domain knowledge. Conversely, for sequential and simple tasks, attaching a good prompt to a single agent is far more economical.
MCP and A2A — Two Easy-to-Confuse Protocols
When I first encountered these, I thought "aren't they both agent-related protocols?" — but their roles are clearly distinct.
```
[Agent A] ──A2A──▶ [Agent B]
    │                  │
   MCP                MCP
    │                  │
    ▼                  ▼
[Tool/DB/API]    [Tool/DB/API]
```
- MCP (Model Context Protocol, Anthropic): The connection layer between agents and tools/data sources — the interface through which an agent communicates with the outside world.
- A2A (Agent2Agent Protocol, Google): The agent-to-agent communication layer. Provides a standardized message format for agents to discover capabilities, negotiate tasks, and synchronize state.
Key Insight: MCP and A2A are not competing — they're complementary. You use MCP to connect tools and A2A for agents to negotiate with each other.
When Google announced A2A in April 2025, it made waves with over 50 partners including Microsoft, Salesforce, SAP, and Atlassian joining at launch. It was later incorporated into the Linux Foundation as an open-source project, securing vendor neutrality. That said, as we move through the second half of 2025, most of the agent ecosystem appears to be converging around MCP. A2A continues to show strength in long-running workflow and human-in-the-loop scenarios.
Shared Memory Architecture — Two Key Patterns
When multiple agents share memory, the most critical concern is consistency. It's a situation you'll frequently encounter in production: when two agents simultaneously modify the same shared state, conflicts arise. Let's go over two patterns that are widely used in practice.
| Pattern | Approach | Pros | Cons |
|---|---|---|---|
| Serialized Turns | Agents access shared memory in sequence | Simple to implement, no conflicts | Sacrifices parallelism |
| Semantic Locking | Query vector DB for "already known" facts before writing to prevent duplicate writes | Prevents redundant analysis | Query overhead |
A third option is the bipartite access graph (dynamically connecting users, agents, and resources in a graph to apply fine-grained read/write policies), but honestly, it's almost never used in production — it's usually over-engineering. If you need an advanced architecture, the Collaborative Memory paper is worth a look.
Reducer Pattern: A function that defines how to merge updates when multiple agents update state at the same time. In LangGraph, declaring `Annotated[list, operator.add]` automatically accumulates items into the list. It's the simplest and safest way to prevent concurrent write conflicts.
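The reducer's merge semantics can be verified in isolation: `operator.add` on two lists is plain concatenation, which is the merge LangGraph applies when two node updates arrive for the same annotated key. A minimal sketch (this demonstrates only the merge semantics, not LangGraph internals):

```python
import operator
from typing import Annotated, TypedDict

# The annotation LangGraph reads: merge updates to `facts` via operator.add
class SharedState(TypedDict):
    facts: Annotated[list, operator.add]

# What effectively happens when two agents return updates for the same key
current = ["fact from agent A"]
update = ["fact from agent B"]
merged = operator.add(current, update)  # list concatenation — no overwrites
print(merged)  # → ['fact from agent A', 'fact from agent B']
```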
The Blackboard Architecture also deserves mention. It's a pattern where agents read from and write to a shared chalkboard (the blackboard) to collaboratively solve a problem, with the orchestrator dynamically deciding which agent to activate next based on the blackboard state. There are real research examples of software design systems where 9 specialized agents collaborate using a blackboard-based approach.
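A toy version of the blackboard loop makes the control flow concrete (agent logic is stubbed with hypothetical functions; the research systems mentioned above are far richer): the orchestrator inspects the board and activates the next agent until a stop condition holds.

```python
# Toy blackboard loop — illustrative only; agent bodies are stubs
blackboard = {"query": "summarize incident", "facts": [], "draft": None}

def researcher(board: dict) -> None:
    board["facts"].append("collected fact")

def writer(board: dict) -> None:
    board["draft"] = f"report based on {len(board['facts'])} fact(s)"

def pick_next(board: dict):
    # Orchestrator: choose the next agent purely from blackboard state
    if not board["facts"]:
        return researcher
    if board["draft"] is None:
        return writer
    return None  # done

while (agent := pick_next(blackboard)) is not None:
    agent(blackboard)

print(blackboard["draft"])  # → report based on 1 fact(s)
```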
Practical Application
Example 1: Parallel Codebase Processing with Claude Code
First, let's implement the core structure of a multi-agent system directly in Python — an orchestrator spawning worker agents and sharing state via a blackboard. Since this is written with the vanilla anthropic SDK without LangGraph, the structure should be easier to grasp.
```python
# Claude Code multi-agent — orchestrator + worker agent configuration example
import anthropic
import concurrent.futures

client = anthropic.Anthropic()

# Orchestrator: analyzes the overall task and breaks it into subtasks
orchestrator_prompt = """
You are the orchestrator of a software engineering team.
Analyze the given codebase change request and distribute work to these three agents:
- test_agent: responsible for writing test cases
- refactor_agent: responsible for code refactoring
- doc_agent: responsible for documentation
Each agent has an independent context and shares issues through the blockers list.
"""

def spawn_agent(role: str, task: str, shared_context: dict) -> str:
    agent_prompts = {
        "test_agent": "You are a testing expert. Write tests for the given code, focusing on edge cases.",
        "refactor_agent": "You are a refactoring expert. Improve the code with both readability and performance in mind.",
        "doc_agent": "You are a technical-writing expert. Write documentation a developer can understand immediately.",
    }
    response = client.messages.create(
        model="claude-sonnet-4-5",  # substitute whichever Claude model id you have access to
        max_tokens=4096,
        system=agent_prompts[role],
        messages=[
            {
                "role": "user",
                "content": f"Task: {task}\n\nShared context:\n{shared_context}"
            }
        ]
    )
    return response.content[0].text

# Shared context (acts as the blackboard)
shared_context = {
    "codebase_summary": "Python FastAPI-based REST API server",
    "changed_files": ["src/user_service.py", "src/auth.py"],
    "blockers": []  # blocker list shared between agents
}

tasks = {
    "test_agent": "Write unit tests for user_service.py",
    "refactor_agent": "Refactor the token validation logic in auth.py",
    "doc_agent": "Document the changed API endpoints"
}

# Run the three agents in parallel
results = {}
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    futures = {
        executor.submit(spawn_agent, role, task, shared_context): role
        for role, task in tasks.items()
    }
    for future in concurrent.futures.as_completed(futures):
        role = futures[future]
        results[role] = future.result()
```

| Component | Role | Memory Scope |
|---|---|---|
| `orchestrator_prompt` | Overall task decomposition and agent assignment | Full codebase context |
| `spawn_agent()` | Executes role-specific agents | Individual independent context |
| `shared_context` | Acts as the blackboard (shared state) | Read/write by all agents |
| `blockers` | Issue-sharing channel between agents | Simulates an inter-agent mailbox |
Example 2: LangGraph-Based Multi-Agent System with Shared Vector Memory
Where Example 1 demonstrated "parallel agent execution" itself, this time we express how agents structurally read from and write to shared state using LangGraph's StateGraph. In specialized domains like finance or legal, a valid pattern is having each agent maintain role-specific embedding vector stores while recording and querying facts to a shared vector DB (Qdrant, Weaviate, etc.). It prevents redundant analysis and lets agents reuse facts already discovered — quite effective in production.
```python
# LangGraph-based shared vector memory multi-agent example
# Required packages: pip install langgraph qdrant-client sentence-transformers
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
from typing import TypedDict, Annotated
import operator

# Shared state schema — the blackboard all agents read from and write to
class SharedState(TypedDict):
    query: str
    facts: Annotated[list, operator.add]  # reducer: multiple agents can append
    analysis_results: Annotated[list, operator.add]
    final_answer: str
    active_agent: str

# ── The functions below are interfaces that require an actual implementation ──
# search_shared_vector_db(query): query the Qdrant collection for similar facts
# store_to_shared_vector_db(facts): store new facts in Qdrant
# collect_facts(query): collect new facts via an LLM or external API
# load_private_memory(agent_id): load domain knowledge from the agent's own vector store
# run_analysis(facts, knowledge): run analysis using collected facts + domain knowledge
# synthesize(results): synthesize multiple analysis results into a final answer
# ──────────────────────────────────────────────────────────────────────────────

# Research agent — dedicated to fact collection
def research_agent(state: SharedState) -> dict:
    # Query the shared vector DB for existing facts (semantic locking pattern)
    existing_facts = search_shared_vector_db(state["query"])
    if existing_facts:
        # Skip redundant collection if the facts are already known
        return {"facts": existing_facts, "active_agent": "analysis_agent"}
    # Collect new facts and store them in the shared vector DB
    new_facts = collect_facts(state["query"])
    store_to_shared_vector_db(new_facts)
    return {"facts": new_facts, "active_agent": "analysis_agent"}

# Analysis agent — fact-based analysis (uses independent long-term memory)
def analysis_agent(state: SharedState) -> dict:
    # Load domain knowledge from this agent's own private memory
    domain_knowledge = load_private_memory("analysis_agent")
    analysis = run_analysis(state["facts"], domain_knowledge)
    return {"analysis_results": [analysis], "active_agent": "synthesis_agent"}

# Synthesis agent — composes the final answer
def synthesis_agent(state: SharedState) -> dict:
    final = synthesize(state["analysis_results"])
    return {"final_answer": final, "active_agent": "end"}

# Routing logic — the orchestrator picks the next agent from the blackboard state
def route_next(state: SharedState) -> str:
    routing_map = {
        "analysis_agent": "analysis_agent",
        "synthesis_agent": "synthesis_agent",
        "end": END
    }
    return routing_map.get(state["active_agent"], END)

# Graph construction
workflow = StateGraph(SharedState)
workflow.add_node("research_agent", research_agent)
workflow.add_node("analysis_agent", analysis_agent)
workflow.add_node("synthesis_agent", synthesis_agent)
workflow.set_entry_point("research_agent")
workflow.add_conditional_edges("research_agent", route_next)
workflow.add_conditional_edges("analysis_agent", route_next)
workflow.add_conditional_edges("synthesis_agent", route_next)

# Checkpointer — enables pause and resume, also useful for debugging
checkpointer = MemorySaver()
app = workflow.compile(checkpointer=checkpointer)
```

The key in the code above is the shared state fields declared with `Annotated[list, operator.add]`. Thanks to this reducer pattern, merging is handled automatically even when multiple agents append items to the same list at the same time.
Semantic Lock: A pattern where an agent first performs a vector similarity search to check whether something similar already exists before writing new information to shared memory. `search_shared_vector_db` plays exactly this role. For a real implementation, you can spin up a local Qdrant instance with `docker run -p 6333:6333 qdrant/qdrant` and connect via the `qdrant-client` library.
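As a dependency-free illustration of the same check-before-write idea, here is an in-memory sketch using cosine similarity over toy 3-dimensional vectors (in production you would replace the list with a Qdrant collection and the toy vectors with real embeddings; all names here are illustrative):

```python
import math

# In-memory stand-in for the shared vector DB: (embedding, fact) pairs
shared_store: list[tuple[list[float], str]] = []

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def store_fact(vec: list[float], fact: str, threshold: float = 0.95) -> str:
    # Semantic lock: search first, write only if nothing similar exists
    for existing_vec, existing_fact in shared_store:
        if cosine(vec, existing_vec) >= threshold:
            return existing_fact  # near-duplicate — reuse instead of rewriting
    shared_store.append((vec, fact))
    return fact

store_fact([1.0, 0.0, 0.0], "Q3 revenue grew 12%")
result = store_fact([0.999, 0.01, 0.0], "Q3 revenue rose 12%")  # near-duplicate
print(result)             # → Q3 revenue grew 12%
print(len(shared_store))  # → 1
```

The duplicate write is silently collapsed into the existing fact, which is exactly what saves the redundant analysis pass in Example 2.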
Example 3: Implementing Agent Capability Discovery with the A2A Protocol
Where the previous two examples dealt with agent collaboration within a single system, A2A is the protocol for collaborating with external agents across service boundaries. It shines when two independent systems need to cooperate using only standard interfaces — without exposing each other's internal implementation. For example, your company's order agent asking a partner's inventory agent for the optimal order quantity. The TypeScript example below is based on Node.js 18+ (crypto.randomUUID and fetch are both built-in since Node 18).
```typescript
// A2A Protocol — Agent Card (capability advertisement) example
// Runtime: Node.js 18+ (crypto.randomUUID and fetch are built in)
const inventoryAgentCard = {
  name: "inventory-optimizer-agent",
  version: "1.0.0",
  description: "Dedicated agent for real-time inventory optimization and order prediction",
  capabilities: {
    streaming: true,              // SSE-based streaming supported
    pushNotifications: true,      // async notifications supported
    stateTransitionHistory: true
  },
  skills: [
    {
      id: "optimize-stock-level",
      name: "Stock level optimization",
      description: "Calculate the optimal order quantity from current inventory data",
      inputModes: ["application/json"],
      outputModes: ["application/json", "text/plain"]
    },
    {
      id: "predict-demand",
      name: "Demand forecasting",
      description: "Forecast demand from historical sales data",
      inputModes: ["application/json"],
      outputModes: ["application/json"]
    }
  ]
}

interface ProductData {
  productId: string
  currentStock: number
  salesHistory: number[]
}

interface OptimizationResult {
  recommendedOrderQuantity: number
  predictedDemand: number
}

// A2A client — delegates a task to another agent
async function delegateToInventoryAgent(
  productData: ProductData,
  agentEndpoint: string
): Promise<OptimizationResult> {
  const taskRequest = {
    id: crypto.randomUUID(),
    message: {
      role: "user",
      parts: [{
        type: "data",
        data: productData
      }]
    }
  }
  // A2A cooperates through standard interfaces without exposing internal memory or tools
  const response = await fetch(`${agentEndpoint}/tasks/send`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(taskRequest)
  })
  return await response.json() as OptimizationResult
}
```

Reports indicate that food companies like Tyson Foods and Gordon Food Service are using similar patterns for supply chain optimization. Each company's inventory agent keeps its internal state hidden while exchanging product data and optimization information only through standardized channels.
Pros and Cons Analysis
Advantages
| Item | Details |
|---|---|
| Parallel Processing | Independent agents execute concurrently, significantly reducing overall processing time |
| Role Specialization | Each agent maintains domain-specific memory, improving accuracy |
| Isolation | Internal state is protected without memory contamination between agents |
| Scalability | Functionality can be extended simply by adding new agents, with minimal impact on existing workflows |
| Standardization | A2A/MCP enables vendor-neutral agent composition |
Disadvantages and Caveats
| Item | Details | Mitigation |
|---|---|---|
| Token Cost | A 3-agent team consumes 2.5–4x more tokens than a single agent | Minimize the number of agents; apply parallelization only to tasks that genuinely need it |
| Shared Memory Bottleneck | Centralized shared memory risks throughput bottlenecks and single points of failure | Use sharding or distributed vector DBs; add a read cache layer |
| Consistency Issues | Conflicts can occur when multiple agents simultaneously modify shared state | Choose from reducer pattern, semantic locking, or serialized turns based on the situation |
| Orchestration Complexity | Coordination logic grows exponentially as the number of agents increases | Start with 3 or fewer agents and scale gradually |
| Debugging Difficulty | Tracing and reproducing distributed agent behavior is harder than with a single agent | LangSmith, checkpoint-based reproduction, and structured logging are essential |
| A2A Ecosystem Uncertainty | As of H2 2025, the ecosystem is converging on MCP; A2A adoption is slowing | Limit A2A to long-running workflows; prioritize MCP for general-purpose connectivity |
In practice, the third item — consistency issues — is the one that trips people up most often. It looks simple at first, but as you add more agents, tracking who touched the shared state and when becomes increasingly difficult. Getting the reducer pattern right early on makes a big difference down the line.
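One lightweight mitigation is to audit every write to the shared state. A thin wrapper (a sketch with illustrative names, not a library API) that records who wrote which key makes the "who touched this and when" question answerable after the fact:

```python
from datetime import datetime, timezone

class AuditedState:
    """Shared-state wrapper that logs every write with its author."""
    def __init__(self) -> None:
        self._data: dict = {}
        self.audit_log: list[dict] = []

    def write(self, agent: str, key: str, value) -> None:
        # Record who wrote what, and when, before mutating the state
        self.audit_log.append({
            "agent": agent,
            "key": key,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        self._data[key] = value

    def read(self, key: str):
        return self._data.get(key)

state = AuditedState()
state.write("research_agent", "facts", ["latency regression"])
state.write("analysis_agent", "facts", ["latency regression", "root cause: N+1 query"])

# Every mutation is now attributable to an agent
print([e["agent"] for e in state.audit_log])  # → ['research_agent', 'analysis_agent']
```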
The Most Common Mistakes in Production
- **Creating too many agents from the start** — It's far safer to begin with 3 or fewer agents, confirm where parallelization is actually needed, and then expand. Every extra agent increases token costs and coordination complexity at the same time.
- **Stuffing too much information into the shared context** — Not every agent needs every piece of information. It helps to ask "does this agent actually need this?" for each item: when the shared context grows bloated, token consumption increases for all agents at once.
- **Deferring observability until later** — When one agent produces a wrong result and you can't trace which agent produced what output from what input, debugging becomes nearly impossible. Attach tracing tools like LangSmith and checkpoint storage from the very beginning.
Closing Thoughts
Multi-agent orchestration isn't about "using multiple agents" — it's a design philosophy of each agent guarding its own memory while sharing only the information that's truly necessary.
Three steps you can start right now:
- **Build a 2-agent pipeline with LangGraph** — After installing with `pip install langgraph`, try building a simple pipeline with just `research_agent` and `analysis_agent` from the example code above. You'll get hands-on intuition for shared state and routing.
- **Experiment with shared vector memory on a local Qdrant instance** — Spin one up with `docker run -p 6333:6333 qdrant/qdrant`, then implement `search_shared_vector_db` and `store_to_shared_vector_db` from Example 2 as real Qdrant calls. You'll get a direct feel for the semantic locking pattern.
- **Attach an A2A Agent Card to your own service** — Use the `inventoryAgentCard` structure from Example 3 as a reference and define your agent's capabilities as JSON. When the time comes to collaborate with an external agent, you'll have the foundation ready.
When it comes to resolving shared memory conflicts, the final arbiter must ultimately be a human — in the next article, we'll cover LangGraph's Human-in-the-Loop pattern: interrupt design strategies where agents request human approval before making critical decisions.
References
- Announcing the Agent2Agent Protocol (A2A) - Google Developers Blog
- What Is Agent2Agent (A2A) Protocol? | IBM
- A2A Protocol Explained: Secure Interoperability for Agentic AI 2026 - OneReach
- Empowering multi-agent apps with the open A2A protocol - Microsoft Cloud Blog
- MCP vs A2A: Protocols for Multi-Agent Collaboration 2026 - OneReach
- GitHub - a2aproject/A2A
- What happened to Google's A2A? - fka.dev
- Intrinsic Memory Agents: Heterogeneous Multi-Agent LLM Systems - arXiv
- Collaborative Memory: Multi-User Memory Sharing in LLM Agents - arXiv
- A-MEM: Agentic Memory for LLM Agents - arXiv
- AI Agent Memory: Comparative Analysis of LangGraph, CrewAI, AutoGen - DEV Community
- Claude Code Agent Teams: Multi-Agent Development Guide - Lushbinary
- Multi-Agent Coordination Patterns: Architectures Beyond the Hype - Medium
- Building Intelligent Multi-Agent Systems with MCPs and the Blackboard Pattern - Medium