Building a Role-Based Multi-Agent Pipeline with CrewAI and LangGraph
Have you ever tried to solve a complex problem with a single AI agent? The context window quickly fills up, and an agent that must simultaneously act as a researcher, coder, and reviewer ultimately fails to perform any of these tasks properly. The solution to this problem is the Multi-Agent Collaboration Pattern. It is an architecture where multiple agents with distinct roles form a team to handle complex pipelines, and it has established itself as a core paradigm in current practical AI system design.
This article is intended for developers who know basic Python syntax and have used the LLM API at least once. The reason for choosing CrewAI and LangGraph among the numerous agent frameworks is clear. CrewAI is specialized for role-based team organization, providing the most intuitive development experience, while LangGraph offers the most flexible control for production workflows requiring complex branching, looping, and state management. The two frameworks have different strengths and become more powerful when used together.
By the end of this article, you will be able to personally run research automation pipelines with LangGraph, hierarchical code review teams with CrewAI, and cross-framework pipelines combining the two. You will understand the core concepts of each framework and gain criteria for determining which tool is better suited for your project.
Key Concepts
What is Multi-Agent Collaboration?
The Multi-Agent Collaboration Pattern is an architecture where multiple AI agents with different roles share and collaborate on complex tasks through orchestration, instead of a single LLM. It operates based on three core concepts.
| Concept | Explanation |
|---|---|
| Role Specialization | Assign specialized roles such as Researcher, Coder, Reviewer, and Manager to each agent. Specialized agents produce more accurate results in domain tasks than general-purpose agents. |
| Orchestrator-Subagent Pattern | The central orchestrator manages the overall flow and holds the context, while subagents perform delegated detailed tasks. |
| State Management Method | LangGraph explicitly manages shared state as graph nodes and edges, and CrewAI collaborates by passing task outputs to the next agent. |
Core Principle of the Orchestrator-Subagent Pattern: Subagents must be stateless. The orchestrator is responsible for the context. This design is key to preventing context pollution and allowing each agent to focus on its role.
CrewAI: Role-based Agent Team
CrewAI provides an intuitive object structure called Agent → Task → Crew. It is a structure that is easy for non-ML engineers to understand and officially supports two process types.
| Process Type | How It Works | Suitable Use Cases |
|---|---|---|
| Sequential | Agents process tasks in sequence. The output of the previous agent becomes the input of the next agent. | Clearly defined pipeline (Research → Write → Edit) |
| Hierarchical | Manager Agents (LLMs) delegate tasks to each agent and verify results | Complex code review, approval process |
Each Agent has its behavior determined by role (role), goal (goal), and backstory (background knowledge). Task defines the specific tasks each agent will perform, and Crew groups and executes them.
LangGraph: Graph-based State Machine
LangGraph models agent workflows as a directed graph. It consists of three elements: nodes (agent steps), edges (transition conditions), and state schemas.
A key differentiator of LangGraph is its cycle (loop) support. Unlike DAGs (Directed Acyclic Graphs), which are limited to simple sequential execution, LangGraph can represent loops that return to the previous node based on conditions. This feature shines in workflows requiring iterative improvements or patterns where "retrying if the result is unsatisfactory" is used. The code uses the following four core APIs.
| API | Role |
|---|---|
StateGraph(스키마) |
Initializes the graph based on the state schema |
add_node(이름, 함수) |
Registers the agent function as a graph node |
add_edge(출발, 도착) |
Defines the execution order (edges) between nodes |
compile() |
Compiles the graph into an executable form |
Checkpointing: A feature that saves the state at a specific point in time during a long task. In the event of an error, it allows the process to resume from the saved point instead of restarting from the beginning, significantly enhancing the stability of long-running pipelines.
Human-in-the-Loop: You can configure a workflow that waits for human approval before executing a specific node using the workflow.compile(interrupt_before=["노드명"]) setting. It is a pattern that flexibly combines automation and human review.
Now, let's implement these concepts in actual code.
Practical Application
Example 1: Configuring a Research Automation Pipeline with LangGraph
This is a pipeline that automatically generates executive reports by connecting three specialized agents (Research → Analysis → Report) as graph nodes.
import os
from langgraph.graph import StateGraph, END
from typing import TypedDict
from langchain_anthropic import ChatAnthropic
from langchain_community.tools import TavilySearchResults
# 환경 변수 설정 (실행 전 아래 키를 발급받아 설정하세요)
# export ANTHROPIC_API_KEY="your-anthropic-api-key"
# export TAVILY_API_KEY="your-tavily-api-key"
# 1. 공유 상태 스키마 정의 — 에이전트 간 데이터를 타입 안전하게 전달합니다
class ResearchState(TypedDict):
query: str
raw_data: str
analysis: str
report: str
# 2. LLM 및 도구 초기화
llm = ChatAnthropic(model="claude-sonnet-4-6")
search_tool = TavilySearchResults(max_results=5)
def research_agent(state: ResearchState) -> ResearchState:
"""웹 검색을 수행해 원시 데이터를 수집합니다."""
try:
results = search_tool.invoke(state["query"])
raw_data = "\n".join([r["content"] for r in results])
return {"raw_data": raw_data}
except Exception as e:
# 프로덕션에서는 재시도 로직이나 폴백 데이터 소스를 추가하는 것을 권장합니다
return {"raw_data": f"검색 오류: {str(e)}"}
def analysis_agent(state: ResearchState) -> ResearchState:
"""수집된 데이터를 분석해 인사이트를 추출합니다."""
prompt = f"다음 데이터를 분석하고 핵심 인사이트를 추출하세요:\n{state['raw_data']}"
analysis = llm.invoke(prompt).content
return {"analysis": analysis}
def report_agent(state: ResearchState) -> ResearchState:
"""분석 결과를 기반으로 임원 보고서를 생성합니다."""
prompt = f"다음 분석을 바탕으로 임원용 보고서를 작성하세요:\n{state['analysis']}"
report = llm.invoke(prompt).content
return {"report": report}
# 3. 그래프 구성 및 컴파일
workflow = StateGraph(ResearchState)
workflow.add_node("research", research_agent)
workflow.add_node("analyze", analysis_agent)
workflow.add_node("report", report_agent)
workflow.set_entry_point("research")
workflow.add_edge("research", "analyze")
workflow.add_edge("analyze", "report")
workflow.add_edge("report", END)
graph = workflow.compile()
# 4. 실행
result = graph.invoke({"query": "2025년 멀티에이전트 AI 트렌드"})
print(result["report"])| Code Components | Roles |
|---|---|
ResearchState |
State schema shared between agents. Delivers intermediate results type-safely |
add_node |
Register each agent function as a graph node |
add_edge |
Defines the execution order (edges) between nodes |
compile() |
Compiles the graph into an executable form |
To add Human-in-the-Loop, if you specify interrupt_before at compile time like workflow.compile(interrupt_before=["report"]), you can insert a human review step before the report node execution.
This time, let's take a look at how to structure a team using CrewAI.
Example 2: Building a Hierarchical Code Review Team with CrewAI
This is a hierarchical code review automation pipeline that follows the path of Coder → Reviewer → Tester. The Manager LLM delegates tasks to each agent and verifies the results.
import os
from crewai import Agent, Crew, Process, Task
from crewai_tools import CodeInterpreterTool
from langchain_anthropic import ChatAnthropic
# 환경 변수 설정
# export ANTHROPIC_API_KEY="your-anthropic-api-key"
code_interpreter = CodeInterpreterTool()
# 1. 에이전트 정의 — 각 역할에 특화된 목표와 배경 지식을 부여합니다
coder = Agent(
role="시니어 Python 개발자",
goal="명확하고 효율적인 Python 코드를 작성합니다",
backstory="10년 경력의 백엔드 엔지니어로, PEP 8과 SOLID 원칙을 철저히 따릅니다",
tools=[code_interpreter],
verbose=True,
)
reviewer = Agent(
role="코드 리뷰어",
goal="코드의 품질, 보안 취약점, 성능 문제를 검토합니다",
backstory="보안 전문가 출신으로 OWASP Top 10을 숙지하고 있습니다",
verbose=True,
)
tester = Agent(
role="QA 엔지니어",
goal="엣지 케이스를 포함한 포괄적인 테스트 케이스를 작성합니다",
backstory="TDD 방법론을 따르며 pytest를 주로 사용합니다",
tools=[code_interpreter],
verbose=True,
)
# 2. 태스크 정의 — 각 에이전트가 수행할 구체적인 작업을 명시합니다
code_task = Task(
description="사용자 인증 모듈을 Python으로 구현하세요. JWT 토큰 발급 및 검증 기능을 포함합니다.",
expected_output="완성된 Python 코드와 간략한 구현 설명",
agent=coder,
)
review_task = Task(
description="작성된 코드를 검토하고 보안 취약점, 코드 스타일, 성능 개선점을 리포트하세요.",
expected_output="검토 보고서 (취약점 목록, 권고 사항 포함)",
agent=reviewer,
context=[code_task], # 이전 태스크 결과를 컨텍스트로 활용합니다
)
test_task = Task(
description="인증 모듈에 대한 pytest 테스트 케이스를 작성하세요. 정상 케이스와 엣지 케이스를 모두 포함합니다.",
expected_output="pytest 테스트 파일 (커버리지 80% 이상 목표)",
agent=tester,
context=[code_task, review_task],
)
# 3. 계층형 Crew 구성
# manager_llm은 문자열이 아닌 LLM 객체를 전달해야 합니다
manager_llm = ChatAnthropic(model="claude-opus-4-6")
crew = Crew(
agents=[coder, reviewer, tester],
tasks=[code_task, review_task, test_task],
process=Process.hierarchical,
manager_llm=manager_llm, # 복잡한 추론이 필요한 오케스트레이터에는 강력한 모델을 사용하는 것을 권장합니다
verbose=True,
)
result = crew.kickoff()
print(result.raw)| Code Components | Roles |
|---|---|
backstory |
This is a key prompt that determines the agent's expertise and conduct. |
context=[code_task] |
Pass previous task results to the next agent |
Process.hierarchical |
Enables a hierarchical structure where Manager LLM delegates tasks to agents |
manager_llm |
This is the orchestrator model responsible for overall coordination. It must be passed in the form of an LLM object such as ChatAnthropic |
Now that we have examined the two frameworks individually, let's look at how to combine them.
Example 3: Cross-Framework Integration — LangGraph Orchestrates CrewAI
This is a cross-framework pattern where a LangGraph orchestrator delegates tasks to a CrewAI crew. It combines CrewAI's intuitive role-based team organization with LangGraph's flexible state management.
import os
from langgraph.graph import StateGraph, END
from typing import TypedDict
from crewai import Agent, Crew, Process, Task
from langchain_anthropic import ChatAnthropic
# 환경 변수 설정
# export ANTHROPIC_API_KEY="your-anthropic-api-key"
# 전체 파이프라인 상태 스키마
class PipelineState(TypedDict):
topic: str
research_result: str
final_code: str
# --- 리서치 크루: 에이전트 및 태스크 정의 ---
researcher = Agent(
role="리서치 전문가",
goal="주어진 주제에 대한 심층 기술 조사를 수행합니다",
backstory="기술 트렌드를 분석하는 5년 경력의 리서처입니다",
verbose=True,
)
analyst = Agent(
role="데이터 분석가",
goal="수집된 정보를 구조화하고 핵심 인사이트를 추출합니다",
backstory="데이터 기반 의사결정을 지원하는 분석 전문가입니다",
verbose=True,
)
research_task = Task(
description="{topic}에 대한 최신 기술 동향과 구현 사례를 조사하세요.",
expected_output="기술 동향 요약 보고서",
agent=researcher,
)
analysis_task = Task(
description="조사된 내용을 바탕으로 구현에 필요한 핵심 기술 스택을 분석하세요.",
expected_output="기술 스택 분석 및 권고안",
agent=analyst,
context=[research_task],
)
# --- 코딩 크루: 에이전트 및 태스크 정의 ---
coder = Agent(
role="시니어 Python 개발자",
goal="명확하고 효율적인 Python 코드를 작성합니다",
backstory="10년 경력의 백엔드 엔지니어로, SOLID 원칙을 철저히 따릅니다",
verbose=True,
)
reviewer = Agent(
role="코드 리뷰어",
goal="코드 품질, 보안 취약점, 성능 문제를 검토합니다",
backstory="보안 전문가 출신으로 OWASP Top 10을 숙지하고 있습니다",
verbose=True,
)
tester = Agent(
role="QA 엔지니어",
goal="포괄적인 테스트 케이스를 작성합니다",
backstory="TDD 방법론을 따르며 pytest를 주로 사용합니다",
verbose=True,
)
code_task = Task(
description="{research} 결과를 바탕으로 데이터 파이프라인 모듈을 Python으로 구현하세요.",
expected_output="완성된 Python 구현 코드",
agent=coder,
)
review_task = Task(
description="작성된 코드를 검토하고 보안 취약점과 성능 개선점을 리포트하세요.",
expected_output="코드 리뷰 보고서",
agent=reviewer,
context=[code_task],
)
test_task = Task(
description="구현된 모듈에 대한 pytest 테스트 케이스를 작성하세요.",
expected_output="pytest 테스트 파일",
agent=tester,
context=[code_task, review_task],
)
manager_llm = ChatAnthropic(model="claude-opus-4-6")
# CrewAI 크루를 LangGraph 노드로 래핑합니다
def run_research_crew(state: PipelineState) -> PipelineState:
"""LangGraph 노드: CrewAI 리서치 크루를 실행합니다."""
research_crew = Crew(
agents=[researcher, analyst],
tasks=[research_task, analysis_task],
process=Process.sequential,
)
result = research_crew.kickoff(inputs={"topic": state["topic"]})
return {"research_result": result.raw}
def run_coding_crew(state: PipelineState) -> PipelineState:
"""LangGraph 노드: CrewAI 코딩 크루를 실행합니다."""
coding_crew = Crew(
agents=[coder, reviewer, tester],
tasks=[code_task, review_task, test_task],
process=Process.hierarchical,
manager_llm=manager_llm,
)
result = coding_crew.kickoff(inputs={"research": state["research_result"]})
return {"final_code": result.raw}
# LangGraph로 두 크루를 조율합니다
workflow = StateGraph(PipelineState)
workflow.add_node("research_crew", run_research_crew)
workflow.add_node("coding_crew", run_coding_crew)
workflow.set_entry_point("research_crew")
workflow.add_edge("research_crew", "coding_crew")
workflow.add_edge("coding_crew", END)
pipeline = workflow.compile()
result = pipeline.invoke({"topic": "실시간 데이터 스트리밍 아키텍처"})
print(result["final_code"])| Code Components | Roles |
|---|---|
PipelineState |
This is a shared state schema connecting LangGraph and CrewAI crews |
run_research_crew / run_coding_crew |
Wraps a CrewAI Crew instance in a LangGraph node function |
kickoff(inputs=...) |
Injects dynamic input when Crew runs. Maps to the {변수명} template in the Task description |
| You must pass it in the form of an LLM object, such as | manager_llm |
Pros and Cons Analysis
Framework Comparison Matrix
The two frameworks can be compared based on key criteria as follows.
| Comparison Items | CrewAI | LangGraph |
|---|---|---|
| Initial Setup | Quick start possible with Agent·Task·Crew structure | Initial cost exists as state schema, nodes, edges, and compilation must all be defined |
| Intuitiveness | Object structure easy to understand even for non-ML engineers | Flow can be grasped through graph visualization, but there is a learning curve |
| Conditional Branching & Loops | Limited. Optimized for simple sequential and hierarchical structures | Strengths. Naturally represents complex conditional branches and cyclic loops as graphs |
| State Management | Task Output Delivery Method (Implicit) | Type-Safe Shared State Schema (Explicit) |
| Checkpointing | Limited | Built-in support. Resume and rollback possible after interruption |
| Human-in-the-Loop | Limited | Apply for Level 1 as interrupt_before |
| Debugging | Difficulty due to general logs within the task not functioning properly | Detailed tracing possible with LangSmith integration |
| Suitable Cases | Rapid prototyping, role-based teams, clear sequential workflow | Complex branching, loops, long execution times, production stability |
Disadvantages and Precautions
These are common precautions applicable to both frameworks.
| Item | Content | Response Plan |
|---|---|---|
| Increased Costs | Token costs increase rapidly as multiple agents each call the LLM API | It is recommended to deploy lightweight models (such as Claude Haiku) for sub-agents and powerful models for orchestrators |
| Context Pollution | As sub-agent results accumulate, the main agent's context becomes overloaded | It is recommended to design sub-agents to be stateless and deliver only a summary |
| Observability | It is difficult to track which agent did what | We recommend integrating LangSmith or CrewAI's built-in logging from the start |
| Prompt Injection | When an agent receives the output of another agent as input, malicious content can manipulate the next agent's behavior | It is recommended to apply input validation and sandboxing when transferring data between agents |
Prompt Injection: This is a security threat that requires particular attention in multi-agent pipelines. When the output of one agent is directly injected into the prompt of the next agent, malicious directives contained in the external data can alter the behavior of the entire pipeline.
The Most Common Mistakes in Practice
- Defining roles too broadly: An "all-or-nothing" agent is no different from a single agent. The more narrowly and clearly each agent's role is defined, the higher the consistency of performance and results.
- Using Lightweight Models for Orchestrators: Orchestrators, responsible for coordinating sub-agents and distributing tasks, perform complex reasoning. Using models weak in Manager LLM to reduce costs degrades the quality of the entire pipeline. An effective strategy is to deploy powerful models for orchestrators and lightweight models for sub-agents.
- Omitting Observability from the Initial Design: In multi-agent systems, it is difficult to track which agent did what. If you do not integrate LangSmith or CrewAI's built-in logging from the start, it can take a significant amount of time to identify the cause of errors when they occur. Observability is a necessity, not an option.
In Conclusion
If you need rapid prototyping and intuitive role-based teams, CrewAI is the more suitable choice, while if you need complex branching logic, loops, and production-level state management, LangGraph is the better choice. Also, the two frameworks are not mutually exclusive. As seen in Example 3, using them together can complement each framework's weaknesses and maximize its strengths.
3 Steps to Start Right Now:
- Try a simple sequential pipeline with CrewAI first. After installing with
pip install crewai, we recommend creating a blog draft generator that connects the three agents—Researcher → Writer → Editor—withProcess.sequential. The Official CrewAI Quickstart is a good starting point. - Get familiar with the concept of state machines using official LangGraph examples. After installing with
pip install langgraph langchain-anthropic, you can learn the state schema and node-edge structure by running them yourself, starting with the simple graph example in Official LangGraph Tutorial. - Integrate with LangSmith to visualize agent execution flow. By simply setting the
LANGCHAIN_TRACING_V2=trueenvironment variable, you can grasp each agent call, token usage, and execution time at a glance on the LangSmith Dashboard. We recommend securing observability first and then increasing complexity.
Multi-agent pipelines can feel difficult during the initial setup. However, the moment you first run a system where each agent focuses on their role and collaborates, possibilities that were impossible with a single agent open up.
Next Article: Practical Integration of MCP (Model Context Protocol) and A2A (Agent-to-Agent Protocol) — A Guide to Implementing a Standard Protocol Connecting External Tools and Heterogeneous Agents
Reference Materials
- LangGraph vs CrewAI vs AutoGen: The Complete Multi-Agent AI Orchestration Guide for 2026 | DEV Community
- Best Multi-Agent Frameworks in 2026: LangGraph, CrewAI | GuruSup
- CrewAI vs LangGraph vs AutoGen: Choosing the Right Multi-Agent AI Framework | DataCamp
- LangGraph vs CrewAI: Let's Learn About the Differences | ZenML Blog
- A Coding Implementation to Advanced LangGraph Multi-Agent Research Pipeline | MarkTechPost
- A Coding Guide to Build a Hierarchical Supervisor Agent Framework with CrewAI and Google Gemini | MarkTechPost
- CrewAI: A Practical Guide to Role-Based Agent Orchestration | DigitalOcean
- Build multi-agent systems with LangGraph and Amazon Bedrock | AWS Blog
- langgraph-supervisor | PyPI
- Hierarchical Process | CrewAI Official Documentation
- Single-Agent vs. Multi-Agent Code Review: Why One AI Isn't Enough | Qodo