The Four Strategies of Context Engineering — Applying Write, Select, Compress, and Isolate to Multi-Agent Systems
Run an LLM agent for a few days and you are bound to hit at least one of these moments: the agent forgets what it just did, a sub-agent corrupts the orchestrator's conversation history, or a task is forcibly terminated because the 200k-token window fills up. These problems all share the same root: how context is managed.
If prompt engineering was a matter of "what to say," context engineering is a matter of "what information to place, when, and where so that the model behaves optimally." In September 2025, Anthropic systematized agent context management into four strategies (buckets): Write, Select, Compress, and Isolate, through its official engineering blog. This article is aimed at backend and full-stack developers with experience in LLM API calls and covers what each strategy is, when to use it, and how to implement it in actual TypeScript code through a single consistent scenario (News Summary Multi-Agent).
If you read this article to the end, you will be able to diagnose where context bottlenecks occur in your agent code and implement a production-grade agent by using the four strategies independently or in combination.
Key Concepts
The context window is the agent's "working memory".
Like human short-term memory, an LLM's context window is the total amount of information it can process at once. For Claude 3.7 Sonnet this is 200,000 tokens, but as the token count grows, the Context Rot phenomenon sets in: because every token attends to every other token (an O(n²) relationship), the model's ability to accurately recall information degrades as the context gets larger. This is why "just use a longer context" does not solve the problem.
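As a back-of-the-envelope check, token pressure can be estimated before a call is ever made. The sketch below assumes the rough heuristic of ~4 characters per token (the same approximation the compaction example later in this article uses); real tokenizers vary by language and content.

```typescript
// Rough token estimate: ~4 characters per token (a common heuristic,
// not an exact tokenizer — actual counts vary by language and content).
const CONTEXT_WINDOW_TOKENS = 200_000;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Fraction of the window a set of documents would occupy if injected wholesale
function contextUsageRatio(texts: string[]): number {
  const total = texts.reduce((acc, t) => acc + estimateTokens(t), 0);
  return total / CONTEXT_WINDOW_TOKENS;
}

// 200 articles at ~4,000 characters each already fill the entire window:
// 200 × 4,000 chars ≈ 200,000 tokens — before any conversation history is added.
```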
Key Question: Which context configuration best elicits the model's desired behavior?
Anthropic defined four buckets to answer this question.
| Strategy | Key Action | Problem Solved |
|---|---|---|
| Write | Write information to storage outside the context | State lost across session boundaries |
| Select | Dynamically load only relevant information into the context | Unnecessary context inflation and Context Rot |
| Compress | Summarize the conversation history and restart in a fresh context | Forced termination when the context window limit is reached |
| Isolate | Give sub-agents independent context windows | Sub-agent details polluting the orchestrator |
These four strategies are not mutually exclusive. Actual production agents use a combination of two or more of them.
Write — Build memories outside the context
The Write strategy is a pattern in which an agent actively saves information outside the context window (files, databases, runtime state objects). Even if the context is reset or the session is disconnected, the agent can read the scratchpad written externally and continue.
When to apply:
- Long-term tasks spanning tens to hundreds of steps (codebase analysis, game agents)
- When externalizing state that needs to be shared by multiple agents
- When you need to retry from the last checkpoint instead of the beginning if an error occurs
Trade-off: External storage I/O latency is added, and file system dependencies are introduced. This can be mitigated by asynchronous writes and a local cache layer.
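One way to realize that mitigation is a write-behind wrapper: reads are served from an in-memory cache while disk writes are batched and flushed asynchronously. This is a minimal sketch; `CachedStore` and its single-file JSON layout are illustrative choices, not part of any SDK.

```typescript
import fs from "fs/promises";

// [Write] Write-behind cache: reads never touch the disk,
// and writes are batched into a single flush instead of one I/O per set().
class CachedStore {
  private cache = new Map<string, string>();
  private dirty = new Set<string>(); // keys changed since the last flush

  constructor(private filePath: string) {}

  set(key: string, value: string): void {
    this.cache.set(key, value);
    this.dirty.add(key);
  }

  get(key: string): string | undefined {
    return this.cache.get(key); // served from memory — no I/O latency
  }

  // Persist all pending changes in one write
  async flush(): Promise<void> {
    if (this.dirty.size === 0) return;
    await fs.writeFile(this.filePath, JSON.stringify([...this.cache]));
    this.dirty.clear();
  }
}
```

A caller would `set()` freely during the hot path and `flush()` at checkpoints, trading durability granularity for latency.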
Select — Accurately retrieve only the necessary information
The Select strategy is a pattern in which the agent dynamically injects only information relevant to the current task into the context. Instead of putting all documents into the context at once, it searches for and retrieves only highly relevant chunks.
Three Memory Types of Select Strategy
- Episodic: examples of desired behavior (few-shot examples)
- Procedural: guidelines for agent behavior (rule files such as CLAUDE.md)
- Semantic: task-relevant facts and knowledge (RAG, vector DB)
Contextual Retrieval proposed by Anthropic improves upon the core limitations of existing RAGs. In existing RAGs, contextual information is lost when chunks are extracted from a document. Contextual Retrieval prepends and embeds phrases into each chunk that explain the context in which the chunk is situated within the document. Hybridly combining BM25 keyword search and semantic embedding search can reduce search errors by 49% compared to existing methods.
Trade-off: The overall performance of the agent depends on search quality. Injecting irrelevant chunks is counterproductive. Building an evaluation pipeline is essential.
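The hybrid scoring mentioned above can be sketched as a weighted merge of the two score lists. The 0.4/0.6 weights mirror the example code later in this article and are tuning knobs, not fixed values; scores are assumed to be pre-normalized to [0, 1].

```typescript
// [Select] Hybrid ranking sketch: combine a keyword score (BM25) and a
// semantic score (embedding similarity) per chunk, then keep the top-K.
interface ScoredChunk {
  id: string;
  bm25: number;      // keyword relevance, normalized to [0, 1]
  embedding: number; // semantic similarity, normalized to [0, 1]
}

function hybridRank(chunks: ScoredChunk[], topK: number): string[] {
  return [...chunks]
    .map((c) => ({ id: c.id, score: 0.4 * c.bm25 + 0.6 * c.embedding }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((c) => c.id);
}
```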
Compress — Keep running beyond context limits
The Compress strategy is a pattern that summarizes (compacts) the conversation history and restarts with a new context when the context window approaches a threshold. Instead of forcibly terminating the session, it allows you to compress and continue the key content from that point.
Information loss is inevitable during the compression process. You must clearly determine what to preserve and what to discard.
| Items to preserve | Things that can be discarded |
|---|---|
| Task Objectives and Completed Steps | Details of Intermediate Reasoning Process |
| Key Facts and Data Discovered | Path of Repeated Attempts and Failures |
| Current Status and Pending Items | Full Original Response from Successful Tool Calls |
| Constraints and Error History to Note | Original Pre-Compaction Version Already Summarized |
The Claude Agent SDK supports automatic compaction with the compaction_control parameter. Claude Code automatically summarizes the entire history when the context window reaches 95%.
Trade-off: If the summary quality is low, critical information is permanently lost. It is recommended to verify the compression results separately or to back up the original data using Write before compression.
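The backup recommendation can be made concrete with a thin wrapper that persists the raw history via Write before replacing it with the summary. A minimal sketch, assuming a caller-supplied `summarize` function; the `compaction-backups` directory and file naming are illustrative.

```typescript
import fs from "fs/promises";
import path from "path";

interface Message {
  role: "user" | "assistant";
  content: string;
}

// [Compress + Write] Persist the raw history to disk before discarding it,
// so a low-quality summary never means permanent information loss.
async function backupThenCompact(
  messages: Message[],
  taskId: string,
  summarize: (history: Message[]) => Promise<string>
): Promise<Message[]> {
  const backupDir = "./compaction-backups";
  await fs.mkdir(backupDir, { recursive: true });
  await fs.writeFile(
    path.join(backupDir, `${taskId}-${Date.now()}.json`),
    JSON.stringify(messages, null, 2)
  );
  const summary = await summarize(messages);
  // Fresh context seeded with the compressed summary only
  return [
    { role: "user", content: `[Previous work summary]\n${summary}\n\nContinue the task.` },
  ];
}
```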
Isolate — Split context across sub-agents
The Isolate strategy is a pattern that distributes complex tasks across multiple sub-agents, allowing each to have an independent context window. The orchestrator is responsible only for strategy and coordination, while the details of the actual work remain within the sub-agent's context.
Context Pollution: If long search results or error logs from subagents are fed directly into the orchestrator context, the orchestrator is unable to focus on high-level decision-making. The Isolate strategy blocks this by returning only a summary of results from subagents to the orchestrator.
Trade-off: Token costs skyrocket as the number of sub-agents increases. In Anthropic's multi-agent researcher case, up to 15 times more tokens were consumed compared to a single agent. The number of sub-agents and task decomposition criteria must be carefully designed.
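One way to keep that multiplier bounded is to cap the number of sub-agents regardless of how finely the task could be decomposed, merging excess tasks into existing agents instead of spawning new ones. A minimal sketch; the round-robin partitioning and the `runAgent` signature are illustrative choices.

```typescript
// [Isolate] Cap the number of parallel sub-agents so token cost stays bounded.
// Tasks beyond the cap are distributed round-robin into existing agents'
// workloads rather than spawning additional contexts.
async function runWithAgentCap<T>(
  tasks: string[],
  maxAgents: number,
  runAgent: (task: string) => Promise<T>
): Promise<T[]> {
  // Partition tasks into at most maxAgents groups
  const groups: string[][] = Array.from(
    { length: Math.min(maxAgents, tasks.length) },
    () => []
  );
  tasks.forEach((t, i) => groups[i % groups.length].push(t));
  // One sub-agent per group, each with its own isolated context
  return Promise.all(groups.map((g) => runAgent(g.join("\n"))));
}
```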
Practical Application
Example 1: News Summary Multi-Agent — Comparison of 4 Strategies Before and After
This is a scenario for implementing an agent pipeline that collects, classifies, and summarizes hundreds of news articles every day.
Before — Naive implementation without strategy:
// Naive implementation in which every problem occurs
interface Message {
  role: "user" | "assistant";
  content: string;
}

async function naiveNewsAgent(articles: string[]): Promise<string> {
  let messages: Message[] = [];
  for (const article of articles) {
    // Problem 1: every article is appended to the context sequentially → Context Rot
    messages.push({ role: "user", content: `Summarize this article: ${article}` });
    const summary = await callClaudeWithHistory(messages);
    messages.push({ role: "assistant", content: summary });
    // Processing 200 articles exceeds the context window → forced termination around article 100
    // Problem 2: on interruption, restart from scratch — no saved progress
    // Problem 3: single-threaded sequential processing — slow
  }
  return messages[messages.length - 1].content;
}

After — combining Write + Select + Compress + Isolate:
First, define the scratchpad for the Write strategy.
// [Write] Agent scratchpad — persists state across sessions
import fs from "fs/promises";
import path from "path";

interface ScratchpadEntry {
  step: number;
  timestamp: string;
  action: string;
  result: string;
  notes: string;
}

class AgentScratchpad {
  private filePath: string;
  private entries: ScratchpadEntry[] = [];

  constructor(taskId: string) {
    this.filePath = path.join("./scratchpad", `${taskId}.json`);
  }

  async load(): Promise<void> {
    try {
      const raw = await fs.readFile(this.filePath, "utf-8");
      this.entries = JSON.parse(raw);
      console.log(`[Write] Loaded ${this.entries.length} previous checkpoints`);
    } catch {
      this.entries = []; // first run: start with empty state
    }
  }

  async write(entry: Omit<ScratchpadEntry, "step" | "timestamp">): Promise<void> {
    const newEntry: ScratchpadEntry = {
      step: this.entries.length + 1,
      timestamp: new Date().toISOString(),
      ...entry,
    };
    this.entries.push(newEntry);
    await fs.mkdir(path.dirname(this.filePath), { recursive: true });
    await fs.writeFile(this.filePath, JSON.stringify(this.entries, null, 2));
  }

  getSummary(): string {
    return this.entries
      .map((e) => `[Step ${e.step}] ${e.action}: ${e.notes}`)
      .join("\n");
  }
}

The following is the Select strategy's contextual-retrieval-based search.
// [Select] Contextual Retrieval — inject only relevant chunks into the context
interface NewsChunk {
  id: string;
  content: string;
  contextualDescription: string; // describes where the chunk sits within the document
}

class NewsKnowledgeBase {
  private chunks: NewsChunk[] = [];

  // At indexing time: prepend a contextual description to each chunk
  // (the core of Contextual Retrieval)
  async indexArticle(articleId: string, fullText: string): Promise<void> {
    const rawChunks = this.splitIntoChunks(fullText, 400);
    for (const [i, chunk] of rawChunks.entries()) {
      // Generate a description of the role each chunk plays in the full article
      const contextualDescription = await callClaude(
        `Full text of article "${articleId}":\n${fullText}\n\n` +
        `Explain in two sentences where the following chunk sits within this article.\n\nChunk: ${chunk}`
      );
      this.chunks.push({
        id: `${articleId}-${i}`,
        content: chunk,
        contextualDescription,
      });
    }
  }

  // At search time: select only relevant chunks via hybrid BM25 + embedding search
  async buildContext(query: string, topK = 5): Promise<string> {
    // A real implementation would combine a vector DB (e.g. Pinecone) with BM25 scores
    const relevant = this.hybridSearch(query, topK);
    // Inject only relevant chunks instead of full documents → saves tokens, keeps focus
    return relevant
      .map((c) => `[Background]\n${c.contextualDescription}\n${c.content}`)
      .join("\n\n");
  }

  private hybridSearch(query: string, topK: number): NewsChunk[] {
    // Hybrid weighting: 0.4 * BM25 + 0.6 * embedding similarity
    return this.chunks.slice(0, topK); // simplified for this example
  }

  private splitIntoChunks(text: string, maxWords: number): string[] {
    const words = text.split(" ");
    const chunks: string[] = [];
    for (let i = 0; i < words.length; i += maxWords) {
      chunks.push(words.slice(i, i + maxWords).join(" "));
    }
    return chunks;
  }
}

Next, run category-specific sub-agents in parallel with the Isolate strategy.
// [Isolate] Sub-agent — performs a narrow task in an independent context
// and returns only a summary
interface SubTaskResult {
  taskId: string;
  summary: string;
  keyFindings: string[];
}

async function runSubAgent(
  taskId: string,
  specificTask: string
): Promise<SubTaskResult> {
  // The sub-agent knows only the scope of its own task;
  // none of the orchestrator's history is passed in (isolation)
  const response = await callClaude(
    `[Subtask #${taskId}]\n${specificTask}\n\n` +
    `When finished, respond strictly in JSON:\n` +
    `{ "summary": "...", "keyFindings": ["...", "..."] }`
  );
  // Only the summarized result reaches the orchestrator — detail context stays isolated
  const parsed = JSON.parse(extractJson(response));
  return { taskId, ...parsed };
}

The Compress strategy then compacts automatically when the context approaches its limit.
// [Compress] Compaction trigger — summarize the history and restart at 80% usage
const CONTEXT_WINDOW_TOKENS = 200_000;
const COMPACTION_THRESHOLD = 0.80;

function estimateTokenCount(messages: Message[]): number {
  // Rough heuristic: ~4 characters per token
  return messages.reduce((acc, m) => acc + m.content.length / 4, 0);
}

async function compactIfNeeded(
  messages: Message[],
  taskGoal: string
): Promise<Message[]> {
  const usage = estimateTokenCount(messages) / CONTEXT_WINDOW_TOKENS;
  if (usage < COMPACTION_THRESHOLD) return messages;
  console.log(`[Compress] Context ${Math.round(usage * 100)}% full → running compaction`);
  const historyText = messages
    .map((m) => `[${m.role}]: ${m.content}`)
    .join("\n");
  const summary = await callClaude(
    `Summarize the conversation history for task "${taskGoal}" in this format:\n` +
    `1. Completed work (be specific)\n2. Current state and open items\n` +
    `3. Key facts gathered\n4. Constraints and errors to watch for\n\n` +
    `History:\n${historyText}`
  );
  // Start a fresh context from a single compressed message
  return [
    {
      role: "user",
      content: `[Summary of previous work]\n${summary}\n\nContinue the task.`,
    },
  ];
}

Finally, a production-level orchestrator combines all four strategies.
// Orchestrator integrating all four strategies
async function productionNewsAgent(
  taskId: string,
  articleUrls: string[]
): Promise<string> {
  // [Write] Load checkpoints from previous runs
  const scratchpad = new AgentScratchpad(taskId);
  await scratchpad.load();
  const knowledgeBase = new NewsKnowledgeBase();
  let messages: Message[] = [];

  // [Isolate] Run category-specific sub-agents in parallel
  const categories = ["Politics", "Economy", "Technology", "Society"];
  const subResults = await Promise.all(
    categories.map((category) =>
      runSubAgent(
        `${taskId}-${category}`,
        `From the URL list below, select only articles related to "${category}" and summarize the key points:\n` +
        articleUrls.join("\n")
      )
    )
  );

  // [Write] Persist sub-agent results to the scratchpad
  for (const result of subResults) {
    await scratchpad.write({
      action: `Category summary complete: ${result.taskId}`,
      result: result.summary,
      notes: result.keyFindings.join("; "),
    });
  }

  // [Compress] Auto-compact when the orchestrator context hits the threshold
  messages.push({
    role: "user",
    content: `Per-category summaries:\n${scratchpad.getSummary()}`,
  });
  messages = await compactIfNeeded(messages, "Generate daily news briefing");

  // [Select] Retrieve related past articles from the knowledge base to enrich context
  const currentSummary = subResults.map((r) => r.summary).join(" ");
  const backgroundContext = await knowledgeBase.buildContext(currentSummary);
  return await callClaude(
    `[Today's category summaries]\n${scratchpad.getSummary()}\n\n` +
    `[Related background]\n${backgroundContext}\n\n` +
    `Combine the above into a daily reader newsletter of at most 500 characters.`
  );
}

Before vs After quantitative comparison:
| Item | Before (No Strategy) | After (Combination of 4 Strategies) |
|---|---|---|
| Processing 200 articles | Impossible — terminates around article 100 | Possible — Compress + Write combination |
| Recovery from a mid-run error | Full restart from the beginning | Resume from the last checkpoint |
| Token usage (200 articles) | ~160k accumulated (everything piled into one context) | ~40k — only relevant chunks loaded selectively |
| Processing speed | Sequential (200 articles × ~3 seconds each) | Parallel by category (4 sub-agents) |
| Cross-category contamination | Political articles affect the technology summary | Fully separated via Isolate |
Pros and Cons Analysis
Advantages
| Strategy | Key Advantages | Representative Use Cases |
|---|---|---|
| Write | Long-term task persistence across session boundaries | Codebase analysis, Pokémon game agent |
| Select | Reduced Token Costs + Improved Accuracy through Centralized Relevant Information | RAG System, Domain Knowledge Agent |
| Compress | Long-term tasks possible with virtually unlimited context length | Long-term conversation agent, iterative execution pipeline |
| Isolate | Increased throughput through parallel processing + prevention of context pollution | Research agent, multistep analysis system |
Disadvantages and Precautions
| Strategy | Disadvantages | Countermeasures |
|---|---|---|
| Write | External storage I/O latency, file system dependency | Asynchronous writes + added in-memory cache layer |
| Select | Overall performance depends on search quality | Introduction of Contextual Retrieval method + Establishment of search evaluation pipeline |
| Compress | Permanent loss of critical information if summary quality is low | Back up original with Write before compression, explicitly specify items to retain |
| Isolate | Token cost scales with the number of sub-agents (up to 15×) | Cap the number of sub-agents; clarify task-decomposition criteria |
The Most Common Mistakes in Practice
- Using Isolate without Select: the entire document set is injected into each sub-agent's context, so token cost multiplies with every agent added. Isolate and Select should always be applied as a pair.
- Write-only logging: entries are recorded to the scratchpad, but the content is never read back into the context on the next agent run. Write is only meaningful when completed as a "write → read" cycle.
- Triggering Compress too late: the context window fills up, the run dies with an error, and everything restarts from the beginning. It is safer to trigger compaction preemptively at 80% and leave a checkpoint with Write beforehand.
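The "write → read" cycle from the second pitfall can be made explicit: every run starts by loading persisted checkpoints and injecting a recap into the opening prompt. A simplified in-memory sketch; `Checkpoint` and `buildOpeningPrompt` are illustrative names, not part of the scratchpad class shown earlier.

```typescript
// Write is only complete as a "write → read" cycle: what one run persists,
// the next run must load back into its opening context.
interface Checkpoint {
  step: number;
  notes: string;
}

function buildOpeningPrompt(task: string, checkpoints: Checkpoint[]): string {
  if (checkpoints.length === 0) return task; // first run: no prior state
  const recap = checkpoints.map((c) => `[Step ${c.step}] ${c.notes}`).join("\n");
  // Inject persisted progress so the agent resumes instead of restarting
  return `[Previous progress]\n${recap}\n\n${task} (continue from step ${checkpoints.length + 1})`;
}
```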
In Conclusion
The core of context engineering is "putting the right information in the right place at the right time." Externalizing state with Write, selecting only relevant information with Select, breaking through window limits with Compress, and preventing contamination with Isolate—these four strategies complement each other and form the design basis for creating production-grade agents.
3 Steps to Start Right Now:
- Diagnosis: If the agent is currently accumulating the entire conversation history in the context, start by applying the Write + Compress combination. Adding just a single scratchpad file enables the ability to resume long-term tasks.
- Improvement: If you are using RAG, reduce search errors by prepending in-document location descriptions to each chunk using Anthropic Contextual Retrieval. Search accuracy increases significantly without major changes to existing code.
- Extension: If a single agent is performing too many roles, split the tasks into sub-agents, but apply an Isolate strategy to return only the summary to the orchestrator. This achieves the effects of parallel processing and preventing context pollution simultaneously.
Next Post: Building a Custom MCP (Model Context Protocol) Server — A Practical Pattern for Injecting Domain-Specific Context into Agents in Real-Time
Reference Materials
- Effective context engineering for AI agents | Anthropic Engineering
- Context Engineering for Agents | LangChain Blog
- Contextual Retrieval | Anthropic
- Anthropic Multi-Agent Research System | Anthropic
- Context Management and Compaction | Claude Cookbooks DeepWiki
- Agentic Context Engineering: The Complete 2025 Guide | Sundeep Teki
- Context Engineering 101 — What We Can Learn from Anthropic | omnigeorgio
- Claude's Context Engineering Secrets | Bojie Li