Andrej Karpathy's Vibe Coding Journey — Correcting LLM Agent Behavior with CLAUDE.md
Honestly, when I first heard the term "vibe coding," I assumed it was just another buzzword. I couldn't quite grasp what it meant to write code by feel, or whether it was something actually usable in real-world work. But when Andrej Karpathy himself declared that he had "shifted from 80% manual coding to 80% agent-driven coding in 8 weeks," I finally realized this was more than just a trend.
Karpathy's thinking isn't simply "write code with AI." He argues that software itself has undergone three paradigm shifts, and he systematically articulated how developers should think and work at each stage. On top of that, he observed patterns of mistakes that LLMs make when using tools like Claude Code or Cursor in practice, and even created and shared a concrete guideline file (CLAUDE.md) to prevent them.
By the end of this article, you'll want to add a CLAUDE.md to your project root right away. We'll cover the Software 1.0/2.0/3.0 framework, four principles for LLM coding pitfalls, a CLAUDE.md setup you can use immediately in practice, and the LLM wiki pattern for building a personal knowledge base without RAG — all in one go.
Core Concepts
Software 1.0 → 2.0 → 3.0: Three Paradigm Shifts
Understanding the big picture Karpathy drew makes everything else much easier to follow.
| Stage | Definition | Core Medium | Developer's Role |
|---|---|---|---|
| Software 1.0 | Humans explicitly write logic as code | `.py`, `.ts` source files | Algorithm design and direct implementation |
| Software 2.0 | Models learn behavior from data | Trained model files (`.pt`, `.ckpt`) | Data curation, training pipeline design |
| Software 3.0 | Natural language prompts instruct LLMs | `CLAUDE.md`, system prompts | Prompt engineering, agent orchestration |
Karpathy's core insight about Software 2.0 is somewhat more radical. The idea is that "data, not code, is the program" — instead of developers writing if-else statements directly, behavior is defined by feeding large amounts of data to a model. Once that clicks, Software 3.0 follows naturally — this time, natural language defines the program instead of data.
Software 3.0 — A new programming paradigm that uses LLMs as the "runtime" and treats natural language like "source code."
`CLAUDE.md` or system prompt files are the program itself.
What's interesting is that these three paradigms don't replace one another — they coexist hierarchically. The LLM itself is a product of Software 2.0, controlling that LLM via prompts is Software 3.0, and once the generated code is deployed to production, it settles back into Software 1.0. All three layers are alive simultaneously.
Four LLM Coding Pitfalls — Bad Habits Karpathy Observed in AI
Here are four mistake patterns Karpathy repeatedly witnessed while working with LLMs. These are situations you'll frequently encounter in practice, and once you know them, you'll approach prompt writing in a completely different way.
- No Assumptions: LLMs tend to fill in ambiguous requirements with their own judgment, making hidden dependency and architecture decisions without user consent. The countermeasure: the model must ask rather than assume.
- Surface Confusion: When something is uncertain, the model should ask immediately instead of pressing forward. Silently heading in the wrong direction is the most dangerous failure mode.
- Explicit Trade-offs: When multiple implementation approaches exist, the model shouldn't silently choose one; it should show the user the pros and cons of each option.
- Goal-oriented Execution: Implement only what was explicitly requested. Unsolicited additions like "while we're at it, this would also be nice" are the root cause of code bloat.
Code Bloat — The phenomenon where an LLM inflates a problem solvable in 100 lines to 1,000 lines through excessive abstraction, unnecessary helper functions, and defensive code "for the future." This significantly raises maintenance costs.
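To make the contrast concrete, here is a toy illustration (my own, not from Karpathy's material): the same one-line requirement implemented directly versus through the kind of speculative abstraction an unconstrained LLM tends to produce.

```python
# Requirement: "return the total price of a list of items"

# Direct implementation -- exactly what was asked, nothing more.
def total_price(prices):
    return sum(prices)

# Bloated version (illustrative): speculative abstraction "for the future".
class PricingStrategy:
    def apply(self, prices):
        raise NotImplementedError

class SumStrategy(PricingStrategy):
    def apply(self, prices):
        return sum(prices)

class PriceCalculator:
    def __init__(self, strategy=None):
        self.strategy = strategy or SumStrategy()

    def calculate(self, prices):
        return self.strategy.apply(prices)

# Both produce the same answer; only one matches the scope of the request.
assert total_price([10, 20, 30]) == PriceCalculator().calculate([10, 20, 30]) == 60
```

Nothing in the requirement called for a strategy hierarchy; every extra class is surface area that someone has to maintain later.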
Now that we have the concepts down, let's move on to settings you can actually use.
Practical Application
Example 1: Correcting AI Behavior with a CLAUDE.md Skill File
The forrestchang/andrej-karpathy-skills repository is a single CLAUDE.md file that distills Karpathy's four principles into a form Claude Code can understand. Simply adding this file to your project root noticeably changes how the AI behaves.
Here's what the pattern looks like for how core guidelines are written:
```markdown
# Project Guidelines

## Core Principles

### No Assumptions
If a requirement is ambiguous, STOP and ask for clarification.
Do NOT infer intent from context alone.

### Surface Confusion Immediately
If you are confused about ANY part of the task, say so before proceeding.
"I'm not sure about X — should I do A or B?" is always better than guessing.

### Explicit Trade-offs
When multiple implementation approaches exist, present them with pros/cons.
Do not silently choose one approach when alternatives exist.

### Goal-Oriented Execution
Implement ONLY what was explicitly requested.
Do not add features, refactoring, or abstractions beyond the task scope.
```

I also initially thought, "how much can one markdown file really change?" But after testing it myself, I noticed a clear increase in how often the AI would stop at uncertain points and ask questions, and its habit of spontaneously adding unrequested refactoring dropped noticeably. For example, when I gave a vague requirement like "add payment functionality," the AI previously would design the payment flow and start writing code on its own — now it first asks: "Could you clarify which payment gateway to use and whether refund handling should be included?"
| Behavior | Without CLAUDE.md | With CLAUDE.md |
|---|---|---|
| Handling ambiguous requirements | "add payment functionality" → designs and implements on its own | "Can you confirm the payment gateway and refund handling first?" |
| Adding out-of-scope features | Happens frequently | Explicitly suppressed |
| Choosing implementation approach | Silent unilateral decision | Presents options and trade-offs |
| Handling confusing code | Just keeps going | Immediately expresses uncertainty |
Example 2: LLM Wiki — Building a Personal Knowledge Base Without RAG
The idea behind the LLM wiki architecture Karpathy released in early 2026 is surprisingly simple, and the results are powerful. Without using any RAG (Retrieval-Augmented Generation) pipeline or vector database, it builds a structured knowledge base using nothing but markdown files and a long context window (the length of text a model can process at once).
From using it myself, I found that "no management overhead" is a much bigger advantage than I expected. You can get started with just one folder, without worrying about vector DB setup, embedding pipeline management, or search tuning. Karpathy himself mentioned 70x efficiency compared to RAG — this is best understood as referring to the difference in operational burden when you eliminate vector search error rates and infrastructure complexity, rather than a precise benchmark figure. It's an expression pointing to the overall cost of maintaining a pipeline.
Here's how you might structure the directory:
```
my-llm-wiki/
├── raw_sources/       # Source material the LLM only reads
│   ├── papers/
│   ├── articles/
│   └── meeting_notes/
├── wiki/              # Markdown the LLM creates and maintains
│   ├── entities/      # People, organizations, products, etc.
│   ├── concepts/      # Technical concepts, term definitions
│   └── index.md       # Auto-generated index
└── schema.md          # Defines the agent's behavior rules
```

In `schema.md`, you define the rules for how the AI should build and update the wiki:
```markdown
# Wiki Schema

## Entity Format
Each entity file should contain:
- **Name**: Official name
- **Type**: person | org | product | concept
- **Summary**: 2-3 sentence description
- **Relations**: Links to related entities
- **Sources**: References to raw_sources/ files

## Update Rules
- Never delete existing content; append or update only
- Cross-link related concepts
- Flag conflicts between sources as `> ⚠️ Conflicting info:`
```

The `> ⚠️ Conflicting info:` tag is how the AI automatically marks conflicting information that appears across different documents. The actual output looks something like this:
```markdown
## Karpathy — Definition of Software 3.0
A paradigm that uses the LLM as the programming runtime.
> ⚠️ Conflicting info: article_2025.md defines it as "natural language is the source code,"
> but interview_2026.md describes it as "prompt engineering is the new compile step."
```

By leaving conflict points flagged rather than overwriting them, the knowledge base maintains its trustworthiness. The entire flow — where the AI reads `raw_sources/` and automatically creates and updates files under `wiki/` — is controlled by this single `schema.md` file.
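Because conflicts are flagged with a fixed marker, auditing the wiki needs nothing more than a plain text scan. Here is a minimal sketch; the helper name and the directory layout are my assumptions, not part of Karpathy's original gist.

```python
from pathlib import Path

# The fixed marker the schema tells the agent to use for conflicts.
CONFLICT_MARKER = "⚠️ Conflicting info"

def find_conflicts(wiki_dir):
    """Return the wiki markdown files that still contain unresolved conflict flags."""
    return [
        path
        for path in sorted(Path(wiki_dir).rglob("*.md"))
        if CONFLICT_MARKER in path.read_text(encoding="utf-8")
    ]
```

Running `find_conflicts("my-llm-wiki/wiki")` periodically gives you a review queue of entries whose sources disagree, without any index or embedding infrastructure.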
Example 3: Layering the Agent Tool Stack
The workflow Karpathy made public uses tools in a layered manner based on task size:
```
[Small — autocomplete]
Cursor Autocomplete
→ Completes single lines and short functions; suggests in real time as you type

[Medium — edit instructions]
Cursor Highlight & Edit
→ Select (highlight) a specific code block, then give edit instructions in natural language

[Large — agents]
Claude Code / OpenAI Codex
→ Feature implementation and refactoring spanning multiple files

[Complex debugging / research]
Top-tier models (Claude Opus, Gemini Ultra, etc.)
→ Deep tracing of bugs with unknown causes, architecture analysis
```

The key to this layered approach is not using the most powerful agent for every task, but choosing the right tool for the scope and complexity of the work. Running an agent on simple autocomplete just slows things down, and delegating complex refactoring to autocomplete leaves it without enough context.
The Most Common Mistakes in Practice
It's easy to fall into these traps as you learn the concepts and tools — honestly, I went through all three myself at first.
- Giving vague instructions to the agent without `CLAUDE.md` — Without a context file, telling the AI "add this feature" can lead it to make its own assumptions and change the codebase in unexpected ways.
- Pushing vibe coding output directly to production — Prototype speed is excellent, but error handling and edge cases are often missing. Going through a Software 1.0 refinement process is recommended.
- Applying the same agent level to all tasks — Running Claude Code for a one-line change, or conversely leaving hundreds of lines of refactoring to autocomplete, are both inefficient. Consciously applying tool layering will change your workflow faster than you'd expect.
Pros and Cons
Honestly, this was the part that required the most caution — it's easy to underestimate the limitations when you're excited about the possibilities of Software 3.0.
Advantages
| Item | Detail |
|---|---|
| Improved accessibility | Prototyping is possible with natural language alone, lowering the barrier for non-developers |
| Reduced repetitive work | Productivity increases significantly for high-repetition tasks like boilerplate, CRUD, and configuration code |
| Simplified knowledge management | You can run a structured personal wiki with just markdown, without a vector DB |
| Maturing tool ecosystem | Tools like Claude Code and Cursor have stabilized rapidly between 2025 and 2026 |
Disadvantages and Caveats
| Item | Detail | Mitigation |
|---|---|---|
| Limits on novel design | Original system architecture design remains an area where LLMs are still weak | Recommended approach: humans design the architecture first, then delegate implementation to AI |
| Code bloat | When instructions are vague, LLMs tend to add unnecessary features | Using CLAUDE.md to constrain behavior and expressing request scope clearly is effective |
| Overconfidence in agent autonomy | "Fully autonomous agents" are still at demo level | Karpathy himself emphasizes that "partially autonomous products are realistic" |
| Context cost | The long context approach of the LLM wiki scales token costs proportionally with document volume | Recommend a separate chunking strategy depending on wiki size |
| Technical debt risk | Code produced quickly through vibe coding needs production-quality validation | It's important to maintain a workflow where AI-generated code also goes through code review and testing |
Closing Thoughts
What Karpathy has shown us is not how to use AI, but how developers need to restructure their thinking in the age of AI. Software 3.0 is a world where natural language becomes code, and in that world the most important capabilities remain knowing what to build and understanding and controlling the mistake patterns that AI exhibits.
Three steps you can start right now:
1. Add a `CLAUDE.md` file to your current project root — Copy the file from the `forrestchang/andrej-karpathy-skills` repository and proceed with your usual work using Claude Code or Cursor. You'll quickly experience the AI asking questions first when faced with ambiguous requirements.
2. Split your tools based on task size — Route one-line changes to autocomplete, specific block edits to Cursor Highlight & Edit, and work spanning multiple files to Claude Code. Consciously applying this layering will naturally speed up your workflow.
3. Apply a small-scale LLM wiki to your personal notes — Create the `raw_sources/`, `wiki/`, and `schema.md` folder structure, put a couple of recent technical documents into `raw_sources/`, and ask the AI to generate `wiki/concepts/` files. You can see for yourself how quickly structured knowledge organization happens without RAG.
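The folder skeleton from the last step above can be created in one command, following the layout shown earlier (the `my-llm-wiki` name is just a placeholder):

```shell
# Create the LLM-wiki skeleton: read-only sources, agent-managed wiki, schema file.
mkdir -p my-llm-wiki/raw_sources/papers \
         my-llm-wiki/raw_sources/articles \
         my-llm-wiki/raw_sources/meeting_notes \
         my-llm-wiki/wiki/entities \
         my-llm-wiki/wiki/concepts
touch my-llm-wiki/schema.md my-llm-wiki/wiki/index.md
```

From there, the only manual work is dropping documents into `raw_sources/` and writing `schema.md`; the rest is the agent's job.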
References
- andrej-karpathy-skills GitHub (forrestchang)
- Karpathy's CLAUDE.md Skills File: The Complete Guide | antigravity.codes
- Andrej Karpathy on Software 3.0 | Latent Space
- Software 2.0 | Karpathy on Medium
- Neural Networks: Zero To Hero (official course)
- LLM Wiki GitHub Gist (Karpathy's original)
- Karpathy LLM Wiki | VentureBeat
- Andrej Karpathy Vibe Coding | Klover.ai
- Eureka Labs announcement | TechCrunch
- Andrej Karpathy, "The Era of Software 3.0" | Byline Network
- Analysis of Karpathy's AI Coding Workflow | FastCampus