Hermes Agent SOUL.md and the 5-Pillar Architecture — An Inside Look at the Tier 3 Skill Auto-Generation Mechanism
At the end of every session, the agent extracts the procedure from the work it performed in that conversation, saves it as a skill file, and reuses that skill in subsequent sessions — this was the first thing that caught my attention when I first encountered Hermes Agent. The phrase "self-improving agent" is easy to dismiss as marketing copy, but I wanted to verify exactly how this particular mechanism works in concrete terms.
Honestly, I was skeptical at first. I had to dig into the code myself to figure out whether "performance improves the more you use it" was a clever repackaging of fine-tuning, or actually something different. Hermes creates meaningful performance gains by solidifying LLM call results into procedural skill files on the filesystem and selectively injecting those skills into the system prompt of the next session. It's closer to external memory expansion than fine-tuning.
In this post, I'll break down the 5-Pillar architecture that underpins this mechanism one piece at a time — tracing it all the way to the point where the design intent becomes unmistakably clear: why SOUL.md occupies the first slot in the system prompt, and why prompt_builder.py reads only the frontmatter (the YAML metadata block at the top of a file) first.
Core Concepts
5-Pillar: How the Five Pillars Interlock
The fastest way to understand Hermes is to grasp how its five components relate to one another. Each looks like an independent component, but in practice they interlock organically.
| Pillar | File / Component | Core Role |
|---|---|---|
| Memory | MEMORY.md, USER.md | Maintaining context across sessions |
| Skills | ~/.hermes/skills/*.md | Reusing procedural knowledge |
| SOUL | ~/.hermes/SOUL.md | Defining agent identity and tone |
| Crons | state.db + gateway daemon | Proactive scheduling |
| Self-Improvement | Auto-generation loop | Usage experience → skill extraction |
Pillar 1 — How Memory Persists After a Session Ends
MEMORY.md is the agent's own notepad; USER.md records user preferences and task patterns. The combined token limit for both files is capped at roughly 1,300 tokens, which forces you to think about prioritization — a little frustrating at first, but ultimately a structure that leaves only "what truly matters."
The most important thing to understand upfront is that memory files are loaded exactly once at session start and then frozen. Updates made to the files mid-session are not reflected in the current session's prompt. I initially thought this was a bug, but it turned out to be an intentional design decision — the reason becomes immediately clear in Pillar 3.
If you need immediate reflection, you can restart the session.
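The frozen-snapshot behavior can be sketched roughly as follows. This is a minimal illustration, not Hermes's actual code: `MemorySnapshot`, `load_memory_snapshot`, the 4-characters-per-token estimate, and the policy of trimming MEMORY.md first are all assumptions made for the example.

```python
from dataclasses import dataclass
from pathlib import Path

TOKEN_BUDGET = 1300    # approximate combined cap mentioned above
CHARS_PER_TOKEN = 4    # crude estimate; an assumption, not Hermes's tokenizer


@dataclass(frozen=True)
class MemorySnapshot:
    """Immutable copy of the memory files, taken once at session start."""
    memory: str
    user: str

    def render(self) -> str:
        return f"## MEMORY\n{self.memory}\n\n## USER\n{self.user}"


def estimate_tokens(text: str) -> int:
    return len(text) // CHARS_PER_TOKEN


def load_memory_snapshot(home: Path) -> MemorySnapshot:
    """Read both files once; later edits on disk do not change the snapshot."""
    def read(name: str) -> str:
        path = home / name
        return path.read_text(encoding="utf-8") if path.exists() else ""

    memory, user = read("MEMORY.md"), read("USER.md")
    # Hypothetical policy: when over budget, trim the tail of MEMORY.md first
    over = estimate_tokens(memory + user) - TOKEN_BUDGET
    if over > 0:
        keep = max(0, len(memory) - over * CHARS_PER_TOKEN)
        memory = memory[:keep]
    return MemorySnapshot(memory=memory, user=user)
```

Because the dataclass is frozen and holds plain strings, anything written to MEMORY.md after `load_memory_snapshot` returns only becomes visible when a new session builds a new snapshot.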
Pillar 2 — The Procedural Recipe System: Skills
The most surprising thing I noticed when actually using it was how simple the structure of a skill file is. YAML frontmatter for metadata, markdown body for the actual procedure — that's it.
```markdown
---
name: competitive-analysis
description: Procedure for analyzing weekly competitor trends and sending a Slack summary
trigger: [competitive, analysis, market research]
tier: 3
created_at: 2026-04-22T09:14:00Z
---
## Procedure
1. Load the target competitor list from USER.md
2. Crawl each site's changes from the last 7 days
3. Classify the changes by category (product, pricing, hiring)
4. Write a summary and send it to the Slack #competitive-intel channel
## Cautions
- If crawling fails, retry 3 times and then send a failure notification
- Keep the summary within 500 characters
```
Progressive Disclosure: prompt_builder.py reads only the frontmatter of a skill file first to assess relevance. The actual body is loaded only when the skill is deemed necessary. It is a practical design choice that conserves the context window.
The core logic behind this behavior can be summarized as follows:
```python
# agent/prompt_builder.py (core excerpt, conceptual level)
from pathlib import Path

# Skill, THRESHOLD, MAX_SKILLS, and the helper functions are defined elsewhere


def load_skills_for_session(query: str) -> list[Skill]:
    skills_dir = Path("~/.hermes/skills/").expanduser()
    candidates = []
    for skill_file in skills_dir.glob("*.md"):
        # Stage 1: parse only the frontmatter (the body is not loaded)
        frontmatter = parse_frontmatter_only(skill_file)
        relevance = score_relevance(frontmatter, query)
        if relevance > THRESHOLD:
            candidates.append((skill_file, frontmatter, relevance))
    # Stage 2: load the body for the top N candidates only
    candidates.sort(key=lambda x: x[2], reverse=True)
    return [load_full_skill(path) for path, _, _ in candidates[:MAX_SKILLS]]
```
The reason prompt size doesn't explode even as skills grow to dozens is precisely this two-stage loading approach.
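For readers wondering what the two helpers in stage 1 might look like, here is one possible shape. Both function bodies are assumptions: the line-based `key: value` parsing stands in for a real YAML parser, and the keyword-overlap score is a toy heuristic, not the relevance logic Hermes actually uses.

```python
from pathlib import Path


def parse_frontmatter_only(skill_file: Path) -> dict[str, str]:
    """Read lines up to the closing '---' and stop; the body is never read."""
    meta: dict[str, str] = {}
    with skill_file.open(encoding="utf-8") as fh:
        if fh.readline().strip() != "---":
            return meta  # no frontmatter block at the top of the file
        for line in fh:
            if line.strip() == "---":
                break  # end of frontmatter; the rest of the file is skipped
            key, sep, value = line.partition(":")
            if sep:
                meta[key.strip()] = value.strip()
    return meta


def score_relevance(frontmatter: dict[str, str], query: str) -> float:
    """Fraction of trigger keywords that appear in the query (toy heuristic)."""
    raw = frontmatter.get("trigger", "").strip("[]")
    triggers = [t.strip().lower() for t in raw.split(",") if t.strip()]
    if not triggers:
        return 0.0
    hits = sum(1 for t in triggers if t in query.lower())
    return hits / len(triggers)
```

The point of the sketch is the early `break`: the cost of scanning a skill stays proportional to its frontmatter, no matter how long the procedure body grows.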
Pillar 3 — Why SOUL.md Occupies the First Slot
SOUL.md defines the agent's tone, values, and response style. But it carries significance beyond being a simple configuration file.
Looking at the system prompt assembly order, SOUL.md sits at slot #1 — the very beginning.
```
[slot 1] SOUL.md
[slot 2] MEMORY.md + USER.md
[slot 3] Relevant skill frontmatter → selected skill bodies
[slot 4] Context files (.hermes.md / AGENTS.md / CLAUDE.md)
[slot 5] Tool usage guide
[slot 6] Model-specific instructions (provider-specific optimizations for Anthropic, OpenAI, etc.)
```
In Pillar 1, I promised to explain "why memory is frozen at session start." Here's the answer. Prompt caches from Anthropic and OpenAI require a stable prefix to generate cache hits. Placing the session-invariant SOUL.md first and the relatively stable MEMORY.md after it increases cache hit rates, which in turn reduces API costs. This is a design decision driven by economics, not performance.
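The effect of slot ordering on the cacheable prefix can be illustrated with a small experiment. This is a sketch under simplifying assumptions: the slot contents are made up, `assemble_system_prompt` is a hypothetical helper, and a shared character prefix stands in for the token-level prefix matching that real provider caches perform.

```python
import os

Slot = tuple[str, str]  # (label, content)


def assemble_system_prompt(slots: list[Slot]) -> str:
    """Join slots in the given order; earlier slots form the prompt prefix."""
    return "\n\n".join(f"[{label}]\n{content}" for label, content in slots)


def shared_prefix_len(a: str, b: str) -> int:
    """Length of the leading text two prompts have in common."""
    return len(os.path.commonprefix([a, b]))


# Session-invariant content (made-up examples)
stable: list[Slot] = [
    ("SOUL.md", "Lead with the key point. Keep answers short."),
    ("MEMORY.md + USER.md", "User prefers TypeScript strict mode."),
]
# Content that differs from session to session
skills_a: Slot = ("skills", "competitive-analysis: crawl, classify, summarize")
skills_b: Slot = ("skills", "github-actions-failure-report: query, classify, send")

# Stable slots first: the two sessions share the entire stable section
stable_first = shared_prefix_len(
    assemble_system_prompt(stable + [skills_a]),
    assemble_system_prompt(stable + [skills_b]),
)
# Volatile slot first: the prompts diverge almost immediately
volatile_first = shared_prefix_len(
    assemble_system_prompt([skills_a] + stable),
    assemble_system_prompt([skills_b] + stable),
)
print(stable_first, volatile_first)
```

With the stable slots in front, the shared prefix covers all of SOUL.md and the memory files; with the volatile slot in front, it ends right after the first label. The same reasoning explains why memory edits made mid-session are not injected: changing slot 2 would invalidate everything cached after it.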
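```
# Check the current contents of SOUL.md
> read me your soul file
# Update directly through feedback (the agent edits the file)
> Too verbose. Keep answers short and to the point
> Drop the formal tone. Talk casually
```
Pillar 4 — From Reactive to Proactive: Crons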
A built-in gateway daemon checks state.db every 60 seconds to execute scheduled tasks. Because it runs in an isolated agent session, it does not affect the current conversation.
```
# Register a schedule in natural language
> Every night at midnight, analyze the staging server logs and send a summary to Slack
```
This single command makes Hermes do two things at once: register a cron job and create a Tier 3 skill file containing the corresponding procedure. And the more the same task is repeated, the more refined the skill becomes.
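The 60-second polling loop described above could be sketched like this. The `cron_jobs` table, its columns, and every function name here are hypothetical; the only facts taken from the text are that a gateway daemon checks state.db on a 60-second tick and that jobs run in an isolated session.

```python
import sqlite3
import time
from typing import Callable


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema; the real state.db layout may differ
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cron_jobs ("
        "id INTEGER PRIMARY KEY, skill TEXT NOT NULL, "
        "next_run REAL NOT NULL, interval_s REAL NOT NULL)"
    )


def run_due_jobs(
    conn: sqlite3.Connection, now: float, run_isolated: Callable[[str], None]
) -> list[str]:
    """Run every job whose next_run has passed, then reschedule it."""
    rows = conn.execute(
        "SELECT id, skill, interval_s FROM cron_jobs "
        "WHERE next_run <= ? ORDER BY id",
        (now,),
    ).fetchall()
    ran = []
    for job_id, skill, interval_s in rows:
        run_isolated(skill)  # stands in for spawning a separate agent session
        conn.execute(
            "UPDATE cron_jobs SET next_run = ? WHERE id = ?",
            (now + interval_s, job_id),
        )
        ran.append(skill)
    conn.commit()
    return ran


def poll_forever(
    conn: sqlite3.Connection,
    run_isolated: Callable[[str], None],
    tick_s: float = 60.0,
) -> None:
    """Gateway-style loop: wake up every tick and run whatever is due."""
    while True:
        run_due_jobs(conn, time.time(), run_isolated)
        time.sleep(tick_s)
```

Keeping `run_due_jobs` separate from the infinite loop makes the scheduling logic testable with a fixed `now`, and passing the runner as a callback mirrors the idea that the job executes outside the current conversation.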
Pillar 5 — Usage Experience Solidifies Into Procedure: Self-Improvement
This is Hermes's core differentiator. There are four conditions under which a Tier 3 skill is auto-generated:
- A complex task is completed with 5 or more tool calls in a single session
- The task has a multi-step structure
- Error recovery or user correction occurs
- The user explicitly confirms the result
All four conditions are heuristics for judging "is this task worth repeating?" The inclusion of error recovery as a condition is interesting — the error recovery process itself captures "what pitfalls exist and how to avoid them."
And starting from v0.12.0, an autonomous Curator scores, consolidates, and cleans up the skill library on a 7-day cycle. Similar skills are merged, and skills that aren't actually being used have their priority lowered. Not perfect, but a practical mechanism that prevents the library from accumulating indefinitely.
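A Curator pass of this kind might be sketched as below. Everything concrete in it is an assumption for illustration: the `SkillRecord` fields, trigger-set Jaccard similarity as the merge criterion, the 0.5 merge threshold, the 30-day staleness window, and halving as the priority decay. The source only states that similar skills are merged and unused ones are deprioritized on a 7-day cycle.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SkillRecord:
    name: str
    triggers: set[str]
    last_used: datetime
    priority: float = 1.0


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two trigger sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def curate(
    skills: list[SkillRecord],
    now: datetime,
    merge_at: float = 0.5,
    stale_after: timedelta = timedelta(days=30),
    decay: float = 0.5,
) -> list[SkillRecord]:
    """Merge near-duplicate skills, then lower the priority of stale ones."""
    kept: list[SkillRecord] = []
    # Visit the most recently used skills first so they absorb older twins
    for skill in sorted(skills, key=lambda s: s.last_used, reverse=True):
        twin = next(
            (k for k in kept if jaccard(k.triggers, skill.triggers) >= merge_at),
            None,
        )
        if twin is not None:
            twin.triggers |= skill.triggers  # fold the older skill into the newer
            continue
        kept.append(skill)
    for skill in kept:
        if now - skill.last_used > stale_after:
            skill.priority *= decay  # deprioritize rather than delete
    return kept
```

Note that the sketch mutates the records it keeps and lowers priority instead of deleting, which matches the description above: unused skills sink in ranking but remain on disk for manual review.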
Practical Application
Example 1: Tracing the Full Tier 3 Skill Auto-Generation Flow
```
User requests a competitive analysis task
→ Agent performs a 7-step task (12 tool calls)
→ Crawling fails midway → recovers via retry logic (error recovery occurs)
→ User: "Perfect, do this every week" (explicit confirmation)
```
When this flow satisfies all four conditions, the following pipeline executes:
```
Execute → Perform the actual task (tool calls, API requests, file processing, etc.)
Evaluate → Assess reuse value (5+ tool calls + multi-step + confirmation)
Extract → Extract procedure, error patterns, and validation steps from the session log
Write → Create a skill file in ~/.hermes/skills/
Validate → Tool registry validation + dangerous pattern scan
Discoverable → Included in frontmatter scan targets starting from the next session
```
The code below is pseudocode intended to explain how the mechanism works. The actual implementation and detailed interfaces may differ.
```python
# agent/skill_extractor.py (pseudocode, for conceptual explanation)
# Session, Skill, and the remaining extract_* / generate_* helpers are assumed to exist


def should_generate_skill(session: Session) -> bool:
    """All four Tier 3 conditions must hold."""
    return (
        session.tool_call_count >= 5
        and session.is_multistep
        and (session.had_error_recovery or session.had_user_correction)
        and session.user_confirmed
    )


def extract_procedure_steps(session: Session) -> list[str]:
    steps = []
    # Only successful tool calls become procedure steps
    tool_calls = [tc for tc in session.tool_calls if tc.success]
    for i, call in enumerate(tool_calls, start=1):
        # Steps the user corrected are marked as caution points
        if call.had_user_correction:
            steps.append(
                f"{i}. ⚠️ {call.description} (correction point: {call.correction_note})"
            )
        else:
            steps.append(f"{i}. {call.description}")
    return steps


def extract_skill(session: Session) -> Skill:
    procedure = extract_procedure_steps(session)
    pitfalls = extract_error_patterns(session)
    validation = extract_validation_steps(session)
    return Skill(
        name=generate_skill_name(session),
        procedure=procedure,
        pitfalls=pitfalls,
        validation=validation,
        tier=3,
    )
```
The key point is that extract_procedure_steps marks the steps the user corrected with ⚠️ and preserves them, while extract_error_patterns collects what went wrong during error recovery as pitfalls. The skill captures not just a simple success procedure, but also "where to be careful."
The Validate step checks the skill against the tool registry and scans it for dangerous patterns such as prompt injection, leftover TODOs, and placeholders. (In the context of AI agents, prompt injection means external input that alters the agent's behavior; it is conceptually similar to SQL injection in web security, but it targets the LLM prompt.) The security layer is thin, but it's better than nothing.
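To make the scan concrete, here is a minimal sketch of what such a check could look like. The three regular expressions, the `scan_skill` name, and the finding labels are illustrative assumptions; a real scanner would need a much broader pattern list and is not guaranteed to work this way in Hermes.

```python
import re

# Illustrative patterns only; deliberately narrow
DANGEROUS_PATTERNS = {
    "prompt_injection": re.compile(
        r"ignore (all )?(previous|prior) instructions", re.IGNORECASE
    ),
    "todo": re.compile(r"\bTODO\b"),
    "placeholder": re.compile(r"<[A-Z_]+>|\{\{.*?\}\}"),
}


def scan_skill(body: str, known_tools: set[str], used_tools: set[str]) -> list[str]:
    """Return a list of findings; an empty list means the skill passed."""
    findings = [
        name for name, pattern in DANGEROUS_PATTERNS.items() if pattern.search(body)
    ]
    # Tool registry check: every tool the skill references must be registered
    findings += [f"unknown_tool:{tool}" for tool in sorted(used_tools - known_tools)]
    return findings
```

A pattern list like this only catches known phrasings, which is one way to read the "thin security layer" remark: it filters obvious problems but cannot prove a skill is safe.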
Generated skills are saved to ~/.hermes/skills/ and are included in frontmatter scan targets starting from the next session. They cannot be used in the session in which they were created.
Example 2: Building a Consistent Coding Assistant via SOUL.md Customization
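```markdown
---
# ~/.hermes/SOUL.md
---
## Identity
I respond in the style of a senior full-stack developer. Lead with the key point and keep elaboration short.
## Coding Style
- TypeScript strict mode by default
- Use async/await only (no Promise.then)
- 2-space indentation
## Response Principles
- For answers over 500 characters, summarize first and collapse the details
- Always specify the language on code blocks
- When asked for code without explanation, return only the code
```
It's recommended to separate project-specific context into a .hermes.md file. Doing so also keeps SOUL.md short, which helps the cache hit rate.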
```markdown
---
# project root/.hermes.md
---
## Project Context
- Stack: Next.js 15 + NestJS + PostgreSQL
- Tests: real DB connection required (no mocks)
- Deployment: Vercel (frontend) + Railway (backend)
```
| File | Scope | Modified By |
|---|---|---|
| SOUL.md | Global (all projects) | User + agent |
| USER.md | Global (user preferences) | Agent (automatic) |
| .hermes.md | Per-project | User |
Example 3: Automating a DevOps Workflow with Cron + Skill Combination
```
> Every day at 9 AM, compile a list of GitHub Actions failures,
  categorize them by severity, and send them to Slack #dev-alerts.
  If there are no failures, don't send a message.
```
As Hermes processes this request, it simultaneously performs the following internally:
```markdown
# Example of an auto-generated skill file
---
name: github-actions-failure-report
description: Classify GitHub Actions failures by severity and send them to Slack
trigger: [github actions, CI failure, daily report]
tier: 3
cron: "0 9 * * *"
---
## Procedure
1. Query workflow run results from the last 24 hours via the GitHub API
2. Classify failed runs by severity (critical / warning / info)
3. Stop without sending a message if there are no failures
4. Format the summary message and send it to Slack
## Error Handling
- Retry 3 times on GitHub API timeout
- Fall back to email if the Slack send fails
```
It's notable that the cron field is included directly in the skill file. Pillar 4 (Crons) and Pillar 2 (Skills) don't merely coexist; they integrate into a single file.
Pros and Cons
Advantages
| Item | Details |
|---|---|
| Cumulative learning | Reports of 40% speed improvement on similar tasks when holding 20+ self-generated skills |
| Context efficiency | Progressive Disclosure minimizes unnecessary token consumption |
| Cache optimization | Immutable system prompt maximizes Anthropic/OpenAI prompt cache utilization |
| Declarative configuration | SOUL.md, MEMORY.md, USER.md are all plain markdown, directly editable |
| Model-agnostic | Compatible with both Anthropic and OpenAI APIs |
| Open-source ecosystem | 520+ community skills on agentskills.io, including 16 official Anthropic skills |
Disadvantages and Caveats
| Item | Details | Mitigation |
|---|---|---|
| Prompt injection risk | A malicious session exceeding the 5-tool-call threshold can permanently store a corrupted skill (Issue #25833) | Regularly audit auto-generated skills; actively use user-locked skills |
| No accuracy guarantee | No mechanism-level accuracy guarantee for auto-generated skills | Periodically review Curator scoring results; manually review critical skills |
| Memory size limit | Combined MEMORY.md + USER.md cap of ~1,300 tokens | Keep only high-priority items; separate the rest into per-project .hermes.md |
| No mid-session reflection | Memory updates during a session are not applied to the current prompt | Restart the session if immediate reflection is needed |
| Risk of editing prompt_builder.py | A globally affecting product file; unsuitable for general customization | Use SOUL.md, USER.md, and .hermes.md for customization |
User-locked skills: When a user locks a specific skill, the agent can only read it — modification is not permitted. Well-suited for stable procedures that don't need automatic improvement. Community discussion is ongoing in GitHub Issue #17583.
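The read-only rule for locked skills amounts to a simple write guard. The sketch below is hypothetical throughout: the `locked` flag, the `actor` argument, and `SkillLockedError` are invented for illustration, and the actual design is still under discussion in the issue mentioned above.

```python
from dataclasses import dataclass


class SkillLockedError(PermissionError):
    """Raised when the agent tries to modify a user-locked skill."""


@dataclass
class StoredSkill:
    name: str
    body: str
    locked: bool = False  # hypothetical flag; the real storage format may differ


def update_skill(skill: StoredSkill, new_body: str, actor: str) -> None:
    """Apply an edit unless the skill is locked and the editor is the agent."""
    if skill.locked and actor == "agent":
        raise SkillLockedError(
            f"{skill.name} is user-locked; agent edits are rejected"
        )
    skill.body = new_body
```

The guard blocks only the agent, so the user can still edit a locked skill by hand; reading is unaffected in either case.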
The Most Common Mistakes in Practice
- Writing SOUL.md too long: SOUL.md occupies the first slot of the system prompt, but the longer it gets, the more its cache hit rate advantage erodes and the more context it consumes. It's more effective to include only core identity and distribute the rest to USER.md or .hermes.md.
- Leaving auto-generated skills unreviewed: The Curator scores and cleans up on a 7-day cycle, but it's not perfect. Skills generated during error recovery in particular can solidify temporary workarounds for special circumstances into general procedures. It helps to make a habit of periodically browsing ~/.hermes/skills/.
- Not explicitly confirming important work mid-session: One of the conditions for Tier 3 skill generation is "explicit user confirmation." Responses like "Perfect" or "Keep using this going forward" act as skill generation triggers. Conversely, if you casually confirm something while trying an experimental approach, you may end up with an unwanted skill being generated.
Closing Thoughts
Once you understand this architecture, the difference between using Hermes simply as "a smarter chatbot" versus "a procedural library that grows to fit your workflow" becomes unmistakably clear. How you use it ultimately determines what kind of agent it becomes — that's the structure.
Three steps you can start right now:
- Open ~/.hermes/SOUL.md directly and write the agent's response tone and coding style to match your preferences. You can check the current state with the "> read me your soul file" command, and the agent will update the file directly from a single line of feedback.
- Pick a task you repeat at least once a week and run it 3 to 4 times so that it naturally satisfies the Tier 3 skill auto-generation conditions. Leave an explicit confirmation such as "Perfect" or "Keep using this going forward" when the task completes; this triggers skill generation.
- Open the ~/.hermes/skills/ directory roughly every two weeks and browse the auto-generated skills. Delete skills that don't match your intentions, and protect frequently used skills by marking them user-locked so they stay stable.
References
- Hermes Agent official documentation
- Architecture | Hermes Agent — NousResearch
- Prompt Assembly | Hermes Agent
- Skills System | Hermes Agent
- Creating Skills | Hermes Agent
- GitHub — NousResearch/hermes-agent
- Hermes Agent Five Pillars: Memory, Skills, Soul, Crons, and Self-Improvement | MindStudio
- Hermes Agent memory: SOUL.md, MEMORY.md and state.db | LumaDock
- Inside Hermes Agent: What 'Self-Improving AI Agent' Actually Means in Production | Saulius blog
- Self-created skills lack mechanism-level guarantees (Issue #25833) | GitHub
- Skills: user-locked vs self-improving tiers (Issue #17583) | GitHub
- Hermes Agent Masterclass | Daily Dose of Data Science
- The Compounding Agent: Why Hermes Is More Than Just a Pretty TUI
- Context and Prompt Management | DeepWiki