Hermes Agent: A Self-Improving AI Agent That Retains Learning Across Sessions
To be honest, there's a feeling I get every time I try a new AI agent tool. "Once I close this session, I'll have to explain everything from scratch again." Tell it the codebase structure, explain the team conventions, describe the stack — repeating the same things over and over makes you wonder whether it's a tool or a burden. Hermes Agent, released by Nous Research in February 2026, takes a fundamentally different architectural approach to this problem. Learning persists even when sessions end, and the more difficult problems it solves, the better an agent it becomes.
Surpassing 140,000 GitHub stars in just three months and becoming the most-used agent in the world by OpenRouter's metrics isn't mere marketing. With official support for RTX PC local inference through an NVIDIA collaboration, an MIT license, and zero telemetry, it has become a genuinely viable option even in enterprise environments. This article covers how Hermes Agent works, realistic scenarios for team use, and an honest look at its limitations.
Core Concepts
What Is a Self-Improving Agent?
Most existing AI tools are stateless. Whether it's a GPT-based copilot or the Claude CLI, when a session ends, everything that happened in that conversation disappears. Hermes Agent starts from a different premise.
Self-Improving Agent: An agent that automatically generates reusable Skill documents each time it solves a problem, then loads and applies them in future sessions. Over time, it handles the same types of problems faster and more accurately.
The core architecture is built on three principles:
| Principle | How It Works |
|---|---|
| Persistent Memory | MEMORY.md (environment, stack, rules) and USER.md (user profile) are automatically loaded at session start |
| Self-Improving Skills | After solving complex problems, the agent autonomously decides to write reusable Skill documents, with support for open community-sharing standards |
| Model-Agnostic Brain | The default model is Hermes-3, but it can be swapped for any endpoint — OpenAI, Anthropic, Ollama, OpenRouter, etc. |
I initially thought, "You're just having the agent manage memory files — isn't that the same thing?" But using it in practice, the difference is real. Once you define "our team uses pnpm, here's our branching strategy" in MEMORY.md, every subsequent session starts with that as a given. The time previously spent on context setup simply disappears.
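To make this concrete, here is a minimal sketch of what a seeded MEMORY.md might contain. The sections and the stack details below are hypothetical — the file is just plain Markdown that the agent loads at session start, so structure it however suits your team:

```markdown
# MEMORY.md — hypothetical example of a seeded team memory file

## Environment
- Monorepo managed with pnpm workspaces
- Node 20, TypeScript 5, NestJS backend, Next.js frontend

## Conventions
- Branching: trunk-based; feature branches named feat/<issue-id>-short-name
- Commits follow Conventional Commits; PRs require one approval

## Rules
- Never push directly to main
- All new endpoints need an e2e test before merge
```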
The Skill System: How the Agent Builds Its Own Knowledge
The most distinctive part of Hermes Agent is its Skills. Skill creation happens through two paths: the user explicitly requests a save, or after solving a problem requiring multi-step reasoning, the agent autonomously decides "this could be useful again" and creates the document automatically. In other words, it's a structure where the agent accumulates experience without constant manual management.
For example, if the agent solves "how to create a custom exception filter in NestJS to send errors to Sentry" for the first time, it documents that process as a Skill file and saves it to the .hermes/skills/ folder. The next time a similar task comes up, it references the saved Skill before reasoning from scratch.
```markdown
# Example structure of .hermes/skills/nestjs-sentry-exception-filter.md
---
name: nestjs-sentry-exception-filter
description: Wire Sentry error reporting into a global NestJS exception filter
tags: [nestjs, sentry, error-handling]
---
## Problem
Unhandled exceptions in a NestJS app need to be sent to Sentry automatically

## Solution Pattern
1. Implement AllExceptionsFilter with the @Catch() decorator
2. Inject SentryService and call captureException
3. Register globally via the APP_FILTER token

## Code
(the actual code snippet written by the agent)
```

These Skills are designed as an open standard for community sharing (the sharing platform is currently in preparation for launch). You can keep team-internal Skills on an internal server and contribute only the general-purpose ones to the community.
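Nous Research hasn't documented the retrieval internals, but the idea — index skill files by their frontmatter tags and surface matches before reasoning from scratch — can be sketched in a few lines of Python. `parse_frontmatter` and `find_skills` below are hypothetical helpers, not the real Hermes API:

```python
# Illustrative sketch (not the real Hermes internals): index a skills
# folder by frontmatter tags so matching Skills can be surfaced before
# the agent reasons from scratch.
import tempfile
from pathlib import Path

def parse_frontmatter(text: str) -> dict:
    """Parse the simple `key: value` frontmatter between `---` markers."""
    _, block, _ = text.split("---", 2)
    meta = {}
    for line in block.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta

def find_skills(skills_dir: Path, query_tags: set) -> list:
    """Return names of skills whose tags overlap with the query tags."""
    hits = []
    for path in skills_dir.glob("*.md"):
        meta = parse_frontmatter(path.read_text())
        tags = set(meta.get("tags", "").strip("[]").replace(" ", "").split(","))
        if tags & query_tags:
            hits.append(meta["name"])
    return hits

# Demo with a throwaway skill file
with tempfile.TemporaryDirectory() as d:
    skill = Path(d) / "nestjs-sentry-exception-filter.md"
    skill.write_text(
        "---\n"
        "name: nestjs-sentry-exception-filter\n"
        "description: Global exception filter reporting to Sentry\n"
        "tags: [nestjs, sentry, error-handling]\n"
        "---\n"
        "## Problem\n...\n"
    )
    print(find_skills(Path(d), {"sentry"}))  # → ['nestjs-sentry-exception-filter']
```

A vector-store lookup (as with the Hindsight plugin mentioned later) would replace the tag match, but the shape — retrieve first, reason only on a miss — stays the same.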
Practical Application
Foundation Setup: Connecting MCP Servers
Before diving into practical examples, it's worth first looking at the MCP connection setup that underpins all of them. Model Context Protocol (MCP) is an AI agent-to-tool connectivity standard led by Anthropic, which allows agents to communicate with external systems like GitHub, databases, and cloud infrastructure without writing custom integration code each time. Think of it as standardizing interfaces the way REST APIs do.
Hermes Agent can connect to any MCP server with a single line of config:
```json
// hermes.config.json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}"
      }
    },
    "linear": {
      "command": "npx",
      "args": ["-y", "@linear/mcp-server"],
      "env": {
        "LINEAR_API_KEY": "${LINEAR_API_KEY}"
      }
    }
  }
}
```

Once this is configured, you can build workflows where the agent directly operates GitHub or Linear with nothing but natural-language instructions, as shown in the examples below.
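The `${GITHUB_TOKEN}` syntax above is a placeholder presumably resolved from environment variables when the config is loaded, which keeps secrets out of the file itself. That expansion step is easy to sketch — `expand_env` below is a hypothetical helper, not the actual Hermes config loader:

```python
# Illustrative sketch: expand `${VAR}` placeholders from environment
# variables, as a config loader handling the file above might do.
# `expand_env` is a hypothetical helper, not part of Hermes.
import os
import re

def expand_env(value: str) -> str:
    """Replace every ${NAME} with os.environ['NAME'] (empty if unset)."""
    return re.sub(r"\$\{(\w+)\}", lambda m: os.environ.get(m.group(1), ""), value)

os.environ["GITHUB_TOKEN"] = "ghp_example"
print(expand_env("${GITHUB_TOKEN}"))  # → ghp_example
```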
Example 1: Automating the Team Daily Standup
This is one of the most practical use cases. Checking calendars, GitHub, and Linear every morning and posting a summary to the team channel is repetitive work where context still matters. Below is pseudocode illustrating the concept — refer to the official documentation for the actual API shapes.
```python
# Conceptual example code — refer to official docs for actual API shapes
from hermes import Agent

agent = Agent(
    memory_path=".hermes/MEMORY.md",
    user_path=".hermes/USER.md"
)

@agent.schedule("0 9 * * 1-5")  # Weekdays at 9 AM
async def daily_standup():
    # Collect data from multiple sources via MCP tool calls
    context = await agent.gather([
        "github:list_pull_requests",
        "linear:list_issues",
        "calendar:get_today"
    ])
    summary = await agent.run(
        task="Summarize yesterday's merged PRs, today's in-progress issues, "
             "and today's meetings in a team standup format",
        context=context
    )
    await agent.send(
        channel="slack://team-standup",
        message=summary
    )
```

| Component | Role |
|---|---|
| `@agent.schedule` | SQLite-based scheduler; maintains schedules across server restarts |
| `agent.gather` | Simultaneously collects data from multiple sources via MCP |
| `agent.run` | Generates a summary based on the collected context |
| `agent.send` | Sends to one of 20+ supported messaging platforms |
This is a situation that comes up often in practice — standup bots tend to stop at just "listing PR numbers." I thought the same thing at first: "Is that all?" But as team context accumulates in USER.md and MEMORY.md, it changes. Around week three, when PR summaries start reading like "Issue #42 is blocking these PRs, which is why they're delayed," you really feel the difference.
Example 2: Infrastructure Monitoring and Incident Alerts
Running health checks every five minutes and sending alerts on failure is another common pattern. What sets Hermes Agent apart is that it goes beyond simple HTTP 200 checks — it can learn failure patterns and add context like "this error typically indicates a DB connection pool problem."
```yaml
# hermes-monitor.yaml
monitors:
  - name: api-health
    url: https://api.yourservice.com/health
    interval: "*/5 * * * *"
    on_failure:
      - notify:
          channel: telegram://oncall-alerts
          message_template: |
            🚨 {{service_name}} not responding ({{status_code}})
            Last success: {{last_success_at}}
            Consecutive failures: {{failure_count}}
      - run_skill: diagnose-api-failure  # Runs a saved Skill automatically
    checkpoint: true  # Resume from last completed point
```

`checkpoint` option: Even if a long pipeline fails partway through, it can retry from the last successful point. This prevents having to start over from the beginning in unstable network environments.
By pre-defining a failure diagnosis Skill like run_skill: diagnose-api-failure, the agent doesn't just send an alert — it begins root cause analysis on its own. As the Skills built early on accumulate, a system naturally emerges where "if you see these symptoms, check this first."
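The checkpoint behavior is worth pinning down, since it is what makes long pipelines safe to retry. The sketch below illustrates the general pattern — persist the index of the last completed step, skip finished steps on retry — with a hypothetical `run_pipeline` helper, not the Hermes implementation:

```python
# Illustrative sketch of checkpoint/resume (hypothetical helper, not the
# real Hermes API): persist progress so a retried pipeline resumes from
# the last completed step instead of restarting.
import json
import tempfile
from pathlib import Path

def run_pipeline(steps, state_file: Path):
    """Run steps in order, checkpointing progress after each one."""
    done = json.loads(state_file.read_text())["done"] if state_file.exists() else 0
    for i, step in enumerate(steps):
        if i < done:
            continue  # step finished on a previous attempt — skip it
        step()
        state_file.write_text(json.dumps({"done": i + 1}))  # checkpoint
    state_file.unlink(missing_ok=True)  # pipeline succeeded; clear checkpoint

# Demo: the second step fails once (simulated network blip), then succeeds
log = []
failed_once = [False]

def fetch():
    log.append("fetch")

def upload():
    if not failed_once[0]:
        failed_once[0] = True
        raise ConnectionError("network blip")
    log.append("upload")

with tempfile.TemporaryDirectory() as d:
    ckpt = Path(d) / "ckpt.json"
    try:
        run_pipeline([fetch, upload], ckpt)
    except ConnectionError:
        pass                              # first attempt dies mid-pipeline
    run_pipeline([fetch, upload], ckpt)   # retry resumes at `upload`
print(log)  # → ['fetch', 'upload']  (fetch is not re-run)
```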
Example 3: Running Cost-Free with a Local LLM
Taking advantage of Hermes Agent's model-agnostic nature, a fully offline configuration using Ollama and a local GPU is also possible. With a GPU of 8GB VRAM or more, you can run the Hermes-3 8B model via Ollama, achieving 91% tool-call accuracy according to Nous Research's own benchmarks.
```shell
# Run Hermes-3 locally with Ollama
ollama pull nous-hermes3:8b
```

```json
// hermes.config.json — switch provider to ollama
{
  "model": {
    "provider": "ollama",
    "name": "nous-hermes3:8b",
    "base_url": "http://localhost:11434"
  }
}
```

```shell
# Start the agent
hermes start --config hermes.config.json
```

This configuration is especially useful for personal projects where API costs are a concern, or for internal environments where data cannot be sent externally.
Pros and Cons
Advantages
| Item | Detail |
|---|---|
| Self-Improvement Loop | Generates and refines Skills from experience; handles the same tasks better the longer you use it |
| Local Execution Performance | Hermes-3 8B + Ollama: 91% tool-call accuracy per Nous Research's own benchmarks |
| Security | Zero CVEs as of April 2026; all data stays on your own infrastructure |
| Fully Open Source | MIT license, no telemetry, no vendor lock-in |
| Multi-Platform Messaging | Single gateway connecting 20+ channels |
| Memory Plugin Ecosystem | Supports 8 external memory providers including Honcho, Mem0, and Hindsight |
Drawbacks — and How to Address Them
For comparison: OpenClaw is another open-source agent framework that emerged around the same time as Hermes Agent. In terms of Skills ecosystem size, OpenClaw has the lead, but Hermes Agent holds the advantage on security and licensing.
| Item | Detail | Mitigation |
|---|---|---|
| Deep Tool Chain Weakness | Accuracy degrades with 4–5+ sequential tool calls on 8B models | Use a 70B model or combine with LangGraph state checkpointing |
| TUI-First UX | Terminal-centric interface creates friction in GUI environments | Use the VS Code extension or third-party web UI plugins |
| Initial Infrastructure Cost | Local execution requires a minimum 8GB VRAM GPU or server | Reduce entry cost with Modal or Daytona serverless backends |
| Skill Ecosystem Still Early | Fewer Skills compared to OpenClaw (13K+ community skills) | Build internal Skills directly; use the awesome-hermes-agent repo |
| Context Limitations | Minimum 64K context size, snapshot delays, cron prompt overhead | Set shorter memory summary intervals; use Hindsight vector DB retrieval |
Atropos RL Framework: The reinforcement learning framework Nous Research used to train Hermes-3. Specialized in improving tool-call accuracy, it reduces the frequency of the agent incorrectly calling external APIs or commands.
Honcho User Modeling: A user modeling system that progressively learns a user's behavioral patterns, preferences, and work style. Integrated with Hermes Agent, agent responses become increasingly personalized over time.
Common Pitfalls in Practice
- Trying to build perfect Skills from the start — I also tried to manually convert all our team patterns into Skill documents early on, but it only multiplied maintenance points. The foundational design is for the agent to automatically create Skill documents as it solves real problems. It's fine if they're incomplete at first — the agent will refine them over time.
- Running chains of 5+ steps with an 8B model — Local 8B models are sufficient for simple repetitive tasks, but their limitations are clear for work requiring complex multi-step planning. When you find yourself thinking "why does it keep getting this wrong?", suspect the model size first. Switching to a 70B model or a cloud model produces noticeably better results.
- Starting with an empty MEMORY.md — If you install the agent and start asking questions right away, responses will remain generic and context-free for the first few weeks. The value of Persistent Memory starts with how well you seed the context upfront. It's recommended to explicitly record your team's stack, coding conventions, and commonly used patterns in `MEMORY.md` first.
Closing Thoughts
Hermes Agent is the fastest-growing open-source agent and the framework that has most practically implemented cross-session learning accumulation. Session discontinuity, repetitive context entry, local execution costs — the real-world problems you hit when attaching agents to production work — it has approached these at the architectural level, and that is translating into real adoption.
Three steps you can take right now:
- Install it locally and set up MEMORY.md — Check the install command in the official quickstart guide, then run `hermes init` to interactively create `MEMORY.md` and `USER.md`. Simply recording your team's stack and coding conventions here will improve the quality of subsequent sessions.
- Build one small automation with an MCP connection — Connect the GitHub MCP server and start with a simple task you actually use, like "summarize the PRs opened this week." After the task completes, check the `.hermes/skills/` folder — you'll find a Skill document has been created. That is exactly how Hermes Agent learns on its own.
- Connect a team channel and set up a standup summary pipeline — Connect Slack or Telegram, attach GitHub and Linear via MCP, and configure a daily morning standup summary. After two or three weeks of operation, compare how `MEMORY.md` has changed — you'll be able to see firsthand how the self-improvement loop actually works.
References
- Hermes Agent Official Site | Nous Research
- Hermes Agent Official Documentation
- GitHub - NousResearch/hermes-agent
- Hermes Agent Quickstart Guide
- Persistent Memory Documentation
- User Stories & Use Cases Documentation
- NVIDIA Blog: Hermes Unlocks Self-Improving AI Agents
- Analytics Vidhya: Hermes Agent Guide
- DEV Community: Hermes Agent - A Self-Improving AI Agent That Runs Anywhere
- Turing Post: Full Comparison of Hermes Agent vs OpenClaw
- GitHub - NousResearch/Hermes-Function-Calling
- OpenRouter: Hermes Agent Integration Documentation
- Ollama: Hermes Agent Integration Documentation
- DeepWiki: NousResearch/hermes-agent
- Firecrawl: Hermes Agent Usage Guide