Hermes Agent: A Self-Improving AI Agent That Retains Learning Across Sessions
To be honest, there's a feeling I get every time I try a new AI agent tool. "Once I close this session, I'll have to explain everything from scratch again." Tell it the codebase structure, explain the team conventions, describe the stack — repeating the same things over and over makes you wonder whether it's a tool or a burden. Hermes Agent, released by Nous Research in February 2026, takes a fundamentally different architectural approach to this problem. Learning persists even when sessions end, and the more difficult problems it solves, the better an agent it becomes.
Surpassing 140,000 GitHub stars in just three months and becoming the most-used agent in the world by OpenRouter's metrics isn't mere marketing. With official support for RTX PC local inference through an NVIDIA collaboration, an MIT license, and zero telemetry, it has become a genuinely viable option even in enterprise environments. This article covers how Hermes Agent works, realistic scenarios for team use, and an honest look at its limitations.
Core Concepts
What Is a Self-Improving Agent?
Most existing AI tools are stateless. Whether it's a GPT-based copilot or the Claude CLI, when a session ends, everything that happened in that conversation disappears. Hermes Agent starts from a different premise.
Self-Improving Agent: An agent that automatically generates reusable Skill documents each time it solves a problem, then loads and applies them in future sessions. Over time, it handles the same types of problems faster and more accurately.
The core architecture is built on three principles:
| Principle | How It Works |
|---|---|
| Persistent Memory | MEMORY.md (environment, stack, rules) and USER.md (user profile) are automatically loaded at session start |
| Self-Improving Skills | After solving complex problems, the agent autonomously decides to write reusable Skill documents, with support for open community-sharing standards |
| Model-Agnostic Brain | The default model is Hermes-3, but it can be swapped for any endpoint — OpenAI, Anthropic, Ollama, OpenRouter, etc. |
I initially thought, "You're just having the agent manage memory files — isn't that the same thing?" But using it in practice, the difference is real. Once you define "our team uses pnpm, here's our branching strategy" in MEMORY.md, every subsequent session starts with that as a given. The time previously spent on context setup simply disappears.
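To make this concrete, here is a minimal sketch of what a seeded MEMORY.md might contain. The sections and the stack details below are hypothetical — the file is just plain Markdown that the agent loads at session start, so structure it however suits your team:

```markdown
# MEMORY.md — hypothetical example of a seeded team memory file

## Environment
- Monorepo managed with pnpm workspaces
- Node 20, TypeScript 5, NestJS backend, Next.js frontend

## Conventions
- Branching: trunk-based; feature branches named feat/<issue-id>-short-name
- Commits follow Conventional Commits; PRs require one approval

## Rules
- Never push directly to main
- All new endpoints need an e2e test before merge
```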
The Skill System: How the Agent Builds Its Own Knowledge
The most distinctive part of Hermes Agent is its Skills. Skill creation happens through two paths: the user explicitly requests a save, or after solving a problem requiring multi-step reasoning, the agent autonomously decides "this could be useful again" and creates the document automatically. In other words, it's a structure where the agent accumulates experience without constant manual management.
For example, if the agent solves "how to create a custom exception filter in NestJS to send errors to Sentry" for the first time, it documents that process as a Skill file and saves it to the .hermes/skills/ folder. The next time a similar task comes up, it references the saved Skill before reasoning from scratch.
```markdown
# Example structure of .hermes/skills/nestjs-sentry-exception-filter.md
---
name: nestjs-sentry-exception-filter
description: Wire Sentry error reporting into a global NestJS exception filter
tags: [nestjs, sentry, error-handling]
---
## Problem
Unhandled exceptions in a NestJS app need to be sent to Sentry automatically

## Solution Pattern
1. Implement AllExceptionsFilter with the @Catch() decorator
2. Inject SentryService and call captureException
3. Register globally via the APP_FILTER token

## Code
(the actual code snippet written by the agent)
```

These Skills are designed as an open standard for community sharing (the sharing platform is currently in preparation for launch). You can keep team-internal Skills on an internal server and contribute only the general-purpose ones to the community.
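Nous Research hasn't documented the retrieval internals, but the idea — index skill files by their frontmatter tags and surface matches before reasoning from scratch — can be sketched in a few lines of Python. `parse_frontmatter` and `find_skills` below are hypothetical helpers, not the real Hermes API:

```python
# Illustrative sketch (not the real Hermes internals): index a skills
# folder by frontmatter tags so matching Skills can be surfaced before
# the agent reasons from scratch.
import tempfile
from pathlib import Path

def parse_frontmatter(text: str) -> dict:
    """Parse the simple `key: value` frontmatter between `---` markers."""
    _, block, _ = text.split("---", 2)
    meta = {}
    for line in block.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta

def find_skills(skills_dir: Path, query_tags: set) -> list:
    """Return names of skills whose tags overlap with the query tags."""
    hits = []
    for path in skills_dir.glob("*.md"):
        meta = parse_frontmatter(path.read_text())
        tags = set(meta.get("tags", "").strip("[]").replace(" ", "").split(","))
        if tags & query_tags:
            hits.append(meta["name"])
    return hits

# Demo with a throwaway skill file
with tempfile.TemporaryDirectory() as d:
    skill = Path(d) / "nestjs-sentry-exception-filter.md"
    skill.write_text(
        "---\n"
        "name: nestjs-sentry-exception-filter\n"
        "description: Global exception filter reporting to Sentry\n"
        "tags: [nestjs, sentry, error-handling]\n"
        "---\n"
        "## Problem\n...\n"
    )
    print(find_skills(Path(d), {"sentry"}))  # → ['nestjs-sentry-exception-filter']
```

A vector-store lookup (as with the Hindsight plugin mentioned later) would replace the tag match, but the shape — retrieve first, reason only on a miss — stays the same.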
Practical Application
Foundation Setup: Connecting MCP Servers
Before diving into practical examples, it's worth first looking at the MCP connection setup that underpins all of them. Model Context Protocol (MCP) is an AI agent-to-tool connectivity standard led by Anthropic, which allows agents to communicate with external systems like GitHub, databases, and cloud infrastructure without writing custom integration code each time. Think of it as standardizing interfaces the way REST APIs do.
Hermes Agent can connect to any MCP server with a single line of config:
```json
// hermes.config.json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}"
      }
    },
    "linear": {
      "command": "npx",
      "args": ["-y", "@linear/mcp-server"],
      "env": {
        "LINEAR_API_KEY": "${LINEAR_API_KEY}"
      }
    }
  }
}
```

Once this is configured, you can build workflows where the agent directly operates GitHub or Linear with nothing but natural-language instructions, as shown in the examples below.
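The `${GITHUB_TOKEN}` syntax above is a placeholder presumably resolved from environment variables when the config is loaded, which keeps secrets out of the file itself. That expansion step is easy to sketch — `expand_env` below is a hypothetical helper, not the actual Hermes config loader:

```python
# Illustrative sketch: expand `${VAR}` placeholders from environment
# variables, as a config loader handling the file above might do.
# `expand_env` is a hypothetical helper, not part of Hermes.
import os
import re

def expand_env(value: str) -> str:
    """Replace every ${NAME} with os.environ['NAME'] (empty if unset)."""
    return re.sub(r"\$\{(\w+)\}", lambda m: os.environ.get(m.group(1), ""), value)

os.environ["GITHUB_TOKEN"] = "ghp_example"
print(expand_env("${GITHUB_TOKEN}"))  # → ghp_example
```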
Example 1: Automating the Team Daily Standup
This is one of the most practical use cases. Checking calendars, GitHub, and Linear every morning and posting a summary to the team channel is repetitive work where context still matters. Below is pseudocode illustrating the concept — refer to the official documentation for the actual API shapes.
```python
# Conceptual example code — refer to official docs for actual API shapes
from hermes import Agent

agent = Agent(
    memory_path=".hermes/MEMORY.md",
    user_path=".hermes/USER.md"
)

@agent.schedule("0 9 * * 1-5")  # Weekdays at 9 AM
async def daily_standup():
    # Collect data from multiple sources via MCP tool calls
    context = await agent.gather([
        "github:list_pull_requests",
        "linear:list_issues",
        "calendar:get_today"
    ])
    summary = await agent.run(
        task="Summarize yesterday's merged PRs, today's in-progress issues, "
             "and today's meetings in a team standup format",
        context=context
    )
    await agent.send(
        channel="slack://team-standup",
        message=summary
    )
```

| Component | Role |
|---|---|
| `@agent.schedule` | SQLite-based scheduler; maintains schedules across server restarts |
| `agent.gather` | Simultaneously collects data from multiple sources via MCP |
| `agent.run` | Generates a summary based on the collected context |
| `agent.send` | Sends to one of 20+ supported messaging platforms |
This is a situation that comes up often in practice — standup bots tend to stop at just "listing PR numbers." I thought the same thing at first: "Is that all?" But as team context accumulates in USER.md and MEMORY.md, it changes. Around week three, when PR summaries start reading like "Issue #42 is blocking these PRs, which is why they're delayed," you really feel the difference.
Example 2: Infrastructure Monitoring and Incident Alerts
Running health checks every five minutes and sending alerts on failure is another common pattern. What sets Hermes Agent apart is that it goes beyond simple HTTP 200 checks — it can learn failure patterns and add context like "this error typically indicates a DB connection pool problem."
```yaml
# hermes-monitor.yaml
monitors:
  - name: api-health
    url: https://api.yourservice.com/health
    interval: "*/5 * * * *"
    on_failure:
      - notify:
          channel: telegram://oncall-alerts
          message_template: |
            🚨 {{service_name}} not responding ({{status_code}})
            Last success: {{last_success_at}}
            Consecutive failures: {{failure_count}}
      - run_skill: diagnose-api-failure  # Runs a saved Skill automatically
    checkpoint: true  # Resume from last completed point
```

`checkpoint` option: Even if a long pipeline fails partway through, it can retry from the last successful point. This prevents having to start over from the beginning in unstable network environments.
By pre-defining a failure diagnosis Skill like run_skill: diagnose-api-failure, the agent doesn't just send an alert — it begins root cause analysis on its own. As the Skills built early on accumulate, a system naturally emerges where "if you see these symptoms, check this first."
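The checkpoint behavior is worth pinning down, since it is what makes long pipelines safe to retry. The sketch below illustrates the general pattern — persist the index of the last completed step, skip finished steps on retry — with a hypothetical `run_pipeline` helper, not the Hermes implementation:

```python
# Illustrative sketch of checkpoint/resume (hypothetical helper, not the
# real Hermes API): persist progress so a retried pipeline resumes from
# the last completed step instead of restarting.
import json
import tempfile
from pathlib import Path

def run_pipeline(steps, state_file: Path):
    """Run steps in order, checkpointing progress after each one."""
    done = json.loads(state_file.read_text())["done"] if state_file.exists() else 0
    for i, step in enumerate(steps):
        if i < done:
            continue  # step finished on a previous attempt — skip it
        step()
        state_file.write_text(json.dumps({"done": i + 1}))  # checkpoint
    state_file.unlink(missing_ok=True)  # pipeline succeeded; clear checkpoint

# Demo: the second step fails once (simulated network blip), then succeeds
log = []
failed_once = [False]

def fetch():
    log.append("fetch")

def upload():
    if not failed_once[0]:
        failed_once[0] = True
        raise ConnectionError("network blip")
    log.append("upload")

with tempfile.TemporaryDirectory() as d:
    ckpt = Path(d) / "ckpt.json"
    try:
        run_pipeline([fetch, upload], ckpt)
    except ConnectionError:
        pass                              # first attempt dies mid-pipeline
    run_pipeline([fetch, upload], ckpt)   # retry resumes at `upload`
print(log)  # → ['fetch', 'upload']  (fetch is not re-run)
```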
Example 3: Running Cost-Free with a Local LLM
Taking advantage of Hermes Agent's model-agnostic nature, a fully offline configuration using Ollama and a local GPU is also possible. With a GPU of 8GB VRAM or more, you can run the Hermes-3 8B model via Ollama, achieving 91% tool-call accuracy according to Nous Research's own benchmarks.
```shell
# Run Hermes-3 locally with Ollama
ollama pull nous-hermes3:8b
```

```json
// hermes.config.json — switch provider to ollama
{
  "model": {
    "provider": "ollama",
    "name": "nous-hermes3:8b",
    "base_url": "http://localhost:11434"
  }
}
```

```shell
# Start the agent
hermes start --config hermes.config.json
```

This configuration is especially useful for personal projects where API costs are a concern, or for internal environments where data cannot be sent externally.
Pros and Cons
Advantages
| Item | Detail |
|---|---|
| Self-Improvement Loop | Generates and refines Skills from experience; handles the same tasks better the longer you use it |
| Local Execution Performance | Hermes-3 8B + Ollama: 91% tool-call accuracy per Nous Research's own benchmarks |
| Security | Zero CVEs as of April 2026; all data stays on your own infrastructure |
| Fully Open Source | MIT license, no telemetry, no vendor lock-in |
| Multi-Platform Messaging | Single gateway connecting 20+ channels |
| Memory Plugin Ecosystem | Supports 8 external memory providers including Honcho, Mem0, and Hindsight |
Drawbacks — and How to Address Them
For comparison: OpenClaw is another open-source agent framework that emerged around the same time as Hermes Agent. In terms of Skills ecosystem size, OpenClaw has the lead, but Hermes Agent holds the advantage on security and licensing.
| Item | Detail | Mitigation |
|---|---|---|
| Deep Tool Chain Weakness | Accuracy degrades with 4–5+ sequential tool calls on 8B models | Use a 70B model or combine with LangGraph state checkpointing |
| TUI-First UX | Terminal-centric interface creates friction in GUI environments | Use the VS Code extension or third-party web UI plugins |
| Initial Infrastructure Cost | Local execution requires a minimum 8GB VRAM GPU or server | Reduce entry cost with Modal or Daytona serverless backends |
| Skill Ecosystem Still Early | Fewer Skills compared to OpenClaw (13K+ community skills) | Build internal Skills directly; use the awesome-hermes-agent repo |
| Context Limitations | Minimum 64K context size, snapshot delays, cron prompt overhead | Set shorter memory summary intervals; use Hindsight vector DB retrieval |
Atropos RL Framework: The reinforcement learning framework Nous Research used to train Hermes-3. Specialized in improving tool-call accuracy, it reduces the frequency of the agent incorrectly calling external APIs or commands.
Honcho User Modeling: A user modeling system that progressively learns a user's behavioral patterns, preferences, and work style. Integrated with Hermes Agent, agent responses become increasingly personalized over time.
Common Pitfalls in Practice
- Trying to build perfect Skills from the start — I also tried to manually convert all our team patterns into Skill documents early on, but it only multiplied maintenance points. The foundational design is for the agent to automatically create Skill documents as it solves real problems. It's fine if they're incomplete at first — the agent will refine them over time.
- Running chains of 5+ steps with an 8B model — Local 8B models are sufficient for simple repetitive tasks, but their limitations are clear for work requiring complex multi-step planning. When you find yourself thinking "why does it keep getting this wrong?", suspect the model size first. Switching to a 70B model or a cloud model produces noticeably better results.
- Starting with an empty MEMORY.md — If you install the agent and start asking questions right away, responses will remain generic and context-free for the first few weeks. The value of Persistent Memory starts with how well you seed the context upfront. It's recommended to explicitly record your team's stack, coding conventions, and commonly used patterns in `MEMORY.md` first.
Closing Thoughts
Hermes Agent is the fastest-growing open-source agent and the framework that has most practically implemented cross-session learning accumulation. Session discontinuity, repetitive context entry, local execution costs — the real-world problems you hit when attaching agents to production work — it has approached these at the architectural level, and that is translating into real adoption.
Three steps you can take right now:
- Install it locally and set up MEMORY.md — Check the install command in the official quickstart guide, then run `hermes init` to interactively create `MEMORY.md` and `USER.md`. Simply recording your team's stack and coding conventions here will improve the quality of subsequent sessions.
- Build one small automation with an MCP connection — Connect the GitHub MCP server and start with a simple task you actually use, like "summarize the PRs opened this week." After the task completes, check the `.hermes/skills/` folder — you'll find a Skill document has been created. That is exactly how Hermes Agent learns on its own.
- Connect a team channel and set up a standup summary pipeline — Connect Slack or Telegram, attach GitHub and Linear via MCP, and configure a daily morning standup summary. After two or three weeks of operation, compare how `MEMORY.md` has changed — you'll be able to see firsthand how the self-improvement loop actually works.
References
- Hermes Agent Official Site | Nous Research
- Hermes Agent Official Documentation
- GitHub - NousResearch/hermes-agent
- Hermes Agent Quickstart Guide
- Persistent Memory Documentation
- User Stories & Use Cases Documentation
- NVIDIA Blog: Hermes Unlocks Self-Improving AI Agents
- Analytics Vidhya: Hermes Agent Guide
- DEV Community: Hermes Agent - A Self-Improving AI Agent That Runs Anywhere
- Turing Post: Full Comparison of Hermes Agent vs OpenClaw
- GitHub - NousResearch/Hermes-Function-Calling
- OpenRouter: Hermes Agent Integration Documentation
- Ollama: Hermes Agent Integration Documentation
- DeepWiki: NousResearch/hermes-agent
- Firecrawl: Hermes Agent Usage Guide