AI Agent-Based CI/CD Automation — Hermes Agent Crons' state.db Structure and Isolated Execution Mechanics
A single YAML indentation error that breaks an entire build, or half a day wasted on external SaaS account permission issues — I've been through it multiple times, and I'm sure most of you have too. Those moments of looking up GitHub Actions' on: block syntax again and thinking, "why do I even have to memorize this?" If you've ever felt frustrated by traditional YAML-based workflows, you'll find the Crons system in Hermes Agent — NousResearch's self-evolving AI agent framework — quite intriguing. "every night at 12am, push changes to GitHub" — this single sentence defines an entire nightly auto-commit pipeline. I'll be honest: when I first saw it, I was a little taken aback.
Here's the key point: even though the LLM itself decides whether to deploy, a mechanism that guarantees isolated execution without state contamination between jobs is already built in. This is why it's worth considering seriously as a CI/CD alternative, not just a "cool natural language interface."
In this article, we'll dig into the internal structure of Hermes Crons' state.db storage and how isolated execution is guaranteed. We'll walk through CI health monitoring, nightly code review summaries, and multi-step deployment pipelines with real code.
TL;DR
- Hermes Agent Crons defines scheduled tasks with a single natural language prompt sentence, and the LLM decides how to execute them.
- All execution history is automatically saved to
~/.hermes/state.db(SQLite WAL + FTS5), and a completely new session is created for each job.- Using
no_agentmode, which runs scripts without LLM costs, lets you build a cost-efficient hybrid pipeline.
Core Concepts
Why Hermes Agent Crons Is Different from CI/CD
Traditional CI/CD pipelines require you to explicitly specify "what to run" in YAML. Logic for analyzing test failure causes or deciding contextually whether to halt or proceed with a deployment must be implemented manually via shell scripts or complex conditionals. In contrast, Hermes Crons uses the LLM as the job's execution agent, so the agent interprets the intent expressed in a single prompt sentence and decides on its own how to execute.
Before diving into concrete examples, it's worth touching on the Skills concept first. Skills are a set of tools that can be attached to cron jobs. By connecting built-in or custom Skills like git-ops or docker-build to a job, the agent can actually invoke those tools to perform tasks like Git commits or Docker image builds. It's because of these Skills that a prompt like "deploy this" actually works. Writing Skills themselves is beyond the scope of this article and will be covered separately, but for now, think of them as "a set of tools the agent can use."
{
"name": "nightly-deploy",
"schedule": "0 2 * * *",
"skills": ["git-ops", "docker-build"],
"script": "~/.hermes/scripts/pre_check.sh",
"workdir": "/srv/myapp",
"delivery": {"channel": "slack", "target": "#deploys"}
}This might not look very different from a traditional CI job. The difference is that the stdout of the script field is injected into the prompt, and then the agent reads that output and judges "whether it's safe to proceed with deployment right now." It's the LLM's reasoning — not YAML conditionals — that handles the decision logic.
state.db — Where All of Hermes' Execution History Accumulates
Hermes stores all sessions and messages in a single SQLite file at ~/.hermes/state.db. It operates in WAL (Write-Ahead Logging) mode, which handles concurrent reads and writes reliably and provides production-level reliability despite being file-based.
| Table | Role |
|---|---|
sessions |
Session metadata — platform, model, start/end time, token count, cost, title |
messages |
Full message history — role, content, tool_calls, including reasoning tokens |
messages_fts |
FTS5 virtual table — automatically indexes content, tool_name, and tool_calls |
schema_version |
Single row tracking migration version |
messages_fts is an FTS5 (Full-Text Search 5) virtual table that is automatically synchronized via trigger whenever messages are written. Because it uses a "content table" approach, it stores only the index separately, minimizing DB size while allowing months of execution logs to be searched in milliseconds. The agent itself can perform full-text searches across all past conversations using the built-in session_search tool. I'll admit my first reaction was "SQLite can do that?" — but having used it in practice, even tens of thousands of messages are searched instantly.
WAL Mode (Write-Ahead Logging): Write operations are first recorded in a separate WAL file and then applied to the main DB at checkpoint time. Reads and writes don't block each other, dramatically improving concurrency.
There is one important caveat. The official documentation explicitly prohibits directly querying state.db from external sources, because the internal schema may change between releases. If you need data for audit purposes, it's safer to access it through the official API or the session_search tool.
Isolation Mechanism — Why There's No State Contamination Between Jobs
When running cron jobs in production, you sometimes find yourself wondering, "did the state from the previous job affect this one?" Hermes blocks this problem structurally.
Every time the scheduler executes a job, it creates a fresh AIAgent session with completely empty conversation history and context. Regardless of what state a previous job was in or what failures it encountered, the next job is completely unaffected. It's equivalent to the guarantee that container-based CI runners provide by spinning up a new image for every execution, delivered at the filesystem level.
Duplicate execution prevention is handled via a ~/.hermes/cron/.tick.lock file lock. Because it's a cross-process file lock, even if multiple Hermes instances start on the same machine, the same job is prevented from running twice simultaneously.
~/.hermes/
├── state.db # Persistent storage for all sessions and messages
├── cron/
│ ├── .tick.lock # File lock for duplicate execution prevention
│ ├── jobs/ # Job definition JSON files
│ └── output/
│ └── {job_id}/
│ └── {timestamp}.md # Per-job execution logs
└── scripts/ # Shell scripts for no_agent modeIsolation: Because each job starts in an independent session, a failure or side effect in job A cannot contaminate job B's execution environment.
Practical Applications
Example 1: CI Health Monitoring Gate (no_agent Watchdog Mode)
The no_agent option introduced in the v0.13.0 Tenacity release is honestly the feature I was most glad to see. For simple check jobs that don't need LLM reasoning, you can eliminate API costs entirely. Specify it as "no_agent": true in JSON config, or as no_agent=True when creating jobs directly via the Python API.
#!/bin/bash
# ~/.hermes/scripts/ci_check.sh
# Use jq -r flag to extract raw string without quotes
STATUS=$(curl -s https://api.github.com/repos/org/repo/actions/runs \
| jq -r '.workflow_runs[0].conclusion')
[ "$STATUS" != "success" ] && echo "CI FAILED: $STATUS"
# If stdout is empty, stay silent — no notification on success{
"name": "ci-health-watchdog",
"schedule": "*/5 * * * *",
"script": "~/.hermes/scripts/ci_check.sh",
"no_agent": true,
"delivery": {
"channel": "slack",
"target": "#alerts",
"only_if_output": true
}
}| Component | Role |
|---|---|
"no_agent": true |
Passes script stdout directly without any LLM calls |
only_if_output: true |
Silences empty stdout — "notify only on problems" pattern |
*/5 * * * * |
Runs every 5 minutes (minimum unit is 60 seconds) |
The strength of this pattern is zero API cost. For jobs that can be handled by scripts — CI status checks, disk space alerts, service health checks — registering them with "no_agent": true lets you run them at high frequency without worrying about costs.
Example 2: Nightly Automated Code Review Summary
This job retrieves the PR list every midnight, summarizes changes, categorizes high-risk items, and delivers them to Slack. Setting workdir to the repository root automatically injects AGENTS.md or CLAUDE.md, helping the agent understand the project context.
{
"name": "nightly-pr-review-summary",
"schedule": "0 0 * * *",
"skills": ["github-mcp"],
"workdir": "/path/to/myrepo",
"prompt": "Retrieve the list of PRs opened today, summarize the changes in each PR, categorize items with high deployment risk, and send the results to the Slack #dev-review channel",
"delivery": {
"channel": "slack",
"target": "#dev-review"
}
}To implement this with traditional CI/CD, you'd need to write the entire pipeline as scripts: GitHub API calls → diff parsing → risk classification logic → Slack message formatting. In Hermes, a single prompt sentence replaces all of that logic. LLM API costs are incurred, but compared to the development time it would take to implement this manually as scripts, that's a reasonable tradeoff.
Example 3: Chaining Dependent Jobs with context_from
Using the context_from field, you can automatically prepend job A's last execution stdout to job B's prompt, composing a sequential pipeline. What gets injected is the full text output of the preceding job, and there is a length limit — so be aware that if the preceding job's output is excessively long, it may be truncated.
[
{
"name": "run-tests",
"schedule": "0 1 * * *",
"skills": ["pytest-runner"],
"prompt": "Run the entire test suite and summarize any failures and coverage"
},
{
"name": "deploy-decision",
"schedule": "30 1 * * *",
"context_from": "run-tests",
"skills": ["git-ops", "docker-build"],
"workdir": "/srv/myapp",
"prompt": "Based on the test results, decide whether to proceed with a production deployment, and execute the deployment if it is safe to do so"
}
]| Field | Behavior |
|---|---|
context_from: "run-tests" |
Automatically prepends the last execution stdout of the run-tests job to the prompt |
"30 1 * * *" |
Set 30 minutes after the first job (01:00) to allow buffer for expected completion time (01:30) |
What makes this pattern interesting is that instead of hardcoding deployment decision logic like "deploy if test failures are 0 and coverage is above 80%," the agent reads the context of the test results — which modules failed, whether any tests were temporarily skipped, etc. — and makes the judgment. I was skeptical at first about whether this would actually be accurate, but with well-structured prompts and test reports injected together, it turns out to be more accurate than expected.
Pros and Cons Analysis
Advantages
| Item | Details |
|---|---|
| Configuration simplicity | Define scheduled tasks with a single natural language prompt sentence, no YAML |
| LLM reasoning | Can handle contextual decisions that rule-based CI/CD cannot — analyzing test failure causes, deciding whether to halt a deployment, etc. |
| No external dependencies | No serverless or SaaS accounts needed; self-contained in a single ~/.hermes/ directory |
| Isolation reliability | No state contamination between jobs; file lock guarantees no duplicate execution |
| Multi-channel delivery | Built-in support for delivering results to Slack, Discord, Email, and other channels |
| FTS5 audit trail | All execution history is automatically preserved in state.db in a full-text searchable form |
| no_agent cost optimization | Simple script jobs can run without LLM calls |
Disadvantages and Caveats
| Item | Details | Mitigation |
|---|---|---|
| Unstable state.db schema | Internal schema may change between releases; direct queries are officially prohibited | Use only the official API and session_search tool |
| 60-second minimum granularity | Scheduler tick interval is 60 seconds — sub-second precision is not possible | Keep traditional cron for jobs requiring sub-second execution |
| FTS5 dependency | SQLite FTS5 may be missing in some Python 3.11 macOS builds | Use official Docker images to standardize the environment |
| Single-machine limitation | Default setup is based on local filesystem — difficult for distributed team collaboration | Wait for Pluggable SessionDB RFC (#23717) to complete, or use Docker Compose for a shared environment |
| LLM costs | All jobs except no_agent incur LLM API calls |
Aggressively use "no_agent": true for high-frequency simple jobs |
| Debugging visibility | Failure logs exist only in ~/.hermes/cron/output/{job_id}/{timestamp}.md |
Recommend integrating Web Dashboard or a dedicated log aggregation tool |
FTS5 (Full-Text Search 5): A full-text search extension module built into SQLite. It is significantly faster than LIKE searches and supports tokenization, ranking, and phrase search. However, it must be enabled at compile time and may be missing from some package builds.
The Most Common Mistakes in Practice
-
Attempting to extract audit logs by directly querying state.db — The schema can change at any time, and it's also an officially prohibited use case. If you need an audit trail, using the
session_searchtool or the Web Dashboard's session viewer is the safe approach. -
Designing all jobs to run as LLM agents — If you omit the
"no_agent"option for simple jobs like disk checks or HTTP pings, API costs accumulate unnecessarily. It's important to develop the habit of classifying jobs that require no reasoning as"no_agent": truefrom the start. -
Timing misses in
context_fromchains — If the downstream job's schedule triggers while the upstream job is still running, it will receive the output from the previous run. It's best to schedule the downstream job with at least 10–15 minutes of buffer added to the upstream job's expected completion time.
Closing Thoughts
Hermes Agent's Crons system is a tool that lets you actually try, at a production level, a shift from the CI/CD paradigm of "explicitly specifying what to run in YAML" to an agentic execution model where the LLM directly decides whether to deploy. Thanks to two foundations — state.db's FTS5-based audit trail and fully isolated per-job sessions — you can weave the LLM's judgment into your pipeline without sacrificing the reproducibility and traceability that traditional CI/CD provides.
Three steps you can start with right now:
- It's recommended to first register a single
no_agentwatchdog job. Install withpip install hermes-agent, write a simple health check script in~/.hermes/scripts/, and place a JSON job file configured with"no_agent": truein the~/.hermes/cron/jobs/directory. You can see how the cron system works firsthand, with zero LLM API costs. - Running the Web Dashboard with the
hermes dashboardcommand is helpful. In the local dashboard — which integrates a cron manager, live log viewer, and session management — you can view the log files under~/.hermes/cron/output/through a UI. - It's also worth porting one of the most frequently touched jobs in your existing CI into an LLM agent job. It's easier to debug if you start as a standalone job without
context_fromfirst, and then connect it into acontext_fromchain once the behavior is confirmed stable.
References
Official Documentation
- Scheduled Tasks (Cron) | Hermes Agent Official Docs
- Cron Internals | Hermes Agent Developer Guide
- Script-Only Cron Jobs (No LLM) | Hermes Agent
- Sessions | Hermes Agent
Source Code & Releases
- hermes_state.py source code — NousResearch/hermes-agent
- Release v0.13.0 Tenacity — NousResearch/hermes-agent
- RFC: Pluggable SessionDB Provider — GitHub Issue #23717
In-Depth Analysis & Community