Hermes Agent: The Agent That Accumulates
How Nous Research built an AI agent that learns from its own work, and what it took to make that idea hold up in production.
Timeline
- 2025-07-22 Repository created by Nous Research; pre-public foundation (v0.1.0 internal)
- 2026-03-12 v0.2.0: first public tagged release; 216 merged PRs from 63 contributors; multi-platform messaging gateway, MCP client, skills ecosystem, 3,289 tests
- 2026-03-17 v0.3.0: unified streaming, first-class plugin architecture, native Anthropic provider, Honcho memory integration, voice mode, concurrent tool execution
- 2026-03-23 v0.4.0: OpenAI-compatible API server, six new messaging adapters, @ context references, MCP server management with OAuth 2.1, gateway prompt caching
- 2026-03-30 v0.6.0: multi-instance profiles, MCP server mode, official Docker container, ordered fallback provider chain
- 2026-04-08 v0.8.0: background task auto-notifications, live model switching across all platforms, security hardening pass, activity-based inactivity timeouts
- 2026-04-13 v0.9.0: local web dashboard, Android/Termux support, iMessage via BlueBubbles, WeChat and WeCom adapters, Fast Mode for OpenAI and Anthropic
- 2026-04-23 v0.11.0: full React/Ink TUI rewrite, pluggable transport architecture, native AWS Bedrock, QQBot (17th platform), GPT-5.5 via Codex OAuth
- 2026-04-30 v0.12.0: autonomous Curator agent, substantially upgraded self-improvement loop, 19th messaging platform, 213+ community contributors
The Problem With Agents That Forget
When Nous Research started building Hermes Agent in mid-2025, open-source AI agents were plentiful. Coding agents, research agents, shell agents. Most shared the same structural property: stateless between sessions. You could load a system prompt and inject files into context, but the agent had no mechanism for learning from what it had done. Every new session started fresh.
Several teams were working on this. The common solution was some form of memory: write selected facts to a file or database after each session, retrieve relevant ones at the start of the next. Hermes Agent does this too. But Nous Research's thesis was that memory alone wasn't enough. The missing piece was procedural knowledge.
Episodic memory captures facts: this user prefers Python over JavaScript, their production database is PostgreSQL 14, they work in the Pacific time zone. Procedural knowledge captures how to do things: when this user asks me to refactor a function, first run the test suite, then check for callers across the codebase, then make the change, then run tests again. These are different kinds of information requiring different storage mechanisms. Facts compress into bullet points. Procedures are more like small programs: ordered, conditional, reusable.
Skills in Hermes Agent are Markdown files. They live in ~/.hermes/skills/ and are loaded into context when the agent judges them relevant. A skill might describe how to deploy to a specific infrastructure, how to navigate a particular codebase, or how to interact with an API not covered by the built-in tools. They are written in natural language, editable by the user, and shareable via the agentskills.io hub.
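The loading side of this is easy to picture. Below is a minimal sketch, not Hermes's actual implementation: it assumes skills are plain Markdown files and ranks them by naive keyword overlap with the incoming message, where the real agent makes a model-driven relevance judgment.

```python
from pathlib import Path

def load_skills(skills_dir: str) -> dict[str, str]:
    """Read every Markdown skill file in the directory into memory."""
    return {p.stem: p.read_text(encoding="utf-8")
            for p in Path(skills_dir).glob("*.md")}

def relevant_skills(skills: dict[str, str], query: str, limit: int = 3) -> list[str]:
    """Naive stand-in for relevance: score each skill by keyword overlap
    with the user's message, keep the top few non-zero matches."""
    words = set(query.lower().split())
    scored = sorted(
        ((len(words & set(body.lower().split())), name)
         for name, body in skills.items()),
        reverse=True,
    )
    return [name for score, name in scored[:limit] if score > 0]
```

Because skills are just files, a user can open one in an editor, correct a step the agent got wrong, and the fix applies on the next load.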
Skills are created and improved by the agent itself. After each turn, a background process called the review fork re-runs the conversation through a restricted agent instance that has access only to memory and skills tools. This fork asks: did I use any skills in this turn? Were they effective? Should they be updated? Is there a new pattern worth capturing? If yes, it writes or updates the relevant files. The user doesn't have to do anything. The agent accumulates.
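The restriction is the important part: the fork can observe everything but touch almost nothing. A minimal sketch of that idea, with hypothetical tool names (`memory_write`, `skill_write`, and the rest are illustrative, not Hermes's actual tool registry):

```python
# Tools the review fork may call; everything else is stripped before the
# fork runs. (Names are assumptions for illustration.)
REVIEW_FORK_ALLOWED = {"memory_read", "memory_write", "skill_read", "skill_write"}

def restrict_toolbox(full_toolbox: dict) -> dict:
    """Return the subset of the agent's toolbox the review fork may use,
    so side-effecting tools (shell, web, messaging) are unreachable."""
    return {name: tool for name, tool in full_toolbox.items()
            if name in REVIEW_FORK_ALLOWED}
```

Whatever the fork concludes, the worst it can do is write a bad Markdown file, which the user (or the Curator) can later revert.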
The Origin in Nous Research's Model Work
Nous Research is known primarily for its open-weight models: the Hermes and Nous-Hermes series of fine-tuned models widely used as base and chat models in the open-source community. Nous has been one of the more prolific producers of instruction-tuned models derived from Llama, Mistral, and other open bases. The agent grew directly from that work.
Hermes Agent ships with an environments/ directory containing RL training environments designed to plug into Nous's Atropos reinforcement learning framework. batch_runner.py supports parallel trajectory generation for training data. The README is explicit: Hermes is not just a tool for end users. It is also infrastructure for generating the agentic trajectories Nous uses to train the next generation of tool-calling models.
That dual purpose shapes the design in specific ways. The skills system produces structured records of what an agent did and what it learned. Session storage in SQLite with FTS5 search preserves full conversation history. The review fork produces explicit labels about skill quality. An agent that accumulates procedural knowledge over time generates labeled data about what procedures work. The two use cases are not in tension; they reinforce each other.
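The SQLite-with-FTS5 pattern is straightforward to sketch with the standard library. The schema below is an assumption for illustration, not the project's actual table layout; it requires an SQLite build with FTS5 enabled, which ships with most modern Python distributions.

```python
import sqlite3

def open_session_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create a session store with a full-text index over message content.
    (Illustrative schema, not Hermes's real one.)"""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS messages "
        "USING fts5(session_id, role, content)"
    )
    return conn

def log_message(conn, session_id: str, role: str, content: str) -> None:
    conn.execute("INSERT INTO messages VALUES (?, ?, ?)",
                 (session_id, role, content))

def search_history(conn, query: str) -> list[tuple[str, str]]:
    """Full-text search across every stored conversation, best match first."""
    return conn.execute(
        "SELECT session_id, content FROM messages WHERE messages MATCH ? "
        "ORDER BY rank", (query,)
    ).fetchall()
```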
The project's name continues the naming convention from the model releases. Hermes was originally a Nous fine-tune of Llama-2 trained on function calling and instruction following datasets. The agent takes the name from that model series, which took it from the messenger god.
Launched by a Community, Not a Roadmap
The repository was created in July 2025. The first public tagged release, v0.2.0, came in March 2026, eight months later. By the time it was tagged, it already had 216 merged pull requests from 63 contributors.
The v0.2.0 release notes describe this directly: "In just over two weeks, Hermes Agent went from a small internal project to a full-featured AI agent platform, thanks to an explosion of community contributions." The two-week window refers to the period between the first public activity and the tagging of v0.2.0. Before that tag, the project already had a multi-platform messaging gateway covering Telegram, Discord, Slack, WhatsApp, Signal, Email, and Home Assistant. A native MCP client with stdio and HTTP transports. 70+ bundled skills across 15 categories. 3,289 tests.
The subsequent releases continued at a pace that reflects the breadth of the contributor base rather than a small team's roadmap. v0.3.0 landed five days after v0.2.0. v0.4.0, v0.5.0, and v0.6.0 followed at two-to-five day intervals. By v0.12.0 in late April 2026, the repository had 213 community contributors in a single release cycle (550 PRs in the preceding period), a test suite spanning ~700 files, and 19 supported messaging platforms.
This pace has a cost. The v0.10.0 release notes acknowledge deferred content: "This release includes 180+ commits with numerous bug fixes, platform improvements, and reliability enhancements... Full details will be published in the v0.11.0 changelog." The v0.11.0 notes fold in both that deferred content and two weeks of new work. The project is moving faster than its release notes can fully document, which is common for high-velocity open source projects and less common for ones with this level of functional breadth.
The Architecture Decision: Don't Lock the Model
The decision to be model-agnostic was made early and has held. Hermes Agent routes through a centralized provider client built around the OpenAI chat completions API shape (now the de facto standard interface across providers) and supports direct connections to Nous Portal, OpenRouter (200+ models), Anthropic, NVIDIA NIM, Google AI Studio, AWS Bedrock, Azure AI Foundry, Hugging Face, GitHub Copilot, OpenAI, and others. The hermes model command switches providers and models without restarting the process. v0.8.0 extended live model switching to work mid-session from any messaging platform.
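The ordered fallback chain from v0.6.0 fits naturally on top of this provider-agnostic shape. A minimal sketch under assumed names: `ProviderError` and the provider callables are stand-ins, with each provider presenting the same chat-completions-style interface (messages in, text out).

```python
class ProviderError(Exception):
    """Raised by a provider stand-in on failure (rate limit, outage, etc.)."""

def complete_with_fallback(providers: list, messages: list[dict]) -> str:
    """Try each configured provider in order; return the first success.
    Only if every provider in the chain fails does the call fail."""
    last_err = None
    for provider in providers:
        try:
            return provider(messages)
        except ProviderError as err:
            last_err = err  # fall through to the next provider in the chain
    raise ProviderError("all providers in the chain failed") from last_err
```

Because every provider is normalized to the same call shape, the chain never needs to know which vendor it is talking to.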
The README states the motivation plainly: "no lock-in." Partly that's a user-facing feature. Partly it's an architectural consequence of Nous Research's position. An organization that produces models across multiple provider relationships cannot build its flagship agent on a single provider's infrastructure without creating a conflict of interest. Hermes runs with whatever model a user or organization chooses, including the Nous models themselves.
The cost is real. By v0.11.0, a pluggable transport architecture had abstracted format conversion into a separate agent/transports/ layer with distinct classes for Anthropic, ChatCompletions, Responses API, and Bedrock shapes. Credential pooling, fallback chains, OAuth flows for multiple providers, and per-provider tool-use enforcement guidance all accumulate there. The run_agent.py AIAgent.__init__ signature takes approximately 60 parameters in its full form.
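The shape of that transport layer can be illustrated with two of the formats it converts between. The class and field names below are assumptions; only the underlying API difference is real: OpenAI-style chat completions carry the system prompt as the first message, while Anthropic's API takes it as a top-level field.

```python
class ChatCompletionsTransport:
    """Convert an internal (system, messages) pair to the OpenAI-style shape."""
    def to_request(self, system: str, messages: list[dict]) -> dict:
        # System prompt travels as the first message in the list.
        return {"messages": [{"role": "system", "content": system}, *messages]}

class AnthropicTransport:
    """Convert the same internal pair to the Anthropic-style shape."""
    def to_request(self, system: str, messages: list[dict]) -> dict:
        # System prompt is a top-level field, separate from the messages.
        return {"system": system, "messages": messages}
```

Every additional provider family adds another class like these, plus its own quirks around tool calls and streaming, which is where the complexity budget goes.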
Committing to one provider's API would have produced a simpler codebase. It also would have foreclosed the training data use case, since Nous needs to run their own models through the same infrastructure to generate trajectories.
The Gateway as the Real Product Surface
For many users, the messaging gateway is the primary interface. You configure Hermes once, run hermes gateway start, and the agent is available wherever you already communicate. Telegram and Discord are the most common entry points based on the density of platform-specific issues in the GitHub tracker. The platform list as of v0.12.0 also includes WeChat, WeCom, Feishu/Lark, iMessage, QQBot, Tencent Yuanbao, and Microsoft Teams via plugin.
The gateway maintains per-session state. Each platform instance can be isolated or shared depending on configuration. Prompt caching was added in v0.4.0, preserving Anthropic prompt cache across turns within a session, a latency and cost optimization that matters when sessions run long. v0.9.0 added a local web dashboard for users who prefer not to interact through a terminal or messaging app.
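Per-session state keyed by platform and conversation can be sketched in a few lines. The `Gateway` class and its `shared` flag here are hypothetical stand-ins for the real implementation, which supports richer configuration than one boolean.

```python
class Gateway:
    """Toy model of per-session gateway state: each (platform, chat) pair
    gets its own message history unless sessions are configured as shared."""

    def __init__(self, shared: bool = False):
        self.shared = shared
        self._sessions: dict = {}

    def session_for(self, platform: str, chat_id: str) -> list:
        # Shared mode collapses all traffic into one session; otherwise a
        # Telegram chat and a Discord channel are isolated from each other.
        key = "shared" if self.shared else (platform, chat_id)
        return self._sessions.setdefault(key, [])
```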
Terminal, messaging gateway, and web dashboard are three deployment models for the same underlying agent. A developer might use the terminal for interactive coding sessions, the gateway for background tasks and notifications, and the web dashboard for configuration. Memory, skills, and session history are shared across all three.
Self-Improvement as a Feature, Not a Claim
"The agent that grows with you" is the project's tagline. What that means mechanically is worth spelling out, because the phrase reads like marketing copy for something vague.
The review fork runs after each conversation turn. A background process evaluates what happened and updates skills or memory. The fork is restricted to memory and skills tools only; it cannot call arbitrary tools or produce side effects outside those two systems. The v0.12.0 release notes describe the upgrade in detail: the fork now evaluates against an explicit rubric (class-first, rather than free-form), prefers updating existing skills over creating new ones, and properly inherits the parent's live provider credentials. Known constraints, known failure modes. Not a black-box learning process.
The Curator, introduced in v0.12.0, runs on the gateway's cron scheduler on a seven-day cycle. It grades the full skill library, consolidates related skills, and prunes skills that are no longer useful. It writes per-run reports to logs/curator/. Defense-in-depth gates protect bundled and hub skills from being modified by the Curator. The hermes curator status command shows skill usage rankings, most-used and least-used, giving users visibility into what the system has accumulated.
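A pruning pass of this kind is simple to sketch. Everything below is an assumption (the field names, the usage threshold); only the behavior taken from the release notes is fixed: rank skills by usage, prune unused user skills, and never touch bundled ones.

```python
def curate(skills: dict[str, dict], min_uses: int = 1) -> dict:
    """Toy Curator pass. skills maps name -> {"uses": int, "bundled": bool}.
    Returns a usage ranking of surviving skills plus the pruned list."""
    kept, pruned = [], []
    for name, meta in skills.items():
        # Bundled (and hub) skills are protected, mirroring the
        # defense-in-depth gates described in the v0.12.0 notes.
        if meta["bundled"] or meta["uses"] >= min_uses:
            kept.append(name)
        else:
            pruned.append(name)
    ranking = sorted(kept, key=lambda n: skills[n]["uses"], reverse=True)
    return {"ranking": ranking, "pruned": pruned}
```

The real Curator also consolidates overlapping skills, which is a generation task rather than bookkeeping and is why it runs as a full agent instead of a script like this.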
No weights are updated. The "learning" is natural language generation (writing and revising Markdown files) combined with structured evaluation (the review fork's rubric). But it produces a measurable behavioral difference over time: an agent that has been running for weeks on a particular user's workflow will have accumulated skills specific to that workflow, and will behave differently than a fresh installation. That's what "grows with you" means.
Where Things Stand
At v0.12.0 in April 2026, the test suite covers ~700 files, 19 messaging platforms are supported, and the inference provider list exceeds 20 integrations. The GitHub issue tracker has 7,000+ open issues, which at this contributor and user scale reflects engagement, not dysfunction.
The open questions aren't about whether it works. They're about where the self-improvement thesis goes. The Curator is the most sophisticated piece of the autonomous learning system to date. The review fork is refined with each release. Whether the combination produces agents that are observably better at complex long-horizon tasks over weeks of use is an empirical question, and the project's trajectory generation infrastructure is presumably how Nous plans to answer it.
The MIT license and the explicit design for training data generation point the same direction: Nous Research is betting that running agents in the open, at scale, across real user workflows, produces something useful for both users and the model training pipeline. Hermes Agent is the interface for both bets.
Sources
- NousResearch/hermes-agent README (fetched 2026-04-30)
- NousResearch/hermes-agent AGENTS.md (fetched 2026-04-30): architecture overview, file dependency chain, AIAgent class documentation
- RELEASE_v0.2.0.md — v0.2.0 release notes, March 12, 2026: 216 PRs, 63 contributors, first public release
- RELEASE_v0.3.0.md — v0.3.0 release notes, March 17, 2026: streaming, plugins, native Anthropic provider, Honcho integration
- RELEASE_v0.4.0.md — v0.4.0 release notes, March 23, 2026: API server, platform expansion, MCP OAuth 2.1
- RELEASE_v0.8.0.md — v0.8.0 release notes, April 8, 2026: background notifications, live model switching, security hardening
- RELEASE_v0.11.0.md — v0.11.0 release notes, April 23, 2026: React/Ink TUI, transport abstraction, Bedrock, QQBot
- RELEASE_v0.12.0.md — v0.12.0 release notes, April 30, 2026: Curator agent, self-improvement loop upgrade, performance work
- GitHub contributors API — contributor rankings and profiles
- Nous Research organization page — model and project context
- agentskills.io — skills hub referenced in README as open standard for skill sharing
- Plastic Labs / Honcho — memory backend integrated in v0.2.0/v0.3.0