The Self-Improving Skill System: How Hermes Builds and Refines Its Own Procedures
How Hermes discovers, loads, preprocesses, invokes, and tracks its own skills across the full lifecycle
What you will learn
- How a skill is represented on disk: the YAML frontmatter schema inside `SKILL.md` and what fields drive behavior at runtime
- How `scan_skill_commands()` walks the filesystem to build the `/command` map that every user invocation resolves against
- How `iter_skill_index_files()` crawls local and external skill directories with a sorted, deduplication-safe walk
- How a user message with a `/command` token is resolved to a skill, loaded, and assembled into a formatted prompt payload
- What preprocessing steps run before skill content enters the model context: template variable substitution and optional inline shell expansion
- How `bump_use()` and the `.usage.json` sidecar record usage events and drive the Curator's lifecycle transitions from active to stale to archived
- How `skills_list()` exposes skills back to the user and agent via progressive disclosure: names and descriptions only, with `skill_view()` for full content
Prerequisites
- Familiarity with Python type hints and `pathlib`
- Basic understanding of YAML front matter (as used in Jekyll or similar)
- No prior Hermes knowledge required
What a Skill Is
agent/skill_utils.py:34
The YAML frontmatter schema and the lazy YAML loader that parses it
A Hermes skill is a SKILL.md file in its own subdirectory under ~/.hermes/skills/. The YAML frontmatter encodes the skill's identity and runtime contract: name sets the human-readable label, description populates the /skills list summary, platform restricts OS compatibility, and metadata.hermes.config declares variables that must be resolved before the skill runs.
parse_frontmatter() reads that contract with a lazy-imported YAML parser (CSafeLoader when available, SafeLoader as fallback), avoiding a full PyYAML import at startup. If the YAML is malformed, a line-by-line key:value splitter rescues the record rather than dropping it silently. The (frontmatter_dict, body) tuple it returns is consumed by every downstream function from platform matching to config injection.
Every skill's behavior is encoded in its SKILL.md frontmatter. parse_frontmatter() is the single shared reader, with a rescue path that keeps malformed skills alive rather than dropping them.
---
def yaml_load(content: str):
    """Parse YAML with lazy import and CSafeLoader preference."""
    global _yaml_load_fn
    if _yaml_load_fn is None:
        import yaml

        loader = getattr(yaml, "CSafeLoader", None) or yaml.SafeLoader

        def _load(value: str):
            return yaml.load(value, Loader=loader)

        _yaml_load_fn = _load
    return _yaml_load_fn(content)


# ── Frontmatter parsing ──────────────────────────────────────────────────
def parse_frontmatter(content: str) -> Tuple[Dict[str, Any], str]:
    """Parse YAML frontmatter from a markdown string.

    Uses yaml with CSafeLoader for full YAML support (nested metadata, lists)
    with a fallback to simple key:value splitting for robustness.

    Returns:
        (frontmatter_dict, remaining_body)
    """
    frontmatter: Dict[str, Any] = {}
    body = content
    if not content.startswith("---"):
        return frontmatter, body
    end_match = re.search(r"\n---\s*\n", content[3:])
    if not end_match:
        return frontmatter, body
    yaml_content = content[3 : end_match.start() + 3]
    body = content[end_match.end() + 3 :]
    try:
        parsed = yaml_load(yaml_content)
        if isinstance(parsed, dict):
            frontmatter = parsed
    except Exception:
        # Fallback: simple key:value parsing for malformed YAML
        for line in yaml_content.strip().split("\n"):
            if ":" not in line:
                continue
            key, value = line.split(":", 1)
            frontmatter[key.strip()] = value.strip()
    return frontmatter, body
# ── Platform matching ─────────────────────────────────────────────────────
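To make the rescue path concrete, here is a minimal standard-library sketch of the delimiter slicing and the line-by-line fallback. The sample SKILL.md text and the helper names split_frontmatter and fallback_parse are illustrative assumptions, not names from the Hermes codebase; the real parse_frontmatter() tries the YAML loader first and only falls back on an exception.

```python
import re
from typing import Dict, Tuple

# Hypothetical SKILL.md content, for illustration only.
SAMPLE_SKILL_MD = """---
name: gif search
description: Find and send a GIF that matches a phrase
---
# GIF Search

Search for a GIF and return the best match.
"""


def split_frontmatter(content: str) -> Tuple[str, str]:
    """Return (raw_yaml, body); raw_yaml is empty when no frontmatter exists."""
    if not content.startswith("---"):
        return "", content
    end_match = re.search(r"\n---\s*\n", content[3:])
    if not end_match:
        return "", content
    # Match offsets are relative to content[3:], hence the +3 corrections.
    raw_yaml = content[3 : end_match.start() + 3]
    body = content[end_match.end() + 3 :]
    return raw_yaml, body


def fallback_parse(raw_yaml: str) -> Dict[str, str]:
    """Line-by-line key:value rescue; produces flat string values only."""
    result: Dict[str, str] = {}
    for line in raw_yaml.strip().split("\n"):
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        result[key.strip()] = value.strip()
    return result


raw_yaml, body = split_frontmatter(SAMPLE_SKILL_MD)
frontmatter = fallback_parse(raw_yaml)
print(frontmatter["name"])
print(body.splitlines()[0])
```

Because the fallback splits each line on the first colon, it keeps only flat string values. A nested block such as metadata.hermes.config would not survive it intact, which is why the full YAML loader is tried first.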
Skill Creation from Files
agent/skill_commands.py:215
How scan_skill_commands() turns SKILL.md files on disk into the slash-command registry
Skills are not registered by code; they are discovered by dropping a SKILL.md file into the skills directory. scan_skill_commands() turns those files into a live command registry at startup and again on /reload-skills. It scans ~/.hermes/skills/ first, then any skills.external_dirs from config.yaml, with seen_names ensuring local skills win on name collisions.
For each file found, the function applies three filters before adding the skill:
- Platform compatibility (macOS, Linux, or Windows)
- The user's disabled list from config
- Deduplication against already-seen names
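If the frontmatter has no description, the scanner falls back to the first non-heading prose line in the body, truncated to 80 characters. The skill name is then slugified (lowercased, spaces and underscores turned into hyphens, invalid characters stripped, repeated hyphens collapsed) to produce a /command key that works in the CLI and can be mapped to a valid Telegram bot command name. A name that slugifies to an empty string is skipped.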
Skill registration is a filesystem event. Dropping a SKILL.md and running /reload-skills is enough; scan_skill_commands() handles discovery, filtering, and slugification.
---
def scan_skill_commands() -> Dict[str, Dict[str, Any]]:
    """Scan ~/.hermes/skills/ and return a mapping of /command -> skill info.

    Returns:
        Dict mapping "/skill-name" to {name, description, skill_md_path, skill_dir}.
    """
    global _skill_commands
    _skill_commands = {}
    try:
        from tools.skills_tool import SKILLS_DIR, _parse_frontmatter, skill_matches_platform, _get_disabled_skill_names
        from agent.skill_utils import get_external_skills_dirs, iter_skill_index_files
        disabled = _get_disabled_skill_names()
        seen_names: set = set()

        # Scan local dir first, then external dirs
        dirs_to_scan = []
        if SKILLS_DIR.exists():
            dirs_to_scan.append(SKILLS_DIR)
        dirs_to_scan.extend(get_external_skills_dirs())

        for scan_dir in dirs_to_scan:
            for skill_md in iter_skill_index_files(scan_dir, "SKILL.md"):
                if any(part in ('.git', '.github', '.hub', '.archive') for part in skill_md.parts):
                    continue
                try:
                    content = skill_md.read_text(encoding='utf-8')
                    frontmatter, body = _parse_frontmatter(content)
                    # Skip skills incompatible with the current OS platform
                    if not skill_matches_platform(frontmatter):
                        continue
                    name = frontmatter.get('name', skill_md.parent.name)
                    if name in seen_names:
                        continue
                    # Respect user's disabled skills config
                    if name in disabled:
                        continue
                    description = frontmatter.get('description', '')
                    if not description:
                        for line in body.strip().split('\n'):
                            line = line.strip()
                            if line and not line.startswith('#'):
                                description = line[:80]
                                break
                    seen_names.add(name)
                    # Normalize to hyphen-separated slug, stripping
                    # non-alnum chars (e.g. +, /) to avoid invalid
                    # Telegram command names downstream.
                    cmd_name = name.lower().replace(' ', '-').replace('_', '-')
                    cmd_name = _SKILL_INVALID_CHARS.sub('', cmd_name)
                    cmd_name = _SKILL_MULTI_HYPHEN.sub('-', cmd_name).strip('-')
                    if not cmd_name:
                        continue
                    _skill_commands[f"/{cmd_name}"] = {
                        "name": name,
                        "description": description or f"Invoke the {name} skill",
                        "skill_md_path": str(skill_md),
                        "skill_dir": str(skill_md.parent),
                    }
                except Exception:
                    continue
    except Exception:
        pass
    return _skill_commands
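The slugification and description-fallback steps can be tried in isolation. The sketch below is an approximation: the excerpt references _SKILL_INVALID_CHARS and _SKILL_MULTI_HYPHEN without showing their definitions, so the two regexes here are assumed patterns, and the function names slugify_skill_name and fallback_description are hypothetical.

```python
import re

# Assumed patterns: the real _SKILL_INVALID_CHARS and _SKILL_MULTI_HYPHEN
# definitions are not shown in the excerpt above.
ASSUMED_INVALID_CHARS = re.compile(r"[^a-z0-9-]")
ASSUMED_MULTI_HYPHEN = re.compile(r"-{2,}")


def slugify_skill_name(name: str) -> str:
    """Mirror the three normalization lines from scan_skill_commands()."""
    cmd = name.lower().replace(" ", "-").replace("_", "-")
    cmd = ASSUMED_INVALID_CHARS.sub("", cmd)
    cmd = ASSUMED_MULTI_HYPHEN.sub("-", cmd).strip("-")
    # An empty slug means the skill would be skipped by the scanner.
    return f"/{cmd}" if cmd else ""


def fallback_description(body: str, limit: int = 80) -> str:
    """First non-empty, non-heading line of the body, truncated to limit."""
    for line in body.strip().split("\n"):
        line = line.strip()
        if line and not line.startswith("#"):
            return line[:limit]
    return ""


for raw in ("GIF Search", "claude_code", "C++ / Rust tips", "+++"):
    print(repr(raw), "->", repr(slugify_skill_name(raw)))
print(fallback_description("# Title\n\nFirst prose line.\nSecond line."))
```

Under these assumed patterns, a name made only of stripped characters yields an empty slug, which corresponds to the `if not cmd_name: continue` guard in the scanner.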
Skill Indexing
agent/skill_utils.py:440
How iter_skill_index_files() walks skill directories to build the index
iter_skill_index_files() is the shared tree walker for the entire skill system. Both scan_skill_commands() and _find_all_skills() delegate to it. It uses os.walk with followlinks=True to traverse symlinked skill directories, and mutates dirs[:] in place to prune excluded paths before recursion. The excluded set (.git, .github, .hub, .archive) covers version control metadata, hub bundles, and the archived-skill holding area.
The two-phase approach (collect, then sort and yield) guarantees deterministic file order regardless of the OS's directory-entry ordering. Without a stable sort, which skill wins a name collision across directories would depend on filesystem ordering and be non-reproducible.
skills/index-cache/ holds pre-built JSON snapshots from hub providers. Those feed the hub installer; iter_skill_index_files only sees SKILL.md files already extracted onto disk.
Sort order is a correctness guarantee, not a convenience. iter_skill_index_files() sorts before yielding so the deduplication logic in callers always resolves collisions the same way.
---
def iter_skill_index_files(skills_dir: Path, filename: str):
    """Walk skills_dir yielding sorted paths matching *filename*.

    Excludes ``.git``, ``.github``, ``.hub``, ``.archive`` directories.
    """
    matches = []
    for root, dirs, files in os.walk(skills_dir, followlinks=True):
        dirs[:] = [d for d in dirs if d not in EXCLUDED_SKILL_DIRS]
        if filename in files:
            matches.append(Path(root) / filename)
    for path in sorted(matches, key=lambda p: str(p.relative_to(skills_dir))):
        yield path


# ── Namespace helpers for plugin-provided skills ───────────────────────────
_NAMESPACE_RE = re.compile(r"^[a-zA-Z0-9_-]+$")
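The pruning and the deterministic ordering are easy to observe on a throwaway directory tree. This is a self-contained sketch of the same walk; the EXCLUDED set is copied from the docstring above, while the directory names and the demo() helper are made up for illustration.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterator, List

# Copied from the docstring above; the real constant is EXCLUDED_SKILL_DIRS.
EXCLUDED = frozenset({".git", ".github", ".hub", ".archive"})


def iter_index_files(skills_dir: Path, filename: str) -> Iterator[Path]:
    """Collect matches first, then yield them in sorted relative-path order."""
    matches: List[Path] = []
    for root, dirs, files in os.walk(skills_dir, followlinks=True):
        # Assigning to dirs[:] prunes in place, so os.walk never descends
        # into the excluded directories.
        dirs[:] = [d for d in dirs if d not in EXCLUDED]
        if filename in files:
            matches.append(Path(root) / filename)
    yield from sorted(matches, key=lambda p: str(p.relative_to(skills_dir)))


def demo() -> List[str]:
    """Build a temporary skills tree and return the skill dirs that are found."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Created out of alphabetical order on purpose.
        for rel in ("zeta/tool", "alpha/tool", ".archive/old-tool"):
            skill_dir = root / rel
            skill_dir.mkdir(parents=True)
            (skill_dir / "SKILL.md").write_text("---\nname: tool\n---\n", encoding="utf-8")
        return [
            p.relative_to(root).parent.as_posix()
            for p in iter_index_files(root, "SKILL.md")
        ]


found = demo()
print(found)
```

Both surviving skills declare the same name, so a caller that keeps the first one seen would always pick alpha/tool, whatever order the filesystem returns entries in.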
Skill Retrieval at Runtime
agent/skill_commands.py:352
How a user-typed /command is resolved to a skill and assembled into a prompt payload
Retrieval is exact slash-command matching with a single normalization step. When a user types /gif-search or /gif_search, resolve_skill_command_key() collapses underscores to hyphens and does a direct dictionary lookup. The Telegram constraint explains the design: Telegram bot commands cannot contain hyphens, so the same skill arrives as /gif_search from Telegram but is stored under /gif-search internally. Normalization bridges both without a separate registry.
build_skill_invocation_message() is the runtime entry point once a key is resolved. It loads SKILL.md via _load_skill_payload(), then calls bump_use() in a try/except. Usage tracking is best-effort and must never block an invocation. The activation note injected at the top of the assembled message tells the model the user invoked this skill intentionally. The user_instruction field carries any text typed after the command as an inline override the model can weigh against the skill's default behavior.
Skill retrieval is a dictionary lookup, not a vector search. Underscore-to-hyphen normalization is the sole disambiguation step, chosen to reconcile Telegram's naming constraint with the internal slug format.
---
def resolve_skill_command_key(command: str) -> Optional[str]:
    """Resolve a user-typed /command to its canonical skill_cmds key.

    Skills are always stored with hyphens: ``scan_skill_commands`` normalizes
    spaces and underscores to hyphens when building the key. Hyphens and
    underscores are treated interchangeably in user input: this matches
    ``_check_unavailable_skill`` and accommodates Telegram bot-command names
    (which disallow hyphens, so ``/claude-code`` is registered as
    ``/claude_code`` and comes back in the underscored form).

    Returns the matching ``/slug`` key from ``get_skill_commands()`` or
    ``None`` if no match.
    """
    if not command:
        return None
    cmd_key = f"/{command.replace('_', '-')}"
    return cmd_key if cmd_key in get_skill_commands() else None


def build_skill_invocation_message(
    cmd_key: str,
    user_instruction: str = "",
    task_id: str | None = None,
    runtime_note: str = "",
) -> Optional[str]:
    """Build the user message content for a skill slash command invocation.

    Args:
        cmd_key: The command key including leading slash (e.g., "/gif-search").
        user_instruction: Optional text the user typed after the command.

    Returns:
        The formatted message string, or None if the skill wasn't found.
    """
    commands = get_skill_commands()
    skill_info = commands.get(cmd_key)
    if not skill_info:
        return None

    loaded = _load_skill_payload(skill_info["skill_dir"], task_id=task_id)
    if not loaded:
        return f"[Failed to load skill: {skill_info['name']}]"

    loaded_skill, skill_dir, skill_name = loaded

    # Track active usage for Curator lifecycle management (#17782)
    try:
        from tools.skill_usage import bump_use
        bump_use(skill_name)
    except Exception:
        pass  # Non-critical: skill invocation proceeds regardless

    activation_note = (
        f'[IMPORTANT: The user has invoked the "{skill_name}" skill, indicating they want '
        "you to follow its instructions. The full skill content is loaded below.]"
    )
    return _build_skill_message(
        loaded_skill,
        skill_dir,
        activation_note,
        user_instruction=user_instruction,
        runtime_note=runtime_note,
        session_id=task_id,
    )
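As a rough illustration of the lookup, the sketch below resolves both spellings of a command against a small in-memory registry. REGISTRY, resolve, and split_invocation are hypothetical stand-ins: the excerpt does not show how the incoming message is tokenized, so the splitting step is an assumption. Note that resolve_skill_command_key() prepends the slash itself, so it expects the command without one.

```python
from typing import Dict, Optional, Tuple

# Hypothetical registry in the shape scan_skill_commands() returns.
REGISTRY: Dict[str, Dict[str, str]] = {
    "/gif-search": {"name": "gif-search"},
    "/claude-code": {"name": "claude-code"},
}


def resolve(command: str, registry: Dict[str, Dict[str, str]]) -> Optional[str]:
    """Underscore-to-hyphen normalization followed by an exact dict lookup."""
    if not command:
        return None
    key = f"/{command.replace('_', '-')}"
    return key if key in registry else None


def split_invocation(message: str) -> Tuple[str, str]:
    """Assumed tokenization: '/cmd rest' -> ('cmd', 'rest')."""
    head, _, rest = message.strip().partition(" ")
    return head.lstrip("/"), rest.strip()


for text in ("/gif-search funny cats", "/gif_search funny cats", "/unknown"):
    command, instruction = split_invocation(text)
    print(text, "->", resolve(command, REGISTRY), "| instruction:", repr(instruction))
```

The text after the command would travel as user_instruction, while the resolved key selects the skill.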
Skill Preprocessing
agent/skill_preprocessing.py:115
Template variable substitution and optional inline shell expansion before a skill enters the prompt
preprocess_skill_content() is the final transformation before skill markdown reaches the model. It runs two independent passes, each gated by a config flag.
The first pass (template variable substitution, on by default) replaces ${HERMES_SKILL_DIR} and ${HERMES_SESSION_ID} tokens with concrete runtime values. A skill bundling scripts can reference ${HERMES_SKILL_DIR}/scripts/run.sh so the agent invokes it by absolute path without a skill_view() round-trip. Unresolved tokens are left in place, making missing values visible rather than silent.
The second pass (inline shell expansion, off by default) replaces backtick snippets with their stdout. Output is capped at _INLINE_SHELL_MAX_OUTPUT (4000 characters), runs under a configurable timeout (default 10 seconds), and returns a [inline-shell error: ...] marker on failure rather than aborting the load. Because this pass is opt-in, the common case pays no subprocess cost. The function accepts a pre-loaded skills_cfg dict so callers that have already read config can avoid a redundant file read.
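The first pass can be approximated in a few lines. This is a sketch under stated assumptions rather than the body of substitute_template_vars(), which is not shown here: the regex and the substitute_vars name are hypothetical, and only the two token names mentioned above are handled.

```python
import re
from pathlib import PurePosixPath
from typing import Dict, Optional

# Assumed token syntax: ${HERMES_...}, as described in the text above.
_TOKEN_RE = re.compile(r"\$\{(HERMES_[A-Z_]+)\}")


def substitute_vars(
    content: str,
    skill_dir: Optional[PurePosixPath],
    session_id: Optional[str],
) -> str:
    """Replace known tokens; leave unknown or unresolved tokens untouched."""
    values: Dict[str, Optional[str]] = {
        "HERMES_SKILL_DIR": str(skill_dir) if skill_dir is not None else None,
        "HERMES_SESSION_ID": session_id,
    }

    def _replace(match: "re.Match[str]") -> str:
        value = values.get(match.group(1))
        # Keeping the original token makes a missing value visible in the prompt.
        return value if value is not None else match.group(0)

    return _TOKEN_RE.sub(_replace, content)


text = "Run ${HERMES_SKILL_DIR}/scripts/run.sh for session ${HERMES_SESSION_ID}."
print(substitute_vars(text, PurePosixPath("/home/u/.hermes/skills/demo"), None))
```

With no session ID available, the second token stays in the output verbatim, which is the "visible rather than silent" behavior described above.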
Preprocessing is two config-gated passes: template substitution is on by default, and inline shell expansion is opt-in. Both are designed so failures surface visibly. Template substitution gives skills portable path references; inline shell expansion lets them embed live runtime data.
---
def preprocess_skill_content(
    content: str,
    skill_dir: Path | None,
    session_id: str | None = None,
    skills_cfg: dict | None = None,
) -> str:
    """Apply configured SKILL.md template and inline-shell preprocessing."""
    if not content:
        return content
    cfg = skills_cfg if isinstance(skills_cfg, dict) else load_skills_config()
    if cfg.get("template_vars", True):
        content = substitute_template_vars(content, skill_dir, session_id)
    if cfg.get("inline_shell", False):
        timeout = int(cfg.get("inline_shell_timeout", 10) or 10)
        content = expand_inline_shell(content, skill_dir, timeout)
    return content
Skill Refinement and Lifecycle Tracking
tools/skill_usage.py:214
How the usage sidecar records invocations and drives Curator lifecycle transitions
Hermes does not rewrite skill content on invocation; that is the Curator's job on a separate cadence. At invocation time, bump_use() records the event: it increments use_count and writes last_used_at to a JSON sidecar at ~/.hermes/skills/.usage.json. The sidecar is deliberately separate from SKILL.md to avoid merge conflicts for hub-fetched skills and to allow safe reads without touching skill content.
_mutate() is the shared write primitive and enforces one invariant: bundled and hub-installed skills are never recorded. The sidecar is exclusively for agent-created skills because the Curator only drives lifecycle transitions on skills the agent itself generated.
The lifecycle state machine has three states:
- active: the default state
- stale: unused beyond a configurable threshold
- archived: moved to the .archive/ subdirectory
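The Curator's own transition logic is not shown in this tour, so the following is only a sketch of how the sidecar timestamps could drive the three states. The day thresholds, the string values of the state constants, the treatment of pinned records, and the next_state name are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict

# Assumed string values; the excerpt only shows the name STATE_ACTIVE.
STATE_ACTIVE, STATE_STALE, STATE_ARCHIVED = "active", "stale", "archived"


def next_state(
    record: Dict[str, Any],
    now: datetime,
    stale_after_days: int = 30,      # hypothetical threshold
    archive_after_days: int = 90,    # hypothetical threshold
) -> str:
    """Classify a usage record by how long the skill has been idle."""
    if record.get("pinned"):
        # Assumption: pinned skills keep whatever state they already have.
        return record.get("state", STATE_ACTIVE)
    stamp = record.get("last_used_at") or record.get("created_at")
    if not stamp:
        return STATE_ACTIVE
    idle = now - datetime.fromisoformat(stamp)
    if idle >= timedelta(days=archive_after_days):
        return STATE_ARCHIVED
    if idle >= timedelta(days=stale_after_days):
        return STATE_STALE
    return STATE_ACTIVE


record = {"last_used_at": "2024-01-01T00:00:00+00:00", "pinned": False}
for day in (datetime(2024, 1, 10, tzinfo=timezone.utc),
            datetime(2024, 3, 1, tzinfo=timezone.utc),
            datetime(2024, 6, 1, tzinfo=timezone.utc)):
    print(day.date(), next_state(record, day))
```

The point of the sketch is the data flow: bump_use() only refreshes last_used_at, and a separate pass compares that timestamp against thresholds.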
reload_skills() performs a before/after diff against the in-memory _skill_commands map and returns {added, removed, unchanged, total}, letting the model learn what changed mid-session without a prompt-cache reset.
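A before/after diff of this kind reduces to set arithmetic on the command keys. The sketch below assumes that added, removed, and unchanged are lists of command keys and that total counts the commands present after the reload; the excerpt names the four fields but not their exact types, and diff_commands is a hypothetical helper.

```python
from typing import Any, Dict


def diff_commands(before: Dict[str, Any], after: Dict[str, Any]) -> Dict[str, Any]:
    """Compare two /command maps by key and summarize what changed."""
    before_keys, after_keys = set(before), set(after)
    return {
        "added": sorted(after_keys - before_keys),
        "removed": sorted(before_keys - after_keys),
        "unchanged": sorted(before_keys & after_keys),
        "total": len(after_keys),  # assumption: counts the post-reload map
    }


before = {"/gif-search": {}, "/old-skill": {}}
after = {"/gif-search": {}, "/new-skill": {}}
summary = diff_commands(before, after)
print(summary)
```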
Skill refinement is lifecycle management, not automatic rewriting. bump_use() feeds the Curator timestamps; the Curator decides when to transition active to stale to archived, and only ever touches agent-created skills.
---
def _empty_record() -> Dict[str, Any]:
    return {
        "use_count": 0,
        "view_count": 0,
        "last_used_at": None,
        "last_viewed_at": None,
        "patch_count": 0,
        "last_patched_at": None,
        "created_at": _now_iso(),
        "state": STATE_ACTIVE,
        "pinned": False,
        "archived_at": None,
    }


def load_usage() -> Dict[str, Dict[str, Any]]:
    """Read the entire .usage.json map. Returns empty dict on missing/corrupt."""
    path = _usage_file()
    if not path.exists():
        return {}
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError) as e:
        logger.debug("Failed to read %s: %s", path, e)
        return {}
    if not isinstance(data, dict):
        return {}
    # Defensive: coerce any non-dict values to a fresh empty record
    clean: Dict[str, Dict[str, Any]] = {}
    for k, v in data.items():
        if isinstance(v, dict):
            clean[str(k)] = v
    return clean


def save_usage(data: Dict[str, Dict[str, Any]]) -> None:
    """Write the usage map atomically. Best-effort: errors are logged, not raised."""
    path = _usage_file()
    try:
        path.parent.mkdir(parents=True, exist_ok=True)
        fd, tmp_path = tempfile.mkstemp(
            dir=str(path.parent), prefix=".usage_", suffix=".tmp"
        )
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                json.dump(data, f, indent=2, sort_keys=True, ensure_ascii=False)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp_path, path)
        except BaseException:
            try:
                os.unlink(tmp_path)
            except OSError:
                pass
            raise
    except Exception as e:
        logger.debug("Failed to write %s: %s", path, e, exc_info=True)


def get_record(skill_name: str) -> Dict[str, Any]:
    """Return the record for *skill_name*, creating a fresh one if missing."""
    data = load_usage()
    rec = data.get(skill_name)
    if not isinstance(rec, dict):
        return _empty_record()
    # Backfill any missing keys so callers don't need to handle old files
    base = _empty_record()
    for k, v in base.items():
        rec.setdefault(k, v)
    return rec


def _mutate(skill_name: str, mutator) -> None:
    """Load, apply *mutator(record)* in place, save. Best-effort.

    Bundled and hub-installed skills are NEVER recorded in the sidecar.
    This keeps .usage.json focused on agent-created skills (the only ones
    the curator considers) and prevents stale counters from hanging around
    for upstream-managed skills.
    """
    if not skill_name:
        return
    try:
        if not is_agent_created(skill_name):
            return
        data = load_usage()
        rec = data.get(skill_name)
        if not isinstance(rec, dict):
            rec = _empty_record()
        mutator(rec)
        data[skill_name] = rec
        save_usage(data)
    except Exception as e:
        logger.debug("skill_usage._mutate(%s) failed: %s", skill_name, e, exc_info=True)


# ---------------------------------------------------------------------------
# Public counter-bump helpers
# ---------------------------------------------------------------------------
def bump_view(skill_name: str) -> None:
    """Bump view_count and last_viewed_at. Called from skill_view()."""
    def _apply(rec: Dict[str, Any]) -> None:
        rec["view_count"] = int(rec.get("view_count") or 0) + 1
        rec["last_viewed_at"] = _now_iso()
    _mutate(skill_name, _apply)


def bump_use(skill_name: str) -> None:
    """Bump use_count and last_used_at. Called when a skill is actively used
    (e.g. loaded into the prompt path or referenced from an assistant turn)."""
    def _apply(rec: Dict[str, Any]) -> None:
        rec["use_count"] = int(rec.get("use_count") or 0) + 1
        rec["last_used_at"] = _now_iso()
    _mutate(skill_name, _apply)
Skill Discoverability
tools/skills_tool.py:674
How skills_list() exposes skills to the user and agent with progressive disclosure
skills_list() is tier one of a two-tier discoverability system. It returns only {name, description, category} per skill and withholds full content, tags, and linked files. The hint field in the response points the agent toward skill_view(name) for the full payload. The split is a token-efficiency decision: a user with 30 skills would burn thousands of tokens loading full content on every listing.
_find_all_skills() delegates to iter_skill_index_files() for the walk (see Skill Indexing) and applies the same platform, disabled, and deduplication filters as scan_skill_commands() (see Skill Creation from Files). It derives category from the directory path relative to the skills root. The repository ships with 20+ category subdirectories (apple, creative, data-science, mlops, red-teaming, and others) that map directly to this field.
On first call with no skills present, skills_list() creates the skills directory and returns an empty response rather than an error, making the tool safe to call unconditionally on a fresh install.
Progressive disclosure is an explicit token-efficiency decision, not an oversight. skills_list() returns only name, description, and category, with a hint that teaches the model to call skill_view() when it needs full content.
---
def skills_list(category: str = None, task_id: str = None) -> str:
    """
    List all available skills (progressive disclosure tier 1 - minimal metadata).

    Returns only name + description to minimize token usage. Use skill_view() to
    load full content, tags, related files, etc.

    Args:
        category: Optional category filter (e.g., "mlops")
        task_id: Optional task identifier used to probe the active backend

    Returns:
        JSON string with minimal skill info: name, description, category
    """
    try:
        if not SKILLS_DIR.exists():
            SKILLS_DIR.mkdir(parents=True, exist_ok=True)
            return json.dumps(
                {
                    "success": True,
                    "skills": [],
                    "categories": [],
                    "message": f"No skills found. Skills directory created at {display_hermes_home()}/skills/",
                },
                ensure_ascii=False,
            )

        # Find all skills
        all_skills = _find_all_skills()
        if not all_skills:
            return json.dumps(
                {
                    "success": True,
                    "skills": [],
                    "categories": [],
                    "message": "No skills found in skills/ directory.",
                },
                ensure_ascii=False,
            )

        # Filter by category if specified
        if category:
            all_skills = [s for s in all_skills if s.get("category") == category]

        # Sort by category then name
        all_skills = _sort_skills(all_skills)

        # Extract unique categories
        categories = sorted(
            set(s.get("category") for s in all_skills if s.get("category"))
        )

        return json.dumps(
            {
                "success": True,
                "skills": all_skills,
                "categories": categories,
                "count": len(all_skills),
                "hint": "Use skill_view(name) to see full content, tags, and linked files",
            },
            ensure_ascii=False,
        )
    except Exception as e:
        return tool_error(str(e), success=False)
# ── Plugin skill serving ──────────────────────────────────────────────────
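The token saving from the two-tier split can be seen with a toy in-memory catalog. The records, list_minimal, and view_full below are hypothetical stand-ins for _find_all_skills(), skills_list(), and skill_view(); the response shape follows the excerpt above, but the data and the size comparison are purely illustrative.

```python
import json
from typing import Any, Dict, List, Optional

# Hypothetical catalog; the real tool builds this from SKILL.md files on disk.
FULL_SKILLS: List[Dict[str, Any]] = [
    {
        "name": "train-model",
        "description": "Launch a training run",
        "category": "mlops",
        "tags": ["gpu"],
        "content": "# Train Model\n" + "Detailed procedure...\n" * 50,
    },
    {
        "name": "gif-search",
        "description": "Find a GIF for a phrase",
        "category": "creative",
        "tags": ["media", "fun"],
        "content": "# GIF Search\n" + "Step-by-step instructions...\n" * 50,
    },
]


def list_minimal(skills: List[Dict[str, Any]], category: Optional[str] = None) -> str:
    """Tier one: name, description, and category only."""
    rows = [
        {"name": s["name"], "description": s["description"], "category": s.get("category")}
        for s in skills
        if category is None or s.get("category") == category
    ]
    rows.sort(key=lambda r: (r["category"] or "", r["name"]))
    categories = sorted({r["category"] for r in rows if r["category"]})
    payload = {
        "success": True,
        "skills": rows,
        "categories": categories,
        "count": len(rows),
        "hint": "Use skill_view(name) to see full content, tags, and linked files",
    }
    return json.dumps(payload, ensure_ascii=False)


def view_full(skills: List[Dict[str, Any]], name: str) -> Optional[Dict[str, Any]]:
    """Tier two: the full record for one skill, fetched on demand."""
    return next((s for s in skills if s["name"] == name), None)


listing = json.loads(list_minimal(FULL_SKILLS))
print([row["name"] for row in listing["skills"]])
print(len(list_minimal(FULL_SKILLS)), "chars listed vs", len(json.dumps(FULL_SKILLS)), "chars in full")
```

Even with two skills the listing is a fraction of the full payload, and the gap widens with every skill added, since the listing grows by one short row per skill rather than one whole document.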