Hermes Agent Python NousResearch/hermes-agent

Hermes Memory Architecture: How an Agent Remembers You Across Sessions

How a fact travels from a conversation into long-term storage and back into a future prompt

7 stops ~30 min Verified 2026-04-30

What you will learn

Why Hermes separates the memory contract (`MemoryProvider`) from its implementations — and how new backends plug in without touching the agent core
How `MemoryManager` enforces a one-external-provider-at-a-time rule and routes reads and writes across all registered backends in a single call
How the Curator distills accumulated skill data into a curated library — using a forked `AIAgent` as its own reviewer
How `InsightsEngine` aggregates raw session rows into patterns: cost breakdowns, streaks, skill rankings, and activity heat maps
How the holographic memory backend indexes every stored fact with SQLite FTS5 for sub-millisecond keyword recall
How the Honcho plugin models the user dialectically — calling `peer.chat()` against Honcho's AI-native memory to synthesize a representation before each turn
How `prefetch_all` assembles recalled context at session start so the model's first response already knows who it is talking to

Prerequisites

Comfortable reading Python with type annotations
Basic understanding of abstract base classes (`ABC`) and the Template Method pattern
Familiarity with SQLite is helpful but not required

1 / 7

The Memory Provider Interface

agent/memory_provider.py:43

The abstract base class every memory backend must implement

MemoryProvider defines a seven-method lifecycle: initialize, system_prompt_block, prefetch, sync_turn, get_tool_schemas, handle_tool_call, and shutdown. Three are @abstractmethod, so an incomplete backend fails at class definition time, surfacing the error at import rather than mid-session.

initialize fires once at startup to create tables and open connections. prefetch fires before every API call and must return immediately, serving from a background-populated cache. A slow remote backend never stalls the first response.

The optional hooks (on_turn_start, on_session_end, on_session_switch, on_pre_compress, on_delegation, on_memory_write) all default to no-ops. A backend overrides only the hooks it needs, so adding a new hook to the interface never breaks existing providers.

Key takeaway

MemoryProvider is a lifecycle definition, not just an API. The required/optional split between abstract methods and default no-ops lets a minimal file-store backend and a complex dialectic engine coexist under the same interface without either implementing what it doesn't need. ---

class MemoryProvider(ABC):
    """Abstract base class for memory providers."""

    @property
    @abstractmethod
    def name(self) -> str:
        """Short identifier for this provider (e.g. 'builtin', 'honcho', 'hindsight')."""

    # -- Core lifecycle (implement these) ------------------------------------

    @abstractmethod
    def is_available(self) -> bool:
        """Return True if this provider is configured, has credentials, and is ready.

        Called during agent init to decide whether to activate the provider.
        Should not make network calls — just check config and installed deps.
        """

    @abstractmethod
    def initialize(self, session_id: str, **kwargs) -> None:
        """Initialize for a session.

        Called once at agent startup. May create resources (banks, tables),
        establish connections, start background threads, etc.

        kwargs always include:
          - hermes_home (str): The active HERMES_HOME directory path. Use this
            for profile-scoped storage instead of hardcoding ``~/.hermes``.
          - platform (str): "cli", "telegram", "discord", "cron", etc.

        kwargs may also include:
          - agent_context (str): "primary", "subagent", "cron", or "flush".
            Providers should skip writes for non-primary contexts (cron system
            prompts would corrupt user representations).
          - agent_identity (str): Profile name (e.g. "coder"). Use for
            per-profile provider identity scoping.
          - agent_workspace (str): Shared workspace name (e.g. "hermes").
          - parent_session_id (str): For subagents, the parent's session_id.
          - user_id (str): Platform user identifier (gateway sessions).
        """

    def system_prompt_block(self) -> str:
        """Return text to include in the system prompt.

        Called during system prompt assembly. Return empty string to skip.
        This is for STATIC provider info (instructions, status). Prefetched
        recall context is injected separately via prefetch().
        """
        return ""

    def prefetch(self, query: str, *, session_id: str = "") -> str:
        """Recall relevant context for the upcoming turn.

        Called before each API call. Return formatted text to inject as
        context, or empty string if nothing relevant. Implementations
        should be fast — use background threads for the actual recall
        and return cached results here.

        session_id is provided for providers serving concurrent sessions
        (gateway group chats, cached agents). Providers that don't need
        per-session scoping can ignore it.
        """
        return ""

    def queue_prefetch(self, query: str, *, session_id: str = "") -> None:
        """Queue a background recall for the NEXT turn.

        Called after each turn completes. The result will be consumed
        by prefetch() on the next turn. Default is no-op — providers
        that do background prefetching should override this.
        """

    def sync_turn(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
        """Persist a completed turn to the backend.

        Called after each turn. Should be non-blocking — queue for
        background processing if the backend has latency.
        """

    @abstractmethod
    def get_tool_schemas(self) -> List[Dict[str, Any]]:
        """Return tool schemas this provider exposes.

        Each schema follows the OpenAI function calling format:
        {"name": "...", "description": "...", "parameters": {...}}

        Return empty list if this provider has no tools (context-only).
        """

    def handle_tool_call(self, tool_name: str, args: Dict[str, Any], **kwargs) -> str:
        """Handle a tool call for one of this provider's tools.

        Must return a JSON string (the tool result).
        Only called for tool names returned by get_tool_schemas().
        """
        raise NotImplementedError(f"Provider {self.name} does not handle tool {tool_name}")

    def shutdown(self) -> None:
        """Clean shutdown — flush queues, close connections."""

2 / 7

Memory Manager Orchestration

agent/memory_manager.py:192

How the manager registers backends, enforces the one-external-provider rule, and fans reads and writes across all providers

MemoryManager is the single wiring point in run_agent.py. Its call sequence is: register providers, build the system prompt, prefetch before each turn, sync after. The caller never needs to know which backends are active.

The one-external-provider rule lives in a single boolean flag _has_external. The builtin provider (MEMORY.md / USER.md) is always first and exempt from the limit. A second external provider logs a warning and returns early (no exception), so a misconfigured plugin cannot crash agent startup.

Tool routing is built at registration time: add_provider walks get_tool_schemas() and populates a flat _tool_to_provider dict. At call time, handle_tool_call does a single dict lookup. Name conflicts log a warning and the first registrant wins.

Every provider call in prefetch_all and sync_all is wrapped in try/except. The severity is deliberate: prefetch failures log at DEBUG (transient network issues are expected), while sync failures log at WARNING (data loss is possible).

Key takeaway

MemoryManager trades expressiveness for reliability. The one-external-provider limit prevents schema conflicts, and the per-call exception wrappers ensure one broken backend never stops the others from reading or writing. ---

class MemoryManager:
    """Orchestrates the built-in provider plus at most one external provider.

    The builtin provider is always first. Only one non-builtin (external)
    provider is allowed.  Failures in one provider never block the other.
    """

    def __init__(self) -> None:
        self._providers: List[MemoryProvider] = []
        self._tool_to_provider: Dict[str, MemoryProvider] = {}
        self._has_external: bool = False  # True once a non-builtin provider is added

    # -- Registration --------------------------------------------------------

    def add_provider(self, provider: MemoryProvider) -> None:
        """Register a memory provider.

        Built-in provider (name ``"builtin"``) is always accepted.
        Only **one** external (non-builtin) provider is allowed — a second
        attempt is rejected with a warning.
        """
        is_builtin = provider.name == "builtin"

        if not is_builtin:
            if self._has_external:
                existing = next(
                    (p.name for p in self._providers if p.name != "builtin"), "unknown"
                )
                logger.warning(
                    "Rejected memory provider '%s' — external provider '%s' is "
                    "already registered. Only one external memory provider is "
                    "allowed at a time. Configure which one via memory.provider "
                    "in config.yaml.",
                    provider.name, existing,
                )
                return
            self._has_external = True

        self._providers.append(provider)

        # Index tool names → provider for routing
        for schema in provider.get_tool_schemas():
            tool_name = schema.get("name", "")
            if tool_name and tool_name not in self._tool_to_provider:
                self._tool_to_provider[tool_name] = provider
            elif tool_name in self._tool_to_provider:
                logger.warning(
                    "Memory tool name conflict: '%s' already registered by %s, "
                    "ignoring from %s",
                    tool_name,
                    self._tool_to_provider[tool_name].name,
                    provider.name,
                )

        logger.info(
            "Memory provider '%s' registered (%d tools)",
            provider.name,
            len(provider.get_tool_schemas()),
        )

    @property
    def providers(self) -> List[MemoryProvider]:
        """All registered providers in order."""
        return list(self._providers)

    def get_provider(self, name: str) -> Optional[MemoryProvider]:
        """Get a provider by name, or None if not registered."""
        for p in self._providers:
            if p.name == name:
                return p
        return None

    # -- System prompt -------------------------------------------------------

    def build_system_prompt(self) -> str:
        """Collect system prompt blocks from all providers.

        Returns combined text, or empty string if no providers contribute.
        Each non-empty block is labeled with the provider name.
        """
        blocks = []
        for provider in self._providers:
            try:
                block = provider.system_prompt_block()
                if block and block.strip():
                    blocks.append(block)
            except Exception as e:
                logger.warning(
                    "Memory provider '%s' system_prompt_block() failed: %s",
                    provider.name, e,
                )
        return "\n\n".join(blocks)

    # -- Prefetch / recall ---------------------------------------------------

    def prefetch_all(self, query: str, *, session_id: str = "") -> str:
        """Collect prefetch context from all providers.

        Returns merged context text labeled by provider. Empty providers
        are skipped. Failures in one provider don't block others.
        """
        parts = []
        for provider in self._providers:
            try:
                result = provider.prefetch(query, session_id=session_id)
                if result and result.strip():
                    parts.append(result)
            except Exception as e:
                logger.debug(
                    "Memory provider '%s' prefetch failed (non-fatal): %s",
                    provider.name, e,
                )
        return "\n\n".join(parts)

    def queue_prefetch_all(self, query: str, *, session_id: str = "") -> None:
        """Queue background prefetch on all providers for the next turn."""
        for provider in self._providers:
            try:
                provider.queue_prefetch(query, session_id=session_id)
            except Exception as e:
                logger.debug(
                    "Memory provider '%s' queue_prefetch failed (non-fatal): %s",
                    provider.name, e,
                )

    # -- Sync ----------------------------------------------------------------

    def sync_all(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
        """Sync a completed turn to all providers."""
        for provider in self._providers:
            try:
                provider.sync_turn(user_content, assistant_content, session_id=session_id)
            except Exception as e:
                logger.warning(
                    "Memory provider '%s' sync_turn failed: %s",
                    provider.name, e,
                )

    # -- Tools ---------------------------------------------------------------

    def get_all_tool_schemas(self) -> List[Dict[str, Any]]:
        """Collect tool schemas from all providers."""
        schemas = []
        seen = set()
        for provider in self._providers:
            try:
                for schema in provider.get_tool_schemas():
                    name = schema.get("name", "")
                    if name and name not in seen:
                        schemas.append(schema)
                        seen.add(name)
            except Exception as e:
                logger.warning(
                    "Memory provider '%s' get_tool_schemas() failed: %s",
                    provider.name, e,
                )
        return schemas

    def get_all_tool_names(self) -> set:
        """Return set of all tool names across all providers."""
        return set(self._tool_to_provider.keys())

    def has_tool(self, tool_name: str) -> bool:
        """Check if any provider handles this tool."""
        return tool_name in self._tool_to_provider

    def handle_tool_call(
        self, tool_name: str, args: Dict[str, Any], **kwargs
    ) -> str:
        """Route a tool call to the correct provider.

        Returns JSON string result. Raises ValueError if no provider
        handles the tool.
        """
        provider = self._tool_to_provider.get(tool_name)
        if provider is None:
            return tool_error(f"No memory provider handles tool '{tool_name}'")
        try:
            return provider.handle_tool_call(tool_name, args, **kwargs)
        except Exception as e:
            logger.error(
                "Memory provider '%s' handle_tool_call(%s) failed: %s",
                provider.name, tool_name, e,
            )
            return tool_error(f"Memory tool '{tool_name}' failed: {e}")

3 / 7

The Curator — Distilling Conversation into Facts

agent/curator.py:214

How the Curator applies automatic state transitions and then spawns a forked agent to consolidate accumulated skill data into a curated library

The Curator runs on a configurable interval and does its work in two cleanly separated phases.

Phase one is apply_automatic_transitions: a pure function with no LLM calls. It walks every agent-created skill, computes an anchor date (last activity, falling back to creation date), and applies three transitions:

active → stale when the anchor passes stale_after_days
stale → archived when the anchor passes archive_after_days
stale → active when a skill is used again

Pass a now value in tests and the transitions become deterministic.

Phase two is the LLM pass inside _llm_pass. The Curator forks a full AIAgent with skip_memory=True and quiet_mode=True, then runs CURATOR_REVIEW_PROMPT against the current candidate list. The fork has access to skill_manage, so it can consolidate duplicates, archive dead weight, and create umbrella entries, using the same tool surface the agent exposes to users. Nudge intervals are zeroed out to prevent recursive triggering.

The last_run_at write happens before _llm_pass starts. If the LLM pass hangs or crashes, the timestamp is already persisted and prevents an immediate re-trigger on the next session start.

Key takeaway

The same model that created the skills decides which ones to keep. The Curator's quality scales with the model's judgment, not with the number of curator-specific rules. ---

def apply_automatic_transitions(now: Optional[datetime] = None) -> Dict[str, int]:
    """Walk every agent-created skill and move active/stale/archived based on
    the latest real activity timestamp. Pinned skills are never touched.
    Returns a counter dict describing what changed."""
    from tools import skill_usage as _u

    if now is None:
        now = datetime.now(timezone.utc)
    stale_cutoff = now - timedelta(days=get_stale_after_days())
    archive_cutoff = now - timedelta(days=get_archive_after_days())

    counts = {"marked_stale": 0, "archived": 0, "reactivated": 0, "checked": 0}

    for row in _u.agent_created_report():
        counts["checked"] += 1
        name = row["name"]
        if row.get("pinned"):
            continue

        last_activity = _parse_iso(row.get("last_activity_at"))
        # If never active, treat created_at as the anchor so new skills don't
        # immediately archive themselves.
        anchor = last_activity or _parse_iso(row.get("created_at")) or now
        if anchor.tzinfo is None:
            anchor = anchor.replace(tzinfo=timezone.utc)

        current = row.get("state", _u.STATE_ACTIVE)

        if anchor <= archive_cutoff and current != _u.STATE_ARCHIVED:
            ok, _msg = _u.archive_skill(name)
            if ok:
                counts["archived"] += 1
        elif anchor <= stale_cutoff and current == _u.STATE_ACTIVE:
            _u.set_state(name, _u.STATE_STALE)
            counts["marked_stale"] += 1
        elif anchor > stale_cutoff and current == _u.STATE_STALE:
            # Skill got used again after being marked stale — reactivate.
            _u.set_state(name, _u.STATE_ACTIVE)
            counts["reactivated"] += 1

    return counts

def run_curator_review(
    on_summary: Optional[Callable[[str], None]] = None,
    synchronous: bool = False,
) -> Dict[str, Any]:
    """Execute a single curator review pass.

    Steps:
      1. Apply automatic state transitions (pure, no LLM).
      2. If there are agent-created skills, spawn a forked AIAgent that runs
         the LLM review prompt against the current candidate list.
      3. Update .curator_state with last_run_at and a one-line summary.
      4. Invoke *on_summary* with a user-visible description.

    If *synchronous* is True, the LLM review runs in the calling thread; the
    default is to spawn a daemon thread so the caller returns immediately.
    """
    start = datetime.now(timezone.utc)
    counts = apply_automatic_transitions(now=start)

    auto_summary_parts = []
    if counts["marked_stale"]:
        auto_summary_parts.append(f"{counts['marked_stale']} marked stale")
    if counts["archived"]:
        auto_summary_parts.append(f"{counts['archived']} archived")
    if counts["reactivated"]:
        auto_summary_parts.append(f"{counts['reactivated']} reactivated")
    auto_summary = ", ".join(auto_summary_parts) if auto_summary_parts else "no changes"

    # Persist state before the LLM pass so a crash mid-review still records
    # the run and doesn't immediately re-trigger.
    state = load_state()
    state["last_run_at"] = start.isoformat()
    state["run_count"] = int(state.get("run_count", 0)) + 1
    state["last_run_summary"] = f"auto: {auto_summary}"
    save_state(state)

    def _llm_pass():
        nonlocal auto_summary
        # Snapshot skill state BEFORE the LLM pass so the report can diff.
        try:
            before_report = skill_usage.agent_created_report()
        except Exception:
            before_report = []
        before_names = {r.get("name") for r in before_report if isinstance(r, dict)}

        llm_meta: Dict[str, Any] = {}
        try:
            candidate_list = _render_candidate_list()
            if "No agent-created skills" in candidate_list:
                final_summary = f"auto: {auto_summary}; llm: skipped (no candidates)"
                llm_meta = {
                    "final": "",
                    "summary": "skipped (no candidates)",
                    "model": "",
                    "provider": "",
                    "tool_calls": [],
                    "error": None,
                }
            else:
                prompt = f"{CURATOR_REVIEW_PROMPT}\n\n{candidate_list}"
                llm_meta = _run_llm_review(prompt)
                final_summary = (
                    f"auto: {auto_summary}; llm: {llm_meta.get('summary', 'no change')}"
                )
        except Exception as e:
            logger.debug("Curator LLM pass failed: %s", e, exc_info=True)
            final_summary = f"auto: {auto_summary}; llm: error ({e})"
            llm_meta = {
                "final": "",
                "summary": f"error ({e})",
                "model": "",
                "provider": "",
                "tool_calls": [],
                "error": str(e),
            }

        elapsed = (datetime.now(timezone.utc) - start).total_seconds()
        state2 = load_state()
        state2["last_run_duration_seconds"] = elapsed
        state2["last_run_summary"] = final_summary

        # Write the per-run report. Runs in a best-effort try so a
        # reporting bug never breaks the curator itself. Report path is
        # recorded in state so `hermes curator status` can point at it.
        try:
            after_report = skill_usage.agent_created_report()
        except Exception:
            after_report = []
        try:
            report_path = _write_run_report(
                started_at=start,
                elapsed_seconds=elapsed,
                auto_counts=counts,
                auto_summary=auto_summary,
                before_report=before_report,
                before_names=before_names,
                after_report=after_report,
                llm_meta=llm_meta,
            )
            if report_path is not None:
                state2["last_report_path"] = str(report_path)
        except Exception as e:
            logger.debug("Curator report write failed: %s", e, exc_info=True)

        save_state(state2)

        if on_summary:
            try:
                on_summary(f"curator: {final_summary}")
            except Exception:
                pass

    if synchronous:
        _llm_pass()
    else:
        t = threading.Thread(target=_llm_pass, daemon=True, name="curator-review")
        t.start()

    return {
        "started_at": start.isoformat(),
        "auto_transitions": counts,
        "summary_so_far": auto_summary,
    }

4 / 7

Insights — Higher-Order Patterns from Sessions

agent/insights.py:93

How InsightsEngine builds a multi-dimensional report over raw session rows

InsightsEngine is a read-only analytics layer over Hermes's SQLite session database. Its single entry point, generate, takes a days window and an optional source filter and returns a structured dict covering cost, model distribution, platform breakdown, tool usage, skill usage, activity patterns, and top sessions.

The pipeline is sequential: four SQL queries gather raw rows, then seven _compute_* methods each transform one slice. No compute method knows about another, which makes them independently testable and lets each formatter (terminal, gateway) pick only the slices it needs.

_compute_skill_breakdown shows the report's two-level structure. Raw rows carry view_count (skill loaded into context) and manage_count (skill edited or created). The method combines these into total_count, computes a percentage share, and sorts by a five-key tuple (total count first, then recency) so ties resolve deterministically. The summary sub-dict holds headline numbers; top_skills holds the ranked list.

_compute_activity_patterns walks sorted session dates pairwise and increments a streak when consecutive dates differ by exactly one day. A Monday-Wednesday-Friday user has a max streak of 2, not 3.

Key takeaway

InsightsEngine.generate is a one-in, one-out pipeline: raw session rows in, a single structured dict out. No shared mutable state between compute methods is why 930 lines stays readable. ---

class InsightsEngine:
    """
    Analyzes session history and produces usage insights.

    Works directly with a SessionDB instance (or raw sqlite3 connection)
    to query session and message data.
    """

    def __init__(self, db):
        """
        Initialize with a SessionDB instance.

        Args:
            db: A SessionDB instance (from hermes_state.py)
        """
        self.db = db
        self._conn = db._conn

    def generate(self, days: int = 30, source: str = None) -> Dict[str, Any]:
        """
        Generate a complete insights report.

        Args:
            days: Number of days to look back (default: 30)
            source: Optional filter by source platform

        Returns:
            Dict with all computed insights
        """
        cutoff = time.time() - (days * 86400)

        # Gather raw data
        sessions = self._get_sessions(cutoff, source)
        tool_usage = self._get_tool_usage(cutoff, source)
        skill_usage = self._get_skill_usage(cutoff, source)
        message_stats = self._get_message_stats(cutoff, source)

        if not sessions:
            return {
                "days": days,
                "source_filter": source,
                "empty": True,
                "overview": {},
                "models": [],
                "platforms": [],
                "tools": [],
                "skills": {
                    "summary": {
                        "total_skill_loads": 0,
                        "total_skill_edits": 0,
                        "total_skill_actions": 0,
                        "distinct_skills_used": 0,
                    },
                    "top_skills": [],
                },
                "activity": {},
                "top_sessions": [],
            }

        # Compute insights
        overview = self._compute_overview(sessions, message_stats)
        models = self._compute_model_breakdown(sessions)
        platforms = self._compute_platform_breakdown(sessions)
        tools = self._compute_tool_breakdown(tool_usage)
        skills = self._compute_skill_breakdown(skill_usage)
        activity = self._compute_activity_patterns(sessions)
        top_sessions = self._compute_top_sessions(sessions)

        return {
            "days": days,
            "source_filter": source,
            "empty": False,
            "generated_at": time.time(),
            "overview": overview,
            "models": models,
            "platforms": platforms,
            "tools": tools,
            "skills": skills,
            "activity": activity,
            "top_sessions": top_sessions,
        }

    def _compute_skill_breakdown(self, skill_usage: List[Dict]) -> Dict[str, Any]:
        """Process per-skill usage into summary + ranked list."""
        total_skill_loads = sum(s["view_count"] for s in skill_usage) if skill_usage else 0
        total_skill_edits = sum(s["manage_count"] for s in skill_usage) if skill_usage else 0
        total_skill_actions = total_skill_loads + total_skill_edits

        top_skills = []
        for skill in skill_usage:
            total_count = skill["view_count"] + skill["manage_count"]
            percentage = (total_count / total_skill_actions * 100) if total_skill_actions else 0
            top_skills.append({
                "skill": skill["skill"],
                "view_count": skill["view_count"],
                "manage_count": skill["manage_count"],
                "total_count": total_count,
                "percentage": percentage,
                "last_used_at": skill.get("last_used_at"),
            })

        top_skills.sort(
            key=lambda s: (
                s["total_count"],
                s["view_count"],
                s["manage_count"],
                s["last_used_at"] or 0,
                s["skill"],
            ),
            reverse=True,
        )

        return {
            "summary": {
                "total_skill_loads": total_skill_loads,
                "total_skill_edits": total_skill_edits,
                "total_skill_actions": total_skill_actions,
                "distinct_skills_used": len(skill_usage),
            },
            "top_skills": top_skills,
        }

5 / 7

FTS5 Full-Text Search

plugins/memory/holographic/store.py:16

How the holographic memory backend indexes facts with SQLite FTS5 triggers and queries them by keyword rank

The holographic backend stores facts as rows in a facts table and keeps them searchable via SQLite FTS5, built into every modern SQLite distribution with no extra dependencies.

facts_fts is a content table (content=facts, content_rowid=fact_id): FTS5 stores only the inverted index, not a copy of the text. Three triggers keep the index in sync automatically:

facts_ai — inserts the new row into the index
facts_ad — removes the deleted row from the index
facts_au — issues a 'delete' command then re-inserts, preventing stale entries from shadowing updated facts

search_facts optionally injects a category clause into an f-string template. The clause itself is a hardcoded string; the category value always travels as a bound parameter. ORDER BY fts.rank uses FTS5's BM25-derived rank column (lower is better), with trust score as a tiebreaker for facts with equal keyword relevance.

After retrieval, a single bulk UPDATE increments retrieval_count for every returned fact. The HRR reranker in retrieval.py later uses this signal to boost frequently-recalled facts over equally-scored but rarely-seen ones.

Key takeaway

FTS5's content table keeps the index thin with no data duplication, and three triggers guarantee it never drifts from facts. The retrieval_count increment on every search creates a self-reinforcing loop: the most recalled facts become the most discoverable. ---

_SCHEMA = """
CREATE TABLE IF NOT EXISTS facts (
    fact_id         INTEGER PRIMARY KEY AUTOINCREMENT,
    content         TEXT NOT NULL UNIQUE,
    category        TEXT DEFAULT 'general',
    tags            TEXT DEFAULT '',
    trust_score     REAL DEFAULT 0.5,
    retrieval_count INTEGER DEFAULT 0,
    helpful_count   INTEGER DEFAULT 0,
    created_at      TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at      TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    hrr_vector      BLOB
);

CREATE TABLE IF NOT EXISTS entities (
    entity_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    name        TEXT NOT NULL,
    entity_type TEXT DEFAULT 'unknown',
    aliases     TEXT DEFAULT '',
    created_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE IF NOT EXISTS fact_entities (
    fact_id   INTEGER REFERENCES facts(fact_id),
    entity_id INTEGER REFERENCES entities(entity_id),
    PRIMARY KEY (fact_id, entity_id)
);

CREATE INDEX IF NOT EXISTS idx_facts_trust    ON facts(trust_score DESC);
CREATE INDEX IF NOT EXISTS idx_facts_category ON facts(category);
CREATE INDEX IF NOT EXISTS idx_entities_name  ON entities(name);

CREATE VIRTUAL TABLE IF NOT EXISTS facts_fts
    USING fts5(content, tags, content=facts, content_rowid=fact_id);

CREATE TRIGGER IF NOT EXISTS facts_ai AFTER INSERT ON facts BEGIN
    INSERT INTO facts_fts(rowid, content, tags)
        VALUES (new.fact_id, new.content, new.tags);
END;

CREATE TRIGGER IF NOT EXISTS facts_ad AFTER DELETE ON facts BEGIN
    INSERT INTO facts_fts(facts_fts, rowid, content, tags)
        VALUES ('delete', old.fact_id, old.content, old.tags);
END;

CREATE TRIGGER IF NOT EXISTS facts_au AFTER UPDATE ON facts BEGIN
    INSERT INTO facts_fts(facts_fts, rowid, content, tags)
        VALUES ('delete', old.fact_id, old.content, old.tags);
    INSERT INTO facts_fts(rowid, content, tags)
        VALUES (new.fact_id, new.content, new.tags);
END;

    def search_facts(
        self,
        query: str,
        category: str | None = None,
        min_trust: float = 0.3,
        limit: int = 10,
    ) -> list[dict]:
        """Full-text search over facts using FTS5.

        Returns a list of fact dicts ordered by FTS5 rank, then trust_score
        descending. Also increments retrieval_count for matched facts.
        """
        with self._lock:
            query = query.strip()
            if not query:
                return []

            params: list = [query, min_trust]
            category_clause = ""
            if category is not None:
                category_clause = "AND f.category = ?"
                params.append(category)
            params.append(limit)

            sql = f"""
                SELECT f.fact_id, f.content, f.category, f.tags,
                       f.trust_score, f.retrieval_count, f.helpful_count,
                       f.created_at, f.updated_at
                FROM facts f
                JOIN facts_fts fts ON fts.rowid = f.fact_id
                WHERE facts_fts MATCH ?
                  AND f.trust_score >= ?
                  {category_clause}
                ORDER BY fts.rank, f.trust_score DESC
                LIMIT ?
            """

            rows = self._conn.execute(sql, params).fetchall()
            results = [self._row_to_dict(r) for r in rows]

            if results:
                ids = [r["fact_id"] for r in results]
                placeholders = ",".join("?" * len(ids))
                self._conn.execute(
                    f"UPDATE facts SET retrieval_count = retrieval_count + 1 WHERE fact_id IN ({placeholders})",
                    ids,
                )
                self._conn.commit()

            return results

6 / 7

Honcho Dialectic User Modeling

plugins/memory/honcho/__init__.py:949

How Hermes queries Honcho's AI-native memory to synthesize a user model before each turn

    def get_prefetch_context(self, session_key: str, user_message: str | None = None) -> dict[str, str]:
        """
        Pre-fetch user and AI peer context from Honcho.

        Fetches peer_representation and peer_card for both peers, plus the
        session summary when available.
        """
        session = self._cache.get(session_key)
        if not session:
            return {}

        result: dict[str, str] = {}

        # Session summary — provides session-scoped context.
        try:
            honcho_session = self._sessions_cache.get(session.honcho_session_id)
            if honcho_session:
                ctx = honcho_session.context(summary=True)
                if ctx.summary and getattr(ctx.summary, "content", None):
                    result["summary"] = ctx.summary.content
        except Exception as e:
            logger.debug("Failed to fetch session summary from Honcho: %s", e)

        try:
            user_ctx = self._fetch_peer_context(session.user_peer_id, target=session.user_peer_id)
            result["representation"] = user_ctx["representation"]
            result["card"] = "\n".join(user_ctx["card"])
        except Exception as e:
            logger.warning("Failed to fetch user context from Honcho: %s", e)

        # Also fetch AI peer's own representation so Hermes knows itself.
        try:
            ai_ctx = self._fetch_peer_context(session.assistant_peer_id, target=session.assistant_peer_id)
            result["ai_representation"] = ai_ctx["representation"]
            result["ai_card"] = "\n".join(ai_ctx["card"])
        except Exception as e:
            logger.debug("Failed to fetch AI peer context from Honcho: %s", e)

        return result

Honcho models users as peers with persistent representations. Rather than indexing raw turns for keyword search, it exposes a peer.chat() endpoint that synthesizes an answer from everything it has observed about a user. The call is closer to asking a second model "what do you know about this person?" than to querying a table.

_run_dialectic_depth drives Hermes's configurable multi-pass interaction with that endpoint. At depth 1 (the default) it fires a single .chat() call. At depth 2 or 3, each subsequent pass receives the prior result and refines it. _signal_sufficient short-circuits the loop when the first pass already returns high-confidence signal, capping API spend.

get_prefetch_context assembles three layers of what Honcho knows:

representation — Honcho's synthesized prose description of the user
card — structured key-value summary (preferences, patterns, communication style)
summary — session-scoped context when enough history exists

Both the user peer and the AI peer are fetched, so Hermes knows who it is talking to and how it itself has been behaving across sessions.

user_message in get_prefetch_context is intentionally unused. Passing the raw message to Honcho's search_query field would expose conversation content in server logs. The dialectic call that follows is where the current message shapes synthesis.

Key takeaway

Honcho's dialectic mode inverts the usual search pattern: instead of finding stored facts that match the current query, it asks Honcho's own model to synthesize what is known about the user. The depth setting lets Hermes trade latency against synthesis quality per deployment. ---

    def _run_dialectic_depth(self, query: str) -> str:
        """Execute up to dialecticDepth .chat() calls with conditional bail-out.

        Cold start (no base context): general user-oriented query.
        Warm session (base context exists): session-scoped query.
        Each pass is conditional — bails early if prior pass returned strong signal.
        Returns the best (usually last) result.
        """
        if not self._manager or not self._session_key:
            return ""

        is_cold = not self._base_context_cache
        results: list[str] = []

        for i in range(self._dialectic_depth):
            if i == 0:
                prompt = self._build_dialectic_prompt(0, results, is_cold)
            else:
                # Skip further passes if prior pass delivered strong signal
                if results and self._signal_sufficient(results[-1]):
                    logger.debug("Honcho dialectic depth %d: pass %d skipped, prior signal sufficient",
                                 self._dialectic_depth, i)
                    break
                prompt = self._build_dialectic_prompt(i, results, is_cold)

            level = self._resolve_pass_level(i, query=query)
            logger.debug("Honcho dialectic depth %d: pass %d, level=%s, cold=%s",
                         self._dialectic_depth, i, level, is_cold)

            result = self._manager.dialectic_query(
                self._session_key, prompt,
                reasoning_level=level,
                peer="user",
            )
            results.append(result or "")

        # Return the last non-empty result (deepest pass that ran)
        for r in reversed(results):
            if r and r.strip():
                return r
        return ""

7 / 7

Cross-Session Recall

agent/memory_manager.py:287

The moment a new session retrieves prior memories and injects them into the model's context

    def prefetch(self, query: str, *, session_id: str = "") -> str:
        """Return base context (representation + card) plus dialectic supplement.

        Assembles two layers:
        1. Base context from peer.context() — cached, refreshed on context_cadence
        2. Dialectic supplement — cached, refreshed on dialectic_cadence
        """
        if self._cron_skipped:
            return ""

        # B1: tools-only mode — no auto-injection
        if self._recall_mode == "tools":
            return ""

        # B5: injection_frequency — if "first-turn" and past first turn, return empty.
        if self._injection_frequency == "first-turn" and self._turn_count > 1:
            return ""

        # Trivial prompts ("ok", "yes", slash commands) carry no semantic signal.
        if self._is_trivial_prompt(query):
            return ""

        parts = []

        # ----- Layer 1: Base context (representation + card) -----
        # On first call, fetch synchronously so turn 1 isn't empty.
        # After that, serve from cache and refresh in background on cadence.
        with self._base_context_lock:
            if self._base_context_cache is None:
                # First call — synchronous fetch
                try:
                    ctx = self._manager.get_prefetch_context(self._session_key)
                    self._base_context_cache = self._format_first_turn_context(ctx) if ctx else ""
                    self._last_context_turn = self._turn_count
                except Exception as e:
                    logger.debug("Honcho base context fetch failed: %s", e)
                    self._base_context_cache = ""
            base_context = self._base_context_cache

        # Check if background context prefetch has a fresher result
        if self._manager:
            fresh_ctx = self._manager.pop_context_result(self._session_key)
            if fresh_ctx:
                formatted = self._format_first_turn_context(fresh_ctx)
                if formatted:
                    with self._base_context_lock:
                        self._base_context_cache = formatted
                    base_context = formatted

        if base_context:
            parts.append(base_context)

Cross-session recall is the composition of prefetch_all in MemoryManager with each provider's prefetch, executed just before Hermes sends the session's first message to the model.

prefetch_all calls provider.prefetch(query) on every registered backend in registration order, joins non-empty results with double newlines, and returns the combined block. run_agent.py wraps this in a <memory-context> fence via build_memory_context_block. StreamingContextScrubber then strips those fences from streamed output so they never appear in the visible response.

HonchoMemoryProvider.prefetch manages a two-tier cache. On the first call of a session (_base_context_cache is None) it fetches synchronously, because turn 1 cannot be empty. After that it returns the cache immediately and refreshes asynchronously via queue_prefetch. The pop_context_result check lets a result that arrived between queue_prefetch and the next prefetch call preempt the stale cache without waiting for the next turn.

_is_trivial_prompt filters out one-word acknowledgements ("ok", "yes", "cool") and slash commands. Injecting a full memory block in response to those wastes tokens and can push stale context into replies that should be brief.

Key takeaway

Cross-session recall is a two-stage pipeline: prefetch_all fans out to all providers and joins their results, while each provider manages its own cache tier, synchronous on turn 1 and background-refreshed after. The model sees one fenced block of prior knowledge before it processes any user input. ---

    def prefetch_all(self, query: str, *, session_id: str = "") -> str:
        """Collect prefetch context from all providers.

        Returns merged context text labeled by provider. Empty providers
        are skipped. Failures in one provider don't block others.
        """
        parts = []
        for provider in self._providers:
            try:
                result = provider.prefetch(query, session_id=session_id)
                if result and result.strip():
                    parts.append(result)
            except Exception as e:
                logger.debug(
                    "Memory provider '%s' prefetch failed (non-fatal): %s",
                    provider.name, e,
                )
        return "\n\n".join(parts)

Your codebase next

Create code tours for your project

Intraview lets AI create interactive walkthroughs of any codebase. Install the free VS Code extension and generate your first tour in minutes.

Install Intraview Free