The Multi-Platform Gateway: One Agent, Six Chat Platforms, Shared Sessions
How a Telegram message and a Discord DM share the same agent state inside Hermes
What you will learn
- How Hermes registers each platform through a self-describing `PlatformEntry` dataclass so new platforms can be added without modifying the gateway core
- How the `GatewayRunner` starts up, acquires a PID lock, and brings each platform adapter online
- What contract every platform adapter must satisfy: the three abstract methods that let Telegram, Discord, Slack, and WhatsApp speak the same language
- How `build_session_key` maps a (platform, chat_id, user_id) tuple to a deterministic string that becomes the conversation's identity
- How an inbound message travels from raw platform event through authorization, session lookup, agent run, and back to the platform's send transport
- How a finished agent response reaches the user through a `DeliveryRouter` that resolves platform-specific adapters at send time
- How the `PairingStore` lets a user share their Telegram and Discord identities to preserve session continuity across both platforms
Prerequisites
- Comfortable reading Python async code (you don't need to write it, just follow the `await` chain)
- Basic understanding of what a chat bot does: receives a message, generates a reply, sends it back
- Familiarity with the concept of a session (stateful conversation) is helpful but not required
Platform Registry
gateway/platform_registry.py:39
How platforms register themselves so the gateway discovers them without hardcoded if/elif chains
PlatformEntry is a dataclass that encodes everything the gateway needs to know about a platform: a config-file name, a human label, a factory callable, a check_fn for dependency availability, an optional validate_config, and a platform_hint injected into the system prompt. That last field is what tells the model not to use Discord markdown when answering an SMS. The factory pattern (adapter_factory: Callable) is deliberate: plugin authors can wrap the constructor or pass extra kwargs without touching the gateway's discovery code.
PlatformRegistry is a thin dict wrapper with a three-gate admission path in create_adapter():
1. check_fn() — are the required pip packages installed?
2. validate_config() — is the config well-formed?
3. adapter_factory(config) — instantiate the adapter.
A failure at any gate logs a warning rather than raising, so the gateway starts with whatever platforms pass validation. The module-level platform_registry singleton is the shared instance; every module that needs the registry imports it directly.
Adding a new platform means defining a PlatformEntry and calling platform_registry.register() — not editing a growing if/elif tree in the gateway core.
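To make that concrete, here is a sketch of what registering a hypothetical IRC platform looks like. The `PlatformEntry`/`PlatformRegistry` classes below are condensed stand-ins mirroring the API shown later in this section (only the fields the gates use), and `IRCAdapter` is invented for illustration:

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

# Condensed stand-ins for the real PlatformEntry/PlatformRegistry shown below.
@dataclass
class PlatformEntry:
    name: str
    label: str
    adapter_factory: Callable[[Any], Any]
    check_fn: Callable[[], bool]
    validate_config: Optional[Callable[[Any], bool]] = None
    install_hint: str = ""
    platform_hint: str = ""

class PlatformRegistry:
    def __init__(self) -> None:
        self._entries: dict = {}

    def register(self, entry: PlatformEntry) -> None:
        self._entries[entry.name] = entry  # last writer wins

    def create_adapter(self, name: str, config: Any):
        entry = self._entries.get(name)
        if entry is None or not entry.check_fn():       # gate 1: deps
            return None
        if entry.validate_config and not entry.validate_config(config):
            return None                                  # gate 2: config
        return entry.adapter_factory(config)             # gate 3: factory

platform_registry = PlatformRegistry()

# A plugin registers a hypothetical IRC platform -- no gateway core edits.
class IRCAdapter:
    def __init__(self, config): self.config = config

platform_registry.register(PlatformEntry(
    name="irc",
    label="IRC",
    adapter_factory=lambda cfg: IRCAdapter(cfg),
    check_fn=lambda: True,                               # pretend deps exist
    validate_config=lambda cfg: bool(cfg.get("server")),
    install_hint="pip install irc",
    platform_hint="You are on IRC. Do not use markdown.",
))

good = platform_registry.create_adapter("irc", {"server": "irc.libera.chat"})
bad = platform_registry.create_adapter("irc", {})        # fails gate 2
```

The same three gates run in order, and a failure at any of them yields `None` rather than an exception.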
---
@dataclass
class PlatformEntry:
"""Metadata and factory for a single platform adapter."""
# Identifier used in config.yaml (e.g. "irc", "viber").
name: str
# Human-readable label (e.g. "IRC", "Viber").
label: str
# Factory callable: receives a PlatformConfig, returns an adapter instance.
# Using a factory instead of a bare class lets plugins do custom init
# (e.g. passing extra kwargs, wrapping in try/except).
adapter_factory: Callable[[Any], Any]
# Returns True when the platform's dependencies are available.
check_fn: Callable[[], bool]
# Optional: given a PlatformConfig, is it properly configured?
# If None, the registry skips config validation and lets the adapter
# fail at connect() time with a descriptive error.
validate_config: Optional[Callable[[Any], bool]] = None
# Optional: given a PlatformConfig, is the platform connected/enabled?
# Used by ``GatewayConfig.get_connected_platforms()`` and setup UI status.
# If None, falls back to ``validate_config`` or ``check_fn``.
is_connected: Optional[Callable[[Any], bool]] = None
# Env vars this platform needs (for ``hermes setup`` display).
required_env: list = field(default_factory=list)
# Hint shown when check_fn returns False.
install_hint: str = ""
# Optional setup function for interactive configuration.
# Signature: () -> None (prompts user, saves env vars).
# If None, falls back to _setup_standard_platform (needs token_var + vars)
# or a generic "set these env vars" display.
setup_fn: Optional[Callable[[], None]] = None
# "builtin" or "plugin"
source: str = "plugin"
# Name of the plugin manifest that registered this entry (empty for
# built-ins). Used by ``hermes gateway setup`` to auto-enable the
# owning plugin when the user configures its platform.
plugin_name: str = ""
# ── Auth env var names (for _is_user_authorized integration) ──
# E.g. "IRC_ALLOWED_USERS" — checked for comma-separated user IDs.
allowed_users_env: str = ""
# E.g. "IRC_ALLOW_ALL_USERS" — if truthy, all users authorized.
allow_all_env: str = ""
# ── Message limits ──
# Max message length for smart-chunking. 0 = no limit.
max_message_length: int = 0
# ── Privacy ──
# If True, session descriptions redact PII (phone numbers, etc.)
pii_safe: bool = False
# ── Display ──
# Emoji for CLI/gateway display (e.g. "💬")
emoji: str = "🔌"
# Whether this platform should appear in _UPDATE_ALLOWED_PLATFORMS
# (allows /update command from this platform).
allow_update_command: bool = True
# ── LLM guidance ──
# Platform hint injected into the system prompt (e.g. "You are on IRC.
# Do not use markdown."). Empty string = no hint.
platform_hint: str = ""
class PlatformRegistry:
"""Central registry of platform adapters.
Thread-safe for reads (dict lookups are atomic under GIL).
Writes happen at startup during sequential discovery.
"""
def __init__(self) -> None:
self._entries: dict[str, PlatformEntry] = {}
def register(self, entry: PlatformEntry) -> None:
"""Register a platform adapter entry.
If an entry with the same name exists, it is replaced (last writer
wins -- this lets plugins override built-in adapters if desired).
"""
if entry.name in self._entries:
prev = self._entries[entry.name]
logger.info(
"Platform '%s' re-registered (was %s, now %s)",
entry.name,
prev.source,
entry.source,
)
self._entries[entry.name] = entry
logger.debug("Registered platform adapter: %s (%s)", entry.name, entry.source)
def unregister(self, name: str) -> bool:
"""Remove a platform entry. Returns True if it existed."""
return self._entries.pop(name, None) is not None
def get(self, name: str) -> Optional[PlatformEntry]:
"""Look up a platform entry by name."""
return self._entries.get(name)
def all_entries(self) -> list[PlatformEntry]:
"""Return all registered platform entries."""
return list(self._entries.values())
def plugin_entries(self) -> list[PlatformEntry]:
"""Return only plugin-registered platform entries."""
return [e for e in self._entries.values() if e.source == "plugin"]
def is_registered(self, name: str) -> bool:
return name in self._entries
def create_adapter(self, name: str, config: Any) -> Optional[Any]:
"""Create an adapter instance for the given platform name.
Returns None if:
- No entry registered for *name*
- check_fn() returns False (missing deps)
- validate_config() returns False (misconfigured)
- The factory raises an exception
"""
entry = self._entries.get(name)
if entry is None:
return None
if not entry.check_fn():
hint = f" ({entry.install_hint})" if entry.install_hint else ""
logger.warning(
"Platform '%s' requirements not met%s",
entry.label,
hint,
)
return None
if entry.validate_config is not None:
try:
if not entry.validate_config(config):
logger.warning(
"Platform '%s' config validation failed",
entry.label,
)
return None
except Exception as e:
logger.warning(
"Platform '%s' config validation error: %s",
entry.label,
e,
)
return None
try:
adapter = entry.adapter_factory(config)
return adapter
except Exception as e:
logger.error(
"Failed to create adapter for platform '%s': %s",
entry.label,
e,
exc_info=True,
)
return None
# Module-level singleton
platform_registry = PlatformRegistry()
The Gateway Entry Point
gateway/run.py:12487
How start_gateway() and main() wire the gateway together
start_gateway() opens with a duplicate-instance check against the PID file in ~/.hermes/. When --replace is passed, it writes a takeover marker naming the old PID before sending SIGTERM. The old process reads that marker in its shutdown handler and exits with code 0, which prevents systemd's Restart=on-failure from reviving it and starting a flap loop.
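The marker handshake can be sketched with a small file protocol. The file name and format here are assumptions for illustration (the real helpers are `write_takeover_marker` and `consume_takeover_marker_for_self` in `gateway.status`):

```python
import os
import tempfile
from pathlib import Path

# Illustrative marker location -- the real one lives under ~/.hermes/.
MARKER = Path(tempfile.mkdtemp()) / "gateway.takeover"

def write_takeover_marker(target_pid: int) -> None:
    # Replacer: record which PID the upcoming SIGTERM is aimed at.
    MARKER.write_text(str(target_pid))

def consume_takeover_marker_for_self() -> bool:
    # Old process, inside its SIGTERM handler: if the marker names *this*
    # PID, the signal is a planned takeover -- consume the marker and
    # exit 0 so systemd's Restart=on-failure stays quiet.
    try:
        if int(MARKER.read_text().strip()) == os.getpid():
            MARKER.unlink()
            return True
    except (FileNotFoundError, ValueError):
        pass
    return False

write_takeover_marker(os.getpid())
first = consume_takeover_marker_for_self()    # marker names us: planned
second = consume_takeover_marker_for_self()   # one-shot: already consumed
```

Because the marker is consumed on read, a later unrelated SIGTERM is still treated as an unexpected kill.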
The PID lock is acquired before any platform adapter touches an external service. Without that ordering, two concurrent gateway run --replace calls could both pass the existing_pid check and both open Telegram and Discord connections at the same time.
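The race closes because the PID file is claimed atomically: `O_CREAT|O_EXCL` guarantees that of two concurrent starters, exactly one wins. A minimal sketch (the path here is illustrative):

```python
import os
import tempfile

# Illustrative path; Hermes scopes the real PID file to HERMES_HOME.
pid_path = os.path.join(tempfile.mkdtemp(), "gateway.pid")

def try_claim(pid_file: str) -> bool:
    """Atomically create the PID file; False means another instance won."""
    try:
        # O_EXCL makes creation fail if the file already exists --
        # the kernel arbitrates the race, not application code.
        fd = os.open(pid_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))
    return True

first = try_claim(pid_path)    # winner: proceeds to open platform adapters
second = try_claim(pid_path)   # loser: exits before touching any service
```

Only the winner goes on to open Telegram and Discord connections; the loser exits cleanly.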
MCP tool discovery runs in a thread-pool executor via run_in_executor, not inline. discover_mcp_tools() can block up to 120 seconds waiting for configured MCP servers; running it on the event-loop thread would freeze Telegram's long-poll heartbeat and Discord's shard keepalive for the entire startup window.
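The effect of moving blocking work off the loop thread can be demonstrated with a scaled-down sketch (`slow_discovery` stands in for `discover_mcp_tools`, and the timings are shrunk from the 120-second real case):

```python
import asyncio
import time

async def heartbeat(ticks: list) -> None:
    # Stands in for Telegram polling / Discord shard keepalives.
    for _ in range(5):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.05)

def slow_discovery() -> str:
    time.sleep(0.3)            # blocking wait, like MCP server discovery
    return "tools ready"

async def main():
    ticks: list = []
    loop = asyncio.get_running_loop()
    hb = asyncio.create_task(heartbeat(ticks))
    # run_in_executor pushes the blocking call to a worker thread,
    # so the heartbeat keeps ticking while discovery blocks.
    result = await loop.run_in_executor(None, slow_discovery)
    await hb
    return result, ticks

result, ticks = asyncio.run(main())
```

Calling `slow_discovery()` inline instead would freeze the heartbeat for the full 0.3 seconds, which is exactly the failure mode the executor avoids.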
main() is the thin CLI shell: parse --config, call asyncio.run(start_gateway(...)), exit with code 1 on failure so systemd retries transient errors.
start_gateway() is a startup sequencer: PID locking, takeover handshake, signal wiring, and MCP discovery all complete in a fixed order before the first platform adapter comes online.
---
async def start_gateway(config: Optional[GatewayConfig] = None, replace: bool = False, verbosity: Optional[int] = 0) -> bool:
"""
Start the gateway and run until interrupted.
This is the main entry point for running the gateway.
Returns True if the gateway ran successfully, False if it failed to start.
A False return causes a non-zero exit code so systemd can auto-restart.
Args:
config: Optional gateway configuration override.
replace: If True, kill any existing gateway instance before starting.
Useful for systemd services to avoid restart-loop deadlocks
when the previous process hasn't fully exited yet.
"""
# ── Duplicate-instance guard ──────────────────────────────────────
# Prevent two gateways from running under the same HERMES_HOME.
# The PID file is scoped to HERMES_HOME, so future multi-profile
# setups (each profile using a distinct HERMES_HOME) will naturally
# allow concurrent instances without tripping this guard.
from gateway.status import (
acquire_gateway_runtime_lock,
get_running_pid,
get_process_start_time,
release_gateway_runtime_lock,
remove_pid_file,
terminate_pid,
)
existing_pid = get_running_pid()
if existing_pid is not None and existing_pid != os.getpid():
if replace:
existing_start_time = get_process_start_time(existing_pid)
logger.info(
"Replacing existing gateway instance (PID %d) with --replace.",
existing_pid,
)
# Record a takeover marker so the target's shutdown handler
# recognises its SIGTERM as a planned takeover and exits 0
# (rather than exit 1, which would trigger systemd's
# Restart=on-failure and start a flap loop against us).
# Best-effort — proceed even if the write fails.
try:
from gateway.status import write_takeover_marker
write_takeover_marker(existing_pid)
except Exception as e:
logger.debug("Could not write takeover marker: %s", e)
try:
terminate_pid(existing_pid, force=False)
except ProcessLookupError:
pass # Already gone
except (PermissionError, OSError):
logger.error(
"Permission denied killing PID %d. Cannot replace.",
existing_pid,
)
# Marker is scoped to a specific target; clean it up on
# give-up so it doesn't grief an unrelated future shutdown.
try:
from gateway.status import clear_takeover_marker
clear_takeover_marker()
except Exception:
pass
return False
# Wait up to 10 seconds for the old process to exit
for _ in range(20):
try:
os.kill(existing_pid, 0)
time.sleep(0.5)
except (ProcessLookupError, PermissionError):
break # Process is gone
else:
# Still alive after 10s — force kill
logger.warning(
"Old gateway (PID %d) did not exit after SIGTERM, sending SIGKILL.",
existing_pid,
)
try:
terminate_pid(existing_pid, force=True)
time.sleep(0.5)
except (ProcessLookupError, PermissionError, OSError):
pass
remove_pid_file()
# remove_pid_file() is a no-op when the PID doesn't match.
# Force-unlink to cover the old-process-crashed case.
try:
(get_hermes_home() / "gateway.pid").unlink(missing_ok=True)
except Exception:
pass
# Clean up any takeover marker the old process didn't consume
# (e.g. SIGKILL'd before its shutdown handler could read it).
try:
from gateway.status import clear_takeover_marker
clear_takeover_marker()
except Exception:
pass
# Also release all scoped locks left by the old process.
# Stopped (Ctrl+Z) processes don't release locks on exit,
# leaving stale lock files that block the new gateway from starting.
try:
from gateway.status import release_all_scoped_locks
_released = release_all_scoped_locks(
owner_pid=existing_pid,
owner_start_time=existing_start_time,
)
if _released:
logger.info("Released %d stale scoped lock(s) from old gateway.", _released)
except Exception:
pass
else:
hermes_home = str(get_hermes_home())
logger.error(
"Another gateway instance is already running (PID %d, HERMES_HOME=%s). "
"Use 'hermes gateway restart' to replace it, or 'hermes gateway stop' first.",
existing_pid, hermes_home,
)
print(
f"\n❌ Gateway already running (PID {existing_pid}).\n"
f" Use 'hermes gateway restart' to replace it,\n"
f" or 'hermes gateway stop' to kill it first.\n"
f" Or use 'hermes gateway run --replace' to auto-replace.\n"
)
return False
# Sync bundled skills on gateway start (fast -- skips unchanged)
try:
from tools.skills_sync import sync_skills
sync_skills(quiet=True)
except Exception:
pass
# Centralized logging — agent.log (INFO+), errors.log (WARNING+),
# and gateway.log (INFO+, gateway-component records only).
# Idempotent, so repeated calls from AIAgent.__init__ won't duplicate.
from hermes_logging import setup_logging
setup_logging(hermes_home=_hermes_home, mode="gateway")
# Optional stderr handler — level driven by -v/-q flags on the CLI.
# verbosity=None (-q/--quiet): no stderr output
# verbosity=0 (default): WARNING and above
# verbosity=1 (-v): INFO and above
# verbosity=2+ (-vv/-vvv): DEBUG
if verbosity is not None:
from agent.redact import RedactingFormatter
_stderr_level = {0: logging.WARNING, 1: logging.INFO}.get(verbosity, logging.DEBUG)
_stderr_handler = logging.StreamHandler()
_stderr_handler.setLevel(_stderr_level)
_stderr_handler.setFormatter(RedactingFormatter('%(levelname)s %(name)s: %(message)s'))
logging.getLogger().addHandler(_stderr_handler)
# Lower root logger level if needed so DEBUG records can reach the handler
if _stderr_level < logging.getLogger().level:
logging.getLogger().setLevel(_stderr_level)
runner = GatewayRunner(config)
# Track whether a signal initiated the shutdown (vs. internal request).
# When an unexpected SIGTERM kills the gateway, we exit non-zero so
# systemd's Restart=on-failure revives the process. systemctl stop
# is safe: systemd tracks stop-requested state independently of exit
# code, so Restart= never fires for a deliberate stop.
_signal_initiated_shutdown = False
# Set up signal handlers
def shutdown_signal_handler():
nonlocal _signal_initiated_shutdown
# Planned --replace takeover check: when a sibling gateway is
# taking over via --replace, it wrote a marker naming this PID
# before sending SIGTERM. If present, treat the signal as a
# planned shutdown and exit 0 so systemd's Restart=on-failure
# doesn't revive us (which would flap-fight the replacer when
# both services are enabled, e.g. hermes.service + hermes-
# gateway.service from pre-rename installs).
planned_takeover = False
try:
from gateway.status import consume_takeover_marker_for_self
planned_takeover = consume_takeover_marker_for_self()
except Exception as e:
logger.debug("Takeover marker check failed: %s", e)
if planned_takeover:
logger.info(
"Received SIGTERM as a planned --replace takeover — exiting cleanly"
)
else:
_signal_initiated_shutdown = True
logger.info("Received SIGTERM/SIGINT — initiating shutdown")
# Diagnostic: log all hermes-related processes so we can identify
# what triggered the signal (hermes update, hermes gateway restart,
# a stale detached subprocess, etc.).
try:
import subprocess as _sp
_ps = _sp.run(
["ps", "aux"],
capture_output=True, text=True, timeout=3,
)
_hermes_procs = [
line for line in _ps.stdout.splitlines()
if ("hermes" in line.lower() or "gateway" in line.lower())
and str(os.getpid()) not in line.split()[1:2] # exclude self
]
if _hermes_procs:
logger.warning(
"Shutdown diagnostic — other hermes processes running:\n %s",
"\n ".join(_hermes_procs),
)
else:
logger.info("Shutdown diagnostic — no other hermes processes found")
except Exception:
pass
asyncio.create_task(runner.stop())
def restart_signal_handler():
runner.request_restart(detached=False, via_service=True)
loop = asyncio.get_running_loop()
if threading.current_thread() is threading.main_thread():
for sig in (signal.SIGINT, signal.SIGTERM):
try:
loop.add_signal_handler(sig, shutdown_signal_handler)
except NotImplementedError:
pass
if hasattr(signal, "SIGUSR1"):
try:
loop.add_signal_handler(signal.SIGUSR1, restart_signal_handler)
except NotImplementedError:
pass
else:
logger.info("Skipping signal handlers (not running in main thread).")
# Claim the PID file BEFORE bringing up any platform adapters.
# This closes the --replace race window: two concurrent `gateway run
# --replace` invocations both pass the termination-wait above, but
# only the winner of the O_CREAT|O_EXCL race below will ever open
# Telegram polling, Discord gateway sockets, etc. The loser exits
# cleanly before touching any external service.
import atexit
from gateway.status import write_pid_file, remove_pid_file, get_running_pid
_current_pid = get_running_pid()
if _current_pid is not None and _current_pid != os.getpid():
logger.error(
"Another gateway instance (PID %d) started during our startup. "
"Exiting to avoid double-running.", _current_pid
)
return False
if not acquire_gateway_runtime_lock():
logger.error(
"Gateway runtime lock is already held by another instance. Exiting."
)
return False
try:
write_pid_file()
except FileExistsError:
release_gateway_runtime_lock()
logger.error(
"PID file race lost to another gateway instance. Exiting."
)
return False
atexit.register(remove_pid_file)
atexit.register(release_gateway_runtime_lock)
# MCP tool discovery — run in an executor so the asyncio event loop
# stays responsive even when a configured MCP server is slow or
# unreachable. discover_mcp_tools() uses a blocking 120s wait
# internally; calling it from the loop thread would freeze platform
# heartbeats (Discord shard, Telegram polling) until it returned.
# See #16856.
try:
from tools.mcp_tool import discover_mcp_tools
_loop = asyncio.get_running_loop()
await _loop.run_in_executor(None, discover_mcp_tools)
except Exception as e:
logger.debug("MCP tool discovery failed: %s", e)
# Start the gateway
success = await runner.start()
if not success:
return False
if runner.should_exit_cleanly:
if runner.exit_reason:
logger.error("Gateway exiting cleanly: %s", runner.exit_reason)
return True
# Start background cron ticker so scheduled jobs fire automatically.
# Pass the event loop so cron delivery can use live adapters (E2EE support).
cron_stop = threading.Event()
cron_thread = threading.Thread(
target=_start_cron_ticker,
args=(cron_stop,),
kwargs={"adapters": runner.adapters, "loop": asyncio.get_running_loop()},
daemon=True,
name="cron-ticker",
)
cron_thread.start()
# Wait for shutdown
await runner.wait_for_shutdown()
if runner.should_exit_with_failure:
if runner.exit_reason:
logger.error("Gateway exiting with failure: %s", runner.exit_reason)
return False
# Stop cron ticker cleanly
cron_stop.set()
cron_thread.join(timeout=5)
# Close MCP server connections
try:
from tools.mcp_tool import shutdown_mcp_servers
shutdown_mcp_servers()
except Exception:
pass
if runner.exit_code is not None:
raise SystemExit(runner.exit_code)
# When a signal (SIGTERM/SIGINT) caused the shutdown and it wasn't a
# planned restart (/restart, /update, SIGUSR1), exit non-zero so
# systemd's Restart=on-failure revives the process. This covers:
# - hermes update killing the gateway mid-work
# - External kill commands
# - WSL2/container runtime sending unexpected signals
# systemctl stop is safe: systemd tracks "stop requested" state
# independently of exit code, so Restart= never fires for it.
if _signal_initiated_shutdown and not runner._restart_requested:
logger.info(
"Exiting with code 1 (signal-initiated shutdown without restart "
"request) so systemd Restart=on-failure can revive the gateway."
)
return False # → sys.exit(1) in the caller
return True
def main():
"""CLI entry point for the gateway."""
import argparse
parser = argparse.ArgumentParser(description="Hermes Gateway - Multi-platform messaging")
parser.add_argument("--config", "-c", help="Path to gateway config file")
parser.add_argument("--verbose", "-v", action="store_true", help="Verbose output")
args = parser.parse_args()
config = None
if args.config:
import yaml
with open(args.config, encoding="utf-8") as f:
data = yaml.safe_load(f) or {}
config = GatewayConfig.from_dict(data)
# Run the gateway - exit with code 1 if no platforms connected,
# so systemd Restart=on-failure will retry on transient errors (e.g. DNS)
success = asyncio.run(start_gateway(config))
if not success:
sys.exit(1)
if __name__ == "__main__":Platform Adapter Interface
gateway/platforms/base.py:1160
The abstract contract that every platform adapter must satisfy
Every platform adapter — Telegram, Discord, Slack, WhatsApp, Signal, and others — implements exactly three abstract methods: connect(), disconnect(), and send(). That is the entire surface the gateway core depends on. Everything else (webhook handlers, thread IDs, voice channels, reactions) stays inside the subclass and never reaches the routing layer.
__init__ sets up the interrupt infrastructure shared by all adapters. _active_sessions holds one asyncio.Event per live session key. When a second message arrives while the agent is still running, the adapter checks this dict and either interrupts the in-flight task or parks the new message in _pending_messages. _session_tasks maps session key to the specific asyncio.Task so /stop cancels exactly the right coroutine.
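The interrupt-or-park decision can be sketched with the three dicts described above. This is a simplified model of the behavior, not the adapter's real handler (which also manages guards and pending-message replay):

```python
import asyncio

class InterruptDemo:
    """Simplified sketch of per-session interrupt bookkeeping."""

    def __init__(self) -> None:
        self._active_sessions: dict = {}   # session key -> interrupt Event
        self._pending_messages: dict = {}  # session key -> parked message
        self._session_tasks: dict = {}     # session key -> owning Task

    async def run_agent(self, key: str) -> str:
        interrupt = self._active_sessions[key] = asyncio.Event()
        self._session_tasks[key] = asyncio.current_task()
        try:
            for _ in range(10):            # pretend agent steps
                if interrupt.is_set():
                    return "interrupted"
                await asyncio.sleep(0.01)
            return "done"
        finally:
            # Release the busy state deterministically.
            del self._active_sessions[key]
            del self._session_tasks[key]

    def on_message(self, key: str, text: str, interrupt: bool) -> None:
        if key in self._active_sessions:
            if interrupt:
                self._active_sessions[key].set()   # stop the in-flight run
            else:
                self._pending_messages[key] = text  # park for later

async def main():
    demo = InterruptDemo()
    task = asyncio.create_task(demo.run_agent("telegram:dm:1"))
    await asyncio.sleep(0.02)
    demo.on_message("telegram:dm:1", "stop!", interrupt=True)
    return await task

outcome = asyncio.run(main())
```

The `_session_tasks` map is what lets a `/stop` cancel exactly the task that owns the session, instead of whichever task happens to be running.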
_post_delivery_callbacks is a one-shot registry that fires after the reply goes out. It decouples post-response behaviors — checkmark reactions, status bar updates — from the agent run path. The metadata dict on send() is the platform-specific escape hatch: thread_id for Slack, parse_mode for Telegram, embeds for Discord, all passed through without the base class knowing any of them.
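The one-shot property falls out of popping the callback on fire. A minimal sketch (simplified from the generation-aware version Hermes actually stores):

```python
class DeliveryCallbacks:
    """One-shot post-delivery registry: each callback fires at most once."""

    def __init__(self) -> None:
        self._post_delivery_callbacks: dict = {}

    def register(self, session_key: str, callback) -> None:
        self._post_delivery_callbacks[session_key] = callback

    def fire(self, session_key: str) -> None:
        # pop() removes the entry, so a second fire is a no-op.
        cb = self._post_delivery_callbacks.pop(session_key, None)
        if cb is not None:
            cb()

fired = []
reg = DeliveryCallbacks()
reg.register("discord:dm:42", lambda: fired.append("checkmark reaction"))
reg.fire("discord:dm:42")   # runs the callback
reg.fire("discord:dm:42")   # no-op: already consumed
```

Keying by session lets the agent run path stay ignorant of what happens after delivery; the reaction or status-bar update is attached from the outside.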
Three abstract methods are the entire contract the gateway core needs from a platform; platform-specific behavior is contained inside the subclass and surfaced only through the opaque metadata dict.
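To make the contract concrete, here is a minimal sketch of a subclass satisfying all three methods. `MiniAdapter` and `EchoAdapter` are illustrative stand-ins (the real base class carries far more state), and a plain dict stands in for `SendResult`:

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional

class MiniAdapter(ABC):
    """Stand-in for the three-method contract of BasePlatformAdapter."""

    @abstractmethod
    async def connect(self) -> bool: ...

    @abstractmethod
    async def disconnect(self) -> None: ...

    @abstractmethod
    async def send(self, chat_id: str, content: str,
                   reply_to: Optional[str] = None,
                   metadata: Optional[Dict[str, Any]] = None) -> dict: ...

class EchoAdapter(MiniAdapter):
    """Hypothetical adapter: records sends instead of hitting a network."""

    def __init__(self) -> None:
        self.outbox: list = []
        self._running = False

    async def connect(self) -> bool:
        self._running = True
        return True

    async def disconnect(self) -> None:
        self._running = False

    async def send(self, chat_id, content, reply_to=None, metadata=None):
        # metadata is the opaque escape hatch: thread_id, parse_mode,
        # embeds -- passed through without the base class knowing them.
        self.outbox.append((chat_id, content, metadata or {}))
        return {"success": True, "message_id": str(len(self.outbox))}

async def main():
    a = EchoAdapter()
    await a.connect()
    result = await a.send("chan-1", "hello", metadata={"thread_id": "t9"})
    await a.disconnect()
    return a, result

adapter, result = asyncio.run(main())
```

Everything the routing layer needs is visible here; anything platform-specific rides in `metadata`.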
---
class BasePlatformAdapter(ABC):
"""
Base class for platform adapters.
Subclasses implement platform-specific logic for:
- Connecting and authenticating
- Receiving messages
- Sending messages/responses
- Handling media
"""
def __init__(self, config: PlatformConfig, platform: Platform):
self.config = config
self.platform = platform
self._message_handler: Optional[MessageHandler] = None
self._running = False
self._fatal_error_code: Optional[str] = None
self._fatal_error_message: Optional[str] = None
self._fatal_error_retryable = True
self._fatal_error_handler: Optional[Callable[["BasePlatformAdapter"], Awaitable[None] | None]] = None
# Track active message handlers per session for interrupt support.
# _active_sessions stores the per-session interrupt Event; _session_tasks
# maps session → the specific Task currently processing it so that
# session-terminating commands (/stop, /new, /reset) can cancel the
# right task and release the adapter-level guard deterministically.
# Without the owner-task map, an old task's finally block could delete
# a newer task's guard, leaving stale busy state.
self._active_sessions: Dict[str, asyncio.Event] = {}
self._pending_messages: Dict[str, MessageEvent] = {}
self._session_tasks: Dict[str, asyncio.Task] = {}
# Background message-processing tasks spawned by handle_message().
# Gateway shutdown cancels these so an old gateway instance doesn't keep
# working on a task after --replace or manual restarts.
self._background_tasks: set[asyncio.Task] = set()
# One-shot callbacks to fire after the main response is delivered.
# Keyed by session_key. Values are either a bare callback (legacy) or
# a ``(generation, callback)`` tuple so GatewayRunner can make deferred
# deliveries generation-aware and avoid stale runs clearing callbacks
# registered by a fresher run for the same session.
self._post_delivery_callbacks: Dict[str, Any] = {}
self._expected_cancelled_tasks: set[asyncio.Task] = set()
self._busy_session_handler: Optional[Callable[[MessageEvent, str], Awaitable[bool]]] = None
# Auto-TTS on voice input: ``_auto_tts_default`` is the global default
# (``voice.auto_tts`` in config.yaml, pushed by GatewayRunner on connect).
# Per-chat overrides live in two sets populated from ``_voice_mode``:
# - ``_auto_tts_enabled_chats``: chat explicitly opted in via ``/voice on``
# or ``/voice tts`` (mode is ``voice_only`` or ``all``). Fires even when
# the global default is False.
# - ``_auto_tts_disabled_chats``: chat explicitly opted out via
# ``/voice off`` (mode is ``off``). Suppresses auto-TTS even when the
# global default is True.
# The gate in _process_message() is:
# fire if chat in _auto_tts_enabled_chats
# OR (_auto_tts_default and chat not in _auto_tts_disabled_chats)
self._auto_tts_default: bool = False
self._auto_tts_enabled_chats: set = set()
self._auto_tts_disabled_chats: set = set()
# Chats where typing indicator is paused (e.g. during approval waits).
# _keep_typing skips send_typing when the chat_id is in this set.
self._typing_paused: set = set()
@property
def has_fatal_error(self) -> bool:
return self._fatal_error_message is not None
@property
def fatal_error_message(self) -> Optional[str]:
return self._fatal_error_message
@property
def fatal_error_code(self) -> Optional[str]:
return self._fatal_error_code
@property
def fatal_error_retryable(self) -> bool:
return self._fatal_error_retryable
def _should_auto_tts_for_chat(self, chat_id: str) -> bool:
"""Whether auto-TTS on voice input should fire for ``chat_id``.
Decision layers (Issue #16007):
1. Explicit ``/voice on`` or ``/voice tts`` → always fire (even if
``voice.auto_tts`` is False).
2. Explicit ``/voice off`` → never fire.
3. Fall back to the global ``voice.auto_tts`` config default.
"""
if chat_id in self._auto_tts_enabled_chats:
return True
if chat_id in self._auto_tts_disabled_chats:
return False
return bool(self._auto_tts_default)
def set_fatal_error_handler(self, handler: Callable[["BasePlatformAdapter"], Awaitable[None] | None]) -> None:
self._fatal_error_handler = handler
def _mark_connected(self) -> None:
self._running = True
self._fatal_error_code = None
self._fatal_error_message = None
self._fatal_error_retryable = True
try:
from gateway.status import write_runtime_status
write_runtime_status(platform=self.platform.value, platform_state="connected", error_code=None, error_message=None)
except Exception:
pass
def _mark_disconnected(self) -> None:
self._running = False
if self.has_fatal_error:
return
try:
from gateway.status import write_runtime_status
write_runtime_status(platform=self.platform.value, platform_state="disconnected", error_code=None, error_message=None)
except Exception:
pass
def _set_fatal_error(self, code: str, message: str, *, retryable: bool) -> None:
self._running = False
self._fatal_error_code = code
self._fatal_error_message = message
self._fatal_error_retryable = retryable
try:
from gateway.status import write_runtime_status
write_runtime_status(
platform=self.platform.value,
platform_state="fatal",
error_code=code,
error_message=message,
)
except Exception:
pass
async def _notify_fatal_error(self) -> None:
handler = self._fatal_error_handler
if not handler:
return
result = handler(self)
if asyncio.iscoroutine(result):
await result
def _acquire_platform_lock(self, scope: str, identity: str, resource_desc: str) -> bool:
"""Acquire a scoped lock for this adapter. Returns True on success."""
from gateway.status import acquire_scoped_lock
self._platform_lock_scope = scope
self._platform_lock_identity = identity
acquired, existing = acquire_scoped_lock(
scope, identity, metadata={'platform': self.platform.value}
)
if acquired:
return True
owner_pid = existing.get('pid') if isinstance(existing, dict) else None
message = (
f'{resource_desc} already in use'
+ (f' (PID {owner_pid})' if owner_pid else '')
+ '. Stop the other gateway first.'
)
logger.error('[%s] %s', self.name, message)
self._set_fatal_error(f'{scope}_lock', message, retryable=False)
return False
def _release_platform_lock(self) -> None:
"""Release the scoped lock acquired by _acquire_platform_lock."""
identity = getattr(self, '_platform_lock_identity', None)
if not identity:
return
from gateway.status import release_scoped_lock
release_scoped_lock(self._platform_lock_scope, identity)
self._platform_lock_identity = None
@property
def name(self) -> str:
"""Human-readable name for this adapter."""
return self.platform.value.title()
@property
def is_connected(self) -> bool:
"""Check if adapter is currently connected."""
return self._running
def set_message_handler(self, handler: MessageHandler) -> None:
"""
Set the handler for incoming messages.
The handler receives a MessageEvent and should return
an optional response string.
"""
self._message_handler = handler
def set_busy_session_handler(self, handler: Optional[Callable[[MessageEvent, str], Awaitable[bool]]]) -> None:
"""Set an optional handler for messages arriving during active sessions."""
self._busy_session_handler = handler
def set_session_store(self, session_store: Any) -> None:
"""
Set the session store for checking active sessions.
Used by adapters that need to check if a thread/conversation
has an active session before processing messages (e.g., Slack
thread replies without explicit mentions).
"""
self._session_store = session_store
@abstractmethod
async def connect(self) -> bool:
"""
Connect to the platform and start receiving messages.
Returns True if connection was successful.
"""
pass
@abstractmethod
async def disconnect(self) -> None:
"""Disconnect from the platform."""
pass
@abstractmethod
async def send(
self,
chat_id: str,
content: str,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None
) -> SendResult:
"""
Send a message to a chat.
Args:
chat_id: The chat/channel ID to send to
content: Message content (may be markdown)
reply_to: Optional message ID to reply to
metadata: Additional platform-specific options
Returns:
SendResult with success status and message ID
"""
pass
# Default: the adapter treats ``finalize=True`` on edit_message as a
    # no-op and is happy to have the stream consumer skip redundant final
Session Resolution
gateway/session.py:583
How build_session_key and get_or_create_session map a platform message to a shared conversation
build_session_key produces human-readable strings like agent:main:telegram:dm:123456789 for a Telegram DM, or agent:main:discord:group:987654321:user42 for a Discord channel with per-user isolation. The format is stable as both a dict key and a SQLite primary key, which is what makes cross-platform pairing possible: two platform-specific keys can resolve to the same underlying session.
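That dual role is easy to demonstrate: the same string indexes an in-memory dict and serves as a SQLite primary key, with no translation layer in between:

```python
import sqlite3

# One deterministic key, two storage layers.
key = "agent:main:telegram:dm:123456789"

# In-memory routing table.
sessions = {key: {"turns": 3}}

# The identical string as a SQLite primary key.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sessions (key TEXT PRIMARY KEY, turns INTEGER)")
db.execute("INSERT INTO sessions VALUES (?, ?)", (key, sessions[key]["turns"]))
row = db.execute("SELECT turns FROM sessions WHERE key = ?", (key,)).fetchone()
```

Because the key is derived, not generated, any component that sees the same (platform, chat_id, user_id) tuple reconstructs the same identity independently.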
The WhatsApp special case (canonical_whatsapp_identifier) exists because WhatsApp bridges sometimes switch between a user's JID and LID form mid-conversation. Without canonicalization, a single user accumulates two separate sessions after an alias-reshuffle.
The thread isolation defaults reflect different social expectations:
- Threads (thread_sessions_per_user=False): shared by default — a Slack thread is a group discussion.
- Groups (group_sessions_per_user=True): isolated by default — three people in a Discord channel should not inherit each other's conversation state.
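The effect of the two flags can be seen in a condensed stand-in for the group/thread branch. This is a sketch of the behavior described above, not the real `build_session_key`:

```python
from typing import Optional

def group_key(platform: str, chat_id: str, user_id: str,
              thread_id: Optional[str] = None,
              group_sessions_per_user: bool = True,
              thread_sessions_per_user: bool = False) -> str:
    """Sketch of the group/thread key shape under the two isolation flags."""
    base = f"agent:main:{platform}:group:{chat_id}"
    if thread_id:
        key = f"{base}:{thread_id}"
        # Threads are shared unless per-user thread isolation is enabled.
        return f"{key}:{user_id}" if thread_sessions_per_user else key
    # Plain groups isolate per participant by default.
    return f"{base}:{user_id}" if group_sessions_per_user else base

# Group channel: each participant gets their own session.
k1 = group_key("discord", "987", "alice")
k2 = group_key("discord", "987", "bob")

# Thread: everyone shares one session.
t1 = group_key("slack", "C1", "alice", thread_id="ts-100")
t2 = group_key("slack", "C1", "bob", thread_id="ts-100")
```

Flipping either flag only changes whether `user_id` is appended; the rest of the key stays deterministic.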
get_or_create_session runs inside a mutex to prevent races when two messages from the same user arrive milliseconds apart. The resume_pending flag handles crash recovery: the next message reloads the transcript intact rather than starting a fresh session.
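The race guard amounts to making check-then-create atomic. A minimal sketch with `asyncio.Lock` (the real store also handles `resume_pending` and transcript reload, omitted here):

```python
import asyncio

class SessionStore:
    """Sketch: serialize get-or-create so near-simultaneous messages
    for the same key resolve to a single session object."""

    def __init__(self) -> None:
        self._sessions: dict = {}
        self._lock = asyncio.Lock()
        self.creates = 0

    async def get_or_create(self, key: str) -> dict:
        async with self._lock:
            if key not in self._sessions:
                await asyncio.sleep(0)   # yield inside the critical section
                self._sessions[key] = {"key": key, "history": []}
                self.creates += 1
            return self._sessions[key]

async def main():
    store = SessionStore()
    # Two messages for the same user arrive milliseconds apart.
    a, b = await asyncio.gather(
        store.get_or_create("agent:main:telegram:dm:1"),
        store.get_or_create("agent:main:telegram:dm:1"),
    )
    return store, a, b

store, a, b = asyncio.run(main())
```

Without the lock, both coroutines could pass the membership check before either inserts, leaving the second message attached to an orphaned session dict.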
Session identity is a deterministic string, not an opaque UUID — which is precisely what lets cross-platform pairing point two different platform keys at the same underlying conversation.
---
def build_session_key(
source: SessionSource,
group_sessions_per_user: bool = True,
thread_sessions_per_user: bool = False,
) -> str:
"""Build a deterministic session key from a message source.
This is the single source of truth for session key construction.
DM rules:
- DMs include chat_id when present, so each private conversation is isolated.
- thread_id further differentiates threaded DMs within the same DM chat.
- Without chat_id, thread_id is used as a best-effort fallback.
- Without thread_id or chat_id, DMs share a single session.
Group/channel rules:
- chat_id identifies the parent group/channel.
    - user_id/user_id_alt isolates participants within that parent chat when available and
      ``group_sessions_per_user`` is enabled.
- thread_id differentiates threads within that parent chat. When
``thread_sessions_per_user`` is False (default), threads are *shared* across all
participants — user_id is NOT appended, so every user in the thread
shares a single session. This is the expected UX for threaded
conversations (Telegram forum topics, Discord threads, Slack threads).
- Without participant identifiers, or when isolation is disabled, messages fall back to one
shared session per chat.
- Without identifiers, messages fall back to one session per platform/chat_type.
"""
platform = source.platform.value
if source.chat_type == "dm":
dm_chat_id = source.chat_id
if source.platform == Platform.WHATSAPP:
dm_chat_id = canonical_whatsapp_identifier(source.chat_id)
if dm_chat_id:
if source.thread_id:
return f"agent:main:{platform}:dm:{dm_chat_id}:{source.thread_id}"
return f"agent:main:{platform}:dm:{dm_chat_id}"
if source.thread_id:
return f"agent:main:{platform}:dm:{source.thread_id}"
return f"agent:main:{platform}:dm"
participant_id = source.user_id_alt or source.user_id
if participant_id and source.platform == Platform.WHATSAPP:
# Same JID/LID-flip bug as the DM case: without canonicalisation, a
# single group member gets two isolated per-user sessions when the
# bridge reshuffles alias forms.
participant_id = canonical_whatsapp_identifier(str(participant_id)) or participant_id
key_parts = ["agent:main", platform, source.chat_type]
if source.chat_id:
key_parts.append(source.chat_id)
if source.thread_id:
key_parts.append(source.thread_id)
# In threads, default to shared sessions (all participants see the same
# conversation). Per-user isolation only applies when explicitly enabled
# via thread_sessions_per_user, or when there is no thread (regular group).
isolate_user = group_sessions_per_user
if source.thread_id and not thread_sessions_per_user:
isolate_user = False
if isolate_user and participant_id:
key_parts.append(str(participant_id))
return ":".join(key_parts)
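The rules above can be condensed into a dependency-free sketch for experimentation (WhatsApp canonicalization omitted; `Source` here is a stand-in for illustration, not the real Hermes `SessionSource` dataclass):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Source:
    platform: str
    chat_type: str                      # "dm", "group", "channel", ...
    chat_id: Optional[str] = None
    user_id: Optional[str] = None
    thread_id: Optional[str] = None

def session_key(src: Source, group_per_user: bool = True,
                thread_per_user: bool = False) -> str:
    if src.chat_type == "dm":
        parts = ["agent:main", src.platform, "dm"]
        if src.chat_id:
            parts.append(src.chat_id)
            if src.thread_id:
                parts.append(src.thread_id)
        elif src.thread_id:             # best-effort fallback without chat_id
            parts.append(src.thread_id)
        return ":".join(parts)
    parts = ["agent:main", src.platform, src.chat_type]
    if src.chat_id:
        parts.append(src.chat_id)
    if src.thread_id:
        parts.append(src.thread_id)
    # Per-user isolation: on for plain groups, off inside threads by default.
    isolate = group_per_user and not (src.thread_id and not thread_per_user)
    if isolate and src.user_id:
        parts.append(src.user_id)
    return ":".join(parts)

# The two examples from the prose:
assert session_key(Source("telegram", "dm", chat_id="123456789")) \
    == "agent:main:telegram:dm:123456789"
assert session_key(Source("discord", "group", chat_id="987654321", user_id="user42")) \
    == "agent:main:discord:group:987654321:user42"
# Threads are shared by default: user_id is NOT appended inside a thread.
assert session_key(Source("slack", "group", chat_id="C01", user_id="u1", thread_id="t9")) \
    == "agent:main:slack:group:C01:t9"
```

Running the assertions confirms the isolation semantics: the same user in a plain group and in a thread under that group lands in two different sessions, and the thread session is shared with everyone else in the thread.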
def get_or_create_session(
self,
source: SessionSource,
force_new: bool = False
) -> SessionEntry:
"""
Get an existing session or create a new one.
Evaluates reset policy to determine if the existing session is stale.
Creates a session record in SQLite when a new session starts.
"""
session_key = self._generate_session_key(source)
now = _now()
# SQLite calls are made outside the lock to avoid holding it during I/O.
# All _entries / _loaded mutations are protected by self._lock.
db_end_session_id = None
db_create_kwargs = None
with self._lock:
self._ensure_loaded_locked()
if session_key in self._entries and not force_new:
entry = self._entries[session_key]
# Auto-reset sessions marked as suspended (e.g. after /stop
# broke a stuck loop — #7536). ``suspended`` is the hard
# forced-wipe signal and always wins over ``resume_pending``,
# so repeated interrupted restarts that escalate via the
# existing ``.restart_failure_counts`` stuck-loop counter
# still converge to a clean slate.
if entry.suspended:
reset_reason = "suspended"
elif entry.resume_pending:
# Restart-interrupted session: preserve the session_id
# and return the existing entry so the transcript
# reloads intact. ``resume_pending`` is cleared after
# the NEXT successful turn completes (not here), which
# means a re-interrupted retry keeps trying — the
# stuck-loop counter handles terminal escalation.
entry.updated_at = now
self._save()
return entry
else:
reset_reason = self._should_reset(entry, source)
if not reset_reason:
entry.updated_at = now
self._save()
return entry
else:
# Session is being auto-reset.
was_auto_reset = True
auto_reset_reason = reset_reason
# Track whether the expired session had any real conversation
reset_had_activity = entry.total_tokens > 0
db_end_session_id = entry.session_id
else:
was_auto_reset = False
auto_reset_reason = None
reset_had_activity = False
# Create new session
session_id = f"{now.strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:8]}"
entry = SessionEntry(
session_key=session_key,
session_id=session_id,
created_at=now,
updated_at=now,
origin=source,
display_name=source.chat_name,
platform=source.platform,
chat_type=source.chat_type,
was_auto_reset=was_auto_reset,
auto_reset_reason=auto_reset_reason,
reset_had_activity=reset_had_activity,
)
self._entries[session_key] = entry
self._save()
db_create_kwargs = {
"session_id": session_id,
"source": source.platform.value,
"user_id": source.user_id,
}
# SQLite operations outside the lock
if self._db and db_end_session_id:
try:
self._db.end_session(db_end_session_id, "session_reset")
except Exception as e:
logger.debug("Session DB operation failed: %s", e)
if self._db and db_create_kwargs:
try:
self._db.create_session(**db_create_kwargs)
except Exception as e:
print(f"[gateway] Warning: Failed to create SQLite session: {e}")
return entry
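The lock-scoping comment above describes a general pattern worth isolating: decide what I/O is needed while holding the lock, then perform it only after release. A minimal sketch of that pattern (names are illustrative, not the real Hermes classes):

```python
import threading

class Store:
    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}
        self.db_log = []                  # stands in for the SQLite layer

    def get_or_create(self, key):
        pending_create = None
        with self._lock:                  # fast path: in-memory mutation only
            entry = self._entries.get(key)
            if entry is None:
                entry = self._entries[key] = {"key": key}
                pending_create = key      # record the deferred write
        if pending_create is not None:    # slow path: I/O outside the lock
            self.db_log.append(("create_session", pending_create))
        return entry

store = Store()
store.get_or_create("agent:main:telegram:dm:1")
store.get_or_create("agent:main:telegram:dm:1")   # cache hit: no second write
assert store.db_log == [("create_session", "agent:main:telegram:dm:1")]
```

The payoff is the same as in `get_or_create_session`: two near-simultaneous messages cannot race to create duplicate entries, yet a slow database call never blocks other sessions waiting on the lock.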
def update_session(
Inbound Message Flow
gateway/run.py:3749
How a raw platform event travels through authorization, session lookup, and agent dispatch
_handle_message is the single entry point for every inbound event, regardless of platform. By the time a message arrives here, the platform adapter has already normalized it: a Telegram Update, a Discord Message, and a Slack event_callback are all the same MessageEvent dataclass with a source: SessionSource and a text: str.
The pre_gateway_dispatch hook runs before authorization by design. A plugin can handle a customer-support handover — receiving messages from users not yet in the allowlist — without hitting the pairing gate. The three possible responses cover the full integration surface:
- skip — swallow and handle elsewhere
- rewrite — transform the text, then continue
- allow/None — pass through unchanged
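A hypothetical plugin sketch for this contract — the hook name and return shape come from the gateway code, but the event fields it reads and the "support-intake" chat ID are assumptions for illustration:

```python
def pre_gateway_dispatch(event, gateway=None, session_store=None, **kwargs):
    text = event.text or ""
    if getattr(event.source, "chat_id", None) == "support-intake":
        # Unauthorized sender handled entirely by this plugin:
        # no reply, and crucially no pairing flow.
        return {"action": "skip", "reason": "support handover"}
    if text.startswith("@hermes "):
        # Strip the mention prefix, then continue normal dispatch.
        return {"action": "rewrite", "text": text[len("@hermes "):]}
    return None  # equivalent to {"action": "allow"}

# Quick check with stub objects carrying only the fields the hook reads:
class _Source: chat_id = "support-intake"
class _Event: source = _Source(); text = "hello"
assert pre_gateway_dispatch(_Event()) == {"action": "skip", "reason": "support handover"}
```

Because the gateway iterates over all hook results and stops at the first `skip` or `rewrite`, a plugin that returns `None` is invisible to the rest of the pipeline.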
When authorization fails for a DM, the gateway calls pairing_store.generate_code() and sends the user a one-time code rather than silently dropping the message. The rate limit applies to the response itself (one per user per 10 minutes), so a spamming user receives at most one reply per window.
_handle_message_with_agent is the inner continuation after auth passes. The run_generation counter is a monotonic integer per session key — if a second message interrupts the first before the agent finishes, the stale result is discarded.
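The counter mechanics can be sketched in a few lines (names are illustrative; the real gateway tracks this state alongside its `_running_agents` guard):

```python
generations = {}   # session_key -> latest generation number

def start_run(session_key):
    generations[session_key] = generations.get(session_key, 0) + 1
    return generations[session_key]

def is_current(session_key, run_generation):
    return generations.get(session_key) == run_generation

key = "agent:main:telegram:dm:123"
first = start_run(key)
second = start_run(key)            # a second message interrupts the first
assert not is_current(key, first)  # stale result: discarded, never sent
assert is_current(key, second)
```

The check is deliberately cheap — a dict lookup and integer compare — because it runs on every agent completion, and only the run whose snapshot still matches the latest generation is allowed to deliver its reply.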
The inbound pipeline enforces a fixed order: normalize -> plugin hook -> authorization -> session lookup. Platform-specific behavior cannot short-circuit the security or session logic.
---
async def _handle_message(self, event: MessageEvent) -> Optional[str]:
"""
Handle an incoming message from any platform.
This is the core message processing pipeline:
1. Check user authorization
2. Check for commands (/new, /reset, etc.)
3. Check for running agent and interrupt if needed
4. Get or create session
5. Build context for agent
6. Run agent conversation
7. Return response
"""
source = event.source
# Internal events (e.g. background-process completion notifications)
# are system-generated and must skip user authorization.
is_internal = bool(getattr(event, "internal", False))
# Fire pre_gateway_dispatch plugin hook for user-originated messages.
# Plugins receive the MessageEvent and may return a dict influencing flow:
# {"action": "skip", "reason": ...} -> drop (no reply, plugin handled)
# {"action": "rewrite", "text": ...} -> replace event.text, continue
# {"action": "allow"} / None -> normal dispatch
# Hook runs BEFORE auth so plugins can handle unauthorized senders
# (e.g. customer handover ingest) without triggering the pairing flow.
if not is_internal:
try:
from hermes_cli.plugins import invoke_hook as _invoke_hook
_hook_results = _invoke_hook(
"pre_gateway_dispatch",
event=event,
gateway=self,
session_store=self.session_store,
)
except Exception as _hook_exc:
logger.warning("pre_gateway_dispatch invocation failed: %s", _hook_exc)
_hook_results = []
for _result in _hook_results:
if not isinstance(_result, dict):
continue
_action = _result.get("action")
if _action == "skip":
logger.info(
"pre_gateway_dispatch skip: reason=%s platform=%s chat=%s",
_result.get("reason"),
source.platform.value if source.platform else "unknown",
source.chat_id or "unknown",
)
return None
if _action == "rewrite":
_new_text = _result.get("text")
if isinstance(_new_text, str):
event = dataclasses.replace(event, text=_new_text)
source = event.source
break
if _action == "allow":
break
if is_internal:
pass
elif source.user_id is None:
# Messages with no user identity (Telegram service messages,
# channel forwards, anonymous admin actions) cannot be
# authorized — drop silently instead of triggering the pairing
# flow with a None user_id.
logger.debug("Ignoring message with no user_id from %s", source.platform.value)
return None
elif not self._is_user_authorized(source):
logger.warning("Unauthorized user: %s (%s) on %s", source.user_id, source.user_name, source.platform.value)
# In DMs: offer pairing code. In groups: silently ignore.
if source.chat_type == "dm" and self._get_unauthorized_dm_behavior(source.platform) == "pair":
platform_name = source.platform.value if source.platform else "unknown"
# Rate-limit ALL pairing responses (code or rejection) to
# prevent spamming the user with repeated messages when
# multiple DMs arrive in quick succession.
if self.pairing_store._is_rate_limited(platform_name, source.user_id):
return None
code = self.pairing_store.generate_code(
platform_name, source.user_id, source.user_name or ""
)
if code:
adapter = self.adapters.get(source.platform)
if adapter:
await adapter.send(
source.chat_id,
f"Hi~ I don't recognize you yet!\n\n"
f"Here's your pairing code: `{code}`\n\n"
f"Ask the bot owner to run:\n"
f"`hermes pairing approve {platform_name} {code}`"
)
else:
adapter = self.adapters.get(source.platform)
if adapter:
await adapter.send(
source.chat_id,
async def _handle_message_with_agent(self, event, source, _quick_key: str, run_generation: int):
"""Inner handler that runs under the _running_agents sentinel guard."""
_msg_start_time = time.time()
_platform_name = source.platform.value if hasattr(source.platform, "value") else str(source.platform)
_msg_preview = (event.text or "")[:80].replace("\n", " ")
logger.info(
"inbound message: platform=%s user=%s chat=%s msg=%r",
_platform_name, source.user_name or source.user_id or "unknown",
source.chat_id or "unknown", _msg_preview,
)
# Get or create session
session_entry = self.session_store.get_or_create_session(source)
session_key = session_entry.session_key
if getattr(session_entry, "was_auto_reset", False):
# Treat auto-reset as a full conversation boundary — drop every
# session-scoped transient state so the fresh session does not
# inherit the previous conversation's model/reasoning overrides
# or a queued "/model switched" note.
self._session_model_overrides.pop(session_key, None)
self._set_session_reasoning_override(session_key, None)
if hasattr(self, "_pending_model_notes"):
self._pending_model_notes.pop(session_key, None)
# Emit session:start for new or auto-reset sessions
_is_new_session = (
session_entry.created_at == session_entry.updated_at
or getattr(session_entry, "was_auto_reset", False)
)
if _is_new_session:
await self.hooks.emit("session:start", {
"platform": source.platform.value if source.platform else "",
"user_id": source.user_id,
"session_id": session_entry.session_id,
"session_key": session_key,
})
# Build session context
context = build_session_context(source, self.config, session_entry)
Outbound Message Flow
gateway/run.py:5382
How a finished agent response reaches the user through the right platform transport
Both outbound paths — interactive replies and scheduled cron deliveries — converge on a single call to adapter.send(chat_id, content, metadata=...). The routing layer never knows whether it is calling bot.send_message(), channel.send(), or client.chat_postMessage().
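A minimal sketch of that single send contract — the adapter class names and the SDK calls in the comments are assumptions for illustration, not the real Hermes adapters:

```python
import asyncio

class TelegramAdapter:
    async def send(self, chat_id, content, metadata=None):
        # a real adapter would call bot.send_message(chat_id, content) here
        return ("telegram", chat_id)

class DiscordAdapter:
    async def send(self, chat_id, content, metadata=None):
        # a real adapter would call channel.send(content) here
        return ("discord", chat_id)

async def deliver(adapters, platform, chat_id, content):
    # The routing layer sees only the shared contract, never the SDK.
    return await adapters[platform].send(chat_id, content)

adapters = {"telegram": TelegramAdapter(), "discord": DiscordAdapter()}
result = asyncio.run(deliver(adapters, "discord", "987", "hi"))
assert result == ("discord", "987")
```

This is ordinary duck typing rather than inheritance-enforced polymorphism: as long as an object exposes the `send(chat_id, content, metadata=...)` shape, the router can dispatch to it.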
For conversational replies, _handle_message_with_agent gets the agent's final_response, stops the typing indicator before returning (Telegram and Discord both show a "typing..." animation while the agent runs), then hands the string to the platform adapter's message handler. The typing stop lives in the gateway layer, not inside _run_agent(), because the agent has no reference to the adapter that started the indicator.
For cron and scheduled deliveries, DeliveryRouter._deliver_to_platform resolves the adapter, applies a length guard (oversized output is saved to ~/.hermes/cron/output/ and replaced with a truncation notice), merges thread_id into the metadata dict, then calls the same adapter.send() contract. The metadata dict carries platform-specific options — thread_id for Slack, silently ignored by Telegram — without any special-casing in the delivery layer.
Every reply converges on one call to adapter.send() — typing indicators, thread handling, and length guards are all resolved in the gateway layer before that call, so the delivery path stays platform-agnostic.
---
# Run the agent
agent_result = await self._run_agent(
message=message_text,
context_prompt=context_prompt,
history=history,
source=source,
session_id=session_entry.session_id,
session_key=session_key,
run_generation=run_generation,
event_message_id=event.message_id,
channel_prompt=event.channel_prompt,
)
# Stop persistent typing indicator now that the agent is done
try:
_typing_adapter = self.adapters.get(source.platform)
if _typing_adapter and hasattr(_typing_adapter, "stop_typing"):
await _typing_adapter.stop_typing(source.chat_id)
except Exception:
pass
if not self._is_session_run_current(_quick_key, run_generation):
logger.info(
"Discarding stale agent result for %s — generation %d is no longer current",
_quick_key or "?",
run_generation,
)
_stale_adapter = self.adapters.get(source.platform)
if getattr(type(_stale_adapter), "pop_post_delivery_callback", None) is not None:
_stale_adapter.pop_post_delivery_callback(
_quick_key,
generation=run_generation,
)
elif _stale_adapter and hasattr(_stale_adapter, "_post_delivery_callbacks"):
_stale_adapter._post_delivery_callbacks.pop(_quick_key, None)
return None
response = agent_result.get("final_response") or ""
# Convert the agent's internal "(empty)" sentinel into a
# user-friendly message. "(empty)" means the model failed to
# produce visible content after exhausting all retries (nudge,
# prefill, empty-retry, fallback). Sending the raw sentinel
# looks like a bug; a short explanation is more helpful.
class DeliveryRouter:
"""
Routes messages to appropriate destinations.
Handles the logic of resolving delivery targets and dispatching
messages to the right platform adapters.
"""
def __init__(self, config: GatewayConfig, adapters: Dict[Platform, Any] = None):
"""
Initialize the delivery router.
Args:
config: Gateway configuration
adapters: Dict mapping platforms to their adapter instances
"""
self.config = config
self.adapters = adapters or {}
self.output_dir = get_hermes_home() / "cron" / "output"
async def deliver(
self,
content: str,
targets: List[DeliveryTarget],
job_id: Optional[str] = None,
job_name: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
"""
Deliver content to all specified targets.
Args:
content: The message/output to deliver
targets: List of delivery targets
job_id: Optional job ID (for cron jobs)
job_name: Optional job name
metadata: Additional metadata to include
Returns:
Dict with delivery results per target
"""
results = {}
for target in targets:
try:
if target.platform == Platform.LOCAL:
result = self._deliver_local(content, job_id, job_name, metadata)
else:
result = await self._deliver_to_platform(target, content, metadata)
results[target.to_string()] = {
"success": True,
"result": result
}
except Exception as e:
results[target.to_string()] = {
"success": False,
"error": str(e)
}
return results
def _deliver_local(
self,
content: str,
job_id: Optional[str],
job_name: Optional[str],
metadata: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
"""Save content to local files."""
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
if job_id:
output_path = self.output_dir / job_id / f"{timestamp}.md"
else:
output_path = self.output_dir / "misc" / f"{timestamp}.md"
output_path.parent.mkdir(parents=True, exist_ok=True)
# Build the output document
lines = []
if job_name:
lines.append(f"# {job_name}")
else:
lines.append("# Delivery Output")
lines.append("")
lines.append(f"**Timestamp:** {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
if job_id:
lines.append(f"**Job ID:** {job_id}")
if metadata:
for key, value in metadata.items():
lines.append(f"**{key}:** {value}")
lines.append("")
lines.append("---")
lines.append("")
lines.append(content)
output_path.write_text("\n".join(lines))
return {
"path": str(output_path),
"timestamp": timestamp
}
def _save_full_output(self, content: str, job_id: str) -> Path:
"""Save full cron output to disk and return the file path."""
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
out_dir = get_hermes_home() / "cron" / "output"
out_dir.mkdir(parents=True, exist_ok=True)
path = out_dir / f"{job_id}_{timestamp}.txt"
path.write_text(content)
return path
async def _deliver_to_platform(
self,
target: DeliveryTarget,
content: str,
metadata: Optional[Dict[str, Any]]
) -> Dict[str, Any]:
"""Deliver content to a messaging platform."""
adapter = self.adapters.get(target.platform)
if not adapter:
raise ValueError(f"No adapter configured for {target.platform.value}")
if not target.chat_id:
raise ValueError(f"No chat ID for {target.platform.value} delivery")
# Guard: truncate oversized cron output to stay within platform limits
if len(content) > MAX_PLATFORM_OUTPUT:
job_id = (metadata or {}).get("job_id", "unknown")
saved_path = self._save_full_output(content, job_id)
logger.info("Cron output truncated (%d chars) — full output: %s", len(content), saved_path)
content = (
content[:TRUNCATED_VISIBLE]
+ f"\n\n... [truncated, full output saved to {saved_path}]"
)
send_metadata = dict(metadata or {})
if target.thread_id and "thread_id" not in send_metadata:
send_metadata["thread_id"] = target.thread_id
return await adapter.send(target.chat_id, content, metadata=send_metadata or None)
Cross-Platform Pairing
gateway/pairing.py:35
How a user links their Telegram and Discord identities to share session state
Pairing replaces static allowlists with a one-time-code flow. A new Discord user messages the bot; generate_code() picks 8 characters from a 32-character unambiguous alphabet (no 0/O/1/I) using secrets.choice() and sends the code to the user. The bot owner runs hermes pairing approve discord XXXXXXXX, which calls approve_code() — that moves the entry from discord-pending.json to discord-approved.json. Subsequent messages from that user pass _is_user_authorized() and enter the normal pipeline.
The same flow enables cross-platform session continuity. When a Telegram-approved user pairs their Discord identity, build_session_key produces matching keys for both platforms, and get_or_create_session returns the same SessionEntry regardless of which platform sends the next message.
The security model addresses several attack surfaces:
- Codes expire after 1 hour (CODE_TTL_SECONDS = 3600)
- At most 3 pending codes per platform at once
- 5 failed approval attempts trigger a 1-hour platform-scoped lockout
The lockout is platform-scoped rather than user-scoped because repeated failures indicate code-guessing against the approval endpoint, not per-user abuse. All data files are written atomically via temp-file-plus-rename with chmod 0600, so a crash mid-write never leaves a partial JSON file.
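The code format is worth seeing standalone: 32 unambiguous characters drawn with the `secrets` module (a cryptographic RNG), 8 characters per code.

```python
import secrets

ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"   # 32 chars, no 0/O/1/I
code = "".join(secrets.choice(ALPHABET) for _ in range(8))
assert len(code) == 8 and all(c in ALPHABET for c in code)

# 32**8 = 1,099,511,627,776 possible codes. With 5 guesses allowed per
# 1-hour lockout and a 1-hour TTL on each code, guessing a live code
# before it expires is statistically hopeless.
assert 32 ** 8 == 1_099_511_627_776
```

The restricted alphabet trades a little entropy per character for codes a user can read aloud over the phone without 0/O or 1/I confusion.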
A bot-owner-approved code is the only mechanism that links a new platform user ID to an existing session — giving Hermes a secure, consistent onboarding flow across all supported platforms.
---
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"
CODE_LENGTH = 8
# Timing constants
CODE_TTL_SECONDS = 3600 # Codes expire after 1 hour
RATE_LIMIT_SECONDS = 600 # 1 request per user per 10 minutes
LOCKOUT_SECONDS = 3600 # Lockout duration after too many failures
# Limits
MAX_PENDING_PER_PLATFORM = 3 # Max pending codes per platform
MAX_FAILED_ATTEMPTS = 5 # Failed approvals before lockout
PAIRING_DIR = get_hermes_dir("platforms/pairing", "pairing")
def _secure_write(path: Path, data: str) -> None:
"""Write data to file with restrictive permissions (owner read/write only).
Uses a temp-file + atomic rename so readers always see either the old
complete file or the new one — never a partial write.
"""
path.parent.mkdir(parents=True, exist_ok=True)
fd, tmp_path = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
try:
with os.fdopen(fd, "w", encoding="utf-8") as f:
f.write(data)
f.flush()
os.fsync(f.fileno())
atomic_replace(tmp_path, path)
try:
os.chmod(path, 0o600)
except OSError:
pass # Windows doesn't support chmod the same way
except BaseException:
try:
os.unlink(tmp_path)
except OSError:
pass
raise
class PairingStore:
"""
Manages pairing codes and approved user lists.
Data files per platform:
- {platform}-pending.json : pending pairing requests
- {platform}-approved.json : approved (paired) users
- _rate_limits.json : rate limit tracking
"""
def __init__(self):
PAIRING_DIR.mkdir(parents=True, exist_ok=True)
# Protects all read-modify-write cycles. The gateway runs multiple
# platform adapters concurrently in threads sharing one PairingStore.
self._lock = threading.RLock()
def _pending_path(self, platform: str) -> Path:
return PAIRING_DIR / f"{platform}-pending.json"
def _approved_path(self, platform: str) -> Path:
return PAIRING_DIR / f"{platform}-approved.json"
def _rate_limit_path(self) -> Path:
return PAIRING_DIR / "_rate_limits.json"
def _load_json(self, path: Path) -> dict:
if path.exists():
try:
return json.loads(path.read_text(encoding="utf-8"))
except (json.JSONDecodeError, OSError):
return {}
return {}
def _save_json(self, path: Path, data: dict) -> None:
_secure_write(path, json.dumps(data, indent=2, ensure_ascii=False))
# ----- Approved users -----
def is_approved(self, platform: str, user_id: str) -> bool:
"""Check if a user is approved (paired) on a platform."""
approved = self._load_json(self._approved_path(platform))
return user_id in approved
def list_approved(self, platform: str = None) -> list:
"""List approved users, optionally filtered by platform."""
results = []
platforms = [platform] if platform else self._all_platforms("approved")
for p in platforms:
approved = self._load_json(self._approved_path(p))
for uid, info in approved.items():
results.append({"platform": p, "user_id": uid, **info})
return results
def _approve_user(self, platform: str, user_id: str, user_name: str = "") -> None:
"""Add a user to the approved list. Must be called under self._lock."""
approved = self._load_json(self._approved_path(platform))
approved[user_id] = {
"user_name": user_name,
"approved_at": time.time(),
}
self._save_json(self._approved_path(platform), approved)
def revoke(self, platform: str, user_id: str) -> bool:
"""Remove a user from the approved list. Returns True if found."""
path = self._approved_path(platform)
with self._lock:
approved = self._load_json(path)
if user_id in approved:
del approved[user_id]
self._save_json(path, approved)
return True
return False
# ----- Pending codes -----
def generate_code(
self, platform: str, user_id: str, user_name: str = ""
) -> Optional[str]:
"""
Generate a pairing code for a new user.
Returns the code string, or None if:
- User is rate-limited (too recent request)
- Max pending codes reached for this platform
- User/platform is in lockout due to failed attempts
"""
with self._lock:
self._cleanup_expired(platform)
# Check lockout
if self._is_locked_out(platform):
return None
# Check rate limit for this specific user
if self._is_rate_limited(platform, user_id):
return None
# Check max pending
pending = self._load_json(self._pending_path(platform))
if len(pending) >= MAX_PENDING_PER_PLATFORM:
return None
# Generate cryptographically random code
code = "".join(secrets.choice(ALPHABET) for _ in range(CODE_LENGTH))
# Store pending request
pending[code] = {
"user_id": user_id,
"user_name": user_name,
"created_at": time.time(),
}
self._save_json(self._pending_path(platform), pending)
# Record rate limit
self._record_rate_limit(platform, user_id)
return code
def approve_code(self, platform: str, code: str) -> Optional[dict]:
"""
Approve a pairing code. Adds the user to the approved list.
Returns {user_id, user_name} on success, None if code is invalid/expired.
"""
with self._lock:
self._cleanup_expired(platform)
code = code.upper().strip()
pending = self._load_json(self._pending_path(platform))
if code not in pending:
self._record_failed_attempt(platform)
return None
entry = pending.pop(code)
self._save_json(self._pending_path(platform), pending)
# Add to approved list
self._approve_user(platform, entry["user_id"], entry.get("user_name", ""))
return {
"user_id": entry["user_id"],
"user_name": entry.get("user_name", ""),
}
def list_pending(self, platform: str = None) -> list: