The architecture has been updated

This commit is contained in:
Skyber_2 2026-03-31 23:31:36 +03:00
parent 805f7a017e
commit a01257ead9
1119 changed files with 226 additions and 352 deletions


@@ -0,0 +1,8 @@
{
  "label": "Developer Guide",
  "position": 3,
  "link": {
    "type": "generated-index",
    "description": "Contribute to Hermes Agent — architecture, tools, skills, and more."
  }
}


@@ -0,0 +1,182 @@
---
sidebar_position: 2
title: "ACP Internals"
description: "How the ACP adapter works: lifecycle, sessions, event bridge, approvals, and tool rendering"
---
# ACP Internals
The ACP adapter wraps Hermes' synchronous `AIAgent` in an async JSON-RPC stdio server.
Key implementation files:
- `acp_adapter/entry.py`
- `acp_adapter/server.py`
- `acp_adapter/session.py`
- `acp_adapter/events.py`
- `acp_adapter/permissions.py`
- `acp_adapter/tools.py`
- `acp_adapter/auth.py`
- `acp_registry/agent.json`
## Boot flow
```text
hermes acp / hermes-acp / python -m acp_adapter
  -> acp_adapter.entry.main()
  -> load ~/.hermes/.env
  -> configure stderr logging
  -> construct HermesACPAgent
  -> acp.run_agent(agent)
```
Stdout is reserved for ACP JSON-RPC transport. Human-readable logs go to stderr.
## Major components
### `HermesACPAgent`
`acp_adapter/server.py` implements the ACP agent protocol.
Responsibilities:
- initialize / authenticate
- new/load/resume/fork/list/cancel session methods
- prompt execution
- session model switching
- wiring sync AIAgent callbacks into ACP async notifications
### `SessionManager`
`acp_adapter/session.py` tracks live ACP sessions.
Each session stores:
- `session_id`
- `agent`
- `cwd`
- `model`
- `history`
- `cancel_event`
The manager is thread-safe and supports:
- create
- get
- remove
- fork
- list
- cleanup
- cwd updates
### Event bridge
`acp_adapter/events.py` converts AIAgent callbacks into ACP `session_update` events.
Bridged callbacks:
- `tool_progress_callback`
- `thinking_callback`
- `step_callback`
- `message_callback`
Because `AIAgent` runs in a worker thread while ACP I/O lives on the main event loop, the bridge uses:
```python
asyncio.run_coroutine_threadsafe(...)
```
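For illustration, a bridged callback might be built like the sketch below. The `connection.session_update` call and the payload shape are hypothetical, not the actual ACP SDK surface:
```python
import asyncio

# Hypothetical helper: `connection` and the session_update payload are
# illustrative, not Hermes' real API.
def make_thinking_callback(loop: asyncio.AbstractEventLoop, connection, session_id: str):
    def thinking_callback(text: str) -> None:
        # Runs on the AIAgent worker thread; schedule the async notification
        # on the main event loop without blocking the agent.
        asyncio.run_coroutine_threadsafe(
            connection.session_update(session_id, {"type": "agent_thought_chunk", "text": text}),
            loop,
        )
    return thinking_callback
```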
### Permission bridge
`acp_adapter/permissions.py` adapts dangerous terminal approval prompts into ACP permission requests.
Mapping:
- `allow_once` -> Hermes `once`
- `allow_always` -> Hermes `always`
- reject options -> Hermes `deny`
Timeouts and bridge failures deny by default.
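A minimal sketch of that mapping, with illustrative ACP option IDs (the real logic lives in `acp_adapter/permissions.py`):
```python
# Illustrative option IDs, not the exact strings used by ACP clients.
_ACP_TO_HERMES = {
    "allow_once": "once",
    "allow_always": "always",
    "reject_once": "deny",
    "reject_always": "deny",
}

def map_permission_outcome(option_id: str | None) -> str:
    # Timeouts and bridge failures fall through to "deny".
    return _ACP_TO_HERMES.get(option_id or "", "deny")
```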
### Tool rendering helpers
`acp_adapter/tools.py` maps Hermes tools to ACP tool kinds and builds editor-facing content.
Examples:
- `patch` / `write_file` -> file diffs
- `terminal` -> shell command text
- `read_file` / `search_files` -> text previews
- large results -> truncated text blocks for UI safety
## Session lifecycle
```text
new_session(cwd)
  -> create SessionState
  -> create AIAgent(platform="acp", enabled_toolsets=["hermes-acp"])
  -> bind task_id/session_id to cwd override

prompt(..., session_id)
  -> extract text from ACP content blocks
  -> reset cancel event
  -> install callbacks + approval bridge
  -> run AIAgent in ThreadPoolExecutor
  -> update session history
  -> emit final agent message chunk
```
### Cancellation
`cancel(session_id)`:
- sets the session cancel event
- calls `agent.interrupt()` when available
- causes the prompt response to return `stop_reason="cancelled"`
### Forking
`fork_session()` deep-copies message history into a new live session, preserving conversation state while giving the fork its own session ID and cwd.
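Sketched with the `SessionState` fields listed earlier (illustrative only; the real method lives in `acp_adapter/session.py`):
```python
import copy
import threading
import uuid

# Illustrative sketch of fork_session(); field handling follows the
# SessionState fields listed above, not the exact implementation.
def fork_session(parent: "SessionState") -> "SessionState":
    return SessionState(
        session_id=str(uuid.uuid4()),            # fork gets its own session ID
        agent=parent.agent,                      # or a fresh AIAgent, per the real impl
        cwd=parent.cwd,                          # cwd can diverge afterward
        model=parent.model,
        history=copy.deepcopy(parent.history),   # deep copy so histories diverge
        cancel_event=threading.Event(),
    )
```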
## Provider/auth behavior
ACP does not implement its own auth store.
Instead it reuses Hermes' runtime resolver:
- `acp_adapter/auth.py`
- `hermes_cli/runtime_provider.py`
So ACP advertises and uses the currently configured Hermes provider/credentials.
## Working directory binding
ACP sessions carry an editor cwd.
The session manager binds that cwd to the ACP session ID via task-scoped terminal/file overrides, so file and terminal tools operate relative to the editor workspace.
## Duplicate same-name tool calls
The event bridge tracks tool call IDs in a FIFO queue per tool name, rather than a single ID per name. This is important for:
- parallel same-name calls
- repeated same-name calls in one step
Without FIFO queues, completion events would attach to the wrong tool invocation.
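A sketch of the FIFO idea (names illustrative):
```python
from collections import defaultdict, deque

# Illustrative sketch: start events enqueue an ID per tool name;
# completion events dequeue the oldest pending ID for that name.
class ToolIdTracker:
    def __init__(self) -> None:
        self._pending: dict[str, deque[str]] = defaultdict(deque)

    def on_start(self, tool_name: str, tool_call_id: str) -> None:
        self._pending[tool_name].append(tool_call_id)

    def on_complete(self, tool_name: str) -> str | None:
        queue = self._pending[tool_name]
        return queue.popleft() if queue else None
```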
## Approval callback restoration
ACP temporarily installs an approval callback on the terminal tool during prompt execution, then restores the previous callback afterward. This avoids leaving ACP session-specific approval handlers installed globally forever.
## Current limitations
- ACP sessions are process-local from the ACP server's point of view
- non-text prompt blocks are currently ignored for request text extraction
- editor-specific UX varies by ACP client implementation
## Related files
- `tests/acp/` — ACP test suite
- `toolsets.py` — `hermes-acp` toolset definition
- `hermes_cli/main.py` — `hermes acp` CLI subcommand
- `pyproject.toml` — `[acp]` optional dependency + `hermes-acp` script


@@ -0,0 +1,424 @@
---
sidebar_position: 5
title: "Adding Providers"
description: "How to add a new inference provider to Hermes Agent — auth, runtime resolution, CLI flows, adapters, tests, and docs"
---
# Adding Providers
Hermes can already talk to any OpenAI-compatible endpoint through the custom provider path. Do not add a built-in provider unless you want first-class UX for that service:
- provider-specific auth or token refresh
- a curated model catalog
- setup / `hermes model` menu entries
- provider aliases for `provider:model` syntax
- a non-OpenAI API shape that needs an adapter
If the provider is just "another OpenAI-compatible base URL and API key", a named custom provider may be enough.
## The mental model
A built-in provider has to line up across a few layers:
1. `hermes_cli/auth.py` decides how credentials are found.
2. `hermes_cli/runtime_provider.py` turns that into runtime data:
- `provider`
- `api_mode`
- `base_url`
- `api_key`
- `source`
3. `run_agent.py` uses `api_mode` to decide how requests are built and sent.
4. `hermes_cli/models.py`, `hermes_cli/main.py`, and `hermes_cli/setup.py` make the provider show up in the CLI.
5. `agent/auxiliary_client.py` and `agent/model_metadata.py` keep side tasks and token budgeting working.
The important abstraction is `api_mode`.
- Most providers use `chat_completions`.
- Codex uses `codex_responses`.
- Anthropic uses `anthropic_messages`.
- A new non-OpenAI protocol usually means adding a new adapter and a new `api_mode` branch.
## Choose the implementation path first
### Path A — OpenAI-compatible provider
Use this when the provider accepts standard chat-completions style requests.
Typical work:
- add auth metadata
- add model catalog / aliases
- add runtime resolution
- add CLI menu wiring
- add aux-model defaults
- add tests and user docs
You usually do not need a new adapter or a new `api_mode`.
### Path B — Native provider
Use this when the provider does not behave like OpenAI chat completions.
Examples in-tree today:
- `codex_responses`
- `anthropic_messages`
This path includes everything from Path A plus:
- a provider adapter in `agent/`
- `run_agent.py` branches for request building, dispatch, usage extraction, interrupt handling, and response normalization
- adapter tests
## File checklist
### Required for every built-in provider
1. `hermes_cli/auth.py`
2. `hermes_cli/models.py`
3. `hermes_cli/runtime_provider.py`
4. `hermes_cli/main.py`
5. `hermes_cli/setup.py`
6. `agent/auxiliary_client.py`
7. `agent/model_metadata.py`
8. tests
9. user-facing docs under `website/docs/`
### Additional for native / non-OpenAI providers
10. `agent/<provider>_adapter.py`
11. `run_agent.py`
12. `pyproject.toml` if a provider SDK is required
## Step 1: Pick one canonical provider id
Choose a single provider id and use it everywhere.
Examples from the repo:
- `openai-codex`
- `kimi-coding`
- `minimax-cn`
That same id should appear in:
- `PROVIDER_REGISTRY` in `hermes_cli/auth.py`
- `_PROVIDER_LABELS` in `hermes_cli/models.py`
- `_PROVIDER_ALIASES` in both `hermes_cli/auth.py` and `hermes_cli/models.py`
- CLI `--provider` choices in `hermes_cli/main.py`
- setup / model selection branches
- auxiliary-model defaults
- tests
If the id differs between those files, the provider will feel half-wired: auth may work while `/model`, setup, or runtime resolution silently misses it.
## Step 2: Add auth metadata in `hermes_cli/auth.py`
For API-key providers, add a `ProviderConfig` entry to `PROVIDER_REGISTRY` with:
- `id`
- `name`
- `auth_type="api_key"`
- `inference_base_url`
- `api_key_env_vars`
- optional `base_url_env_var`
Also add aliases to `_PROVIDER_ALIASES`.
Use the existing providers as templates:
- simple API-key path: Z.AI, MiniMax
- API-key path with endpoint detection: Kimi, Z.AI
- native token resolution: Anthropic
- OAuth / auth-store path: Nous, OpenAI Codex
Questions to answer here:
- What env vars should Hermes check, and in what priority order?
- Does the provider need base-URL overrides?
- Does it need endpoint probing or token refresh?
- What should the auth error say when credentials are missing?
If the provider needs something more than "look up an API key", add a dedicated credential resolver instead of shoving logic into unrelated branches.
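As a concrete shape, here is a hypothetical entry for a provider called `acme`. The field names are the ones listed above; check `hermes_cli/auth.py` for the exact `ProviderConfig` signature and registry shape:
```python
# Hypothetical — adjust to the real ProviderConfig signature in hermes_cli/auth.py,
# and add the entry to PROVIDER_REGISTRY following the existing entries.
ACME = ProviderConfig(
    id="acme",
    name="Acme AI",
    auth_type="api_key",
    inference_base_url="https://api.acme.example/v1",
    api_key_env_vars=["ACME_API_KEY", "ACME_KEY"],  # checked in priority order
    base_url_env_var="ACME_BASE_URL",               # optional override
)

_PROVIDER_ALIASES["acme-ai"] = "acme"
```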
## Step 3: Add model catalog and aliases in `hermes_cli/models.py`
Update the provider catalog so the provider works in menus and in `provider:model` syntax.
Typical edits:
- `_PROVIDER_MODELS`
- `_PROVIDER_LABELS`
- `_PROVIDER_ALIASES`
- provider display order inside `list_available_providers()`
- `provider_model_ids()` if the provider supports a live `/models` fetch
If the provider exposes a live model list, prefer that first and keep `_PROVIDER_MODELS` as the static fallback.
This file is also what makes inputs like these work:
```text
anthropic:claude-sonnet-4-6
kimi:model-name
```
If aliases are missing here, the provider may authenticate correctly but still fail in `/model` parsing.
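Continuing the hypothetical `acme` provider as a sketch (the exact dict shapes live in `hermes_cli/models.py`):
```python
# Hypothetical entries; mirror the real structures in hermes_cli/models.py.
_PROVIDER_MODELS["acme"] = [
    "acme-large",   # static fallback list, used when no live /models fetch exists
    "acme-small",
]
_PROVIDER_LABELS["acme"] = "Acme AI"
_PROVIDER_ALIASES["acme-ai"] = "acme"   # enables acme-ai:model-name syntax
```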
## Step 4: Resolve runtime data in `hermes_cli/runtime_provider.py`
`resolve_runtime_provider()` is the shared path used by CLI, gateway, cron, ACP, and helper clients.
Add a branch that returns a dict with at least:
```python
{
"provider": "your-provider",
"api_mode": "chat_completions", # or your native mode
"base_url": "https://...",
"api_key": "...",
"source": "env|portal|auth-store|explicit",
"requested_provider": requested_provider,
}
```
If the provider is OpenAI-compatible, `api_mode` should usually stay `chat_completions`.
Be careful with API-key precedence. Hermes already contains logic to avoid leaking an OpenRouter key to unrelated endpoints. A new provider should be equally explicit about which key goes to which base URL.
## Step 5: Wire the CLI in `hermes_cli/main.py` and `hermes_cli/setup.py`
A provider is not discoverable until it shows up in the interactive flows.
Update:
### `hermes_cli/main.py`
- `provider_labels`
- provider dispatch inside the `model` command
- `--provider` argument choices
- login/logout choices if the provider supports those flows
- a `_model_flow_<provider>()` function, or reuse `_model_flow_api_key_provider()` if it fits
### `hermes_cli/setup.py`
- `provider_choices`
- auth branch for the provider
- model-selection branch
- any provider-specific explanatory text
- any place where a provider should be excluded from OpenRouter-only prompts or routing settings
If you only update one of these files, `hermes model` and `hermes setup` will drift.
## Step 6: Keep auxiliary calls working
Two files matter here:
### `agent/auxiliary_client.py`
Add a cheap / fast default aux model to `_API_KEY_PROVIDER_AUX_MODELS` if this is a direct API-key provider.
Auxiliary tasks include things like:
- vision summarization
- web extraction summarization
- context compression summaries
- session-search summaries
- memory flushes
If the provider has no sensible aux default, side tasks may fall back badly or use an expensive main model unexpectedly.
### `agent/model_metadata.py`
Add context lengths for the provider's models so token budgeting, compression thresholds, and limits stay sane.
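A sketch for the same hypothetical `acme` provider. `_API_KEY_PROVIDER_AUX_MODELS` is the dict named above; the context-length dict name is illustrative, so match the real structure in `agent/model_metadata.py`:
```python
# agent/auxiliary_client.py — cheap/fast default for side tasks (hypothetical model IDs)
_API_KEY_PROVIDER_AUX_MODELS["acme"] = "acme-small"

# agent/model_metadata.py — context lengths for token budgeting
# (dict name is illustrative; follow the real structure in that file)
CONTEXT_LENGTHS = {
    "acme-large": 200_000,
    "acme-small": 64_000,
}
```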
## Step 7: If the provider is native, add an adapter and `run_agent.py` support
If the provider is not plain chat completions, isolate the provider-specific logic in `agent/<provider>_adapter.py`.
Keep `run_agent.py` focused on orchestration. It should call adapter helpers, not hand-build provider payloads inline all over the file.
A native provider usually needs work in these places:
### New adapter file
Typical responsibilities:
- build the SDK / HTTP client
- resolve tokens
- convert OpenAI-style conversation messages to the provider's request format
- convert tool schemas if needed
- normalize provider responses back into what `run_agent.py` expects
- extract usage and finish-reason data
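A skeleton of those responsibilities for a hypothetical `acme` adapter (class and method names are illustrative, not an existing API):
```python
# agent/acme_adapter.py — illustrative skeleton only
class AcmeAdapter:
    def __init__(self, api_key: str, base_url: str):
        self.api_key = api_key
        self.base_url = base_url
        # build the SDK / HTTP client here

    def convert_messages(self, messages: list[dict]) -> list[dict]:
        """Map OpenAI-style conversation messages to the provider's format."""
        raise NotImplementedError

    def convert_tools(self, tool_schemas: list[dict]) -> list[dict]:
        """Translate tool schemas if the provider uses a different shape."""
        raise NotImplementedError

    def normalize_response(self, raw: dict) -> dict:
        """Return an OpenAI-shaped response for run_agent.py, including
        usage and finish-reason fields."""
        raise NotImplementedError
```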
### `run_agent.py`
Search for `api_mode` and audit every switch point. At minimum, verify:
- `__init__` chooses the new `api_mode`
- client construction works for the provider
- `_build_api_kwargs()` knows how to format requests
- `_api_call_with_interrupt()` dispatches to the right client call
- interrupt / client rebuild paths work
- response validation accepts the provider's shape
- finish-reason extraction is correct
- token-usage extraction is correct
- fallback-model activation can switch into the new provider cleanly
- summary-generation and memory-flush paths still work
Also search `run_agent.py` for `self.client.`. Any code path that assumes the standard OpenAI client exists can break when a native provider uses a different client object or `self.client = None`.
### Prompt caching and provider-specific request fields
Prompt caching and provider-specific knobs are easy to regress.
Examples already in-tree:
- Anthropic has a native prompt-caching path
- OpenRouter gets provider-routing fields
- not every provider should receive every request-side option
When you add a native provider, double-check that Hermes is only sending fields that provider actually understands.
## Step 8: Tests
At minimum, touch the tests that guard provider wiring.
Common places:
- `tests/test_runtime_provider_resolution.py`
- `tests/test_cli_provider_resolution.py`
- `tests/test_cli_model_command.py`
- `tests/test_setup_model_selection.py`
- `tests/test_provider_parity.py`
- `tests/test_run_agent.py`
- `tests/test_<provider>_adapter.py` for a native provider
The file names above are examples; the exact set may differ for your provider. The point is to cover:
- auth resolution
- CLI menu / provider selection
- runtime provider resolution
- agent execution path
- provider:model parsing
- any adapter-specific message conversion
Run tests with xdist disabled:
```bash
source venv/bin/activate
python -m pytest tests/test_runtime_provider_resolution.py tests/test_cli_provider_resolution.py tests/test_cli_model_command.py tests/test_setup_model_selection.py -n0 -q
```
For deeper changes, run the full suite before pushing:
```bash
source venv/bin/activate
python -m pytest tests/ -n0 -q
```
## Step 9: Live verification
After tests, run a real smoke test.
```bash
source venv/bin/activate
python -m hermes_cli.main chat -q "Say hello" --provider your-provider --model your-model
```
Also test the interactive flows if you changed menus:
```bash
source venv/bin/activate
python -m hermes_cli.main model
python -m hermes_cli.main setup
```
For native providers, verify at least one tool call too, not just a plain text response.
## Step 10: Update user-facing docs
If the provider is meant to ship as a first-class option, update the user docs too:
- `website/docs/getting-started/quickstart.md`
- `website/docs/user-guide/configuration.md`
- `website/docs/reference/environment-variables.md`
A developer can wire the provider perfectly and still leave users unable to discover the required env vars or setup flow.
## OpenAI-compatible provider checklist
Use this if the provider is standard chat completions.
- [ ] `ProviderConfig` added in `hermes_cli/auth.py`
- [ ] aliases added in `hermes_cli/auth.py` and `hermes_cli/models.py`
- [ ] model catalog added in `hermes_cli/models.py`
- [ ] runtime branch added in `hermes_cli/runtime_provider.py`
- [ ] CLI wiring added in `hermes_cli/main.py`
- [ ] setup wiring added in `hermes_cli/setup.py`
- [ ] aux model added in `agent/auxiliary_client.py`
- [ ] context lengths added in `agent/model_metadata.py`
- [ ] runtime / CLI tests updated
- [ ] user docs updated
## Native provider checklist
Use this when the provider needs a new protocol path.
- [ ] everything in the OpenAI-compatible checklist
- [ ] adapter added in `agent/<provider>_adapter.py`
- [ ] new `api_mode` supported in `run_agent.py`
- [ ] interrupt / rebuild path works
- [ ] usage and finish-reason extraction works
- [ ] fallback path works
- [ ] adapter tests added
- [ ] live smoke test passes
## Common pitfalls
### 1. Adding the provider to auth but not to model parsing
That makes credentials resolve correctly while `/model` and `provider:model` inputs fail.
### 2. Forgetting that `config["model"]` can be a string or a dict
A lot of provider-selection code has to normalize both forms.
### 3. Assuming a built-in provider is required
If the service is just OpenAI-compatible, a custom provider may already solve the user problem with less maintenance.
### 4. Forgetting auxiliary paths
The main chat path can work while summarization, memory flushes, or vision helpers fail because aux routing was never updated.
### 5. Native-provider branches hiding in `run_agent.py`
Search for `api_mode` and `self.client.`. Do not assume the obvious request path is the only one.
### 6. Sending OpenRouter-only knobs to other providers
Fields like provider routing belong only on the providers that support them.
### 7. Updating `hermes model` but not `hermes setup`
Both flows need to know about the provider.
## Good search targets while implementing
If you are hunting for all the places a provider touches, search these symbols:
- `PROVIDER_REGISTRY`
- `_PROVIDER_ALIASES`
- `_PROVIDER_MODELS`
- `resolve_runtime_provider`
- `_model_flow_`
- `provider_choices`
- `api_mode`
- `_API_KEY_PROVIDER_AUX_MODELS`
- `self.client.`
## Related docs
- [Provider Runtime Resolution](./provider-runtime.md)
- [Architecture](./architecture.md)
- [Contributing](./contributing.md)


@@ -0,0 +1,208 @@
---
sidebar_position: 2
title: "Adding Tools"
description: "How to add a new tool to Hermes Agent — schemas, handlers, registration, and toolsets"
---
# Adding Tools
Before writing a tool, ask yourself: **should this be a [skill](creating-skills.md) instead?**
Make it a **Skill** when the capability can be expressed as instructions + shell commands + existing tools (arXiv search, git workflows, Docker management, PDF processing).
Make it a **Tool** when it requires end-to-end integration with API keys, custom processing logic, binary data handling, or streaming (browser automation, TTS, vision analysis).
## Overview
Adding a tool touches **3 files**:
1. **`tools/your_tool.py`** — handler, schema, check function, `registry.register()` call
2. **`toolsets.py`** — add tool name to `_HERMES_CORE_TOOLS` (or a specific toolset)
3. **`model_tools.py`** — add `"tools.your_tool"` to the `_discover_tools()` list
## Step 1: Create the Tool File
Every tool file follows the same structure:
```python
# tools/weather_tool.py
"""Weather Tool -- look up current weather for a location."""
import json
import os
import logging

logger = logging.getLogger(__name__)

# --- Availability check ---
def check_weather_requirements() -> bool:
    """Return True if the tool's dependencies are available."""
    return bool(os.getenv("WEATHER_API_KEY"))

# --- Handler ---
def weather_tool(location: str, units: str = "metric") -> str:
    """Fetch weather for a location. Returns JSON string."""
    api_key = os.getenv("WEATHER_API_KEY")
    if not api_key:
        return json.dumps({"error": "WEATHER_API_KEY not configured"})
    try:
        # ... call weather API ...
        return json.dumps({"location": location, "temp": 22, "units": units})
    except Exception as e:
        return json.dumps({"error": str(e)})

# --- Schema ---
WEATHER_SCHEMA = {
    "name": "weather",
    "description": "Get current weather for a location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "City name or coordinates (e.g. 'London' or '51.5,-0.1')"
            },
            "units": {
                "type": "string",
                "enum": ["metric", "imperial"],
                "description": "Temperature units (default: metric)",
                "default": "metric"
            }
        },
        "required": ["location"]
    }
}

# --- Registration ---
from tools.registry import registry

registry.register(
    name="weather",
    toolset="weather",
    schema=WEATHER_SCHEMA,
    handler=lambda args, **kw: weather_tool(
        location=args.get("location", ""),
        units=args.get("units", "metric")),
    check_fn=check_weather_requirements,
    requires_env=["WEATHER_API_KEY"],
)
```
### Key Rules
:::danger Important
- Handlers **MUST** return a JSON string (via `json.dumps()`), never raw dicts
- Errors **MUST** be returned as `{"error": "message"}`, never raised as exceptions
- The `check_fn` is called when building tool definitions — if it returns `False`, the tool is silently excluded
- The `handler` receives `(args: dict, **kwargs)` where `args` is the LLM's tool call arguments
:::
## Step 2: Add to a Toolset
In `toolsets.py`, add the tool name:
```python
# If it should be available on all platforms (CLI + messaging):
_HERMES_CORE_TOOLS = [
    ...
    "weather",  # <-- add here
]

# Or create a new standalone toolset:
"weather": {
    "description": "Weather lookup tools",
    "tools": ["weather"],
    "includes": []
},
```
## Step 3: Add Discovery Import
In `model_tools.py`, add the module to the `_discover_tools()` list:
```python
def _discover_tools():
    _modules = [
        ...
        "tools.weather_tool",  # <-- add here
    ]
```
This import triggers the `registry.register()` call at the bottom of your tool file.
## Async Handlers
If your handler needs async code, mark it with `is_async=True`:
```python
import json

import aiohttp  # assumed HTTP client for this example

async def weather_tool_async(location: str) -> str:
    async with aiohttp.ClientSession() as session:
        ...  # call the weather API and build `result`
        return json.dumps(result)

registry.register(
    name="weather",
    toolset="weather",
    schema=WEATHER_SCHEMA,
    handler=lambda args, **kw: weather_tool_async(args.get("location", "")),
    check_fn=check_weather_requirements,
    is_async=True,  # registry calls _run_async() automatically
)
```
The registry handles async bridging transparently — you never call `asyncio.run()` yourself.
## Handlers That Need task_id
Tools that manage per-session state receive `task_id` via `**kwargs`:
```python
def _handle_weather(args, **kw):
    task_id = kw.get("task_id")
    return weather_tool(args.get("location", ""), task_id=task_id)

registry.register(
    name="weather",
    ...
    handler=_handle_weather,
)
```
## Agent-Loop Intercepted Tools
Some tools (`todo`, `memory`, `session_search`, `delegate_task`) need access to per-session agent state. These are intercepted by `run_agent.py` before reaching the registry. The registry still holds their schemas, but `dispatch()` returns a fallback error if the intercept is bypassed.
## Optional: Setup Wizard Integration
If your tool requires an API key, add it to `hermes_cli/config.py`:
```python
OPTIONAL_ENV_VARS = {
    ...
    "WEATHER_API_KEY": {
        "description": "Weather API key for weather lookup",
        "prompt": "Weather API key",
        "url": "https://weatherapi.com/",
        "tools": ["weather"],
        "password": True,
    },
}
```
## Checklist
- [ ] Tool file created with handler, schema, check function, and registration
- [ ] Added to appropriate toolset in `toolsets.py`
- [ ] Discovery import added to `model_tools.py`
- [ ] Handler returns JSON strings, errors returned as `{"error": "..."}`
- [ ] Optional: API key added to `OPTIONAL_ENV_VARS` in `hermes_cli/config.py`
- [ ] Optional: Added to `toolset_distributions.py` for batch processing
- [ ] Tested with `hermes chat -q "Use the weather tool for London"`


@@ -0,0 +1,112 @@
---
sidebar_position: 3
title: "Agent Loop Internals"
description: "Detailed walkthrough of AIAgent execution, API modes, tools, callbacks, and fallback behavior"
---
# Agent Loop Internals
The core orchestration engine is `run_agent.py`'s `AIAgent`.
## Core responsibilities
`AIAgent` is responsible for:
- assembling the effective prompt and tool schemas
- selecting the correct provider/API mode
- making interruptible model calls
- executing tool calls (sequentially or concurrently)
- maintaining session history
- handling compression, retries, and fallback models
## API modes
Hermes currently supports three API execution modes:
| API mode | Used for |
|----------|----------|
| `chat_completions` | OpenAI-compatible chat endpoints, including OpenRouter and most custom endpoints |
| `codex_responses` | OpenAI Codex / Responses API path |
| `anthropic_messages` | Native Anthropic Messages API |
The mode is resolved from explicit args, provider selection, and base URL heuristics.
## Turn lifecycle
```text
run_conversation()
  -> generate effective task_id
  -> append current user message
  -> load or build cached system prompt
  -> maybe preflight-compress
  -> build api_messages
  -> inject ephemeral prompt layers
  -> apply prompt caching if appropriate
  -> make interruptible API call
  -> if tool calls: execute them, append tool results, loop
  -> if final text: persist, cleanup, return response
```
## Interruptible API calls
Hermes wraps API requests so they can be interrupted from the CLI or gateway.
This matters because:
- the agent may be in a long LLM call
- the user may send a new message mid-flight
- background systems may need cancellation semantics
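A minimal sketch of the pattern, assuming a blocking request callable and a shared `threading.Event`; Hermes' actual implementation lives in `run_agent.py` and differs in detail:
```python
import concurrent.futures
import threading

def call_with_interrupt(make_request, interrupt_event: threading.Event, poll: float = 0.2):
    """Run a blocking model call, checking an interrupt flag while waiting.

    `make_request` is any zero-argument callable performing the API request.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(make_request)
        while True:
            try:
                return future.result(timeout=poll)
            except concurrent.futures.TimeoutError:
                if interrupt_event.is_set():
                    raise KeyboardInterrupt("model call interrupted")
    finally:
        pool.shutdown(wait=False)  # abandon the request thread; best effort
```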
## Tool execution modes
Hermes uses two execution strategies:
- sequential execution for single or interactive tools
- concurrent execution for multiple non-interactive tools
Concurrent tool execution preserves message/result ordering when reinserting tool responses into conversation history.
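One way to get that ordering guarantee, shown as a sketch rather than the actual implementation, is to rely on `ThreadPoolExecutor.map`, which returns results in input order:
```python
from concurrent.futures import ThreadPoolExecutor

# Sketch: `dispatch` executes one tool call and returns its result.
# executor.map preserves input order, so results align 1:1 with the calls
# and tool responses can be appended in the order the model emitted them.
def run_tool_calls_concurrently(tool_calls: list, dispatch) -> list:
    with ThreadPoolExecutor(max_workers=max(1, len(tool_calls))) as pool:
        return list(pool.map(dispatch, tool_calls))
```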
## Callback surfaces
`AIAgent` supports platform/integration callbacks such as:
- `tool_progress_callback`
- `thinking_callback`
- `reasoning_callback`
- `clarify_callback`
- `step_callback`
- `stream_delta_callback`
- `tool_gen_callback`
- `status_callback`
These are how the CLI, gateway, and ACP integrations stream intermediate progress and interactive approval/clarification flows.
## Budget and fallback behavior
Hermes tracks a shared iteration budget across parent and subagents. It also injects budget pressure hints near the end of the available iteration window.
Fallback model support allows the agent to switch providers/models when the primary route fails in supported failure paths.
## Compression and persistence
Before and during long runs, Hermes may:
- flush memory before context loss
- compress middle conversation turns
- split the session lineage into a new session ID after compression
- preserve recent context and structural tool-call/result consistency
## Key files to read next
- `run_agent.py`
- `agent/prompt_builder.py`
- `agent/context_compressor.py`
- `agent/prompt_caching.py`
- `model_tools.py`
## Related docs
- [Provider Runtime Resolution](./provider-runtime.md)
- [Prompt Assembly](./prompt-assembly.md)
- [Context Compression & Prompt Caching](./context-compression-and-caching.md)
- [Tools Runtime](./tools-runtime.md)


@ -0,0 +1,152 @@
---
sidebar_position: 1
title: "Architecture"
description: "Hermes Agent internals — major subsystems, execution paths, and where to read next"
---
# Architecture
This page is the top-level map of Hermes Agent internals. The project has grown beyond a single monolithic loop, so the best way to understand it is by subsystem.
## High-level structure
```text
hermes-agent/
├── run_agent.py          # AIAgent core loop
├── cli.py                # interactive terminal UI
├── model_tools.py        # tool discovery/orchestration
├── toolsets.py           # tool groupings and presets
├── hermes_state.py       # SQLite session/state database
├── batch_runner.py       # batch trajectory generation
├── agent/                # prompt building, compression, caching, metadata, trajectories
├── hermes_cli/           # command entrypoints, auth, setup, models, config, doctor
├── tools/                # tool implementations and terminal environments
├── gateway/              # messaging gateway, session routing, delivery, pairing, hooks
├── cron/                 # scheduled job storage and scheduler
├── honcho_integration/   # Honcho memory integration
├── acp_adapter/          # ACP editor integration server
├── acp_registry/         # ACP registry manifest + icon
├── environments/         # Hermes RL / benchmark environment framework
├── skills/               # bundled skills
├── optional-skills/      # official optional skills
└── tests/                # test suite
```
## Recommended reading order
If you are new to the codebase, read in this order:
1. this page
2. [Agent Loop Internals](./agent-loop.md)
3. [Prompt Assembly](./prompt-assembly.md)
4. [Provider Runtime Resolution](./provider-runtime.md)
5. [Adding Providers](./adding-providers.md)
6. [Tools Runtime](./tools-runtime.md)
7. [Session Storage](./session-storage.md)
8. [Gateway Internals](./gateway-internals.md)
9. [Context Compression & Prompt Caching](./context-compression-and-caching.md)
10. [ACP Internals](./acp-internals.md)
11. [Environments, Benchmarks & Data Generation](./environments.md)
## Major subsystems
### Agent loop
The core synchronous orchestration engine is `AIAgent` in `run_agent.py`.
It is responsible for:
- provider/API-mode selection
- prompt construction
- tool execution
- retries and fallback
- callbacks
- compression and persistence
See [Agent Loop Internals](./agent-loop.md).
### Prompt system
Prompt-building logic is split between:
- `run_agent.py`
- `agent/prompt_builder.py`
- `agent/prompt_caching.py`
- `agent/context_compressor.py`
See:
- [Prompt Assembly](./prompt-assembly.md)
- [Context Compression & Prompt Caching](./context-compression-and-caching.md)
### Provider/runtime resolution
Hermes has a shared runtime provider resolver used by CLI, gateway, cron, ACP, and auxiliary calls.
See [Provider Runtime Resolution](./provider-runtime.md).
### Tooling runtime
The tool registry, toolsets, terminal backends, process manager, and dispatch rules form a subsystem of their own.
See [Tools Runtime](./tools-runtime.md).
### Session persistence
Historical session state is stored primarily in SQLite, with lineage preserved across compression splits.
See [Session Storage](./session-storage.md).
### Messaging gateway
The gateway is a long-running orchestration layer for platform adapters, session routing, pairing, delivery, and cron ticking.
See [Gateway Internals](./gateway-internals.md).
### ACP integration
ACP exposes Hermes as an editor-native agent over stdio/JSON-RPC.
See:
- [ACP Editor Integration](../user-guide/features/acp.md)
- [ACP Internals](./acp-internals.md)
### Cron
Cron jobs are implemented as first-class agent tasks, not just shell tasks.
See [Cron Internals](./cron-internals.md).
### RL / environments / trajectories
Hermes ships a full environment framework for evaluation, RL integration, and SFT data generation.
See:
- [Environments, Benchmarks & Data Generation](./environments.md)
- [Trajectories & Training Format](./trajectory-format.md)
## Design themes
Several cross-cutting design themes appear throughout the codebase:
- prompt stability matters
- tool execution must be observable and interruptible
- session persistence must survive long-running use
- platform frontends should share one agent core
- optional subsystems should remain loosely coupled where possible
## Implementation notes
The older mental model of Hermes as “one OpenAI-compatible chat loop plus some tools” is no longer sufficient. Current Hermes includes:
- multiple API modes
- auxiliary model routing
- ACP editor integration
- gateway-specific session and delivery semantics
- RL environment infrastructure
- prompt-caching and compression logic with lineage-aware persistence
Use this page as the map, then dive into subsystem-specific docs for the real implementation details.


@@ -0,0 +1,72 @@
---
sidebar_position: 6
title: "Context Compression & Prompt Caching"
description: "How Hermes compresses long conversations and applies provider-side prompt caching"
---
# Context Compression & Prompt Caching
Hermes manages long conversations with two complementary mechanisms:
- prompt caching
- context compression
Primary files:
- `agent/prompt_caching.py`
- `agent/context_compressor.py`
- `run_agent.py`
## Prompt caching
For Anthropic/native and Claude-via-OpenRouter flows, Hermes applies Anthropic-style cache markers.
Current strategy:
- cache the system prompt
- cache the last 3 non-system messages
- default TTL is 5 minutes unless explicitly extended
This is implemented in `agent/prompt_caching.py`.
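A sketch of that strategy using Anthropic-style `cache_control` markers (message shapes simplified; see `agent/prompt_caching.py` for the real logic):
```python
# Illustrative sketch: mark the system prompt and the last 3 non-system
# messages as cacheable. The "ephemeral" cache_control type is Anthropic's;
# block shapes here are simplified.
def apply_cache_markers(system_blocks: list[dict], messages: list[dict]) -> None:
    # Cache the system prompt (mark its final content block).
    if system_blocks:
        system_blocks[-1]["cache_control"] = {"type": "ephemeral"}
    # Cache the last 3 non-system messages.
    for msg in [m for m in messages if m.get("role") != "system"][-3:]:
        content = msg.get("content")
        if isinstance(content, list) and content:
            content[-1]["cache_control"] = {"type": "ephemeral"}
```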
## Why prompt stability matters
Prompt caching only helps when the stable prefix remains stable. That is why Hermes avoids rebuilding or mutating the core system prompt mid-session unless it has to.
## Compression trigger
Hermes can compress context when conversations become large. Configuration defaults live in `config.yaml`, and the compressor also has runtime checks based on actual prompt token counts.
## Compression algorithm
The compressor protects:
- the first N turns
- the last N turns
and summarizes the middle section.
It also cleans up structural issues such as orphaned tool-call/result pairs so the API never receives invalid conversation structure after compression.
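The overall shape, as an illustrative sketch that omits token counting and the tool-call repair step:
```python
# Sketch only: protect the head and tail, summarize the middle.
# `summarize` is any callable that turns messages into a text summary.
def compress(messages: list[dict], summarize, keep_head: int = 4, keep_tail: int = 8) -> list[dict]:
    if len(messages) <= keep_head + keep_tail:
        return messages
    head = messages[:keep_head]
    middle = messages[keep_head:-keep_tail]
    tail = messages[-keep_tail:]
    summary = {"role": "user", "content": f"[Summary of earlier turns]\n{summarize(middle)}"}
    return head + [summary] + tail
```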
## Pre-compression memory flush
Before compression, Hermes can give the model one last chance to persist memory so facts are not lost when middle turns are summarized away.
## Session lineage after compression
Compression can split the session into a new session ID while preserving parent lineage in the state DB.
This lets Hermes continue operating with a smaller active context while retaining a searchable ancestry chain.
## Re-injected state after compression
After compression, Hermes may re-inject compact operational state such as:
- todo snapshot
- prior-read-files summary
## Related docs
- [Prompt Assembly](./prompt-assembly.md)
- [Session Storage](./session-storage.md)
- [Agent Loop Internals](./agent-loop.md)


@@ -0,0 +1,232 @@
---
sidebar_position: 4
title: "Contributing"
description: "How to contribute to Hermes Agent — dev setup, code style, PR process"
---
# Contributing
Thank you for contributing to Hermes Agent! This guide covers setting up your dev environment, understanding the codebase, and getting your PR merged.
## Contribution Priorities
We value contributions in this order:
1. **Bug fixes** — crashes, incorrect behavior, data loss
2. **Cross-platform compatibility** — macOS, different Linux distros, WSL2
3. **Security hardening** — shell injection, prompt injection, path traversal
4. **Performance and robustness** — retry logic, error handling, graceful degradation
5. **New skills** — broadly useful ones (see [Creating Skills](creating-skills.md))
6. **New tools** — rarely needed; most capabilities should be skills
7. **Documentation** — fixes, clarifications, new examples
## Common contribution paths
- Building a new tool? Start with [Adding Tools](./adding-tools.md)
- Building a new skill? Start with [Creating Skills](./creating-skills.md)
- Building a new inference provider? Start with [Adding Providers](./adding-providers.md)
## Development Setup
### Prerequisites
| Requirement | Notes |
|-------------|-------|
| **Git** | With `--recurse-submodules` support |
| **Python 3.10+** | uv will install it if missing |
| **uv** | Fast Python package manager ([install](https://docs.astral.sh/uv/)) |
| **Node.js 18+** | Optional — needed for browser tools and WhatsApp bridge |
### Clone and Install
```bash
git clone --recurse-submodules https://github.com/NousResearch/hermes-agent.git
cd hermes-agent
# Create venv with Python 3.11
uv venv venv --python 3.11
export VIRTUAL_ENV="$(pwd)/venv"
# Install with all extras (messaging, cron, CLI menus, dev tools)
uv pip install -e ".[all,dev]"
uv pip install -e "./tinker-atropos"
# Optional: browser tools
npm install
```
### Configure for Development
```bash
mkdir -p ~/.hermes/{cron,sessions,logs,memories,skills}
cp cli-config.yaml.example ~/.hermes/config.yaml
touch ~/.hermes/.env
# Add at minimum an LLM provider key:
echo 'OPENROUTER_API_KEY=sk-or-v1-your-key' >> ~/.hermes/.env
```
### Run
```bash
# Symlink for global access
mkdir -p ~/.local/bin
ln -sf "$(pwd)/venv/bin/hermes" ~/.local/bin/hermes
# Verify
hermes doctor
hermes chat -q "Hello"
```
### Run Tests
```bash
pytest tests/ -v
```
## Code Style
- **PEP 8** with practical exceptions (no strict line length enforcement)
- **Comments**: Only when explaining non-obvious intent, trade-offs, or API quirks
- **Error handling**: Catch specific exceptions. Use `logger.warning()`/`logger.error()` with `exc_info=True` for unexpected errors
- **Cross-platform**: Never assume Unix (see below)
## Cross-Platform Compatibility
Hermes officially supports Linux, macOS, and WSL2. Native Windows is **not supported**, but the codebase includes some defensive coding patterns to avoid hard crashes in edge cases. Key rules:
### 1. `termios` and `fcntl` are Unix-only
Always catch both `ImportError` and `NotImplementedError`:
```python
try:
    from simple_term_menu import TerminalMenu
    menu = TerminalMenu(options)
    idx = menu.show()
except (ImportError, NotImplementedError):
    # Fallback: numbered menu
    for i, opt in enumerate(options):
        print(f"  {i+1}. {opt}")
    idx = int(input("Choice: ")) - 1
```
### 2. File encoding
Some environments may save `.env` files in non-UTF-8 encodings:
```python
from dotenv import load_dotenv

try:
    load_dotenv(env_path)
except UnicodeDecodeError:
    load_dotenv(env_path, encoding="latin-1")
```
### 3. Process management
`os.setsid()`, `os.killpg()`, and signal handling differ across platforms:
```python
import os
import platform

# kwargs is later passed to subprocess.Popen(...)
if platform.system() != "Windows":
    kwargs["preexec_fn"] = os.setsid
```
### 4. Path separators
Use `pathlib.Path` instead of string concatenation with `/`.
## Security Considerations
Hermes has terminal access. Security matters.
### Existing Protections
| Layer | Implementation |
|-------|---------------|
| **Sudo password piping** | Uses `shlex.quote()` to prevent shell injection |
| **Dangerous command detection** | Regex patterns in `tools/approval.py` with user approval flow |
| **Cron prompt injection** | Scanner blocks instruction-override patterns |
| **Write deny list** | Protected paths resolved via `os.path.realpath()` to prevent symlink bypass |
| **Skills guard** | Security scanner for hub-installed skills |
| **Code execution sandbox** | Child process runs with API keys stripped |
| **Container hardening** | Docker: all capabilities dropped, no privilege escalation, PID limits |
### Contributing Security-Sensitive Code
- Always use `shlex.quote()` when interpolating user input into shell commands
- Resolve symlinks with `os.path.realpath()` before access control checks
- Don't log secrets
- Catch broad exceptions around tool execution
- Test on all platforms if your change touches file paths or processes
## Pull Request Process
### Branch Naming
```
fix/description # Bug fixes
feat/description # New features
docs/description # Documentation
test/description # Tests
refactor/description # Code restructuring
```
### Before Submitting
1. **Run tests**: `pytest tests/ -v`
2. **Test manually**: Run `hermes` and exercise the code path you changed
3. **Check cross-platform impact**: Consider macOS and different Linux distros
4. **Keep PRs focused**: One logical change per PR
### PR Description
Include:
- **What** changed and **why**
- **How to test** it
- **What platforms** you tested on
- Reference any related issues
### Commit Messages
We use [Conventional Commits](https://www.conventionalcommits.org/):
```
<type>(<scope>): <description>
```
| Type | Use for |
|------|---------|
| `fix` | Bug fixes |
| `feat` | New features |
| `docs` | Documentation |
| `test` | Tests |
| `refactor` | Code restructuring |
| `chore` | Build, CI, dependency updates |
Scopes: `cli`, `gateway`, `tools`, `skills`, `agent`, `install`, `whatsapp`, `security`
Examples:
```
fix(cli): prevent crash in save_config_value when model is a string
feat(gateway): add WhatsApp multi-user session isolation
fix(security): prevent shell injection in sudo password piping
```
## Reporting Issues
- Use [GitHub Issues](https://github.com/NousResearch/hermes-agent/issues)
- Include: OS, Python version, Hermes version (`hermes version`), full error traceback
- Include steps to reproduce
- Check existing issues before creating duplicates
- For security vulnerabilities, please report privately
## Community
- **Discord**: [discord.gg/NousResearch](https://discord.gg/NousResearch)
- **GitHub Discussions**: For design proposals and architecture discussions
- **Skills Hub**: Upload specialized skills and share with the community
## License
By contributing, you agree that your contributions will be licensed under the [MIT License](https://github.com/NousResearch/hermes-agent/blob/main/LICENSE).


@@ -0,0 +1,247 @@
---
sidebar_position: 3
title: "Creating Skills"
description: "How to create skills for Hermes Agent — SKILL.md format, guidelines, and publishing"
---
# Creating Skills
Skills are the preferred way to add new capabilities to Hermes Agent. They're easier to create than tools, require no code changes to the agent, and can be shared with the community.
## Should it be a Skill or a Tool?
Make it a **Skill** when:
- The capability can be expressed as instructions + shell commands + existing tools
- It wraps an external CLI or API that the agent can call via `terminal` or `web_extract`
- It doesn't need custom Python integration or API key management baked into the agent
- Examples: arXiv search, git workflows, Docker management, PDF processing, email via CLI tools
Make it a **Tool** when:
- It requires end-to-end integration with API keys, auth flows, or multi-component configuration
- It needs custom processing logic that must execute precisely every time
- It handles binary data, streaming, or real-time events
- Examples: browser automation, TTS, vision analysis
## Skill Directory Structure
Bundled skills live in `skills/` organized by category. Official optional skills use the same structure in `optional-skills/`:
```text
skills/
├── research/
│   └── arxiv/
│       ├── SKILL.md          # Required: main instructions
│       └── scripts/          # Optional: helper scripts
│           └── search_arxiv.py
├── productivity/
│   └── ocr-and-documents/
│       ├── SKILL.md
│       ├── scripts/
│       └── references/
└── ...
```
## SKILL.md Format
```markdown
---
name: my-skill
description: Brief description (shown in skill search results)
version: 1.0.0
author: Your Name
license: MIT
platforms: [macos, linux]   # Optional — restrict to specific OS platforms
                            # Valid: macos, linux, windows
                            # Omit to load on all platforms (default)
metadata:
  hermes:
    tags: [Category, Subcategory, Keywords]
    related_skills: [other-skill-name]
    requires_toolsets: [web]                 # Optional — only show when these toolsets are active
    requires_tools: [web_search]             # Optional — only show when these tools are available
    fallback_for_toolsets: [browser]         # Optional — hide when these toolsets are active
    fallback_for_tools: [browser_navigate]   # Optional — hide when these tools exist
    required_environment_variables:          # Optional — env vars the skill needs
      - name: MY_API_KEY
        prompt: "Enter your API key"
        help: "Get one at https://example.com"
        required_for: "API access"
---
# Skill Title
Brief intro.
## When to Use
Trigger conditions — when should the agent load this skill?
## Quick Reference
Table of common commands or API calls.
## Procedure
Step-by-step instructions the agent follows.
## Pitfalls
Known failure modes and how to handle them.
## Verification
How the agent confirms it worked.
```
### Platform-Specific Skills
Skills can restrict themselves to specific operating systems using the `platforms` field:
```yaml
platforms: [macos] # macOS only (e.g., iMessage, Apple Reminders)
platforms: [macos, linux] # macOS and Linux
platforms: [windows] # Windows only
```
When set, the skill is automatically hidden from the system prompt, `skills_list()`, and slash commands on incompatible platforms. If omitted or empty, the skill loads on all platforms (backward compatible).
### Conditional Skill Activation
Skills can declare dependencies on specific tools or toolsets. This controls whether the skill appears in the system prompt for a given session.
```yaml
metadata:
  hermes:
    requires_toolsets: [web]                 # Hide if the web toolset is NOT active
    requires_tools: [web_search]             # Hide if web_search tool is NOT available
    fallback_for_toolsets: [browser]         # Hide if the browser toolset IS active
    fallback_for_tools: [browser_navigate]   # Hide if browser_navigate IS available
```
| Field | Behavior |
|-------|----------|
| `requires_toolsets` | Skill is **hidden** when ANY listed toolset is **not** available |
| `requires_tools` | Skill is **hidden** when ANY listed tool is **not** available |
| `fallback_for_toolsets` | Skill is **hidden** when ANY listed toolset **is** available |
| `fallback_for_tools` | Skill is **hidden** when ANY listed tool **is** available |
**Use case for `fallback_for_*`:** Create a skill that serves as a workaround when a primary tool isn't available. For example, a `duckduckgo-search` skill with `fallback_for_tools: [web_search]` only shows when the web search tool (which requires an API key) is not configured.
**Use case for `requires_*`:** Create a skill that only makes sense when certain tools are present. For example, a web scraping workflow skill with `requires_toolsets: [web]` won't clutter the prompt when web tools are disabled.
### Environment Variable Requirements
Skills can declare environment variables they need. When a skill is loaded via `skill_view`, its required vars are automatically registered for passthrough into sandboxed execution environments (terminal, execute_code).
```yaml
required_environment_variables:
  - name: TENOR_API_KEY
    prompt: "Tenor API key"                      # Shown when prompting user
    help: "Get your key at https://tenor.com"    # Help text or URL
    required_for: "GIF search functionality"     # What needs this var
```
Each entry supports:
- `name` (required) — the environment variable name
- `prompt` (optional) — prompt text when asking the user for the value
- `help` (optional) — help text or URL for obtaining the value
- `required_for` (optional) — describes which feature needs this variable
Users can also manually configure passthrough variables in `config.yaml`:
```yaml
terminal:
  env_passthrough:
    - MY_CUSTOM_VAR
    - ANOTHER_VAR
```
See `skills/apple/` for examples of macOS-only skills.
## Secure Setup on Load
Use `required_environment_variables` when a skill needs an API key or token. Missing values do **not** hide the skill from discovery. Instead, Hermes prompts for them securely when the skill is loaded in the local CLI.
```yaml
required_environment_variables:
  - name: TENOR_API_KEY
    prompt: Tenor API key
    help: Get a key from https://developers.google.com/tenor
    required_for: full functionality
```
The user can skip setup and keep loading the skill. Hermes never exposes the raw secret value to the model. Gateway and messaging sessions show local setup guidance instead of collecting secrets in-band.
:::tip Sandbox Passthrough
When your skill is loaded, any declared `required_environment_variables` that are set are **automatically passed through** to `execute_code` and `terminal` sandboxes. Your skill's scripts can access `$TENOR_API_KEY` (or `os.environ["TENOR_API_KEY"]` in Python) without the user needing to configure anything extra. See [Environment Variable Passthrough](/docs/user-guide/security#environment-variable-passthrough) for details.
:::
Legacy `prerequisites.env_vars` remains supported as a backward-compatible alias.
## Skill Guidelines
### No External Dependencies
Prefer stdlib Python, curl, and existing Hermes tools (`web_extract`, `terminal`, `read_file`). If a dependency is needed, document installation steps in the skill.
### Progressive Disclosure
Put the most common workflow first. Edge cases and advanced usage go at the bottom. This keeps token usage low for common tasks.
### Include Helper Scripts
For XML/JSON parsing or complex logic, include helper scripts in `scripts/` — don't expect the LLM to write parsers inline every time.
### Test It
Run the skill and verify the agent follows the instructions correctly:
```bash
hermes chat --toolsets skills -q "Use the X skill to do Y"
```
## Where Should the Skill Live?
Bundled skills (in `skills/`) ship with every Hermes install. They should be **broadly useful to most users**:
- Document handling, web research, common dev workflows, system administration
- Used regularly by a wide range of people
If your skill is official and useful but not universally needed (e.g., a paid service integration, a heavyweight dependency), put it in **`optional-skills/`** — it ships with the repo, is discoverable via `hermes skills browse` (labeled "official"), and installs with builtin trust.
If your skill is specialized, community-contributed, or niche, it's better suited for a **Skills Hub** — upload it to a registry and share it via `hermes skills install`.
## Publishing Skills
### To the Skills Hub
```bash
hermes skills publish skills/my-skill --to github --repo owner/repo
```
### To a Custom Repository
Add your repo as a tap:
```bash
hermes skills tap add owner/repo
```
Users can then search and install from your repository.
## Security Scanning
All hub-installed skills go through a security scanner that checks for:
- Data exfiltration patterns
- Prompt injection attempts
- Destructive commands
- Shell injection
Trust levels:
- `builtin` — ships with Hermes (always trusted)
- `official` — from `optional-skills/` in the repo (builtin trust, no third-party warning)
- `trusted` — from openai/skills, anthropics/skills
- `community` — non-dangerous findings can be overridden with `--force`; `dangerous` verdicts remain blocked
Hermes can now consume third-party skills from several external discovery mechanisms:
- direct GitHub identifiers (for example `openai/skills/k8s`)
- `skills.sh` identifiers (for example `skills-sh/vercel-labs/json-render/json-render-react`)
- well-known endpoints served from `/.well-known/skills/index.json`
If you want your skills to be discoverable without a GitHub-specific installer, consider serving them from a well-known endpoint in addition to publishing them in a repo or marketplace.


@ -0,0 +1,90 @@
---
sidebar_position: 11
title: "Cron Internals"
description: "How Hermes stores, schedules, edits, pauses, skill-loads, and delivers cron jobs"
---
# Cron Internals
Hermes cron support is implemented primarily in:
- `cron/jobs.py`
- `cron/scheduler.py`
- `tools/cronjob_tools.py`
- `gateway/run.py`
- `hermes_cli/cron.py`
## Scheduling model
Hermes supports:
- one-shot delays
- intervals
- cron expressions
- explicit timestamps
The model-facing surface is a single `cronjob` tool with action-style operations:
- `create`
- `list`
- `update`
- `pause`
- `resume`
- `run`
- `remove`
## Job storage
Cron jobs are stored in Hermes-managed local state (`~/.hermes/cron/jobs.json`) with atomic write semantics.
Each job can carry:
- prompt
- schedule metadata
- repeat counters
- delivery target
- lifecycle state (`scheduled`, `paused`, `completed`, etc.)
- zero, one, or multiple attached skills
Backward compatibility is preserved for older jobs that only stored a legacy single `skill` field or none of the newer lifecycle fields.
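The atomic-write idea, sketched below (the real persistence code lives in `cron/jobs.py`):
```python
import json
import os
import tempfile

# Illustrative sketch: write to a temp file in the same directory, then
# rename over the target so readers never see a half-written file.
def save_jobs(jobs: list[dict], path: str) -> None:
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path), suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(jobs, f, indent=2)
        os.replace(tmp_path, path)  # atomic on POSIX filesystems
    except BaseException:
        os.unlink(tmp_path)
        raise
```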
## Runtime behavior
The scheduler:
- loads jobs
- computes due work
- executes jobs in fresh agent sessions
- optionally injects one or more skills before the prompt
- handles repeat counters
- updates next-run metadata and state
In gateway mode, cron ticking is integrated into the long-running gateway loop.
## Skill-backed jobs
A cron job may attach multiple skills. At runtime, Hermes loads those skills in order and then appends the job prompt as the task instruction.
This gives scheduled jobs reusable guidance without requiring the user to paste full skill bodies into the cron prompt.
## Recursion guard
Cron-run sessions disable the `cronjob` toolset. This prevents a scheduled job from recursively creating or mutating more cron jobs and accidentally exploding token usage or scheduler load.
## Delivery model
Cron jobs can deliver to:
- origin chat
- local files
- platform home channels
- explicit platform/chat IDs
## Locking
Hermes uses lock-based protections so overlapping scheduler ticks do not execute the same due-job batch twice.
## Related docs
- [Cron feature guide](../user-guide/features/cron.md)
- [Gateway Internals](./gateway-internals.md)


@@ -0,0 +1,520 @@
---
sidebar_position: 5
title: "Environments, Benchmarks & Data Generation"
description: "Building RL training environments, running evaluation benchmarks, and generating SFT data with the Hermes-Agent Atropos integration"
---
# Environments, Benchmarks & Data Generation
Hermes Agent includes a full environment framework that connects its tool-calling capabilities to the [Atropos](https://github.com/NousResearch/atropos) RL training framework. This enables three workflows:
1. **RL Training** — Train language models on multi-turn agentic tasks with GRPO
2. **Benchmarks** — Evaluate models on standardised agentic benchmarks
3. **Data Generation** — Generate SFT training data from agent rollouts
All three share the same core: an **environment** class that defines tasks, runs an agent loop, and scores the output.
:::info Repo environments vs RL training tools
The Python environment framework documented here lives under the repo's `environments/` directory and is the implementation-level API for Hermes/Atropos integration. This is separate from the user-facing `rl_*` tools, which operate as an orchestration surface for remote RL training workflows.
:::
:::tip Quick Links
- **Want to run benchmarks?** Jump to [Available Benchmarks](#available-benchmarks)
- **Want to train with RL?** See [RL Training Tools](/user-guide/features/rl-training) for the agent-driven interface, or [Running Environments](#running-environments) for manual execution
- **Want to create a new environment?** See [Creating Environments](#creating-environments)
:::
## Architecture
The environment system is built on a three-layer inheritance chain:
```mermaid
classDiagram
    class BaseEnv {
        Server management
        Worker scheduling
        Wandb logging
        CLI: serve / process / evaluate
    }
    class HermesAgentBaseEnv {
        Terminal backend configuration
        Tool resolution
        Agent loop engine
        ToolContext access
    }
    class TerminalTestEnv {
        Stack testing
    }
    class HermesSweEnv {
        SWE training
    }
    class TerminalBench2EvalEnv {
        Benchmark evaluation
    }
    class TBLiteEvalEnv {
        Fast benchmark
    }
    class YCBenchEvalEnv {
        Long-horizon benchmark
    }
    BaseEnv <|-- HermesAgentBaseEnv
    HermesAgentBaseEnv <|-- TerminalTestEnv
    HermesAgentBaseEnv <|-- HermesSweEnv
    HermesAgentBaseEnv <|-- TerminalBench2EvalEnv
    TerminalBench2EvalEnv <|-- TBLiteEvalEnv
    TerminalBench2EvalEnv <|-- YCBenchEvalEnv
### BaseEnv (Atropos)
The foundation from `atroposlib`. Provides:
- **Server management** — connects to OpenAI-compatible APIs (VLLM, SGLang, OpenRouter)
- **Worker scheduling** — parallel rollout coordination
- **Wandb integration** — metrics logging and rollout visualisation
- **CLI interface** — three subcommands: `serve`, `process`, `evaluate`
- **Eval logging**`evaluate_log()` saves results to JSON + JSONL
### HermesAgentBaseEnv
The hermes-agent layer (`environments/hermes_base_env.py`). Adds:
- **Terminal backend configuration** — sets `TERMINAL_ENV` for sandboxed execution (local, Docker, Modal, Daytona, SSH, Singularity)
- **Tool resolution**`_resolve_tools_for_group()` calls hermes-agent's `get_tool_definitions()` to get the right tool schemas based on enabled/disabled toolsets
- **Agent loop integration**`collect_trajectory()` runs `HermesAgentLoop` and scores the result
- **Two-phase operation** — Phase 1 (OpenAI server) for eval/SFT, Phase 2 (VLLM ManagedServer) for full RL with logprobs
- **Async safety patches** — monkey-patches Modal backend to work inside Atropos's event loop
### Concrete Environments
Your environment inherits from `HermesAgentBaseEnv` and implements five methods:
| Method | Purpose |
|--------|---------|
| `setup()` | Load dataset, initialise state |
| `get_next_item()` | Return the next item for rollout |
| `format_prompt(item)` | Convert an item into the user message |
| `compute_reward(item, result, ctx)` | Score the rollout (0.0–1.0) |
| `evaluate()` | Periodic evaluation logic |
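Putting the five methods together, a minimal skeleton is shown below. The dataset, prompt, and scoring are invented for illustration, and the sync/async signatures should follow `HermesAgentBaseEnv` (the `compute_reward` shape matches the example later on this page):
```python
from environments.hermes_base_env import HermesAgentBaseEnv

# Skeleton only — method names come from the table above; everything else
# is illustrative, not an existing environment.
class HelloFileEnv(HermesAgentBaseEnv):
    async def setup(self):
        self.items = [{"prompt": "Create /workspace/hello.txt containing 'hi'"}]
        self._idx = 0

    async def get_next_item(self):
        item = self.items[self._idx % len(self.items)]
        self._idx += 1
        return item

    def format_prompt(self, item):
        return item["prompt"]

    async def compute_reward(self, item, result, ctx):
        # Verify the task outcome inside the model's own sandbox.
        check = ctx.terminal("grep -q hi /workspace/hello.txt")
        return 1.0 if check.get("exit_code") == 0 else 0.0

    async def evaluate(self):
        pass  # periodic evaluation logic
```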
## Core Components
### Agent Loop
`HermesAgentLoop` (`environments/agent_loop.py`) is the reusable multi-turn agent engine. It runs the same tool-calling pattern as hermes-agent's main loop:
1. Send messages + tool schemas to the API via `server.chat_completion()`
2. If the response contains `tool_calls`, dispatch each via `handle_function_call()`
3. Append tool results to the conversation, go back to step 1
4. If no `tool_calls`, the agent is done
Tool calls execute in a thread pool (`ThreadPoolExecutor(128)`) so that async backends (Modal, Docker) don't deadlock inside Atropos's event loop.
Returns an `AgentResult`:
```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

@dataclass
class AgentResult:
messages: List[Dict[str, Any]] # Full conversation history
turns_used: int # Number of LLM calls made
finished_naturally: bool # True if model stopped on its own
reasoning_per_turn: List[Optional[str]] # Extracted reasoning content
tool_errors: List[ToolError] # Errors encountered during tool dispatch
managed_state: Optional[Dict] # VLLM ManagedServer state (Phase 2)
```
### Tool Context
`ToolContext` (`environments/tool_context.py`) gives reward functions direct access to the **same sandbox** the model used during its rollout. The `task_id` scoping means all state (files, processes, browser tabs) is preserved.
```python
async def compute_reward(self, item, result, ctx: ToolContext):
# Run tests in the model's terminal sandbox
test = ctx.terminal("pytest -v")
if test["exit_code"] == 0:
return 1.0
# Check if a file was created
content = ctx.read_file("/workspace/solution.py")
if content.get("content"):
return 0.5
# Download files for local verification
ctx.download_file("/remote/output.bin", "/local/output.bin")
return 0.0
```
Available methods:
| Category | Methods |
|----------|---------|
| **Terminal** | `terminal(command, timeout)` |
| **Files** | `read_file(path)`, `write_file(path, content)`, `search(query, path)` |
| **Transfers** | `upload_file()`, `upload_dir()`, `download_file()`, `download_dir()` |
| **Web** | `web_search(query)`, `web_extract(urls)` |
| **Browser** | `browser_navigate(url)`, `browser_snapshot()` |
| **Generic** | `call_tool(name, args)` — escape hatch for any hermes-agent tool |
| **Cleanup** | `cleanup()` — release all resources |
### Tool Call Parsers
For **Phase 2** (VLLM ManagedServer), the server returns raw text without structured tool calls. Client-side parsers in `environments/tool_call_parsers/` extract `tool_calls` from raw output:
```python
from environments.tool_call_parsers import get_parser
parser = get_parser("hermes") # or "mistral", "llama3_json", "qwen", "deepseek_v3", etc.
content, tool_calls = parser.parse(raw_model_output)
```
Available parsers: `hermes`, `mistral`, `llama3_json`, `qwen`, `qwen3_coder`, `deepseek_v3`, `deepseek_v3_1`, `kimi_k2`, `longcat`, `glm45`, `glm47`.
In Phase 1 (OpenAI server type), parsers are not needed — the server handles tool call parsing natively.
## Available Benchmarks
### TerminalBench2
**89 challenging terminal tasks** with per-task Docker sandbox environments.
| | |
|---|---|
| **What it tests** | Single-task coding/sysadmin ability |
| **Scoring** | Binary pass/fail (test suite verification) |
| **Sandbox** | Modal cloud sandboxes (per-task Docker images) |
| **Tools** | `terminal` + `file` |
| **Tasks** | 89 tasks across multiple categories |
| **Cost** | ~$50–200 for full eval (parallel execution) |
| **Time** | ~2–4 hours |
```bash
python environments/benchmarks/terminalbench_2/terminalbench2_env.py evaluate \
--config environments/benchmarks/terminalbench_2/default.yaml
# Run specific tasks
python environments/benchmarks/terminalbench_2/terminalbench2_env.py evaluate \
--config environments/benchmarks/terminalbench_2/default.yaml \
--env.task_filter fix-git,git-multibranch
```
Dataset: [NousResearch/terminal-bench-2](https://huggingface.co/datasets/NousResearch/terminal-bench-2) on HuggingFace.
### TBLite (OpenThoughts Terminal Bench Lite)
**100 difficulty-calibrated tasks** — a faster proxy for TerminalBench2.
| | |
|---|---|
| **What it tests** | Same as TB2 (coding/sysadmin), calibrated difficulty tiers |
| **Scoring** | Binary pass/fail |
| **Sandbox** | Modal cloud sandboxes |
| **Tools** | `terminal` + `file` |
| **Tasks** | 100 tasks: Easy (40), Medium (26), Hard (26), Extreme (8) |
| **Correlation** | r=0.911 with full TB2 |
| **Speed** | 2.68× faster than TB2 |
```bash
python environments/benchmarks/tblite/tblite_env.py evaluate \
--config environments/benchmarks/tblite/default.yaml
```
TBLite is a thin subclass of TerminalBench2 — only the dataset and timeouts differ. Created by the OpenThoughts Agent team (Snorkel AI + Bespoke Labs). Dataset: [NousResearch/openthoughts-tblite](https://huggingface.co/datasets/NousResearch/openthoughts-tblite).
### YC-Bench
**Long-horizon strategic benchmark** — the agent plays CEO of an AI startup.
| | |
|---|---|
| **What it tests** | Multi-turn strategic coherence over hundreds of turns |
| **Scoring** | Composite: `0.5 × survival + 0.5 × normalised_funds` |
| **Sandbox** | Local terminal (no Modal needed) |
| **Tools** | `terminal` only |
| **Runs** | 9 default (3 presets × 3 seeds), sequential |
| **Cost** | ~$50–200 for full eval |
| **Time** | ~3–6 hours |
```bash
# Install yc-bench (optional dependency)
pip install "hermes-agent[yc-bench]"
# Run evaluation
bash environments/benchmarks/yc_bench/run_eval.sh
# Or directly
python environments/benchmarks/yc_bench/yc_bench_env.py evaluate \
--config environments/benchmarks/yc_bench/default.yaml
# Quick single-preset test
python environments/benchmarks/yc_bench/yc_bench_env.py evaluate \
--config environments/benchmarks/yc_bench/default.yaml \
--env.presets '["fast_test"]' --env.seeds '[1]'
```
YC-Bench uses [collinear-ai/yc-bench](https://github.com/collinear-ai/yc-bench) — a deterministic simulation with 4 skill domains (research, inference, data_environment, training), prestige system, employee management, and financial pressure. Unlike TB2's per-task binary scoring, YC-Bench measures whether an agent can maintain coherent strategy over hundreds of compounding decisions.
## Training Environments
### TerminalTestEnv
A minimal self-contained environment with inline tasks (no external dataset). Used for **validating the full stack** end-to-end. Each task asks the model to create a file at a known path; the verifier checks the content.
```bash
# Process mode (saves rollouts to JSONL, no training server needed)
python environments/terminal_test_env/terminal_test_env.py process \
--env.data_path_to_save_groups terminal_test_output.jsonl
# Serve mode (connects to Atropos API for RL training)
python environments/terminal_test_env/terminal_test_env.py serve
```
### HermesSweEnv
SWE-bench style training environment. The model gets a coding task, uses terminal + file + web tools to solve it, and the reward function runs tests in the same Modal sandbox.
```bash
python environments/hermes_swe_env/hermes_swe_env.py serve \
--openai.model_name YourModel \
--env.dataset_name bigcode/humanevalpack \
--env.terminal_backend modal
```
## Running Environments
Every environment is a standalone Python script with three CLI subcommands:
### `evaluate` — Run a benchmark
For eval-only environments (benchmarks). Runs all items, computes metrics, logs to wandb.
```bash
python environments/benchmarks/tblite/tblite_env.py evaluate \
--config environments/benchmarks/tblite/default.yaml \
--openai.model_name anthropic/claude-sonnet-4.6
```
No training server or `run-api` needed. The environment handles everything.
### `process` — Generate SFT data
Runs rollouts and saves scored trajectories to JSONL. Useful for generating training data without a full RL loop.
```bash
python environments/terminal_test_env/terminal_test_env.py process \
--env.data_path_to_save_groups output.jsonl \
--openai.model_name anthropic/claude-sonnet-4.6
```
Output format: each line is a scored trajectory with the full conversation history, reward, and metadata.
### `serve` — Connect to Atropos for RL training
Connects the environment to a running Atropos API server (`run-api`). Used during live RL training.
```bash
# Terminal 1: Start the Atropos API
run-api
# Terminal 2: Start the environment
python environments/hermes_swe_env/hermes_swe_env.py serve \
--openai.model_name YourModel
```
The environment receives items from Atropos, runs agent rollouts, computes rewards, and sends scored trajectories back for training.
## Two-Phase Operation
### Phase 1: OpenAI Server (Eval / SFT)
Uses `server.chat_completion()` with `tools=` parameter. The server (VLLM, SGLang, OpenRouter, OpenAI) handles tool call parsing natively. Returns `ChatCompletion` objects with structured `tool_calls`.
- **Use for**: evaluation, SFT data generation, benchmarks, testing
- **Placeholder tokens** are created for the Atropos pipeline (since real token IDs aren't available from the OpenAI API)
### Phase 2: VLLM ManagedServer (Full RL)
Uses ManagedServer for exact token IDs + logprobs via `/generate`. A client-side [tool call parser](#tool-call-parsers) reconstructs structured `tool_calls` from raw output.
- **Use for**: full RL training with GRPO/PPO
- **Real tokens**, masks, and logprobs flow through the pipeline
- Set `tool_call_parser` in config to match your model's format (e.g., `"hermes"`, `"qwen"`, `"mistral"`)
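As a sketch, the phase branch looks roughly like this (`server.generate()`, `render_prompt()`, and the config/response field names are illustrative, not the exact atroposlib API):
```python
from environments.tool_call_parsers import get_parser

async def get_turn(server, messages, tools, config):
    if config.server_type == "openai":
        # Phase 1: the server returns structured tool_calls natively
        response = await server.chat_completion(messages=messages, tools=tools)
        msg = response.choices[0].message
        return msg.content, msg.tool_calls
    # Phase 2: ManagedServer returns raw text; parse tool calls client-side
    raw = await server.generate(prompt=render_prompt(messages, tools))
    parser = get_parser(config.tool_call_parser)  # e.g. "hermes", "qwen"
    return parser.parse(raw)  # -> (content, tool_calls)
```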
## Creating Environments
### Training Environment
```python
from environments.hermes_base_env import HermesAgentBaseEnv, HermesAgentEnvConfig
from atroposlib.envs.server_handling.server_manager import APIServerConfig
class MyEnvConfig(HermesAgentEnvConfig):
my_custom_field: str = "default_value"
class MyEnv(HermesAgentBaseEnv):
name = "my-env"
env_config_cls = MyEnvConfig
@classmethod
def config_init(cls):
env_config = MyEnvConfig(
enabled_toolsets=["terminal", "file"],
terminal_backend="modal",
max_agent_turns=30,
)
server_configs = [APIServerConfig(
base_url="https://openrouter.ai/api/v1",
model_name="anthropic/claude-sonnet-4.6",
server_type="openai",
)]
return env_config, server_configs
async def setup(self):
from datasets import load_dataset
self.dataset = list(load_dataset("my-dataset", split="train"))
self.iter = 0
async def get_next_item(self):
item = self.dataset[self.iter % len(self.dataset)]
self.iter += 1
return item
def format_prompt(self, item):
return item["instruction"]
async def compute_reward(self, item, result, ctx):
# ctx gives full tool access to the rollout's sandbox
test = ctx.terminal("pytest -v")
return 1.0 if test["exit_code"] == 0 else 0.0
async def evaluate(self, *args, **kwargs):
# Periodic evaluation during training
pass
if __name__ == "__main__":
MyEnv.cli()
```
### Eval-Only Benchmark
For benchmarks, follow the pattern used by TerminalBench2, TBLite, and YC-Bench:
1. **Create under** `environments/benchmarks/your-benchmark/`
2. **Set eval-only config**: `eval_handling=STOP_TRAIN`, `steps_per_eval=1`, `total_steps=1`
3. **Stub training methods**: `collect_trajectories()` returns `(None, [])`, `score()` returns `None`
4. **Implement** `rollout_and_score_eval(eval_item)` — the per-item agent loop + scoring
5. **Implement** `evaluate()` — orchestrates all runs, computes aggregate metrics
6. **Add streaming JSONL** for crash-safe result persistence
7. **Add cleanup**: `KeyboardInterrupt` handling, `cleanup_all_environments()`, `_tool_executor.shutdown()`
8. **Run with** `evaluate` subcommand
See `environments/benchmarks/yc_bench/yc_bench_env.py` for a clean, well-documented reference implementation.
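Putting the steps together, a skeleton might look like this (a sketch: `load_my_dataset()` is a hypothetical loader, and base-class signatures may differ slightly):
```python
from environments.hermes_base_env import HermesAgentBaseEnv

class MyBenchEnv(HermesAgentBaseEnv):
    name = "my-benchmark"

    async def setup(self):
        self.eval_items = load_my_dataset()  # hypothetical dataset loader

    # Step 3: stub the training path; this env only evaluates
    async def collect_trajectories(self, *args, **kwargs):
        return None, []

    async def score(self, *args, **kwargs):
        return None

    # Step 4: per-item agent loop + scoring
    async def rollout_and_score_eval(self, eval_item):
        ...

    # Step 5: orchestrate all runs, compute aggregate metrics,
    # and stream results to JSONL for crash safety (step 6)
    async def evaluate(self, *args, **kwargs):
        results = [await self.rollout_and_score_eval(i) for i in self.eval_items]
        ...

if __name__ == "__main__":
    MyBenchEnv.cli()
```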
## Configuration Reference
### HermesAgentEnvConfig Fields
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `enabled_toolsets` | `List[str]` | `None` (all) | Which hermes toolsets to enable |
| `disabled_toolsets` | `List[str]` | `None` | Toolsets to filter out |
| `distribution` | `str` | `None` | Probabilistic toolset distribution name |
| `max_agent_turns` | `int` | `30` | Max LLM calls per rollout |
| `agent_temperature` | `float` | `1.0` | Sampling temperature |
| `system_prompt` | `str` | `None` | System message for the agent |
| `terminal_backend` | `str` | `"local"` | `local`, `docker`, `modal`, `daytona`, `ssh`, `singularity` |
| `terminal_timeout` | `int` | `120` | Seconds per terminal command |
| `terminal_lifetime` | `int` | `3600` | Max sandbox lifetime |
| `dataset_name` | `str` | `None` | HuggingFace dataset identifier |
| `tool_pool_size` | `int` | `128` | Thread pool size for tool execution |
| `tool_call_parser` | `str` | `"hermes"` | Parser for Phase 2 raw output |
| `extra_body` | `Dict` | `None` | Extra params for OpenAI API (e.g., OpenRouter provider prefs) |
| `eval_handling` | `Enum` | `STOP_TRAIN` | `STOP_TRAIN`, `LIMIT_TRAIN`, `NONE` |
### YAML Configuration
Environments can be configured via YAML files passed with `--config`:
```yaml
env:
enabled_toolsets: ["terminal", "file"]
max_agent_turns: 60
max_token_length: 32000
agent_temperature: 0.8
terminal_backend: "modal"
terminal_timeout: 300
dataset_name: "NousResearch/terminal-bench-2"
tokenizer_name: "NousResearch/Hermes-3-Llama-3.1-8B"
use_wandb: true
wandb_name: "my-benchmark"
openai:
base_url: "https://openrouter.ai/api/v1"
model_name: "anthropic/claude-sonnet-4.6"
server_type: "openai"
health_check: false
```
YAML values override `config_init()` defaults. CLI arguments override YAML values:
```bash
python my_env.py evaluate \
--config my_config.yaml \
--openai.model_name anthropic/claude-opus-4.6 # overrides YAML
```
## Prerequisites
### For all environments
- Python >= 3.11
- `atroposlib`: `pip install git+https://github.com/NousResearch/atropos.git`
- An LLM API key (OpenRouter, OpenAI, or self-hosted VLLM/SGLang)
### For Modal-sandboxed benchmarks (TB2, TBLite)
- [Modal](https://modal.com) account and CLI: `pip install "hermes-agent[modal]"`
- `MODAL_TOKEN_ID` and `MODAL_TOKEN_SECRET` environment variables
### For YC-Bench
- `pip install "hermes-agent[yc-bench]"` (installs the yc-bench CLI + SQLAlchemy)
- No Modal needed — runs with local terminal backend
### For RL training
- `TINKER_API_KEY` — API key for the [Tinker](https://tinker.computer) training service
- `WANDB_API_KEY` — for Weights & Biases metrics tracking
- The `tinker-atropos` submodule (at `tinker-atropos/` in the repo)
See [RL Training](/user-guide/features/rl-training) for the agent-driven RL workflow.
## Directory Structure
```
environments/
├── hermes_base_env.py # Abstract base class (HermesAgentBaseEnv)
├── agent_loop.py # Multi-turn agent engine (HermesAgentLoop)
├── tool_context.py # Per-rollout tool access for reward functions
├── patches.py # Async-safety patches for Modal backend
├── tool_call_parsers/ # Phase 2 client-side parsers
│ ├── hermes_parser.py # Hermes/ChatML <tool_call> format
│ ├── mistral_parser.py # Mistral [TOOL_CALLS] format
│ ├── llama_parser.py # Llama 3 JSON tool calling
│ ├── qwen_parser.py # Qwen format
│ ├── deepseek_v3_parser.py # DeepSeek V3 format
│ └── ... # + kimi_k2, longcat, glm45/47, etc.
├── terminal_test_env/ # Stack validation (inline tasks)
├── hermes_swe_env/ # SWE-bench training environment
└── benchmarks/ # Evaluation benchmarks
├── terminalbench_2/ # 89 terminal tasks, Modal sandboxes
├── tblite/ # 100 calibrated tasks (fast TB2 proxy)
└── yc_bench/ # Long-horizon strategic benchmark
```
---
sidebar_position: 8
title: "Extending the CLI"
description: "Build wrapper CLIs that extend the Hermes TUI with custom widgets, keybindings, and layout changes"
---
# Extending the CLI
Hermes exposes protected extension hooks on `HermesCLI` so wrapper CLIs can add widgets, keybindings, and layout customizations without overriding the 1000+ line `run()` method. This keeps your extension decoupled from internal changes.
## Extension points
There are five extension seams available:
| Hook | Purpose | Override when... |
|------|---------|------------------|
| `_get_extra_tui_widgets()` | Inject widgets into the layout | You need a persistent UI element (panel, status line, mini-player) |
| `_register_extra_tui_keybindings(kb, *, input_area)` | Add keyboard shortcuts | You need hotkeys (toggle panels, transport controls, modal shortcuts) |
| `_build_tui_layout_children(**widgets)` | Full control over widget ordering | You need to reorder or wrap existing widgets (rare) |
| `process_command()` | Add custom slash commands | You need `/mycommand` handling (pre-existing hook) |
| `_build_tui_style_dict()` | Custom prompt_toolkit styles | You need custom colors or styling (pre-existing hook) |
The first three are new protected hooks. The last two already existed.
## Quick start: a wrapper CLI
```python
#!/usr/bin/env python3
"""my_cli.py — Example wrapper CLI that extends Hermes."""
from cli import HermesCLI
from prompt_toolkit.layout import ConditionalContainer, FormattedTextControl, Window
from prompt_toolkit.filters import Condition
class MyCLI(HermesCLI):
def __init__(self, **kwargs):
super().__init__(**kwargs)
self._panel_visible = False
def _get_extra_tui_widgets(self):
"""Add a toggleable info panel above the status bar."""
cli_ref = self
        return [
            ConditionalContainer(
                Window(
                    FormattedTextControl(lambda: "📊 My custom panel content"),
                    height=1,
                ),
                filter=Condition(lambda: cli_ref._panel_visible),
            ),
        ]
def _register_extra_tui_keybindings(self, kb, *, input_area):
"""F2 toggles the custom panel."""
cli_ref = self
@kb.add("f2")
def _toggle_panel(event):
cli_ref._panel_visible = not cli_ref._panel_visible
def process_command(self, cmd: str) -> bool:
"""Add a /panel slash command."""
if cmd.strip().lower() == "/panel":
self._panel_visible = not self._panel_visible
state = "visible" if self._panel_visible else "hidden"
print(f"Panel is now {state}")
return True
return super().process_command(cmd)
if __name__ == "__main__":
cli = MyCLI()
cli.run()
```
Run it:
```bash
cd ~/.hermes/hermes-agent
source .venv/bin/activate
python my_cli.py
```
## Hook reference
### `_get_extra_tui_widgets()`
Returns a list of prompt_toolkit widgets to insert into the TUI layout. Widgets appear **between the spacer and the status bar** — above the input area but below the main output.
```python
def _get_extra_tui_widgets(self) -> list:
return [] # default: no extra widgets
```
Each widget should be a prompt_toolkit container (e.g., `Window`, `ConditionalContainer`, `HSplit`). Wrap a widget in `ConditionalContainer` with a `Condition(...)` filter to make it toggleable.
```python
from prompt_toolkit.layout import ConditionalContainer, Window, FormattedTextControl
from prompt_toolkit.filters import Condition
def _get_extra_tui_widgets(self):
return [
ConditionalContainer(
Window(FormattedTextControl("Status: connected"), height=1),
filter=Condition(lambda: self._show_status),
),
]
```
### `_register_extra_tui_keybindings(kb, *, input_area)`
Called after Hermes registers its own keybindings and before the layout is built. Add your keybindings to `kb`.
```python
def _register_extra_tui_keybindings(self, kb, *, input_area):
pass # default: no extra keybindings
```
Parameters:
- **`kb`** — The `KeyBindings` instance for the prompt_toolkit application
- **`input_area`** — The main `TextArea` widget, if you need to read or manipulate user input
```python
def _register_extra_tui_keybindings(self, kb, *, input_area):
cli_ref = self
@kb.add("f3")
def _clear_input(event):
input_area.text = ""
@kb.add("f4")
def _insert_template(event):
input_area.text = "/search "
```
**Avoid conflicts** with built-in keybindings: `Enter` (submit), `Escape Enter` (newline), `Ctrl-C` (interrupt), `Ctrl-D` (exit), `Tab` (auto-suggest accept). Function keys F2+ and Ctrl-combinations are generally safe.
### `_build_tui_layout_children(**widgets)`
Override this only when you need full control over widget ordering. Most extensions should use `_get_extra_tui_widgets()` instead.
```python
def _build_tui_layout_children(self, *, sudo_widget, secret_widget,
approval_widget, clarify_widget, spinner_widget, spacer,
status_bar, input_rule_top, image_bar, input_area,
input_rule_bot, voice_status_bar, completions_menu) -> list:
```
The default implementation returns:
```python
[
Window(height=0), # anchor
sudo_widget, # sudo password prompt (conditional)
secret_widget, # secret input prompt (conditional)
approval_widget, # dangerous command approval (conditional)
clarify_widget, # clarify question UI (conditional)
spinner_widget, # thinking spinner (conditional)
spacer, # fills remaining vertical space
*self._get_extra_tui_widgets(), # YOUR WIDGETS GO HERE
status_bar, # model/token/context status line
input_rule_top, # ─── border above input
image_bar, # attached images indicator
input_area, # user text input
input_rule_bot, # ─── border below input
voice_status_bar, # voice mode status (conditional)
completions_menu, # autocomplete dropdown
]
```
## Layout diagram
The default layout from top to bottom:
1. **Output area** — scrolling conversation history
2. **Spacer**
3. **Extra widgets** — from `_get_extra_tui_widgets()`
4. **Status bar** — model, context %, elapsed time
5. **Image bar** — attached image count
6. **Input area** — user prompt
7. **Voice status** — recording indicator
8. **Completions menu** — autocomplete suggestions
## Tips
- **Invalidate the display** after state changes: call `self._invalidate()` to trigger a prompt_toolkit redraw.
- **Access agent state**: `self.agent`, `self.model`, `self.conversation_history` are all available.
- **Custom styles**: Override `_build_tui_style_dict()` and add entries for your custom style classes (see the sketch after this list).
- **Slash commands**: Override `process_command()`, handle your commands, and call `super().process_command(cmd)` for everything else.
- **Don't override `run()`** unless absolutely necessary — the extension hooks exist specifically to avoid that coupling.
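For the custom-styles tip, a minimal sketch (the style class names are illustrative):
```python
def _build_tui_style_dict(self):
    styles = super()._build_tui_style_dict()
    # Register classes used by your custom widgets (hypothetical names)
    styles["my-panel"] = "bg:#1e1e2e #cdd6f4"
    styles["my-panel.title"] = "bold"
    return styles
```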
---
sidebar_position: 7
title: "Gateway Internals"
description: "How the messaging gateway boots, authorizes users, routes sessions, and delivers messages"
---
# Gateway Internals
The messaging gateway is the long-running process that connects Hermes to external platforms.
Key files:
- `gateway/run.py`
- `gateway/config.py`
- `gateway/session.py`
- `gateway/delivery.py`
- `gateway/pairing.py`
- `gateway/channel_directory.py`
- `gateway/hooks.py`
- `gateway/mirror.py`
- `gateway/platforms/*`
## Core responsibilities
The gateway process is responsible for:
- loading configuration from `.env`, `config.yaml`, and `gateway.json`
- starting platform adapters
- authorizing users
- routing incoming events to sessions
- maintaining per-chat session continuity
- dispatching messages to `AIAgent`
- running cron ticks and background maintenance tasks
- mirroring/proactively delivering output to configured channels
## Config sources
The gateway has a multi-source config model:
- environment variables
- `~/.hermes/gateway.json`
- selected bridged values from `~/.hermes/config.yaml`
## Session routing
`gateway/session.py` and `GatewayRunner` cooperate to map incoming messages to active session IDs.
Session keying can depend on:
- platform
- user/chat identity
- thread/topic identity
- special platform-specific routing behavior
## Authorization layers
The gateway can authorize through:
- platform allowlists
- gateway-wide allowlists
- DM pairing flows
- explicit allow-all settings
Pairing support is implemented in `gateway/pairing.py`.
## Delivery path
Outgoing deliveries are handled by `gateway/delivery.py`, which knows how to:
- deliver to a home channel
- resolve explicit targets
- mirror some remote deliveries back into local history/session tracking
## Hooks
Gateway events emit hook callbacks through `gateway/hooks.py`. Hooks are local trusted Python code and can observe or extend gateway lifecycle events.
## Background maintenance
The gateway also runs maintenance tasks such as:
- cron ticking
- cache refreshes
- session expiry checks
- proactive memory flush before reset/expiry
## Honcho interaction
When Honcho is enabled, the gateway keeps persistent Honcho managers aligned with session lifetimes and platform-specific session keys.
### Session routing
Honcho tools (`honcho_profile`, `honcho_search`, `honcho_context`, `honcho_conclude`) need to execute against the correct user's Honcho session. In a multi-user gateway, the process-global module state in `tools/honcho_tools.py` is insufficient — multiple sessions may be active concurrently.
The solution threads session context through the call chain:
```
AIAgent._invoke_tool()
→ handle_function_call(honcho_manager=..., honcho_session_key=...)
→ registry.dispatch(**kwargs)
→ _handle_honcho_*(args, **kw)
→ _resolve_session_context(**kw) # prefers explicit kwargs over module globals
```
`_resolve_session_context()` in `honcho_tools.py` checks for `honcho_manager` and `honcho_session_key` in the kwargs first, falling back to the module-global `_session_manager` / `_session_key` for CLI mode where there's only one session.
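A minimal sketch of that resolution order (the real function in `honcho_tools.py` may validate further):
```python
def _resolve_session_context(**kw):
    manager = kw.get("honcho_manager") or _session_manager      # explicit kwargs win
    session_key = kw.get("honcho_session_key") or _session_key  # CLI-mode fallback
    return manager, session_key
```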
### Memory flush lifecycle
When a session is reset, resumed, or expires, the gateway flushes memories before discarding context. The flush creates a temporary `AIAgent` with:
- `session_id` set to the old session's ID (so transcripts load correctly)
- `honcho_session_key` set to the gateway session key (so Honcho writes go to the right place)
- `sync_honcho=False` passed to `run_conversation()` (so the synthetic flush turn doesn't write back to Honcho's conversation history)
After the flush completes, any queued Honcho writes are drained and the gateway-level Honcho manager is shut down for that session key.
## Related docs
- [Session Storage](./session-storage.md)
- [Cron Internals](./cron-internals.md)
- [ACP Internals](./acp-internals.md)
---
sidebar_position: 5
title: "Prompt Assembly"
description: "How Hermes builds the system prompt, preserves cache stability, and injects ephemeral layers"
---
# Prompt Assembly
Hermes deliberately separates:
- **cached system prompt state**
- **ephemeral API-call-time additions**
This is one of the most important design choices in the project because it affects:
- token usage
- prompt caching effectiveness
- session continuity
- memory correctness
Primary files:
- `run_agent.py`
- `agent/prompt_builder.py`
- `tools/memory_tool.py`
## Cached system prompt layers
The cached system prompt is assembled in roughly this order:
1. agent identity — `SOUL.md` from `HERMES_HOME` when available, otherwise falls back to `DEFAULT_AGENT_IDENTITY` in `prompt_builder.py`
2. tool-aware behavior guidance
3. Honcho static block (when active)
4. optional system message
5. frozen MEMORY snapshot
6. frozen USER profile snapshot
7. skills index
8. context files (`AGENTS.md`, `.cursorrules`, `.cursor/rules/*.mdc`) — SOUL.md is **not** included here when it was already loaded as the identity in step 1
9. timestamp / optional session ID
10. platform hint
When `skip_context_files` is set (e.g., subagent delegation), SOUL.md is not loaded and the hardcoded `DEFAULT_AGENT_IDENTITY` is used instead.
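Conceptually, the assembly reduces to a layered concatenation like the sketch below (helper names other than `load_soul_md()`, `DEFAULT_AGENT_IDENTITY`, and `build_context_files_prompt()` are hypothetical; the real assembly lives in `agent/prompt_builder.py` and `run_agent.py`):
```python
def build_cached_system_prompt(cfg):
    layers = [
        load_soul_md() or DEFAULT_AGENT_IDENTITY,    # 1. identity
        tool_behavior_guidance(cfg),                 # 2. tool-aware guidance
        honcho_static_block(cfg),                    # 3. empty when inactive
        cfg.system_message,                          # 4. optional
        memory_snapshot(),                           # 5. frozen at session start
        user_profile_snapshot(),                     # 6. frozen at session start
        skills_index(),                              # 7. compact skills index
        build_context_files_prompt(skip_soul=True),  # 8. AGENTS.md, .cursorrules, ...
        timestamp_block(cfg),                        # 9. timestamp / session ID
        platform_hint(cfg.platform),                 # 10. platform hint
    ]
    return "\n\n".join(layer for layer in layers if layer)
```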
## API-call-time-only layers
These are intentionally *not* persisted as part of the cached system prompt:
- `ephemeral_system_prompt`
- prefill messages
- gateway-derived session context overlays
- later-turn Honcho recall injected into the current-turn user message
This separation keeps the stable prefix stable for caching.
## Memory snapshots
Local memory and user profile data are injected as frozen snapshots at session start. Mid-session writes update disk state but do not mutate the already-built system prompt until a new session or forced rebuild occurs.
## Context files
`agent/prompt_builder.py` scans and sanitizes project context files using a **priority system** — only one type is loaded (first match wins):
1. `.hermes.md` / `HERMES.md` (walks to git root)
2. `AGENTS.md` (recursive directory walk)
3. `CLAUDE.md` (CWD only)
4. `.cursorrules` / `.cursor/rules/*.mdc` (CWD only)
`SOUL.md` is loaded separately via `load_soul_md()` for the identity slot. When it loads successfully, `build_context_files_prompt(skip_soul=True)` prevents it from appearing twice.
Long files are truncated before injection.
## Skills index
The skills system contributes a compact skills index to the prompt when skills tooling is available.
## Why prompt assembly is split this way
The architecture is intentionally optimized to:
- preserve provider-side prompt caching
- avoid mutating history unnecessarily
- keep memory semantics understandable
- let gateway/ACP/CLI add context without poisoning persistent prompt state
## Related docs
- [Context Compression & Prompt Caching](./context-compression-and-caching.md)
- [Session Storage](./session-storage.md)
- [Gateway Internals](./gateway-internals.md)
---
sidebar_position: 4
title: "Provider Runtime Resolution"
description: "How Hermes resolves providers, credentials, API modes, and auxiliary models at runtime"
---
# Provider Runtime Resolution
Hermes has a shared provider runtime resolver used across:
- CLI
- gateway
- cron jobs
- ACP
- auxiliary model calls
Primary implementation:
- `hermes_cli/runtime_provider.py` — credential resolution, `_resolve_custom_runtime()`
- `hermes_cli/auth.py` — provider registry, `resolve_provider()`
- `hermes_cli/model_switch.py` — shared `/model` switch pipeline (CLI + gateway)
- `agent/auxiliary_client.py` — auxiliary model routing
If you are trying to add a new first-class inference provider, read [Adding Providers](./adding-providers.md) alongside this page.
## Resolution precedence
At a high level, provider resolution uses:
1. explicit CLI/runtime request
2. `config.yaml` model/provider config
3. environment variables
4. provider-specific defaults or auto resolution
That ordering matters because Hermes treats the saved model/provider choice as the source of truth for normal runs. This prevents a stale shell export from silently overriding the endpoint a user last selected in `hermes model`.
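A hedged sketch of that precedence (helper names are illustrative):
```python
def resolve_runtime(cli_request, config, env):
    if cli_request:                  # 1. explicit CLI/runtime request
        return from_cli(cli_request)
    if config.get("provider"):       # 2. saved config.yaml choice beats env
        return from_config(config)
    if env.get("OPENAI_BASE_URL"):   # 3. environment variables
        return from_env(env)
    return provider_defaults()       # 4. provider defaults / auto resolution
```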
## Providers
Current provider families include:
- AI Gateway (Vercel)
- OpenRouter
- Nous Portal
- OpenAI Codex
- Anthropic (native)
- Z.AI
- Kimi / Moonshot
- MiniMax
- MiniMax China
- Custom (`provider: custom`) — first-class provider for any OpenAI-compatible endpoint
- Named custom providers (`custom_providers` list in config.yaml)
## Output of runtime resolution
The runtime resolver returns data such as:
- `provider`
- `api_mode`
- `base_url`
- `api_key`
- `source`
- provider-specific metadata like expiry/refresh info
## Why this matters
This resolver is the main reason Hermes can share auth/runtime logic between:
- `hermes chat`
- gateway message handling
- cron jobs running in fresh sessions
- ACP editor sessions
- auxiliary model tasks
## AI Gateway
Set `AI_GATEWAY_API_KEY` in `~/.hermes/.env` and run with `--provider ai-gateway`. Hermes fetches available models from the gateway's `/models` endpoint, filtering to language models with tool-use support.
## OpenRouter, AI Gateway, and custom OpenAI-compatible base URLs
Hermes contains logic to avoid leaking the wrong API key to a custom endpoint when multiple provider keys exist (e.g. `OPENROUTER_API_KEY`, `AI_GATEWAY_API_KEY`, and `OPENAI_API_KEY`).
Each provider's API key is scoped to its own base URL:
- `OPENROUTER_API_KEY` is only sent to `openrouter.ai` endpoints
- `AI_GATEWAY_API_KEY` is only sent to `ai-gateway.vercel.sh` endpoints
- `OPENAI_API_KEY` is used for custom endpoints and as a fallback
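A sketch of the scoping rule (illustrative; the real checks live in `hermes_cli/runtime_provider.py`):
```python
def api_key_for(base_url: str, env: dict) -> str | None:
    if "openrouter.ai" in base_url:
        return env.get("OPENROUTER_API_KEY")
    if "ai-gateway.vercel.sh" in base_url:
        return env.get("AI_GATEWAY_API_KEY")
    return env.get("OPENAI_API_KEY")  # custom endpoints and fallback
```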
Hermes also distinguishes between:
- a real custom endpoint selected by the user
- the OpenRouter fallback path used when no custom endpoint is configured
That distinction is especially important for:
- local model servers
- non-OpenRouter/non-AI Gateway OpenAI-compatible APIs
- switching providers without re-running setup
- config-saved custom endpoints that should keep working even when `OPENAI_BASE_URL` is not exported in the current shell
## Native Anthropic path
Anthropic is not just "via OpenRouter" anymore.
When provider resolution selects `anthropic`, Hermes uses:
- `api_mode = anthropic_messages`
- the native Anthropic Messages API
- `agent/anthropic_adapter.py` for translation
Credential resolution for native Anthropic now prefers refreshable Claude Code credentials over copied env tokens when both are present. In practice that means:
- Claude Code credential files are treated as the preferred source when they include refreshable auth
- manual `ANTHROPIC_TOKEN` / `CLAUDE_CODE_OAUTH_TOKEN` values still work as explicit overrides
- Hermes preflights Anthropic credential refresh before native Messages API calls
- Hermes still retries once on a 401 after rebuilding the Anthropic client, as a fallback path
## OpenAI Codex path
Codex uses a separate Responses API path:
- `api_mode = codex_responses`
- dedicated credential resolution and auth store support
## Auxiliary model routing
Auxiliary tasks such as:
- vision
- web extraction summarization
- context compression summaries
- session search summarization
- skills hub operations
- MCP helper operations
- memory flushes
can use their own provider/model routing rather than the main conversational model.
When an auxiliary task is configured with provider `main`, Hermes resolves that through the same shared runtime path as normal chat. In practice that means:
- env-driven custom endpoints still work
- custom endpoints saved via `hermes model` / `config.yaml` also work
- auxiliary routing can tell the difference between a real saved custom endpoint and the OpenRouter fallback
## Fallback models
Hermes supports a configured fallback model/provider pair, allowing runtime failover when the primary model encounters errors.
### How it works internally
1. **Storage**: `AIAgent.__init__` stores the `fallback_model` dict and sets `_fallback_activated = False`.
2. **Trigger points**: `_try_activate_fallback()` is called from three places in the main retry loop in `run_agent.py`:
- After max retries on invalid API responses (None choices, missing content)
- On non-retryable client errors (HTTP 401, 403, 404)
- After max retries on transient errors (HTTP 429, 500, 502, 503)
3. **Activation flow** (`_try_activate_fallback`), sketched in code after this list:
- Returns `False` immediately if already activated or not configured
- Calls `resolve_provider_client()` from `auxiliary_client.py` to build a new client with proper auth
- Determines `api_mode`: `codex_responses` for openai-codex, `anthropic_messages` for anthropic, `chat_completions` for everything else
- Swaps in-place: `self.model`, `self.provider`, `self.base_url`, `self.api_mode`, `self.client`, `self._client_kwargs`
- For anthropic fallback: builds a native Anthropic client instead of OpenAI-compatible
- Re-evaluates prompt caching (enabled for Claude models on OpenRouter)
- Sets `_fallback_activated = True` — prevents firing again
- Resets retry count to 0 and continues the loop
4. **Config flow**:
- CLI: `cli.py` reads `CLI_CONFIG["fallback_model"]` → passes to `AIAgent(fallback_model=...)`
- Gateway: `gateway/run.py._load_fallback_model()` reads `config.yaml` → passes to `AIAgent`
- Validation: both `provider` and `model` keys must be non-empty, or fallback is disabled
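Condensed into code, the activation flow looks roughly like this (a sketch; `resolve_provider_client()`'s return shape and the in-place swaps are simplified relative to `run_agent.py`):
```python
def _try_activate_fallback(self) -> bool:
    if self._fallback_activated or not self.fallback_model:
        return False
    provider = self.fallback_model["provider"]
    self.client = resolve_provider_client(provider)  # builds an authed client
    self.model = self.fallback_model["model"]
    self.provider = provider
    self.api_mode = {
        "openai-codex": "codex_responses",
        "anthropic": "anthropic_messages",
    }.get(provider, "chat_completions")
    self._fallback_activated = True  # one-shot: never fires again
    return True  # caller resets the retry count and continues the loop
```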
### What does NOT support fallback
- **Subagent delegation** (`tools/delegate_tool.py`): subagents inherit the parent's provider but not the fallback config
- **Cron jobs** (`cron/`): run with a fixed provider, no fallback mechanism
- **Auxiliary tasks**: use their own independent provider auto-detection chain (see Auxiliary model routing above)
### Test coverage
See `tests/test_fallback_model.py` for comprehensive tests covering all supported providers, one-shot semantics, and edge cases.
## Related docs
- [Agent Loop Internals](./agent-loop.md)
- [ACP Internals](./acp-internals.md)
- [Context Compression & Prompt Caching](./context-compression-and-caching.md)
---
sidebar_position: 8
title: "Session Storage"
description: "How Hermes stores sessions in SQLite, maintains lineage, and exposes recall/search"
---
# Session Storage
Hermes uses a SQLite-backed session store as the main source of truth for historical conversation state.
Primary files:
- `hermes_state.py`
- `gateway/session.py`
- `tools/session_search_tool.py`
## Main database
The primary store lives at:
```text
~/.hermes/state.db
```
It contains:
- sessions
- messages
- metadata such as token counts and titles
- lineage relationships
- full-text search indexes
## What is stored per session
Examples of important session metadata:
- session ID
- source/platform
- title
- created/updated timestamps
- token counts
- tool call counts
- stored system prompt snapshot
- parent session ID after compression splits
## Lineage
When Hermes compresses a conversation, it can continue in a new session ID while preserving ancestry via `parent_session_id`.
This means resuming/searching can follow session families instead of treating each compressed shard as unrelated.
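For example, walking a session's ancestry could look like this (a sketch assuming `sessions(session_id, parent_session_id)` columns; the actual schema is defined in `hermes_state.py`):
```python
import os
import sqlite3

def session_lineage(session_id, db_path="~/.hermes/state.db"):
    conn = sqlite3.connect(os.path.expanduser(db_path))
    chain = []
    while session_id:
        chain.append(session_id)
        row = conn.execute(
            "SELECT parent_session_id FROM sessions WHERE session_id = ?",
            (session_id,),
        ).fetchone()
        session_id = row[0] if row else None
    conn.close()
    return chain  # newest shard first, root session last
```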
## Gateway vs CLI persistence
- CLI uses the state DB directly for resume/history/search
- gateway keeps active-session mappings and may also maintain additional platform transcript/state files
- some legacy JSON/JSONL artifacts still exist for compatibility, but SQLite is the main historical store
## Session search
The `session_search` tool uses the session DB's search features to retrieve and summarize relevant past work.
## Related docs
- [Gateway Internals](./gateway-internals.md)
- [Prompt Assembly](./prompt-assembly.md)
- [Context Compression & Prompt Caching](./context-compression-and-caching.md)
---
sidebar_position: 9
title: "Tools Runtime"
description: "Runtime behavior of the tool registry, toolsets, dispatch, and terminal environments"
---
# Tools Runtime
Hermes tools are self-registering functions grouped into toolsets and executed through a central registry/dispatch system.
Primary files:
- `tools/registry.py`
- `model_tools.py`
- `toolsets.py`
- `tools/terminal_tool.py`
- `tools/environments/*`
## Tool registration model
Each tool module calls `registry.register(...)` at import time.
`model_tools.py` is responsible for importing/discovering tool modules and building the schema list used by the model.
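A hedged sketch of the registration pattern (the exact `registry.register()` keyword arguments may differ):
```python
from tools import registry

def my_tool(args: dict) -> dict:
    """Echo the input back; a stand-in for real tool logic."""
    return {"echo": args.get("text", "")}

registry.register(
    name="my_tool",
    handler=my_tool,
    toolset="my-toolset",
    schema={
        "name": "my_tool",
        "description": "Echo the provided text.",
        "parameters": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
        },
    },
)
```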
## Toolset resolution
Toolsets are named bundles of tools. Hermes resolves them through:
- explicit enabled/disabled toolset lists
- platform presets (`hermes-cli`, `hermes-telegram`, etc.)
- dynamic MCP toolsets
- curated special-purpose sets like `hermes-acp`
## Dispatch
At runtime, tools are dispatched through the central registry; a few agent-level tools (memory, todo, session search) are handled directly by the agent loop instead.
## Terminal/runtime environments
The terminal system supports multiple backends:
- local
- docker
- ssh
- singularity
- modal
- daytona
It also supports:
- per-task cwd overrides
- background process management
- PTY mode
- approval callbacks for dangerous commands
## Concurrency
Tool calls may execute sequentially or concurrently depending on the tool mix and interaction requirements.
## Related docs
- [Toolsets Reference](../reference/toolsets-reference.md)
- [Built-in Tools Reference](../reference/tools-reference.md)
- [Agent Loop Internals](./agent-loop.md)
- [ACP Internals](./acp-internals.md)
---
sidebar_position: 10
title: "Trajectories & Training Format"
description: "How Hermes saves trajectories, normalizes tool calls, and produces training-friendly outputs"
---
# Trajectories & Training Format
Hermes can save conversation trajectories for training, evaluation, and batch data generation workflows.
Primary files:
- `agent/trajectory.py`
- `run_agent.py`
- `batch_runner.py`
- `trajectory_compressor.py`
## What trajectories are for
Trajectory outputs are used for:
- SFT data generation
- debugging agent behavior
- benchmark/evaluation artifact capture
- post-processing and compression pipelines
## Normalization strategy
Hermes converts live conversation structure into a training-friendly format.
Important behaviors include:
- representing reasoning in explicit markup
- converting tool calls into structured XML-like regions for dataset compatibility (see the example below)
- grouping tool outputs appropriately
- separating successful and failed trajectories
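For illustration, a normalized assistant turn might look like the block below. The exact markup is defined in `agent/trajectory.py`; this sketch follows the Hermes/ChatML `<tool_call>` convention listed in the parser table earlier, and the `<think>` tag for reasoning is an assumption:
```text
<think>
Plan the fix, then run the tests.
</think>
<tool_call>
{"name": "terminal", "arguments": {"command": "pytest -q"}}
</tool_call>
```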
## Persistence boundaries
Trajectory files do **not** blindly mirror all runtime prompt state.
Some prompt-time-only layers are intentionally excluded from persisted trajectory content so datasets are cleaner and less environment-specific.
## Batch runner
`batch_runner.py` emits richer metadata than single-session trajectory saving, including:
- model/provider metadata
- toolset info
- partial/failure markers
- tool statistics
## Related docs
- [Environments, Benchmarks & Data Generation](./environments.md)
- [Agent Loop Internals](./agent-loop.md)