The architecture has been updated

This commit is contained in:
Skyber_2 2026-03-31 23:31:36 +03:00
parent 805f7a017e
commit a01257ead9
1119 changed files with 226 additions and 352 deletions


@@ -0,0 +1,8 @@
{
  "label": "Developer Guide",
  "position": 3,
  "link": {
    "type": "generated-index",
    "description": "Contribute to Hermes Agent — architecture, tools, skills, and more."
  }
}


@@ -0,0 +1,182 @@
---
sidebar_position: 2
title: "ACP Internals"
description: "How the ACP adapter works: lifecycle, sessions, event bridge, approvals, and tool rendering"
---
# ACP Internals
The ACP adapter wraps Hermes' synchronous `AIAgent` in an async JSON-RPC stdio server.
Key implementation files:
- `acp_adapter/entry.py`
- `acp_adapter/server.py`
- `acp_adapter/session.py`
- `acp_adapter/events.py`
- `acp_adapter/permissions.py`
- `acp_adapter/tools.py`
- `acp_adapter/auth.py`
- `acp_registry/agent.json`
## Boot flow
```text
hermes acp / hermes-acp / python -m acp_adapter
  -> acp_adapter.entry.main()
  -> load ~/.hermes/.env
  -> configure stderr logging
  -> construct HermesACPAgent
  -> acp.run_agent(agent)
```
Stdout is reserved for ACP JSON-RPC transport. Human-readable logs go to stderr.
## Major components
### `HermesACPAgent`
`acp_adapter/server.py` implements the ACP agent protocol.
Responsibilities:
- initialize / authenticate
- new/load/resume/fork/list/cancel session methods
- prompt execution
- session model switching
- wiring sync AIAgent callbacks into ACP async notifications
### `SessionManager`
`acp_adapter/session.py` tracks live ACP sessions.
Each session stores:
- `session_id`
- `agent`
- `cwd`
- `model`
- `history`
- `cancel_event`
The manager is thread-safe and supports:
- create
- get
- remove
- fork
- list
- cleanup
- cwd updates
### Event bridge
`acp_adapter/events.py` converts AIAgent callbacks into ACP `session_update` events.
Bridged callbacks:
- `tool_progress_callback`
- `thinking_callback`
- `step_callback`
- `message_callback`
Because `AIAgent` runs in a worker thread while ACP I/O lives on the main event loop, the bridge uses:
```python
asyncio.run_coroutine_threadsafe(...)
```
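For illustration, a bridged callback might be built like the sketch below. The `connection.session_update` call and the payload shape are hypothetical, not the actual ACP SDK surface:
```python
import asyncio

# Hypothetical helper: `connection` and the session_update payload are
# illustrative, not Hermes' real API.
def make_thinking_callback(loop: asyncio.AbstractEventLoop, connection, session_id: str):
    def thinking_callback(text: str) -> None:
        # Runs on the AIAgent worker thread; schedule the async notification
        # on the main event loop without blocking the agent.
        asyncio.run_coroutine_threadsafe(
            connection.session_update(session_id, {"type": "agent_thought_chunk", "text": text}),
            loop,
        )
    return thinking_callback
```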
### Permission bridge
`acp_adapter/permissions.py` adapts dangerous terminal approval prompts into ACP permission requests.
Mapping:
- `allow_once` -> Hermes `once`
- `allow_always` -> Hermes `always`
- reject options -> Hermes `deny`
Timeouts and bridge failures deny by default.
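A minimal sketch of that mapping, with illustrative ACP option IDs (the real logic lives in `acp_adapter/permissions.py`):
```python
# Illustrative option IDs, not the exact strings used by ACP clients.
_ACP_TO_HERMES = {
    "allow_once": "once",
    "allow_always": "always",
    "reject_once": "deny",
    "reject_always": "deny",
}

def map_permission_outcome(option_id: str | None) -> str:
    # Timeouts and bridge failures fall through to "deny".
    return _ACP_TO_HERMES.get(option_id or "", "deny")
```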
### Tool rendering helpers
`acp_adapter/tools.py` maps Hermes tools to ACP tool kinds and builds editor-facing content.
Examples:
- `patch` / `write_file` -> file diffs
- `terminal` -> shell command text
- `read_file` / `search_files` -> text previews
- large results -> truncated text blocks for UI safety
## Session lifecycle
```text
new_session(cwd)
  -> create SessionState
  -> create AIAgent(platform="acp", enabled_toolsets=["hermes-acp"])
  -> bind task_id/session_id to cwd override

prompt(..., session_id)
  -> extract text from ACP content blocks
  -> reset cancel event
  -> install callbacks + approval bridge
  -> run AIAgent in ThreadPoolExecutor
  -> update session history
  -> emit final agent message chunk
```
### Cancellation
`cancel(session_id)`:
- sets the session cancel event
- calls `agent.interrupt()` when available
- causes the prompt response to return `stop_reason="cancelled"`
### Forking
`fork_session()` deep-copies message history into a new live session, preserving conversation state while giving the fork its own session ID and cwd.
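Sketched with the `SessionState` fields listed earlier (illustrative only; the real method lives in `acp_adapter/session.py`):
```python
import copy
import threading
import uuid

# Illustrative sketch of fork_session(); field handling follows the
# SessionState fields listed above, not the exact implementation.
def fork_session(parent: "SessionState") -> "SessionState":
    return SessionState(
        session_id=str(uuid.uuid4()),            # fork gets its own session ID
        agent=parent.agent,                      # or a fresh AIAgent, per the real impl
        cwd=parent.cwd,                          # cwd can diverge afterward
        model=parent.model,
        history=copy.deepcopy(parent.history),   # deep copy so histories diverge
        cancel_event=threading.Event(),
    )
```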
## Provider/auth behavior
ACP does not implement its own auth store.
Instead it reuses Hermes' runtime resolver:
- `acp_adapter/auth.py`
- `hermes_cli/runtime_provider.py`
So ACP advertises and uses the currently configured Hermes provider/credentials.
## Working directory binding
ACP sessions carry an editor cwd.
The session manager binds that cwd to the ACP session ID via task-scoped terminal/file overrides, so file and terminal tools operate relative to the editor workspace.
## Duplicate same-name tool calls
The event bridge tracks tool call IDs in a FIFO queue per tool name, rather than a single ID per name. This is important for:
- parallel same-name calls
- repeated same-name calls in one step
Without FIFO queues, completion events would attach to the wrong tool invocation.
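A sketch of the FIFO idea (names illustrative):
```python
from collections import defaultdict, deque

# Illustrative sketch: start events enqueue an ID per tool name;
# completion events dequeue the oldest pending ID for that name.
class ToolIdTracker:
    def __init__(self) -> None:
        self._pending: dict[str, deque[str]] = defaultdict(deque)

    def on_start(self, tool_name: str, tool_call_id: str) -> None:
        self._pending[tool_name].append(tool_call_id)

    def on_complete(self, tool_name: str) -> str | None:
        queue = self._pending[tool_name]
        return queue.popleft() if queue else None
```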
## Approval callback restoration
ACP temporarily installs an approval callback on the terminal tool during prompt execution, then restores the previous callback afterward. This avoids leaving ACP session-specific approval handlers installed globally forever.
## Current limitations
- ACP sessions are process-local from the ACP server's point of view
- non-text prompt blocks are currently ignored for request text extraction
- editor-specific UX varies by ACP client implementation
## Related files
- `tests/acp/` — ACP test suite
- `toolsets.py` — `hermes-acp` toolset definition
- `hermes_cli/main.py` — `hermes acp` CLI subcommand
- `pyproject.toml` — `[acp]` optional dependency + `hermes-acp` script


@@ -0,0 +1,424 @@
---
sidebar_position: 5
title: "Adding Providers"
description: "How to add a new inference provider to Hermes Agent — auth, runtime resolution, CLI flows, adapters, tests, and docs"
---
# Adding Providers
Hermes can already talk to any OpenAI-compatible endpoint through the custom provider path. Do not add a built-in provider unless you want first-class UX for that service:
- provider-specific auth or token refresh
- a curated model catalog
- setup / `hermes model` menu entries
- provider aliases for `provider:model` syntax
- a non-OpenAI API shape that needs an adapter
If the provider is just "another OpenAI-compatible base URL and API key", a named custom provider may be enough.
## The mental model
A built-in provider has to line up across a few layers:
1. `hermes_cli/auth.py` decides how credentials are found.
2. `hermes_cli/runtime_provider.py` turns that into runtime data:
- `provider`
- `api_mode`
- `base_url`
- `api_key`
- `source`
3. `run_agent.py` uses `api_mode` to decide how requests are built and sent.
4. `hermes_cli/models.py`, `hermes_cli/main.py`, and `hermes_cli/setup.py` make the provider show up in the CLI.
5. `agent/auxiliary_client.py` and `agent/model_metadata.py` keep side tasks and token budgeting working.
The important abstraction is `api_mode`.
- Most providers use `chat_completions`.
- Codex uses `codex_responses`.
- Anthropic uses `anthropic_messages`.
- A new non-OpenAI protocol usually means adding a new adapter and a new `api_mode` branch.
## Choose the implementation path first
### Path A — OpenAI-compatible provider
Use this when the provider accepts standard chat-completions style requests.
Typical work:
- add auth metadata
- add model catalog / aliases
- add runtime resolution
- add CLI menu wiring
- add aux-model defaults
- add tests and user docs
You usually do not need a new adapter or a new `api_mode`.
### Path B — Native provider
Use this when the provider does not behave like OpenAI chat completions.
Examples in-tree today:
- `codex_responses`
- `anthropic_messages`
This path includes everything from Path A plus:
- a provider adapter in `agent/`
- `run_agent.py` branches for request building, dispatch, usage extraction, interrupt handling, and response normalization
- adapter tests
## File checklist
### Required for every built-in provider
1. `hermes_cli/auth.py`
2. `hermes_cli/models.py`
3. `hermes_cli/runtime_provider.py`
4. `hermes_cli/main.py`
5. `hermes_cli/setup.py`
6. `agent/auxiliary_client.py`
7. `agent/model_metadata.py`
8. tests
9. user-facing docs under `website/docs/`
### Additional for native / non-OpenAI providers
10. `agent/<provider>_adapter.py`
11. `run_agent.py`
12. `pyproject.toml` if a provider SDK is required
## Step 1: Pick one canonical provider id
Choose a single provider id and use it everywhere.
Examples from the repo:
- `openai-codex`
- `kimi-coding`
- `minimax-cn`
That same id should appear in:
- `PROVIDER_REGISTRY` in `hermes_cli/auth.py`
- `_PROVIDER_LABELS` in `hermes_cli/models.py`
- `_PROVIDER_ALIASES` in both `hermes_cli/auth.py` and `hermes_cli/models.py`
- CLI `--provider` choices in `hermes_cli/main.py`
- setup / model selection branches
- auxiliary-model defaults
- tests
If the id differs between those files, the provider will feel half-wired: auth may work while `/model`, setup, or runtime resolution silently misses it.
## Step 2: Add auth metadata in `hermes_cli/auth.py`
For API-key providers, add a `ProviderConfig` entry to `PROVIDER_REGISTRY` with:
- `id`
- `name`
- `auth_type="api_key"`
- `inference_base_url`
- `api_key_env_vars`
- optional `base_url_env_var`
Also add aliases to `_PROVIDER_ALIASES`.
Use the existing providers as templates:
- simple API-key path: Z.AI, MiniMax
- API-key path with endpoint detection: Kimi, Z.AI
- native token resolution: Anthropic
- OAuth / auth-store path: Nous, OpenAI Codex
Questions to answer here:
- What env vars should Hermes check, and in what priority order?
- Does the provider need base-URL overrides?
- Does it need endpoint probing or token refresh?
- What should the auth error say when credentials are missing?
If the provider needs something more than "look up an API key", add a dedicated credential resolver instead of shoving logic into unrelated branches.
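As a concrete shape, here is a hypothetical entry for a provider called `acme`. The field names are the ones listed above; check `hermes_cli/auth.py` for the exact `ProviderConfig` signature and registry shape:
```python
# Hypothetical — adjust to the real ProviderConfig signature in hermes_cli/auth.py,
# and add the entry to PROVIDER_REGISTRY following the existing entries.
ACME = ProviderConfig(
    id="acme",
    name="Acme AI",
    auth_type="api_key",
    inference_base_url="https://api.acme.example/v1",
    api_key_env_vars=["ACME_API_KEY", "ACME_KEY"],  # checked in priority order
    base_url_env_var="ACME_BASE_URL",               # optional override
)

_PROVIDER_ALIASES["acme-ai"] = "acme"
```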
## Step 3: Add model catalog and aliases in `hermes_cli/models.py`
Update the provider catalog so the provider works in menus and in `provider:model` syntax.
Typical edits:
- `_PROVIDER_MODELS`
- `_PROVIDER_LABELS`
- `_PROVIDER_ALIASES`
- provider display order inside `list_available_providers()`
- `provider_model_ids()` if the provider supports a live `/models` fetch
If the provider exposes a live model list, prefer that first and keep `_PROVIDER_MODELS` as the static fallback.
This file is also what makes inputs like these work:
```text
anthropic:claude-sonnet-4-6
kimi:model-name
```
If aliases are missing here, the provider may authenticate correctly but still fail in `/model` parsing.
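Continuing the hypothetical `acme` provider as a sketch (the exact dict shapes live in `hermes_cli/models.py`):
```python
# Hypothetical entries; mirror the real structures in hermes_cli/models.py.
_PROVIDER_MODELS["acme"] = [
    "acme-large",   # static fallback list, used when no live /models fetch exists
    "acme-small",
]
_PROVIDER_LABELS["acme"] = "Acme AI"
_PROVIDER_ALIASES["acme-ai"] = "acme"   # enables acme-ai:model-name syntax
```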
## Step 4: Resolve runtime data in `hermes_cli/runtime_provider.py`
`resolve_runtime_provider()` is the shared path used by CLI, gateway, cron, ACP, and helper clients.
Add a branch that returns a dict with at least:
```python
{
"provider": "your-provider",
"api_mode": "chat_completions", # or your native mode
"base_url": "https://...",
"api_key": "...",
"source": "env|portal|auth-store|explicit",
"requested_provider": requested_provider,
}
```
If the provider is OpenAI-compatible, `api_mode` should usually stay `chat_completions`.
Be careful with API-key precedence. Hermes already contains logic to avoid leaking an OpenRouter key to unrelated endpoints. A new provider should be equally explicit about which key goes to which base URL.
## Step 5: Wire the CLI in `hermes_cli/main.py` and `hermes_cli/setup.py`
A provider is not discoverable until it shows up in the interactive flows.
Update:
### `hermes_cli/main.py`
- `provider_labels`
- provider dispatch inside the `model` command
- `--provider` argument choices
- login/logout choices if the provider supports those flows
- a `_model_flow_<provider>()` function, or reuse `_model_flow_api_key_provider()` if it fits
### `hermes_cli/setup.py`
- `provider_choices`
- auth branch for the provider
- model-selection branch
- any provider-specific explanatory text
- any place where a provider should be excluded from OpenRouter-only prompts or routing settings
If you only update one of these files, `hermes model` and `hermes setup` will drift.
## Step 6: Keep auxiliary calls working
Two files matter here:
### `agent/auxiliary_client.py`
Add a cheap / fast default aux model to `_API_KEY_PROVIDER_AUX_MODELS` if this is a direct API-key provider.
Auxiliary tasks include things like:
- vision summarization
- web extraction summarization
- context compression summaries
- session-search summaries
- memory flushes
If the provider has no sensible aux default, side tasks may fall back badly or use an expensive main model unexpectedly.
### `agent/model_metadata.py`
Add context lengths for the provider's models so token budgeting, compression thresholds, and limits stay sane.
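A sketch for the same hypothetical `acme` provider. `_API_KEY_PROVIDER_AUX_MODELS` is the dict named above; the context-length dict name is illustrative, so match the real structure in `agent/model_metadata.py`:
```python
# agent/auxiliary_client.py — cheap/fast default for side tasks (hypothetical model IDs)
_API_KEY_PROVIDER_AUX_MODELS["acme"] = "acme-small"

# agent/model_metadata.py — context lengths for token budgeting
# (dict name is illustrative; follow the real structure in that file)
CONTEXT_LENGTHS = {
    "acme-large": 200_000,
    "acme-small": 64_000,
}
```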
## Step 7: If the provider is native, add an adapter and `run_agent.py` support
If the provider is not plain chat completions, isolate the provider-specific logic in `agent/<provider>_adapter.py`.
Keep `run_agent.py` focused on orchestration. It should call adapter helpers, not hand-build provider payloads inline all over the file.
A native provider usually needs work in these places:
### New adapter file
Typical responsibilities:
- build the SDK / HTTP client
- resolve tokens
- convert OpenAI-style conversation messages to the provider's request format
- convert tool schemas if needed
- normalize provider responses back into what `run_agent.py` expects
- extract usage and finish-reason data
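A skeleton of those responsibilities for a hypothetical `acme` adapter (class and method names are illustrative, not an existing API):
```python
# agent/acme_adapter.py — illustrative skeleton only
class AcmeAdapter:
    def __init__(self, api_key: str, base_url: str):
        self.api_key = api_key
        self.base_url = base_url
        # build the SDK / HTTP client here

    def convert_messages(self, messages: list[dict]) -> list[dict]:
        """Map OpenAI-style conversation messages to the provider's format."""
        raise NotImplementedError

    def convert_tools(self, tool_schemas: list[dict]) -> list[dict]:
        """Translate tool schemas if the provider uses a different shape."""
        raise NotImplementedError

    def normalize_response(self, raw: dict) -> dict:
        """Return an OpenAI-shaped response for run_agent.py, including
        usage and finish-reason fields."""
        raise NotImplementedError
```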
### `run_agent.py`
Search for `api_mode` and audit every switch point. At minimum, verify:
- `__init__` chooses the new `api_mode`
- client construction works for the provider
- `_build_api_kwargs()` knows how to format requests
- `_api_call_with_interrupt()` dispatches to the right client call
- interrupt / client rebuild paths work
- response validation accepts the provider's shape
- finish-reason extraction is correct
- token-usage extraction is correct
- fallback-model activation can switch into the new provider cleanly
- summary-generation and memory-flush paths still work
Also search `run_agent.py` for `self.client.`. Any code path that assumes the standard OpenAI client exists can break when a native provider uses a different client object or `self.client = None`.
### Prompt caching and provider-specific request fields
Prompt caching and provider-specific knobs are easy to regress.
Examples already in-tree:
- Anthropic has a native prompt-caching path
- OpenRouter gets provider-routing fields
- not every provider should receive every request-side option
When you add a native provider, double-check that Hermes is only sending fields that provider actually understands.
## Step 8: Tests
At minimum, touch the tests that guard provider wiring.
Common places:
- `tests/test_runtime_provider_resolution.py`
- `tests/test_cli_provider_resolution.py`
- `tests/test_cli_model_command.py`
- `tests/test_setup_model_selection.py`
- `tests/test_provider_parity.py`
- `tests/test_run_agent.py`
- `tests/test_<provider>_adapter.py` for a native provider
The file names above are examples; the exact set may differ for your provider. The point is to cover:
- auth resolution
- CLI menu / provider selection
- runtime provider resolution
- agent execution path
- provider:model parsing
- any adapter-specific message conversion
Run tests with xdist disabled:
```bash
source venv/bin/activate
python -m pytest tests/test_runtime_provider_resolution.py tests/test_cli_provider_resolution.py tests/test_cli_model_command.py tests/test_setup_model_selection.py -n0 -q
```
For deeper changes, run the full suite before pushing:
```bash
source venv/bin/activate
python -m pytest tests/ -n0 -q
```
## Step 9: Live verification
After tests, run a real smoke test.
```bash
source venv/bin/activate
python -m hermes_cli.main chat -q "Say hello" --provider your-provider --model your-model
```
Also test the interactive flows if you changed menus:
```bash
source venv/bin/activate
python -m hermes_cli.main model
python -m hermes_cli.main setup
```
For native providers, verify at least one tool call too, not just a plain text response.
## Step 10: Update user-facing docs
If the provider is meant to ship as a first-class option, update the user docs too:
- `website/docs/getting-started/quickstart.md`
- `website/docs/user-guide/configuration.md`
- `website/docs/reference/environment-variables.md`
A developer can wire the provider perfectly and still leave users unable to discover the required env vars or setup flow.
## OpenAI-compatible provider checklist
Use this if the provider is standard chat completions.
- [ ] `ProviderConfig` added in `hermes_cli/auth.py`
- [ ] aliases added in `hermes_cli/auth.py` and `hermes_cli/models.py`
- [ ] model catalog added in `hermes_cli/models.py`
- [ ] runtime branch added in `hermes_cli/runtime_provider.py`
- [ ] CLI wiring added in `hermes_cli/main.py`
- [ ] setup wiring added in `hermes_cli/setup.py`
- [ ] aux model added in `agent/auxiliary_client.py`
- [ ] context lengths added in `agent/model_metadata.py`
- [ ] runtime / CLI tests updated
- [ ] user docs updated
## Native provider checklist
Use this when the provider needs a new protocol path.
- [ ] everything in the OpenAI-compatible checklist
- [ ] adapter added in `agent/<provider>_adapter.py`
- [ ] new `api_mode` supported in `run_agent.py`
- [ ] interrupt / rebuild path works
- [ ] usage and finish-reason extraction works
- [ ] fallback path works
- [ ] adapter tests added
- [ ] live smoke test passes
## Common pitfalls
### 1. Adding the provider to auth but not to model parsing
That makes credentials resolve correctly while `/model` and `provider:model` inputs fail.
### 2. Forgetting that `config["model"]` can be a string or a dict
A lot of provider-selection code has to normalize both forms.
### 3. Assuming a built-in provider is required
If the service is just OpenAI-compatible, a custom provider may already solve the user problem with less maintenance.
### 4. Forgetting auxiliary paths
The main chat path can work while summarization, memory flushes, or vision helpers fail because aux routing was never updated.
### 5. Native-provider branches hiding in `run_agent.py`
Search for `api_mode` and `self.client.`. Do not assume the obvious request path is the only one.
### 6. Sending OpenRouter-only knobs to other providers
Fields like provider routing belong only on the providers that support them.
### 7. Updating `hermes model` but not `hermes setup`
Both flows need to know about the provider.
## Good search targets while implementing
If you are hunting for all the places a provider touches, search these symbols:
- `PROVIDER_REGISTRY`
- `_PROVIDER_ALIASES`
- `_PROVIDER_MODELS`
- `resolve_runtime_provider`
- `_model_flow_`
- `provider_choices`
- `api_mode`
- `_API_KEY_PROVIDER_AUX_MODELS`
- `self.client.`
## Related docs
- [Provider Runtime Resolution](./provider-runtime.md)
- [Architecture](./architecture.md)
- [Contributing](./contributing.md)


@@ -0,0 +1,208 @@
---
sidebar_position: 2
title: "Adding Tools"
description: "How to add a new tool to Hermes Agent — schemas, handlers, registration, and toolsets"
---
# Adding Tools
Before writing a tool, ask yourself: **should this be a [skill](creating-skills.md) instead?**
Make it a **Skill** when the capability can be expressed as instructions + shell commands + existing tools (arXiv search, git workflows, Docker management, PDF processing).
Make it a **Tool** when it requires end-to-end integration with API keys, custom processing logic, binary data handling, or streaming (browser automation, TTS, vision analysis).
## Overview
Adding a tool touches **3 files**:
1. **`tools/your_tool.py`** — handler, schema, check function, `registry.register()` call
2. **`toolsets.py`** — add tool name to `_HERMES_CORE_TOOLS` (or a specific toolset)
3. **`model_tools.py`** — add `"tools.your_tool"` to the `_discover_tools()` list
## Step 1: Create the Tool File
Every tool file follows the same structure:
```python
# tools/weather_tool.py
"""Weather Tool -- look up current weather for a location."""
import json
import os
import logging

logger = logging.getLogger(__name__)

# --- Availability check ---
def check_weather_requirements() -> bool:
    """Return True if the tool's dependencies are available."""
    return bool(os.getenv("WEATHER_API_KEY"))

# --- Handler ---
def weather_tool(location: str, units: str = "metric") -> str:
    """Fetch weather for a location. Returns JSON string."""
    api_key = os.getenv("WEATHER_API_KEY")
    if not api_key:
        return json.dumps({"error": "WEATHER_API_KEY not configured"})
    try:
        # ... call weather API ...
        return json.dumps({"location": location, "temp": 22, "units": units})
    except Exception as e:
        return json.dumps({"error": str(e)})

# --- Schema ---
WEATHER_SCHEMA = {
    "name": "weather",
    "description": "Get current weather for a location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "City name or coordinates (e.g. 'London' or '51.5,-0.1')"
            },
            "units": {
                "type": "string",
                "enum": ["metric", "imperial"],
                "description": "Temperature units (default: metric)",
                "default": "metric"
            }
        },
        "required": ["location"]
    }
}

# --- Registration ---
from tools.registry import registry

registry.register(
    name="weather",
    toolset="weather",
    schema=WEATHER_SCHEMA,
    handler=lambda args, **kw: weather_tool(
        location=args.get("location", ""),
        units=args.get("units", "metric")),
    check_fn=check_weather_requirements,
    requires_env=["WEATHER_API_KEY"],
)
```
### Key Rules
:::danger Important
- Handlers **MUST** return a JSON string (via `json.dumps()`), never raw dicts
- Errors **MUST** be returned as `{"error": "message"}`, never raised as exceptions
- The `check_fn` is called when building tool definitions — if it returns `False`, the tool is silently excluded
- The `handler` receives `(args: dict, **kwargs)` where `args` is the LLM's tool call arguments
:::
## Step 2: Add to a Toolset
In `toolsets.py`, add the tool name:
```python
# If it should be available on all platforms (CLI + messaging):
_HERMES_CORE_TOOLS = [
    ...
    "weather",  # <-- add here
]

# Or create a new standalone toolset:
"weather": {
    "description": "Weather lookup tools",
    "tools": ["weather"],
    "includes": []
},
```
## Step 3: Add Discovery Import
In `model_tools.py`, add the module to the `_discover_tools()` list:
```python
def _discover_tools():
    _modules = [
        ...
        "tools.weather_tool",  # <-- add here
    ]
```
This import triggers the `registry.register()` call at the bottom of your tool file.
## Async Handlers
If your handler needs async code, mark it with `is_async=True`:
```python
import json

import aiohttp  # assumed HTTP client for this example

async def weather_tool_async(location: str) -> str:
    async with aiohttp.ClientSession() as session:
        ...  # call the weather API and build `result`
        return json.dumps(result)

registry.register(
    name="weather",
    toolset="weather",
    schema=WEATHER_SCHEMA,
    handler=lambda args, **kw: weather_tool_async(args.get("location", "")),
    check_fn=check_weather_requirements,
    is_async=True,  # registry calls _run_async() automatically
)
```
The registry handles async bridging transparently — you never call `asyncio.run()` yourself.
## Handlers That Need task_id
Tools that manage per-session state receive `task_id` via `**kwargs`:
```python
def _handle_weather(args, **kw):
    task_id = kw.get("task_id")
    return weather_tool(args.get("location", ""), task_id=task_id)

registry.register(
    name="weather",
    ...
    handler=_handle_weather,
)
```
## Agent-Loop Intercepted Tools
Some tools (`todo`, `memory`, `session_search`, `delegate_task`) need access to per-session agent state. These are intercepted by `run_agent.py` before reaching the registry. The registry still holds their schemas, but `dispatch()` returns a fallback error if the intercept is bypassed.
## Optional: Setup Wizard Integration
If your tool requires an API key, add it to `hermes_cli/config.py`:
```python
OPTIONAL_ENV_VARS = {
    ...
    "WEATHER_API_KEY": {
        "description": "Weather API key for weather lookup",
        "prompt": "Weather API key",
        "url": "https://weatherapi.com/",
        "tools": ["weather"],
        "password": True,
    },
}
```
## Checklist
- [ ] Tool file created with handler, schema, check function, and registration
- [ ] Added to appropriate toolset in `toolsets.py`
- [ ] Discovery import added to `model_tools.py`
- [ ] Handler returns JSON strings, errors returned as `{"error": "..."}`
- [ ] Optional: API key added to `OPTIONAL_ENV_VARS` in `hermes_cli/config.py`
- [ ] Optional: Added to `toolset_distributions.py` for batch processing
- [ ] Tested with `hermes chat -q "Use the weather tool for London"`


@@ -0,0 +1,112 @@
---
sidebar_position: 3
title: "Agent Loop Internals"
description: "Detailed walkthrough of AIAgent execution, API modes, tools, callbacks, and fallback behavior"
---
# Agent Loop Internals
The core orchestration engine is `run_agent.py`'s `AIAgent`.
## Core responsibilities
`AIAgent` is responsible for:
- assembling the effective prompt and tool schemas
- selecting the correct provider/API mode
- making interruptible model calls
- executing tool calls (sequentially or concurrently)
- maintaining session history
- handling compression, retries, and fallback models
## API modes
Hermes currently supports three API execution modes:
| API mode | Used for |
|----------|----------|
| `chat_completions` | OpenAI-compatible chat endpoints, including OpenRouter and most custom endpoints |
| `codex_responses` | OpenAI Codex / Responses API path |
| `anthropic_messages` | Native Anthropic Messages API |
The mode is resolved from explicit args, provider selection, and base URL heuristics.
## Turn lifecycle
```text
run_conversation()
  -> generate effective task_id
  -> append current user message
  -> load or build cached system prompt
  -> maybe preflight-compress
  -> build api_messages
  -> inject ephemeral prompt layers
  -> apply prompt caching if appropriate
  -> make interruptible API call
  -> if tool calls: execute them, append tool results, loop
  -> if final text: persist, cleanup, return response
```
## Interruptible API calls
Hermes wraps API requests so they can be interrupted from the CLI or gateway.
This matters because:
- the agent may be in a long LLM call
- the user may send a new message mid-flight
- background systems may need cancellation semantics
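A minimal sketch of the pattern, assuming a blocking request callable and a shared `threading.Event`; Hermes' actual implementation lives in `run_agent.py` and differs in detail:
```python
import concurrent.futures
import threading

def call_with_interrupt(make_request, interrupt_event: threading.Event, poll: float = 0.2):
    """Run a blocking model call, checking an interrupt flag while waiting.

    `make_request` is any zero-argument callable performing the API request.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(make_request)
        while True:
            try:
                return future.result(timeout=poll)
            except concurrent.futures.TimeoutError:
                if interrupt_event.is_set():
                    raise KeyboardInterrupt("model call interrupted")
    finally:
        pool.shutdown(wait=False)  # abandon the request thread; best effort
```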
## Tool execution modes
Hermes uses two execution strategies:
- sequential execution for single or interactive tools
- concurrent execution for multiple non-interactive tools
Concurrent tool execution preserves message/result ordering when reinserting tool responses into conversation history.
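One way to get that ordering guarantee, shown as a sketch rather than the actual implementation, is to rely on `ThreadPoolExecutor.map`, which returns results in input order:
```python
from concurrent.futures import ThreadPoolExecutor

# Sketch: `dispatch` executes one tool call and returns its result.
# executor.map preserves input order, so results align 1:1 with the calls
# and tool responses can be appended in the order the model emitted them.
def run_tool_calls_concurrently(tool_calls: list, dispatch) -> list:
    with ThreadPoolExecutor(max_workers=max(1, len(tool_calls))) as pool:
        return list(pool.map(dispatch, tool_calls))
```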
## Callback surfaces
`AIAgent` supports platform/integration callbacks such as:
- `tool_progress_callback`
- `thinking_callback`
- `reasoning_callback`
- `clarify_callback`
- `step_callback`
- `stream_delta_callback`
- `tool_gen_callback`
- `status_callback`
These are how the CLI, gateway, and ACP integrations stream intermediate progress and interactive approval/clarification flows.
## Budget and fallback behavior
Hermes tracks a shared iteration budget across parent and subagents. It also injects budget pressure hints near the end of the available iteration window.
Fallback model support allows the agent to switch providers/models when the primary route fails in supported failure paths.
## Compression and persistence
Before and during long runs, Hermes may:
- flush memory before context loss
- compress middle conversation turns
- split the session lineage into a new session ID after compression
- preserve recent context and structural tool-call/result consistency
## Key files to read next
- `run_agent.py`
- `agent/prompt_builder.py`
- `agent/context_compressor.py`
- `agent/prompt_caching.py`
- `model_tools.py`
## Related docs
- [Provider Runtime Resolution](./provider-runtime.md)
- [Prompt Assembly](./prompt-assembly.md)
- [Context Compression & Prompt Caching](./context-compression-and-caching.md)
- [Tools Runtime](./tools-runtime.md)


@ -0,0 +1,152 @@
---
sidebar_position: 1
title: "Architecture"
description: "Hermes Agent internals — major subsystems, execution paths, and where to read next"
---
# Architecture
This page is the top-level map of Hermes Agent internals. The project has grown beyond a single monolithic loop, so the best way to understand it is by subsystem.
## High-level structure
```text
hermes-agent/
├── run_agent.py          # AIAgent core loop
├── cli.py                # interactive terminal UI
├── model_tools.py        # tool discovery/orchestration
├── toolsets.py           # tool groupings and presets
├── hermes_state.py       # SQLite session/state database
├── batch_runner.py       # batch trajectory generation
├── agent/                # prompt building, compression, caching, metadata, trajectories
├── hermes_cli/           # command entrypoints, auth, setup, models, config, doctor
├── tools/                # tool implementations and terminal environments
├── gateway/              # messaging gateway, session routing, delivery, pairing, hooks
├── cron/                 # scheduled job storage and scheduler
├── honcho_integration/   # Honcho memory integration
├── acp_adapter/          # ACP editor integration server
├── acp_registry/         # ACP registry manifest + icon
├── environments/         # Hermes RL / benchmark environment framework
├── skills/               # bundled skills
├── optional-skills/      # official optional skills
└── tests/                # test suite
```
## Recommended reading order
If you are new to the codebase, read in this order:
1. this page
2. [Agent Loop Internals](./agent-loop.md)
3. [Prompt Assembly](./prompt-assembly.md)
4. [Provider Runtime Resolution](./provider-runtime.md)
5. [Adding Providers](./adding-providers.md)
6. [Tools Runtime](./tools-runtime.md)
7. [Session Storage](./session-storage.md)
8. [Gateway Internals](./gateway-internals.md)
9. [Context Compression & Prompt Caching](./context-compression-and-caching.md)
10. [ACP Internals](./acp-internals.md)
11. [Environments, Benchmarks & Data Generation](./environments.md)
## Major subsystems
### Agent loop
The core synchronous orchestration engine is `AIAgent` in `run_agent.py`.
It is responsible for:
- provider/API-mode selection
- prompt construction
- tool execution
- retries and fallback
- callbacks
- compression and persistence
See [Agent Loop Internals](./agent-loop.md).
### Prompt system
Prompt-building logic is split between:
- `run_agent.py`
- `agent/prompt_builder.py`
- `agent/prompt_caching.py`
- `agent/context_compressor.py`
See:
- [Prompt Assembly](./prompt-assembly.md)
- [Context Compression & Prompt Caching](./context-compression-and-caching.md)
### Provider/runtime resolution
Hermes has a shared runtime provider resolver used by CLI, gateway, cron, ACP, and auxiliary calls.
See [Provider Runtime Resolution](./provider-runtime.md).
### Tooling runtime
The tool registry, toolsets, terminal backends, process manager, and dispatch rules form a subsystem of their own.
See [Tools Runtime](./tools-runtime.md).
### Session persistence
Historical session state is stored primarily in SQLite, with lineage preserved across compression splits.
See [Session Storage](./session-storage.md).
### Messaging gateway
The gateway is a long-running orchestration layer for platform adapters, session routing, pairing, delivery, and cron ticking.
See [Gateway Internals](./gateway-internals.md).
### ACP integration
ACP exposes Hermes as an editor-native agent over stdio/JSON-RPC.
See:
- [ACP Editor Integration](../user-guide/features/acp.md)
- [ACP Internals](./acp-internals.md)
### Cron
Cron jobs are implemented as first-class agent tasks, not just shell tasks.
See [Cron Internals](./cron-internals.md).
### RL / environments / trajectories
Hermes ships a full environment framework for evaluation, RL integration, and SFT data generation.
See:
- [Environments, Benchmarks & Data Generation](./environments.md)
- [Trajectories & Training Format](./trajectory-format.md)
## Design themes
Several cross-cutting design themes appear throughout the codebase:
- prompt stability matters
- tool execution must be observable and interruptible
- session persistence must survive long-running use
- platform frontends should share one agent core
- optional subsystems should remain loosely coupled where possible
## Implementation notes
The older mental model of Hermes as “one OpenAI-compatible chat loop plus some tools” is no longer sufficient. Current Hermes includes:
- multiple API modes
- auxiliary model routing
- ACP editor integration
- gateway-specific session and delivery semantics
- RL environment infrastructure
- prompt-caching and compression logic with lineage-aware persistence
Use this page as the map, then dive into subsystem-specific docs for the real implementation details.


@@ -0,0 +1,72 @@
---
sidebar_position: 6
title: "Context Compression & Prompt Caching"
description: "How Hermes compresses long conversations and applies provider-side prompt caching"
---
# Context Compression & Prompt Caching
Hermes manages long conversations with two complementary mechanisms:
- prompt caching
- context compression
Primary files:
- `agent/prompt_caching.py`
- `agent/context_compressor.py`
- `run_agent.py`
## Prompt caching
For Anthropic/native and Claude-via-OpenRouter flows, Hermes applies Anthropic-style cache markers.
Current strategy:
- cache the system prompt
- cache the last 3 non-system messages
- default TTL is 5 minutes unless explicitly extended
This is implemented in `agent/prompt_caching.py`.
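A sketch of that strategy using Anthropic-style `cache_control` markers (message shapes simplified; see `agent/prompt_caching.py` for the real logic):
```python
# Illustrative sketch: mark the system prompt and the last 3 non-system
# messages as cacheable. The "ephemeral" cache_control type is Anthropic's;
# block shapes here are simplified.
def apply_cache_markers(system_blocks: list[dict], messages: list[dict]) -> None:
    # Cache the system prompt (mark its final content block).
    if system_blocks:
        system_blocks[-1]["cache_control"] = {"type": "ephemeral"}
    # Cache the last 3 non-system messages.
    for msg in [m for m in messages if m.get("role") != "system"][-3:]:
        content = msg.get("content")
        if isinstance(content, list) and content:
            content[-1]["cache_control"] = {"type": "ephemeral"}
```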
## Why prompt stability matters
Prompt caching only helps when the stable prefix remains stable. That is why Hermes avoids rebuilding or mutating the core system prompt mid-session unless it has to.
## Compression trigger
Hermes can compress context when conversations become large. Configuration defaults live in `config.yaml`, and the compressor also has runtime checks based on actual prompt token counts.
## Compression algorithm
The compressor protects:
- the first N turns
- the last N turns
and summarizes the middle section.
It also cleans up structural issues such as orphaned tool-call/result pairs so the API never receives invalid conversation structure after compression.
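The overall shape, as an illustrative sketch that omits token counting and the tool-call repair step:
```python
# Sketch only: protect the head and tail, summarize the middle.
# `summarize` is any callable that turns messages into a text summary.
def compress(messages: list[dict], summarize, keep_head: int = 4, keep_tail: int = 8) -> list[dict]:
    if len(messages) <= keep_head + keep_tail:
        return messages
    head = messages[:keep_head]
    middle = messages[keep_head:-keep_tail]
    tail = messages[-keep_tail:]
    summary = {"role": "user", "content": f"[Summary of earlier turns]\n{summarize(middle)}"}
    return head + [summary] + tail
```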
## Pre-compression memory flush
Before compression, Hermes can give the model one last chance to persist memory so facts are not lost when middle turns are summarized away.
## Session lineage after compression
Compression can split the session into a new session ID while preserving parent lineage in the state DB.
This lets Hermes continue operating with a smaller active context while retaining a searchable ancestry chain.
## Re-injected state after compression
After compression, Hermes may re-inject compact operational state such as:
- todo snapshot
- prior-read-files summary
## Related docs
- [Prompt Assembly](./prompt-assembly.md)
- [Session Storage](./session-storage.md)
- [Agent Loop Internals](./agent-loop.md)


@@ -0,0 +1,232 @@
---
sidebar_position: 4
title: "Contributing"
description: "How to contribute to Hermes Agent — dev setup, code style, PR process"
---
# Contributing
Thank you for contributing to Hermes Agent! This guide covers setting up your dev environment, understanding the codebase, and getting your PR merged.
## Contribution Priorities
We value contributions in this order:
1. **Bug fixes** — crashes, incorrect behavior, data loss
2. **Cross-platform compatibility** — macOS, different Linux distros, WSL2
3. **Security hardening** — shell injection, prompt injection, path traversal
4. **Performance and robustness** — retry logic, error handling, graceful degradation
5. **New skills** — broadly useful ones (see [Creating Skills](creating-skills.md))
6. **New tools** — rarely needed; most capabilities should be skills
7. **Documentation** — fixes, clarifications, new examples
## Common contribution paths
- Building a new tool? Start with [Adding Tools](./adding-tools.md)
- Building a new skill? Start with [Creating Skills](./creating-skills.md)
- Building a new inference provider? Start with [Adding Providers](./adding-providers.md)
## Development Setup
### Prerequisites
| Requirement | Notes |
|-------------|-------|
| **Git** | With `--recurse-submodules` support |
| **Python 3.10+** | uv will install it if missing |
| **uv** | Fast Python package manager ([install](https://docs.astral.sh/uv/)) |
| **Node.js 18+** | Optional — needed for browser tools and WhatsApp bridge |
### Clone and Install
```bash
git clone --recurse-submodules https://github.com/NousResearch/hermes-agent.git
cd hermes-agent
# Create venv with Python 3.11
uv venv venv --python 3.11
export VIRTUAL_ENV="$(pwd)/venv"
# Install with all extras (messaging, cron, CLI menus, dev tools)
uv pip install -e ".[all,dev]"
uv pip install -e "./tinker-atropos"
# Optional: browser tools
npm install
```
### Configure for Development
```bash
mkdir -p ~/.hermes/{cron,sessions,logs,memories,skills}
cp cli-config.yaml.example ~/.hermes/config.yaml
touch ~/.hermes/.env
# Add at minimum an LLM provider key:
echo 'OPENROUTER_API_KEY=sk-or-v1-your-key' >> ~/.hermes/.env
```
### Run
```bash
# Symlink for global access
mkdir -p ~/.local/bin
ln -sf "$(pwd)/venv/bin/hermes" ~/.local/bin/hermes
# Verify
hermes doctor
hermes chat -q "Hello"
```
### Run Tests
```bash
pytest tests/ -v
```
## Code Style
- **PEP 8** with practical exceptions (no strict line length enforcement)
- **Comments**: Only when explaining non-obvious intent, trade-offs, or API quirks
- **Error handling**: Catch specific exceptions. Use `logger.warning()`/`logger.error()` with `exc_info=True` for unexpected errors
- **Cross-platform**: Never assume Unix (see below)
## Cross-Platform Compatibility
Hermes officially supports Linux, macOS, and WSL2. Native Windows is **not supported**, but the codebase includes some defensive coding patterns to avoid hard crashes in edge cases. Key rules:
### 1. `termios` and `fcntl` are Unix-only
Always catch both `ImportError` and `NotImplementedError`:
```python
try:
    from simple_term_menu import TerminalMenu
    menu = TerminalMenu(options)
    idx = menu.show()
except (ImportError, NotImplementedError):
    # Fallback: numbered menu
    for i, opt in enumerate(options):
        print(f"  {i+1}. {opt}")
    idx = int(input("Choice: ")) - 1
```
### 2. File encoding
Some environments may save `.env` files in non-UTF-8 encodings:
```python
from dotenv import load_dotenv

try:
    load_dotenv(env_path)
except UnicodeDecodeError:
    load_dotenv(env_path, encoding="latin-1")
```
### 3. Process management
`os.setsid()`, `os.killpg()`, and signal handling differ across platforms:
```python
import os
import platform

# kwargs is later passed to subprocess.Popen(...)
if platform.system() != "Windows":
    kwargs["preexec_fn"] = os.setsid
```
### 4. Path separators
Use `pathlib.Path` instead of string concatenation with `/`.
## Security Considerations
Hermes has terminal access. Security matters.
### Existing Protections
| Layer | Implementation |
|-------|---------------|
| **Sudo password piping** | Uses `shlex.quote()` to prevent shell injection |
| **Dangerous command detection** | Regex patterns in `tools/approval.py` with user approval flow |
| **Cron prompt injection** | Scanner blocks instruction-override patterns |
| **Write deny list** | Protected paths resolved via `os.path.realpath()` to prevent symlink bypass |
| **Skills guard** | Security scanner for hub-installed skills |
| **Code execution sandbox** | Child process runs with API keys stripped |
| **Container hardening** | Docker: all capabilities dropped, no privilege escalation, PID limits |
### Contributing Security-Sensitive Code
- Always use `shlex.quote()` when interpolating user input into shell commands
- Resolve symlinks with `os.path.realpath()` before access control checks
- Don't log secrets
- Catch broad exceptions around tool execution
- Test on all platforms if your change touches file paths or processes
## Pull Request Process
### Branch Naming
```
fix/description # Bug fixes
feat/description # New features
docs/description # Documentation
test/description # Tests
refactor/description # Code restructuring
```
### Before Submitting
1. **Run tests**: `pytest tests/ -v`
2. **Test manually**: Run `hermes` and exercise the code path you changed
3. **Check cross-platform impact**: Consider macOS and different Linux distros
4. **Keep PRs focused**: One logical change per PR
### PR Description
Include:
- **What** changed and **why**
- **How to test** it
- **What platforms** you tested on
- Reference any related issues
### Commit Messages
We use [Conventional Commits](https://www.conventionalcommits.org/):
```
<type>(<scope>): <description>
```
| Type | Use for |
|------|---------|
| `fix` | Bug fixes |
| `feat` | New features |
| `docs` | Documentation |
| `test` | Tests |
| `refactor` | Code restructuring |
| `chore` | Build, CI, dependency updates |
Scopes: `cli`, `gateway`, `tools`, `skills`, `agent`, `install`, `whatsapp`, `security`
Examples:
```
fix(cli): prevent crash in save_config_value when model is a string
feat(gateway): add WhatsApp multi-user session isolation
fix(security): prevent shell injection in sudo password piping
```
## Reporting Issues
- Use [GitHub Issues](https://github.com/NousResearch/hermes-agent/issues)
- Include: OS, Python version, Hermes version (`hermes version`), full error traceback
- Include steps to reproduce
- Check existing issues before creating duplicates
- For security vulnerabilities, please report privately
## Community
- **Discord**: [discord.gg/NousResearch](https://discord.gg/NousResearch)
- **GitHub Discussions**: For design proposals and architecture discussions
- **Skills Hub**: Upload specialized skills and share with the community
## License
By contributing, you agree that your contributions will be licensed under the [MIT License](https://github.com/NousResearch/hermes-agent/blob/main/LICENSE).


@@ -0,0 +1,247 @@
---
sidebar_position: 3
title: "Creating Skills"
description: "How to create skills for Hermes Agent — SKILL.md format, guidelines, and publishing"
---
# Creating Skills
Skills are the preferred way to add new capabilities to Hermes Agent. They're easier to create than tools, require no code changes to the agent, and can be shared with the community.
## Should it be a Skill or a Tool?
Make it a **Skill** when:
- The capability can be expressed as instructions + shell commands + existing tools
- It wraps an external CLI or API that the agent can call via `terminal` or `web_extract`
- It doesn't need custom Python integration or API key management baked into the agent
- Examples: arXiv search, git workflows, Docker management, PDF processing, email via CLI tools
Make it a **Tool** when:
- It requires end-to-end integration with API keys, auth flows, or multi-component configuration
- It needs custom processing logic that must execute precisely every time
- It handles binary data, streaming, or real-time events
- Examples: browser automation, TTS, vision analysis
## Skill Directory Structure
Bundled skills live in `skills/` organized by category. Official optional skills use the same structure in `optional-skills/`:
```text
skills/
├── research/
│   └── arxiv/
│       ├── SKILL.md          # Required: main instructions
│       └── scripts/          # Optional: helper scripts
│           └── search_arxiv.py
├── productivity/
│   └── ocr-and-documents/
│       ├── SKILL.md
│       ├── scripts/
│       └── references/
└── ...
```
## SKILL.md Format
```markdown
---
name: my-skill
description: Brief description (shown in skill search results)
version: 1.0.0
author: Your Name
license: MIT
platforms: [macos, linux]   # Optional — restrict to specific OS platforms
                            # Valid: macos, linux, windows
                            # Omit to load on all platforms (default)
metadata:
  hermes:
    tags: [Category, Subcategory, Keywords]
    related_skills: [other-skill-name]
    requires_toolsets: [web]                 # Optional — only show when these toolsets are active
    requires_tools: [web_search]             # Optional — only show when these tools are available
    fallback_for_toolsets: [browser]         # Optional — hide when these toolsets are active
    fallback_for_tools: [browser_navigate]   # Optional — hide when these tools exist
    required_environment_variables:          # Optional — env vars the skill needs
      - name: MY_API_KEY
        prompt: "Enter your API key"
        help: "Get one at https://example.com"
        required_for: "API access"
---
# Skill Title
Brief intro.
## When to Use
Trigger conditions — when should the agent load this skill?
## Quick Reference
Table of common commands or API calls.
## Procedure
Step-by-step instructions the agent follows.
## Pitfalls
Known failure modes and how to handle them.
## Verification
How the agent confirms it worked.
```
### Platform-Specific Skills
Skills can restrict themselves to specific operating systems using the `platforms` field:
```yaml
platforms: [macos] # macOS only (e.g., iMessage, Apple Reminders)
platforms: [macos, linux] # macOS and Linux
platforms: [windows] # Windows only
```
When set, the skill is automatically hidden from the system prompt, `skills_list()`, and slash commands on incompatible platforms. If omitted or empty, the skill loads on all platforms (backward compatible).
### Conditional Skill Activation
Skills can declare dependencies on specific tools or toolsets. This controls whether the skill appears in the system prompt for a given session.
```yaml
metadata:
  hermes:
    requires_toolsets: [web]                 # Hide if the web toolset is NOT active
    requires_tools: [web_search]             # Hide if web_search tool is NOT available
    fallback_for_toolsets: [browser]         # Hide if the browser toolset IS active
    fallback_for_tools: [browser_navigate]   # Hide if browser_navigate IS available
```
| Field | Behavior |
|-------|----------|
| `requires_toolsets` | Skill is **hidden** when ANY listed toolset is **not** available |
| `requires_tools` | Skill is **hidden** when ANY listed tool is **not** available |
| `fallback_for_toolsets` | Skill is **hidden** when ANY listed toolset **is** available |
| `fallback_for_tools` | Skill is **hidden** when ANY listed tool **is** available |
**Use case for `fallback_for_*`:** Create a skill that serves as a workaround when a primary tool isn't available. For example, a `duckduckgo-search` skill with `fallback_for_tools: [web_search]` only shows when the web search tool (which requires an API key) is not configured.
**Use case for `requires_*`:** Create a skill that only makes sense when certain tools are present. For example, a web scraping workflow skill with `requires_toolsets: [web]` won't clutter the prompt when web tools are disabled.
### Environment Variable Requirements
Skills can declare environment variables they need. When a skill is loaded via `skill_view`, its required vars are automatically registered for passthrough into sandboxed execution environments (terminal, execute_code).
```yaml
required_environment_variables:
  - name: TENOR_API_KEY
    prompt: "Tenor API key"                      # Shown when prompting user
    help: "Get your key at https://tenor.com"    # Help text or URL
    required_for: "GIF search functionality"     # What needs this var
```
Each entry supports:
- `name` (required) — the environment variable name
- `prompt` (optional) — prompt text when asking the user for the value
- `help` (optional) — help text or URL for obtaining the value
- `required_for` (optional) — describes which feature needs this variable
Users can also manually configure passthrough variables in `config.yaml`:
```yaml
terminal:
  env_passthrough:
    - MY_CUSTOM_VAR
    - ANOTHER_VAR
```
See `skills/apple/` for examples of macOS-only skills.
## Secure Setup on Load
Use `required_environment_variables` when a skill needs an API key or token. Missing values do **not** hide the skill from discovery. Instead, Hermes prompts for them securely when the skill is loaded in the local CLI.
```yaml
required_environment_variables:
  - name: TENOR_API_KEY
    prompt: Tenor API key
    help: Get a key from https://developers.google.com/tenor
    required_for: full functionality
```
The user can skip setup and keep loading the skill. Hermes never exposes the raw secret value to the model. Gateway and messaging sessions show local setup guidance instead of collecting secrets in-band.
:::tip Sandbox Passthrough
When your skill is loaded, any declared `required_environment_variables` that are set are **automatically passed through** to `execute_code` and `terminal` sandboxes. Your skill's scripts can access `$TENOR_API_KEY` (or `os.environ["TENOR_API_KEY"]` in Python) without the user needing to configure anything extra. See [Environment Variable Passthrough](/docs/user-guide/security#environment-variable-passthrough) for details.
:::
Legacy `prerequisites.env_vars` remains supported as a backward-compatible alias.
## Skill Guidelines
### No External Dependencies
Prefer stdlib Python, curl, and existing Hermes tools (`web_extract`, `terminal`, `read_file`). If a dependency is needed, document installation steps in the skill.
### Progressive Disclosure
Put the most common workflow first. Edge cases and advanced usage go at the bottom. This keeps token usage low for common tasks.
### Include Helper Scripts
For XML/JSON parsing or complex logic, include helper scripts in `scripts/` — don't expect the LLM to write parsers inline every time.
### Test It
Run the skill and verify the agent follows the instructions correctly:
```bash
hermes chat --toolsets skills -q "Use the X skill to do Y"
```
## Where Should the Skill Live?
Bundled skills (in `skills/`) ship with every Hermes install. They should be **broadly useful to most users**:
- Document handling, web research, common dev workflows, system administration
- Used regularly by a wide range of people
If your skill is official and useful but not universally needed (e.g., a paid service integration, a heavyweight dependency), put it in **`optional-skills/`** — it ships with the repo, is discoverable via `hermes skills browse` (labeled "official"), and installs with builtin trust.
If your skill is specialized, community-contributed, or niche, it's better suited for a **Skills Hub** — upload it to a registry and share it via `hermes skills install`.
## Publishing Skills
### To the Skills Hub
```bash
hermes skills publish skills/my-skill --to github --repo owner/repo
```
### To a Custom Repository
Add your repo as a tap:
```bash
hermes skills tap add owner/repo
```
Users can then search and install from your repository.
## Security Scanning
All hub-installed skills go through a security scanner that checks for:
- Data exfiltration patterns
- Prompt injection attempts
- Destructive commands
- Shell injection
Trust levels:
- `builtin` — ships with Hermes (always trusted)
- `official` — from `optional-skills/` in the repo (builtin trust, no third-party warning)
- `trusted` — from openai/skills, anthropics/skills
- `community` — non-dangerous findings can be overridden with `--force`; `dangerous` verdicts remain blocked
Hermes can now consume third-party skills from several external discovery mechanisms:
- direct GitHub identifiers (for example `openai/skills/k8s`)
- `skills.sh` identifiers (for example `skills-sh/vercel-labs/json-render/json-render-react`)
- well-known endpoints served from `/.well-known/skills/index.json`
If you want your skills to be discoverable without a GitHub-specific installer, consider serving them from a well-known endpoint in addition to publishing them in a repo or marketplace.


@ -0,0 +1,90 @@
---
sidebar_position: 11
title: "Cron Internals"
description: "How Hermes stores, schedules, edits, pauses, skill-loads, and delivers cron jobs"
---
# Cron Internals
Hermes cron support is implemented primarily in:
- `cron/jobs.py`
- `cron/scheduler.py`
- `tools/cronjob_tools.py`
- `gateway/run.py`
- `hermes_cli/cron.py`
## Scheduling model
Hermes supports:
- one-shot delays
- intervals
- cron expressions
- explicit timestamps
The model-facing surface is a single `cronjob` tool with action-style operations:
- `create`
- `list`
- `update`
- `pause`
- `resume`
- `run`
- `remove`
## Job storage
Cron jobs are stored in Hermes-managed local state (`~/.hermes/cron/jobs.json`) with atomic write semantics.
Each job can carry:
- prompt
- schedule metadata
- repeat counters
- delivery target
- lifecycle state (`scheduled`, `paused`, `completed`, etc.)
- zero, one, or multiple attached skills
Backward compatibility is preserved for older jobs that only stored a legacy single `skill` field or none of the newer lifecycle fields.
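The atomic-write idea, sketched below (the real persistence code lives in `cron/jobs.py`):
```python
import json
import os
import tempfile

# Illustrative sketch: write to a temp file in the same directory, then
# rename over the target so readers never see a half-written file.
def save_jobs(jobs: list[dict], path: str) -> None:
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path), suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(jobs, f, indent=2)
        os.replace(tmp_path, path)  # atomic on POSIX filesystems
    except BaseException:
        os.unlink(tmp_path)
        raise
```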
## Runtime behavior
The scheduler:
- loads jobs
- computes due work
- executes jobs in fresh agent sessions
- optionally injects one or more skills before the prompt
- handles repeat counters
- updates next-run metadata and state
In gateway mode, cron ticking is integrated into the long-running gateway loop.
## Skill-backed jobs
A cron job may attach multiple skills. At runtime, Hermes loads those skills in order and then appends the job prompt as the task instruction.
This gives scheduled jobs reusable guidance without requiring the user to paste full skill bodies into the cron prompt.
## Recursion guard
Cron-run sessions disable the `cronjob` toolset. This prevents a scheduled job from recursively creating or mutating more cron jobs and accidentally exploding token usage or scheduler load.
## Delivery model
Cron jobs can deliver to:
- origin chat
- local files
- platform home channels
- explicit platform/chat IDs
## Locking
Hermes uses lock-based protections so overlapping scheduler ticks do not execute the same due-job batch twice.
## Related docs
- [Cron feature guide](../user-guide/features/cron.md)
- [Gateway Internals](./gateway-internals.md)


@@ -0,0 +1,520 @@
---
sidebar_position: 5
title: "Environments, Benchmarks & Data Generation"
description: "Building RL training environments, running evaluation benchmarks, and generating SFT data with the Hermes-Agent Atropos integration"
---
# Environments, Benchmarks & Data Generation
Hermes Agent includes a full environment framework that connects its tool-calling capabilities to the [Atropos](https://github.com/NousResearch/atropos) RL training framework. This enables three workflows:
1. **RL Training** — Train language models on multi-turn agentic tasks with GRPO
2. **Benchmarks** — Evaluate models on standardised agentic benchmarks
3. **Data Generation** — Generate SFT training data from agent rollouts
All three share the same core: an **environment** class that defines tasks, runs an agent loop, and scores the output.
:::info Repo environments vs RL training tools
The Python environment framework documented here lives under the repo's `environments/` directory and is the implementation-level API for Hermes/Atropos integration. This is separate from the user-facing `rl_*` tools, which operate as an orchestration surface for remote RL training workflows.
:::
:::tip Quick Links
- **Want to run benchmarks?** Jump to [Available Benchmarks](#available-benchmarks)
- **Want to train with RL?** See [RL Training Tools](/user-guide/features/rl-training) for the agent-driven interface, or [Running Environments](#running-environments) for manual execution
- **Want to create a new environment?** See [Creating Environments](#creating-environments)
:::
## Architecture
The environment system is built on a three-layer inheritance chain:
```mermaid
classDiagram
    class BaseEnv {
        Server management
        Worker scheduling
        Wandb logging
        CLI: serve / process / evaluate
    }
    class HermesAgentBaseEnv {
        Terminal backend configuration
        Tool resolution
        Agent loop engine
        ToolContext access
    }
    class TerminalTestEnv {
        Stack testing
    }
    class HermesSweEnv {
        SWE training
    }
    class TerminalBench2EvalEnv {
        Benchmark evaluation
    }
    class TBLiteEvalEnv {
        Fast benchmark
    }
    class YCBenchEvalEnv {
        Long-horizon benchmark
    }
    BaseEnv <|-- HermesAgentBaseEnv
    HermesAgentBaseEnv <|-- TerminalTestEnv
    HermesAgentBaseEnv <|-- HermesSweEnv
    HermesAgentBaseEnv <|-- TerminalBench2EvalEnv
    TerminalBench2EvalEnv <|-- TBLiteEvalEnv
    TerminalBench2EvalEnv <|-- YCBenchEvalEnv
### BaseEnv (Atropos)
The foundation from `atroposlib`. Provides:
- **Server management** — connects to OpenAI-compatible APIs (VLLM, SGLang, OpenRouter)
- **Worker scheduling** — parallel rollout coordination
- **Wandb integration** — metrics logging and rollout visualisation
- **CLI interface** — three subcommands: `serve`, `process`, `evaluate`
- **Eval logging**`evaluate_log()` saves results to JSON + JSONL
### HermesAgentBaseEnv
The hermes-agent layer (`environments/hermes_base_env.py`). Adds:
- **Terminal backend configuration** — sets `TERMINAL_ENV` for sandboxed execution (local, Docker, Modal, Daytona, SSH, Singularity)
- **Tool resolution**`_resolve_tools_for_group()` calls hermes-agent's `get_tool_definitions()` to get the right tool schemas based on enabled/disabled toolsets
- **Agent loop integration**`collect_trajectory()` runs `HermesAgentLoop` and scores the result
- **Two-phase operation** — Phase 1 (OpenAI server) for eval/SFT, Phase 2 (VLLM ManagedServer) for full RL with logprobs
- **Async safety patches** — monkey-patches Modal backend to work inside Atropos's event loop
### Concrete Environments
Your environment inherits from `HermesAgentBaseEnv` and implements five methods:
| Method | Purpose |
|--------|---------|
| `setup()` | Load dataset, initialise state |
| `get_next_item()` | Return the next item for rollout |
| `format_prompt(item)` | Convert an item into the user message |
| `compute_reward(item, result, ctx)` | Score the rollout (0.0–1.0) |
| `evaluate()` | Periodic evaluation logic |
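Putting the five methods together, a minimal skeleton is shown below. The dataset, prompt, and scoring are invented for illustration, and the sync/async signatures should follow `HermesAgentBaseEnv` (the `compute_reward` shape matches the example later on this page):
```python
from environments.hermes_base_env import HermesAgentBaseEnv

# Skeleton only — method names come from the table above; everything else
# is illustrative, not an existing environment.
class HelloFileEnv(HermesAgentBaseEnv):
    async def setup(self):
        self.items = [{"prompt": "Create /workspace/hello.txt containing 'hi'"}]
        self._idx = 0

    async def get_next_item(self):
        item = self.items[self._idx % len(self.items)]
        self._idx += 1
        return item

    def format_prompt(self, item):
        return item["prompt"]

    async def compute_reward(self, item, result, ctx):
        # Verify the task outcome inside the model's own sandbox.
        check = ctx.terminal("grep -q hi /workspace/hello.txt")
        return 1.0 if check.get("exit_code") == 0 else 0.0

    async def evaluate(self):
        pass  # periodic evaluation logic
```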
## Core Components
### Agent Loop
`HermesAgentLoop` (`environments/agent_loop.py`) is the reusable multi-turn agent engine. It runs the same tool-calling pattern as hermes-agent's main loop:
1. Send messages + tool schemas to the API via `server.chat_completion()`
2. If the response contains `tool_calls`, dispatch each via `handle_function_call()`
3. Append tool results to the conversation, go back to step 1
4. If no `tool_calls`, the agent is done
Tool calls execute in a thread pool (`ThreadPoolExecutor(128)`) so that async backends (Modal, Docker) don't deadlock inside Atropos's event loop.
Returns an `AgentResult`:
```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

@dataclass
class AgentResult:
messages: List[Dict[str, Any]] # Full conversation history
turns_used: int # Number of LLM calls made
finished_naturally: bool # True if model stopped on its own
reasoning_per_turn: List[Optional[str]] # Extracted reasoning content
tool_errors: List[ToolError] # Errors encountered during tool dispatch
managed_state: Optional[Dict] # VLLM ManagedServer state (Phase 2)
```
### Tool Context
`ToolContext` (`environments/tool_context.py`) gives reward functions direct access to the **same sandbox** the model used during its rollout. The `task_id` scoping means all state (files, processes, browser tabs) is preserved.
```python
async def compute_reward(self, item, result, ctx: ToolContext):
# Run tests in the model's terminal sandbox
test = ctx.terminal("pytest -v")
if test["exit_code"] == 0:
return 1.0
# Check if a file was created
content = ctx.read_file("/workspace/solution.py")
if content.get("content"):
return 0.5
# Download files for local verification
ctx.download_file("/remote/output.bin", "/local/output.bin")
return 0.0
```
Available methods:
| Category | Methods |
|----------|---------|
| **Terminal** | `terminal(command, timeout)` |
| **Files** | `read_file(path)`, `write_file(path, content)`, `search(query, path)` |
| **Transfers** | `upload_file()`, `upload_dir()`, `download_file()`, `download_dir()` |
| **Web** | `web_search(query)`, `web_extract(urls)` |
| **Browser** | `browser_navigate(url)`, `browser_snapshot()` |
| **Generic** | `call_tool(name, args)` — escape hatch for any hermes-agent tool |
| **Cleanup** | `cleanup()` — release all resources |
### Tool Call Parsers
For **Phase 2** (VLLM ManagedServer), the server returns raw text without structured tool calls. Client-side parsers in `environments/tool_call_parsers/` extract `tool_calls` from raw output:
```python
from environments.tool_call_parsers import get_parser
parser = get_parser("hermes") # or "mistral", "llama3_json", "qwen", "deepseek_v3", etc.
content, tool_calls = parser.parse(raw_model_output)
```
Available parsers: `hermes`, `mistral`, `llama3_json`, `qwen`, `qwen3_coder`, `deepseek_v3`, `deepseek_v3_1`, `kimi_k2`, `longcat`, `glm45`, `glm47`.
In Phase 1 (OpenAI server type), parsers are not needed — the server handles tool call parsing natively.
## Available Benchmarks
### TerminalBench2
**89 challenging terminal tasks** with per-task Docker sandbox environments.
| | |
|---|---|
| **What it tests** | Single-task coding/sysadmin ability |
| **Scoring** | Binary pass/fail (test suite verification) |
| **Sandbox** | Modal cloud sandboxes (per-task Docker images) |
| **Tools** | `terminal` + `file` |
| **Tasks** | 89 tasks across multiple categories |
| **Cost** | ~$50–200 for full eval (parallel execution) |
| **Time** | ~2–4 hours |
```bash
python environments/benchmarks/terminalbench_2/terminalbench2_env.py evaluate \
--config environments/benchmarks/terminalbench_2/default.yaml
# Run specific tasks
python environments/benchmarks/terminalbench_2/terminalbench2_env.py evaluate \
--config environments/benchmarks/terminalbench_2/default.yaml \
--env.task_filter fix-git,git-multibranch
```
Dataset: [NousResearch/terminal-bench-2](https://huggingface.co/datasets/NousResearch/terminal-bench-2) on HuggingFace.
### TBLite (OpenThoughts Terminal Bench Lite)
**100 difficulty-calibrated tasks** — a faster proxy for TerminalBench2.
| | |
|---|---|
| **What it tests** | Same as TB2 (coding/sysadmin), calibrated difficulty tiers |
| **Scoring** | Binary pass/fail |
| **Sandbox** | Modal cloud sandboxes |
| **Tools** | `terminal` + `file` |
| **Tasks** | 100 tasks: Easy (40), Medium (26), Hard (26), Extreme (8) |
| **Correlation** | r=0.911 with full TB2 |
| **Speed** | 2.68× faster than TB2 |
```bash
python environments/benchmarks/tblite/tblite_env.py evaluate \
--config environments/benchmarks/tblite/default.yaml
```
TBLite is a thin subclass of TerminalBench2 — only the dataset and timeouts differ. Created by the OpenThoughts Agent team (Snorkel AI + Bespoke Labs). Dataset: [NousResearch/openthoughts-tblite](https://huggingface.co/datasets/NousResearch/openthoughts-tblite).
### YC-Bench
**Long-horizon strategic benchmark** — the agent plays CEO of an AI startup.
| | |
|---|---|
| **What it tests** | Multi-turn strategic coherence over hundreds of turns |
| **Scoring** | Composite: `0.5 × survival + 0.5 × normalised_funds` |
| **Sandbox** | Local terminal (no Modal needed) |
| **Tools** | `terminal` only |
| **Runs** | 9 default (3 presets × 3 seeds), sequential |
| **Cost** | ~$50–200 for full eval |
| **Time** | ~3–6 hours |
```bash
# Install yc-bench (optional dependency)
pip install "hermes-agent[yc-bench]"
# Run evaluation
bash environments/benchmarks/yc_bench/run_eval.sh
# Or directly
python environments/benchmarks/yc_bench/yc_bench_env.py evaluate \
--config environments/benchmarks/yc_bench/default.yaml
# Quick single-preset test
python environments/benchmarks/yc_bench/yc_bench_env.py evaluate \
--config environments/benchmarks/yc_bench/default.yaml \
--env.presets '["fast_test"]' --env.seeds '[1]'
```
YC-Bench uses [collinear-ai/yc-bench](https://github.com/collinear-ai/yc-bench) — a deterministic simulation with 4 skill domains (research, inference, data_environment, training), prestige system, employee management, and financial pressure. Unlike TB2's per-task binary scoring, YC-Bench measures whether an agent can maintain coherent strategy over hundreds of compounding decisions.
## Training Environments
### TerminalTestEnv
A minimal self-contained environment with inline tasks (no external dataset). Used for **validating the full stack** end-to-end. Each task asks the model to create a file at a known path; the verifier checks the content.
```bash
# Process mode (saves rollouts to JSONL, no training server needed)
python environments/terminal_test_env/terminal_test_env.py process \
--env.data_path_to_save_groups terminal_test_output.jsonl
# Serve mode (connects to Atropos API for RL training)
python environments/terminal_test_env/terminal_test_env.py serve
```
### HermesSweEnv
SWE-bench style training environment. The model gets a coding task, uses terminal + file + web tools to solve it, and the reward function runs tests in the same Modal sandbox.
```bash
python environments/hermes_swe_env/hermes_swe_env.py serve \
--openai.model_name YourModel \
--env.dataset_name bigcode/humanevalpack \
--env.terminal_backend modal
```
## Running Environments
Every environment is a standalone Python script with three CLI subcommands:
### `evaluate` — Run a benchmark
For eval-only environments (benchmarks). Runs all items, computes metrics, logs to wandb.
```bash
python environments/benchmarks/tblite/tblite_env.py evaluate \
--config environments/benchmarks/tblite/default.yaml \
--openai.model_name anthropic/claude-sonnet-4.6
```
No training server or `run-api` needed. The environment handles everything.
### `process` — Generate SFT data
Runs rollouts and saves scored trajectories to JSONL. Useful for generating training data without a full RL loop.
```bash
python environments/terminal_test_env/terminal_test_env.py process \
--env.data_path_to_save_groups output.jsonl \
--openai.model_name anthropic/claude-sonnet-4.6
```
Output format: each line is a scored trajectory with the full conversation history, reward, and metadata.
### `serve` — Connect to Atropos for RL training
Connects the environment to a running Atropos API server (`run-api`). Used during live RL training.
```bash
# Terminal 1: Start the Atropos API
run-api
# Terminal 2: Start the environment
python environments/hermes_swe_env/hermes_swe_env.py serve \
--openai.model_name YourModel
```
The environment receives items from Atropos, runs agent rollouts, computes rewards, and sends scored trajectories back for training.
## Two-Phase Operation
### Phase 1: OpenAI Server (Eval / SFT)
Uses `server.chat_completion()` with `tools=` parameter. The server (VLLM, SGLang, OpenRouter, OpenAI) handles tool call parsing natively. Returns `ChatCompletion` objects with structured `tool_calls`.
- **Use for**: evaluation, SFT data generation, benchmarks, testing
- **Placeholder tokens** are created for the Atropos pipeline (since real token IDs aren't available from the OpenAI API)
### Phase 2: VLLM ManagedServer (Full RL)
Uses ManagedServer for exact token IDs + logprobs via `/generate`. A client-side [tool call parser](#tool-call-parsers) reconstructs structured `tool_calls` from raw output.
- **Use for**: full RL training with GRPO/PPO
- **Real tokens**, masks, and logprobs flow through the pipeline
- Set `tool_call_parser` in config to match your model's format (e.g., `"hermes"`, `"qwen"`, `"mistral"`)
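As a sketch, the phase branch looks roughly like this (`server.generate()`, `render_prompt()`, and the config/response field names are illustrative, not the exact atroposlib API):
```python
from environments.tool_call_parsers import get_parser

async def get_turn(server, messages, tools, config):
    if config.server_type == "openai":
        # Phase 1: the server returns structured tool_calls natively
        response = await server.chat_completion(messages=messages, tools=tools)
        msg = response.choices[0].message
        return msg.content, msg.tool_calls
    # Phase 2: ManagedServer returns raw text; parse tool calls client-side
    raw = await server.generate(prompt=render_prompt(messages, tools))
    parser = get_parser(config.tool_call_parser)  # e.g. "hermes", "qwen"
    return parser.parse(raw)  # -> (content, tool_calls)
```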
## Creating Environments
### Training Environment
```python
from environments.hermes_base_env import HermesAgentBaseEnv, HermesAgentEnvConfig
from atroposlib.envs.server_handling.server_manager import APIServerConfig
class MyEnvConfig(HermesAgentEnvConfig):
my_custom_field: str = "default_value"
class MyEnv(HermesAgentBaseEnv):
name = "my-env"
env_config_cls = MyEnvConfig
@classmethod
def config_init(cls):
env_config = MyEnvConfig(
enabled_toolsets=["terminal", "file"],
terminal_backend="modal",
max_agent_turns=30,
)
server_configs = [APIServerConfig(
base_url="https://openrouter.ai/api/v1",
model_name="anthropic/claude-sonnet-4.6",
server_type="openai",
)]
return env_config, server_configs
async def setup(self):
from datasets import load_dataset
self.dataset = list(load_dataset("my-dataset", split="train"))
self.iter = 0
async def get_next_item(self):
item = self.dataset[self.iter % len(self.dataset)]
self.iter += 1
return item
def format_prompt(self, item):
return item["instruction"]
async def compute_reward(self, item, result, ctx):
# ctx gives full tool access to the rollout's sandbox
test = ctx.terminal("pytest -v")
return 1.0 if test["exit_code"] == 0 else 0.0
async def evaluate(self, *args, **kwargs):
# Periodic evaluation during training
pass
if __name__ == "__main__":
MyEnv.cli()
```
### Eval-Only Benchmark
For benchmarks, follow the pattern used by TerminalBench2, TBLite, and YC-Bench:
1. **Create under** `environments/benchmarks/your-benchmark/`
2. **Set eval-only config**: `eval_handling=STOP_TRAIN`, `steps_per_eval=1`, `total_steps=1`
3. **Stub training methods**: `collect_trajectories()` returns `(None, [])`, `score()` returns `None`
4. **Implement** `rollout_and_score_eval(eval_item)` — the per-item agent loop + scoring
5. **Implement** `evaluate()` — orchestrates all runs, computes aggregate metrics
6. **Add streaming JSONL** for crash-safe result persistence
7. **Add cleanup**: `KeyboardInterrupt` handling, `cleanup_all_environments()`, `_tool_executor.shutdown()`
8. **Run with** `evaluate` subcommand
See `environments/benchmarks/yc_bench/yc_bench_env.py` for a clean, well-documented reference implementation.
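Putting the steps together, a skeleton might look like this (a sketch: `load_my_dataset()` is a hypothetical loader, and base-class signatures may differ slightly):
```python
from environments.hermes_base_env import HermesAgentBaseEnv

class MyBenchEnv(HermesAgentBaseEnv):
    name = "my-benchmark"

    async def setup(self):
        self.eval_items = load_my_dataset()  # hypothetical dataset loader

    # Step 3: stub the training path; this env only evaluates
    async def collect_trajectories(self, *args, **kwargs):
        return None, []

    async def score(self, *args, **kwargs):
        return None

    # Step 4: per-item agent loop + scoring
    async def rollout_and_score_eval(self, eval_item):
        ...

    # Step 5: orchestrate all runs, compute aggregate metrics,
    # and stream results to JSONL for crash safety (step 6)
    async def evaluate(self, *args, **kwargs):
        results = [await self.rollout_and_score_eval(i) for i in self.eval_items]
        ...

if __name__ == "__main__":
    MyBenchEnv.cli()
```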
## Configuration Reference
### HermesAgentEnvConfig Fields
| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `enabled_toolsets` | `List[str]` | `None` (all) | Which hermes toolsets to enable |
| `disabled_toolsets` | `List[str]` | `None` | Toolsets to filter out |
| `distribution` | `str` | `None` | Probabilistic toolset distribution name |
| `max_agent_turns` | `int` | `30` | Max LLM calls per rollout |
| `agent_temperature` | `float` | `1.0` | Sampling temperature |
| `system_prompt` | `str` | `None` | System message for the agent |
| `terminal_backend` | `str` | `"local"` | `local`, `docker`, `modal`, `daytona`, `ssh`, `singularity` |
| `terminal_timeout` | `int` | `120` | Seconds per terminal command |
| `terminal_lifetime` | `int` | `3600` | Max sandbox lifetime |
| `dataset_name` | `str` | `None` | HuggingFace dataset identifier |
| `tool_pool_size` | `int` | `128` | Thread pool size for tool execution |
| `tool_call_parser` | `str` | `"hermes"` | Parser for Phase 2 raw output |
| `extra_body` | `Dict` | `None` | Extra params for OpenAI API (e.g., OpenRouter provider prefs) |
| `eval_handling` | `Enum` | `STOP_TRAIN` | `STOP_TRAIN`, `LIMIT_TRAIN`, `NONE` |
### YAML Configuration
Environments can be configured via YAML files passed with `--config`:
```yaml
env:
enabled_toolsets: ["terminal", "file"]
max_agent_turns: 60
max_token_length: 32000
agent_temperature: 0.8
terminal_backend: "modal"
terminal_timeout: 300
dataset_name: "NousResearch/terminal-bench-2"
tokenizer_name: "NousResearch/Hermes-3-Llama-3.1-8B"
use_wandb: true
wandb_name: "my-benchmark"
openai:
base_url: "https://openrouter.ai/api/v1"
model_name: "anthropic/claude-sonnet-4.6"
server_type: "openai"
health_check: false
```
YAML values override `config_init()` defaults. CLI arguments override YAML values:
```bash
python my_env.py evaluate \
--config my_config.yaml \
--openai.model_name anthropic/claude-opus-4.6 # overrides YAML
```
## Prerequisites
### For all environments
- Python >= 3.11
- `atroposlib`: `pip install git+https://github.com/NousResearch/atropos.git`
- An LLM API key (OpenRouter, OpenAI, or self-hosted VLLM/SGLang)
### For Modal-sandboxed benchmarks (TB2, TBLite)
- [Modal](https://modal.com) account and CLI: `pip install "hermes-agent[modal]"`
- `MODAL_TOKEN_ID` and `MODAL_TOKEN_SECRET` environment variables
### For YC-Bench
- `pip install "hermes-agent[yc-bench]"` (installs the yc-bench CLI + SQLAlchemy)
- No Modal needed — runs with local terminal backend
### For RL training
- `TINKER_API_KEY` — API key for the [Tinker](https://tinker.computer) training service
- `WANDB_API_KEY` — for Weights & Biases metrics tracking
- The `tinker-atropos` submodule (at `tinker-atropos/` in the repo)
See [RL Training](/user-guide/features/rl-training) for the agent-driven RL workflow.
## Directory Structure
```
environments/
├── hermes_base_env.py # Abstract base class (HermesAgentBaseEnv)
├── agent_loop.py # Multi-turn agent engine (HermesAgentLoop)
├── tool_context.py # Per-rollout tool access for reward functions
├── patches.py # Async-safety patches for Modal backend
├── tool_call_parsers/ # Phase 2 client-side parsers
│ ├── hermes_parser.py # Hermes/ChatML <tool_call> format
│ ├── mistral_parser.py # Mistral [TOOL_CALLS] format
│ ├── llama_parser.py # Llama 3 JSON tool calling
│ ├── qwen_parser.py # Qwen format
│ ├── deepseek_v3_parser.py # DeepSeek V3 format
│ └── ... # + kimi_k2, longcat, glm45/47, etc.
├── terminal_test_env/ # Stack validation (inline tasks)
├── hermes_swe_env/ # SWE-bench training environment
└── benchmarks/ # Evaluation benchmarks
├── terminalbench_2/ # 89 terminal tasks, Modal sandboxes
├── tblite/ # 100 calibrated tasks (fast TB2 proxy)
└── yc_bench/ # Long-horizon strategic benchmark
```
---
sidebar_position: 8
title: "Extending the CLI"
description: "Build wrapper CLIs that extend the Hermes TUI with custom widgets, keybindings, and layout changes"
---
# Extending the CLI
Hermes exposes protected extension hooks on `HermesCLI` so wrapper CLIs can add widgets, keybindings, and layout customizations without overriding the 1000+ line `run()` method. This keeps your extension decoupled from internal changes.
## Extension points
There are five extension seams available:
| Hook | Purpose | Override when... |
|------|---------|------------------|
| `_get_extra_tui_widgets()` | Inject widgets into the layout | You need a persistent UI element (panel, status line, mini-player) |
| `_register_extra_tui_keybindings(kb, *, input_area)` | Add keyboard shortcuts | You need hotkeys (toggle panels, transport controls, modal shortcuts) |
| `_build_tui_layout_children(**widgets)` | Full control over widget ordering | You need to reorder or wrap existing widgets (rare) |
| `process_command()` | Add custom slash commands | You need `/mycommand` handling (pre-existing hook) |
| `_build_tui_style_dict()` | Custom prompt_toolkit styles | You need custom colors or styling (pre-existing hook) |
The first three are new protected hooks. The last two already existed.
## Quick start: a wrapper CLI
```python
#!/usr/bin/env python3
"""my_cli.py — Example wrapper CLI that extends Hermes."""
from cli import HermesCLI
from prompt_toolkit.layout import ConditionalContainer, FormattedTextControl, Window
from prompt_toolkit.filters import Condition
class MyCLI(HermesCLI):
def __init__(self, **kwargs):
super().__init__(**kwargs)
self._panel_visible = False
def _get_extra_tui_widgets(self):
"""Add a toggleable info panel above the status bar."""
cli_ref = self
        return [
            ConditionalContainer(
                Window(
                    FormattedTextControl(lambda: "📊 My custom panel content"),
                    height=1,
                ),
                filter=Condition(lambda: cli_ref._panel_visible),
            ),
        ]
def _register_extra_tui_keybindings(self, kb, *, input_area):
"""F2 toggles the custom panel."""
cli_ref = self
@kb.add("f2")
def _toggle_panel(event):
cli_ref._panel_visible = not cli_ref._panel_visible
def process_command(self, cmd: str) -> bool:
"""Add a /panel slash command."""
if cmd.strip().lower() == "/panel":
self._panel_visible = not self._panel_visible
state = "visible" if self._panel_visible else "hidden"
print(f"Panel is now {state}")
return True
return super().process_command(cmd)
if __name__ == "__main__":
cli = MyCLI()
cli.run()
```
Run it:
```bash
cd ~/.hermes/hermes-agent
source .venv/bin/activate
python my_cli.py
```
## Hook reference
### `_get_extra_tui_widgets()`
Returns a list of prompt_toolkit widgets to insert into the TUI layout. Widgets appear **between the spacer and the status bar** — above the input area but below the main output.
```python
def _get_extra_tui_widgets(self) -> list:
return [] # default: no extra widgets
```
Each widget should be a prompt_toolkit container (e.g., `Window`, `ConditionalContainer`, `HSplit`). Wrap a widget in `ConditionalContainer` with a `Condition(...)` filter to make it toggleable.
```python
from prompt_toolkit.layout import ConditionalContainer, Window, FormattedTextControl
from prompt_toolkit.filters import Condition
def _get_extra_tui_widgets(self):
return [
ConditionalContainer(
Window(FormattedTextControl("Status: connected"), height=1),
filter=Condition(lambda: self._show_status),
),
]
```
### `_register_extra_tui_keybindings(kb, *, input_area)`
Called after Hermes registers its own keybindings and before the layout is built. Add your keybindings to `kb`.
```python
def _register_extra_tui_keybindings(self, kb, *, input_area):
pass # default: no extra keybindings
```
Parameters:
- **`kb`** — The `KeyBindings` instance for the prompt_toolkit application
- **`input_area`** — The main `TextArea` widget, if you need to read or manipulate user input
```python
def _register_extra_tui_keybindings(self, kb, *, input_area):
cli_ref = self
@kb.add("f3")
def _clear_input(event):
input_area.text = ""
@kb.add("f4")
def _insert_template(event):
input_area.text = "/search "
```
**Avoid conflicts** with built-in keybindings: `Enter` (submit), `Escape Enter` (newline), `Ctrl-C` (interrupt), `Ctrl-D` (exit), `Tab` (auto-suggest accept). Function keys F2+ and Ctrl-combinations are generally safe.
### `_build_tui_layout_children(**widgets)`
Override this only when you need full control over widget ordering. Most extensions should use `_get_extra_tui_widgets()` instead.
```python
def _build_tui_layout_children(self, *, sudo_widget, secret_widget,
approval_widget, clarify_widget, spinner_widget, spacer,
status_bar, input_rule_top, image_bar, input_area,
input_rule_bot, voice_status_bar, completions_menu) -> list:
```
The default implementation returns:
```python
[
Window(height=0), # anchor
sudo_widget, # sudo password prompt (conditional)
secret_widget, # secret input prompt (conditional)
approval_widget, # dangerous command approval (conditional)
clarify_widget, # clarify question UI (conditional)
spinner_widget, # thinking spinner (conditional)
spacer, # fills remaining vertical space
*self._get_extra_tui_widgets(), # YOUR WIDGETS GO HERE
status_bar, # model/token/context status line
input_rule_top, # ─── border above input
image_bar, # attached images indicator
input_area, # user text input
input_rule_bot, # ─── border below input
voice_status_bar, # voice mode status (conditional)
completions_menu, # autocomplete dropdown
]
```
## Layout diagram
The default layout from top to bottom:
1. **Output area** — scrolling conversation history
2. **Spacer**
3. **Extra widgets** — from `_get_extra_tui_widgets()`
4. **Status bar** — model, context %, elapsed time
5. **Image bar** — attached image count
6. **Input area** — user prompt
7. **Voice status** — recording indicator
8. **Completions menu** — autocomplete suggestions
## Tips
- **Invalidate the display** after state changes: call `self._invalidate()` to trigger a prompt_toolkit redraw.
- **Access agent state**: `self.agent`, `self.model`, `self.conversation_history` are all available.
- **Custom styles**: Override `_build_tui_style_dict()` and add entries for your custom style classes (see the sketch after this list).
- **Slash commands**: Override `process_command()`, handle your commands, and call `super().process_command(cmd)` for everything else.
- **Don't override `run()`** unless absolutely necessary — the extension hooks exist specifically to avoid that coupling.
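For the custom-styles tip, a minimal sketch (the style class names are illustrative):
```python
def _build_tui_style_dict(self):
    styles = super()._build_tui_style_dict()
    # Register classes used by your custom widgets (hypothetical names)
    styles["my-panel"] = "bg:#1e1e2e #cdd6f4"
    styles["my-panel.title"] = "bold"
    return styles
```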
---
sidebar_position: 7
title: "Gateway Internals"
description: "How the messaging gateway boots, authorizes users, routes sessions, and delivers messages"
---
# Gateway Internals
The messaging gateway is the long-running process that connects Hermes to external platforms.
Key files:
- `gateway/run.py`
- `gateway/config.py`
- `gateway/session.py`
- `gateway/delivery.py`
- `gateway/pairing.py`
- `gateway/channel_directory.py`
- `gateway/hooks.py`
- `gateway/mirror.py`
- `gateway/platforms/*`
## Core responsibilities
The gateway process is responsible for:
- loading configuration from `.env`, `config.yaml`, and `gateway.json`
- starting platform adapters
- authorizing users
- routing incoming events to sessions
- maintaining per-chat session continuity
- dispatching messages to `AIAgent`
- running cron ticks and background maintenance tasks
- mirroring/proactively delivering output to configured channels
## Config sources
The gateway has a multi-source config model:
- environment variables
- `~/.hermes/gateway.json`
- selected bridged values from `~/.hermes/config.yaml`
## Session routing
`gateway/session.py` and `GatewayRunner` cooperate to map incoming messages to active session IDs.
Session keying can depend on:
- platform
- user/chat identity
- thread/topic identity
- special platform-specific routing behavior
## Authorization layers
The gateway can authorize through:
- platform allowlists
- gateway-wide allowlists
- DM pairing flows
- explicit allow-all settings
Pairing support is implemented in `gateway/pairing.py`.
## Delivery path
Outgoing deliveries are handled by `gateway/delivery.py`, which knows how to:
- deliver to a home channel
- resolve explicit targets
- mirror some remote deliveries back into local history/session tracking
## Hooks
Gateway events emit hook callbacks through `gateway/hooks.py`. Hooks are local trusted Python code and can observe or extend gateway lifecycle events.
## Background maintenance
The gateway also runs maintenance tasks such as:
- cron ticking
- cache refreshes
- session expiry checks
- proactive memory flush before reset/expiry
## Honcho interaction
When Honcho is enabled, the gateway keeps persistent Honcho managers aligned with session lifetimes and platform-specific session keys.
### Session routing
Honcho tools (`honcho_profile`, `honcho_search`, `honcho_context`, `honcho_conclude`) need to execute against the correct user's Honcho session. In a multi-user gateway, the process-global module state in `tools/honcho_tools.py` is insufficient — multiple sessions may be active concurrently.
The solution threads session context through the call chain:
```
AIAgent._invoke_tool()
→ handle_function_call(honcho_manager=..., honcho_session_key=...)
→ registry.dispatch(**kwargs)
→ _handle_honcho_*(args, **kw)
→ _resolve_session_context(**kw) # prefers explicit kwargs over module globals
```
`_resolve_session_context()` in `honcho_tools.py` checks for `honcho_manager` and `honcho_session_key` in the kwargs first, falling back to the module-global `_session_manager` / `_session_key` for CLI mode where there's only one session.
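A minimal sketch of that resolution order (the real function in `honcho_tools.py` may validate further):
```python
def _resolve_session_context(**kw):
    manager = kw.get("honcho_manager") or _session_manager      # explicit kwargs win
    session_key = kw.get("honcho_session_key") or _session_key  # CLI-mode fallback
    return manager, session_key
```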
### Memory flush lifecycle
When a session is reset, resumed, or expires, the gateway flushes memories before discarding context. The flush creates a temporary `AIAgent` with:
- `session_id` set to the old session's ID (so transcripts load correctly)
- `honcho_session_key` set to the gateway session key (so Honcho writes go to the right place)
- `sync_honcho=False` passed to `run_conversation()` (so the synthetic flush turn doesn't write back to Honcho's conversation history)
After the flush completes, any queued Honcho writes are drained and the gateway-level Honcho manager is shut down for that session key.
## Related docs
- [Session Storage](./session-storage.md)
- [Cron Internals](./cron-internals.md)
- [ACP Internals](./acp-internals.md)
---
sidebar_position: 5
title: "Prompt Assembly"
description: "How Hermes builds the system prompt, preserves cache stability, and injects ephemeral layers"
---
# Prompt Assembly
Hermes deliberately separates:
- **cached system prompt state**
- **ephemeral API-call-time additions**
This is one of the most important design choices in the project because it affects:
- token usage
- prompt caching effectiveness
- session continuity
- memory correctness
Primary files:
- `run_agent.py`
- `agent/prompt_builder.py`
- `tools/memory_tool.py`
## Cached system prompt layers
The cached system prompt is assembled in roughly this order:
1. agent identity — `SOUL.md` from `HERMES_HOME` when available, otherwise falls back to `DEFAULT_AGENT_IDENTITY` in `prompt_builder.py`
2. tool-aware behavior guidance
3. Honcho static block (when active)
4. optional system message
5. frozen MEMORY snapshot
6. frozen USER profile snapshot
7. skills index
8. context files (`AGENTS.md`, `.cursorrules`, `.cursor/rules/*.mdc`) — SOUL.md is **not** included here when it was already loaded as the identity in step 1
9. timestamp / optional session ID
10. platform hint
When `skip_context_files` is set (e.g., subagent delegation), SOUL.md is not loaded and the hardcoded `DEFAULT_AGENT_IDENTITY` is used instead.
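Conceptually, the assembly reduces to a layered concatenation like the sketch below (helper names other than `load_soul_md()`, `DEFAULT_AGENT_IDENTITY`, and `build_context_files_prompt()` are hypothetical; the real assembly lives in `agent/prompt_builder.py` and `run_agent.py`):
```python
def build_cached_system_prompt(cfg):
    layers = [
        load_soul_md() or DEFAULT_AGENT_IDENTITY,    # 1. identity
        tool_behavior_guidance(cfg),                 # 2. tool-aware guidance
        honcho_static_block(cfg),                    # 3. empty when inactive
        cfg.system_message,                          # 4. optional
        memory_snapshot(),                           # 5. frozen at session start
        user_profile_snapshot(),                     # 6. frozen at session start
        skills_index(),                              # 7. compact skills index
        build_context_files_prompt(skip_soul=True),  # 8. AGENTS.md, .cursorrules, ...
        timestamp_block(cfg),                        # 9. timestamp / session ID
        platform_hint(cfg.platform),                 # 10. platform hint
    ]
    return "\n\n".join(layer for layer in layers if layer)
```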
## API-call-time-only layers
These are intentionally *not* persisted as part of the cached system prompt:
- `ephemeral_system_prompt`
- prefill messages
- gateway-derived session context overlays
- later-turn Honcho recall injected into the current-turn user message
This separation keeps the stable prefix stable for caching.
## Memory snapshots
Local memory and user profile data are injected as frozen snapshots at session start. Mid-session writes update disk state but do not mutate the already-built system prompt until a new session or forced rebuild occurs.
## Context files
`agent/prompt_builder.py` scans and sanitizes project context files using a **priority system** — only one type is loaded (first match wins):
1. `.hermes.md` / `HERMES.md` (walks to git root)
2. `AGENTS.md` (recursive directory walk)
3. `CLAUDE.md` (CWD only)
4. `.cursorrules` / `.cursor/rules/*.mdc` (CWD only)
`SOUL.md` is loaded separately via `load_soul_md()` for the identity slot. When it loads successfully, `build_context_files_prompt(skip_soul=True)` prevents it from appearing twice.
Long files are truncated before injection.
## Skills index
The skills system contributes a compact skills index to the prompt when skills tooling is available.
## Why prompt assembly is split this way
The architecture is intentionally optimized to:
- preserve provider-side prompt caching
- avoid mutating history unnecessarily
- keep memory semantics understandable
- let gateway/ACP/CLI add context without poisoning persistent prompt state
## Related docs
- [Context Compression & Prompt Caching](./context-compression-and-caching.md)
- [Session Storage](./session-storage.md)
- [Gateway Internals](./gateway-internals.md)
---
sidebar_position: 4
title: "Provider Runtime Resolution"
description: "How Hermes resolves providers, credentials, API modes, and auxiliary models at runtime"
---
# Provider Runtime Resolution
Hermes has a shared provider runtime resolver used across:
- CLI
- gateway
- cron jobs
- ACP
- auxiliary model calls
Primary implementation:
- `hermes_cli/runtime_provider.py` — credential resolution, `_resolve_custom_runtime()`
- `hermes_cli/auth.py` — provider registry, `resolve_provider()`
- `hermes_cli/model_switch.py` — shared `/model` switch pipeline (CLI + gateway)
- `agent/auxiliary_client.py` — auxiliary model routing
If you are trying to add a new first-class inference provider, read [Adding Providers](./adding-providers.md) alongside this page.
## Resolution precedence
At a high level, provider resolution uses:
1. explicit CLI/runtime request
2. `config.yaml` model/provider config
3. environment variables
4. provider-specific defaults or auto resolution
That ordering matters because Hermes treats the saved model/provider choice as the source of truth for normal runs. This prevents a stale shell export from silently overriding the endpoint a user last selected in `hermes model`.
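A hedged sketch of that precedence (helper names are illustrative):
```python
def resolve_runtime(cli_request, config, env):
    if cli_request:                  # 1. explicit CLI/runtime request
        return from_cli(cli_request)
    if config.get("provider"):       # 2. saved config.yaml choice beats env
        return from_config(config)
    if env.get("OPENAI_BASE_URL"):   # 3. environment variables
        return from_env(env)
    return provider_defaults()       # 4. provider defaults / auto resolution
```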
## Providers
Current provider families include:
- AI Gateway (Vercel)
- OpenRouter
- Nous Portal
- OpenAI Codex
- Anthropic (native)
- Z.AI
- Kimi / Moonshot
- MiniMax
- MiniMax China
- Custom (`provider: custom`) — first-class provider for any OpenAI-compatible endpoint
- Named custom providers (`custom_providers` list in config.yaml)
## Output of runtime resolution
The runtime resolver returns data such as:
- `provider`
- `api_mode`
- `base_url`
- `api_key`
- `source`
- provider-specific metadata like expiry/refresh info
## Why this matters
This resolver is the main reason Hermes can share auth/runtime logic between:
- `hermes chat`
- gateway message handling
- cron jobs running in fresh sessions
- ACP editor sessions
- auxiliary model tasks
## AI Gateway
Set `AI_GATEWAY_API_KEY` in `~/.hermes/.env` and run with `--provider ai-gateway`. Hermes fetches available models from the gateway's `/models` endpoint, filtering to language models with tool-use support.
## OpenRouter, AI Gateway, and custom OpenAI-compatible base URLs
Hermes contains logic to avoid leaking the wrong API key to a custom endpoint when multiple provider keys exist (e.g. `OPENROUTER_API_KEY`, `AI_GATEWAY_API_KEY`, and `OPENAI_API_KEY`).
Each provider's API key is scoped to its own base URL:
- `OPENROUTER_API_KEY` is only sent to `openrouter.ai` endpoints
- `AI_GATEWAY_API_KEY` is only sent to `ai-gateway.vercel.sh` endpoints
- `OPENAI_API_KEY` is used for custom endpoints and as a fallback
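A sketch of the scoping rule (illustrative; the real checks live in `hermes_cli/runtime_provider.py`):
```python
def api_key_for(base_url: str, env: dict) -> str | None:
    if "openrouter.ai" in base_url:
        return env.get("OPENROUTER_API_KEY")
    if "ai-gateway.vercel.sh" in base_url:
        return env.get("AI_GATEWAY_API_KEY")
    return env.get("OPENAI_API_KEY")  # custom endpoints and fallback
```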
Hermes also distinguishes between:
- a real custom endpoint selected by the user
- the OpenRouter fallback path used when no custom endpoint is configured
That distinction is especially important for:
- local model servers
- non-OpenRouter/non-AI Gateway OpenAI-compatible APIs
- switching providers without re-running setup
- config-saved custom endpoints that should keep working even when `OPENAI_BASE_URL` is not exported in the current shell
## Native Anthropic path
Anthropic is not just "via OpenRouter" anymore.
When provider resolution selects `anthropic`, Hermes uses:
- `api_mode = anthropic_messages`
- the native Anthropic Messages API
- `agent/anthropic_adapter.py` for translation
Credential resolution for native Anthropic now prefers refreshable Claude Code credentials over copied env tokens when both are present. In practice that means:
- Claude Code credential files are treated as the preferred source when they include refreshable auth
- manual `ANTHROPIC_TOKEN` / `CLAUDE_CODE_OAUTH_TOKEN` values still work as explicit overrides
- Hermes preflights Anthropic credential refresh before native Messages API calls
- Hermes still retries once on a 401 after rebuilding the Anthropic client, as a fallback path
## OpenAI Codex path
Codex uses a separate Responses API path:
- `api_mode = codex_responses`
- dedicated credential resolution and auth store support
## Auxiliary model routing
Auxiliary tasks such as:
- vision
- web extraction summarization
- context compression summaries
- session search summarization
- skills hub operations
- MCP helper operations
- memory flushes
can use their own provider/model routing rather than the main conversational model.
When an auxiliary task is configured with provider `main`, Hermes resolves that through the same shared runtime path as normal chat. In practice that means:
- env-driven custom endpoints still work
- custom endpoints saved via `hermes model` / `config.yaml` also work
- auxiliary routing can tell the difference between a real saved custom endpoint and the OpenRouter fallback
## Fallback models
Hermes supports a configured fallback model/provider pair, allowing runtime failover when the primary model encounters errors.
### How it works internally
1. **Storage**: `AIAgent.__init__` stores the `fallback_model` dict and sets `_fallback_activated = False`.
2. **Trigger points**: `_try_activate_fallback()` is called from three places in the main retry loop in `run_agent.py`:
- After max retries on invalid API responses (None choices, missing content)
- On non-retryable client errors (HTTP 401, 403, 404)
- After max retries on transient errors (HTTP 429, 500, 502, 503)
3. **Activation flow** (`_try_activate_fallback`), sketched in code after this list:
- Returns `False` immediately if already activated or not configured
- Calls `resolve_provider_client()` from `auxiliary_client.py` to build a new client with proper auth
- Determines `api_mode`: `codex_responses` for openai-codex, `anthropic_messages` for anthropic, `chat_completions` for everything else
- Swaps in-place: `self.model`, `self.provider`, `self.base_url`, `self.api_mode`, `self.client`, `self._client_kwargs`
- For anthropic fallback: builds a native Anthropic client instead of OpenAI-compatible
- Re-evaluates prompt caching (enabled for Claude models on OpenRouter)
- Sets `_fallback_activated = True` — prevents firing again
- Resets retry count to 0 and continues the loop
4. **Config flow**:
- CLI: `cli.py` reads `CLI_CONFIG["fallback_model"]` → passes to `AIAgent(fallback_model=...)`
- Gateway: `gateway/run.py._load_fallback_model()` reads `config.yaml` → passes to `AIAgent`
- Validation: both `provider` and `model` keys must be non-empty, or fallback is disabled
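Condensed into code, the activation flow looks roughly like this (a sketch; `resolve_provider_client()`'s return shape and the in-place swaps are simplified relative to `run_agent.py`):
```python
def _try_activate_fallback(self) -> bool:
    if self._fallback_activated or not self.fallback_model:
        return False
    provider = self.fallback_model["provider"]
    self.client = resolve_provider_client(provider)  # builds an authed client
    self.model = self.fallback_model["model"]
    self.provider = provider
    self.api_mode = {
        "openai-codex": "codex_responses",
        "anthropic": "anthropic_messages",
    }.get(provider, "chat_completions")
    self._fallback_activated = True  # one-shot: never fires again
    return True  # caller resets the retry count and continues the loop
```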
### What does NOT support fallback
- **Subagent delegation** (`tools/delegate_tool.py`): subagents inherit the parent's provider but not the fallback config
- **Cron jobs** (`cron/`): run with a fixed provider, no fallback mechanism
- **Auxiliary tasks**: use their own independent provider auto-detection chain (see Auxiliary model routing above)
### Test coverage
See `tests/test_fallback_model.py` for comprehensive tests covering all supported providers, one-shot semantics, and edge cases.
## Related docs
- [Agent Loop Internals](./agent-loop.md)
- [ACP Internals](./acp-internals.md)
- [Context Compression & Prompt Caching](./context-compression-and-caching.md)
---
sidebar_position: 8
title: "Session Storage"
description: "How Hermes stores sessions in SQLite, maintains lineage, and exposes recall/search"
---
# Session Storage
Hermes uses a SQLite-backed session store as the main source of truth for historical conversation state.
Primary files:
- `hermes_state.py`
- `gateway/session.py`
- `tools/session_search_tool.py`
## Main database
The primary store lives at:
```text
~/.hermes/state.db
```
It contains:
- sessions
- messages
- metadata such as token counts and titles
- lineage relationships
- full-text search indexes
## What is stored per session
Examples of important session metadata:
- session ID
- source/platform
- title
- created/updated timestamps
- token counts
- tool call counts
- stored system prompt snapshot
- parent session ID after compression splits
## Lineage
When Hermes compresses a conversation, it can continue in a new session ID while preserving ancestry via `parent_session_id`.
This means resuming/searching can follow session families instead of treating each compressed shard as unrelated.
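For example, walking a session's ancestry could look like this (a sketch assuming `sessions(session_id, parent_session_id)` columns; the actual schema is defined in `hermes_state.py`):
```python
import os
import sqlite3

def session_lineage(session_id, db_path="~/.hermes/state.db"):
    conn = sqlite3.connect(os.path.expanduser(db_path))
    chain = []
    while session_id:
        chain.append(session_id)
        row = conn.execute(
            "SELECT parent_session_id FROM sessions WHERE session_id = ?",
            (session_id,),
        ).fetchone()
        session_id = row[0] if row else None
    conn.close()
    return chain  # newest shard first, root session last
```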
## Gateway vs CLI persistence
- CLI uses the state DB directly for resume/history/search
- gateway keeps active-session mappings and may also maintain additional platform transcript/state files
- some legacy JSON/JSONL artifacts still exist for compatibility, but SQLite is the main historical store
## Session search
The `session_search` tool uses the session DB's search features to retrieve and summarize relevant past work.
## Related docs
- [Gateway Internals](./gateway-internals.md)
- [Prompt Assembly](./prompt-assembly.md)
- [Context Compression & Prompt Caching](./context-compression-and-caching.md)
---
sidebar_position: 9
title: "Tools Runtime"
description: "Runtime behavior of the tool registry, toolsets, dispatch, and terminal environments"
---
# Tools Runtime
Hermes tools are self-registering functions grouped into toolsets and executed through a central registry/dispatch system.
Primary files:
- `tools/registry.py`
- `model_tools.py`
- `toolsets.py`
- `tools/terminal_tool.py`
- `tools/environments/*`
## Tool registration model
Each tool module calls `registry.register(...)` at import time.
`model_tools.py` is responsible for importing/discovering tool modules and building the schema list used by the model.
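A hedged sketch of the registration pattern (the exact `registry.register()` keyword arguments may differ):
```python
from tools import registry

def my_tool(args: dict) -> dict:
    """Echo the input back; a stand-in for real tool logic."""
    return {"echo": args.get("text", "")}

registry.register(
    name="my_tool",
    handler=my_tool,
    toolset="my-toolset",
    schema={
        "name": "my_tool",
        "description": "Echo the provided text.",
        "parameters": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
        },
    },
)
```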
## Toolset resolution
Toolsets are named bundles of tools. Hermes resolves them through:
- explicit enabled/disabled toolset lists
- platform presets (`hermes-cli`, `hermes-telegram`, etc.)
- dynamic MCP toolsets
- curated special-purpose sets like `hermes-acp`
## Dispatch
At runtime, tools are dispatched through the central registry; a few agent-level tools (memory, todo, session search) are handled directly by the agent loop instead.
## Terminal/runtime environments
The terminal system supports multiple backends:
- local
- docker
- ssh
- singularity
- modal
- daytona
It also supports:
- per-task cwd overrides
- background process management
- PTY mode
- approval callbacks for dangerous commands
## Concurrency
Tool calls may execute sequentially or concurrently depending on the tool mix and interaction requirements.
## Related docs
- [Toolsets Reference](../reference/toolsets-reference.md)
- [Built-in Tools Reference](../reference/tools-reference.md)
- [Agent Loop Internals](./agent-loop.md)
- [ACP Internals](./acp-internals.md)
---
sidebar_position: 10
title: "Trajectories & Training Format"
description: "How Hermes saves trajectories, normalizes tool calls, and produces training-friendly outputs"
---
# Trajectories & Training Format
Hermes can save conversation trajectories for training, evaluation, and batch data generation workflows.
Primary files:
- `agent/trajectory.py`
- `run_agent.py`
- `batch_runner.py`
- `trajectory_compressor.py`
## What trajectories are for
Trajectory outputs are used for:
- SFT data generation
- debugging agent behavior
- benchmark/evaluation artifact capture
- post-processing and compression pipelines
## Normalization strategy
Hermes converts live conversation structure into a training-friendly format.
Important behaviors include:
- representing reasoning in explicit markup
- converting tool calls into structured XML-like regions for dataset compatibility (see the example below)
- grouping tool outputs appropriately
- separating successful and failed trajectories
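For illustration, a normalized assistant turn might look like the block below. The exact markup is defined in `agent/trajectory.py`; this sketch follows the Hermes/ChatML `<tool_call>` convention listed in the parser table earlier, and the `<think>` tag for reasoning is an assumption:
```text
<think>
Plan the fix, then run the tests.
</think>
<tool_call>
{"name": "terminal", "arguments": {"command": "pytest -q"}}
</tool_call>
```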
## Persistence boundaries
Trajectory files do **not** blindly mirror all runtime prompt state.
Some prompt-time-only layers are intentionally excluded from persisted trajectory content so datasets are cleaner and less environment-specific.
## Batch runner
`batch_runner.py` emits richer metadata than single-session trajectory saving, including:
- model/provider metadata
- toolset info
- partial/failure markers
- tool statistics
## Related docs
- [Environments, Benchmarks & Data Generation](./environments.md)
- [Agent Loop Internals](./agent-loop.md)