The architecture has been updated

2026-03-31 23:31:36 +03:00 · 2026-03-31 23:31:36 +03:00 · a01257ead9
commit a01257ead9
parent 805f7a017e
1119 changed files with 226 additions and 352 deletions
--- a/hermes_code/website/docs/user-guide/features/_category_.json
+++ b/hermes_code/website/docs/user-guide/features/_category_.json
@ -0,0 +1,8 @@
+{
+  "label": "Features",
+  "position": 4,
+  "link": {
+    "type": "generated-index",
+    "description": "Explore the powerful features of Hermes Agent."
+  }
+}
--- a/hermes_code/website/docs/user-guide/features/acp.md
+++ b/hermes_code/website/docs/user-guide/features/acp.md
@ -0,0 +1,197 @@
+---
+sidebar_position: 11
+title: "ACP Editor Integration"
+description: "Use Hermes Agent inside ACP-compatible editors such as VS Code, Zed, and JetBrains"
+---
+
+# ACP Editor Integration
+
+Hermes Agent can run as an ACP server, letting ACP-compatible editors talk to Hermes over stdio and render:
+
+- chat messages
+- tool activity
+- file diffs
+- terminal commands
+- approval prompts
+- streamed thinking / response chunks
+
+ACP is a good fit when you want Hermes to behave like an editor-native coding agent instead of a standalone CLI or messaging bot.
+
+## What Hermes exposes in ACP mode
+
+Hermes runs with a curated `hermes-acp` toolset designed for editor workflows. It includes:
+
+- file tools: `read_file`, `write_file`, `patch`, `search_files`
+- terminal tools: `terminal`, `process`
+- web/browser tools
+- memory, todo, session search
+- skills
+- execute_code and delegate_task
+- vision
+
+It intentionally excludes things that do not fit typical editor UX, such as messaging delivery and cronjob management.
+
+## Installation
+
+Install Hermes normally, then add the ACP extra:
+
+```bash
+pip install -e '.[acp]'
+```
+
+This installs the `agent-client-protocol` dependency and enables:
+
+- `hermes acp`
+- `hermes-acp`
+- `python -m acp_adapter`
+
+## Launching the ACP server
+
+Any of the following starts Hermes in ACP mode:
+
+```bash
+hermes acp
+```
+
+```bash
+hermes-acp
+```
+
+```bash
+python -m acp_adapter
+```
+
+Hermes logs to stderr so stdout remains reserved for ACP JSON-RPC traffic.
+
+## Editor setup
+
+### VS Code
+
+Install an ACP client extension, then point it at the repo's `acp_registry/` directory.
+
+Example settings snippet:
+
+```json
+{
+  "acpClient.agents": [
+    {
+      "name": "hermes-agent",
+      "registryDir": "/path/to/hermes-agent/acp_registry"
+    }
+  ]
+}
+```
+
+### Zed
+
+Example settings snippet:
+
+```json
+{
+  "acp": {
+    "agents": [
+      {
+        "name": "hermes-agent",
+        "registry_dir": "/path/to/hermes-agent/acp_registry"
+      }
+    ]
+  }
+}
+```
+
+### JetBrains
+
+Use an ACP-compatible plugin and point it at:
+
+```text
+/path/to/hermes-agent/acp_registry
+```
+
+## Registry manifest
+
+The ACP registry manifest lives at:
+
+```text
+acp_registry/agent.json
+```
+
+It advertises a command-based agent whose launch command is:
+
+```text
+hermes acp
+```
+
+## Configuration and credentials
+
+ACP mode uses the same Hermes configuration as the CLI:
+
+- `~/.hermes/.env`
+- `~/.hermes/config.yaml`
+- `~/.hermes/skills/`
+- `~/.hermes/state.db`
+
+Provider resolution uses Hermes' normal runtime resolver, so ACP inherits the currently configured provider and credentials.
+
+## Session behavior
+
+ACP sessions are tracked by the ACP adapter's in-memory session manager while the server is running.
+
+Each session stores:
+
+- session ID
+- working directory
+- selected model
+- current conversation history
+- cancel event
+
+The underlying `AIAgent` still uses Hermes' normal persistence/logging paths, but ACP `list/load/resume/fork` are scoped to the currently running ACP server process.
+
+## Working directory behavior
+
+ACP sessions bind the editor's cwd to the Hermes task ID so file and terminal tools run relative to the editor workspace, not the server process cwd.
+
+## Approvals
+
+Dangerous terminal commands can be routed back to the editor as approval prompts. ACP approval options are simpler than the CLI flow:
+
+- allow once
+- allow always
+- deny
+
+On timeout or error, the approval bridge denies the request.
+
+## Troubleshooting
+
+### ACP agent does not appear in the editor
+
+Check:
+
+- the editor is pointed at the correct `acp_registry/` path
+- Hermes is installed and on your PATH
+- the ACP extra is installed (`pip install -e '.[acp]'`)
+
+### ACP starts but immediately errors
+
+Try these checks:
+
+```bash
+hermes doctor
+hermes status
+hermes acp
+```
+
+### Missing credentials
+
+ACP mode does not have its own login flow. It uses Hermes' existing provider setup. Configure credentials with:
+
+```bash
+hermes model
+```
+
+or by editing `~/.hermes/.env`.
+
+## See also
+
+- [ACP Internals](../../developer-guide/acp-internals.md)
+- [Provider Runtime Resolution](../../developer-guide/provider-runtime.md)
+- [Tools Runtime](../../developer-guide/tools-runtime.md)
--- a/hermes_code/website/docs/user-guide/features/api-server.md
+++ b/hermes_code/website/docs/user-guide/features/api-server.md
@ -0,0 +1,236 @@
+---
+sidebar_position: 14
+title: "API Server"
+description: "Expose hermes-agent as an OpenAI-compatible API for any frontend"
+---
+
+# API Server
+
+The API server exposes hermes-agent as an OpenAI-compatible HTTP endpoint. Any frontend that speaks the OpenAI format — Open WebUI, LobeChat, LibreChat, NextChat, ChatBox, and hundreds more — can connect to hermes-agent and use it as a backend.
+
+Your agent handles requests with its full toolset (terminal, file operations, web search, memory, skills) and returns the final response. Tool calls execute invisibly server-side.
+
+## Quick Start
+
+### 1. Enable the API server
+
+Add to `~/.hermes/.env`:
+
+```bash
+API_SERVER_ENABLED=true
+API_SERVER_KEY=change-me-local-dev
+# Optional: only if a browser must call Hermes directly
+# API_SERVER_CORS_ORIGINS=http://localhost:3000
+```
+
+### 2. Start the gateway
+
+```bash
+hermes gateway
+```
+
+You'll see:
+
+```
+[API Server] API server listening on http://127.0.0.1:8642
+```
+
+### 3. Connect a frontend
+
+Point any OpenAI-compatible client at `http://localhost:8642/v1`:
+
+```bash
+# Test with curl
+curl http://localhost:8642/v1/chat/completions \
+  -H "Authorization: Bearer change-me-local-dev" \
+  -H "Content-Type: application/json" \
+  -d '{"model": "hermes-agent", "messages": [{"role": "user", "content": "Hello!"}]}'
+```
+
+Or connect Open WebUI, LobeChat, or any other frontend — see the [Open WebUI integration guide](/docs/user-guide/messaging/open-webui) for step-by-step instructions.
+
+## Endpoints
+
+### POST /v1/chat/completions
+
+Standard OpenAI Chat Completions format. Stateless — the full conversation is included in each request via the `messages` array.
+
+**Request:**
+```json
+{
+  "model": "hermes-agent",
+  "messages": [
+    {"role": "system", "content": "You are a Python expert."},
+    {"role": "user", "content": "Write a fibonacci function"}
+  ],
+  "stream": false
+}
+```
+
+**Response:**
+```json
+{
+  "id": "chatcmpl-abc123",
+  "object": "chat.completion",
+  "created": 1710000000,
+  "model": "hermes-agent",
+  "choices": [{
+    "index": 0,
+    "message": {"role": "assistant", "content": "Here's a fibonacci function..."},
+    "finish_reason": "stop"
+  }],
+  "usage": {"prompt_tokens": 50, "completion_tokens": 200, "total_tokens": 250}
+}
+```
+
+**Streaming** (`"stream": true`): Returns Server-Sent Events (SSE) with token-by-token response chunks. When streaming is enabled in config, tokens are emitted live as the LLM generates them. When disabled, the full response is sent as a single SSE chunk.
+
+### POST /v1/responses
+
+OpenAI Responses API format. Supports server-side conversation state via `previous_response_id` — the server stores full conversation history (including tool calls and results) so multi-turn context is preserved without the client managing it.
+
+**Request:**
+```json
+{
+  "model": "hermes-agent",
+  "input": "What files are in my project?",
+  "instructions": "You are a helpful coding assistant.",
+  "store": true
+}
+```
+
+**Response:**
+```json
+{
+  "id": "resp_abc123",
+  "object": "response",
+  "status": "completed",
+  "model": "hermes-agent",
+  "output": [
+    {"type": "function_call", "name": "terminal", "arguments": "{\"command\": \"ls\"}", "call_id": "call_1"},
+    {"type": "function_call_output", "call_id": "call_1", "output": "README.md src/ tests/"},
+    {"type": "message", "role": "assistant", "content": [{"type": "output_text", "text": "Your project has..."}]}
+  ],
+  "usage": {"input_tokens": 50, "output_tokens": 200, "total_tokens": 250}
+}
+```
+
+#### Multi-turn with previous_response_id
+
+Chain responses to maintain full context (including tool calls) across turns:
+
+```json
+{
+  "input": "Now show me the README",
+  "previous_response_id": "resp_abc123"
+}
+```
+
+The server reconstructs the full conversation from the stored response chain — all previous tool calls and results are preserved.
+
+#### Named conversations
+
+Use the `conversation` parameter instead of tracking response IDs:
+
+```json
+{"input": "Hello", "conversation": "my-project"}
+{"input": "What's in src/?", "conversation": "my-project"}
+{"input": "Run the tests", "conversation": "my-project"}
+```
+
+The server automatically chains to the latest response in that conversation. Like the `/title` command for gateway sessions.
+
+### GET /v1/responses/\{id\}
+
+Retrieve a previously stored response by ID.
+
+### DELETE /v1/responses/\{id\}
+
+Delete a stored response.
+
+### GET /v1/models
+
+Lists `hermes-agent` as an available model. Required by most frontends for model discovery.
+
+### GET /health
+
+Health check. Returns `{"status": "ok"}`.
+
+## System Prompt Handling
+
+When a frontend sends a `system` message (Chat Completions) or `instructions` field (Responses API), hermes-agent **layers it on top** of its core system prompt. Your agent keeps all its tools, memory, and skills — the frontend's system prompt adds extra instructions.
+
+This means you can customize behavior per-frontend without losing capabilities:
+- Open WebUI system prompt: "You are a Python expert. Always include type hints."
+- The agent still has terminal, file tools, web search, memory, etc.
+
+## Authentication
+
+Bearer token auth via the `Authorization` header:
+
+```
+Authorization: Bearer ***
+```
+
+Configure the key via `API_SERVER_KEY` env var. If you need a browser to call Hermes directly, also set `API_SERVER_CORS_ORIGINS` to an explicit allowlist.
+
+:::warning Security
+The API server gives full access to hermes-agent's toolset, **including terminal commands**. If you change the bind address to `0.0.0.0` (network-accessible), **always set `API_SERVER_KEY`** and keep `API_SERVER_CORS_ORIGINS` narrow — without that, remote callers may be able to execute arbitrary commands on your machine.
+
+The default bind address (`127.0.0.1`) is for local-only use. Browser access is disabled by default; enable it only for explicit trusted origins.
+:::
+
+## Configuration
+
+### Environment Variables
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `API_SERVER_ENABLED` | `false` | Enable the API server |
+| `API_SERVER_PORT` | `8642` | HTTP server port |
+| `API_SERVER_HOST` | `127.0.0.1` | Bind address (localhost only by default) |
+| `API_SERVER_KEY` | _(none)_ | Bearer token for auth |
+| `API_SERVER_CORS_ORIGINS` | _(none)_ | Comma-separated allowed browser origins |
+
+### config.yaml
+
+```yaml
+# Not yet supported — use environment variables.
+# config.yaml support coming in a future release.
+```
+
+## CORS
+
+The API server does **not** enable browser CORS by default.
+
+For direct browser access, set an explicit allowlist:
+
+```bash
+API_SERVER_CORS_ORIGINS=http://localhost:3000,http://127.0.0.1:3000
+```
+
+Most documented frontends such as Open WebUI connect server-to-server and do not need CORS at all.
+
+## Compatible Frontends
+
+Any frontend that supports the OpenAI API format works. Tested/documented integrations:
+
+| Frontend | Stars | Connection |
+|----------|-------|------------|
+| [Open WebUI](/docs/user-guide/messaging/open-webui) | 126k | Full guide available |
+| LobeChat | 73k | Custom provider endpoint |
+| LibreChat | 34k | Custom endpoint in librechat.yaml |
+| AnythingLLM | 56k | Generic OpenAI provider |
+| NextChat | 87k | BASE_URL env var |
+| ChatBox | 39k | API Host setting |
+| Jan | 26k | Remote model config |
+| HF Chat-UI | 8k | OPENAI_BASE_URL |
+| big-AGI | 7k | Custom endpoint |
+| OpenAI Python SDK | — | `OpenAI(base_url="http://localhost:8642/v1")` |
+| curl | — | Direct HTTP requests |
+
+## Limitations
+
+- **Response storage** — stored responses (for `previous_response_id`) are persisted in SQLite and survive gateway restarts. Max 100 stored responses (LRU eviction).
+- **No file upload** — vision/document analysis via uploaded files is not yet supported through the API.
+- **Model field is cosmetic** — the `model` field in requests is accepted but the actual LLM model used is configured server-side in config.yaml.
--- a/hermes_code/website/docs/user-guide/features/batch-processing.md
+++ b/hermes_code/website/docs/user-guide/features/batch-processing.md
@ -0,0 +1,226 @@
+---
+sidebar_position: 12
+title: "Batch Processing"
+description: "Generate agent trajectories at scale — parallel processing, checkpointing, and toolset distributions"
+---
+
+# Batch Processing
+
+Batch processing lets you run the Hermes agent across hundreds or thousands of prompts in parallel, generating structured trajectory data. This is primarily used for **training data generation** — producing ShareGPT-format trajectories with tool usage statistics that can be used for fine-tuning or evaluation.
+
+## Overview
+
+The batch runner (`batch_runner.py`) processes a JSONL dataset of prompts, running each through a full agent session with tool access. Each prompt gets its own isolated environment. The output is structured trajectory data with full conversation history, tool call statistics, and reasoning coverage metrics.
+
+## Quick Start
+
+```bash
+# Basic batch run
+python batch_runner.py \
+    --dataset_file=data/prompts.jsonl \
+    --batch_size=10 \
+    --run_name=my_first_run \
+    --model=anthropic/claude-sonnet-4-20250514 \
+    --num_workers=4
+
+# Resume an interrupted run
+python batch_runner.py \
+    --dataset_file=data/prompts.jsonl \
+    --batch_size=10 \
+    --run_name=my_first_run \
+    --resume
+
+# List available toolset distributions
+python batch_runner.py --list_distributions
+```
+
+## Dataset Format
+
+The input dataset is a JSONL file (one JSON object per line). Each entry must have a `prompt` field:
+
+```jsonl
+{"prompt": "Write a Python function that finds the longest palindromic substring"}
+{"prompt": "Create a REST API endpoint for user authentication using Flask"}
+{"prompt": "Debug this error: TypeError: cannot unpack non-iterable NoneType object"}
+```
+
+Entries can optionally include:
+- `image` or `docker_image`: A container image to use for this prompt's sandbox (works with Docker, Modal, and Singularity backends)
+- `cwd`: Working directory override for the task's terminal session
+
+## Configuration Options
+
+| Parameter | Default | Description |
+|-----------|---------|-------------|
+| `--dataset_file` | (required) | Path to JSONL dataset |
+| `--batch_size` | (required) | Prompts per batch |
+| `--run_name` | (required) | Name for this run (used for output dir and checkpointing) |
+| `--distribution` | `"default"` | Toolset distribution to sample from |
+| `--model` | `claude-sonnet-4-20250514` | Model to use |
+| `--base_url` | `https://openrouter.ai/api/v1` | API base URL |
+| `--api_key` | (env var) | API key for model |
+| `--max_turns` | `10` | Maximum tool-calling iterations per prompt |
+| `--num_workers` | `4` | Parallel worker processes |
+| `--resume` | `false` | Resume from checkpoint |
+| `--verbose` | `false` | Enable verbose logging |
+| `--max_samples` | all | Only process first N samples from dataset |
+| `--max_tokens` | model default | Maximum tokens per model response |
+
+### Provider Routing (OpenRouter)
+
+| Parameter | Description |
+|-----------|-------------|
+| `--providers_allowed` | Comma-separated providers to allow (e.g., `"anthropic,openai"`) |
+| `--providers_ignored` | Comma-separated providers to ignore (e.g., `"together,deepinfra"`) |
+| `--providers_order` | Comma-separated preferred provider order |
+| `--provider_sort` | Sort by `"price"`, `"throughput"`, or `"latency"` |
+
+### Reasoning Control
+
+| Parameter | Description |
+|-----------|-------------|
+| `--reasoning_effort` | Effort level: `xhigh`, `high`, `medium`, `low`, `minimal`, `none` |
+| `--reasoning_disabled` | Completely disable reasoning/thinking tokens |
+
+### Advanced Options
+
+| Parameter | Description |
+|-----------|-------------|
+| `--ephemeral_system_prompt` | System prompt used during execution but NOT saved to trajectories |
+| `--log_prefix_chars` | Characters to show in log previews (default: 100) |
+| `--prefill_messages_file` | Path to JSON file with prefill messages for few-shot priming |
+
+## Toolset Distributions
+
+Each prompt gets a randomly sampled set of toolsets from a **distribution**. This ensures training data covers diverse tool combinations. Use `--list_distributions` to see all available distributions.
+
+In the current implementation, distributions assign a probability to **each individual toolset**. The sampler flips each toolset independently, then guarantees that at least one toolset is enabled. This is different from a hand-authored table of prebuilt combinations.
+
+## Output Format
+
+All output goes to `data/<run_name>/`:
+
+```text
+data/my_run/
+├── trajectories.jsonl    # Combined final output (all batches merged)
+├── batch_0.jsonl         # Individual batch results
+├── batch_1.jsonl
+├── ...
+├── checkpoint.json       # Resume checkpoint
+└── statistics.json       # Aggregate tool usage stats
+```
+
+### Trajectory Format
+
+Each line in `trajectories.jsonl` is a JSON object:
+
+```json
+{
+  "prompt_index": 42,
+  "conversations": [
+    {"from": "human", "value": "Write a function..."},
+    {"from": "gpt", "value": "I'll create that function...",
+     "tool_calls": [...]},
+    {"from": "tool", "value": "..."},
+    {"from": "gpt", "value": "Here's the completed function..."}
+  ],
+  "metadata": {
+    "batch_num": 2,
+    "timestamp": "2026-01-15T10:30:00",
+    "model": "anthropic/claude-sonnet-4-20250514"
+  },
+  "completed": true,
+  "partial": false,
+  "api_calls": 3,
+  "toolsets_used": ["terminal", "file"],
+  "tool_stats": {
+    "terminal": {"count": 2, "success": 2, "failure": 0},
+    "read_file": {"count": 1, "success": 1, "failure": 0}
+  },
+  "tool_error_counts": {
+    "terminal": 0,
+    "read_file": 0
+  }
+}
+```
+
+The `conversations` field uses a ShareGPT-like format with `from` and `value` fields. Tool stats are normalized to include all possible tools with zero defaults, ensuring consistent schema across entries for HuggingFace datasets compatibility.
+
+## Checkpointing
+
+The batch runner has robust checkpointing for fault tolerance:
+
+- **Checkpoint file:** Saved after each batch completes, tracking which prompt indices are done
+- **Content-based resume:** On `--resume`, the runner scans existing batch files and matches completed prompts by their actual text content (not just indices), enabling recovery even if the dataset order changes
+- **Failed prompts:** Only successfully completed prompts are marked as done — failed prompts will be retried on resume
+- **Batch merging:** On completion, all batch files (including from previous runs) are merged into a single `trajectories.jsonl`
+
+### How Resume Works
+
+1. Scan all `batch_*.jsonl` files for completed prompts (by content matching)
+2. Filter the dataset to exclude already-completed prompts
+3. Re-batch the remaining prompts
+4. Process only the remaining prompts
+5. Merge all batch files (old + new) into final output
+
+## Quality Filtering
+
+The batch runner applies automatic quality filtering:
+
+- **No-reasoning filter:** Samples where zero assistant turns contain reasoning (no `<REASONING_SCRATCHPAD>` or native thinking tokens) are discarded
+- **Corrupted entry filter:** Entries with hallucinated tool names (not in the valid tool list) are filtered out during the final merge
+- **Reasoning statistics:** Tracks percentage of turns with/without reasoning across the entire run
+
+## Statistics
+
+After completion, the runner prints comprehensive statistics:
+
+- **Tool usage:** Call counts, success/failure rates per tool
+- **Reasoning coverage:** Percentage of assistant turns with reasoning
+- **Samples discarded:** Count of samples filtered for lacking reasoning
+- **Duration:** Total processing time
+
+Statistics are also saved to `statistics.json` for programmatic analysis.
+
+## Use Cases
+
+### Training Data Generation
+
+Generate diverse tool-use trajectories for fine-tuning:
+
+```bash
+python batch_runner.py \
+    --dataset_file=data/coding_prompts.jsonl \
+    --batch_size=20 \
+    --run_name=coding_v1 \
+    --model=anthropic/claude-sonnet-4-20250514 \
+    --num_workers=8 \
+    --distribution=default \
+    --max_turns=15
+```
+
+### Model Evaluation
+
+Evaluate how well a model uses tools across standardized prompts:
+
+```bash
+python batch_runner.py \
+    --dataset_file=data/eval_suite.jsonl \
+    --batch_size=10 \
+    --run_name=eval_gpt4 \
+    --model=openai/gpt-4o \
+    --num_workers=4 \
+    --max_turns=10
+```
+
+### Per-Prompt Container Images
+
+For benchmarks requiring specific environments, each prompt can specify its own container image:
+
+```jsonl
+{"prompt": "Install numpy and compute eigenvalues of a 3x3 matrix", "image": "python:3.11-slim"}
+{"prompt": "Compile this Rust program and run it", "image": "rust:1.75"}
+{"prompt": "Set up a Node.js Express server", "image": "node:20-alpine", "cwd": "/app"}
+```
+
+The batch runner verifies Docker images are accessible before running each prompt.
--- a/hermes_code/website/docs/user-guide/features/browser.md
+++ b/hermes_code/website/docs/user-guide/features/browser.md
@ -0,0 +1,281 @@
+---
+title: Browser Automation
+description: Control browsers with multiple providers, local Chrome via CDP, or cloud browsers for web interaction, form filling, scraping, and more.
+sidebar_label: Browser
+sidebar_position: 5
+---
+
+# Browser Automation
+
+Hermes Agent includes a full browser automation toolset with multiple backend options:
+
+- **Browserbase cloud mode** via [Browserbase](https://browserbase.com) for managed cloud browsers and anti-bot tooling
+- **Browser Use cloud mode** via [Browser Use](https://browser-use.com) as an alternative cloud browser provider
+- **Local Chrome via CDP** — connect browser tools to your own Chrome instance using `/browser connect`
+- **Local browser mode** via the `agent-browser` CLI and a local Chromium installation
+
+In all modes, the agent can navigate websites, interact with page elements, fill forms, and extract information.
+
+## Overview
+
+Pages are represented as **accessibility trees** (text-based snapshots), making them ideal for LLM agents. Interactive elements get ref IDs (like `@e1`, `@e2`) that the agent uses for clicking and typing.
+
+Key capabilities:
+
+- **Multi-provider cloud execution** — Browserbase or Browser Use, no local browser needed
+- **Local Chrome integration** — attach to your running Chrome via CDP for hands-on browsing
+- **Built-in stealth** — random fingerprints, CAPTCHA solving, residential proxies (Browserbase)
+- **Session isolation** — each task gets its own browser session
+- **Automatic cleanup** — inactive sessions are closed after a timeout
+- **Vision analysis** — screenshot + AI analysis for visual understanding
+
+## Setup
+
+### Browserbase cloud mode
+
+To use Browserbase-managed cloud browsers, add:
+
+```bash
+# Add to ~/.hermes/.env
+BROWSERBASE_API_KEY=***
+BROWSERBASE_PROJECT_ID=your-project-id-here
+```
+
+Get your credentials at [browserbase.com](https://browserbase.com).
+
+### Browser Use cloud mode
+
+To use Browser Use as your cloud browser provider, add:
+
+```bash
+# Add to ~/.hermes/.env
+BROWSER_USE_API_KEY=***
+```
+
+Get your API key at [browser-use.com](https://browser-use.com). Browser Use provides a cloud browser via its REST API. If both Browserbase and Browser Use credentials are set, Browserbase takes priority.
+
+### Local Chrome via CDP (`/browser connect`)
+
+Instead of a cloud provider, you can attach Hermes browser tools to your own running Chrome instance via the Chrome DevTools Protocol (CDP). This is useful when you want to see what the agent is doing in real-time, interact with pages that require your own cookies/sessions, or avoid cloud browser costs.
+
+In the CLI, use:
+
+```
+/browser connect              # Connect to Chrome at ws://localhost:9222
+/browser connect ws://host:port  # Connect to a specific CDP endpoint
+/browser status               # Check current connection
+/browser disconnect            # Detach and return to cloud/local mode
+```
+
+If Chrome isn't already running with remote debugging, Hermes will attempt to auto-launch it with `--remote-debugging-port=9222`.
+
+:::tip
+To start Chrome manually with CDP enabled:
+```bash
+# Linux
+google-chrome --remote-debugging-port=9222
+
+# macOS
+"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --remote-debugging-port=9222
+```
+:::
+
+When connected via CDP, all browser tools (`browser_navigate`, `browser_click`, etc.) operate on your live Chrome instance instead of spinning up a cloud session.
+
+### Local browser mode
+
+If you do **not** set any cloud credentials and don't use `/browser connect`, Hermes can still use the browser tools through a local Chromium install driven by `agent-browser`.
+
+### Optional Environment Variables
+
+```bash
+# Residential proxies for better CAPTCHA solving (default: "true")
+BROWSERBASE_PROXIES=true
+
+# Advanced stealth with custom Chromium — requires Scale Plan (default: "false")
+BROWSERBASE_ADVANCED_STEALTH=false
+
+# Session reconnection after disconnects — requires paid plan (default: "true")
+BROWSERBASE_KEEP_ALIVE=true
+
+# Custom session timeout in milliseconds (default: project default)
+# Examples: 600000 (10min), 1800000 (30min)
+BROWSERBASE_SESSION_TIMEOUT=600000
+
+# Inactivity timeout before auto-cleanup in seconds (default: 300)
+BROWSER_INACTIVITY_TIMEOUT=300
+```
+
+### Install agent-browser CLI
+
+```bash
+npm install -g agent-browser
+# Or install locally in the repo:
+npm install
+```
+
+:::info
+The `browser` toolset must be included in your config's `toolsets` list or enabled via `hermes config set toolsets '["hermes-cli", "browser"]'`.
+:::
+
+## Available Tools
+
+### `browser_navigate`
+
+Navigate to a URL. Must be called before any other browser tool. Initializes the Browserbase session.
+
+```
+Navigate to https://github.com/NousResearch
+```
+
+:::tip
+For simple information retrieval, prefer `web_search` or `web_extract` — they are faster and cheaper. Use browser tools when you need to **interact** with a page (click buttons, fill forms, handle dynamic content).
+:::
+
+### `browser_snapshot`
+
+Get a text-based snapshot of the current page's accessibility tree. Returns interactive elements with ref IDs like `@e1`, `@e2` for use with `browser_click` and `browser_type`.
+
+- **`full=false`** (default): Compact view showing only interactive elements
+- **`full=true`**: Complete page content
+
+Snapshots over 8000 characters are automatically summarized by an LLM.
+
+### `browser_click`
+
+Click an element identified by its ref ID from the snapshot.
+
+```
+Click @e5 to press the "Sign In" button
+```
+
+### `browser_type`
+
+Type text into an input field. Clears the field first, then types the new text.
+
+```
+Type "hermes agent" into the search field @e3
+```
+
+### `browser_scroll`
+
+Scroll the page up or down to reveal more content.
+
+```
+Scroll down to see more results
+```
+
+### `browser_press`
+
+Press a keyboard key. Useful for submitting forms or navigation.
+
+```
+Press Enter to submit the form
+```
+
+Supported keys: `Enter`, `Tab`, `Escape`, `ArrowDown`, `ArrowUp`, and more.
+
+### `browser_back`
+
+Navigate back to the previous page in browser history.
+
+### `browser_get_images`
+
+List all images on the current page with their URLs and alt text. Useful for finding images to analyze.
+
+### `browser_vision`
+
+Take a screenshot and analyze it with vision AI. Use this when text snapshots don't capture important visual information — especially useful for CAPTCHAs, complex layouts, or visual verification challenges.
+
+The screenshot is saved persistently and the file path is returned alongside the AI analysis. On messaging platforms (Telegram, Discord, Slack, WhatsApp), you can ask the agent to share the screenshot — it will be sent as a native photo attachment via the `MEDIA:` mechanism.
+
+```
+What does the chart on this page show?
+```
+
+Screenshots are stored in `~/.hermes/browser_screenshots/` and automatically cleaned up after 24 hours.
+
+### `browser_console`
+
+Get browser console output (log/warn/error messages) and uncaught JavaScript exceptions from the current page. Essential for detecting silent JS errors that don't appear in the accessibility tree.
+
+```
+Check the browser console for any JavaScript errors
+```
+
+Use `clear=True` to clear the console after reading, so subsequent calls only show new messages.
+
+### `browser_close`
+
+Close the browser session and release resources. Call this when done to free up Browserbase session quota.
+
+## Practical Examples
+
+### Filling Out a Web Form
+
+```
+User: Sign up for an account on example.com with my email john@example.com
+
+Agent workflow:
+1. browser_navigate("https://example.com/signup")
+2. browser_snapshot()  → sees form fields with refs
+3. browser_type(ref="@e3", text="john@example.com")
+4. browser_type(ref="@e5", text="SecurePass123")
+5. browser_click(ref="@e8")  → clicks "Create Account"
+6. browser_snapshot()  → confirms success
+7. browser_close()
+```
+
+### Researching Dynamic Content
+
+```
+User: What are the top trending repos on GitHub right now?
+
+Agent workflow:
+1. browser_navigate("https://github.com/trending")
+2. browser_snapshot(full=true)  → reads trending repo list
+3. Returns formatted results
+4. browser_close()
+```
+
+## Session Recording
+
+Automatically record browser sessions as WebM video files:
+
+```yaml
+browser:
+  record_sessions: true  # default: false
+```
+
+When enabled, recording starts automatically on the first `browser_navigate` and saves to `~/.hermes/browser_recordings/` when the session closes. Works in both local and cloud (Browserbase) modes. Recordings older than 72 hours are automatically cleaned up.
+
+## Stealth Features
+
+Browserbase provides automatic stealth capabilities:
+
+| Feature | Default | Notes |
+|---------|---------|-------|
+| Basic Stealth | Always on | Random fingerprints, viewport randomization, CAPTCHA solving |
+| Residential Proxies | On | Routes through residential IPs for better access |
+| Advanced Stealth | Off | Custom Chromium build, requires Scale Plan |
+| Keep Alive | On | Session reconnection after network hiccups |
+
+:::note
+If paid features aren't available on your plan, Hermes automatically falls back — first disabling `keepAlive`, then proxies — so browsing still works on free plans.
+:::
+
+## Session Management
+
+- Each task gets an isolated browser session via Browserbase
+- Sessions are automatically cleaned up after inactivity (default: 5 minutes)
+- A background thread checks every 30 seconds for stale sessions
+- Emergency cleanup runs on process exit to prevent orphaned sessions
+- Sessions are released via the Browserbase API (`REQUEST_RELEASE` status)
+
+## Limitations
+
+- **Text-based interaction** — relies on accessibility tree, not pixel coordinates
+- **Snapshot size** — large pages may be truncated or LLM-summarized at 8000 characters
+- **Session timeout** — cloud sessions expire based on your provider's plan settings
+- **Cost** — cloud sessions consume provider credits; use `browser_close` when done. Use `/browser connect` for free local browsing.
+- **No file downloads** — cannot download files from the browser
--- a/hermes_code/website/docs/user-guide/features/checkpoints.md
+++ b/hermes_code/website/docs/user-guide/features/checkpoints.md
@ -0,0 +1,30 @@
+# Filesystem Checkpoints
+
+Hermes automatically snapshots your working directory before making file changes, giving you a safety net to roll back if something goes wrong. Checkpoints are **enabled by default**.
+
+## Quick Reference
+
+| Command | Description |
+|---------|-------------|
+| `/rollback` | List all checkpoints with change stats |
+| `/rollback <N>` | Restore to checkpoint N (also undoes last chat turn) |
+| `/rollback diff <N>` | Preview diff between checkpoint N and current state |
+| `/rollback <N> <file>` | Restore a single file from checkpoint N |
+
+## What Triggers Checkpoints
+
+- **File tools** — `write_file` and `patch`
+- **Destructive terminal commands** — `rm`, `mv`, `sed -i`, output redirects (`>`), `git reset`/`clean`
+
+## Configuration
+
+```yaml
+# ~/.hermes/config.yaml
+checkpoints:
+  enabled: true          # default: true
+  max_snapshots: 50      # max checkpoints per directory
+```
+
+## Learn More
+
+For the full guide — how shadow repos work, diff previews, file-level restore, conversation undo, safety guards, and best practices — see **[Checkpoints and /rollback](../checkpoints-and-rollback.md)**.
--- a/hermes_code/website/docs/user-guide/features/code-execution.md
+++ b/hermes_code/website/docs/user-guide/features/code-execution.md
@ -0,0 +1,210 @@
+---
+sidebar_position: 8
+title: "Code Execution"
+description: "Sandboxed Python execution with RPC tool access — collapse multi-step workflows into a single turn"
+---
+
+# Code Execution (Programmatic Tool Calling)
+
+The `execute_code` tool lets the agent write Python scripts that call Hermes tools programmatically, collapsing multi-step workflows into a single LLM turn. The script runs in a sandboxed child process on the agent host, communicating via Unix domain socket RPC.
+
+## How It Works
+
+1. The agent writes a Python script using `from hermes_tools import ...`
+2. Hermes generates a `hermes_tools.py` stub module with RPC functions
+3. Hermes opens a Unix domain socket and starts an RPC listener thread
+4. The script runs in a child process — tool calls travel over the socket back to Hermes
+5. Only the script's `print()` output is returned to the LLM; intermediate tool results never enter the context window
+
+```python
+# The agent can write scripts like:
+from hermes_tools import web_search, web_extract
+
+results = web_search("Python 3.13 features", limit=5)
+for r in results["data"]["web"]:
+    content = web_extract([r["url"]])
+    # ... filter and process ...
+print(summary)
+```
+
+**Available tools in sandbox:** `web_search`, `web_extract`, `read_file`, `write_file`, `search_files`, `patch`, `terminal` (foreground only).
+
+## When the Agent Uses This
+
+The agent uses `execute_code` when there are:
+
+- **3+ tool calls** with processing logic between them
+- Bulk data filtering or conditional branching
+- Loops over results
+
+The key benefit: intermediate tool results never enter the context window — only the final `print()` output comes back, dramatically reducing token usage.
+
+## Practical Examples
+
+### Data Processing Pipeline
+
+```python
+from hermes_tools import search_files, read_file
+import json
+
+# Find all config files and extract database settings
+matches = search_files("database", path=".", file_glob="*.yaml", limit=20)
+configs = []
+for match in matches.get("matches", []):
+    content = read_file(match["path"])
+    configs.append({"file": match["path"], "preview": content["content"][:200]})
+
+print(json.dumps(configs, indent=2))
+```
+
+### Multi-Step Web Research
+
+```python
+from hermes_tools import web_search, web_extract
+import json
+
+# Search, extract, and summarize in one turn
+results = web_search("Rust async runtime comparison 2025", limit=5)
+summaries = []
+for r in results["data"]["web"]:
+    page = web_extract([r["url"]])
+    for p in page.get("results", []):
+        if p.get("content"):
+            summaries.append({
+                "title": r["title"],
+                "url": r["url"],
+                "excerpt": p["content"][:500]
+            })
+
+print(json.dumps(summaries, indent=2))
+```
+
+### Bulk File Refactoring
+
+```python
+from hermes_tools import search_files, read_file, patch
+
+# Find all Python files using deprecated API and fix them
+matches = search_files("old_api_call", path="src/", file_glob="*.py")
+fixed = 0
+for match in matches.get("matches", []):
+    result = patch(
+        path=match["path"],
+        old_string="old_api_call(",
+        new_string="new_api_call(",
+        replace_all=True
+    )
+    if "error" not in str(result):
+        fixed += 1
+
+print(f"Fixed {fixed} files out of {len(matches.get('matches', []))} matches")
+```
+
+### Build and Test Pipeline
+
+```python
+from hermes_tools import terminal, read_file
+import json
+
+# Run tests, parse results, and report
+result = terminal("cd /project && python -m pytest --tb=short -q 2>&1", timeout=120)
+output = result.get("output", "")
+
+# Parse test output
+passed = output.count(" passed")
+failed = output.count(" failed")
+errors = output.count(" error")
+
+report = {
+    "passed": passed,
+    "failed": failed,
+    "errors": errors,
+    "exit_code": result.get("exit_code", -1),
+    "summary": output[-500:] if len(output) > 500 else output
+}
+
+print(json.dumps(report, indent=2))
+```
+
+## Resource Limits
+
+| Resource | Limit | Notes |
+|----------|-------|-------|
+| **Timeout** | 5 minutes (300s) | Script is killed with SIGTERM, then SIGKILL after 5s grace |
+| **Stdout** | 50 KB | Output truncated with `[output truncated at 50KB]` notice |
+| **Stderr** | 10 KB | Included in output on non-zero exit for debugging |
+| **Tool calls** | 50 per execution | Error returned when limit reached |
+
+All limits are configurable via `config.yaml`:
+
+```yaml
+# In ~/.hermes/config.yaml
+code_execution:
+  timeout: 300       # Max seconds per script (default: 300)
+  max_tool_calls: 50 # Max tool calls per execution (default: 50)
+```
+
+## How Tool Calls Work Inside Scripts
+
+When your script calls a function like `web_search("query")`:
+
+1. The call is serialized to JSON and sent over a Unix domain socket to the parent process
+2. The parent dispatches through the standard `handle_function_call` handler
+3. The result is sent back over the socket
+4. The function returns the parsed result
+
+This means tool calls inside scripts behave identically to normal tool calls — same rate limits, same error handling, same capabilities. The only restriction is that `terminal()` is foreground-only (no `background`, `pty`, or `check_interval` parameters).
+
+## Error Handling
+
+When a script fails, the agent receives structured error information:
+
+- **Non-zero exit code**: stderr is included in the output so the agent sees the full traceback
+- **Timeout**: Script is killed and the agent sees `"Script timed out after 300s and was killed."`
+- **Interruption**: If the user sends a new message during execution, the script is terminated and the agent sees `[execution interrupted — user sent a new message]`
+- **Tool call limit**: When the 50-call limit is hit, subsequent tool calls return an error message
+
+The response always includes `status` (success/error/timeout/interrupted), `output`, `tool_calls_made`, and `duration_seconds`.
+
+## Security
+
+:::danger Security Model
+The child process runs with a **minimal environment**. API keys, tokens, and credentials are stripped by default. The script accesses tools exclusively via the RPC channel — it cannot read secrets from environment variables unless explicitly allowed.
+:::
+
+Environment variables containing `KEY`, `TOKEN`, `SECRET`, `PASSWORD`, `CREDENTIAL`, `PASSWD`, or `AUTH` in their names are excluded. Only safe system variables (`PATH`, `HOME`, `LANG`, `SHELL`, `PYTHONPATH`, `VIRTUAL_ENV`, etc.) are passed through.
+
+### Skill Environment Variable Passthrough
+
+When a skill declares `required_environment_variables` in its frontmatter, those variables are **automatically passed through** to both `execute_code` and `terminal` sandboxes after the skill is loaded. This lets skills use their declared API keys without weakening the security posture for arbitrary code.
+
+For non-skill use cases, you can explicitly allowlist variables in `config.yaml`:
+
+```yaml
+terminal:
+  env_passthrough:
+    - MY_CUSTOM_KEY
+    - ANOTHER_TOKEN
+```
+
+See the [Security guide](/docs/user-guide/security#environment-variable-passthrough) for full details.
+
+The script runs in a temporary directory that is cleaned up after execution. The child process runs in its own process group so it can be cleanly killed on timeout or interruption.
+
+## execute_code vs terminal
+
+| Use Case | execute_code | terminal |
+|----------|-------------|----------|
+| Multi-step workflows with tool calls between | ✅ | ❌ |
+| Simple shell command | ❌ | ✅ |
+| Filtering/processing large tool outputs | ✅ | ❌ |
+| Running a build or test suite | ❌ | ✅ |
+| Looping over search results | ✅ | ❌ |
+| Interactive/background processes | ❌ | ✅ |
+| Needs API keys in environment | ⚠️ Only via [passthrough](/docs/user-guide/security#environment-variable-passthrough) | ✅ (most pass through) |
+
+**Rule of thumb:** Use `execute_code` when you need to call Hermes tools programmatically with logic between calls. Use `terminal` for running shell commands, builds, and processes.
+
+## Platform Support
+
+Code execution requires Unix domain sockets and is available on **Linux and macOS only**. It is automatically disabled on Windows — the agent falls back to regular sequential tool calls.
--- a/hermes_code/website/docs/user-guide/features/context-files.md
+++ b/hermes_code/website/docs/user-guide/features/context-files.md
@ -0,0 +1,201 @@
+---
+sidebar_position: 8
+title: "Context Files"
+description: "Project context files — .hermes.md, AGENTS.md, CLAUDE.md, global SOUL.md, and .cursorrules — automatically injected into every conversation"
+---
+
+# Context Files
+
+Hermes Agent automatically discovers and loads context files that shape how it behaves. Some are project-local and discovered from your working directory. `SOUL.md` is now global to the Hermes instance and is loaded from `HERMES_HOME` only.
+
+## Supported Context Files
+
+| File | Purpose | Discovery |
+|------|---------|-----------| 
+| **.hermes.md** / **HERMES.md** | Project instructions (highest priority) | Walks to git root |
+| **AGENTS.md** | Project instructions, conventions, architecture | Recursive (walks subdirectories) |
+| **CLAUDE.md** | Claude Code context files (also detected) | CWD only |
+| **SOUL.md** | Global personality and tone customization for this Hermes instance | `HERMES_HOME/SOUL.md` only |
+| **.cursorrules** | Cursor IDE coding conventions | CWD only |
+| **.cursor/rules/*.mdc** | Cursor IDE rule modules | CWD only |
+
+:::info Priority system
+Only **one** project context type is loaded per session (first match wins): `.hermes.md` → `AGENTS.md` → `CLAUDE.md` → `.cursorrules`. **SOUL.md** is always loaded independently as the agent identity (slot #1).
+:::
+
+## AGENTS.md
+
+`AGENTS.md` is the primary project context file. It tells the agent how your project is structured, what conventions to follow, and any special instructions.
+
+### Hierarchical Discovery
+
+Hermes walks the directory tree starting from the working directory and loads **all** `AGENTS.md` files found, sorted by depth. This supports monorepo-style setups:
+
+```
+my-project/
+├── AGENTS.md              ← Top-level project context
+├── frontend/
+│   └── AGENTS.md          ← Frontend-specific instructions
+├── backend/
+│   └── AGENTS.md          ← Backend-specific instructions
+└── shared/
+    └── AGENTS.md          ← Shared library conventions
+```
+
+All four files are concatenated into a single context block with relative path headers.
+
+:::info
+Directories that are skipped during the walk: `.`-prefixed dirs, `node_modules`, `__pycache__`, `venv`, `.venv`.
+:::
+
+### Example AGENTS.md
+
+```markdown
+# Project Context
+
+This is a Next.js 14 web application with a Python FastAPI backend.
+
+## Architecture
+- Frontend: Next.js 14 with App Router in `/frontend`
+- Backend: FastAPI in `/backend`, uses SQLAlchemy ORM
+- Database: PostgreSQL 16
+- Deployment: Docker Compose on a Hetzner VPS
+
+## Conventions
+- Use TypeScript strict mode for all frontend code
+- Python code follows PEP 8, use type hints everywhere
+- All API endpoints return JSON with `{data, error, meta}` shape
+- Tests go in `__tests__/` directories (frontend) or `tests/` (backend)
+
+## Important Notes
+- Never modify migration files directly — use Alembic commands
+- The `.env.local` file has real API keys, don't commit it
+- Frontend port is 3000, backend is 8000, DB is 5432
+```
+
+## SOUL.md
+
+`SOUL.md` controls the agent's personality, tone, and communication style. See the [Personality](/docs/user-guide/features/personality) page for full details.
+
+**Location:**
+
+- `~/.hermes/SOUL.md`
+- or `$HERMES_HOME/SOUL.md` if you run Hermes with a custom home directory
+
+Important details:
+
+- Hermes seeds a default `SOUL.md` automatically if one does not exist yet
+- Hermes loads `SOUL.md` only from `HERMES_HOME`
+- Hermes does not probe the working directory for `SOUL.md`
+- If the file is empty, nothing from `SOUL.md` is added to the prompt
+- If the file has content, the content is injected verbatim after scanning and truncation
+
+## .cursorrules
+
+Hermes is compatible with Cursor IDE's `.cursorrules` file and `.cursor/rules/*.mdc` rule modules. If these files exist in your project root and no higher-priority context file (`.hermes.md`, `AGENTS.md`, or `CLAUDE.md`) is found, they're loaded as the project context.
+
+This means your existing Cursor conventions automatically apply when using Hermes.
+
+## How Context Files Are Loaded
+
+Context files are loaded by `build_context_files_prompt()` in `agent/prompt_builder.py`:
+
+1. **At session start** — the function scans the working directory
+2. **Content is read** — each file is read as UTF-8 text
+3. **Security scan** — content is checked for prompt injection patterns
+4. **Truncation** — files exceeding 20,000 characters are head/tail truncated (70% head, 20% tail, with a marker in the middle)
+5. **Assembly** — all sections are combined under a `# Project Context` header
+6. **Injection** — the assembled content is added to the system prompt
+
+The final prompt section looks roughly like:
+
+```text
+# Project Context
+
+The following project context files have been loaded and should be followed:
+
+## AGENTS.md
+
+[Your AGENTS.md content here]
+
+## .cursorrules
+
+[Your .cursorrules content here]
+
+[Your SOUL.md content here]
+```
+
+Notice that SOUL content is inserted directly, without extra wrapper text.
+
+## Security: Prompt Injection Protection
+
+All context files are scanned for potential prompt injection before being included. The scanner checks for:
+
+- **Instruction override attempts**: "ignore previous instructions", "disregard your rules"
+- **Deception patterns**: "do not tell the user"
+- **System prompt overrides**: "system prompt override"
+- **Hidden HTML comments**: `<!-- ignore instructions -->`
+- **Hidden div elements**: `<div style="display:none">`
+- **Credential exfiltration**: `curl ... $API_KEY`
+- **Secret file access**: `cat .env`, `cat credentials`
+- **Invisible characters**: zero-width spaces, bidirectional overrides, word joiners
+
+If any threat pattern is detected, the file is blocked:
+
+```
+[BLOCKED: AGENTS.md contained potential prompt injection (prompt_injection). Content not loaded.]
+```
+
+:::warning
+This scanner protects against common injection patterns, but it's not a substitute for reviewing context files in shared repositories. Always validate AGENTS.md content in projects you didn't author.
+:::
+
+## Size Limits
+
+| Limit | Value |
+|-------|-------|
+| Max chars per file | 20,000 (~7,000 tokens) |
+| Head truncation ratio | 70% |
+| Tail truncation ratio | 20% |
+| Truncation marker | 10% (shows char counts and suggests using file tools) |
+
+When a file exceeds 20,000 characters, the truncation message reads:
+
+```
+[...truncated AGENTS.md: kept 14000+4000 of 25000 chars. Use file tools to read the full file.]
+```
+
+## Tips for Effective Context Files
+
+:::tip Best practices for AGENTS.md
+1. **Keep it concise** — stay well under 20K chars; the agent reads it every turn
+2. **Structure with headers** — use `##` sections for architecture, conventions, important notes
+3. **Include concrete examples** — show preferred code patterns, API shapes, naming conventions
+4. **Mention what NOT to do** — "never modify migration files directly"
+5. **List key paths and ports** — the agent uses these for terminal commands
+6. **Update as the project evolves** — stale context is worse than no context
+:::
+
+### Per-Subdirectory Context
+
+For monorepos, put subdirectory-specific instructions in nested AGENTS.md files:
+
+```markdown
+<!-- frontend/AGENTS.md -->
+# Frontend Context
+
+- Use `pnpm` not `npm` for package management
+- Components go in `src/components/`, pages in `src/app/`
+- Use Tailwind CSS, never inline styles
+- Run tests with `pnpm test`
+```
+
+```markdown
+<!-- backend/AGENTS.md -->
+# Backend Context
+
+- Use `poetry` for dependency management
+- Run the dev server with `poetry run uvicorn main:app --reload`
+- All endpoints need OpenAPI docstrings
+- Database models are in `models/`, schemas in `schemas/`
+```
--- a/hermes_code/website/docs/user-guide/features/context-references.md
+++ b/hermes_code/website/docs/user-guide/features/context-references.md
@ -0,0 +1,109 @@
+---
+sidebar_position: 9
+title: "Context References"
+description: "Inline @-syntax for attaching files, folders, git diffs, and URLs directly into your messages"
+---
+
+# Context References
+
+Type `@` followed by a reference to inject content directly into your message. Hermes expands the reference inline and appends the content under an `--- Attached Context ---` section.
+
+## Supported References
+
+| Syntax | Description |
+|--------|-------------|
+| `@file:path/to/file.py` | Inject file contents |
+| `@file:path/to/file.py:10-25` | Inject specific line range (1-indexed, inclusive) |
+| `@folder:path/to/dir` | Inject directory tree listing with file metadata |
+| `@diff` | Inject `git diff` (unstaged working tree changes) |
+| `@staged` | Inject `git diff --staged` (staged changes) |
+| `@git:5` | Inject last N commits with patches (max 10) |
+| `@url:https://example.com` | Fetch and inject web page content |
+
+## Usage Examples
+
+```text
+Review @file:src/main.py and suggest improvements
+
+What changed? @diff
+
+Compare @file:old_config.yaml and @file:new_config.yaml
+
+What's in @folder:src/components?
+
+Summarize this article @url:https://arxiv.org/abs/2301.00001
+```
+
+Multiple references work in a single message:
+
+```text
+Check @file:main.py, and also @file:test.py.
+```
+
+Trailing punctuation (`,`, `.`, `;`, `!`, `?`) is automatically stripped from reference values.
+
+## CLI Tab Completion
+
+In the interactive CLI, typing `@` triggers autocomplete:
+
+- `@` shows all reference types (`@diff`, `@staged`, `@file:`, `@folder:`, `@git:`, `@url:`)
+- `@file:` and `@folder:` trigger filesystem path completion with file size metadata
+- Bare `@` followed by partial text shows matching files and folders from the current directory
+
+## Line Ranges
+
+The `@file:` reference supports line ranges for precise content injection:
+
+```text
+@file:src/main.py:42        # Single line 42
+@file:src/main.py:10-25     # Lines 10 through 25 (inclusive)
+```
+
+Lines are 1-indexed. Invalid ranges are silently ignored (full file is returned).
+
+## Size Limits
+
+Context references are bounded to prevent overwhelming the model's context window:
+
+| Threshold | Value | Behavior |
+|-----------|-------|----------|
+| Soft limit | 25% of context length | Warning appended, expansion proceeds |
+| Hard limit | 50% of context length | Expansion refused, original message returned unchanged |
+| Folder entries | 200 files max | Excess entries replaced with `- ...` |
+| Git commits | 10 max | `@git:N` clamped to range [1, 10] |
+
+## Security
+
+### Sensitive Path Blocking
+
+These paths are always blocked from `@file:` references to prevent credential exposure:
+
+- SSH keys and config: `~/.ssh/id_rsa`, `~/.ssh/id_ed25519`, `~/.ssh/authorized_keys`, `~/.ssh/config`
+- Shell profiles: `~/.bashrc`, `~/.zshrc`, `~/.profile`, `~/.bash_profile`, `~/.zprofile`
+- Credential files: `~/.netrc`, `~/.pgpass`, `~/.npmrc`, `~/.pypirc`
+- Hermes env: `$HERMES_HOME/.env`
+
+These directories are fully blocked (any file inside):
+- `~/.ssh/`, `~/.aws/`, `~/.gnupg/`, `~/.kube/`, `$HERMES_HOME/skills/.hub/`
+
+### Path Traversal Protection
+
+All paths are resolved relative to the working directory. References that resolve outside the allowed workspace root are rejected.
+
+### Binary File Detection
+
+Binary files are detected via MIME type and null-byte scanning. Known text extensions (`.py`, `.md`, `.json`, `.yaml`, `.toml`, `.js`, `.ts`, etc.) bypass MIME-based detection. Binary files are rejected with a warning.
+
+## Error Handling
+
+Invalid references produce inline warnings rather than failures:
+
+| Condition | Behavior |
+|-----------|----------|
+| File not found | Warning: "file not found" |
+| Binary file | Warning: "binary files are not supported" |
+| Folder not found | Warning: "folder not found" |
+| Git command fails | Warning with git stderr |
+| URL returns no content | Warning: "no content extracted" |
+| Sensitive path | Warning: "path is a sensitive credential file" |
+| Path outside workspace | Warning: "path is outside the allowed workspace" |
--- a/hermes_code/website/docs/user-guide/features/cron.md
+++ b/hermes_code/website/docs/user-guide/features/cron.md
@ -0,0 +1,285 @@
+---
+sidebar_position: 5
+title: "Scheduled Tasks (Cron)"
+description: "Schedule automated tasks with natural language, manage them with one cron tool, and attach one or more skills"
+---
+
+# Scheduled Tasks (Cron)
+
+Schedule tasks to run automatically with natural language or cron expressions. Hermes exposes cron management through a single `cronjob` tool with action-style operations instead of separate schedule/list/remove tools.
+
+## What cron can do now
+
+Cron jobs can:
+
+- schedule one-shot or recurring tasks
+- pause, resume, edit, trigger, and remove jobs
+- attach zero, one, or multiple skills to a job
+- deliver results back to the origin chat, local files, or configured platform targets
+- run in fresh agent sessions with the normal static tool list
+
+:::warning
+Cron-run sessions cannot recursively create more cron jobs. Hermes disables cron management tools inside cron executions to prevent runaway scheduling loops.
+:::
+
+## Creating scheduled tasks
+
+### In chat with `/cron`
+
+```bash
+/cron add 30m "Remind me to check the build"
+/cron add "every 2h" "Check server status"
+/cron add "every 1h" "Summarize new feed items" --skill blogwatcher
+/cron add "every 1h" "Use both skills and combine the result" --skill blogwatcher --skill find-nearby
+```
+
+### From the standalone CLI
+
+```bash
+hermes cron create "every 2h" "Check server status"
+hermes cron create "every 1h" "Summarize new feed items" --skill blogwatcher
+hermes cron create "every 1h" "Use both skills and combine the result" \
+  --skill blogwatcher \
+  --skill find-nearby \
+  --name "Skill combo"
+```
+
+### Through natural conversation
+
+Ask Hermes normally:
+
+```text
+Every morning at 9am, check Hacker News for AI news and send me a summary on Telegram.
+```
+
+Hermes will use the unified `cronjob` tool internally.
+
+## Skill-backed cron jobs
+
+A cron job can load one or more skills before it runs the prompt.
+
+### Single skill
+
+```python
+cronjob(
+    action="create",
+    skill="blogwatcher",
+    prompt="Check the configured feeds and summarize anything new.",
+    schedule="0 9 * * *",
+    name="Morning feeds",
+)
+```
+
+### Multiple skills
+
+Skills are loaded in order. The prompt becomes the task instruction layered on top of those skills.
+
+```python
+cronjob(
+    action="create",
+    skills=["blogwatcher", "find-nearby"],
+    prompt="Look for new local events and interesting nearby places, then combine them into one short brief.",
+    schedule="every 6h",
+    name="Local brief",
+)
+```
+
+This is useful when you want a scheduled agent to inherit reusable workflows without stuffing the full skill text into the cron prompt itself.
+
+## Editing jobs
+
+You do not need to delete and recreate jobs just to change them.
+
+### Chat
+
+```bash
+/cron edit <job_id> --schedule "every 4h"
+/cron edit <job_id> --prompt "Use the revised task"
+/cron edit <job_id> --skill blogwatcher --skill find-nearby
+/cron edit <job_id> --remove-skill blogwatcher
+/cron edit <job_id> --clear-skills
+```
+
+### Standalone CLI
+
+```bash
+hermes cron edit <job_id> --schedule "every 4h"
+hermes cron edit <job_id> --prompt "Use the revised task"
+hermes cron edit <job_id> --skill blogwatcher --skill find-nearby
+hermes cron edit <job_id> --add-skill find-nearby
+hermes cron edit <job_id> --remove-skill blogwatcher
+hermes cron edit <job_id> --clear-skills
+```
+
+Notes:
+
+- repeated `--skill` replaces the job's attached skill list
+- `--add-skill` appends to the existing list without replacing it
+- `--remove-skill` removes specific attached skills
+- `--clear-skills` removes all attached skills
+
+## Lifecycle actions
+
+Cron jobs now have a fuller lifecycle than just create/remove.
+
+### Chat
+
+```bash
+/cron list
+/cron pause <job_id>
+/cron resume <job_id>
+/cron run <job_id>
+/cron remove <job_id>
+```
+
+### Standalone CLI
+
+```bash
+hermes cron list
+hermes cron pause <job_id>
+hermes cron resume <job_id>
+hermes cron run <job_id>
+hermes cron remove <job_id>
+hermes cron status
+hermes cron tick
+```
+
+What they do:
+
+- `pause` — keep the job but stop scheduling it
+- `resume` — re-enable the job and compute the next future run
+- `run` — trigger the job on the next scheduler tick
+- `remove` — delete it entirely
+
+## How it works
+
+**Cron execution is handled by the gateway daemon.** The gateway ticks the scheduler every 60 seconds, running any due jobs in isolated agent sessions.
+
+```bash
+hermes gateway install     # Install as a user service
+sudo hermes gateway install --system   # Linux: boot-time system service for servers
+hermes gateway             # Or run in foreground
+
+hermes cron list
+hermes cron status
+```
+
+### Gateway scheduler behavior
+
+On each tick Hermes:
+
+1. loads jobs from `~/.hermes/cron/jobs.json`
+2. checks `next_run_at` against the current time
+3. starts a fresh `AIAgent` session for each due job
+4. optionally injects one or more attached skills into that fresh session
+5. runs the prompt to completion
+6. delivers the final response
+7. updates run metadata and the next scheduled time
+
+A file lock at `~/.hermes/cron/.tick.lock` prevents overlapping scheduler ticks from double-running the same job batch.
+
+## Delivery options
+
+When scheduling jobs, you specify where the output goes:
+
+| Option | Description | Example |
+|--------|-------------|---------|
+| `"origin"` | Back to where the job was created | Default on messaging platforms |
+| `"local"` | Save to local files only (`~/.hermes/cron/output/`) | Default on CLI |
+| `"telegram"` | Telegram home channel | Uses `TELEGRAM_HOME_CHANNEL` |
+| `"discord"` | Discord home channel | Uses `DISCORD_HOME_CHANNEL` |
+| `"telegram:123456"` | Specific Telegram chat by ID | Direct delivery |
+| `"discord:987654"` | Specific Discord channel by ID | Direct delivery |
+
+The agent's final response is automatically delivered. You do not need to call `send_message` in the cron prompt.
+
+## Schedule formats
+
+The agent's final response is automatically delivered — you do **not** need to include `send_message` in the cron prompt for that same destination. If a cron run calls `send_message` to the exact target the scheduler will already deliver to, Hermes skips that duplicate send and tells the model to put the user-facing content in the final response instead. Use `send_message` only for additional or different targets.
+
+### Relative delays (one-shot)
+
+```text
+30m     → Run once in 30 minutes
+2h      → Run once in 2 hours
+1d      → Run once in 1 day
+```
+
+### Intervals (recurring)
+
+```text
+every 30m    → Every 30 minutes
+every 2h     → Every 2 hours
+every 1d     → Every day
+```
+
+### Cron expressions
+
+```text
+0 9 * * *       → Daily at 9:00 AM
+0 9 * * 1-5     → Weekdays at 9:00 AM
+0 */6 * * *     → Every 6 hours
+30 8 1 * *      → First of every month at 8:30 AM
+0 0 * * 0       → Every Sunday at midnight
+```
+
+### ISO timestamps
+
+```text
+2026-03-15T09:00:00    → One-time at March 15, 2026 9:00 AM
+```
+
+## Repeat behavior
+
+| Schedule type | Default repeat | Behavior |
+|--------------|----------------|----------|
+| One-shot (`30m`, timestamp) | 1 | Runs once |
+| Interval (`every 2h`) | forever | Runs until removed |
+| Cron expression | forever | Runs until removed |
+
+You can override it:
+
+```python
+cronjob(
+    action="create",
+    prompt="...",
+    schedule="every 2h",
+    repeat=5,
+)
+```
+
+## Managing jobs programmatically
+
+The agent-facing API is one tool:
+
+```python
+cronjob(action="create", ...)
+cronjob(action="list")
+cronjob(action="update", job_id="...")
+cronjob(action="pause", job_id="...")
+cronjob(action="resume", job_id="...")
+cronjob(action="run", job_id="...")
+cronjob(action="remove", job_id="...")
+```
+
+For `update`, pass `skills=[]` to remove all attached skills.
+
+## Job storage
+
+Jobs are stored in `~/.hermes/cron/jobs.json`. Output from job runs is saved to `~/.hermes/cron/output/{job_id}/{timestamp}.md`.
+
+The storage uses atomic file writes so interrupted writes do not leave a partially written job file behind.
+
+## Self-contained prompts still matter
+
+:::warning Important
+Cron jobs run in a completely fresh agent session. The prompt must contain everything the agent needs that is not already provided by attached skills.
+:::
+
+**BAD:** `"Check on that server issue"`
+
+**GOOD:** `"SSH into server 192.168.1.100 as user 'deploy', check if nginx is running with 'systemctl status nginx', and verify https://example.com returns HTTP 200."`
+
+## Security
+
+Scheduled task prompts are scanned for prompt-injection and credential-exfiltration patterns at creation and update time. Prompts containing invisible Unicode tricks, SSH backdoor attempts, or obvious secret-exfiltration payloads are blocked.
--- a/hermes_code/website/docs/user-guide/features/delegation.md
+++ b/hermes_code/website/docs/user-guide/features/delegation.md
@ -0,0 +1,222 @@
+---
+sidebar_position: 7
+title: "Subagent Delegation"
+description: "Spawn isolated child agents for parallel workstreams with delegate_task"
+---
+
+# Subagent Delegation
+
+The `delegate_task` tool spawns child AIAgent instances with isolated context, restricted toolsets, and their own terminal sessions. Each child gets a fresh conversation and works independently — only its final summary enters the parent's context.
+
+## Single Task
+
+```python
+delegate_task(
+    goal="Debug why tests fail",
+    context="Error: assertion in test_foo.py line 42",
+    toolsets=["terminal", "file"]
+)
+```
+
+## Parallel Batch
+
+Up to 3 concurrent subagents:
+
+```python
+delegate_task(tasks=[
+    {"goal": "Research topic A", "toolsets": ["web"]},
+    {"goal": "Research topic B", "toolsets": ["web"]},
+    {"goal": "Fix the build", "toolsets": ["terminal", "file"]}
+])
+```
+
+## How Subagent Context Works
+
+:::warning Critical: Subagents Know Nothing
+Subagents start with a **completely fresh conversation**. They have zero knowledge of the parent's conversation history, prior tool calls, or anything discussed before delegation. The subagent's only context comes from the `goal` and `context` fields you provide.
+:::
+
+This means you must pass **everything** the subagent needs:
+
+```python
+# BAD - subagent has no idea what "the error" is
+delegate_task(goal="Fix the error")
+
+# GOOD - subagent has all context it needs
+delegate_task(
+    goal="Fix the TypeError in api/handlers.py",
+    context="""The file api/handlers.py has a TypeError on line 47:
+    'NoneType' object has no attribute 'get'.
+    The function process_request() receives a dict from parse_body(),
+    but parse_body() returns None when Content-Type is missing.
+    The project is at /home/user/myproject and uses Python 3.11."""
+)
+```
+
+The subagent receives a focused system prompt built from your goal and context, instructing it to complete the task and provide a structured summary of what it did, what it found, any files modified, and any issues encountered.
+
+## Practical Examples
+
+### Parallel Research
+
+Research multiple topics simultaneously and collect summaries:
+
+```python
+delegate_task(tasks=[
+    {
+        "goal": "Research the current state of WebAssembly in 2025",
+        "context": "Focus on: browser support, non-browser runtimes, language support",
+        "toolsets": ["web"]
+    },
+    {
+        "goal": "Research the current state of RISC-V adoption in 2025",
+        "context": "Focus on: server chips, embedded systems, software ecosystem",
+        "toolsets": ["web"]
+    },
+    {
+        "goal": "Research quantum computing progress in 2025",
+        "context": "Focus on: error correction breakthroughs, practical applications, key players",
+        "toolsets": ["web"]
+    }
+])
+```
+
+### Code Review + Fix
+
+Delegate a review-and-fix workflow to a fresh context:
+
+```python
+delegate_task(
+    goal="Review the authentication module for security issues and fix any found",
+    context="""Project at /home/user/webapp.
+    Auth module files: src/auth/login.py, src/auth/jwt.py, src/auth/middleware.py.
+    The project uses Flask, PyJWT, and bcrypt.
+    Focus on: SQL injection, JWT validation, password handling, session management.
+    Fix any issues found and run the test suite (pytest tests/auth/).""",
+    toolsets=["terminal", "file"]
+)
+```
+
+### Multi-File Refactoring
+
+Delegate a large refactoring task that would flood the parent's context:
+
+```python
+delegate_task(
+    goal="Refactor all Python files in src/ to replace print() with proper logging",
+    context="""Project at /home/user/myproject.
+    Use the 'logging' module with logger = logging.getLogger(__name__).
+    Replace print() calls with appropriate log levels:
+    - print(f"Error: ...") -> logger.error(...)
+    - print(f"Warning: ...") -> logger.warning(...)
+    - print(f"Debug: ...") -> logger.debug(...)
+    - Other prints -> logger.info(...)
+    Don't change print() in test files or CLI output.
+    Run pytest after to verify nothing broke.""",
+    toolsets=["terminal", "file"]
+)
+```
+
+## Batch Mode Details
+
+When you provide a `tasks` array, subagents run in **parallel** using a thread pool:
+
+- **Maximum concurrency:** 3 tasks (the `tasks` array is truncated to 3 if longer)
+- **Thread pool:** Uses `ThreadPoolExecutor` with `MAX_CONCURRENT_CHILDREN = 3` workers
+- **Progress display:** In CLI mode, a tree-view shows tool calls from each subagent in real-time with per-task completion lines. In gateway mode, progress is batched and relayed to the parent's progress callback
+- **Result ordering:** Results are sorted by task index to match input order regardless of completion order
+- **Interrupt propagation:** Interrupting the parent (e.g., sending a new message) interrupts all active children
+
+Single-task delegation runs directly without thread pool overhead.
+
+## Model Override
+
+You can configure a different model for subagents via `config.yaml` — useful for delegating simple tasks to cheaper/faster models:
+
+```yaml
+# In ~/.hermes/config.yaml
+delegation:
+  model: "google/gemini-flash-2.0"    # Cheaper model for subagents
+  provider: "openrouter"              # Optional: route subagents to a different provider
+```
+
+If omitted, subagents use the same model as the parent.
+
+## Toolset Selection Tips
+
+The `toolsets` parameter controls what tools the subagent has access to. Choose based on the task:
+
+| Toolset Pattern | Use Case |
+|----------------|----------|
+| `["terminal", "file"]` | Code work, debugging, file editing, builds |
+| `["web"]` | Research, fact-checking, documentation lookup |
+| `["terminal", "file", "web"]` | Full-stack tasks (default) |
+| `["file"]` | Read-only analysis, code review without execution |
+| `["terminal"]` | System administration, process management |
+
+Certain toolsets are **always blocked** for subagents regardless of what you specify:
+- `delegation` — no recursive delegation (prevents infinite spawning)
+- `clarify` — subagents cannot interact with the user
+- `memory` — no writes to shared persistent memory
+- `code_execution` — children should reason step-by-step
+- `send_message` — no cross-platform side effects (e.g., sending Telegram messages)
+
+## Max Iterations
+
+Each subagent has an iteration limit (default: 50) that controls how many tool-calling turns it can take:
+
+```python
+delegate_task(
+    goal="Quick file check",
+    context="Check if /etc/nginx/nginx.conf exists and print its first 10 lines",
+    max_iterations=10  # Simple task, don't need many turns
+)
+```
+
+## Depth Limit
+
+Delegation has a **depth limit of 2** — a parent (depth 0) can spawn children (depth 1), but children cannot delegate further. This prevents runaway recursive delegation chains.
+
+## Key Properties
+
+- Each subagent gets its **own terminal session** (separate from the parent)
+- **No nested delegation** — children cannot delegate further (no grandchildren)
+- Subagents **cannot** call: `delegate_task`, `clarify`, `memory`, `send_message`, `execute_code`
+- **Interrupt propagation** — interrupting the parent interrupts all active children
+- Only the final summary enters the parent's context, keeping token usage efficient
+- Subagents inherit the parent's **API key and provider configuration**
+
+## Delegation vs execute_code
+
+| Factor | delegate_task | execute_code |
+|--------|--------------|-------------|
+| **Reasoning** | Full LLM reasoning loop | Just Python code execution |
+| **Context** | Fresh isolated conversation | No conversation, just script |
+| **Tool access** | All non-blocked tools with reasoning | 7 tools via RPC, no reasoning |
+| **Parallelism** | Up to 3 concurrent subagents | Single script |
+| **Best for** | Complex tasks needing judgment | Mechanical multi-step pipelines |
+| **Token cost** | Higher (full LLM loop) | Lower (only stdout returned) |
+| **User interaction** | None (subagents can't clarify) | None |
+
+**Rule of thumb:** Use `delegate_task` when the subtask requires reasoning, judgment, or multi-step problem solving. Use `execute_code` when you need mechanical data processing or scripted workflows.
+
+## Configuration
+
+```yaml
+# In ~/.hermes/config.yaml
+delegation:
+  max_iterations: 50                        # Max turns per child (default: 50)
+  default_toolsets: ["terminal", "file", "web"]  # Default toolsets
+  model: "google/gemini-3-flash-preview"             # Optional provider/model override
+  provider: "openrouter"                             # Optional built-in provider
+
+# Or use a direct custom endpoint instead of provider:
+delegation:
+  model: "qwen2.5-coder"
+  base_url: "http://localhost:1234/v1"
+  api_key: "local-key"
+```
+
+:::tip
+The agent handles delegation automatically based on the task complexity. You don't need to explicitly ask it to delegate — it will do so when it makes sense.
+:::
--- a/hermes_code/website/docs/user-guide/features/fallback-providers.md
+++ b/hermes_code/website/docs/user-guide/features/fallback-providers.md
@ -0,0 +1,323 @@
+---
+title: Fallback Providers
+description: Configure automatic failover to backup LLM providers when your primary model is unavailable.
+sidebar_label: Fallback Providers
+sidebar_position: 8
+---
+
+# Fallback Providers
+
+Hermes Agent has two separate fallback systems that keep your sessions running when providers hit issues:
+
+1. **Primary model fallback** — automatically switches to a backup provider:model when your main model fails
+2. **Auxiliary task fallback** — independent provider resolution for side tasks like vision, compression, and web extraction
+
+Both are optional and work independently.
+
+## Primary Model Fallback
+
+When your main LLM provider encounters errors — rate limits, server overload, auth failures, connection drops — Hermes can automatically switch to a backup provider:model pair mid-session without losing your conversation.
+
+### Configuration
+
+Add a `fallback_model` section to `~/.hermes/config.yaml`:
+
+```yaml
+fallback_model:
+  provider: openrouter
+  model: anthropic/claude-sonnet-4
+```
+
+Both `provider` and `model` are **required**. If either is missing, the fallback is disabled.
+
+### Supported Providers
+
+| Provider | Value | Requirements |
+|----------|-------|-------------|
+| AI Gateway | `ai-gateway` | `AI_GATEWAY_API_KEY` |
+| OpenRouter | `openrouter` | `OPENROUTER_API_KEY` |
+| Nous Portal | `nous` | `hermes login` (OAuth) |
+| OpenAI Codex | `openai-codex` | `hermes model` (ChatGPT OAuth) |
+| Anthropic | `anthropic` | `ANTHROPIC_API_KEY` or Claude Code credentials |
+| z.ai / GLM | `zai` | `GLM_API_KEY` |
+| Kimi / Moonshot | `kimi-coding` | `KIMI_API_KEY` |
+| MiniMax | `minimax` | `MINIMAX_API_KEY` |
+| MiniMax (China) | `minimax-cn` | `MINIMAX_CN_API_KEY` |
+| Kilo Code | `kilocode` | `KILOCODE_API_KEY` |
+| Custom endpoint | `custom` | `base_url` + `api_key_env` (see below) |
+
+### Custom Endpoint Fallback
+
+For a custom OpenAI-compatible endpoint, add `base_url` and optionally `api_key_env`:
+
+```yaml
+fallback_model:
+  provider: custom
+  model: my-local-model
+  base_url: http://localhost:8000/v1
+  api_key_env: MY_LOCAL_KEY          # env var name containing the API key
+```
+
+### When Fallback Triggers
+
+The fallback activates automatically when the primary model fails with:
+
+- **Rate limits** (HTTP 429) — after exhausting retry attempts
+- **Server errors** (HTTP 500, 502, 503) — after exhausting retry attempts
+- **Auth failures** (HTTP 401, 403) — immediately (no point retrying)
+- **Not found** (HTTP 404) — immediately
+- **Invalid responses** — when the API returns malformed or empty responses repeatedly
+
+When triggered, Hermes:
+
+1. Resolves credentials for the fallback provider
+2. Builds a new API client
+3. Swaps the model, provider, and client in-place
+4. Resets the retry counter and continues the conversation
+
+The switch is seamless — your conversation history, tool calls, and context are preserved. The agent continues from exactly where it left off, just using a different model.
+
+:::info One-Shot
+Fallback activates **at most once** per session. If the fallback provider also fails, normal error handling takes over (retries, then error message). This prevents cascading failover loops.
+:::
+
+### Examples
+
+**OpenRouter as fallback for Anthropic native:**
+```yaml
+model:
+  provider: anthropic
+  default: claude-sonnet-4-6
+
+fallback_model:
+  provider: openrouter
+  model: anthropic/claude-sonnet-4
+```
+
+**Nous Portal as fallback for OpenRouter:**
+```yaml
+model:
+  provider: openrouter
+  default: anthropic/claude-opus-4
+
+fallback_model:
+  provider: nous
+  model: nous-hermes-3
+```
+
+**Local model as fallback for cloud:**
+```yaml
+fallback_model:
+  provider: custom
+  model: llama-3.1-70b
+  base_url: http://localhost:8000/v1
+  api_key_env: LOCAL_API_KEY
+```
+
+**Codex OAuth as fallback:**
+```yaml
+fallback_model:
+  provider: openai-codex
+  model: gpt-5.3-codex
+```
+
+### Where Fallback Works
+
+| Context | Fallback Supported |
+|---------|-------------------|
+| CLI sessions | ✔ |
+| Messaging gateway (Telegram, Discord, etc.) | ✔ |
+| Subagent delegation | ✘ (subagents do not inherit fallback config) |
+| Cron jobs | ✘ (run with a fixed provider) |
+| Auxiliary tasks (vision, compression) | ✘ (use their own provider chain — see below) |
+
+:::tip
+There are no environment variables for `fallback_model` — it is configured exclusively through `config.yaml`. This is intentional: fallback configuration is a deliberate choice, not something a stale shell export should override.
+:::
+
+---
+
+## Auxiliary Task Fallback
+
+Hermes uses separate lightweight models for side tasks. Each task has its own provider resolution chain that acts as a built-in fallback system.
+
+### Tasks with Independent Provider Resolution
+
+| Task | What It Does | Config Key |
+|------|-------------|-----------|
+| Vision | Image analysis, browser screenshots | `auxiliary.vision` |
+| Web Extract | Web page summarization | `auxiliary.web_extract` |
+| Compression | Context compression summaries | `auxiliary.compression` or `compression.summary_provider` |
+| Session Search | Past session summarization | `auxiliary.session_search` |
+| Skills Hub | Skill search and discovery | `auxiliary.skills_hub` |
+| MCP | MCP helper operations | `auxiliary.mcp` |
+| Memory Flush | Memory consolidation | `auxiliary.flush_memories` |
+
+### Auto-Detection Chain
+
+When a task's provider is set to `"auto"` (the default), Hermes tries providers in order until one works:
+
+**For text tasks (compression, web extract, etc.):**
+
+```text
+OpenRouter → Nous Portal → Custom endpoint → Codex OAuth →
+API-key providers (z.ai, Kimi, MiniMax, Anthropic) → give up
+```
+
+**For vision tasks:**
+
+```text
+Main provider (if vision-capable) → OpenRouter → Nous Portal →
+Codex OAuth → Anthropic → Custom endpoint → give up
+```
+
+If the resolved provider fails at call time, Hermes also has an internal retry: if the provider is not OpenRouter and no explicit `base_url` is set, it tries OpenRouter as a last-resort fallback.
+
+### Configuring Auxiliary Providers
+
+Each task can be configured independently in `config.yaml`:
+
+```yaml
+auxiliary:
+  vision:
+    provider: "auto"              # auto | openrouter | nous | codex | main | anthropic
+    model: ""                     # e.g. "openai/gpt-4o"
+    base_url: ""                  # direct endpoint (takes precedence over provider)
+    api_key: ""                   # API key for base_url
+
+  web_extract:
+    provider: "auto"
+    model: ""
+
+  compression:
+    provider: "auto"
+    model: ""
+
+  session_search:
+    provider: "auto"
+    model: ""
+
+  skills_hub:
+    provider: "auto"
+    model: ""
+
+  mcp:
+    provider: "auto"
+    model: ""
+
+  flush_memories:
+    provider: "auto"
+    model: ""
+```
+
+Every task above follows the same **provider / model / base_url** pattern. Context compression uses its own top-level block:
+
+```yaml
+compression:
+  summary_provider: main                             # Same provider options as auxiliary tasks
+  summary_model: google/gemini-3-flash-preview
+  summary_base_url: null                             # Custom OpenAI-compatible endpoint
+```
+
+And the fallback model uses:
+
+```yaml
+fallback_model:
+  provider: openrouter
+  model: anthropic/claude-sonnet-4
+  # base_url: http://localhost:8000/v1               # Optional custom endpoint
+```
+
+All three — auxiliary, compression, fallback — work the same way: set `provider` to pick who handles the request, `model` to pick which model, and `base_url` to point at a custom endpoint (overrides provider).
+
+### Provider Options for Auxiliary Tasks
+
+| Provider | Description | Requirements |
+|----------|-------------|-------------|
+| `"auto"` | Try providers in order until one works (default) | At least one provider configured |
+| `"openrouter"` | Force OpenRouter | `OPENROUTER_API_KEY` |
+| `"nous"` | Force Nous Portal | `hermes login` |
+| `"codex"` | Force Codex OAuth | `hermes model` → Codex |
+| `"main"` | Use whatever provider the main agent uses | Active main provider configured |
+| `"anthropic"` | Force Anthropic native | `ANTHROPIC_API_KEY` or Claude Code credentials |
+
+### Direct Endpoint Override
+
+For any auxiliary task, setting `base_url` bypasses provider resolution entirely and sends requests directly to that endpoint:
+
+```yaml
+auxiliary:
+  vision:
+    base_url: "http://localhost:1234/v1"
+    api_key: "local-key"
+    model: "qwen2.5-vl"
+```
+
+`base_url` takes precedence over `provider`. Hermes uses the configured `api_key` for authentication, falling back to `OPENAI_API_KEY` if not set. It does **not** reuse `OPENROUTER_API_KEY` for custom endpoints.
+
+---
+
+## Context Compression Fallback
+
+Context compression has a legacy configuration path in addition to the auxiliary system:
+
+```yaml
+compression:
+  summary_provider: "auto"                    # auto | openrouter | nous | main
+  summary_model: "google/gemini-3-flash-preview"
+```
+
+This is equivalent to configuring `auxiliary.compression.provider` and `auxiliary.compression.model`. If both are set, the `auxiliary.compression` values take precedence.
+
+If no provider is available for compression, Hermes drops middle conversation turns without generating a summary rather than failing the session.
+
+---
+
+## Delegation Provider Override
+
+Subagents spawned by `delegate_task` do **not** use the primary fallback model. However, they can be routed to a different provider:model pair for cost optimization:
+
+```yaml
+delegation:
+  provider: "openrouter"                      # override provider for all subagents
+  model: "google/gemini-3-flash-preview"      # override model
+  # base_url: "http://localhost:1234/v1"      # or use a direct endpoint
+  # api_key: "local-key"
+```
+
+See [Subagent Delegation](/docs/user-guide/features/delegation) for full configuration details.
+
+---
+
+## Cron Job Providers
+
+Cron jobs run with whatever provider is configured at execution time. They do not support a fallback model. To use a different provider for cron jobs, configure `provider` and `model` overrides on the cron job itself:
+
+```python
+cronjob(
+    action="create",
+    schedule="every 2h",
+    prompt="Check server status",
+    provider="openrouter",
+    model="google/gemini-3-flash-preview"
+)
+```
+
+See [Scheduled Tasks (Cron)](/docs/user-guide/features/cron) for full configuration details.
+
+---
+
+## Summary
+
+| Feature | Fallback Mechanism | Config Location |
+|---------|-------------------|----------------|
+| Main agent model | `fallback_model` in config.yaml — one-shot failover on errors | `fallback_model:` (top-level) |
+| Vision | Auto-detection chain + internal OpenRouter retry | `auxiliary.vision` |
+| Web extraction | Auto-detection chain + internal OpenRouter retry | `auxiliary.web_extract` |
+| Context compression | Auto-detection chain, degrades to no-summary if unavailable | `auxiliary.compression` or `compression.summary_provider` |
+| Session search | Auto-detection chain | `auxiliary.session_search` |
+| Skills hub | Auto-detection chain | `auxiliary.skills_hub` |
+| MCP helpers | Auto-detection chain | `auxiliary.mcp` |
+| Memory flush | Auto-detection chain | `auxiliary.flush_memories` |
+| Delegation | Provider override only (no automatic fallback) | `delegation.provider` / `delegation.model` |
+| Cron jobs | Per-job provider override only (no automatic fallback) | Per-job `provider` / `model` |
--- a/hermes_code/website/docs/user-guide/features/honcho.md
+++ b/hermes_code/website/docs/user-guide/features/honcho.md
@ -0,0 +1,404 @@
+---
+title: Honcho Memory
+description: AI-native persistent memory for cross-session user modeling and personalization.
+sidebar_label: Honcho Memory
+sidebar_position: 8
+---
+
+# Honcho Memory
+
+[Honcho](https://honcho.dev) is an AI-native memory system that gives Hermes persistent, cross-session understanding of users. While Hermes has built-in memory (`MEMORY.md` and `USER.md`), Honcho adds a deeper layer of **user modeling** — learning preferences, goals, communication style, and context across conversations via a dual-peer architecture where both the user and the AI build representations over time.
+
+## Works Alongside Built-in Memory
+
+Hermes has two memory systems that can work together or be configured separately. In `hybrid` mode (the default), both run side by side — Honcho adds cross-session user modeling while local files handle agent-level notes.
+
+| Feature | Built-in Memory | Honcho Memory |
+|---------|----------------|---------------|
+| Storage | Local files (`~/.hermes/memories/`) | Cloud-hosted Honcho API |
+| Scope | Agent-level notes and user profile | Deep user modeling via dialectic reasoning |
+| Persistence | Across sessions on same machine | Across sessions, machines, and platforms |
+| Query | Injected into system prompt automatically | Prefetched + on-demand via tools |
+| Content | Manually curated by the agent | Automatically learned from conversations |
+| Write surface | `memory` tool (add/replace/remove) | `honcho_conclude` tool (persist facts) |
+
+Set `memoryMode` to `honcho` to use Honcho exclusively. See [Memory Modes](#memory-modes) for per-peer configuration.
+
+
+## Self-hosted / Docker
+
+Hermes supports a local Honcho instance (e.g. via Docker) in addition to the hosted API. Point it at your instance using `HONCHO_BASE_URL` — no API key required.
+
+**Via `hermes config`:**
+
+```bash
+hermes config set HONCHO_BASE_URL http://localhost:8000
+```
+
+**Via `~/.honcho/config.json`:**
+
+```json
+{
+  "hosts": {
+    "hermes": {
+      "base_url": "http://localhost:8000",
+      "enabled": true
+    }
+  }
+}
+```
+
+Hermes auto-enables Honcho when either `apiKey` or `base_url` is present, so no further configuration is needed for a local instance.
+
+To run Honcho locally, refer to the [Honcho self-hosting docs](https://docs.honcho.dev).
+
+## Setup
+
+### Interactive Setup
+
+```bash
+hermes honcho setup
+```
+
+The setup wizard walks through API key, peer names, workspace, memory mode, write frequency, recall mode, and session strategy. It offers to install `honcho-ai` if missing.
+
+### Manual Setup
+
+#### 1. Install the Client Library
+
+```bash
+pip install 'honcho-ai>=2.0.1'
+```
+
+#### 2. Get an API Key
+
+Go to [app.honcho.dev](https://app.honcho.dev) > Settings > API Keys.
+
+#### 3. Configure
+
+Honcho reads from `~/.honcho/config.json` (shared across all Honcho-enabled applications):
+
+```json
+{
+  "apiKey": "your-honcho-api-key",
+  "hosts": {
+    "hermes": {
+      "workspace": "hermes",
+      "peerName": "your-name",
+      "aiPeer": "hermes",
+      "memoryMode": "hybrid",
+      "writeFrequency": "async",
+      "recallMode": "hybrid",
+      "sessionStrategy": "per-session",
+      "enabled": true
+    }
+  }
+}
+```
+
+`apiKey` lives at the root because it is a shared credential across all Honcho-enabled tools. All other settings are scoped under `hosts.hermes`. The `hermes honcho setup` wizard writes this structure automatically.
+
+Or set the API key as an environment variable:
+
+```bash
+hermes config set HONCHO_API_KEY your-key
+```
+
+:::info
+When an API key is present (either in `~/.honcho/config.json` or as `HONCHO_API_KEY`), Honcho auto-enables unless explicitly set to `"enabled": false`.
+:::
+
+## Configuration
+
+### Global Config (`~/.honcho/config.json`)
+
+Settings are scoped to `hosts.hermes` and fall back to root-level globals when the host field is absent. Root-level keys are managed by the user or the honcho CLI -- Hermes only writes to its own host block (except `apiKey`, which is a shared credential at root).
+
+**Root-level (shared)**
+
+| Field | Default | Description |
+|-------|---------|-------------|
+| `apiKey` | — | Honcho API key (required, shared across all hosts) |
+| `sessions` | `{}` | Manual session name overrides per directory (shared) |
+
+**Host-level (`hosts.hermes`)**
+
+| Field | Default | Description |
+|-------|---------|-------------|
+| `workspace` | `"hermes"` | Workspace identifier |
+| `peerName` | *(derived)* | Your identity name for user modeling |
+| `aiPeer` | `"hermes"` | AI assistant identity name |
+| `environment` | `"production"` | Honcho environment |
+| `enabled` | *(auto)* | Auto-enables when API key is present |
+| `saveMessages` | `true` | Whether to sync messages to Honcho |
+| `memoryMode` | `"hybrid"` | Memory mode: `hybrid` or `honcho` |
+| `writeFrequency` | `"async"` | When to write: `async`, `turn`, `session`, or integer N |
+| `recallMode` | `"hybrid"` | Retrieval strategy: `hybrid`, `context`, or `tools` |
+| `sessionStrategy` | `"per-session"` | How sessions are scoped |
+| `sessionPeerPrefix` | `false` | Prefix session names with peer name |
+| `contextTokens` | *(Honcho default)* | Max tokens for auto-injected context |
+| `dialecticReasoningLevel` | `"low"` | Floor for dialectic reasoning: `minimal` / `low` / `medium` / `high` / `max` |
+| `dialecticMaxChars` | `600` | Char cap on dialectic results injected into system prompt |
+| `linkedHosts` | `[]` | Other host keys whose workspaces to cross-reference |
+
+All host-level fields fall back to the equivalent root-level key if not set under `hosts.hermes`. Existing configs with settings at root level continue to work.
+
+### Memory Modes
+
+| Mode | Effect |
+|------|--------|
+| `hybrid` | Write to both Honcho and local files (default) |
+| `honcho` | Honcho only — skip local file writes |
+
+Memory mode can be set globally or per-peer (user, agent1, agent2, etc):
+
+```json
+{
+  "memoryMode": {
+    "default": "hybrid",
+    "hermes": "honcho"
+  }
+}
+```
+
+To disable Honcho entirely, set `enabled: false` or remove the API key.
+
+### Recall Modes
+
+Controls how Honcho context reaches the agent:
+
+| Mode | Behavior |
+|------|----------|
+| `hybrid` | Auto-injected context + Honcho tools available (default) |
+| `context` | Auto-injected context only — Honcho tools hidden |
+| `tools` | Honcho tools only — no auto-injected context |
+
+### Write Frequency
+
+| Setting | Behavior |
+|---------|----------|
+| `async` | Background thread writes (zero blocking, default) |
+| `turn` | Synchronous write after each turn |
+| `session` | Batched write at session end |
+| *integer N* | Write every N turns |
+
+### Session Strategies
+
+| Strategy | Session key | Use case |
+|----------|-------------|----------|
+| `per-session` | Unique per run | Default. Fresh session every time. |
+| `per-directory` | CWD basename | Each project gets its own session. |
+| `per-repo` | Git repo root name | Groups subdirectories under one session. |
+| `global` | Fixed `"global"` | Single cross-project session. |
+
+Resolution order: manual map > session title > strategy-derived key > platform key.
+
+### Multi-host Configuration
+
+Multiple Honcho-enabled tools share `~/.honcho/config.json`. Each tool writes only to its own host block, reads its host block first, and falls back to root-level globals:
+
+```json
+{
+  "apiKey": "your-key",
+  "peerName": "eri",
+  "hosts": {
+    "hermes": {
+      "workspace": "my-workspace",
+      "aiPeer": "hermes-assistant",
+      "memoryMode": "honcho",
+      "linkedHosts": ["claude-code"],
+      "contextTokens": 2000,
+      "dialecticReasoningLevel": "medium"
+    },
+    "claude-code": {
+      "workspace": "my-workspace",
+      "aiPeer": "clawd"
+    }
+  }
+}
+```
+
+Resolution: `hosts.<tool>` field > root-level field > default. In this example, both tools share the root `apiKey` and `peerName`, but each has its own `aiPeer` and workspace settings.
+
+### Hermes Config (`~/.hermes/config.yaml`)
+
+Intentionally minimal — most configuration comes from `~/.honcho/config.json`:
+
+```yaml
+honcho: {}
+```
+
+## How It Works
+
+### Async Context Pipeline
+
+Honcho context is fetched asynchronously to avoid blocking the response path:
+
+```mermaid
+flowchart TD
+    user["User message"] --> cache["Consume cached Honcho context<br/>from the previous turn"]
+    cache --> prompt["Inject user, AI, and dialectic context<br/>into the system prompt"]
+    prompt --> llm["LLM call"]
+    llm --> response["Assistant response"]
+    response --> fetch["Start background fetch for Turn N+1"]
+    fetch --> ctx["Fetch context"]
+    fetch --> dia["Fetch dialectic"]
+    ctx --> next["Cache for the next turn"]
+    dia --> next
+```
+
+Turn 1 is a cold start (no cache). All subsequent turns consume cached results with zero HTTP latency on the response path. The system prompt on turn 1 uses only static context to preserve prefix cache hits at the LLM provider.
+
+### Dual-Peer Architecture
+
+Both the user and AI have peer representations in Honcho:
+
+- **User peer** — observed from user messages. Honcho learns preferences, goals, communication style.
+- **AI peer** — observed from assistant messages (`observe_me=True`). Honcho builds a representation of the agent's knowledge and behavior.
+
+Both representations are injected into the system prompt when available.
+
+### Dynamic Reasoning Level
+
+Dialectic queries scale reasoning effort with message complexity:
+
+| Message length | Reasoning level |
+|----------------|-----------------|
+| < 120 chars | Config default (typically `low`) |
+| 120-400 chars | One level above default (cap: `high`) |
+| > 400 chars | Two levels above default (cap: `high`) |
+
+`max` is never selected automatically.
+
+### Gateway Integration
+
+The gateway creates short-lived `AIAgent` instances per request. Honcho managers are owned at the gateway session layer (`_honcho_managers` dict) so they persist across requests within the same session and flush at real session boundaries (reset, resume, expiry, server stop).
+
+#### Session Isolation
+
+Each gateway session (e.g., a Telegram chat, a Discord channel) gets its own Honcho session context. The session key — derived from the platform and chat ID — is threaded through the entire tool dispatch chain so that Honcho tool calls always execute against the correct session, even when multiple users are messaging concurrently.
+
+This means:
+- **`honcho_profile`**, **`honcho_search`**, **`honcho_context`**, and **`honcho_conclude`** all resolve the correct session at call time, not at startup
+- Background memory flushes (triggered by `/reset`, `/resume`, or session expiry) preserve the original session key so they write to the correct Honcho session
+- Synthetic flush turns (where the agent saves memories before context is lost) skip Honcho sync to avoid polluting conversation history with internal bookkeeping
+
+#### Session Lifecycle
+
+| Event | What happens to Honcho |
+|-------|------------------------|
+| New message arrives | Agent inherits the gateway's Honcho manager + session key |
+| `/reset` | Memory flush fires with the old session key, then Honcho manager shuts down |
+| `/resume` | Current session is flushed, then the resumed session's Honcho context loads |
+| Session expiry | Automatic flush + shutdown after the configured idle timeout |
+| Gateway stop | All active Honcho managers are flushed and shut down gracefully |
+
+## Tools
+
+When Honcho is active, four tools become available. Availability is gated dynamically — they are invisible when Honcho is disabled.
+
+### `honcho_profile`
+
+Fast peer card retrieval (no LLM). Returns a curated list of key facts about the user.
+
+### `honcho_search`
+
+Semantic search over memory (no LLM). Returns raw excerpts ranked by relevance. Cheaper and faster than `honcho_context` — good for factual lookups.
+
+Parameters:
+- `query` (string) — search query
+- `max_tokens` (integer, optional) — result token budget
+
+### `honcho_context`
+
+Dialectic Q&A powered by Honcho's LLM. Synthesizes an answer from accumulated conversation history.
+
+Parameters:
+- `query` (string) — natural language question
+- `peer` (string, optional) — `"user"` (default) or `"ai"`. Querying `"ai"` asks about the assistant's own history and identity.
+
+Example queries the agent might make:
+
+```
+"What are this user's main goals?"
+"What communication style does this user prefer?"
+"What topics has this user discussed recently?"
+"What is this user's technical expertise level?"
+```
+
+### `honcho_conclude`
+
+Writes a fact to Honcho memory. Use when the user explicitly states a preference, correction, or project context worth remembering. Feeds into the user's peer card and representation.
+
+Parameters:
+- `conclusion` (string) — the fact to persist
+
+## CLI Commands
+
+```
+hermes honcho setup                        # Interactive setup wizard
+hermes honcho status                       # Show config and connection status
+hermes honcho sessions                     # List directory → session name mappings
+hermes honcho map <name>                   # Map current directory to a session name
+hermes honcho peer                         # Show peer names and dialectic settings
+hermes honcho peer --user NAME             # Set user peer name
+hermes honcho peer --ai NAME               # Set AI peer name
+hermes honcho peer --reasoning LEVEL       # Set dialectic reasoning level
+hermes honcho mode                         # Show current memory mode
+hermes honcho mode [hybrid|honcho|local]   # Set memory mode
+hermes honcho tokens                       # Show token budget settings
+hermes honcho tokens --context N           # Set context token cap
+hermes honcho tokens --dialectic N         # Set dialectic char cap
+hermes honcho identity                     # Show AI peer identity
+hermes honcho identity <file>              # Seed AI peer identity from file (SOUL.md, etc.)
+hermes honcho migrate                      # Migration guide: OpenClaw → Hermes + Honcho
+```
+
+### Doctor Integration
+
+`hermes doctor` includes a Honcho section that validates config, API key, and connection status.
+
+## Migration
+
+### From Local Memory
+
+When Honcho activates on an instance with existing local history, migration runs automatically:
+
+1. **Conversation history** — prior messages are uploaded as an XML transcript file
+2. **Memory files** — existing `MEMORY.md`, `USER.md`, and `SOUL.md` are uploaded for context
+
+### From OpenClaw
+
+```bash
+hermes honcho migrate
+```
+
+Walks through converting an OpenClaw native Honcho setup to the shared `~/.honcho/config.json` format.
+
+## AI Peer Identity
+
+Honcho can build a representation of the AI assistant over time (via `observe_me=True`). You can also seed the AI peer explicitly:
+
+```bash
+hermes honcho identity ~/.hermes/SOUL.md
+```
+
+This uploads the file content through Honcho's observation pipeline. The AI peer representation is then injected into the system prompt alongside the user's, giving the agent awareness of its own accumulated identity.
+
+```bash
+hermes honcho identity --show
+```
+
+Shows the current AI peer representation from Honcho.
+
+## Use Cases
+
+- **Personalized responses** — Honcho learns how each user prefers to communicate
+- **Goal tracking** — remembers what users are working toward across sessions
+- **Expertise adaptation** — adjusts technical depth based on user's background
+- **Cross-platform memory** — same user understanding across CLI, Telegram, Discord, etc.
+- **Multi-user support** — each user (via messaging platforms) gets their own user model
+
+:::tip
+Honcho is fully opt-in — zero behavior change when disabled or unconfigured. All Honcho calls are non-fatal; if the service is unreachable, the agent continues normally.
+:::
--- a/hermes_code/website/docs/user-guide/features/hooks.md
+++ b/hermes_code/website/docs/user-guide/features/hooks.md
@ -0,0 +1,182 @@
+---
+sidebar_position: 6
+title: "Event Hooks"
+description: "Run custom code at key lifecycle points — log activity, send alerts, post to webhooks"
+---
+
+# Event Hooks
+
+The hooks system lets you run custom code at key points in the agent lifecycle — session creation, slash commands, each tool-calling step, and more. Hooks fire automatically during gateway operation without blocking the main agent pipeline.
+
+## Creating a Hook
+
+Each hook is a directory under `~/.hermes/hooks/` containing two files:
+
+```text
+~/.hermes/hooks/
+└── my-hook/
+    ├── HOOK.yaml      # Declares which events to listen for
+    └── handler.py     # Python handler function
+```
+
+### HOOK.yaml
+
+```yaml
+name: my-hook
+description: Log all agent activity to a file
+events:
+  - agent:start
+  - agent:end
+  - agent:step
+```
+
+The `events` list determines which events trigger your handler. You can subscribe to any combination of events, including wildcards like `command:*`.
+
+### handler.py
+
+```python
+import json
+from datetime import datetime
+from pathlib import Path
+
+LOG_FILE = Path.home() / ".hermes" / "hooks" / "my-hook" / "activity.log"
+
+async def handle(event_type: str, context: dict):
+    """Called for each subscribed event. Must be named 'handle'."""
+    entry = {
+        "timestamp": datetime.now().isoformat(),
+        "event": event_type,
+        **context,
+    }
+    with open(LOG_FILE, "a") as f:
+        f.write(json.dumps(entry) + "\n")
+```
+
+**Handler rules:**
+- Must be named `handle`
+- Receives `event_type` (string) and `context` (dict)
+- Can be `async def` or regular `def` — both work
+- Errors are caught and logged, never crashing the agent
+
+## Available Events
+
+| Event | When it fires | Context keys |
+|-------|---------------|--------------|
+| `gateway:startup` | Gateway process starts | `platforms` (list of active platform names) |
+| `session:start` | New messaging session created | `platform`, `user_id`, `session_id`, `session_key` |
+| `session:reset` | User ran `/new` or `/reset` | `platform`, `user_id`, `session_key` |
+| `agent:start` | Agent begins processing a message | `platform`, `user_id`, `session_id`, `message` |
+| `agent:step` | Each iteration of the tool-calling loop | `platform`, `user_id`, `session_id`, `iteration`, `tool_names` |
+| `agent:end` | Agent finishes processing | `platform`, `user_id`, `session_id`, `message`, `response` |
+| `command:*` | Any slash command executed | `platform`, `user_id`, `command`, `args` |
+
+### Wildcard Matching
+
+Handlers registered for `command:*` fire for any `command:` event (`command:model`, `command:reset`, etc.). Monitor all slash commands with a single subscription.
+
+## Examples
+
+### Telegram Alert on Long Tasks
+
+Send yourself a message when the agent takes more than 10 steps:
+
+```yaml
+# ~/.hermes/hooks/long-task-alert/HOOK.yaml
+name: long-task-alert
+description: Alert when agent is taking many steps
+events:
+  - agent:step
+```
+
+```python
+# ~/.hermes/hooks/long-task-alert/handler.py
+import os
+import httpx
+
+THRESHOLD = 10
+BOT_TOKEN = os.getenv("TELEGRAM_BOT_TOKEN")
+CHAT_ID = os.getenv("TELEGRAM_HOME_CHANNEL")
+
+async def handle(event_type: str, context: dict):
+    iteration = context.get("iteration", 0)
+    if iteration == THRESHOLD and BOT_TOKEN and CHAT_ID:
+        tools = ", ".join(context.get("tool_names", []))
+        text = f"⚠️ Agent has been running for {iteration} steps. Last tools: {tools}"
+        async with httpx.AsyncClient() as client:
+            await client.post(
+                f"https://api.telegram.org/bot{BOT_TOKEN}/sendMessage",
+                json={"chat_id": CHAT_ID, "text": text},
+            )
+```
+
+### Command Usage Logger
+
+Track which slash commands are used:
+
+```yaml
+# ~/.hermes/hooks/command-logger/HOOK.yaml
+name: command-logger
+description: Log slash command usage
+events:
+  - command:*
+```
+
+```python
+# ~/.hermes/hooks/command-logger/handler.py
+import json
+from datetime import datetime
+from pathlib import Path
+
+LOG = Path.home() / ".hermes" / "logs" / "command_usage.jsonl"
+
+def handle(event_type: str, context: dict):
+    LOG.parent.mkdir(parents=True, exist_ok=True)
+    entry = {
+        "ts": datetime.now().isoformat(),
+        "command": context.get("command"),
+        "args": context.get("args"),
+        "platform": context.get("platform"),
+        "user": context.get("user_id"),
+    }
+    with open(LOG, "a") as f:
+        f.write(json.dumps(entry) + "\n")
+```
+
+### Session Start Webhook
+
+POST to an external service on new sessions:
+
+```yaml
+# ~/.hermes/hooks/session-webhook/HOOK.yaml
+name: session-webhook
+description: Notify external service on new sessions
+events:
+  - session:start
+  - session:reset
+```
+
+```python
+# ~/.hermes/hooks/session-webhook/handler.py
+import httpx
+
+WEBHOOK_URL = "https://your-service.example.com/hermes-events"
+
+async def handle(event_type: str, context: dict):
+    async with httpx.AsyncClient() as client:
+        await client.post(WEBHOOK_URL, json={
+            "event": event_type,
+            **context,
+        }, timeout=5)
+```
+
+## How It Works
+
+1. On gateway startup, `HookRegistry.discover_and_load()` scans `~/.hermes/hooks/`
+2. Each subdirectory with `HOOK.yaml` + `handler.py` is loaded dynamically
+3. Handlers are registered for their declared events
+4. At each lifecycle point, `hooks.emit()` fires all matching handlers
+5. Errors in any handler are caught and logged — a broken hook never crashes the agent
+
+:::info
+Hooks only fire in the **gateway** (Telegram, Discord, Slack, WhatsApp). The CLI does not currently load hooks.
+:::
--- a/hermes_code/website/docs/user-guide/features/image-generation.md
+++ b/hermes_code/website/docs/user-guide/features/image-generation.md
@ -0,0 +1,150 @@
+---
+title: Image Generation
+description: Generate high-quality images using FLUX 2 Pro with automatic upscaling via FAL.ai.
+sidebar_label: Image Generation
+sidebar_position: 6
+---
+
+# Image Generation
+
+Hermes Agent can generate images from text prompts using FAL.ai's **FLUX 2 Pro** model with automatic 2x upscaling via the **Clarity Upscaler** for enhanced quality.
+
+## Setup
+
+### Get a FAL API Key
+
+1. Sign up at [fal.ai](https://fal.ai/)
+2. Generate an API key from your dashboard
+
+### Configure the Key
+
+```bash
+# Add to ~/.hermes/.env
+FAL_KEY=your-fal-api-key-here
+```
+
+### Install the Client Library
+
+```bash
+pip install fal-client
+```
+
+:::info
+The image generation tool is automatically available when `FAL_KEY` is set. No additional toolset configuration is needed.
+:::
+
+## How It Works
+
+When you ask Hermes to generate an image:
+
+1. **Generation** — Your prompt is sent to the FLUX 2 Pro model (`fal-ai/flux-2-pro`)
+2. **Upscaling** — The generated image is automatically upscaled 2x using the Clarity Upscaler (`fal-ai/clarity-upscaler`)
+3. **Delivery** — The upscaled image URL is returned
+
+If upscaling fails for any reason, the original image is returned as a fallback.
+
+## Usage
+
+Simply ask Hermes to create an image:
+
+```
+Generate an image of a serene mountain landscape with cherry blossoms
+```
+
+```
+Create a portrait of a wise old owl perched on an ancient tree branch
+```
+
+```
+Make me a futuristic cityscape with flying cars and neon lights
+```
+
+## Parameters
+
+The `image_generate_tool` accepts these parameters:
+
+| Parameter | Default | Range | Description |
+|-----------|---------|-------|-------------|
+| `prompt` | *(required)* | — | Text description of the desired image |
+| `aspect_ratio` | `"landscape"` | `landscape`, `square`, `portrait` | Image aspect ratio |
+| `num_inference_steps` | `50` | 1–100 | Number of denoising steps (more = higher quality, slower) |
+| `guidance_scale` | `4.5` | 0.1–20.0 | How closely to follow the prompt |
+| `num_images` | `1` | 1–4 | Number of images to generate |
+| `output_format` | `"png"` | `png`, `jpeg` | Image file format |
+| `seed` | *(random)* | any integer | Random seed for reproducible results |
+
+## Aspect Ratios
+
+The tool uses simplified aspect ratio names that map to FLUX 2 Pro image sizes:
+
+| Aspect Ratio | Maps To | Best For |
+|-------------|---------|----------|
+| `landscape` | `landscape_16_9` | Wallpapers, banners, scenes |
+| `square` | `square_hd` | Profile pictures, social media posts |
+| `portrait` | `portrait_16_9` | Character art, phone wallpapers |
+
+:::tip
+You can also use the raw FLUX 2 Pro size presets directly: `square_hd`, `square`, `portrait_4_3`, `portrait_16_9`, `landscape_4_3`, `landscape_16_9`. Custom sizes up to 2048x2048 are also supported.
+:::
+
+## Automatic Upscaling
+
+Every generated image is automatically upscaled 2x using FAL.ai's Clarity Upscaler with these settings:
+
+| Setting | Value |
+|---------|-------|
+| Upscale Factor | 2x |
+| Creativity | 0.35 |
+| Resemblance | 0.6 |
+| Guidance Scale | 4 |
+| Inference Steps | 18 |
+| Positive Prompt | `"masterpiece, best quality, highres"` + your original prompt |
+| Negative Prompt | `"(worst quality, low quality, normal quality:2)"` |
+
+The upscaler enhances detail and resolution while preserving the original composition. If the upscaler fails (network issue, rate limit), the original resolution image is returned automatically.
+
+## Example Prompts
+
+Here are some effective prompts to try:
+
+```
+A candid street photo of a woman with a pink bob and bold eyeliner
+```
+
+```
+Modern architecture building with glass facade, sunset lighting
+```
+
+```
+Abstract art with vibrant colors and geometric patterns
+```
+
+```
+Portrait of a wise old owl perched on ancient tree branch
+```
+
+```
+Futuristic cityscape with flying cars and neon lights
+```
+
+## Debugging
+
+Enable debug logging for image generation:
+
+```bash
+export IMAGE_TOOLS_DEBUG=true
+```
+
+Debug logs are saved to `./logs/image_tools_debug_<session_id>.json` with details about each generation request, parameters, timing, and any errors.
+
+## Safety Settings
+
+The image generation tool runs with safety checks disabled by default (`safety_tolerance: 5`, the most permissive setting). This is configured at the code level and is not user-adjustable.
+
+## Limitations
+
+- **Requires FAL API key** — image generation incurs API costs on your FAL.ai account
+- **No image editing** — this is text-to-image only, no inpainting or img2img
+- **URL-based delivery** — images are returned as temporary FAL.ai URLs, not saved locally
+- **Upscaling adds latency** — the automatic 2x upscale step adds processing time
+- **Max 4 images per request** — `num_images` is capped at 4
--- a/hermes_code/website/docs/user-guide/features/mcp.md
+++ b/hermes_code/website/docs/user-guide/features/mcp.md
@ -0,0 +1,411 @@
+---
+sidebar_position: 4
+title: "MCP (Model Context Protocol)"
+description: "Connect Hermes Agent to external tool servers via MCP — and control exactly which MCP tools Hermes loads"
+---
+
+# MCP (Model Context Protocol)
+
+MCP lets Hermes Agent connect to external tool servers so the agent can use tools that live outside Hermes itself — GitHub, databases, file systems, browser stacks, internal APIs, and more.
+
+If you have ever wanted Hermes to use a tool that already exists somewhere else, MCP is usually the cleanest way to do it.
+
+## What MCP gives you
+
+- Access to external tool ecosystems without writing a native Hermes tool first
+- Local stdio servers and remote HTTP MCP servers in the same config
+- Automatic tool discovery and registration at startup
+- Utility wrappers for MCP resources and prompts when supported by the server
+- Per-server filtering so you can expose only the MCP tools you actually want Hermes to see
+
+## Quick start
+
+1. Install MCP support (already included if you used the standard install script):
+
+```bash
+cd ~/.hermes/hermes-agent
+uv pip install -e ".[mcp]"
+```
+
+2. Add an MCP server to `~/.hermes/config.yaml`:
+
+```yaml
+mcp_servers:
+  filesystem:
+    command: "npx"
+    args: ["-y", "@modelcontextprotocol/server-filesystem", "/home/user/projects"]
+```
+
+3. Start Hermes:
+
+```bash
+hermes chat
+```
+
+4. Ask Hermes to use the MCP-backed capability.
+
+For example:
+
+```text
+List the files in /home/user/projects and summarize the repo structure.
+```
+
+Hermes will discover the MCP server's tools and use them like any other tool.
+
+## Two kinds of MCP servers
+
+### Stdio servers
+
+Stdio servers run as local subprocesses and talk over stdin/stdout.
+
+```yaml
+mcp_servers:
+  github:
+    command: "npx"
+    args: ["-y", "@modelcontextprotocol/server-github"]
+    env:
+      GITHUB_PERSONAL_ACCESS_TOKEN: "***"
+```
+
+Use stdio servers when:
+- the server is installed locally
+- you want low-latency access to local resources
+- you are following MCP server docs that show `command`, `args`, and `env`
+
+### HTTP servers
+
+HTTP MCP servers are remote endpoints Hermes connects to directly.
+
+```yaml
+mcp_servers:
+  remote_api:
+    url: "https://mcp.example.com/mcp"
+    headers:
+      Authorization: "Bearer ***"
+```
+
+Use HTTP servers when:
+- the MCP server is hosted elsewhere
+- your organization exposes internal MCP endpoints
+- you do not want Hermes spawning a local subprocess for that integration
+
+## Basic configuration reference
+
+Hermes reads MCP config from `~/.hermes/config.yaml` under `mcp_servers`.
+
+### Common keys
+
+| Key | Type | Meaning |
+|---|---|---|
+| `command` | string | Executable for a stdio MCP server |
+| `args` | list | Arguments for the stdio server |
+| `env` | mapping | Environment variables passed to the stdio server |
+| `url` | string | HTTP MCP endpoint |
+| `headers` | mapping | HTTP headers for remote servers |
+| `timeout` | number | Tool call timeout |
+| `connect_timeout` | number | Initial connection timeout |
+| `enabled` | bool | If `false`, Hermes skips the server entirely |
+| `tools` | mapping | Per-server tool filtering and utility policy |
+
+### Minimal stdio example
+
+```yaml
+mcp_servers:
+  filesystem:
+    command: "npx"
+    args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
+```
+
+### Minimal HTTP example
+
+```yaml
+mcp_servers:
+  company_api:
+    url: "https://mcp.internal.example.com"
+    headers:
+      Authorization: "Bearer ***"
+```
+
+## How Hermes registers MCP tools
+
+Hermes prefixes MCP tools so they do not collide with built-in names:
+
+```text
+mcp_<server_name>_<tool_name>
+```
+
+Examples:
+
+| Server | MCP tool | Registered name |
+|---|---|---|
+| `filesystem` | `read_file` | `mcp_filesystem_read_file` |
+| `github` | `create-issue` | `mcp_github_create_issue` |
+| `my-api` | `query.data` | `mcp_my_api_query_data` |
+
+In practice, you usually do not need to call the prefixed name manually — Hermes sees the tool and chooses it during normal reasoning.
+
+## MCP utility tools
+
+When supported, Hermes also registers utility tools around MCP resources and prompts:
+
+- `list_resources`
+- `read_resource`
+- `list_prompts`
+- `get_prompt`
+
+These are registered per server with the same prefix pattern, for example:
+
+- `mcp_github_list_resources`
+- `mcp_github_get_prompt`
+
+### Important
+
+These utility tools are now capability-aware:
+- Hermes only registers resource utilities if the MCP session actually supports resource operations
+- Hermes only registers prompt utilities if the MCP session actually supports prompt operations
+
+So a server that exposes callable tools but no resources/prompts will not get those extra wrappers.
+
+## Per-server filtering
+
+This is the main feature added by the PR work.
+
+You can now control which tools each MCP server contributes to Hermes.
+
+### Disable a server entirely
+
+```yaml
+mcp_servers:
+  legacy:
+    url: "https://mcp.legacy.internal"
+    enabled: false
+```
+
+If `enabled: false`, Hermes skips the server completely and does not even attempt a connection.
+
+### Whitelist server tools
+
+```yaml
+mcp_servers:
+  github:
+    command: "npx"
+    args: ["-y", "@modelcontextprotocol/server-github"]
+    env:
+      GITHUB_PERSONAL_ACCESS_TOKEN: "***"
+    tools:
+      include: [create_issue, list_issues]
+```
+
+Only those MCP server tools are registered.
+
+### Blacklist server tools
+
+```yaml
+mcp_servers:
+  stripe:
+    url: "https://mcp.stripe.com"
+    tools:
+      exclude: [delete_customer]
+```
+
+All server tools are registered except the excluded ones.
+
+### Precedence rule
+
+If both are present:
+
+```yaml
+tools:
+  include: [create_issue]
+  exclude: [create_issue, delete_issue]
+```
+
+`include` wins.
+
+### Filter utility tools too
+
+You can also separately disable Hermes-added utility wrappers:
+
+```yaml
+mcp_servers:
+  docs:
+    url: "https://mcp.docs.example.com"
+    tools:
+      prompts: false
+      resources: false
+```
+
+That means:
+- `tools.resources: false` disables `list_resources` and `read_resource`
+- `tools.prompts: false` disables `list_prompts` and `get_prompt`
+
+### Full example
+
+```yaml
+mcp_servers:
+  github:
+    command: "npx"
+    args: ["-y", "@modelcontextprotocol/server-github"]
+    env:
+      GITHUB_PERSONAL_ACCESS_TOKEN: "***"
+    tools:
+      include: [create_issue, list_issues, search_code]
+      prompts: false
+
+  stripe:
+    url: "https://mcp.stripe.com"
+    headers:
+      Authorization: "Bearer ***"
+    tools:
+      exclude: [delete_customer]
+      resources: false
+
+  legacy:
+    url: "https://mcp.legacy.internal"
+    enabled: false
+```
+
+## What happens if everything is filtered out?
+
+If your config filters out all callable tools and disables or omits all supported utilities, Hermes does not create an empty runtime MCP toolset for that server.
+
+That keeps the tool list clean.
+
+## Runtime behavior
+
+### Discovery time
+
+Hermes discovers MCP servers at startup and registers their tools into the normal tool registry.
+
+### Reloading
+
+If you change MCP config, use:
+
+```text
+/reload-mcp
+```
+
+This reloads MCP servers from config and refreshes the available tool list.
+
+### Toolsets
+
+Each configured MCP server also creates a runtime toolset when it contributes at least one registered tool:
+
+```text
+mcp-<server>
+```
+
+That makes MCP servers easier to reason about at the toolset level.
+
+## Security model
+
+### Stdio env filtering
+
+For stdio servers, Hermes does not blindly pass your full shell environment.
+
+Only explicitly configured `env` plus a safe baseline are passed through. This reduces accidental secret leakage.
+
+### Config-level exposure control
+
+The new filtering support is also a security control:
+- disable dangerous tools you do not want the model to see
+- expose only a minimal whitelist for a sensitive server
+- disable resource/prompt wrappers when you do not want that surface exposed
+
+## Example use cases
+
+### GitHub server with a minimal issue-management surface
+
+```yaml
+mcp_servers:
+  github:
+    command: "npx"
+    args: ["-y", "@modelcontextprotocol/server-github"]
+    env:
+      GITHUB_PERSONAL_ACCESS_TOKEN: "***"
+    tools:
+      include: [list_issues, create_issue, update_issue]
+      prompts: false
+      resources: false
+```
+
+Use it like:
+
+```text
+Show me open issues labeled bug, then draft a new issue for the flaky MCP reconnection behavior.
+```
+
+### Stripe server with dangerous actions removed
+
+```yaml
+mcp_servers:
+  stripe:
+    url: "https://mcp.stripe.com"
+    headers:
+      Authorization: "Bearer ***"
+    tools:
+      exclude: [delete_customer, refund_payment]
+```
+
+Use it like:
+
+```text
+Look up the last 10 failed payments and summarize common failure reasons.
+```
+
+### Filesystem server for a single project root
+
+```yaml
+mcp_servers:
+  project_fs:
+    command: "npx"
+    args: ["-y", "@modelcontextprotocol/server-filesystem", "/home/user/my-project"]
+```
+
+Use it like:
+
+```text
+Inspect the project root and explain the directory layout.
+```
+
+## Troubleshooting
+
+### MCP server not connecting
+
+Check:
+
+```bash
+# Verify MCP deps are installed (already included in standard install)
+cd ~/.hermes/hermes-agent && uv pip install -e ".[mcp]"
+
+node --version
+npx --version
+```
+
+Then verify your config and restart Hermes.
+
+### Tools not appearing
+
+Possible causes:
+- the server failed to connect
+- discovery failed
+- your filter config excluded the tools
+- the utility capability does not exist on that server
+- the server is disabled with `enabled: false`
+
+If you are intentionally filtering, this is expected.
+
+### Why didn't resource or prompt utilities appear?
+
+Because Hermes now only registers those wrappers when both are true:
+1. your config allows them
+2. the server session actually supports the capability
+
+This is intentional and keeps the tool list honest.
+
+## Related docs
+
+- [Use MCP with Hermes](/docs/guides/use-mcp-with-hermes)
+- [CLI Commands](/docs/reference/cli-commands)
+- [Slash Commands](/docs/reference/slash-commands)
+- [FAQ](/docs/reference/faq)
--- a/hermes_code/website/docs/user-guide/features/memory.md
+++ b/hermes_code/website/docs/user-guide/features/memory.md
@ -0,0 +1,218 @@
+---
+sidebar_position: 3
+title: "Persistent Memory"
+description: "How Hermes Agent remembers across sessions — MEMORY.md, USER.md, and session search"
+---
+
+# Persistent Memory
+
+Hermes Agent has bounded, curated memory that persists across sessions. This lets it remember your preferences, your projects, your environment, and things it has learned.
+
+## How It Works
+
+Two files make up the agent's memory:
+
+| File | Purpose | Char Limit |
+|------|---------|------------|
+| **MEMORY.md** | Agent's personal notes — environment facts, conventions, things learned | 2,200 chars (~800 tokens) |
+| **USER.md** | User profile — your preferences, communication style, expectations | 1,375 chars (~500 tokens) |
+
+Both are stored in `~/.hermes/memories/` and are injected into the system prompt as a frozen snapshot at session start. The agent manages its own memory via the `memory` tool — it can add, replace, or remove entries.
+
+:::info
+Character limits keep memory focused. When memory is full, the agent consolidates or replaces entries to make room for new information.
+:::
+
+## How Memory Appears in the System Prompt
+
+At the start of every session, memory entries are loaded from disk and rendered into the system prompt as a frozen block:
+
+```
+══════════════════════════════════════════════
+MEMORY (your personal notes) [67% — 1,474/2,200 chars]
+══════════════════════════════════════════════
+User's project is a Rust web service at ~/code/myapi using Axum + SQLx
+§
+This machine runs Ubuntu 22.04, has Docker and Podman installed
+§
+User prefers concise responses, dislikes verbose explanations
+```
+
+The format includes:
+- A header showing which store (MEMORY or USER PROFILE)
+- Usage percentage and character counts so the agent knows capacity
+- Individual entries separated by `§` (section sign) delimiters
+- Entries can be multiline
+
+**Frozen snapshot pattern:** The system prompt injection is captured once at session start and never changes mid-session. This is intentional — it preserves the LLM's prefix cache for performance. When the agent adds/removes memory entries during a session, the changes are persisted to disk immediately but won't appear in the system prompt until the next session starts. Tool responses always show the live state.
+
+## Memory Tool Actions
+
+The agent uses the `memory` tool with these actions:
+
+- **add** — Add a new memory entry
+- **replace** — Replace an existing entry with updated content (uses substring matching via `old_text`)
+- **remove** — Remove an entry that's no longer relevant (uses substring matching via `old_text`)
+
+There is no `read` action — memory content is automatically injected into the system prompt at session start. The agent sees its memories as part of its conversation context.
+
+### Substring Matching
+
+The `replace` and `remove` actions use short unique substring matching — you don't need the full entry text. The `old_text` parameter just needs to be a unique substring that identifies exactly one entry:
+
+```python
+# If memory contains "User prefers dark mode in all editors"
+memory(action="replace", target="memory",
+       old_text="dark mode",
+       content="User prefers light mode in VS Code, dark mode in terminal")
+```
+
+If the substring matches multiple entries, an error is returned asking for a more specific match.
+
+## Two Targets Explained
+
+### `memory` — Agent's Personal Notes
+
+For information the agent needs to remember about the environment, workflows, and lessons learned:
+
+- Environment facts (OS, tools, project structure)
+- Project conventions and configuration
+- Tool quirks and workarounds discovered
+- Completed task diary entries
+- Skills and techniques that worked
+
+### `user` — User Profile
+
+For information about the user's identity, preferences, and communication style:
+
+- Name, role, timezone
+- Communication preferences (concise vs detailed, format preferences)
+- Pet peeves and things to avoid
+- Workflow habits
+- Technical skill level
+
+## What to Save vs Skip
+
+### Save These (Proactively)
+
+The agent saves automatically — you don't need to ask. It saves when it learns:
+
+- **User preferences:** "I prefer TypeScript over JavaScript" → save to `user`
+- **Environment facts:** "This server runs Debian 12 with PostgreSQL 16" → save to `memory`
+- **Corrections:** "Don't use `sudo` for Docker commands, user is in docker group" → save to `memory`
+- **Conventions:** "Project uses tabs, 120-char line width, Google-style docstrings" → save to `memory`
+- **Completed work:** "Migrated database from MySQL to PostgreSQL on 2026-01-15" → save to `memory`
+- **Explicit requests:** "Remember that my API key rotation happens monthly" → save to `memory`
+
+### Skip These
+
+- **Trivial/obvious info:** "User asked about Python" — too vague to be useful
+- **Easily re-discovered facts:** "Python 3.12 supports f-string nesting" — can web search this
+- **Raw data dumps:** Large code blocks, log files, data tables — too big for memory
+- **Session-specific ephemera:** Temporary file paths, one-off debugging context
+- **Information already in context files:** SOUL.md and AGENTS.md content
+
+## Capacity Management
+
+Memory has strict character limits to keep system prompts bounded:
+
+| Store | Limit | Typical entries |
+|-------|-------|----------------|
+| memory | 2,200 chars | 8-15 entries |
+| user | 1,375 chars | 5-10 entries |
+
+### What Happens When Memory is Full
+
+When you try to add an entry that would exceed the limit, the tool returns an error:
+
+```json
+{
+  "success": false,
+  "error": "Memory at 2,100/2,200 chars. Adding this entry (250 chars) would exceed the limit. Replace or remove existing entries first.",
+  "current_entries": ["..."],
+  "usage": "2,100/2,200"
+}
+```
+
+The agent should then:
+1. Read the current entries (shown in the error response)
+2. Identify entries that can be removed or consolidated
+3. Use `replace` to merge related entries into shorter versions
+4. Then `add` the new entry
+
+**Best practice:** When memory is above 80% capacity (visible in the system prompt header), consolidate entries before adding new ones. For example, merge three separate "project uses X" entries into one comprehensive project description entry.
+
+### Practical Examples of Good Memory Entries
+
+**Compact, information-dense entries work best:**
+
+```
+# Good: Packs multiple related facts
+User runs macOS 14 Sonoma, uses Homebrew, has Docker Desktop and Podman. Shell: zsh with oh-my-zsh. Editor: VS Code with Vim keybindings.
+
+# Good: Specific, actionable convention
+Project ~/code/api uses Go 1.22, sqlc for DB queries, chi router. Run tests with 'make test'. CI via GitHub Actions.
+
+# Good: Lesson learned with context
+The staging server (10.0.1.50) needs SSH port 2222, not 22. Key is at ~/.ssh/staging_ed25519.
+
+# Bad: Too vague
+User has a project.
+
+# Bad: Too verbose
+On January 5th, 2026, the user asked me to look at their project which is
+located at ~/code/api. I discovered it uses Go version 1.22 and...
+```
+
+## Duplicate Prevention
+
+The memory system automatically rejects exact duplicate entries. If you try to add content that already exists, it returns success with a "no duplicate added" message.
+
+## Security Scanning
+
+Memory entries are scanned for injection and exfiltration patterns before being accepted, since they're injected into the system prompt. Content matching threat patterns (prompt injection, credential exfiltration, SSH backdoors) or containing invisible Unicode characters is blocked.
+
+## Session Search
+
+Beyond MEMORY.md and USER.md, the agent can search its past conversations using the `session_search` tool:
+
+- All CLI and messaging sessions are stored in SQLite (`~/.hermes/state.db`) with FTS5 full-text search
+- Search queries return relevant past conversations with Gemini Flash summarization
+- The agent can find things it discussed weeks ago, even if they're not in its active memory
+
+```bash
+hermes sessions list    # Browse past sessions
+```
+
+### session_search vs memory
+
+| Feature | Persistent Memory | Session Search |
+|---------|------------------|----------------|
+| **Capacity** | ~1,300 tokens total | Unlimited (all sessions) |
+| **Speed** | Instant (in system prompt) | Requires search + LLM summarization |
+| **Use case** | Key facts always available | Finding specific past conversations |
+| **Management** | Manually curated by agent | Automatic — all sessions stored |
+| **Token cost** | Fixed per session (~1,300 tokens) | On-demand (searched when needed) |
+
+**Memory** is for critical facts that should always be in context. **Session search** is for "did we discuss X last week?" queries where the agent needs to recall specifics from past conversations.
+
+## Configuration
+
+```yaml
+# In ~/.hermes/config.yaml
+memory:
+  memory_enabled: true
+  user_profile_enabled: true
+  memory_char_limit: 2200   # ~800 tokens
+  user_char_limit: 1375     # ~500 tokens
+```
+
+## Honcho Integration (Cross-Session User Modeling)
+
+For deeper, AI-generated user understanding that works across sessions and platforms, you can enable [Honcho Memory](./honcho.md). Honcho runs alongside built-in memory in `hybrid` mode (the default) — `MEMORY.md` and `USER.md` stay as-is, and Honcho adds a persistent user modeling layer on top.
+
+```bash
+hermes honcho setup
+```
+
+See the [Honcho Memory](./honcho.md) docs for full configuration, tools, and CLI reference.
--- a/hermes_code/website/docs/user-guide/features/personality.md
+++ b/hermes_code/website/docs/user-guide/features/personality.md
@ -0,0 +1,271 @@
+---
+sidebar_position: 9
+title: "Personality & SOUL.md"
+description: "Customize Hermes Agent's personality with a global SOUL.md, built-in personalities, and custom persona definitions"
+---
+
+# Personality & SOUL.md
+
+Hermes Agent's personality is fully customizable. `SOUL.md` is the **primary identity** — it's the first thing in the system prompt and defines who the agent is.
+
+- `SOUL.md` — a durable persona file that lives in `HERMES_HOME` and serves as the agent's identity (slot #1 in the system prompt)
+- built-in or custom `/personality` presets — session-level system-prompt overlays
+
+If you want to change who Hermes is — or replace it with an entirely different agent persona — edit `SOUL.md`.
+
+## How SOUL.md works now
+
+Hermes now seeds a default `SOUL.md` automatically in:
+
+```text
+~/.hermes/SOUL.md
+```
+
+More precisely, it uses the current instance's `HERMES_HOME`, so if you run Hermes with a custom home directory, it will use:
+
+```text
+$HERMES_HOME/SOUL.md
+```
+
+### Important behavior
+
+- **SOUL.md is the agent's primary identity.** It occupies slot #1 in the system prompt, replacing the hardcoded default identity.
+- Hermes creates a starter `SOUL.md` automatically if one does not exist yet
+- Existing user `SOUL.md` files are never overwritten
+- Hermes loads `SOUL.md` only from `HERMES_HOME`
+- Hermes does not look in the current working directory for `SOUL.md`
+- If `SOUL.md` exists but is empty, or cannot be loaded, Hermes falls back to a built-in default identity
+- If `SOUL.md` has content, that content is injected verbatim after security scanning and truncation
+- SOUL.md is **not** duplicated in the context files section — it appears only once, as the identity
+
+That makes `SOUL.md` a true per-user or per-instance identity, not just an additive layer.
+
+## Why this design
+
+This keeps personality predictable.
+
+If Hermes loaded `SOUL.md` from whatever directory you happened to launch it in, your personality could change unexpectedly between projects. By loading only from `HERMES_HOME`, the personality belongs to the Hermes instance itself.
+
+That also makes it easier to teach users:
+- "Edit `~/.hermes/SOUL.md` to change Hermes' default personality."
+
+## Where to edit it
+
+For most users:
+
+```bash
+~/.hermes/SOUL.md
+```
+
+If you use a custom home:
+
+```bash
+$HERMES_HOME/SOUL.md
+```
+
+## What should go in SOUL.md?
+
+Use it for durable voice and personality guidance, such as:
+- tone
+- communication style
+- level of directness
+- default interaction style
+- what to avoid stylistically
+- how Hermes should handle uncertainty, disagreement, or ambiguity
+
+Use it less for:
+- one-off project instructions
+- file paths
+- repo conventions
+- temporary workflow details
+
+Those belong in `AGENTS.md`, not `SOUL.md`.
+
+## Good SOUL.md content
+
+A good SOUL file is:
+- stable across contexts
+- broad enough to apply in many conversations
+- specific enough to materially shape the voice
+- focused on communication and identity, not task-specific instructions
+
+### Example
+
+```markdown
+# Personality
+
+You are a pragmatic senior engineer with strong taste.
+You optimize for truth, clarity, and usefulness over politeness theater.
+
+## Style
+- Be direct without being cold
+- Prefer substance over filler
+- Push back when something is a bad idea
+- Admit uncertainty plainly
+- Keep explanations compact unless depth is useful
+
+## What to avoid
+- Sycophancy
+- Hype language
+- Repeating the user's framing if it's wrong
+- Overexplaining obvious things
+
+## Technical posture
+- Prefer simple systems over clever systems
+- Care about operational reality, not idealized architecture
+- Treat edge cases as part of the design, not cleanup
+```
+
+## What Hermes injects into the prompt
+
+`SOUL.md` content goes directly into slot #1 of the system prompt — the agent identity position. No wrapper language is added around it.
+
+The content goes through:
+- prompt-injection scanning
+- truncation if it is too large
+
+If the file is empty, whitespace-only, or cannot be read, Hermes falls back to a built-in default identity ("You are Hermes Agent, an intelligent AI assistant created by Nous Research..."). This fallback also applies when `skip_context_files` is set (e.g., in subagent/delegation contexts).
+
+## Security scanning
+
+`SOUL.md` is scanned like other context-bearing files for prompt injection patterns before inclusion.
+
+That means you should still keep it focused on persona/voice rather than trying to sneak in strange meta-instructions.
+
+## SOUL.md vs AGENTS.md
+
+This is the most important distinction.
+
+### SOUL.md
+Use for:
+- identity
+- tone
+- style
+- communication defaults
+- personality-level behavior
+
+### AGENTS.md
+Use for:
+- project architecture
+- coding conventions
+- tool preferences
+- repo-specific workflows
+- commands, ports, paths, deployment notes
+
+A useful rule:
+- if it should follow you everywhere, it belongs in `SOUL.md`
+- if it belongs to a project, it belongs in `AGENTS.md`
+
+## SOUL.md vs `/personality`
+
+`SOUL.md` is your durable default personality.
+
+`/personality` is a session-level overlay that changes or supplements the current system prompt.
+
+So:
+- `SOUL.md` = baseline voice
+- `/personality` = temporary mode switch
+
+Examples:
+- keep a pragmatic default SOUL, then use `/personality teacher` for a tutoring conversation
+- keep a concise SOUL, then use `/personality creative` for brainstorming
+
+## Built-in personalities
+
+Hermes ships with built-in personalities you can switch to with `/personality`.
+
+| Name | Description |
+|------|-------------|
+| **helpful** | Friendly, general-purpose assistant |
+| **concise** | Brief, to-the-point responses |
+| **technical** | Detailed, accurate technical expert |
+| **creative** | Innovative, outside-the-box thinking |
+| **teacher** | Patient educator with clear examples |
+| **kawaii** | Cute expressions, sparkles, and enthusiasm ★ |
+| **catgirl** | Neko-chan with cat-like expressions, nya~ |
+| **pirate** | Captain Hermes, tech-savvy buccaneer |
+| **shakespeare** | Bardic prose with dramatic flair |
+| **surfer** | Totally chill bro vibes |
+| **noir** | Hard-boiled detective narration |
+| **uwu** | Maximum cute with uwu-speak |
+| **philosopher** | Deep contemplation on every query |
+| **hype** | MAXIMUM ENERGY AND ENTHUSIASM!!! |
+
+## Switching personalities with commands
+
+### CLI
+
+```text
+/personality
+/personality concise
+/personality technical
+```
+
+### Messaging platforms
+
+```text
+/personality teacher
+```
+
+These are convenient overlays, but your global `SOUL.md` still gives Hermes its persistent default personality unless the overlay meaningfully changes it.
+
+## Custom personalities in config
+
+You can also define named custom personalities in `~/.hermes/config.yaml` under `agent.personalities`.
+
+```yaml
+agent:
+  personalities:
+    codereviewer: >
+      You are a meticulous code reviewer. Identify bugs, security issues,
+      performance concerns, and unclear design choices. Be precise and constructive.
+```
+
+Then switch to it with:
+
+```text
+/personality codereviewer
+```
+
+## Recommended workflow
+
+A strong default setup is:
+
+1. Keep a thoughtful global `SOUL.md` in `~/.hermes/SOUL.md`
+2. Put project instructions in `AGENTS.md`
+3. Use `/personality` only when you want a temporary mode shift
+
+That gives you:
+- a stable voice
+- project-specific behavior where it belongs
+- temporary control when needed
+
+## How personality interacts with the full prompt
+
+At a high level, the prompt stack includes:
+1. **SOUL.md** (agent identity — or built-in fallback if SOUL.md is unavailable)
+2. tool-aware behavior guidance
+3. memory/user context
+4. skills guidance
+5. context files (`AGENTS.md`, `.cursorrules`)
+6. timestamp
+7. platform-specific formatting hints
+8. optional system-prompt overlays such as `/personality`
+
+`SOUL.md` is the foundation — everything else builds on top of it.
+
+## Related docs
+
+- [Context Files](/docs/user-guide/features/context-files)
+- [Configuration](/docs/user-guide/configuration)
+- [Tips & Best Practices](/docs/guides/tips)
+- [SOUL.md Guide](/docs/guides/use-soul-with-hermes)
+
+## CLI appearance vs conversational personality
+
+Conversational personality and CLI appearance are separate:
+
+- `SOUL.md`, `agent.system_prompt`, and `/personality` affect how Hermes speaks
+- `display.skin` and `/skin` affect how Hermes looks in the terminal
+
+For terminal appearance, see [Skins & Themes](./skins.md).
--- a/hermes_code/website/docs/user-guide/features/plugins.md
+++ b/hermes_code/website/docs/user-guide/features/plugins.md
@ -0,0 +1,92 @@
+---
+sidebar_position: 20
+---
+
+# Plugins
+
+Hermes has a plugin system for adding custom tools, hooks, slash commands, and integrations without modifying core code.
+
+**→ [Build a Hermes Plugin](/docs/guides/build-a-hermes-plugin)** — step-by-step guide with a complete working example.
+
+## Quick overview
+
+Drop a directory into `~/.hermes/plugins/` with a `plugin.yaml` and Python code:
+
+```
+~/.hermes/plugins/my-plugin/
+├── plugin.yaml      # manifest
+├── __init__.py      # register() — wires schemas to handlers
+├── schemas.py       # tool schemas (what the LLM sees)
+└── tools.py         # tool handlers (what runs when called)
+```
+
+Start Hermes — your tools appear alongside built-in tools. The model can call them immediately.
+
+Project-local plugins under `./.hermes/plugins/` are disabled by default. Enable them only for trusted repositories by setting `HERMES_ENABLE_PROJECT_PLUGINS=true` before starting Hermes.
+
+## What plugins can do
+
+| Capability | How |
+|-----------|-----|
+| Add tools | `ctx.register_tool(name, schema, handler)` |
+| Add hooks | `ctx.register_hook("post_tool_call", callback)` |
+| Add slash commands | `ctx.register_command("mycommand", handler)` |
+| Ship data files | `Path(__file__).parent / "data" / "file.yaml"` |
+| Bundle skills | Copy `skill.md` to `~/.hermes/skills/` at load time |
+| Gate on env vars | `requires_env: [API_KEY]` in plugin.yaml |
+| Distribute via pip | `[project.entry-points."hermes_agent.plugins"]` |
+
+## Plugin discovery
+
+| Source | Path | Use case |
+|--------|------|----------|
+| User | `~/.hermes/plugins/` | Personal plugins |
+| Project | `.hermes/plugins/` | Project-specific plugins (requires `HERMES_ENABLE_PROJECT_PLUGINS=true`) |
+| pip | `hermes_agent.plugins` entry_points | Distributed packages |
+
+## Available hooks
+
+| Hook | Fires when |
+|------|-----------|
+| `pre_tool_call` | Before any tool executes |
+| `post_tool_call` | After any tool returns |
+| `pre_llm_call` | Before LLM API request |
+| `post_llm_call` | After LLM API response |
+| `on_session_start` | Session begins |
+| `on_session_end` | Session ends |
+
+## Slash commands
+
+Plugins can register slash commands that work in both CLI and messaging platforms:
+
+```python
+def register(ctx):
+    ctx.register_command(
+        name="greet",
+        handler=lambda args: f"Hello, {args or 'world'}!",
+        description="Greet someone",
+        args_hint="[name]",
+        aliases=("hi",),
+    )
+```
+
+The handler receives the argument string (everything after `/greet`) and returns a string to display. Registered commands automatically appear in `/help`, tab autocomplete, Telegram bot menu, and Slack subcommand mapping.
+
+| Parameter | Description |
+|-----------|-------------|
+| `name` | Command name without slash |
+| `handler` | Callable that takes `args: str` and returns `str | None` |
+| `description` | Shown in `/help` |
+| `args_hint` | Usage hint, e.g. `"[name]"` |
+| `aliases` | Tuple of alternative names |
+| `cli_only` | Only available in CLI |
+| `gateway_only` | Only available in messaging platforms |
+
+## Managing plugins
+
+```
+/plugins              # list loaded plugins in a session
+hermes config set display.show_cost true  # show cost in status bar
+```
+
+See the **[full guide](/docs/guides/build-a-hermes-plugin)** for handler contracts, schema format, hook behavior, error handling, and common mistakes.
--- a/hermes_code/website/docs/user-guide/features/provider-routing.md
+++ b/hermes_code/website/docs/user-guide/features/provider-routing.md
@ -0,0 +1,200 @@
+---
+title: Provider Routing
+description: Configure OpenRouter provider preferences to optimize for cost, speed, or quality.
+sidebar_label: Provider Routing
+sidebar_position: 7
+---
+
+# Provider Routing
+
+When using [OpenRouter](https://openrouter.ai) as your LLM provider, Hermes Agent supports **provider routing** — fine-grained control over which underlying AI providers handle your requests and how they're prioritized.
+
+OpenRouter routes requests to many providers (e.g., Anthropic, Google, AWS Bedrock, Together AI). Provider routing lets you optimize for cost, speed, quality, or enforce specific provider requirements.
+
+## Configuration
+
+Add a `provider_routing` section to your `~/.hermes/config.yaml`:
+
+```yaml
+provider_routing:
+  sort: "price"           # How to rank providers
+  only: []                # Whitelist: only use these providers
+  ignore: []              # Blacklist: never use these providers
+  order: []               # Explicit provider priority order
+  require_parameters: false  # Only use providers that support all parameters
+  data_collection: null   # Control data collection ("allow" or "deny")
+```
+
+:::info
+Provider routing only applies when using OpenRouter. It has no effect with direct provider connections (e.g., connecting directly to the Anthropic API).
+:::
+
+## Options
+
+### `sort`
+
+Controls how OpenRouter ranks available providers for your request.
+
+| Value | Description |
+|-------|-------------|
+| `"price"` | Cheapest provider first |
+| `"throughput"` | Fastest tokens-per-second first |
+| `"latency"` | Lowest time-to-first-token first |
+
+```yaml
+provider_routing:
+  sort: "price"
+```
+
+### `only`
+
+Whitelist of provider names. When set, **only** these providers will be used. All others are excluded.
+
+```yaml
+provider_routing:
+  only:
+    - "Anthropic"
+    - "Google"
+```
+
+### `ignore`
+
+Blacklist of provider names. These providers will **never** be used, even if they offer the cheapest or fastest option.
+
+```yaml
+provider_routing:
+  ignore:
+    - "Together"
+    - "DeepInfra"
+```
+
+### `order`
+
+Explicit priority order. Providers listed first are preferred. Unlisted providers are used as fallbacks.
+
+```yaml
+provider_routing:
+  order:
+    - "Anthropic"
+    - "Google"
+    - "AWS Bedrock"
+```
+
+### `require_parameters`
+
+When `true`, OpenRouter will only route to providers that support **all** parameters in your request (like `temperature`, `top_p`, `tools`, etc.). This avoids silent parameter drops.
+
+```yaml
+provider_routing:
+  require_parameters: true
+```
+
+### `data_collection`
+
+Controls whether providers can use your prompts for training. Options are `"allow"` or `"deny"`.
+
+```yaml
+provider_routing:
+  data_collection: "deny"
+```
+
+## Practical Examples
+
+### Optimize for Cost
+
+Route to the cheapest available provider. Good for high-volume usage and development:
+
+```yaml
+provider_routing:
+  sort: "price"
+```
+
+### Optimize for Speed
+
+Prioritize low-latency providers for interactive use:
+
+```yaml
+provider_routing:
+  sort: "latency"
+```
+
+### Optimize for Throughput
+
+Best for long-form generation where tokens-per-second matters:
+
+```yaml
+provider_routing:
+  sort: "throughput"
+```
+
+### Lock to Specific Providers
+
+Ensure all requests go through a specific provider for consistency:
+
+```yaml
+provider_routing:
+  only:
+    - "Anthropic"
+```
+
+### Avoid Specific Providers
+
+Exclude providers you don't want to use (e.g., for data privacy):
+
+```yaml
+provider_routing:
+  ignore:
+    - "Together"
+    - "Lepton"
+  data_collection: "deny"
+```
+
+### Preferred Order with Fallbacks
+
+Try your preferred providers first, fall back to others if unavailable:
+
+```yaml
+provider_routing:
+  order:
+    - "Anthropic"
+    - "Google"
+  require_parameters: true
+```
+
+## How It Works
+
+Provider routing preferences are passed to the OpenRouter API via the `extra_body.provider` field on every API call. This applies to both:
+
+- **CLI mode** — configured in `~/.hermes/config.yaml`, loaded at startup
+- **Gateway mode** — same config file, loaded when the gateway starts
+
+The routing config is read from `config.yaml` and passed as parameters when creating the `AIAgent`:
+
+```
+providers_allowed  ← from provider_routing.only
+providers_ignored  ← from provider_routing.ignore
+providers_order    ← from provider_routing.order
+provider_sort      ← from provider_routing.sort
+provider_require_parameters ← from provider_routing.require_parameters
+provider_data_collection    ← from provider_routing.data_collection
+```
+
+:::tip
+You can combine multiple options. For example, sort by price but exclude certain providers and require parameter support:
+
+```yaml
+provider_routing:
+  sort: "price"
+  ignore: ["Together"]
+  require_parameters: true
+  data_collection: "deny"
+```
+:::
+
+## Default Behavior
+
+When no `provider_routing` section is configured (the default), OpenRouter uses its own default routing logic, which generally balances cost and availability automatically.
+
+:::tip Provider Routing vs. Fallback Models
+Provider routing controls which **sub-providers within OpenRouter** handle your requests. For automatic failover to an entirely different provider when your primary model fails, see [Fallback Providers](/docs/user-guide/features/fallback-providers).
+:::
--- a/hermes_code/website/docs/user-guide/features/rl-training.md
+++ b/hermes_code/website/docs/user-guide/features/rl-training.md
@ -0,0 +1,234 @@
+---
+sidebar_position: 13
+title: "RL Training"
+description: "Reinforcement learning on agent behaviors with Tinker-Atropos — environment discovery, training, and evaluation"
+---
+
+# RL Training
+
+Hermes Agent includes an integrated RL (Reinforcement Learning) training pipeline built on **Tinker-Atropos**. This enables training language models on environment-specific tasks using GRPO (Group Relative Policy Optimization) with LoRA adapters, orchestrated entirely through the agent's tool interface.
+
+## Overview
+
+The RL training system consists of three components:
+
+1. **Atropos** — A trajectory API server that coordinates environment interactions, manages rollout groups, and computes advantages
+2. **Tinker** — A training service that handles model weights, LoRA training, sampling/inference, and optimizer steps
+3. **Environments** — Python classes that define tasks, scoring, and reward functions (e.g., GSM8K math problems)
+
+The agent can discover environments, configure training parameters, launch training runs, and monitor metrics — all through a set of `rl_*` tools.
+
+## Requirements
+
+RL training requires:
+
+- **Python >= 3.11** (Tinker package requirement)
+- **TINKER_API_KEY** — API key for the Tinker training service
+- **WANDB_API_KEY** — API key for Weights & Biases metrics tracking
+- The `tinker-atropos` submodule (at `tinker-atropos/` relative to the Hermes root)
+
+```bash
+# Set up API keys
+hermes config set TINKER_API_KEY your-tinker-key
+hermes config set WANDB_API_KEY your-wandb-key
+```
+
+When both keys are present and Python >= 3.11 is available, the `rl` toolset is automatically enabled.
+
+## Available Tools
+
+| Tool | Description |
+|------|-------------|
+| `rl_list_environments` | Discover available RL environments |
+| `rl_select_environment` | Select an environment and load its config |
+| `rl_get_current_config` | View configurable and locked fields |
+| `rl_edit_config` | Modify configurable training parameters |
+| `rl_start_training` | Launch a training run (spawns 3 processes) |
+| `rl_check_status` | Monitor training progress and WandB metrics |
+| `rl_stop_training` | Stop a running training job |
+| `rl_get_results` | Get final metrics and model weights path |
+| `rl_list_runs` | List all active and completed runs |
+| `rl_test_inference` | Quick inference test using OpenRouter |
+
+## Workflow
+
+### 1. Discover Environments
+
+```
+List the available RL environments
+```
+
+The agent calls `rl_list_environments()` which scans `tinker-atropos/tinker_atropos/environments/` using AST parsing to find Python classes inheriting from `BaseEnv`. Each environment defines:
+
+- **Dataset loading** — where training data comes from (e.g., HuggingFace datasets)
+- **Prompt construction** — how to format items for the model
+- **Scoring/verification** — how to evaluate model outputs and assign rewards
+
+### 2. Select and Configure
+
+```
+Select the GSM8K environment and show me the configuration
+```
+
+The agent calls `rl_select_environment("gsm8k_tinker")`, then `rl_get_current_config()` to see all parameters.
+
+Configuration fields are divided into two categories:
+
+**Configurable fields** (can be modified):
+- `group_size` — Number of completions per item (default: 16)
+- `batch_size` — Training batch size (default: 128)
+- `wandb_name` — WandB run name (auto-set to `{env}-{timestamp}`)
+- Other environment-specific parameters
+
+**Locked fields** (infrastructure settings, cannot be changed):
+- `tokenizer_name` — Model tokenizer (e.g., `Qwen/Qwen3-8B`)
+- `rollout_server_url` — Atropos API URL (`http://localhost:8000`)
+- `max_token_length` — Maximum token length (8192)
+- `max_num_workers` — Maximum parallel workers (2048)
+- `total_steps` — Total training steps (2500)
+- `lora_rank` — LoRA adapter rank (32)
+- `learning_rate` — Learning rate (4e-5)
+- `max_token_trainer_length` — Max tokens for trainer (9000)
+
+### 3. Start Training
+
+```
+Start the training run
+```
+
+The agent calls `rl_start_training()` which:
+
+1. Generates a YAML config file merging locked settings with configurable overrides
+2. Creates a unique run ID
+3. Spawns three processes:
+   - **Atropos API server** (`run-api`) — trajectory coordination
+   - **Tinker trainer** (`launch_training.py`) — LoRA training + FastAPI inference server on port 8001
+   - **Environment** (`environment.py serve`) — the selected environment connecting to Atropos
+
+The processes start with staggered delays (5s for API, 30s for trainer, 90s more for environment) to ensure proper initialization order.
+
+### 4. Monitor Progress
+
+```
+Check the status of training run abc12345
+```
+
+The agent calls `rl_check_status(run_id)` which reports:
+
+- Process status (running/exited for each of the 3 processes)
+- Running time
+- WandB metrics (step, reward mean, percent correct, eval accuracy)
+- Log file locations for debugging
+
+:::note Rate Limiting
+Status checks are rate-limited to once every **30 minutes** per run ID. This prevents excessive polling during long-running training jobs that take hours.
+:::
+
+### 5. Stop or Get Results
+
+```
+Stop the training run
+# or
+Get the final results for run abc12345
+```
+
+`rl_stop_training()` terminates all three processes in reverse order (environment → trainer → API). `rl_get_results()` retrieves final WandB metrics and training history.
+
+## Inference Testing
+
+Before committing to a full training run, you can test if an environment works correctly using `rl_test_inference`. This runs a few steps of inference and scoring using OpenRouter — no Tinker API needed, just an `OPENROUTER_API_KEY`.
+
+```
+Test the selected environment with inference
+```
+
+Default configuration:
+- **3 steps × 16 completions = 48 rollouts per model**
+- Tests 3 models at different scales for robustness:
+  - `qwen/qwen3-8b` (small)
+  - `z-ai/glm-4.7-flash` (medium)
+  - `minimax/minimax-m2.7` (large)
+- Total: ~144 rollouts
+
+This validates:
+- Environment loads correctly
+- Prompt construction works
+- Inference response parsing is robust across model scales
+- Verifier/scoring logic produces valid rewards
+
+## Tinker API Integration
+
+The trainer uses the [Tinker](https://tinker.computer) API for model training operations:
+
+- **ServiceClient** — Creates training and sampling clients
+- **Training client** — Handles forward-backward passes with importance sampling loss, optimizer steps (Adam), and weight checkpointing
+- **Sampling client** — Provides inference using the latest trained weights
+
+The training loop:
+1. Fetches a batch of rollouts from Atropos (prompt + completions + scores)
+2. Converts to Tinker Datum objects with padded logprobs and advantages
+3. Runs forward-backward pass with importance sampling loss
+4. Takes an optimizer step (Adam: lr=4e-5, β1=0.9, β2=0.95)
+5. Saves weights and creates a new sampling client for next-step inference
+6. Logs metrics to WandB
+
+## Architecture Diagram
+
+```mermaid
+flowchart LR
+    api["Atropos API<br/>run-api<br/>port 8000"]
+    env["Environment<br/>BaseEnv implementation"]
+    infer["OpenAI / sglang<br/>inference API<br/>port 8001"]
+    trainer["Tinker Trainer<br/>LoRA training + FastAPI"]
+
+    env <--> api
+    env --> infer
+    api -->|"batches: tokens, scores, logprobs"| trainer
+    trainer -->|"serves inference"| infer
+```
+
+## Creating Custom Environments
+
+To create a new RL environment:
+
+1. Create a Python file in `tinker-atropos/tinker_atropos/environments/`
+2. Define a class that inherits from `BaseEnv`
+3. Implement the required methods:
+   - `load_dataset()` — Load your training data
+   - `get_next_item()` — Provide the next item to the model
+   - `score_answer()` — Score model outputs and assign rewards
+   - `collect_trajectories()` — Collect and return trajectories
+4. Optionally define a custom config class inheriting from `BaseEnvConfig`
+
+Study the existing `gsm8k_tinker.py` as a template. The agent can help you create new environments — it can read existing environment files, inspect HuggingFace datasets, and write new environment code.
+
+## WandB Metrics
+
+Training runs log to Weights & Biases with these key metrics:
+
+| Metric | Description |
+|--------|-------------|
+| `train/loss` | Training loss (importance sampling) |
+| `train/learning_rate` | Current learning rate |
+| `reward/mean` | Mean reward across groups |
+| `logprobs/mean` | Mean reference logprobs |
+| `logprobs/mean_training` | Mean training logprobs |
+| `logprobs/diff` | Logprob drift (reference - training) |
+| `advantages/mean` | Mean advantage values |
+| `advantages/std` | Advantage standard deviation |
+
+## Log Files
+
+Each training run generates log files in `~/.hermes/logs/rl_training/`:
+
+```
+logs/
+├── api_{run_id}.log        # Atropos API server logs
+├── trainer_{run_id}.log    # Tinker trainer logs
+├── env_{run_id}.log        # Environment process logs
+└── inference_tests/        # Inference test results
+    ├── test_{env}_{model}.jsonl
+    └── test_{env}_{model}.log
+```
+
+These are invaluable for debugging when training fails or produces unexpected results.
--- a/hermes_code/website/docs/user-guide/features/skills.md
+++ b/hermes_code/website/docs/user-guide/features/skills.md
@ -0,0 +1,375 @@
+---
+sidebar_position: 2
+title: "Skills System"
+description: "On-demand knowledge documents — progressive disclosure, agent-managed skills, and the Skills Hub"
+---
+
+# Skills System
+
+Skills are on-demand knowledge documents the agent can load when needed. They follow a **progressive disclosure** pattern to minimize token usage and are compatible with the [agentskills.io](https://agentskills.io/specification) open standard.
+
+All skills live in **`~/.hermes/skills/`** — a single directory that serves as the source of truth. On fresh install, bundled skills are copied from the repo. Hub-installed and agent-created skills also go here. The agent can modify or delete any skill.
+
+See also:
+
+- [Bundled Skills Catalog](/docs/reference/skills-catalog)
+- [Official Optional Skills Catalog](/docs/reference/optional-skills-catalog)
+
+## Using Skills
+
+Every installed skill is automatically available as a slash command:
+
+```bash
+# In the CLI or any messaging platform:
+/gif-search funny cats
+/axolotl help me fine-tune Llama 3 on my dataset
+/github-pr-workflow create a PR for the auth refactor
+/plan design a rollout for migrating our auth provider
+
+# Just the skill name loads it and lets the agent ask what you need:
+/excalidraw
+```
+
+The bundled `plan` skill is a good example of a skill-backed slash command with custom behavior. Running `/plan [request]` tells Hermes to inspect context if needed, write a markdown implementation plan instead of executing the task, and save the result under `.hermes/plans/` relative to the active workspace/backend working directory.
+
+You can also interact with skills through natural conversation:
+
+```bash
+hermes chat --toolsets skills -q "What skills do you have?"
+hermes chat --toolsets skills -q "Show me the axolotl skill"
+```
+
+## Progressive Disclosure
+
+Skills use a token-efficient loading pattern:
+
+```
+Level 0: skills_list()           → [{name, description, category}, ...]   (~3k tokens)
+Level 1: skill_view(name)        → Full content + metadata       (varies)
+Level 2: skill_view(name, path)  → Specific reference file       (varies)
+```
+
+The agent only loads the full skill content when it actually needs it.
+
+## SKILL.md Format
+
+```markdown
+---
+name: my-skill
+description: Brief description of what this skill does
+version: 1.0.0
+platforms: [macos, linux]     # Optional — restrict to specific OS platforms
+metadata:
+  hermes:
+    tags: [python, automation]
+    category: devops
+    fallback_for_toolsets: [web]    # Optional — conditional activation (see below)
+    requires_toolsets: [terminal]   # Optional — conditional activation (see below)
+---
+
+# Skill Title
+
+## When to Use
+Trigger conditions for this skill.
+
+## Procedure
+1. Step one
+2. Step two
+
+## Pitfalls
+- Known failure modes and fixes
+
+## Verification
+How to confirm it worked.
+```
+
+### Platform-Specific Skills
+
+Skills can restrict themselves to specific operating systems using the `platforms` field:
+
+| Value | Matches |
+|-------|---------|
+| `macos` | macOS (Darwin) |
+| `linux` | Linux |
+| `windows` | Windows |
+
+```yaml
+platforms: [macos]            # macOS only (e.g., iMessage, Apple Reminders, FindMy)
+platforms: [macos, linux]     # macOS and Linux
+```
+
+When set, the skill is automatically hidden from the system prompt, `skills_list()`, and slash commands on incompatible platforms. If omitted, the skill loads on all platforms.
+
+### Conditional Activation (Fallback Skills)
+
+Skills can automatically show or hide themselves based on which tools are available in the current session. This is most useful for **fallback skills** — free or local alternatives that should only appear when a premium tool is unavailable.
+
+```yaml
+metadata:
+  hermes:
+    fallback_for_toolsets: [web]      # Show ONLY when these toolsets are unavailable
+    requires_toolsets: [terminal]     # Show ONLY when these toolsets are available
+    fallback_for_tools: [web_search]  # Show ONLY when these specific tools are unavailable
+    requires_tools: [terminal]        # Show ONLY when these specific tools are available
+```
+
+| Field | Behavior |
+|-------|----------|
+| `fallback_for_toolsets` | Skill is **hidden** when the listed toolsets are available. Shown when they're missing. |
+| `fallback_for_tools` | Same, but checks individual tools instead of toolsets. |
+| `requires_toolsets` | Skill is **hidden** when the listed toolsets are unavailable. Shown when they're present. |
+| `requires_tools` | Same, but checks individual tools. |
+
+**Example:** The built-in `duckduckgo-search` skill uses `fallback_for_toolsets: [web]`. When you have `FIRECRAWL_API_KEY` set, the web toolset is available and the agent uses `web_search` — the DuckDuckGo skill stays hidden. If the API key is missing, the web toolset is unavailable and the DuckDuckGo skill automatically appears as a fallback.
+
+Skills without any conditional fields behave exactly as before — they're always shown.
+
+## Secure Setup on Load
+
+Skills can declare required environment variables without disappearing from discovery:
+
+```yaml
+required_environment_variables:
+  - name: TENOR_API_KEY
+    prompt: Tenor API key
+    help: Get a key from https://developers.google.com/tenor
+    required_for: full functionality
+```
+
+When a missing value is encountered, Hermes asks for it securely only when the skill is actually loaded in the local CLI. You can skip setup and keep using the skill. Messaging surfaces never ask for secrets in chat — they tell you to use `hermes setup` or `~/.hermes/.env` locally instead.
+
+Once set, declared env vars are **automatically passed through** to `execute_code` and `terminal` sandboxes — the skill's scripts can use `$TENOR_API_KEY` directly. For non-skill env vars, use the `terminal.env_passthrough` config option. See [Environment Variable Passthrough](/docs/user-guide/security#environment-variable-passthrough) for details.
+
+## Skill Directory Structure
+
+```text
+~/.hermes/skills/                  # Single source of truth
+├── mlops/                         # Category directory
+│   ├── axolotl/
+│   │   ├── SKILL.md               # Main instructions (required)
+│   │   ├── references/            # Additional docs
+│   │   ├── templates/             # Output formats
+│   │   ├── scripts/               # Helper scripts callable from the skill
+│   │   └── assets/                # Supplementary files
+│   └── vllm/
+│       └── SKILL.md
+├── devops/
+│   └── deploy-k8s/                # Agent-created skill
+│       ├── SKILL.md
+│       └── references/
+├── .hub/                          # Skills Hub state
+│   ├── lock.json
+│   ├── quarantine/
+│   └── audit.log
+└── .bundled_manifest              # Tracks seeded bundled skills
+```
+
+## Agent-Managed Skills (skill_manage tool)
+
+The agent can create, update, and delete its own skills via the `skill_manage` tool. This is the agent's **procedural memory** — when it figures out a non-trivial workflow, it saves the approach as a skill for future reuse.
+
+### When the Agent Creates Skills
+
+- After completing a complex task (5+ tool calls) successfully
+- When it hit errors or dead ends and found the working path
+- When the user corrected its approach
+- When it discovered a non-trivial workflow
+
+### Actions
+
+| Action | Use for | Key params |
+|--------|---------|------------|
+| `create` | New skill from scratch | `name`, `content` (full SKILL.md), optional `category` |
+| `patch` | Targeted fixes (preferred) | `name`, `old_string`, `new_string` |
+| `edit` | Major structural rewrites | `name`, `content` (full SKILL.md replacement) |
+| `delete` | Remove a skill entirely | `name` |
+| `write_file` | Add/update supporting files | `name`, `file_path`, `file_content` |
+| `remove_file` | Remove a supporting file | `name`, `file_path` |
+
+:::tip
+The `patch` action is preferred for updates — it's more token-efficient than `edit` because only the changed text appears in the tool call.
+:::
+
+## Skills Hub
+
+Browse, search, install, and manage skills from online registries, `skills.sh`, direct well-known skill endpoints, and official optional skills.
+
+### Common commands
+
+```bash
+hermes skills browse                              # Browse all hub skills (official first)
+hermes skills browse --source official            # Browse only official optional skills
+hermes skills search kubernetes                   # Search all sources
+hermes skills search react --source skills-sh     # Search the skills.sh directory
+hermes skills search https://mintlify.com/docs --source well-known
+hermes skills inspect openai/skills/k8s           # Preview before installing
+hermes skills install openai/skills/k8s           # Install with security scan
+hermes skills install official/security/1password
+hermes skills install skills-sh/vercel-labs/json-render/json-render-react --force
+hermes skills install well-known:https://mintlify.com/docs/.well-known/skills/mintlify
+hermes skills list --source hub                   # List hub-installed skills
+hermes skills check                               # Check installed hub skills for upstream updates
+hermes skills update                              # Reinstall hub skills with upstream changes when needed
+hermes skills audit                               # Re-scan all hub skills for security
+hermes skills uninstall k8s                       # Remove a hub skill
+hermes skills publish skills/my-skill --to github --repo owner/repo
+hermes skills snapshot export setup.json          # Export skill config
+hermes skills tap add myorg/skills-repo           # Add a custom GitHub source
+```
+
+### Supported hub sources
+
+| Source | Example | Notes |
+|--------|---------|-------|
+| `official` | `official/security/1password` | Optional skills shipped with Hermes. |
+| `skills-sh` | `skills-sh/vercel-labs/agent-skills/vercel-react-best-practices` | Searchable via `hermes skills search <query> --source skills-sh`. Hermes resolves alias-style skills when the skills.sh slug differs from the repo folder. |
+| `well-known` | `well-known:https://mintlify.com/docs/.well-known/skills/mintlify` | Skills served directly from `/.well-known/skills/index.json` on a website. Search using the site or docs URL. |
+| `github` | `openai/skills/k8s` | Direct GitHub repo/path installs and custom taps. |
+| `clawhub`, `lobehub`, `claude-marketplace` | Source-specific identifiers | Community or marketplace integrations. |
+
+### Integrated hubs and registries
+
+Hermes currently integrates with these skills ecosystems and discovery sources:
+
+#### 1. Official optional skills (`official`)
+
+These are maintained in the Hermes repository itself and install with builtin trust.
+
+- Catalog: [Official Optional Skills Catalog](../../reference/optional-skills-catalog)
+- Source in repo: `optional-skills/`
+- Example:
+
+```bash
+hermes skills browse --source official
+hermes skills install official/security/1password
+```
+
+#### 2. skills.sh (`skills-sh`)
+
+This is Vercel's public skills directory. Hermes can search it directly, inspect skill detail pages, resolve alias-style slugs, and install from the underlying source repo.
+
+- Directory: [skills.sh](https://skills.sh/)
+- CLI/tooling repo: [vercel-labs/skills](https://github.com/vercel-labs/skills)
+- Official Vercel skills repo: [vercel-labs/agent-skills](https://github.com/vercel-labs/agent-skills)
+- Example:
+
+```bash
+hermes skills search react --source skills-sh
+hermes skills inspect skills-sh/vercel-labs/json-render/json-render-react
+hermes skills install skills-sh/vercel-labs/json-render/json-render-react --force
+```
+
+#### 3. Well-known skill endpoints (`well-known`)
+
+This is URL-based discovery from sites that publish `/.well-known/skills/index.json`. It is not a single centralized hub — it is a web discovery convention.
+
+- Example live endpoint: [Mintlify docs skills index](https://mintlify.com/docs/.well-known/skills/index.json)
+- Reference server implementation: [vercel-labs/skills-handler](https://github.com/vercel-labs/skills-handler)
+- Example:
+
+```bash
+hermes skills search https://mintlify.com/docs --source well-known
+hermes skills inspect well-known:https://mintlify.com/docs/.well-known/skills/mintlify
+hermes skills install well-known:https://mintlify.com/docs/.well-known/skills/mintlify
+```
+
+#### 4. Direct GitHub skills (`github`)
+
+Hermes can install directly from GitHub repositories and GitHub-based taps. This is useful when you already know the repo/path or want to add your own custom source repo.
+
+- OpenAI skills: [openai/skills](https://github.com/openai/skills)
+- Anthropic skills: [anthropics/skills](https://github.com/anthropics/skills)
+- Example community tap source: [VoltAgent/awesome-agent-skills](https://github.com/VoltAgent/awesome-agent-skills)
+- Example:
+
+```bash
+hermes skills install openai/skills/k8s
+hermes skills tap add myorg/skills-repo
+```
+
+#### 5. ClawHub (`clawhub`)
+
+A third-party skills marketplace integrated as a community source.
+
+- Site: [clawhub.ai](https://clawhub.ai/)
+- Hermes source id: `clawhub`
+
+#### 6. Claude marketplace-style repos (`claude-marketplace`)
+
+Hermes supports marketplace repos that publish Claude-compatible plugin/marketplace manifests.
+
+Known integrated sources include:
+- [anthropics/skills](https://github.com/anthropics/skills)
+- [aiskillstore/marketplace](https://github.com/aiskillstore/marketplace)
+
+Hermes source id: `claude-marketplace`
+
+#### 7. LobeHub (`lobehub`)
+
+Hermes can search and convert agent entries from LobeHub's public catalog into installable Hermes skills.
+
+- Site: [LobeHub](https://lobehub.com/)
+- Public agents index: [chat-agents.lobehub.com](https://chat-agents.lobehub.com/)
+- Backing repo: [lobehub/lobe-chat-agents](https://github.com/lobehub/lobe-chat-agents)
+- Hermes source id: `lobehub`
+
+### Security scanning and `--force`
+
+All hub-installed skills go through a **security scanner** that checks for data exfiltration, prompt injection, destructive commands, supply-chain signals, and other threats.
+
+`hermes skills inspect ...` now also surfaces upstream metadata when available:
+- repo URL
+- skills.sh detail page URL
+- install command
+- weekly installs
+- upstream security audit statuses
+- well-known index/endpoint URLs
+
+Use `--force` when you have reviewed a third-party skill and want to override a non-dangerous policy block:
+
+```bash
+hermes skills install skills-sh/anthropics/skills/pdf --force
+```
+
+Important behavior:
+- `--force` can override policy blocks for caution/warn-style findings.
+- `--force` does **not** override a `dangerous` scan verdict.
+- Official optional skills (`official/...`) are treated as builtin trust and do not show the third-party warning panel.
+
+### Trust levels
+
+| Level | Source | Policy |
+|-------|--------|--------|
+| `builtin` | Ships with Hermes | Always trusted |
+| `official` | `optional-skills/` in the repo | Builtin trust, no third-party warning |
+| `trusted` | Trusted registries/repos such as `openai/skills`, `anthropics/skills` | More permissive policy than community sources |
+| `community` | Everything else (`skills.sh`, well-known endpoints, custom GitHub repos, most marketplaces) | Non-dangerous findings can be overridden with `--force`; `dangerous` verdicts stay blocked |
+
+### Update lifecycle
+
+The hub now tracks enough provenance to re-check upstream copies of installed skills:
+
+```bash
+hermes skills check          # Report which installed hub skills changed upstream
+hermes skills update         # Reinstall only the skills with updates available
+hermes skills update react   # Update one specific installed hub skill
+```
+
+This uses the stored source identifier plus the current upstream bundle content hash to detect drift.
+
+### Slash commands (inside chat)
+
+All the same commands work with `/skills`:
+
+```text
+/skills browse
+/skills search react --source skills-sh
+/skills search https://mintlify.com/docs --source well-known
+/skills inspect skills-sh/vercel-labs/json-render/json-render-react
+/skills install openai/skills/skill-creator --force
+/skills check
+/skills update
+/skills list
+```
+
+Official optional skills still use identifiers like `official/security/1password` and `official/migration/openclaw-migration`.
--- a/hermes_code/website/docs/user-guide/features/skins.md
+++ b/hermes_code/website/docs/user-guide/features/skins.md
@ -0,0 +1,81 @@
+---
+sidebar_position: 10
+title: "Skins & Themes"
+description: "Customize the Hermes CLI with built-in and user-defined skins"
+---
+
+# Skins & Themes
+
+Skins control the **visual presentation** of the Hermes CLI: banner colors, spinner faces and verbs, response-box labels, branding text, and the tool activity prefix.
+
+Conversational style and visual style are separate concepts:
+
+- **Personality** changes the agent's tone and wording.
+- **Skin** changes the CLI's appearance.
+
+## Change skins
+
+```bash
+/skin                # show the current skin and list available skins
+/skin ares           # switch to a built-in skin
+/skin mytheme        # switch to a custom skin from ~/.hermes/skins/mytheme.yaml
+```
+
+Or set the default skin in `~/.hermes/config.yaml`:
+
+```yaml
+display:
+  skin: default
+```
+
+## Built-in skins
+
+| Skin | Description | Agent branding |
+|------|-------------|----------------|
+| `default` | Classic Hermes — gold and kawaii | `Hermes Agent` |
+| `ares` | War-god theme — crimson and bronze | `Ares Agent` |
+| `mono` | Monochrome — clean grayscale | `Hermes Agent` |
+| `slate` | Cool blue — developer-focused | `Hermes Agent` |
+| `poseidon` | Ocean-god theme — deep blue and seafoam | `Poseidon Agent` |
+| `sisyphus` | Sisyphean theme — austere grayscale with persistence | `Sisyphus Agent` |
+| `charizard` | Volcanic theme — burnt orange and ember | `Charizard Agent` |
+
+## What a skin can customize
+
+| Area | Keys |
+|------|------|
+| Banner + response colors | `colors.banner_*`, `colors.response_border` |
+| Spinner animation | `spinner.waiting_faces`, `spinner.thinking_faces`, `spinner.thinking_verbs`, `spinner.wings` |
+| Branding text | `branding.agent_name`, `branding.welcome`, `branding.response_label`, `branding.prompt_symbol` |
+| Tool activity prefix | `tool_prefix` |
+
+## Custom skins
+
+Create YAML files under `~/.hermes/skins/`. User skins inherit missing values from the built-in `default` skin.
+
+```yaml
+name: cyberpunk
+description: Neon terminal theme
+
+colors:
+  banner_border: "#FF00FF"
+  banner_title: "#00FFFF"
+  banner_accent: "#FF1493"
+
+spinner:
+  thinking_verbs: ["jacking in", "decrypting", "uploading"]
+  wings:
+    - ["⟨⚡", "⚡⟩"]
+
+branding:
+  agent_name: "Cyber Agent"
+  response_label: " ⚡ Cyber "
+
+tool_prefix: "▏"
+```
+
+## Operational notes
+
+- Built-in skins load from `hermes_cli/skin_engine.py`.
+- Unknown skins automatically fall back to `default`.
+- `/skin` updates the active CLI theme immediately for the current session.
--- a/hermes_code/website/docs/user-guide/features/tools.md
+++ b/hermes_code/website/docs/user-guide/features/tools.md
@ -0,0 +1,165 @@
+---
+sidebar_position: 1
+title: "Tools & Toolsets"
+description: "Overview of Hermes Agent's tools — what's available, how toolsets work, and terminal backends"
+---
+
+# Tools & Toolsets
+
+Tools are functions that extend the agent's capabilities. They're organized into logical **toolsets** that can be enabled or disabled per platform.
+
+## Available Tools
+
+Hermes ships with a broad built-in tool registry covering web search, browser automation, terminal execution, file editing, memory, delegation, RL training, messaging delivery, Home Assistant, Honcho memory, and more.
+
+High-level categories:
+
+| Category | Examples | Description |
+|----------|----------|-------------|
+| **Web** | `web_search`, `web_extract` | Search the web and extract page content. |
+| **Terminal & Files** | `terminal`, `process`, `read_file`, `patch` | Execute commands and manipulate files. |
+| **Browser** | `browser_navigate`, `browser_snapshot`, `browser_vision` | Interactive browser automation with text and vision support. |
+| **Media** | `vision_analyze`, `image_generate`, `text_to_speech` | Multimodal analysis and generation. |
+| **Agent orchestration** | `todo`, `clarify`, `execute_code`, `delegate_task` | Planning, clarification, code execution, and subagent delegation. |
+| **Memory & recall** | `memory`, `session_search`, `honcho_*` | Persistent memory, session search, and Honcho cross-session context. |
+| **Automation & delivery** | `cronjob`, `send_message` | Scheduled tasks with create/list/update/pause/resume/run/remove actions, plus outbound messaging delivery. |
+| **Integrations** | `ha_*`, MCP server tools, `rl_*` | Home Assistant, MCP, RL training, and other integrations. |
+
+For the authoritative code-derived registry, see [Built-in Tools Reference](/docs/reference/tools-reference) and [Toolsets Reference](/docs/reference/toolsets-reference).
+
+## Using Toolsets
+
+```bash
+# Use specific toolsets
+hermes chat --toolsets "web,terminal"
+
+# See all available tools
+hermes tools
+
+# Configure tools per platform (interactive)
+hermes tools
+```
+
+Common toolsets include `web`, `terminal`, `file`, `browser`, `vision`, `image_gen`, `moa`, `skills`, `tts`, `todo`, `memory`, `session_search`, `cronjob`, `code_execution`, `delegation`, `clarify`, `honcho`, `homeassistant`, and `rl`.
+
+See [Toolsets Reference](/docs/reference/toolsets-reference) for the full set, including platform presets such as `hermes-cli`, `hermes-telegram`, and dynamic MCP toolsets like `mcp-<server>`.
+
+## Terminal Backends
+
+The terminal tool can execute commands in different environments:
+
+| Backend | Description | Use Case |
+|---------|-------------|----------|
+| `local` | Run on your machine (default) | Development, trusted tasks |
+| `docker` | Isolated containers | Security, reproducibility |
+| `ssh` | Remote server | Sandboxing, keep agent away from its own code |
+| `singularity` | HPC containers | Cluster computing, rootless |
+| `modal` | Cloud execution | Serverless, scale |
+| `daytona` | Cloud sandbox workspace | Persistent remote dev environments |
+
+### Configuration
+
+```yaml
+# In ~/.hermes/config.yaml
+terminal:
+  backend: local    # or: docker, ssh, singularity, modal, daytona
+  cwd: "."          # Working directory
+  timeout: 180      # Command timeout in seconds
+```
+
+### Docker Backend
+
+```yaml
+terminal:
+  backend: docker
+  docker_image: python:3.11-slim
+```
+
+### SSH Backend
+
+Recommended for security — agent can't modify its own code:
+
+```yaml
+terminal:
+  backend: ssh
+```
+```bash
+# Set credentials in ~/.hermes/.env
+TERMINAL_SSH_HOST=my-server.example.com
+TERMINAL_SSH_USER=myuser
+TERMINAL_SSH_KEY=~/.ssh/id_rsa
+```
+
+### Singularity/Apptainer
+
+```bash
+# Pre-build SIF for parallel workers
+apptainer build ~/python.sif docker://python:3.11-slim
+
+# Configure
+hermes config set terminal.backend singularity
+hermes config set terminal.singularity_image ~/python.sif
+```
+
+### Modal (Serverless Cloud)
+
+```bash
+uv pip install "swe-rex[modal]"
+modal setup
+hermes config set terminal.backend modal
+```
+
+### Container Resources
+
+Configure CPU, memory, disk, and persistence for all container backends:
+
+```yaml
+terminal:
+  backend: docker  # or singularity, modal, daytona
+  container_cpu: 1              # CPU cores (default: 1)
+  container_memory: 5120        # Memory in MB (default: 5GB)
+  container_disk: 51200         # Disk in MB (default: 50GB)
+  container_persistent: true    # Persist filesystem across sessions (default: true)
+```
+
+When `container_persistent: true`, installed packages, files, and config survive across sessions.
+
+### Container Security
+
+All container backends run with security hardening:
+
+- Read-only root filesystem (Docker)
+- All Linux capabilities dropped
+- No privilege escalation
+- PID limits (256 processes)
+- Full namespace isolation
+- Persistent workspace via volumes, not writable root layer
+
+Docker can optionally receive an explicit env allowlist via `terminal.docker_forward_env`, but forwarded variables are visible to commands inside the container and should be treated as exposed to that session.
+
+## Background Process Management
+
+Start background processes and manage them:
+
+```python
+terminal(command="pytest -v tests/", background=true)
+# Returns: {"session_id": "proc_abc123", "pid": 12345}
+
+# Then manage with the process tool:
+process(action="list")       # Show all running processes
+process(action="poll", session_id="proc_abc123")   # Check status
+process(action="wait", session_id="proc_abc123")   # Block until done
+process(action="log", session_id="proc_abc123")    # Full output
+process(action="kill", session_id="proc_abc123")   # Terminate
+process(action="write", session_id="proc_abc123", data="y")  # Send input
+```
+
+PTY mode (`pty=true`) enables interactive CLI tools like Codex and Claude Code.
+
+## Sudo Support
+
+If a command needs sudo, you'll be prompted for your password (cached for the session). Or set `SUDO_PASSWORD` in `~/.hermes/.env`.
+
+:::warning
+On messaging platforms, if sudo fails, the output includes a tip to add `SUDO_PASSWORD` to `~/.hermes/.env`.
+:::
--- a/hermes_code/website/docs/user-guide/features/tts.md
+++ b/hermes_code/website/docs/user-guide/features/tts.md
@ -0,0 +1,128 @@
+---
+sidebar_position: 9
+title: "Voice & TTS"
+description: "Text-to-speech and voice message transcription across all platforms"
+---
+
+# Voice & TTS
+
+Hermes Agent supports both text-to-speech output and voice message transcription across all messaging platforms.
+
+## Text-to-Speech
+
+Convert text to speech with four providers:
+
+| Provider | Quality | Cost | API Key |
+|----------|---------|------|---------|
+| **Edge TTS** (default) | Good | Free | None needed |
+| **ElevenLabs** | Excellent | Paid | `ELEVENLABS_API_KEY` |
+| **OpenAI TTS** | Good | Paid | `VOICE_TOOLS_OPENAI_KEY` |
+| **NeuTTS** | Good | Free | None needed |
+
+### Platform Delivery
+
+| Platform | Delivery | Format |
+|----------|----------|--------|
+| Telegram | Voice bubble (plays inline) | Opus `.ogg` |
+| Discord | Voice bubble (Opus/OGG), falls back to file attachment | Opus/MP3 |
+| WhatsApp | Audio file attachment | MP3 |
+| CLI | Saved to `~/.hermes/audio_cache/` | MP3 |
+
+### Configuration
+
+```yaml
+# In ~/.hermes/config.yaml
+tts:
+  provider: "edge"              # "edge" | "elevenlabs" | "openai" | "neutts"
+  edge:
+    voice: "en-US-AriaNeural"   # 322 voices, 74 languages
+  elevenlabs:
+    voice_id: "pNInz6obpgDQGcFmaJgB"  # Adam
+    model_id: "eleven_multilingual_v2"
+  openai:
+    model: "gpt-4o-mini-tts"
+    voice: "alloy"              # alloy, echo, fable, onyx, nova, shimmer
+    base_url: "https://api.openai.com/v1"  # Override for OpenAI-compatible TTS endpoints
+  neutts:
+    ref_audio: ''
+    ref_text: ''
+    model: neuphonic/neutts-air-q4-gguf
+    device: cpu
+```
+
+### Telegram Voice Bubbles & ffmpeg
+
+Telegram voice bubbles require Opus/OGG audio format:
+
+- **OpenAI and ElevenLabs** produce Opus natively — no extra setup
+- **Edge TTS** (default) outputs MP3 and needs **ffmpeg** to convert:
+- **NeuTTS** outputs WAV and also needs **ffmpeg** to convert for Telegram voice bubbles
+
+```bash
+# Ubuntu/Debian
+sudo apt install ffmpeg
+
+# macOS
+brew install ffmpeg
+
+# Fedora
+sudo dnf install ffmpeg
+```
+
+Without ffmpeg, Edge TTS and NeuTTS audio are sent as regular audio files (playable, but shown as a rectangular player instead of a voice bubble).
+
+:::tip
+If you want voice bubbles without installing ffmpeg, switch to the OpenAI or ElevenLabs provider.
+:::
+
+## Voice Message Transcription (STT)
+
+Voice messages sent on Telegram, Discord, WhatsApp, Slack, or Signal are automatically transcribed and injected as text into the conversation. The agent sees the transcript as normal text.
+
+| Provider | Quality | Cost | API Key |
+|----------|---------|------|---------| 
+| **Local Whisper** (default) | Good | Free | None needed |
+| **Groq Whisper API** | Good–Best | Free tier | `GROQ_API_KEY` |
+| **OpenAI Whisper API** | Good–Best | Paid | `VOICE_TOOLS_OPENAI_KEY` or `OPENAI_API_KEY` |
+
+:::info Zero Config
+Local transcription works out of the box when `faster-whisper` is installed. If that's unavailable, Hermes can also use a local `whisper` CLI from common install locations (like `/opt/homebrew/bin`) or a custom command via `HERMES_LOCAL_STT_COMMAND`.
+:::
+
+### Configuration
+
+```yaml
+# In ~/.hermes/config.yaml
+stt:
+  provider: "local"           # "local" | "groq" | "openai"
+  local:
+    model: "base"             # tiny, base, small, medium, large-v3
+  openai:
+    model: "whisper-1"        # whisper-1, gpt-4o-mini-transcribe, gpt-4o-transcribe
+```
+
+### Provider Details
+
+**Local (faster-whisper)** — Runs Whisper locally via [faster-whisper](https://github.com/SYSTRAN/faster-whisper). Uses CPU by default, GPU if available. Model sizes:
+
+| Model | Size | Speed | Quality |
+|-------|------|-------|---------|
+| `tiny` | ~75 MB | Fastest | Basic |
+| `base` | ~150 MB | Fast | Good (default) |
+| `small` | ~500 MB | Medium | Better |
+| `medium` | ~1.5 GB | Slower | Great |
+| `large-v3` | ~3 GB | Slowest | Best |
+
+**Groq API** — Requires `GROQ_API_KEY`. Good cloud fallback when you want a free hosted STT option.
+
+**OpenAI API** — Accepts `VOICE_TOOLS_OPENAI_KEY` first and falls back to `OPENAI_API_KEY`. Supports `whisper-1`, `gpt-4o-mini-transcribe`, and `gpt-4o-transcribe`.
+
+**Custom local CLI fallback** — Set `HERMES_LOCAL_STT_COMMAND` if you want Hermes to call a local transcription command directly. The command template supports `{input_path}`, `{output_dir}`, `{language}`, and `{model}` placeholders.
+
+### Fallback Behavior
+
+If your configured provider isn't available, Hermes automatically falls back:
+- **Local faster-whisper unavailable** → Tries a local `whisper` CLI or `HERMES_LOCAL_STT_COMMAND` before cloud providers
+- **Groq key not set** → Falls back to local transcription, then OpenAI
+- **OpenAI key not set** → Falls back to local transcription, then Groq
+- **Nothing available** → Voice messages pass through with an accurate note to the user
--- a/hermes_code/website/docs/user-guide/features/vision.md
+++ b/hermes_code/website/docs/user-guide/features/vision.md
@ -0,0 +1,187 @@
+---
+title: Vision & Image Paste
+description: Paste images from your clipboard into the Hermes CLI for multimodal vision analysis.
+sidebar_label: Vision & Image Paste
+sidebar_position: 7
+---
+
+# Vision & Image Paste
+
+Hermes Agent supports **multimodal vision** — you can paste images from your clipboard directly into the CLI and ask the agent to analyze, describe, or work with them. Images are sent to the model as base64-encoded content blocks, so any vision-capable model can process them.
+
+## How It Works
+
+1. Copy an image to your clipboard (screenshot, browser image, etc.)
+2. Attach it using one of the methods below
+3. Type your question and press Enter
+4. The image appears as a `[📎 Image #1]` badge above the input
+5. On submit, the image is sent to the model as a vision content block
+
+You can attach multiple images before sending — each gets its own badge. Press `Ctrl+C` to clear all attached images.
+
+Images are saved to `~/.hermes/images/` as PNG files with timestamped filenames.
+
+## Paste Methods
+
+How you attach an image depends on your terminal environment. Not all methods work everywhere — here's the full breakdown:
+
+### `/paste` Command
+
+**The most reliable method. Works everywhere.**
+
+```
+/paste
+```
+
+Type `/paste` and press Enter. Hermes checks your clipboard for an image and attaches it. This works in every environment because it explicitly calls the clipboard backend — no terminal keybinding interception to worry about.
+
+### Ctrl+V / Cmd+V (Bracketed Paste)
+
+When you paste text that's on the clipboard alongside an image, Hermes automatically checks for an image too. This works when:
+- Your clipboard contains **both text and an image** (some apps put both on the clipboard when you copy)
+- Your terminal supports bracketed paste (most modern terminals do)
+
+:::warning
+If your clipboard has **only an image** (no text), Ctrl+V does nothing in most terminals. Terminals can only paste text — there's no standard mechanism to paste binary image data. Use `/paste` or Alt+V instead.
+:::
+
+### Alt+V
+
+Alt key combinations pass through most terminal emulators (they're sent as ESC + key rather than being intercepted). Press `Alt+V` to check the clipboard for an image.
+
+:::caution
+**Does not work in VSCode's integrated terminal.** VSCode intercepts many Alt+key combos for its own UI. Use `/paste` instead.
+:::
+
+### Ctrl+V (Raw — Linux Only)
+
+On Linux desktop terminals (GNOME Terminal, Konsole, Alacritty, etc.), `Ctrl+V` is **not** the paste shortcut — `Ctrl+Shift+V` is. So `Ctrl+V` sends a raw byte to the application, and Hermes catches it to check the clipboard. This only works on Linux desktop terminals with X11 or Wayland clipboard access.
+
+## Platform Compatibility
+
+| Environment | `/paste` | Ctrl+V text+image | Alt+V | Notes |
+|---|:---:|:---:|:---:|---|
+| **macOS Terminal / iTerm2** | ✅ | ✅ | ✅ | Best experience — `osascript` always available |
+| **Linux X11 desktop** | ✅ | ✅ | ✅ | Requires `xclip` (`apt install xclip`) |
+| **Linux Wayland desktop** | ✅ | ✅ | ✅ | Requires `wl-paste` (`apt install wl-clipboard`) |
+| **WSL2 (Windows Terminal)** | ✅ | ✅¹ | ✅ | Uses `powershell.exe` — no extra install needed |
+| **VSCode Terminal (local)** | ✅ | ✅¹ | ❌ | VSCode intercepts Alt+key |
+| **VSCode Terminal (SSH)** | ❌² | ❌² | ❌ | Remote clipboard not accessible |
+| **SSH terminal (any)** | ❌² | ❌² | ❌² | Remote clipboard not accessible |
+
+¹ Only when clipboard has both text and an image (image-only clipboard = nothing happens)
+² See [SSH & Remote Sessions](#ssh--remote-sessions) below
+
+## Platform-Specific Setup
+
+### macOS
+
+**No setup required.** Hermes uses `osascript` (built into macOS) to read the clipboard. For faster performance, optionally install `pngpaste`:
+
+```bash
+brew install pngpaste
+```
+
+### Linux (X11)
+
+Install `xclip`:
+
+```bash
+# Ubuntu/Debian
+sudo apt install xclip
+
+# Fedora
+sudo dnf install xclip
+
+# Arch
+sudo pacman -S xclip
+```
+
+### Linux (Wayland)
+
+Modern Linux desktops (Ubuntu 22.04+, Fedora 34+) often use Wayland by default. Install `wl-clipboard`:
+
+```bash
+# Ubuntu/Debian
+sudo apt install wl-clipboard
+
+# Fedora
+sudo dnf install wl-clipboard
+
+# Arch
+sudo pacman -S wl-clipboard
+```
+
+:::tip How to check if you're on Wayland
+```bash
+echo $XDG_SESSION_TYPE
+# "wayland" = Wayland, "x11" = X11, "tty" = no display server
+```
+:::
+
+### WSL2
+
+**No extra setup required.** Hermes detects WSL2 automatically (via `/proc/version`) and uses `powershell.exe` to access the Windows clipboard through .NET's `System.Windows.Forms.Clipboard`. This is built into WSL2's Windows interop — `powershell.exe` is available by default.
+
+The clipboard data is transferred as base64-encoded PNG over stdout, so no file path conversion or temp files are needed.
+
+:::info WSLg Note
+If you're running WSLg (WSL2 with GUI support), Hermes tries the PowerShell path first, then falls back to `wl-paste`. WSLg's clipboard bridge only supports BMP format for images — Hermes auto-converts BMP to PNG using Pillow (if installed) or ImageMagick's `convert` command.
+:::
+
+#### Verify WSL2 clipboard access
+
+```bash
+# 1. Check WSL detection
+grep -i microsoft /proc/version
+
+# 2. Check PowerShell is accessible
+which powershell.exe
+
+# 3. Copy an image, then check
+powershell.exe -NoProfile -Command "Add-Type -AssemblyName System.Windows.Forms; [System.Windows.Forms.Clipboard]::ContainsImage()"
+# Should print "True"
+```
+
+## SSH & Remote Sessions
+
+**Clipboard paste does not work over SSH.** When you SSH into a remote machine, the Hermes CLI runs on the remote host. All clipboard tools (`xclip`, `wl-paste`, `powershell.exe`, `osascript`) read the clipboard of the machine they run on — which is the remote server, not your local machine. Your local clipboard is inaccessible from the remote side.
+
+### Workarounds for SSH
+
+1. **Upload the image file** — Save the image locally, upload it to the remote server via `scp`, VSCode's file explorer (drag-and-drop), or any file transfer method. Then reference it by path. *(A `/attach <filepath>` command is planned for a future release.)*
+
+2. **Use a URL** — If the image is accessible online, just paste the URL in your message. The agent can use `vision_analyze` to look at any image URL directly.
+
+3. **X11 forwarding** — Connect with `ssh -X` to forward X11. This lets `xclip` on the remote machine access your local X11 clipboard. Requires an X server running locally (XQuartz on macOS, built-in on Linux X11 desktops). Slow for large images.
+
+4. **Use a messaging platform** — Send images to Hermes via Telegram, Discord, Slack, or WhatsApp. These platforms handle image upload natively and are not affected by clipboard/terminal limitations.
+
+## Why Terminals Can't Paste Images
+
+This is a common source of confusion, so here's the technical explanation:
+
+Terminals are **text-based** interfaces. When you press Ctrl+V (or Cmd+V), the terminal emulator:
+
+1. Reads the clipboard for **text content**
+2. Wraps it in [bracketed paste](https://en.wikipedia.org/wiki/Bracketed-paste) escape sequences
+3. Sends it to the application through the terminal's text stream
+
+If the clipboard contains only an image (no text), the terminal has nothing to send. There is no standard terminal escape sequence for binary image data. The terminal simply does nothing.
+
+This is why Hermes uses a separate clipboard check — instead of receiving image data through the terminal paste event, it calls OS-level tools (`osascript`, `powershell.exe`, `xclip`, `wl-paste`) directly via subprocess to read the clipboard independently.
+
+## Supported Models
+
+Image paste works with any vision-capable model. The image is sent as a base64-encoded data URL in the OpenAI vision content format:
+
+```json
+{
+  "type": "image_url",
+  "image_url": {
+    "url": "data:image/png;base64,..."
+  }
+}
+```
+
+Most modern models support this format, including GPT-4 Vision, Claude (with vision), Gemini, and open-source multimodal models served through OpenRouter.
--- a/hermes_code/website/docs/user-guide/features/voice-mode.md
+++ b/hermes_code/website/docs/user-guide/features/voice-mode.md
@ -0,0 +1,508 @@
+---
+sidebar_position: 10
+title: "Voice Mode"
+description: "Real-time voice conversations with Hermes Agent — CLI, Telegram, Discord (DMs, text channels, and voice channels)"
+---
+
+# Voice Mode
+
+Hermes Agent supports full voice interaction across CLI and messaging platforms. Talk to the agent using your microphone, hear spoken replies, and have live voice conversations in Discord voice channels.
+
+If you want a practical setup walkthrough with recommended configurations and real usage patterns, see [Use Voice Mode with Hermes](/docs/guides/use-voice-mode-with-hermes).
+
+## Prerequisites
+
+Before using voice features, make sure you have:
+
+1. **Hermes Agent installed** — `pip install hermes-agent` (see [Installation](/docs/getting-started/installation))
+2. **An LLM provider configured** — run `hermes model` or set your preferred provider credentials in `~/.hermes/.env`
+3. **A working base setup** — run `hermes` to verify the agent responds to text before enabling voice
+
+:::tip
+The `~/.hermes/` directory and default `config.yaml` are created automatically the first time you run `hermes`. You only need to create `~/.hermes/.env` manually for API keys.
+:::
+
+## Overview
+
+| Feature | Platform | Description |
+|---------|----------|-------------|
+| **Interactive Voice** | CLI | Press Ctrl+B to record, agent auto-detects silence and responds |
+| **Auto Voice Reply** | Telegram, Discord | Agent sends spoken audio alongside text responses |
+| **Voice Channel** | Discord | Bot joins VC, listens to users speaking, speaks replies back |
+
+## Requirements
+
+### Python Packages
+
+```bash
+# CLI voice mode (microphone + audio playback)
+pip install "hermes-agent[voice]"
+
+# Discord + Telegram messaging (includes discord.py[voice] for VC support)
+pip install "hermes-agent[messaging]"
+
+# Premium TTS (ElevenLabs)
+pip install "hermes-agent[tts-premium]"
+
+# Local TTS (NeuTTS, optional)
+python -m pip install -U neutts[all]
+
+# Everything at once
+pip install "hermes-agent[all]"
+```
+
+| Extra | Packages | Required For |
+|-------|----------|-------------|
+| `voice` | `sounddevice`, `numpy` | CLI voice mode |
+| `messaging` | `discord.py[voice]`, `python-telegram-bot`, `aiohttp` | Discord & Telegram bots |
+| `tts-premium` | `elevenlabs` | ElevenLabs TTS provider |
+
+Optional local TTS provider: install `neutts` separately with `python -m pip install -U neutts[all]`. On first use it downloads the model automatically.
+
+:::info
+`discord.py[voice]` installs **PyNaCl** (for voice encryption) and **opus bindings** automatically. This is required for Discord voice channel support.
+:::
+
+### System Dependencies
+
+```bash
+# macOS
+brew install portaudio ffmpeg opus
+brew install espeak-ng   # for NeuTTS
+
+# Ubuntu/Debian
+sudo apt install portaudio19-dev ffmpeg libopus0
+sudo apt install espeak-ng   # for NeuTTS
+```
+
+| Dependency | Purpose | Required For |
+|-----------|---------|-------------|
+| **PortAudio** | Microphone input and audio playback | CLI voice mode |
+| **ffmpeg** | Audio format conversion (MP3 → Opus, PCM → WAV) | All platforms |
+| **Opus** | Discord voice codec | Discord voice channels |
+| **espeak-ng** | Phonemizer backend | Local NeuTTS provider |
+
+### API Keys
+
+Add to `~/.hermes/.env`:
+
+```bash
+# Speech-to-Text — local provider needs NO key at all
+# pip install faster-whisper          # Free, runs locally, recommended
+GROQ_API_KEY=your-key                 # Groq Whisper — fast, free tier (cloud)
+VOICE_TOOLS_OPENAI_KEY=your-key       # OpenAI Whisper — paid (cloud)
+
+# Text-to-Speech (optional — Edge TTS and NeuTTS work without any key)
+ELEVENLABS_API_KEY=***           # ElevenLabs — premium quality
+# VOICE_TOOLS_OPENAI_KEY above also enables OpenAI TTS
+```
+
+:::tip
+If `faster-whisper` is installed, voice mode works with **zero API keys** for STT. The model (~150 MB for `base`) downloads automatically on first use.
+:::
+
+---
+
+## CLI Voice Mode
+
+### Quick Start
+
+Start the CLI and enable voice mode:
+
+```bash
+hermes                # Start the interactive CLI
+```
+
+Then use these commands inside the CLI:
+
+```
+/voice          Toggle voice mode on/off
+/voice on       Enable voice mode
+/voice off      Disable voice mode
+/voice tts      Toggle TTS output
+/voice status   Show current state
+```
+
+### How It Works
+
+1. Start the CLI with `hermes` and enable voice mode with `/voice on`
+2. **Press Ctrl+B** — a beep plays (880Hz), recording starts
+3. **Speak** — a live audio level bar shows your input: `● [▁▂▃▅▇▇▅▂] ❯`
+4. **Stop speaking** — after 3 seconds of silence, recording auto-stops
+5. **Two beeps** play (660Hz) confirming the recording ended
+6. Audio is transcribed via Whisper and sent to the agent
+7. If TTS is enabled, the agent's reply is spoken aloud
+8. Recording **automatically restarts** — speak again without pressing any key
+
+This loop continues until you press **Ctrl+B** during recording (exits continuous mode) or 3 consecutive recordings detect no speech.
+
+:::tip
+The record key is configurable via `voice.record_key` in `~/.hermes/config.yaml` (default: `ctrl+b`).
+:::
+
+### Silence Detection
+
+Two-stage algorithm detects when you've finished speaking:
+
+1. **Speech confirmation** — waits for audio above the RMS threshold (200) for at least 0.3s, tolerating brief dips between syllables
+2. **End detection** — once speech is confirmed, triggers after 3.0 seconds of continuous silence
+
+If no speech is detected at all for 15 seconds, recording stops automatically.
+
+Both `silence_threshold` and `silence_duration` are configurable in `config.yaml`.
+
+### Streaming TTS
+
+When TTS is enabled, the agent speaks its reply **sentence-by-sentence** as it generates text — you don't wait for the full response:
+
+1. Buffers text deltas into complete sentences (min 20 chars)
+2. Strips markdown formatting and `<think>` blocks
+3. Generates and plays audio per sentence in real-time
+
+### Hallucination Filter
+
+Whisper sometimes generates phantom text from silence or background noise ("Thank you for watching", "Subscribe", etc.). The agent filters these out using a set of 26 known hallucination phrases across multiple languages, plus a regex pattern that catches repetitive variations.
+
+---
+
+## Gateway Voice Reply (Telegram & Discord)
+
+If you haven't set up your messaging bots yet, see the platform-specific guides:
+- [Telegram Setup Guide](../messaging/telegram.md)
+- [Discord Setup Guide](../messaging/discord.md)
+
+Start the gateway to connect to your messaging platforms:
+
+```bash
+hermes gateway        # Start the gateway (connects to configured platforms)
+hermes gateway setup  # Interactive setup wizard for first-time configuration
+```
+
+### Discord: Channels vs DMs
+
+The bot supports two interaction modes on Discord:
+
+| Mode | How to Talk | Mention Required | Setup |
+|------|------------|-----------------|-------|
+| **Direct Message (DM)** | Open the bot's profile → "Message" | No | Works immediately |
+| **Server Channel** | Type in a text channel where the bot is present | Yes (`@botname`) | Bot must be invited to the server |
+
+**DM (recommended for personal use):** Just open a DM with the bot and type — no @mention needed. Voice replies and all commands work the same as in channels.
+
+**Server channels:** The bot only responds when you @mention it (e.g. `@hermesbyt4 hello`). Make sure you select the **bot user** from the mention popup, not the role with the same name.
+
+:::tip
+To disable the mention requirement in server channels, add to `~/.hermes/.env`:
+```bash
+DISCORD_REQUIRE_MENTION=false
+```
+Or set specific channels as free-response (no mention needed):
+```bash
+DISCORD_FREE_RESPONSE_CHANNELS=123456789,987654321
+```
+:::
+
+### Commands
+
+These work in both Telegram and Discord (DMs and text channels):
+
+```
+/voice          Toggle voice mode on/off
+/voice on       Voice replies only when you send a voice message
+/voice tts      Voice replies for ALL messages
+/voice off      Disable voice replies
+/voice status   Show current setting
+```
+
+### Modes
+
+| Mode | Command | Behavior |
+|------|---------|----------|
+| `off` | `/voice off` | Text only (default) |
+| `voice_only` | `/voice on` | Speaks reply only when you send a voice message |
+| `all` | `/voice tts` | Speaks reply to every message |
+
+Voice mode setting is persisted across gateway restarts.
+
+### Platform Delivery
+
+| Platform | Format | Notes |
+|----------|--------|-------|
+| **Telegram** | Voice bubble (Opus/OGG) | Plays inline in chat. ffmpeg converts MP3 → Opus if needed |
+| **Discord** | Native voice bubble (Opus/OGG) | Plays inline like a user voice message. Falls back to file attachment if voice bubble API fails |
+
+---
+
+## Discord Voice Channels
+
+The most immersive voice feature: the bot joins a Discord voice channel, listens to users speaking, transcribes their speech, processes through the agent, and speaks the reply back in the voice channel.
+
+### Setup
+
+#### 1. Discord Bot Permissions
+
+If you already have a Discord bot set up for text (see [Discord Setup Guide](../messaging/discord.md)), you need to add voice permissions.
+
+Go to the [Discord Developer Portal](https://discord.com/developers/applications) → your application → **Installation** → **Default Install Settings** → **Guild Install**:
+
+**Add these permissions to the existing text permissions:**
+
+| Permission | Purpose | Required |
+|-----------|---------|----------|
+| **Connect** | Join voice channels | Yes |
+| **Speak** | Play TTS audio in voice channels | Yes |
+| **Use Voice Activity** | Detect when users are speaking | Recommended |
+
+**Updated Permissions Integer:**
+
+| Level | Integer | What's Included |
+|-------|---------|----------------|
+| Text only | `274878286912` | View Channels, Send Messages, Read History, Embeds, Attachments, Threads, Reactions |
+| Text + Voice | `274881432640` | All above + Connect, Speak |
+
+**Re-invite the bot** with the updated permissions URL:
+
+```
+https://discord.com/oauth2/authorize?client_id=YOUR_APP_ID&scope=bot+applications.commands&permissions=274881432640
+```
+
+Replace `YOUR_APP_ID` with your Application ID from the Developer Portal.
+
+:::warning
+Re-inviting the bot to a server it's already in will update its permissions without removing it. You won't lose any data or configuration.
+:::
+
+#### 2. Privileged Gateway Intents
+
+In the [Developer Portal](https://discord.com/developers/applications) → your application → **Bot** → **Privileged Gateway Intents**, enable all three:
+
+| Intent | Purpose |
+|--------|---------|
+| **Presence Intent** | Detect user online/offline status |
+| **Server Members Intent** | Map voice SSRC identifiers to Discord user IDs |
+| **Message Content Intent** | Read text message content in channels |
+
+All three are required for full voice channel functionality. **Server Members Intent** is especially critical — without it, the bot cannot identify who is speaking in the voice channel.
+
+#### 3. Opus Codec
+
+The Opus codec library must be installed on the machine running the gateway:
+
+```bash
+# macOS (Homebrew)
+brew install opus
+
+# Ubuntu/Debian
+sudo apt install libopus0
+```
+
+The bot auto-loads the codec from:
+- **macOS:** `/opt/homebrew/lib/libopus.dylib`
+- **Linux:** `libopus.so.0`
+
+#### 4. Environment Variables
+
+```bash
+# ~/.hermes/.env
+
+# Discord bot (already configured for text)
+DISCORD_BOT_TOKEN=your-bot-token
+DISCORD_ALLOWED_USERS=your-user-id
+
+# STT — local provider needs no key (pip install faster-whisper)
+# GROQ_API_KEY=your-key            # Alternative: cloud-based, fast, free tier
+
+# TTS — optional. Edge TTS and NeuTTS need no key.
+# ELEVENLABS_API_KEY=***      # Premium quality
+# VOICE_TOOLS_OPENAI_KEY=***  # OpenAI TTS / Whisper
+```
+
+### Start the Gateway
+
+```bash
+hermes gateway        # Start with existing configuration
+```
+
+The bot should come online in Discord within a few seconds.
+
+### Commands
+
+Use these in the Discord text channel where the bot is present:
+
+```
+/voice join      Bot joins your current voice channel
+/voice channel   Alias for /voice join
+/voice leave     Bot disconnects from voice channel
+/voice status    Show voice mode and connected channel
+```
+
+:::info
+You must be in a voice channel before running `/voice join`. The bot joins the same VC you're in.
+:::
+
+### How It Works
+
+When the bot joins a voice channel, it:
+
+1. **Listens** to each user's audio stream independently
+2. **Detects silence** — 1.5s of silence after at least 0.5s of speech triggers processing
+3. **Transcribes** the audio via Whisper STT (local, Groq, or OpenAI)
+4. **Processes** through the full agent pipeline (session, tools, memory)
+5. **Speaks** the reply back in the voice channel via TTS
+
+### Text Channel Integration
+
+When the bot is in a voice channel:
+
+- Transcripts appear in the text channel: `[Voice] @user: what you said`
+- Agent responses are sent as text in the channel AND spoken in the VC
+- The text channel is the one where `/voice join` was issued
+
+### Echo Prevention
+
+The bot automatically pauses its audio listener while playing TTS replies, preventing it from hearing and re-processing its own output.
+
+### Access Control
+
+Only users listed in `DISCORD_ALLOWED_USERS` can interact via voice. Other users' audio is silently ignored.
+
+```bash
+# ~/.hermes/.env
+DISCORD_ALLOWED_USERS=284102345871466496
+```
+
+---
+
+## Configuration Reference
+
+### config.yaml
+
+```yaml
+# Voice recording (CLI)
+voice:
+  record_key: "ctrl+b"            # Key to start/stop recording
+  max_recording_seconds: 120       # Maximum recording length
+  auto_tts: false                  # Auto-enable TTS when voice mode starts
+  silence_threshold: 200           # RMS level (0-32767) below which counts as silence
+  silence_duration: 3.0            # Seconds of silence before auto-stop
+
+# Speech-to-Text
+stt:
+  provider: "local"                  # "local" (free) | "groq" | "openai"
+  local:
+    model: "base"                    # tiny, base, small, medium, large-v3
+  # model: "whisper-1"              # Legacy: used when provider is not set
+
+# Text-to-Speech
+tts:
+  provider: "edge"                 # "edge" (free) | "elevenlabs" | "openai" | "neutts"
+  edge:
+    voice: "en-US-AriaNeural"      # 322 voices, 74 languages
+  elevenlabs:
+    voice_id: "pNInz6obpgDQGcFmaJgB"    # Adam
+    model_id: "eleven_multilingual_v2"
+  openai:
+    model: "gpt-4o-mini-tts"
+    voice: "alloy"                 # alloy, echo, fable, onyx, nova, shimmer
+    base_url: "https://api.openai.com/v1"  # optional: override for self-hosted or OpenAI-compatible endpoints
+  neutts:
+    ref_audio: ''
+    ref_text: ''
+    model: neuphonic/neutts-air-q4-gguf
+    device: cpu
+```
+
+### Environment Variables
+
+```bash
+# Speech-to-Text providers (local needs no key)
+# pip install faster-whisper        # Free local STT — no API key needed
+GROQ_API_KEY=...                    # Groq Whisper (fast, free tier)
+VOICE_TOOLS_OPENAI_KEY=...         # OpenAI Whisper (paid)
+
+# STT advanced overrides (optional)
+STT_GROQ_MODEL=whisper-large-v3-turbo    # Override default Groq STT model
+STT_OPENAI_MODEL=whisper-1               # Override default OpenAI STT model
+GROQ_BASE_URL=https://api.groq.com/openai/v1     # Custom Groq endpoint
+STT_OPENAI_BASE_URL=https://api.openai.com/v1    # Custom OpenAI STT endpoint
+
+# Text-to-Speech providers (Edge TTS and NeuTTS need no key)
+ELEVENLABS_API_KEY=***             # ElevenLabs (premium quality)
+# VOICE_TOOLS_OPENAI_KEY above also enables OpenAI TTS
+
+# Discord voice channel
+DISCORD_BOT_TOKEN=...
+DISCORD_ALLOWED_USERS=...
+```
+
+### STT Provider Comparison
+
+| Provider | Model | Speed | Quality | Cost | API Key |
+|----------|-------|-------|---------|------|---------|
+| **Local** | `base` | Fast (depends on CPU/GPU) | Good | Free | No |
+| **Local** | `small` | Medium | Better | Free | No |
+| **Local** | `large-v3` | Slow | Best | Free | No |
+| **Groq** | `whisper-large-v3-turbo` | Very fast (~0.5s) | Good | Free tier | Yes |
+| **Groq** | `whisper-large-v3` | Fast (~1s) | Better | Free tier | Yes |
+| **OpenAI** | `whisper-1` | Fast (~1s) | Good | Paid | Yes |
+| **OpenAI** | `gpt-4o-transcribe` | Medium (~2s) | Best | Paid | Yes |
+
+Provider priority (automatic fallback): **local** > **groq** > **openai**
+
+### TTS Provider Comparison
+
+| Provider | Quality | Cost | Latency | Key Required |
+|----------|---------|------|---------|-------------|
+| **Edge TTS** | Good | Free | ~1s | No |
+| **ElevenLabs** | Excellent | Paid | ~2s | Yes |
+| **OpenAI TTS** | Good | Paid | ~1.5s | Yes |
+| **NeuTTS** | Good | Free | Depends on CPU/GPU | No |
+
+NeuTTS uses the `tts.neutts` config block above.
+
+---
+
+## Troubleshooting
+
+### "No audio device found" (CLI)
+
+PortAudio is not installed:
+
+```bash
+brew install portaudio    # macOS
+sudo apt install portaudio19-dev  # Ubuntu
+```
+
+### Bot doesn't respond in Discord server channels
+
+The bot requires an @mention by default in server channels. Make sure you:
+
+1. Type `@` and select the **bot user** (with the #discriminator), not the **role** with the same name
+2. Or use DMs instead — no mention needed
+3. Or set `DISCORD_REQUIRE_MENTION=false` in `~/.hermes/.env`
+
+### Bot joins VC but doesn't hear me
+
+- Check your Discord user ID is in `DISCORD_ALLOWED_USERS`
+- Make sure you're not muted in Discord
+- The bot needs a SPEAKING event from Discord before it can map your audio — start speaking within a few seconds of joining
+
+### Bot hears me but doesn't respond
+
+- Verify STT is available: install `faster-whisper` (no key needed) or set `GROQ_API_KEY` / `VOICE_TOOLS_OPENAI_KEY`
+- Check the LLM model is configured and accessible
+- Review gateway logs: `tail -f ~/.hermes/logs/gateway.log`
+
+### Bot responds in text but not in voice channel
+
+- TTS provider may be failing — check API key and quota
+- Edge TTS (free, no key) is the default fallback
+- Check logs for TTS errors
+
+### Whisper returns garbage text
+
+The hallucination filter catches most cases automatically. If you're still getting phantom transcripts:
+
+- Use a quieter environment
+- Adjust `silence_threshold` in config (higher = less sensitive)
+- Try a different STT model