The architecture has been updated
This commit is contained in:
parent
805f7a017e
commit
a01257ead9
1119 changed files with 226 additions and 352 deletions
8
hermes_code/website/docs/user-guide/_category_.json
Normal file
8
hermes_code/website/docs/user-guide/_category_.json
Normal file
|
|
@ -0,0 +1,8 @@
|
|||
{
|
||||
"label": "User Guide",
|
||||
"position": 2,
|
||||
"link": {
|
||||
"type": "generated-index",
|
||||
"description": "Learn how to use Hermes Agent effectively."
|
||||
}
|
||||
}
|
||||
203
hermes_code/website/docs/user-guide/checkpoints-and-rollback.md
Normal file
203
hermes_code/website/docs/user-guide/checkpoints-and-rollback.md
Normal file
|
|
@ -0,0 +1,203 @@
|
|||
---
|
||||
sidebar_position: 8
|
||||
title: "Checkpoints and /rollback"
|
||||
description: "Filesystem safety nets for destructive operations using shadow git repos and automatic snapshots"
|
||||
---
|
||||
|
||||
# Checkpoints and `/rollback`
|
||||
|
||||
Hermes Agent automatically snapshots your project before **destructive operations** and lets you restore it with a single command. Checkpoints are **enabled by default** — there's zero cost when no file-mutating tools fire.
|
||||
|
||||
This safety net is powered by an internal **Checkpoint Manager** that keeps a separate shadow git repository under `~/.hermes/checkpoints/` — your real project `.git` is never touched.
|
||||
|
||||
## What Triggers a Checkpoint
|
||||
|
||||
Checkpoints are taken automatically before:
|
||||
|
||||
- **File tools** — `write_file` and `patch`
|
||||
- **Destructive terminal commands** — `rm`, `mv`, `sed -i`, `truncate`, `shred`, output redirects (`>`), and `git reset`/`clean`/`checkout`
|
||||
|
||||
The agent creates **at most one checkpoint per directory per turn**, so long-running sessions don't spam snapshots.
|
||||
|
||||
## Quick Reference
|
||||
|
||||
| Command | Description |
|
||||
|---------|-------------|
|
||||
| `/rollback` | List all checkpoints with change stats |
|
||||
| `/rollback <N>` | Restore to checkpoint N (also undoes last chat turn) |
|
||||
| `/rollback diff <N>` | Preview diff between checkpoint N and current state |
|
||||
| `/rollback <N> <file>` | Restore a single file from checkpoint N |
|
||||
|
||||
## How Checkpoints Work
|
||||
|
||||
At a high level:
|
||||
|
||||
- Hermes detects when tools are about to **modify files** in your working tree.
|
||||
- Once per conversation turn (per directory), it:
|
||||
- Resolves a reasonable project root for the file.
|
||||
- Initialises or reuses a **shadow git repo** tied to that directory.
|
||||
- Stages and commits the current state with a short, human‑readable reason.
|
||||
- These commits form a checkpoint history that you can inspect and restore via `/rollback`.
|
||||
|
||||
```mermaid
|
||||
flowchart LR
|
||||
user["User command\n(hermes, gateway)"]
|
||||
agent["AIAgent\n(run_agent.py)"]
|
||||
tools["File & terminal tools"]
|
||||
cpMgr["CheckpointManager"]
|
||||
shadowRepo["Shadow git repo\n~/.hermes/checkpoints/<hash>"]
|
||||
|
||||
user --> agent
|
||||
agent -->|"tool call"| tools
|
||||
tools -->|"before mutate\nensure_checkpoint()"| cpMgr
|
||||
cpMgr -->|"git add/commit"| shadowRepo
|
||||
cpMgr -->|"OK / skipped"| tools
|
||||
tools -->|"apply changes"| agent
|
||||
```
|
||||
|
||||
## Configuration
|
||||
|
||||
Checkpoints are enabled by default. Configure in `~/.hermes/config.yaml`:
|
||||
|
||||
```yaml
|
||||
checkpoints:
|
||||
enabled: true # master switch (default: true)
|
||||
max_snapshots: 50 # max checkpoints per directory
|
||||
```
|
||||
|
||||
To disable:
|
||||
|
||||
```yaml
|
||||
checkpoints:
|
||||
enabled: false
|
||||
```
|
||||
|
||||
When disabled, the Checkpoint Manager is a no‑op and never attempts git operations.
|
||||
|
||||
## Listing Checkpoints
|
||||
|
||||
From a CLI session:
|
||||
|
||||
```
|
||||
/rollback
|
||||
```
|
||||
|
||||
Hermes responds with a formatted list showing change statistics:
|
||||
|
||||
```text
|
||||
📸 Checkpoints for /path/to/project:
|
||||
|
||||
1. 4270a8c 2026-03-16 04:36 before patch (1 file, +1/-0)
|
||||
2. eaf4c1f 2026-03-16 04:35 before write_file
|
||||
3. b3f9d2e 2026-03-16 04:34 before terminal: sed -i s/old/new/ config.py (1 file, +1/-1)
|
||||
|
||||
/rollback <N> restore to checkpoint N
|
||||
/rollback diff <N> preview changes since checkpoint N
|
||||
/rollback <N> <file> restore a single file from checkpoint N
|
||||
```
|
||||
|
||||
Each entry shows:
|
||||
|
||||
- Short hash
|
||||
- Timestamp
|
||||
- Reason (what triggered the snapshot)
|
||||
- Change summary (files changed, insertions/deletions)
|
||||
|
||||
## Previewing Changes with `/rollback diff`
|
||||
|
||||
Before committing to a restore, preview what has changed since a checkpoint:
|
||||
|
||||
```
|
||||
/rollback diff 1
|
||||
```
|
||||
|
||||
This shows a git diff stat summary followed by the actual diff:
|
||||
|
||||
```text
|
||||
test.py | 2 +-
|
||||
1 file changed, 1 insertion(+), 1 deletion(-)
|
||||
|
||||
diff --git a/test.py b/test.py
|
||||
--- a/test.py
|
||||
+++ b/test.py
|
||||
@@ -1 +1 @@
|
||||
-print('original content')
|
||||
+print('modified content')
|
||||
```
|
||||
|
||||
Long diffs are capped at 80 lines to avoid flooding the terminal.
|
||||
|
||||
## Restoring with `/rollback`
|
||||
|
||||
Restore to a checkpoint by number:
|
||||
|
||||
```
|
||||
/rollback 1
|
||||
```
|
||||
|
||||
Behind the scenes, Hermes:
|
||||
|
||||
1. Verifies the target commit exists in the shadow repo.
|
||||
2. Takes a **pre‑rollback snapshot** of the current state so you can "undo the undo" later.
|
||||
3. Restores tracked files in your working directory.
|
||||
4. **Undoes the last conversation turn** so the agent's context matches the restored filesystem state.
|
||||
|
||||
On success:
|
||||
|
||||
```text
|
||||
✅ Restored to checkpoint 4270a8c5: before patch
|
||||
A pre-rollback snapshot was saved automatically.
|
||||
(^_^)b Undid 4 message(s). Removed: "Now update test.py to ..."
|
||||
4 message(s) remaining in history.
|
||||
Chat turn undone to match restored file state.
|
||||
```
|
||||
|
||||
The conversation undo ensures the agent doesn't "remember" changes that have been rolled back, avoiding confusion on the next turn.
|
||||
|
||||
## Single-File Restore
|
||||
|
||||
Restore just one file from a checkpoint without affecting the rest of the directory:
|
||||
|
||||
```
|
||||
/rollback 1 src/broken_file.py
|
||||
```
|
||||
|
||||
This is useful when the agent made changes to multiple files but only one needs to be reverted.
|
||||
|
||||
## Safety and Performance Guards
|
||||
|
||||
To keep checkpointing safe and fast, Hermes applies several guardrails:
|
||||
|
||||
- **Git availability** — if `git` is not found on `PATH`, checkpoints are transparently disabled.
|
||||
- **Directory scope** — Hermes skips overly broad directories (root `/`, home `$HOME`).
|
||||
- **Repository size** — directories with more than 50,000 files are skipped to avoid slow git operations.
|
||||
- **No‑change snapshots** — if there are no changes since the last snapshot, the checkpoint is skipped.
|
||||
- **Non‑fatal errors** — all errors inside the Checkpoint Manager are logged at debug level; your tools continue to run.
|
||||
|
||||
## Where Checkpoints Live
|
||||
|
||||
All shadow repos live under:
|
||||
|
||||
```text
|
||||
~/.hermes/checkpoints/
|
||||
├── <hash1>/ # shadow git repo for one working directory
|
||||
├── <hash2>/
|
||||
└── ...
|
||||
```
|
||||
|
||||
Each `<hash>` is derived from the absolute path of the working directory. Inside each shadow repo you'll find:
|
||||
|
||||
- Standard git internals (`HEAD`, `refs/`, `objects/`)
|
||||
- An `info/exclude` file containing a curated ignore list
|
||||
- A `HERMES_WORKDIR` file pointing back to the original project root
|
||||
|
||||
You normally never need to touch these manually.
|
||||
|
||||
## Best Practices
|
||||
|
||||
- **Leave checkpoints enabled** — they're on by default and have zero cost when no files are modified.
|
||||
- **Use `/rollback diff` before restoring** — preview what will change to pick the right checkpoint.
|
||||
- **Use `/rollback` instead of `git reset`** when you want to undo agent-driven changes only.
|
||||
- **Combine with Git worktrees** for maximum safety — keep each Hermes session in its own worktree/branch, with checkpoints as an extra layer.
|
||||
|
||||
For running multiple agents in parallel on the same repo, see the guide on [Git worktrees](./git-worktrees.md).
|
||||
349
hermes_code/website/docs/user-guide/cli.md
Normal file
349
hermes_code/website/docs/user-guide/cli.md
Normal file
|
|
@ -0,0 +1,349 @@
|
|||
---
|
||||
sidebar_position: 1
|
||||
title: "CLI Interface"
|
||||
description: "Master the Hermes Agent terminal interface — commands, keybindings, personalities, and more"
|
||||
---
|
||||
|
||||
# CLI Interface
|
||||
|
||||
Hermes Agent's CLI is a full terminal user interface (TUI) — not a web UI. It features multiline editing, slash-command autocomplete, conversation history, interrupt-and-redirect, and streaming tool output. Built for people who live in the terminal.
|
||||
|
||||
## Running the CLI
|
||||
|
||||
```bash
|
||||
# Start an interactive session (default)
|
||||
hermes
|
||||
|
||||
# Single query mode (non-interactive)
|
||||
hermes chat -q "Hello"
|
||||
|
||||
# With a specific model
|
||||
hermes chat --model "anthropic/claude-sonnet-4"
|
||||
|
||||
# With a specific provider
|
||||
hermes chat --provider nous # Use Nous Portal
|
||||
hermes chat --provider openrouter # Force OpenRouter
|
||||
|
||||
# With specific toolsets
|
||||
hermes chat --toolsets "web,terminal,skills"
|
||||
|
||||
# Start with one or more skills preloaded
|
||||
hermes -s hermes-agent-dev,github-auth
|
||||
hermes chat -s github-pr-workflow -q "open a draft PR"
|
||||
|
||||
# Resume previous sessions
|
||||
hermes --continue # Resume the most recent CLI session (-c)
|
||||
hermes --resume <session_id> # Resume a specific session by ID (-r)
|
||||
|
||||
# Verbose mode (debug output)
|
||||
hermes chat --verbose
|
||||
|
||||
# Isolated git worktree (for running multiple agents in parallel)
|
||||
hermes -w # Interactive mode in worktree
|
||||
hermes -w -q "Fix issue #123" # Single query in worktree
|
||||
```
|
||||
|
||||
## Interface Layout
|
||||
|
||||
<img className="docs-terminal-figure" src="/img/docs/cli-layout.svg" alt="Stylized preview of the Hermes CLI layout showing the banner, conversation area, and fixed input prompt." />
|
||||
<p className="docs-figure-caption">The Hermes CLI banner, conversation stream, and fixed input prompt rendered as a stable docs figure instead of fragile text art.</p>
|
||||
|
||||
The welcome banner shows your model, terminal backend, working directory, available tools, and installed skills at a glance.
|
||||
|
||||
### Status Bar
|
||||
|
||||
A persistent status bar sits above the input area, updating in real time:
|
||||
|
||||
```
|
||||
⚕ claude-sonnet-4-20250514 │ 12.4K/200K │ [██████░░░░] 6% │ $0.06 │ 15m
|
||||
```
|
||||
|
||||
| Element | Description |
|
||||
|---------|-------------|
|
||||
| Model name | Current model (truncated if longer than 26 chars) |
|
||||
| Token count | Context tokens used / max context window |
|
||||
| Context bar | Visual fill indicator with color-coded thresholds |
|
||||
| Cost | Estimated session cost (or `n/a` for unknown/zero-priced models) |
|
||||
| Duration | Elapsed session time |
|
||||
|
||||
The bar adapts to terminal width — full layout at ≥ 76 columns, compact at 52–75, minimal (model + duration only) below 52.
|
||||
|
||||
**Context color coding:**
|
||||
|
||||
| Color | Threshold | Meaning |
|
||||
|-------|-----------|---------|
|
||||
| Green | < 50% | Plenty of room |
|
||||
| Yellow | 50–80% | Getting full |
|
||||
| Orange | 80–95% | Approaching limit |
|
||||
| Red | ≥ 95% | Near overflow — consider `/compress` |
|
||||
|
||||
Use `/usage` for a detailed breakdown including per-category costs (input vs output tokens).
|
||||
|
||||
### Session Resume Display
|
||||
|
||||
When resuming a previous session (`hermes -c` or `hermes --resume <id>`), a "Previous Conversation" panel appears between the banner and the input prompt, showing a compact recap of the conversation history. See [Sessions — Conversation Recap on Resume](sessions.md#conversation-recap-on-resume) for details and configuration.
|
||||
|
||||
## Keybindings
|
||||
|
||||
| Key | Action |
|
||||
|-----|--------|
|
||||
| `Enter` | Send message |
|
||||
| `Alt+Enter` or `Ctrl+J` | New line (multi-line input) |
|
||||
| `Alt+V` | Paste an image from the clipboard when supported by the terminal |
|
||||
| `Ctrl+V` | Paste text and opportunistically attach clipboard images |
|
||||
| `Ctrl+B` | Start/stop voice recording when voice mode is enabled (`voice.record_key`, default: `ctrl+b`) |
|
||||
| `Ctrl+C` | Interrupt agent (double-press within 2s to force exit) |
|
||||
| `Ctrl+D` | Exit |
|
||||
| `Tab` | Accept auto-suggestion (ghost text) or autocomplete slash commands |
|
||||
|
||||
## Slash Commands
|
||||
|
||||
Type `/` to see the autocomplete dropdown. Hermes supports a large set of CLI slash commands, dynamic skill commands, and user-defined quick commands.
|
||||
|
||||
Common examples:
|
||||
|
||||
| Command | Description |
|
||||
|---------|-------------|
|
||||
| `/help` | Show command help |
|
||||
| `/model` | Show or change the current model |
|
||||
| `/tools` | List currently available tools |
|
||||
| `/skills browse` | Browse the skills hub and official optional skills |
|
||||
| `/background <prompt>` | Run a prompt in a separate background session |
|
||||
| `/skin` | Show or switch the active CLI skin |
|
||||
| `/voice on` | Enable CLI voice mode (press `Ctrl+B` to record) |
|
||||
| `/voice tts` | Toggle spoken playback for Hermes replies |
|
||||
| `/reasoning high` | Increase reasoning effort |
|
||||
| `/title My Session` | Name the current session |
|
||||
|
||||
For the full built-in CLI and messaging lists, see [Slash Commands Reference](../reference/slash-commands.md).
|
||||
|
||||
For setup, providers, silence tuning, and messaging/Discord voice usage, see [Voice Mode](features/voice-mode.md).
|
||||
|
||||
:::tip
|
||||
Commands are case-insensitive — `/HELP` works the same as `/help`. Installed skills also become slash commands automatically.
|
||||
:::
|
||||
|
||||
## Quick Commands
|
||||
|
||||
You can define custom commands that run shell commands instantly without invoking the LLM. These work in both the CLI and messaging platforms (Telegram, Discord, etc.).
|
||||
|
||||
```yaml
|
||||
# ~/.hermes/config.yaml
|
||||
quick_commands:
|
||||
status:
|
||||
type: exec
|
||||
command: systemctl status hermes-agent
|
||||
gpu:
|
||||
type: exec
|
||||
command: nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv,noheader
|
||||
```
|
||||
|
||||
Then type `/status` or `/gpu` in any chat. See the [Configuration guide](/docs/user-guide/configuration#quick-commands) for more examples.
|
||||
|
||||
## Preloading Skills at Launch
|
||||
|
||||
If you already know which skills you want active for the session, pass them at launch time:
|
||||
|
||||
```bash
|
||||
hermes -s hermes-agent-dev,github-auth
|
||||
hermes chat -s github-pr-workflow -s github-auth
|
||||
```
|
||||
|
||||
Hermes loads each named skill into the session prompt before the first turn. The same flag works in interactive mode and single-query mode.
|
||||
|
||||
## Skill Slash Commands
|
||||
|
||||
Every installed skill in `~/.hermes/skills/` is automatically registered as a slash command. The skill name becomes the command:
|
||||
|
||||
```
|
||||
/gif-search funny cats
|
||||
/axolotl help me fine-tune Llama 3 on my dataset
|
||||
/github-pr-workflow create a PR for the auth refactor
|
||||
|
||||
# Just the skill name loads it and lets the agent ask what you need:
|
||||
/excalidraw
|
||||
```
|
||||
|
||||
## Personalities
|
||||
|
||||
Set a predefined personality to change the agent's tone:
|
||||
|
||||
```
|
||||
/personality pirate
|
||||
/personality kawaii
|
||||
/personality concise
|
||||
```
|
||||
|
||||
Built-in personalities include: `helpful`, `concise`, `technical`, `creative`, `teacher`, `kawaii`, `catgirl`, `pirate`, `shakespeare`, `surfer`, `noir`, `uwu`, `philosopher`, `hype`.
|
||||
|
||||
You can also define custom personalities in `~/.hermes/config.yaml`:
|
||||
|
||||
```yaml
|
||||
personalities:
|
||||
helpful: "You are a helpful, friendly AI assistant."
|
||||
kawaii: "You are a kawaii assistant! Use cute expressions..."
|
||||
pirate: "Arrr! Ye be talkin' to Captain Hermes..."
|
||||
# Add your own!
|
||||
```
|
||||
|
||||
## Multi-line Input
|
||||
|
||||
There are two ways to enter multi-line messages:
|
||||
|
||||
1. **`Alt+Enter` or `Ctrl+J`** — inserts a new line
|
||||
2. **Backslash continuation** — end a line with `\` to continue:
|
||||
|
||||
```
|
||||
❯ Write a function that:\
|
||||
1. Takes a list of numbers\
|
||||
2. Returns the sum
|
||||
```
|
||||
|
||||
:::info
|
||||
Pasting multi-line text is supported — use `Alt+Enter` or `Ctrl+J` to insert newlines, or simply paste content directly.
|
||||
:::
|
||||
|
||||
## Interrupting the Agent
|
||||
|
||||
You can interrupt the agent at any point:
|
||||
|
||||
- **Type a new message + Enter** while the agent is working — it interrupts and processes your new instructions
|
||||
- **`Ctrl+C`** — interrupt the current operation (press twice within 2s to force exit)
|
||||
- In-progress terminal commands are killed immediately (SIGTERM, then SIGKILL after 1s)
|
||||
- Multiple messages typed during interrupt are combined into one prompt
|
||||
|
||||
## Tool Progress Display
|
||||
|
||||
The CLI shows animated feedback as the agent works:
|
||||
|
||||
**Thinking animation** (during API calls):
|
||||
```
|
||||
◜ (。•́︿•̀。) pondering... (1.2s)
|
||||
◠ (⊙_⊙) contemplating... (2.4s)
|
||||
✧٩(ˊᗜˋ*)و✧ got it! (3.1s)
|
||||
```
|
||||
|
||||
**Tool execution feed:**
|
||||
```
|
||||
┊ 💻 terminal `ls -la` (0.3s)
|
||||
┊ 🔍 web_search (1.2s)
|
||||
┊ 📄 web_extract (2.1s)
|
||||
```
|
||||
|
||||
Cycle through display modes with `/verbose`: `off → new → all → verbose`.
|
||||
|
||||
## Session Management
|
||||
|
||||
### Resuming Sessions
|
||||
|
||||
When you exit a CLI session, a resume command is printed:
|
||||
|
||||
```
|
||||
Resume this session with:
|
||||
hermes --resume 20260225_143052_a1b2c3
|
||||
|
||||
Session: 20260225_143052_a1b2c3
|
||||
Duration: 12m 34s
|
||||
Messages: 28 (5 user, 18 tool calls)
|
||||
```
|
||||
|
||||
Resume options:
|
||||
|
||||
```bash
|
||||
hermes --continue # Resume the most recent CLI session
|
||||
hermes -c # Short form
|
||||
hermes -c "my project" # Resume a named session (latest in lineage)
|
||||
hermes --resume 20260225_143052_a1b2c3 # Resume a specific session by ID
|
||||
hermes --resume "refactoring auth" # Resume by title
|
||||
hermes -r 20260225_143052_a1b2c3 # Short form
|
||||
```
|
||||
|
||||
Resuming restores the full conversation history from SQLite. The agent sees all previous messages, tool calls, and responses — just as if you never left.
|
||||
|
||||
Use `/title My Session Name` inside a chat to name the current session, or `hermes sessions rename <id> <title>` from the command line. Use `hermes sessions list` to browse past sessions.
|
||||
|
||||
### Session Storage
|
||||
|
||||
CLI sessions are stored in Hermes's SQLite state database under `~/.hermes/state.db`. The database keeps:
|
||||
|
||||
- session metadata (ID, title, timestamps, token counters)
|
||||
- message history
|
||||
- lineage across compressed/resumed sessions
|
||||
- full-text search indexes used by `session_search`
|
||||
|
||||
Some messaging adapters also keep per-platform transcript files alongside the database, but the CLI itself resumes from the SQLite session store.
|
||||
|
||||
### Context Compression
|
||||
|
||||
Long conversations are automatically summarized when approaching context limits:
|
||||
|
||||
```yaml
|
||||
# In ~/.hermes/config.yaml
|
||||
compression:
|
||||
enabled: true
|
||||
threshold: 0.50 # Compress at 50% of context limit by default
|
||||
summary_model: "google/gemini-3-flash-preview" # Model used for summarization
|
||||
```
|
||||
|
||||
When compression triggers, middle turns are summarized while the first 3 and last 4 turns are always preserved.
|
||||
|
||||
## Background Sessions
|
||||
|
||||
Run a prompt in a separate background session while continuing to use the CLI for other work:
|
||||
|
||||
```
|
||||
/background Analyze the logs in /var/log and summarize any errors from today
|
||||
```
|
||||
|
||||
Hermes immediately confirms the task and gives you back the prompt:
|
||||
|
||||
```
|
||||
🔄 Background task #1 started: "Analyze the logs in /var/log and summarize..."
|
||||
Task ID: bg_143022_a1b2c3
|
||||
```
|
||||
|
||||
### How It Works
|
||||
|
||||
Each `/background` prompt spawns a **completely separate agent session** in a daemon thread:
|
||||
|
||||
- **Isolated conversation** — the background agent has no knowledge of your current session's history. It receives only the prompt you provide.
|
||||
- **Same configuration** — the background agent inherits your model, provider, toolsets, reasoning settings, and fallback model from the current session.
|
||||
- **Non-blocking** — your foreground session stays fully interactive. You can chat, run commands, or even start more background tasks.
|
||||
- **Multiple tasks** — you can run several background tasks simultaneously. Each gets a numbered ID.
|
||||
|
||||
### Results
|
||||
|
||||
When a background task finishes, the result appears as a panel in your terminal:
|
||||
|
||||
```
|
||||
╭─ ⚕ Hermes (background #1) ──────────────────────────────────╮
|
||||
│ Found 3 errors in syslog from today: │
|
||||
│ 1. OOM killer invoked at 03:22 — killed process nginx │
|
||||
│ 2. Disk I/O error on /dev/sda1 at 07:15 │
|
||||
│ 3. Failed SSH login attempts from 192.168.1.50 at 14:30 │
|
||||
╰──────────────────────────────────────────────────────────────╯
|
||||
```
|
||||
|
||||
If the task fails, you'll see an error notification instead. If `display.bell_on_complete` is enabled in your config, the terminal bell rings when the task finishes.
|
||||
|
||||
### Use Cases
|
||||
|
||||
- **Long-running research** — "/background research the latest developments in quantum error correction" while you work on code
|
||||
- **File processing** — "/background analyze all Python files in this repo and list any security issues" while you continue a conversation
|
||||
- **Parallel investigations** — start multiple background tasks to explore different angles simultaneously
|
||||
|
||||
:::info
|
||||
Background sessions do not appear in your main conversation history. They are standalone sessions with their own task ID (e.g., `bg_143022_a1b2c3`).
|
||||
:::
|
||||
|
||||
## Quiet Mode
|
||||
|
||||
By default, the CLI runs in quiet mode which:
|
||||
- Suppresses verbose logging from tools
|
||||
- Enables kawaii-style animated feedback
|
||||
- Keeps output clean and user-friendly
|
||||
|
||||
For debug output:
|
||||
```bash
|
||||
hermes chat --verbose
|
||||
```
|
||||
1544
hermes_code/website/docs/user-guide/configuration.md
Normal file
1544
hermes_code/website/docs/user-guide/configuration.md
Normal file
File diff suppressed because it is too large
Load diff
|
|
@ -0,0 +1,8 @@
|
|||
{
|
||||
"label": "Features",
|
||||
"position": 4,
|
||||
"link": {
|
||||
"type": "generated-index",
|
||||
"description": "Explore the powerful features of Hermes Agent."
|
||||
}
|
||||
}
|
||||
197
hermes_code/website/docs/user-guide/features/acp.md
Normal file
197
hermes_code/website/docs/user-guide/features/acp.md
Normal file
|
|
@ -0,0 +1,197 @@
|
|||
---
|
||||
sidebar_position: 11
|
||||
title: "ACP Editor Integration"
|
||||
description: "Use Hermes Agent inside ACP-compatible editors such as VS Code, Zed, and JetBrains"
|
||||
---
|
||||
|
||||
# ACP Editor Integration
|
||||
|
||||
Hermes Agent can run as an ACP server, letting ACP-compatible editors talk to Hermes over stdio and render:
|
||||
|
||||
- chat messages
|
||||
- tool activity
|
||||
- file diffs
|
||||
- terminal commands
|
||||
- approval prompts
|
||||
- streamed thinking / response chunks
|
||||
|
||||
ACP is a good fit when you want Hermes to behave like an editor-native coding agent instead of a standalone CLI or messaging bot.
|
||||
|
||||
## What Hermes exposes in ACP mode
|
||||
|
||||
Hermes runs with a curated `hermes-acp` toolset designed for editor workflows. It includes:
|
||||
|
||||
- file tools: `read_file`, `write_file`, `patch`, `search_files`
|
||||
- terminal tools: `terminal`, `process`
|
||||
- web/browser tools
|
||||
- memory, todo, session search
|
||||
- skills
|
||||
- execute_code and delegate_task
|
||||
- vision
|
||||
|
||||
It intentionally excludes things that do not fit typical editor UX, such as messaging delivery and cronjob management.
|
||||
|
||||
## Installation
|
||||
|
||||
Install Hermes normally, then add the ACP extra:
|
||||
|
||||
```bash
|
||||
pip install -e '.[acp]'
|
||||
```
|
||||
|
||||
This installs the `agent-client-protocol` dependency and enables:
|
||||
|
||||
- `hermes acp`
|
||||
- `hermes-acp`
|
||||
- `python -m acp_adapter`
|
||||
|
||||
## Launching the ACP server
|
||||
|
||||
Any of the following starts Hermes in ACP mode:
|
||||
|
||||
```bash
|
||||
hermes acp
|
||||
```
|
||||
|
||||
```bash
|
||||
hermes-acp
|
||||
```
|
||||
|
||||
```bash
|
||||
python -m acp_adapter
|
||||
```
|
||||
|
||||
Hermes logs to stderr so stdout remains reserved for ACP JSON-RPC traffic.
|
||||
|
||||
## Editor setup
|
||||
|
||||
### VS Code
|
||||
|
||||
Install an ACP client extension, then point it at the repo's `acp_registry/` directory.
|
||||
|
||||
Example settings snippet:
|
||||
|
||||
```json
|
||||
{
|
||||
"acpClient.agents": [
|
||||
{
|
||||
"name": "hermes-agent",
|
||||
"registryDir": "/path/to/hermes-agent/acp_registry"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### Zed
|
||||
|
||||
Example settings snippet:
|
||||
|
||||
```json
|
||||
{
|
||||
"acp": {
|
||||
"agents": [
|
||||
{
|
||||
"name": "hermes-agent",
|
||||
"registry_dir": "/path/to/hermes-agent/acp_registry"
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### JetBrains
|
||||
|
||||
Use an ACP-compatible plugin and point it at:
|
||||
|
||||
```text
|
||||
/path/to/hermes-agent/acp_registry
|
||||
```
|
||||
|
||||
## Registry manifest
|
||||
|
||||
The ACP registry manifest lives at:
|
||||
|
||||
```text
|
||||
acp_registry/agent.json
|
||||
```
|
||||
|
||||
It advertises a command-based agent whose launch command is:
|
||||
|
||||
```text
|
||||
hermes acp
|
||||
```
|
||||
|
||||
## Configuration and credentials
|
||||
|
||||
ACP mode uses the same Hermes configuration as the CLI:
|
||||
|
||||
- `~/.hermes/.env`
|
||||
- `~/.hermes/config.yaml`
|
||||
- `~/.hermes/skills/`
|
||||
- `~/.hermes/state.db`
|
||||
|
||||
Provider resolution uses Hermes' normal runtime resolver, so ACP inherits the currently configured provider and credentials.
|
||||
|
||||
## Session behavior
|
||||
|
||||
ACP sessions are tracked by the ACP adapter's in-memory session manager while the server is running.
|
||||
|
||||
Each session stores:
|
||||
|
||||
- session ID
|
||||
- working directory
|
||||
- selected model
|
||||
- current conversation history
|
||||
- cancel event
|
||||
|
||||
The underlying `AIAgent` still uses Hermes' normal persistence/logging paths, but ACP `list/load/resume/fork` are scoped to the currently running ACP server process.
|
||||
|
||||
## Working directory behavior
|
||||
|
||||
ACP sessions bind the editor's cwd to the Hermes task ID so file and terminal tools run relative to the editor workspace, not the server process cwd.
|
||||
|
||||
## Approvals
|
||||
|
||||
Dangerous terminal commands can be routed back to the editor as approval prompts. ACP approval options are simpler than the CLI flow:
|
||||
|
||||
- allow once
|
||||
- allow always
|
||||
- deny
|
||||
|
||||
On timeout or error, the approval bridge denies the request.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### ACP agent does not appear in the editor
|
||||
|
||||
Check:
|
||||
|
||||
- the editor is pointed at the correct `acp_registry/` path
|
||||
- Hermes is installed and on your PATH
|
||||
- the ACP extra is installed (`pip install -e '.[acp]'`)
|
||||
|
||||
### ACP starts but immediately errors
|
||||
|
||||
Try these checks:
|
||||
|
||||
```bash
|
||||
hermes doctor
|
||||
hermes status
|
||||
hermes acp
|
||||
```
|
||||
|
||||
### Missing credentials
|
||||
|
||||
ACP mode does not have its own login flow. It uses Hermes' existing provider setup. Configure credentials with:
|
||||
|
||||
```bash
|
||||
hermes model
|
||||
```
|
||||
|
||||
or by editing `~/.hermes/.env`.
|
||||
|
||||
## See also
|
||||
|
||||
- [ACP Internals](../../developer-guide/acp-internals.md)
|
||||
- [Provider Runtime Resolution](../../developer-guide/provider-runtime.md)
|
||||
- [Tools Runtime](../../developer-guide/tools-runtime.md)
|
||||
236
hermes_code/website/docs/user-guide/features/api-server.md
Normal file
236
hermes_code/website/docs/user-guide/features/api-server.md
Normal file
|
|
@ -0,0 +1,236 @@
|
|||
---
|
||||
sidebar_position: 14
|
||||
title: "API Server"
|
||||
description: "Expose hermes-agent as an OpenAI-compatible API for any frontend"
|
||||
---
|
||||
|
||||
# API Server
|
||||
|
||||
The API server exposes hermes-agent as an OpenAI-compatible HTTP endpoint. Any frontend that speaks the OpenAI format — Open WebUI, LobeChat, LibreChat, NextChat, ChatBox, and hundreds more — can connect to hermes-agent and use it as a backend.
|
||||
|
||||
Your agent handles requests with its full toolset (terminal, file operations, web search, memory, skills) and returns the final response. Tool calls execute invisibly server-side.
|
||||
|
||||
## Quick Start
|
||||
|
||||
### 1. Enable the API server
|
||||
|
||||
Add to `~/.hermes/.env`:
|
||||
|
||||
```bash
|
||||
API_SERVER_ENABLED=true
|
||||
API_SERVER_KEY=change-me-local-dev
|
||||
# Optional: only if a browser must call Hermes directly
|
||||
# API_SERVER_CORS_ORIGINS=http://localhost:3000
|
||||
```
|
||||
|
||||
### 2. Start the gateway
|
||||
|
||||
```bash
|
||||
hermes gateway
|
||||
```
|
||||
|
||||
You'll see:
|
||||
|
||||
```
|
||||
[API Server] API server listening on http://127.0.0.1:8642
|
||||
```
|
||||
|
||||
### 3. Connect a frontend
|
||||
|
||||
Point any OpenAI-compatible client at `http://localhost:8642/v1`:
|
||||
|
||||
```bash
|
||||
# Test with curl
|
||||
curl http://localhost:8642/v1/chat/completions \
|
||||
-H "Authorization: Bearer change-me-local-dev" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"model": "hermes-agent", "messages": [{"role": "user", "content": "Hello!"}]}'
|
||||
```
|
||||
|
||||
Or connect Open WebUI, LobeChat, or any other frontend — see the [Open WebUI integration guide](/docs/user-guide/messaging/open-webui) for step-by-step instructions.
|
||||
|
||||
## Endpoints
|
||||
|
||||
### POST /v1/chat/completions
|
||||
|
||||
Standard OpenAI Chat Completions format. Stateless — the full conversation is included in each request via the `messages` array.
|
||||
|
||||
**Request:**
|
||||
```json
|
||||
{
|
||||
"model": "hermes-agent",
|
||||
"messages": [
|
||||
{"role": "system", "content": "You are a Python expert."},
|
||||
{"role": "user", "content": "Write a fibonacci function"}
|
||||
],
|
||||
"stream": false
|
||||
}
|
||||
```
|
||||
|
||||
**Response:**
|
||||
```json
|
||||
{
|
||||
"id": "chatcmpl-abc123",
|
||||
"object": "chat.completion",
|
||||
"created": 1710000000,
|
||||
"model": "hermes-agent",
|
||||
"choices": [{
|
||||
"index": 0,
|
||||
"message": {"role": "assistant", "content": "Here's a fibonacci function..."},
|
||||
"finish_reason": "stop"
|
||||
}],
|
||||
"usage": {"prompt_tokens": 50, "completion_tokens": 200, "total_tokens": 250}
|
||||
}
|
||||
```
|
||||
|
||||
**Streaming** (`"stream": true`): Returns Server-Sent Events (SSE) with token-by-token response chunks. When streaming is enabled in config, tokens are emitted live as the LLM generates them. When disabled, the full response is sent as a single SSE chunk.
|
||||
|
||||
### POST /v1/responses
|
||||
|
||||
OpenAI Responses API format. Supports server-side conversation state via `previous_response_id` — the server stores full conversation history (including tool calls and results) so multi-turn context is preserved without the client managing it.
|
||||
|
||||
**Request:**
|
||||
```json
|
||||
{
|
||||
"model": "hermes-agent",
|
||||
"input": "What files are in my project?",
|
||||
"instructions": "You are a helpful coding assistant.",
|
||||
"store": true
|
||||
}
|
||||
```
|
||||
|
||||
**Response:**
|
||||
```json
|
||||
{
|
||||
"id": "resp_abc123",
|
||||
"object": "response",
|
||||
"status": "completed",
|
||||
"model": "hermes-agent",
|
||||
"output": [
|
||||
{"type": "function_call", "name": "terminal", "arguments": "{\"command\": \"ls\"}", "call_id": "call_1"},
|
||||
{"type": "function_call_output", "call_id": "call_1", "output": "README.md src/ tests/"},
|
||||
{"type": "message", "role": "assistant", "content": [{"type": "output_text", "text": "Your project has..."}]}
|
||||
],
|
||||
"usage": {"input_tokens": 50, "output_tokens": 200, "total_tokens": 250}
|
||||
}
|
||||
```
|
||||
|
||||
#### Multi-turn with previous_response_id
|
||||
|
||||
Chain responses to maintain full context (including tool calls) across turns:
|
||||
|
||||
```json
|
||||
{
|
||||
"input": "Now show me the README",
|
||||
"previous_response_id": "resp_abc123"
|
||||
}
|
||||
```
|
||||
|
||||
The server reconstructs the full conversation from the stored response chain — all previous tool calls and results are preserved.
|
||||
|
||||
#### Named conversations
|
||||
|
||||
Use the `conversation` parameter instead of tracking response IDs:
|
||||
|
||||
```json
|
||||
{"input": "Hello", "conversation": "my-project"}
|
||||
{"input": "What's in src/?", "conversation": "my-project"}
|
||||
{"input": "Run the tests", "conversation": "my-project"}
|
||||
```
|
||||
|
||||
The server automatically chains to the latest response in that conversation. Like the `/title` command for gateway sessions.
|
||||
|
||||
### GET /v1/responses/\{id\}
|
||||
|
||||
Retrieve a previously stored response by ID.
|
||||
|
||||
### DELETE /v1/responses/\{id\}
|
||||
|
||||
Delete a stored response.
|
||||
|
||||
### GET /v1/models
|
||||
|
||||
Lists `hermes-agent` as an available model. Required by most frontends for model discovery.
|
||||
|
||||
### GET /health
|
||||
|
||||
Health check. Returns `{"status": "ok"}`.
|
||||
|
||||
## System Prompt Handling
|
||||
|
||||
When a frontend sends a `system` message (Chat Completions) or `instructions` field (Responses API), hermes-agent **layers it on top** of its core system prompt. Your agent keeps all its tools, memory, and skills — the frontend's system prompt adds extra instructions.
|
||||
|
||||
This means you can customize behavior per-frontend without losing capabilities:
|
||||
- Open WebUI system prompt: "You are a Python expert. Always include type hints."
|
||||
- The agent still has terminal, file tools, web search, memory, etc.
|
||||
|
||||
## Authentication
|
||||
|
||||
Bearer token auth via the `Authorization` header:
|
||||
|
||||
```
|
||||
Authorization: Bearer ***
|
||||
```
|
||||
|
||||
Configure the key via `API_SERVER_KEY` env var. If you need a browser to call Hermes directly, also set `API_SERVER_CORS_ORIGINS` to an explicit allowlist.
|
||||
|
||||
:::warning Security
|
||||
The API server gives full access to hermes-agent's toolset, **including terminal commands**. If you change the bind address to `0.0.0.0` (network-accessible), **always set `API_SERVER_KEY`** and keep `API_SERVER_CORS_ORIGINS` narrow — without that, remote callers may be able to execute arbitrary commands on your machine.
|
||||
|
||||
The default bind address (`127.0.0.1`) is for local-only use. Browser access is disabled by default; enable it only for explicit trusted origins.
|
||||
:::
|
||||
|
||||
## Configuration
|
||||
|
||||
### Environment Variables
|
||||
|
||||
| Variable | Default | Description |
|
||||
|----------|---------|-------------|
|
||||
| `API_SERVER_ENABLED` | `false` | Enable the API server |
|
||||
| `API_SERVER_PORT` | `8642` | HTTP server port |
|
||||
| `API_SERVER_HOST` | `127.0.0.1` | Bind address (localhost only by default) |
|
||||
| `API_SERVER_KEY` | _(none)_ | Bearer token for auth |
|
||||
| `API_SERVER_CORS_ORIGINS` | _(none)_ | Comma-separated allowed browser origins |
|
||||
|
||||
### config.yaml
|
||||
|
||||
```yaml
|
||||
# Not yet supported — use environment variables.
|
||||
# config.yaml support coming in a future release.
|
||||
```
|
||||
|
||||
## CORS
|
||||
|
||||
The API server does **not** enable browser CORS by default.
|
||||
|
||||
For direct browser access, set an explicit allowlist:
|
||||
|
||||
```bash
|
||||
API_SERVER_CORS_ORIGINS=http://localhost:3000,http://127.0.0.1:3000
|
||||
```
|
||||
|
||||
Most documented frontends such as Open WebUI connect server-to-server and do not need CORS at all.
|
||||
|
||||
## Compatible Frontends
|
||||
|
||||
Any frontend that supports the OpenAI API format works. Tested/documented integrations:
|
||||
|
||||
| Frontend | Stars | Connection |
|
||||
|----------|-------|------------|
|
||||
| [Open WebUI](/docs/user-guide/messaging/open-webui) | 126k | Full guide available |
|
||||
| LobeChat | 73k | Custom provider endpoint |
|
||||
| LibreChat | 34k | Custom endpoint in librechat.yaml |
|
||||
| AnythingLLM | 56k | Generic OpenAI provider |
|
||||
| NextChat | 87k | BASE_URL env var |
|
||||
| ChatBox | 39k | API Host setting |
|
||||
| Jan | 26k | Remote model config |
|
||||
| HF Chat-UI | 8k | OPENAI_BASE_URL |
|
||||
| big-AGI | 7k | Custom endpoint |
|
||||
| OpenAI Python SDK | — | `OpenAI(base_url="http://localhost:8642/v1")` |
|
||||
| curl | — | Direct HTTP requests |
|
||||
|
||||
## Limitations
|
||||
|
||||
- **Response storage** — stored responses (for `previous_response_id`) are persisted in SQLite and survive gateway restarts. Max 100 stored responses (LRU eviction).
|
||||
- **No file upload** — vision/document analysis via uploaded files is not yet supported through the API.
|
||||
- **Model field is cosmetic** — the `model` field in requests is accepted but the actual LLM model used is configured server-side in config.yaml.
|
||||
226
hermes_code/website/docs/user-guide/features/batch-processing.md
Normal file
226
hermes_code/website/docs/user-guide/features/batch-processing.md
Normal file
|
|
@ -0,0 +1,226 @@
|
|||
---
|
||||
sidebar_position: 12
|
||||
title: "Batch Processing"
|
||||
description: "Generate agent trajectories at scale — parallel processing, checkpointing, and toolset distributions"
|
||||
---
|
||||
|
||||
# Batch Processing
|
||||
|
||||
Batch processing lets you run the Hermes agent across hundreds or thousands of prompts in parallel, generating structured trajectory data. This is primarily used for **training data generation** — producing ShareGPT-format trajectories with tool usage statistics that can be used for fine-tuning or evaluation.
|
||||
|
||||
## Overview
|
||||
|
||||
The batch runner (`batch_runner.py`) processes a JSONL dataset of prompts, running each through a full agent session with tool access. Each prompt gets its own isolated environment. The output is structured trajectory data with full conversation history, tool call statistics, and reasoning coverage metrics.
|
||||
|
||||
## Quick Start
|
||||
|
||||
```bash
|
||||
# Basic batch run
|
||||
python batch_runner.py \
|
||||
--dataset_file=data/prompts.jsonl \
|
||||
--batch_size=10 \
|
||||
--run_name=my_first_run \
|
||||
--model=anthropic/claude-sonnet-4-20250514 \
|
||||
--num_workers=4
|
||||
|
||||
# Resume an interrupted run
|
||||
python batch_runner.py \
|
||||
--dataset_file=data/prompts.jsonl \
|
||||
--batch_size=10 \
|
||||
--run_name=my_first_run \
|
||||
--resume
|
||||
|
||||
# List available toolset distributions
|
||||
python batch_runner.py --list_distributions
|
||||
```
|
||||
|
||||
## Dataset Format
|
||||
|
||||
The input dataset is a JSONL file (one JSON object per line). Each entry must have a `prompt` field:
|
||||
|
||||
```jsonl
|
||||
{"prompt": "Write a Python function that finds the longest palindromic substring"}
|
||||
{"prompt": "Create a REST API endpoint for user authentication using Flask"}
|
||||
{"prompt": "Debug this error: TypeError: cannot unpack non-iterable NoneType object"}
|
||||
```
|
||||
|
||||
Entries can optionally include:
|
||||
- `image` or `docker_image`: A container image to use for this prompt's sandbox (works with Docker, Modal, and Singularity backends)
|
||||
- `cwd`: Working directory override for the task's terminal session
|
||||
|
||||
## Configuration Options
|
||||
|
||||
| Parameter | Default | Description |
|
||||
|-----------|---------|-------------|
|
||||
| `--dataset_file` | (required) | Path to JSONL dataset |
|
||||
| `--batch_size` | (required) | Prompts per batch |
|
||||
| `--run_name` | (required) | Name for this run (used for output dir and checkpointing) |
|
||||
| `--distribution` | `"default"` | Toolset distribution to sample from |
|
||||
| `--model` | `claude-sonnet-4-20250514` | Model to use |
|
||||
| `--base_url` | `https://openrouter.ai/api/v1` | API base URL |
|
||||
| `--api_key` | (env var) | API key for model |
|
||||
| `--max_turns` | `10` | Maximum tool-calling iterations per prompt |
|
||||
| `--num_workers` | `4` | Parallel worker processes |
|
||||
| `--resume` | `false` | Resume from checkpoint |
|
||||
| `--verbose` | `false` | Enable verbose logging |
|
||||
| `--max_samples` | all | Only process first N samples from dataset |
|
||||
| `--max_tokens` | model default | Maximum tokens per model response |
|
||||
|
||||
### Provider Routing (OpenRouter)
|
||||
|
||||
| Parameter | Description |
|
||||
|-----------|-------------|
|
||||
| `--providers_allowed` | Comma-separated providers to allow (e.g., `"anthropic,openai"`) |
|
||||
| `--providers_ignored` | Comma-separated providers to ignore (e.g., `"together,deepinfra"`) |
|
||||
| `--providers_order` | Comma-separated preferred provider order |
|
||||
| `--provider_sort` | Sort by `"price"`, `"throughput"`, or `"latency"` |
|
||||
|
||||
### Reasoning Control
|
||||
|
||||
| Parameter | Description |
|
||||
|-----------|-------------|
|
||||
| `--reasoning_effort` | Effort level: `xhigh`, `high`, `medium`, `low`, `minimal`, `none` |
|
||||
| `--reasoning_disabled` | Completely disable reasoning/thinking tokens |
|
||||
|
||||
### Advanced Options
|
||||
|
||||
| Parameter | Description |
|
||||
|-----------|-------------|
|
||||
| `--ephemeral_system_prompt` | System prompt used during execution but NOT saved to trajectories |
|
||||
| `--log_prefix_chars` | Characters to show in log previews (default: 100) |
|
||||
| `--prefill_messages_file` | Path to JSON file with prefill messages for few-shot priming |
|
||||
|
||||
## Toolset Distributions
|
||||
|
||||
Each prompt gets a randomly sampled set of toolsets from a **distribution**. This ensures training data covers diverse tool combinations. Use `--list_distributions` to see all available distributions.
|
||||
|
||||
In the current implementation, distributions assign a probability to **each individual toolset**. The sampler flips each toolset independently, then guarantees that at least one toolset is enabled. This is different from a hand-authored table of prebuilt combinations.
|
||||
|
||||
## Output Format
|
||||
|
||||
All output goes to `data/<run_name>/`:
|
||||
|
||||
```text
|
||||
data/my_run/
|
||||
├── trajectories.jsonl # Combined final output (all batches merged)
|
||||
├── batch_0.jsonl # Individual batch results
|
||||
├── batch_1.jsonl
|
||||
├── ...
|
||||
├── checkpoint.json # Resume checkpoint
|
||||
└── statistics.json # Aggregate tool usage stats
|
||||
```
|
||||
|
||||
### Trajectory Format
|
||||
|
||||
Each line in `trajectories.jsonl` is a JSON object:
|
||||
|
||||
```json
|
||||
{
|
||||
"prompt_index": 42,
|
||||
"conversations": [
|
||||
{"from": "human", "value": "Write a function..."},
|
||||
{"from": "gpt", "value": "I'll create that function...",
|
||||
"tool_calls": [...]},
|
||||
{"from": "tool", "value": "..."},
|
||||
{"from": "gpt", "value": "Here's the completed function..."}
|
||||
],
|
||||
"metadata": {
|
||||
"batch_num": 2,
|
||||
"timestamp": "2026-01-15T10:30:00",
|
||||
"model": "anthropic/claude-sonnet-4-20250514"
|
||||
},
|
||||
"completed": true,
|
||||
"partial": false,
|
||||
"api_calls": 3,
|
||||
"toolsets_used": ["terminal", "file"],
|
||||
"tool_stats": {
|
||||
"terminal": {"count": 2, "success": 2, "failure": 0},
|
||||
"read_file": {"count": 1, "success": 1, "failure": 0}
|
||||
},
|
||||
"tool_error_counts": {
|
||||
"terminal": 0,
|
||||
"read_file": 0
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
The `conversations` field uses a ShareGPT-like format with `from` and `value` fields. Tool stats are normalized to include all possible tools with zero defaults, ensuring consistent schema across entries for HuggingFace datasets compatibility.
|
||||
|
||||
## Checkpointing
|
||||
|
||||
The batch runner has robust checkpointing for fault tolerance:
|
||||
|
||||
- **Checkpoint file:** Saved after each batch completes, tracking which prompt indices are done
|
||||
- **Content-based resume:** On `--resume`, the runner scans existing batch files and matches completed prompts by their actual text content (not just indices), enabling recovery even if the dataset order changes
|
||||
- **Failed prompts:** Only successfully completed prompts are marked as done — failed prompts will be retried on resume
|
||||
- **Batch merging:** On completion, all batch files (including from previous runs) are merged into a single `trajectories.jsonl`
|
||||
|
||||
### How Resume Works
|
||||
|
||||
1. Scan all `batch_*.jsonl` files for completed prompts (by content matching)
|
||||
2. Filter the dataset to exclude already-completed prompts
|
||||
3. Re-batch the remaining prompts
|
||||
4. Process only the remaining prompts
|
||||
5. Merge all batch files (old + new) into final output
|
||||
|
||||
## Quality Filtering
|
||||
|
||||
The batch runner applies automatic quality filtering:
|
||||
|
||||
- **No-reasoning filter:** Samples where zero assistant turns contain reasoning (no `<REASONING_SCRATCHPAD>` or native thinking tokens) are discarded
|
||||
- **Corrupted entry filter:** Entries with hallucinated tool names (not in the valid tool list) are filtered out during the final merge
|
||||
- **Reasoning statistics:** Tracks percentage of turns with/without reasoning across the entire run
|
||||
|
||||
## Statistics
|
||||
|
||||
After completion, the runner prints comprehensive statistics:
|
||||
|
||||
- **Tool usage:** Call counts, success/failure rates per tool
|
||||
- **Reasoning coverage:** Percentage of assistant turns with reasoning
|
||||
- **Samples discarded:** Count of samples filtered for lacking reasoning
|
||||
- **Duration:** Total processing time
|
||||
|
||||
Statistics are also saved to `statistics.json` for programmatic analysis.
|
||||
|
||||
## Use Cases
|
||||
|
||||
### Training Data Generation
|
||||
|
||||
Generate diverse tool-use trajectories for fine-tuning:
|
||||
|
||||
```bash
|
||||
python batch_runner.py \
|
||||
--dataset_file=data/coding_prompts.jsonl \
|
||||
--batch_size=20 \
|
||||
--run_name=coding_v1 \
|
||||
--model=anthropic/claude-sonnet-4-20250514 \
|
||||
--num_workers=8 \
|
||||
--distribution=default \
|
||||
--max_turns=15
|
||||
```
|
||||
|
||||
### Model Evaluation
|
||||
|
||||
Evaluate how well a model uses tools across standardized prompts:
|
||||
|
||||
```bash
|
||||
python batch_runner.py \
|
||||
--dataset_file=data/eval_suite.jsonl \
|
||||
--batch_size=10 \
|
||||
--run_name=eval_gpt4 \
|
||||
--model=openai/gpt-4o \
|
||||
--num_workers=4 \
|
||||
--max_turns=10
|
||||
```
|
||||
|
||||
### Per-Prompt Container Images
|
||||
|
||||
For benchmarks requiring specific environments, each prompt can specify its own container image:
|
||||
|
||||
```jsonl
|
||||
{"prompt": "Install numpy and compute eigenvalues of a 3x3 matrix", "image": "python:3.11-slim"}
|
||||
{"prompt": "Compile this Rust program and run it", "image": "rust:1.75"}
|
||||
{"prompt": "Set up a Node.js Express server", "image": "node:20-alpine", "cwd": "/app"}
|
||||
```
|
||||
|
||||
The batch runner verifies Docker images are accessible before running each prompt.
|
||||
281
hermes_code/website/docs/user-guide/features/browser.md
Normal file
281
hermes_code/website/docs/user-guide/features/browser.md
Normal file
|
|
@ -0,0 +1,281 @@
|
|||
---
|
||||
title: Browser Automation
|
||||
description: Control browsers with multiple providers, local Chrome via CDP, or cloud browsers for web interaction, form filling, scraping, and more.
|
||||
sidebar_label: Browser
|
||||
sidebar_position: 5
|
||||
---
|
||||
|
||||
# Browser Automation
|
||||
|
||||
Hermes Agent includes a full browser automation toolset with multiple backend options:
|
||||
|
||||
- **Browserbase cloud mode** via [Browserbase](https://browserbase.com) for managed cloud browsers and anti-bot tooling
|
||||
- **Browser Use cloud mode** via [Browser Use](https://browser-use.com) as an alternative cloud browser provider
|
||||
- **Local Chrome via CDP** — connect browser tools to your own Chrome instance using `/browser connect`
|
||||
- **Local browser mode** via the `agent-browser` CLI and a local Chromium installation
|
||||
|
||||
In all modes, the agent can navigate websites, interact with page elements, fill forms, and extract information.
|
||||
|
||||
## Overview
|
||||
|
||||
Pages are represented as **accessibility trees** (text-based snapshots), making them ideal for LLM agents. Interactive elements get ref IDs (like `@e1`, `@e2`) that the agent uses for clicking and typing.
|
||||
|
||||
Key capabilities:
|
||||
|
||||
- **Multi-provider cloud execution** — Browserbase or Browser Use, no local browser needed
|
||||
- **Local Chrome integration** — attach to your running Chrome via CDP for hands-on browsing
|
||||
- **Built-in stealth** — random fingerprints, CAPTCHA solving, residential proxies (Browserbase)
|
||||
- **Session isolation** — each task gets its own browser session
|
||||
- **Automatic cleanup** — inactive sessions are closed after a timeout
|
||||
- **Vision analysis** — screenshot + AI analysis for visual understanding
|
||||
|
||||
## Setup
|
||||
|
||||
### Browserbase cloud mode
|
||||
|
||||
To use Browserbase-managed cloud browsers, add:
|
||||
|
||||
```bash
|
||||
# Add to ~/.hermes/.env
|
||||
BROWSERBASE_API_KEY=***
|
||||
BROWSERBASE_PROJECT_ID=your-project-id-here
|
||||
```
|
||||
|
||||
Get your credentials at [browserbase.com](https://browserbase.com).
|
||||
|
||||
### Browser Use cloud mode
|
||||
|
||||
To use Browser Use as your cloud browser provider, add:
|
||||
|
||||
```bash
|
||||
# Add to ~/.hermes/.env
|
||||
BROWSER_USE_API_KEY=***
|
||||
```
|
||||
|
||||
Get your API key at [browser-use.com](https://browser-use.com). Browser Use provides a cloud browser via its REST API. If both Browserbase and Browser Use credentials are set, Browserbase takes priority.
|
||||
|
||||
### Local Chrome via CDP (`/browser connect`)
|
||||
|
||||
Instead of a cloud provider, you can attach Hermes browser tools to your own running Chrome instance via the Chrome DevTools Protocol (CDP). This is useful when you want to see what the agent is doing in real-time, interact with pages that require your own cookies/sessions, or avoid cloud browser costs.
|
||||
|
||||
In the CLI, use:
|
||||
|
||||
```
|
||||
/browser connect # Connect to Chrome at ws://localhost:9222
|
||||
/browser connect ws://host:port # Connect to a specific CDP endpoint
|
||||
/browser status # Check current connection
|
||||
/browser disconnect # Detach and return to cloud/local mode
|
||||
```
|
||||
|
||||
If Chrome isn't already running with remote debugging, Hermes will attempt to auto-launch it with `--remote-debugging-port=9222`.
|
||||
|
||||
:::tip
|
||||
To start Chrome manually with CDP enabled:
|
||||
```bash
|
||||
# Linux
|
||||
google-chrome --remote-debugging-port=9222
|
||||
|
||||
# macOS
|
||||
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --remote-debugging-port=9222
|
||||
```
|
||||
:::
|
||||
|
||||
When connected via CDP, all browser tools (`browser_navigate`, `browser_click`, etc.) operate on your live Chrome instance instead of spinning up a cloud session.
|
||||
|
||||
### Local browser mode
|
||||
|
||||
If you do **not** set any cloud credentials and don't use `/browser connect`, Hermes can still use the browser tools through a local Chromium install driven by `agent-browser`.
|
||||
|
||||
### Optional Environment Variables
|
||||
|
||||
```bash
|
||||
# Residential proxies for better CAPTCHA solving (default: "true")
|
||||
BROWSERBASE_PROXIES=true
|
||||
|
||||
# Advanced stealth with custom Chromium — requires Scale Plan (default: "false")
|
||||
BROWSERBASE_ADVANCED_STEALTH=false
|
||||
|
||||
# Session reconnection after disconnects — requires paid plan (default: "true")
|
||||
BROWSERBASE_KEEP_ALIVE=true
|
||||
|
||||
# Custom session timeout in milliseconds (default: project default)
|
||||
# Examples: 600000 (10min), 1800000 (30min)
|
||||
BROWSERBASE_SESSION_TIMEOUT=600000
|
||||
|
||||
# Inactivity timeout before auto-cleanup in seconds (default: 300)
|
||||
BROWSER_INACTIVITY_TIMEOUT=300
|
||||
```
|
||||
|
||||
### Install agent-browser CLI
|
||||
|
||||
```bash
|
||||
npm install -g agent-browser
|
||||
# Or install locally in the repo:
|
||||
npm install
|
||||
```
|
||||
|
||||
:::info
|
||||
The `browser` toolset must be included in your config's `toolsets` list or enabled via `hermes config set toolsets '["hermes-cli", "browser"]'`.
|
||||
:::
|
||||
|
||||
## Available Tools
|
||||
|
||||
### `browser_navigate`
|
||||
|
||||
Navigate to a URL. Must be called before any other browser tool. Initializes the Browserbase session.
|
||||
|
||||
```
|
||||
Navigate to https://github.com/NousResearch
|
||||
```
|
||||
|
||||
:::tip
|
||||
For simple information retrieval, prefer `web_search` or `web_extract` — they are faster and cheaper. Use browser tools when you need to **interact** with a page (click buttons, fill forms, handle dynamic content).
|
||||
:::
|
||||
|
||||
### `browser_snapshot`
|
||||
|
||||
Get a text-based snapshot of the current page's accessibility tree. Returns interactive elements with ref IDs like `@e1`, `@e2` for use with `browser_click` and `browser_type`.
|
||||
|
||||
- **`full=false`** (default): Compact view showing only interactive elements
|
||||
- **`full=true`**: Complete page content
|
||||
|
||||
Snapshots over 8000 characters are automatically summarized by an LLM.
|
||||
|
||||
### `browser_click`
|
||||
|
||||
Click an element identified by its ref ID from the snapshot.
|
||||
|
||||
```
|
||||
Click @e5 to press the "Sign In" button
|
||||
```
|
||||
|
||||
### `browser_type`
|
||||
|
||||
Type text into an input field. Clears the field first, then types the new text.
|
||||
|
||||
```
|
||||
Type "hermes agent" into the search field @e3
|
||||
```
|
||||
|
||||
### `browser_scroll`
|
||||
|
||||
Scroll the page up or down to reveal more content.
|
||||
|
||||
```
|
||||
Scroll down to see more results
|
||||
```
|
||||
|
||||
### `browser_press`
|
||||
|
||||
Press a keyboard key. Useful for submitting forms or navigation.
|
||||
|
||||
```
|
||||
Press Enter to submit the form
|
||||
```
|
||||
|
||||
Supported keys: `Enter`, `Tab`, `Escape`, `ArrowDown`, `ArrowUp`, and more.
|
||||
|
||||
### `browser_back`
|
||||
|
||||
Navigate back to the previous page in browser history.
|
||||
|
||||
### `browser_get_images`
|
||||
|
||||
List all images on the current page with their URLs and alt text. Useful for finding images to analyze.
|
||||
|
||||
### `browser_vision`
|
||||
|
||||
Take a screenshot and analyze it with vision AI. Use this when text snapshots don't capture important visual information — especially useful for CAPTCHAs, complex layouts, or visual verification challenges.
|
||||
|
||||
The screenshot is saved persistently and the file path is returned alongside the AI analysis. On messaging platforms (Telegram, Discord, Slack, WhatsApp), you can ask the agent to share the screenshot — it will be sent as a native photo attachment via the `MEDIA:` mechanism.
|
||||
|
||||
```
|
||||
What does the chart on this page show?
|
||||
```
|
||||
|
||||
Screenshots are stored in `~/.hermes/browser_screenshots/` and automatically cleaned up after 24 hours.
|
||||
|
||||
### `browser_console`
|
||||
|
||||
Get browser console output (log/warn/error messages) and uncaught JavaScript exceptions from the current page. Essential for detecting silent JS errors that don't appear in the accessibility tree.
|
||||
|
||||
```
|
||||
Check the browser console for any JavaScript errors
|
||||
```
|
||||
|
||||
Use `clear=True` to clear the console after reading, so subsequent calls only show new messages.
|
||||
|
||||
### `browser_close`
|
||||
|
||||
Close the browser session and release resources. Call this when done to free up Browserbase session quota.
|
||||
|
||||
## Practical Examples
|
||||
|
||||
### Filling Out a Web Form
|
||||
|
||||
```
|
||||
User: Sign up for an account on example.com with my email john@example.com
|
||||
|
||||
Agent workflow:
|
||||
1. browser_navigate("https://example.com/signup")
|
||||
2. browser_snapshot() → sees form fields with refs
|
||||
3. browser_type(ref="@e3", text="john@example.com")
|
||||
4. browser_type(ref="@e5", text="SecurePass123")
|
||||
5. browser_click(ref="@e8") → clicks "Create Account"
|
||||
6. browser_snapshot() → confirms success
|
||||
7. browser_close()
|
||||
```
|
||||
|
||||
### Researching Dynamic Content
|
||||
|
||||
```
|
||||
User: What are the top trending repos on GitHub right now?
|
||||
|
||||
Agent workflow:
|
||||
1. browser_navigate("https://github.com/trending")
|
||||
2. browser_snapshot(full=true) → reads trending repo list
|
||||
3. Returns formatted results
|
||||
4. browser_close()
|
||||
```
|
||||
|
||||
## Session Recording
|
||||
|
||||
Automatically record browser sessions as WebM video files:
|
||||
|
||||
```yaml
|
||||
browser:
|
||||
record_sessions: true # default: false
|
||||
```
|
||||
|
||||
When enabled, recording starts automatically on the first `browser_navigate` and saves to `~/.hermes/browser_recordings/` when the session closes. Works in both local and cloud (Browserbase) modes. Recordings older than 72 hours are automatically cleaned up.
|
||||
|
||||
## Stealth Features
|
||||
|
||||
Browserbase provides automatic stealth capabilities:
|
||||
|
||||
| Feature | Default | Notes |
|
||||
|---------|---------|-------|
|
||||
| Basic Stealth | Always on | Random fingerprints, viewport randomization, CAPTCHA solving |
|
||||
| Residential Proxies | On | Routes through residential IPs for better access |
|
||||
| Advanced Stealth | Off | Custom Chromium build, requires Scale Plan |
|
||||
| Keep Alive | On | Session reconnection after network hiccups |
|
||||
|
||||
:::note
|
||||
If paid features aren't available on your plan, Hermes automatically falls back — first disabling `keepAlive`, then proxies — so browsing still works on free plans.
|
||||
:::
|
||||
|
||||
## Session Management
|
||||
|
||||
- Each task gets an isolated browser session via Browserbase
|
||||
- Sessions are automatically cleaned up after inactivity (default: 5 minutes)
|
||||
- A background thread checks every 30 seconds for stale sessions
|
||||
- Emergency cleanup runs on process exit to prevent orphaned sessions
|
||||
- Sessions are released via the Browserbase API (`REQUEST_RELEASE` status)
|
||||
|
||||
## Limitations
|
||||
|
||||
- **Text-based interaction** — relies on accessibility tree, not pixel coordinates
|
||||
- **Snapshot size** — large pages may be truncated or LLM-summarized at 8000 characters
|
||||
- **Session timeout** — cloud sessions expire based on your provider's plan settings
|
||||
- **Cost** — cloud sessions consume provider credits; use `browser_close` when done. Use `/browser connect` for free local browsing.
|
||||
- **No file downloads** — cannot download files from the browser
|
||||
30
hermes_code/website/docs/user-guide/features/checkpoints.md
Normal file
30
hermes_code/website/docs/user-guide/features/checkpoints.md
Normal file
|
|
@ -0,0 +1,30 @@
|
|||
# Filesystem Checkpoints
|
||||
|
||||
Hermes automatically snapshots your working directory before making file changes, giving you a safety net to roll back if something goes wrong. Checkpoints are **enabled by default**.
|
||||
|
||||
## Quick Reference
|
||||
|
||||
| Command | Description |
|
||||
|---------|-------------|
|
||||
| `/rollback` | List all checkpoints with change stats |
|
||||
| `/rollback <N>` | Restore to checkpoint N (also undoes last chat turn) |
|
||||
| `/rollback diff <N>` | Preview diff between checkpoint N and current state |
|
||||
| `/rollback <N> <file>` | Restore a single file from checkpoint N |
|
||||
|
||||
## What Triggers Checkpoints
|
||||
|
||||
- **File tools** — `write_file` and `patch`
|
||||
- **Destructive terminal commands** — `rm`, `mv`, `sed -i`, output redirects (`>`), `git reset`/`clean`
|
||||
|
||||
## Configuration
|
||||
|
||||
```yaml
|
||||
# ~/.hermes/config.yaml
|
||||
checkpoints:
|
||||
enabled: true # default: true
|
||||
max_snapshots: 50 # max checkpoints per directory
|
||||
```
|
||||
|
||||
## Learn More
|
||||
|
||||
For the full guide — how shadow repos work, diff previews, file-level restore, conversation undo, safety guards, and best practices — see **[Checkpoints and /rollback](../checkpoints-and-rollback.md)**.
|
||||
210
hermes_code/website/docs/user-guide/features/code-execution.md
Normal file
210
hermes_code/website/docs/user-guide/features/code-execution.md
Normal file
|
|
@ -0,0 +1,210 @@
|
|||
---
|
||||
sidebar_position: 8
|
||||
title: "Code Execution"
|
||||
description: "Sandboxed Python execution with RPC tool access — collapse multi-step workflows into a single turn"
|
||||
---
|
||||
|
||||
# Code Execution (Programmatic Tool Calling)
|
||||
|
||||
The `execute_code` tool lets the agent write Python scripts that call Hermes tools programmatically, collapsing multi-step workflows into a single LLM turn. The script runs in a sandboxed child process on the agent host, communicating via Unix domain socket RPC.
|
||||
|
||||
## How It Works
|
||||
|
||||
1. The agent writes a Python script using `from hermes_tools import ...`
|
||||
2. Hermes generates a `hermes_tools.py` stub module with RPC functions
|
||||
3. Hermes opens a Unix domain socket and starts an RPC listener thread
|
||||
4. The script runs in a child process — tool calls travel over the socket back to Hermes
|
||||
5. Only the script's `print()` output is returned to the LLM; intermediate tool results never enter the context window
|
||||
|
||||
```python
|
||||
# The agent can write scripts like:
|
||||
from hermes_tools import web_search, web_extract
|
||||
|
||||
results = web_search("Python 3.13 features", limit=5)
|
||||
for r in results["data"]["web"]:
|
||||
content = web_extract([r["url"]])
|
||||
# ... filter and process ...
|
||||
print(summary)
|
||||
```
|
||||
|
||||
**Available tools in sandbox:** `web_search`, `web_extract`, `read_file`, `write_file`, `search_files`, `patch`, `terminal` (foreground only).
|
||||
|
||||
## When the Agent Uses This
|
||||
|
||||
The agent uses `execute_code` when there are:
|
||||
|
||||
- **3+ tool calls** with processing logic between them
|
||||
- Bulk data filtering or conditional branching
|
||||
- Loops over results
|
||||
|
||||
The key benefit: intermediate tool results never enter the context window — only the final `print()` output comes back, dramatically reducing token usage.
|
||||
|
||||
## Practical Examples
|
||||
|
||||
### Data Processing Pipeline
|
||||
|
||||
```python
|
||||
from hermes_tools import search_files, read_file
|
||||
import json
|
||||
|
||||
# Find all config files and extract database settings
|
||||
matches = search_files("database", path=".", file_glob="*.yaml", limit=20)
|
||||
configs = []
|
||||
for match in matches.get("matches", []):
|
||||
content = read_file(match["path"])
|
||||
configs.append({"file": match["path"], "preview": content["content"][:200]})
|
||||
|
||||
print(json.dumps(configs, indent=2))
|
||||
```
|
||||
|
||||
### Multi-Step Web Research
|
||||
|
||||
```python
|
||||
from hermes_tools import web_search, web_extract
|
||||
import json
|
||||
|
||||
# Search, extract, and summarize in one turn
|
||||
results = web_search("Rust async runtime comparison 2025", limit=5)
|
||||
summaries = []
|
||||
for r in results["data"]["web"]:
|
||||
page = web_extract([r["url"]])
|
||||
for p in page.get("results", []):
|
||||
if p.get("content"):
|
||||
summaries.append({
|
||||
"title": r["title"],
|
||||
"url": r["url"],
|
||||
"excerpt": p["content"][:500]
|
||||
})
|
||||
|
||||
print(json.dumps(summaries, indent=2))
|
||||
```
|
||||
|
||||
### Bulk File Refactoring
|
||||
|
||||
```python
|
||||
from hermes_tools import search_files, read_file, patch
|
||||
|
||||
# Find all Python files using deprecated API and fix them
|
||||
matches = search_files("old_api_call", path="src/", file_glob="*.py")
|
||||
fixed = 0
|
||||
for match in matches.get("matches", []):
|
||||
result = patch(
|
||||
path=match["path"],
|
||||
old_string="old_api_call(",
|
||||
new_string="new_api_call(",
|
||||
replace_all=True
|
||||
)
|
||||
if "error" not in str(result):
|
||||
fixed += 1
|
||||
|
||||
print(f"Fixed {fixed} files out of {len(matches.get('matches', []))} matches")
|
||||
```
|
||||
|
||||
### Build and Test Pipeline
|
||||
|
||||
```python
|
||||
from hermes_tools import terminal, read_file
|
||||
import json
|
||||
|
||||
# Run tests, parse results, and report
|
||||
result = terminal("cd /project && python -m pytest --tb=short -q 2>&1", timeout=120)
|
||||
output = result.get("output", "")
|
||||
|
||||
# Parse test output
|
||||
passed = output.count(" passed")
|
||||
failed = output.count(" failed")
|
||||
errors = output.count(" error")
|
||||
|
||||
report = {
|
||||
"passed": passed,
|
||||
"failed": failed,
|
||||
"errors": errors,
|
||||
"exit_code": result.get("exit_code", -1),
|
||||
"summary": output[-500:] if len(output) > 500 else output
|
||||
}
|
||||
|
||||
print(json.dumps(report, indent=2))
|
||||
```
|
||||
|
||||
## Resource Limits
|
||||
|
||||
| Resource | Limit | Notes |
|
||||
|----------|-------|-------|
|
||||
| **Timeout** | 5 minutes (300s) | Script is killed with SIGTERM, then SIGKILL after 5s grace |
|
||||
| **Stdout** | 50 KB | Output truncated with `[output truncated at 50KB]` notice |
|
||||
| **Stderr** | 10 KB | Included in output on non-zero exit for debugging |
|
||||
| **Tool calls** | 50 per execution | Error returned when limit reached |
|
||||
|
||||
All limits are configurable via `config.yaml`:
|
||||
|
||||
```yaml
|
||||
# In ~/.hermes/config.yaml
|
||||
code_execution:
|
||||
timeout: 300 # Max seconds per script (default: 300)
|
||||
max_tool_calls: 50 # Max tool calls per execution (default: 50)
|
||||
```
|
||||
|
||||
## How Tool Calls Work Inside Scripts
|
||||
|
||||
When your script calls a function like `web_search("query")`:
|
||||
|
||||
1. The call is serialized to JSON and sent over a Unix domain socket to the parent process
|
||||
2. The parent dispatches through the standard `handle_function_call` handler
|
||||
3. The result is sent back over the socket
|
||||
4. The function returns the parsed result
|
||||
|
||||
This means tool calls inside scripts behave identically to normal tool calls — same rate limits, same error handling, same capabilities. The only restriction is that `terminal()` is foreground-only (no `background`, `pty`, or `check_interval` parameters).
|
||||
|
||||
## Error Handling
|
||||
|
||||
When a script fails, the agent receives structured error information:
|
||||
|
||||
- **Non-zero exit code**: stderr is included in the output so the agent sees the full traceback
|
||||
- **Timeout**: Script is killed and the agent sees `"Script timed out after 300s and was killed."`
|
||||
- **Interruption**: If the user sends a new message during execution, the script is terminated and the agent sees `[execution interrupted — user sent a new message]`
|
||||
- **Tool call limit**: When the 50-call limit is hit, subsequent tool calls return an error message
|
||||
|
||||
The response always includes `status` (success/error/timeout/interrupted), `output`, `tool_calls_made`, and `duration_seconds`.
|
||||
|
||||
## Security
|
||||
|
||||
:::danger Security Model
|
||||
The child process runs with a **minimal environment**. API keys, tokens, and credentials are stripped by default. The script accesses tools exclusively via the RPC channel — it cannot read secrets from environment variables unless explicitly allowed.
|
||||
:::
|
||||
|
||||
Environment variables containing `KEY`, `TOKEN`, `SECRET`, `PASSWORD`, `CREDENTIAL`, `PASSWD`, or `AUTH` in their names are excluded. Only safe system variables (`PATH`, `HOME`, `LANG`, `SHELL`, `PYTHONPATH`, `VIRTUAL_ENV`, etc.) are passed through.
|
||||
|
||||
### Skill Environment Variable Passthrough
|
||||
|
||||
When a skill declares `required_environment_variables` in its frontmatter, those variables are **automatically passed through** to both `execute_code` and `terminal` sandboxes after the skill is loaded. This lets skills use their declared API keys without weakening the security posture for arbitrary code.
|
||||
|
||||
For non-skill use cases, you can explicitly allowlist variables in `config.yaml`:
|
||||
|
||||
```yaml
|
||||
terminal:
|
||||
env_passthrough:
|
||||
- MY_CUSTOM_KEY
|
||||
- ANOTHER_TOKEN
|
||||
```
|
||||
|
||||
See the [Security guide](/docs/user-guide/security#environment-variable-passthrough) for full details.
|
||||
|
||||
The script runs in a temporary directory that is cleaned up after execution. The child process runs in its own process group so it can be cleanly killed on timeout or interruption.
|
||||
|
||||
## execute_code vs terminal
|
||||
|
||||
| Use Case | execute_code | terminal |
|
||||
|----------|-------------|----------|
|
||||
| Multi-step workflows with tool calls between | ✅ | ❌ |
|
||||
| Simple shell command | ❌ | ✅ |
|
||||
| Filtering/processing large tool outputs | ✅ | ❌ |
|
||||
| Running a build or test suite | ❌ | ✅ |
|
||||
| Looping over search results | ✅ | ❌ |
|
||||
| Interactive/background processes | ❌ | ✅ |
|
||||
| Needs API keys in environment | ⚠️ Only via [passthrough](/docs/user-guide/security#environment-variable-passthrough) | ✅ (most pass through) |
|
||||
|
||||
**Rule of thumb:** Use `execute_code` when you need to call Hermes tools programmatically with logic between calls. Use `terminal` for running shell commands, builds, and processes.
|
||||
|
||||
## Platform Support
|
||||
|
||||
Code execution requires Unix domain sockets and is available on **Linux and macOS only**. It is automatically disabled on Windows — the agent falls back to regular sequential tool calls.
|
||||
201
hermes_code/website/docs/user-guide/features/context-files.md
Normal file
201
hermes_code/website/docs/user-guide/features/context-files.md
Normal file
|
|
@ -0,0 +1,201 @@
|
|||
---
|
||||
sidebar_position: 8
|
||||
title: "Context Files"
|
||||
description: "Project context files — .hermes.md, AGENTS.md, CLAUDE.md, global SOUL.md, and .cursorrules — automatically injected into every conversation"
|
||||
---
|
||||
|
||||
# Context Files
|
||||
|
||||
Hermes Agent automatically discovers and loads context files that shape how it behaves. Some are project-local and discovered from your working directory. `SOUL.md` is now global to the Hermes instance and is loaded from `HERMES_HOME` only.
|
||||
|
||||
## Supported Context Files
|
||||
|
||||
| File | Purpose | Discovery |
|
||||
|------|---------|-----------|
|
||||
| **.hermes.md** / **HERMES.md** | Project instructions (highest priority) | Walks to git root |
|
||||
| **AGENTS.md** | Project instructions, conventions, architecture | Recursive (walks subdirectories) |
|
||||
| **CLAUDE.md** | Claude Code context files (also detected) | CWD only |
|
||||
| **SOUL.md** | Global personality and tone customization for this Hermes instance | `HERMES_HOME/SOUL.md` only |
|
||||
| **.cursorrules** | Cursor IDE coding conventions | CWD only |
|
||||
| **.cursor/rules/*.mdc** | Cursor IDE rule modules | CWD only |
|
||||
|
||||
:::info Priority system
|
||||
Only **one** project context type is loaded per session (first match wins): `.hermes.md` → `AGENTS.md` → `CLAUDE.md` → `.cursorrules`. **SOUL.md** is always loaded independently as the agent identity (slot #1).
|
||||
:::
|
||||
|
||||
## AGENTS.md
|
||||
|
||||
`AGENTS.md` is the primary project context file. It tells the agent how your project is structured, what conventions to follow, and any special instructions.
|
||||
|
||||
### Hierarchical Discovery
|
||||
|
||||
Hermes walks the directory tree starting from the working directory and loads **all** `AGENTS.md` files found, sorted by depth. This supports monorepo-style setups:
|
||||
|
||||
```
|
||||
my-project/
|
||||
├── AGENTS.md ← Top-level project context
|
||||
├── frontend/
|
||||
│ └── AGENTS.md ← Frontend-specific instructions
|
||||
├── backend/
|
||||
│ └── AGENTS.md ← Backend-specific instructions
|
||||
└── shared/
|
||||
└── AGENTS.md ← Shared library conventions
|
||||
```
|
||||
|
||||
All four files are concatenated into a single context block with relative path headers.
|
||||
|
||||
:::info
|
||||
Directories that are skipped during the walk: `.`-prefixed dirs, `node_modules`, `__pycache__`, `venv`, `.venv`.
|
||||
:::
|
||||
|
||||
### Example AGENTS.md
|
||||
|
||||
```markdown
|
||||
# Project Context
|
||||
|
||||
This is a Next.js 14 web application with a Python FastAPI backend.
|
||||
|
||||
## Architecture
|
||||
- Frontend: Next.js 14 with App Router in `/frontend`
|
||||
- Backend: FastAPI in `/backend`, uses SQLAlchemy ORM
|
||||
- Database: PostgreSQL 16
|
||||
- Deployment: Docker Compose on a Hetzner VPS
|
||||
|
||||
## Conventions
|
||||
- Use TypeScript strict mode for all frontend code
|
||||
- Python code follows PEP 8, use type hints everywhere
|
||||
- All API endpoints return JSON with `{data, error, meta}` shape
|
||||
- Tests go in `__tests__/` directories (frontend) or `tests/` (backend)
|
||||
|
||||
## Important Notes
|
||||
- Never modify migration files directly — use Alembic commands
|
||||
- The `.env.local` file has real API keys, don't commit it
|
||||
- Frontend port is 3000, backend is 8000, DB is 5432
|
||||
```
|
||||
|
||||
## SOUL.md
|
||||
|
||||
`SOUL.md` controls the agent's personality, tone, and communication style. See the [Personality](/docs/user-guide/features/personality) page for full details.
|
||||
|
||||
**Location:**
|
||||
|
||||
- `~/.hermes/SOUL.md`
|
||||
- or `$HERMES_HOME/SOUL.md` if you run Hermes with a custom home directory
|
||||
|
||||
Important details:
|
||||
|
||||
- Hermes seeds a default `SOUL.md` automatically if one does not exist yet
|
||||
- Hermes loads `SOUL.md` only from `HERMES_HOME`
|
||||
- Hermes does not probe the working directory for `SOUL.md`
|
||||
- If the file is empty, nothing from `SOUL.md` is added to the prompt
|
||||
- If the file has content, the content is injected verbatim after scanning and truncation
|
||||
|
||||
## .cursorrules
|
||||
|
||||
Hermes is compatible with Cursor IDE's `.cursorrules` file and `.cursor/rules/*.mdc` rule modules. If these files exist in your project root and no higher-priority context file (`.hermes.md`, `AGENTS.md`, or `CLAUDE.md`) is found, they're loaded as the project context.
|
||||
|
||||
This means your existing Cursor conventions automatically apply when using Hermes.
|
||||
|
||||
## How Context Files Are Loaded
|
||||
|
||||
Context files are loaded by `build_context_files_prompt()` in `agent/prompt_builder.py`:
|
||||
|
||||
1. **At session start** — the function scans the working directory
|
||||
2. **Content is read** — each file is read as UTF-8 text
|
||||
3. **Security scan** — content is checked for prompt injection patterns
|
||||
4. **Truncation** — files exceeding 20,000 characters are head/tail truncated (70% head, 20% tail, with a marker in the middle)
|
||||
5. **Assembly** — all sections are combined under a `# Project Context` header
|
||||
6. **Injection** — the assembled content is added to the system prompt
|
||||
|
||||
The final prompt section looks roughly like:
|
||||
|
||||
```text
|
||||
# Project Context
|
||||
|
||||
The following project context files have been loaded and should be followed:
|
||||
|
||||
## AGENTS.md
|
||||
|
||||
[Your AGENTS.md content here]
|
||||
|
||||
## .cursorrules
|
||||
|
||||
[Your .cursorrules content here]
|
||||
|
||||
[Your SOUL.md content here]
|
||||
```
|
||||
|
||||
Notice that SOUL content is inserted directly, without extra wrapper text.
|
||||
|
||||
## Security: Prompt Injection Protection
|
||||
|
||||
All context files are scanned for potential prompt injection before being included. The scanner checks for:
|
||||
|
||||
- **Instruction override attempts**: "ignore previous instructions", "disregard your rules"
|
||||
- **Deception patterns**: "do not tell the user"
|
||||
- **System prompt overrides**: "system prompt override"
|
||||
- **Hidden HTML comments**: `<!-- ignore instructions -->`
|
||||
- **Hidden div elements**: `<div style="display:none">`
|
||||
- **Credential exfiltration**: `curl ... $API_KEY`
|
||||
- **Secret file access**: `cat .env`, `cat credentials`
|
||||
- **Invisible characters**: zero-width spaces, bidirectional overrides, word joiners
|
||||
|
||||
If any threat pattern is detected, the file is blocked:
|
||||
|
||||
```
|
||||
[BLOCKED: AGENTS.md contained potential prompt injection (prompt_injection). Content not loaded.]
|
||||
```
|
||||
|
||||
:::warning
|
||||
This scanner protects against common injection patterns, but it's not a substitute for reviewing context files in shared repositories. Always validate AGENTS.md content in projects you didn't author.
|
||||
:::
|
||||
|
||||
## Size Limits
|
||||
|
||||
| Limit | Value |
|
||||
|-------|-------|
|
||||
| Max chars per file | 20,000 (~7,000 tokens) |
|
||||
| Head truncation ratio | 70% |
|
||||
| Tail truncation ratio | 20% |
|
||||
| Truncation marker | 10% (shows char counts and suggests using file tools) |
|
||||
|
||||
When a file exceeds 20,000 characters, the truncation message reads:
|
||||
|
||||
```
|
||||
[...truncated AGENTS.md: kept 14000+4000 of 25000 chars. Use file tools to read the full file.]
|
||||
```
|
||||
|
||||
## Tips for Effective Context Files
|
||||
|
||||
:::tip Best practices for AGENTS.md
|
||||
1. **Keep it concise** — stay well under 20K chars; the agent reads it every turn
|
||||
2. **Structure with headers** — use `##` sections for architecture, conventions, important notes
|
||||
3. **Include concrete examples** — show preferred code patterns, API shapes, naming conventions
|
||||
4. **Mention what NOT to do** — "never modify migration files directly"
|
||||
5. **List key paths and ports** — the agent uses these for terminal commands
|
||||
6. **Update as the project evolves** — stale context is worse than no context
|
||||
:::
|
||||
|
||||
### Per-Subdirectory Context
|
||||
|
||||
For monorepos, put subdirectory-specific instructions in nested AGENTS.md files:
|
||||
|
||||
```markdown
|
||||
<!-- frontend/AGENTS.md -->
|
||||
# Frontend Context
|
||||
|
||||
- Use `pnpm` not `npm` for package management
|
||||
- Components go in `src/components/`, pages in `src/app/`
|
||||
- Use Tailwind CSS, never inline styles
|
||||
- Run tests with `pnpm test`
|
||||
```
|
||||
|
||||
```markdown
|
||||
<!-- backend/AGENTS.md -->
|
||||
# Backend Context
|
||||
|
||||
- Use `poetry` for dependency management
|
||||
- Run the dev server with `poetry run uvicorn main:app --reload`
|
||||
- All endpoints need OpenAPI docstrings
|
||||
- Database models are in `models/`, schemas in `schemas/`
|
||||
```
|
||||
|
|
@ -0,0 +1,109 @@
|
|||
---
|
||||
sidebar_position: 9
|
||||
title: "Context References"
|
||||
description: "Inline @-syntax for attaching files, folders, git diffs, and URLs directly into your messages"
|
||||
---
|
||||
|
||||
# Context References
|
||||
|
||||
Type `@` followed by a reference to inject content directly into your message. Hermes expands the reference inline and appends the content under an `--- Attached Context ---` section.
|
||||
|
||||
## Supported References
|
||||
|
||||
| Syntax | Description |
|
||||
|--------|-------------|
|
||||
| `@file:path/to/file.py` | Inject file contents |
|
||||
| `@file:path/to/file.py:10-25` | Inject specific line range (1-indexed, inclusive) |
|
||||
| `@folder:path/to/dir` | Inject directory tree listing with file metadata |
|
||||
| `@diff` | Inject `git diff` (unstaged working tree changes) |
|
||||
| `@staged` | Inject `git diff --staged` (staged changes) |
|
||||
| `@git:5` | Inject last N commits with patches (max 10) |
|
||||
| `@url:https://example.com` | Fetch and inject web page content |
|
||||
|
||||
## Usage Examples
|
||||
|
||||
```text
|
||||
Review @file:src/main.py and suggest improvements
|
||||
|
||||
What changed? @diff
|
||||
|
||||
Compare @file:old_config.yaml and @file:new_config.yaml
|
||||
|
||||
What's in @folder:src/components?
|
||||
|
||||
Summarize this article @url:https://arxiv.org/abs/2301.00001
|
||||
```
|
||||
|
||||
Multiple references work in a single message:
|
||||
|
||||
```text
|
||||
Check @file:main.py, and also @file:test.py.
|
||||
```
|
||||
|
||||
Trailing punctuation (`,`, `.`, `;`, `!`, `?`) is automatically stripped from reference values.
|
||||
|
||||
## CLI Tab Completion
|
||||
|
||||
In the interactive CLI, typing `@` triggers autocomplete:
|
||||
|
||||
- `@` shows all reference types (`@diff`, `@staged`, `@file:`, `@folder:`, `@git:`, `@url:`)
|
||||
- `@file:` and `@folder:` trigger filesystem path completion with file size metadata
|
||||
- Bare `@` followed by partial text shows matching files and folders from the current directory
|
||||
|
||||
## Line Ranges
|
||||
|
||||
The `@file:` reference supports line ranges for precise content injection:
|
||||
|
||||
```text
|
||||
@file:src/main.py:42 # Single line 42
|
||||
@file:src/main.py:10-25 # Lines 10 through 25 (inclusive)
|
||||
```
|
||||
|
||||
Lines are 1-indexed. Invalid ranges are silently ignored (full file is returned).
|
||||
|
||||
## Size Limits
|
||||
|
||||
Context references are bounded to prevent overwhelming the model's context window:
|
||||
|
||||
| Threshold | Value | Behavior |
|
||||
|-----------|-------|----------|
|
||||
| Soft limit | 25% of context length | Warning appended, expansion proceeds |
|
||||
| Hard limit | 50% of context length | Expansion refused, original message returned unchanged |
|
||||
| Folder entries | 200 files max | Excess entries replaced with `- ...` |
|
||||
| Git commits | 10 max | `@git:N` clamped to range [1, 10] |
|
||||
|
||||
## Security
|
||||
|
||||
### Sensitive Path Blocking
|
||||
|
||||
These paths are always blocked from `@file:` references to prevent credential exposure:
|
||||
|
||||
- SSH keys and config: `~/.ssh/id_rsa`, `~/.ssh/id_ed25519`, `~/.ssh/authorized_keys`, `~/.ssh/config`
|
||||
- Shell profiles: `~/.bashrc`, `~/.zshrc`, `~/.profile`, `~/.bash_profile`, `~/.zprofile`
|
||||
- Credential files: `~/.netrc`, `~/.pgpass`, `~/.npmrc`, `~/.pypirc`
|
||||
- Hermes env: `$HERMES_HOME/.env`
|
||||
|
||||
These directories are fully blocked (any file inside):
|
||||
- `~/.ssh/`, `~/.aws/`, `~/.gnupg/`, `~/.kube/`, `$HERMES_HOME/skills/.hub/`
|
||||
|
||||
### Path Traversal Protection
|
||||
|
||||
All paths are resolved relative to the working directory. References that resolve outside the allowed workspace root are rejected.
|
||||
|
||||
### Binary File Detection
|
||||
|
||||
Binary files are detected via MIME type and null-byte scanning. Known text extensions (`.py`, `.md`, `.json`, `.yaml`, `.toml`, `.js`, `.ts`, etc.) bypass MIME-based detection. Binary files are rejected with a warning.
|
||||
|
||||
## Error Handling
|
||||
|
||||
Invalid references produce inline warnings rather than failures:
|
||||
|
||||
| Condition | Behavior |
|
||||
|-----------|----------|
|
||||
| File not found | Warning: "file not found" |
|
||||
| Binary file | Warning: "binary files are not supported" |
|
||||
| Folder not found | Warning: "folder not found" |
|
||||
| Git command fails | Warning with git stderr |
|
||||
| URL returns no content | Warning: "no content extracted" |
|
||||
| Sensitive path | Warning: "path is a sensitive credential file" |
|
||||
| Path outside workspace | Warning: "path is outside the allowed workspace" |
|
||||
285
hermes_code/website/docs/user-guide/features/cron.md
Normal file
285
hermes_code/website/docs/user-guide/features/cron.md
Normal file
|
|
@ -0,0 +1,285 @@
|
|||
---
|
||||
sidebar_position: 5
|
||||
title: "Scheduled Tasks (Cron)"
|
||||
description: "Schedule automated tasks with natural language, manage them with one cron tool, and attach one or more skills"
|
||||
---
|
||||
|
||||
# Scheduled Tasks (Cron)
|
||||
|
||||
Schedule tasks to run automatically with natural language or cron expressions. Hermes exposes cron management through a single `cronjob` tool with action-style operations instead of separate schedule/list/remove tools.
|
||||
|
||||
## What cron can do now
|
||||
|
||||
Cron jobs can:
|
||||
|
||||
- schedule one-shot or recurring tasks
|
||||
- pause, resume, edit, trigger, and remove jobs
|
||||
- attach zero, one, or multiple skills to a job
|
||||
- deliver results back to the origin chat, local files, or configured platform targets
|
||||
- run in fresh agent sessions with the normal static tool list
|
||||
|
||||
:::warning
|
||||
Cron-run sessions cannot recursively create more cron jobs. Hermes disables cron management tools inside cron executions to prevent runaway scheduling loops.
|
||||
:::
|
||||
|
||||
## Creating scheduled tasks
|
||||
|
||||
### In chat with `/cron`
|
||||
|
||||
```bash
|
||||
/cron add 30m "Remind me to check the build"
|
||||
/cron add "every 2h" "Check server status"
|
||||
/cron add "every 1h" "Summarize new feed items" --skill blogwatcher
|
||||
/cron add "every 1h" "Use both skills and combine the result" --skill blogwatcher --skill find-nearby
|
||||
```
|
||||
|
||||
### From the standalone CLI
|
||||
|
||||
```bash
|
||||
hermes cron create "every 2h" "Check server status"
|
||||
hermes cron create "every 1h" "Summarize new feed items" --skill blogwatcher
|
||||
hermes cron create "every 1h" "Use both skills and combine the result" \
|
||||
--skill blogwatcher \
|
||||
--skill find-nearby \
|
||||
--name "Skill combo"
|
||||
```
|
||||
|
||||
### Through natural conversation
|
||||
|
||||
Ask Hermes normally:
|
||||
|
||||
```text
|
||||
Every morning at 9am, check Hacker News for AI news and send me a summary on Telegram.
|
||||
```
|
||||
|
||||
Hermes will use the unified `cronjob` tool internally.
|
||||
|
||||
## Skill-backed cron jobs
|
||||
|
||||
A cron job can load one or more skills before it runs the prompt.
|
||||
|
||||
### Single skill
|
||||
|
||||
```python
|
||||
cronjob(
|
||||
action="create",
|
||||
skill="blogwatcher",
|
||||
prompt="Check the configured feeds and summarize anything new.",
|
||||
schedule="0 9 * * *",
|
||||
name="Morning feeds",
|
||||
)
|
||||
```
|
||||
|
||||
### Multiple skills
|
||||
|
||||
Skills are loaded in order. The prompt becomes the task instruction layered on top of those skills.
|
||||
|
||||
```python
|
||||
cronjob(
|
||||
action="create",
|
||||
skills=["blogwatcher", "find-nearby"],
|
||||
prompt="Look for new local events and interesting nearby places, then combine them into one short brief.",
|
||||
schedule="every 6h",
|
||||
name="Local brief",
|
||||
)
|
||||
```
|
||||
|
||||
This is useful when you want a scheduled agent to inherit reusable workflows without stuffing the full skill text into the cron prompt itself.
|
||||
|
||||
## Editing jobs
|
||||
|
||||
You do not need to delete and recreate jobs just to change them.
|
||||
|
||||
### Chat
|
||||
|
||||
```bash
|
||||
/cron edit <job_id> --schedule "every 4h"
|
||||
/cron edit <job_id> --prompt "Use the revised task"
|
||||
/cron edit <job_id> --skill blogwatcher --skill find-nearby
|
||||
/cron edit <job_id> --remove-skill blogwatcher
|
||||
/cron edit <job_id> --clear-skills
|
||||
```
|
||||
|
||||
### Standalone CLI
|
||||
|
||||
```bash
|
||||
hermes cron edit <job_id> --schedule "every 4h"
|
||||
hermes cron edit <job_id> --prompt "Use the revised task"
|
||||
hermes cron edit <job_id> --skill blogwatcher --skill find-nearby
|
||||
hermes cron edit <job_id> --add-skill find-nearby
|
||||
hermes cron edit <job_id> --remove-skill blogwatcher
|
||||
hermes cron edit <job_id> --clear-skills
|
||||
```
|
||||
|
||||
Notes:
|
||||
|
||||
- repeated `--skill` replaces the job's attached skill list
|
||||
- `--add-skill` appends to the existing list without replacing it
|
||||
- `--remove-skill` removes specific attached skills
|
||||
- `--clear-skills` removes all attached skills
|
||||
|
||||
## Lifecycle actions
|
||||
|
||||
Cron jobs now have a fuller lifecycle than just create/remove.
|
||||
|
||||
### Chat
|
||||
|
||||
```bash
|
||||
/cron list
|
||||
/cron pause <job_id>
|
||||
/cron resume <job_id>
|
||||
/cron run <job_id>
|
||||
/cron remove <job_id>
|
||||
```
|
||||
|
||||
### Standalone CLI
|
||||
|
||||
```bash
|
||||
hermes cron list
|
||||
hermes cron pause <job_id>
|
||||
hermes cron resume <job_id>
|
||||
hermes cron run <job_id>
|
||||
hermes cron remove <job_id>
|
||||
hermes cron status
|
||||
hermes cron tick
|
||||
```
|
||||
|
||||
What they do:
|
||||
|
||||
- `pause` — keep the job but stop scheduling it
|
||||
- `resume` — re-enable the job and compute the next future run
|
||||
- `run` — trigger the job on the next scheduler tick
|
||||
- `remove` — delete it entirely
|
||||
|
||||
## How it works
|
||||
|
||||
**Cron execution is handled by the gateway daemon.** The gateway ticks the scheduler every 60 seconds, running any due jobs in isolated agent sessions.
|
||||
|
||||
```bash
|
||||
hermes gateway install # Install as a user service
|
||||
sudo hermes gateway install --system # Linux: boot-time system service for servers
|
||||
hermes gateway # Or run in foreground
|
||||
|
||||
hermes cron list
|
||||
hermes cron status
|
||||
```
|
||||
|
||||
### Gateway scheduler behavior
|
||||
|
||||
On each tick Hermes:
|
||||
|
||||
1. loads jobs from `~/.hermes/cron/jobs.json`
|
||||
2. checks `next_run_at` against the current time
|
||||
3. starts a fresh `AIAgent` session for each due job
|
||||
4. optionally injects one or more attached skills into that fresh session
|
||||
5. runs the prompt to completion
|
||||
6. delivers the final response
|
||||
7. updates run metadata and the next scheduled time
|
||||
|
||||
A file lock at `~/.hermes/cron/.tick.lock` prevents overlapping scheduler ticks from double-running the same job batch.
|
||||
|
||||
## Delivery options
|
||||
|
||||
When scheduling jobs, you specify where the output goes:
|
||||
|
||||
| Option | Description | Example |
|
||||
|--------|-------------|---------|
|
||||
| `"origin"` | Back to where the job was created | Default on messaging platforms |
|
||||
| `"local"` | Save to local files only (`~/.hermes/cron/output/`) | Default on CLI |
|
||||
| `"telegram"` | Telegram home channel | Uses `TELEGRAM_HOME_CHANNEL` |
|
||||
| `"discord"` | Discord home channel | Uses `DISCORD_HOME_CHANNEL` |
|
||||
| `"telegram:123456"` | Specific Telegram chat by ID | Direct delivery |
|
||||
| `"discord:987654"` | Specific Discord channel by ID | Direct delivery |
|
||||
|
||||
The agent's final response is automatically delivered. You do not need to call `send_message` in the cron prompt.
|
||||
|
||||
## Schedule formats
|
||||
|
||||
The agent's final response is automatically delivered — you do **not** need to include `send_message` in the cron prompt for that same destination. If a cron run calls `send_message` to the exact target the scheduler will already deliver to, Hermes skips that duplicate send and tells the model to put the user-facing content in the final response instead. Use `send_message` only for additional or different targets.
|
||||
|
||||
### Relative delays (one-shot)
|
||||
|
||||
```text
|
||||
30m → Run once in 30 minutes
|
||||
2h → Run once in 2 hours
|
||||
1d → Run once in 1 day
|
||||
```
|
||||
|
||||
### Intervals (recurring)
|
||||
|
||||
```text
|
||||
every 30m → Every 30 minutes
|
||||
every 2h → Every 2 hours
|
||||
every 1d → Every day
|
||||
```
|
||||
|
||||
### Cron expressions
|
||||
|
||||
```text
|
||||
0 9 * * * → Daily at 9:00 AM
|
||||
0 9 * * 1-5 → Weekdays at 9:00 AM
|
||||
0 */6 * * * → Every 6 hours
|
||||
30 8 1 * * → First of every month at 8:30 AM
|
||||
0 0 * * 0 → Every Sunday at midnight
|
||||
```
|
||||
|
||||
### ISO timestamps
|
||||
|
||||
```text
|
||||
2026-03-15T09:00:00 → One-time at March 15, 2026 9:00 AM
|
||||
```
|
||||
|
||||
## Repeat behavior
|
||||
|
||||
| Schedule type | Default repeat | Behavior |
|
||||
|--------------|----------------|----------|
|
||||
| One-shot (`30m`, timestamp) | 1 | Runs once |
|
||||
| Interval (`every 2h`) | forever | Runs until removed |
|
||||
| Cron expression | forever | Runs until removed |
|
||||
|
||||
You can override it:
|
||||
|
||||
```python
|
||||
cronjob(
|
||||
action="create",
|
||||
prompt="...",
|
||||
schedule="every 2h",
|
||||
repeat=5,
|
||||
)
|
||||
```
|
||||
|
||||
## Managing jobs programmatically
|
||||
|
||||
The agent-facing API is one tool:
|
||||
|
||||
```python
|
||||
cronjob(action="create", ...)
|
||||
cronjob(action="list")
|
||||
cronjob(action="update", job_id="...")
|
||||
cronjob(action="pause", job_id="...")
|
||||
cronjob(action="resume", job_id="...")
|
||||
cronjob(action="run", job_id="...")
|
||||
cronjob(action="remove", job_id="...")
|
||||
```
|
||||
|
||||
For `update`, pass `skills=[]` to remove all attached skills.
|
||||
|
||||
## Job storage
|
||||
|
||||
Jobs are stored in `~/.hermes/cron/jobs.json`. Output from job runs is saved to `~/.hermes/cron/output/{job_id}/{timestamp}.md`.
|
||||
|
||||
The storage uses atomic file writes so interrupted writes do not leave a partially written job file behind.
|
||||
|
||||
## Self-contained prompts still matter
|
||||
|
||||
:::warning Important
|
||||
Cron jobs run in a completely fresh agent session. The prompt must contain everything the agent needs that is not already provided by attached skills.
|
||||
:::
|
||||
|
||||
**BAD:** `"Check on that server issue"`
|
||||
|
||||
**GOOD:** `"SSH into server 192.168.1.100 as user 'deploy', check if nginx is running with 'systemctl status nginx', and verify https://example.com returns HTTP 200."`
|
||||
|
||||
## Security
|
||||
|
||||
Scheduled task prompts are scanned for prompt-injection and credential-exfiltration patterns at creation and update time. Prompts containing invisible Unicode tricks, SSH backdoor attempts, or obvious secret-exfiltration payloads are blocked.
|
||||
222
hermes_code/website/docs/user-guide/features/delegation.md
Normal file
222
hermes_code/website/docs/user-guide/features/delegation.md
Normal file
|
|
@ -0,0 +1,222 @@
|
|||
---
|
||||
sidebar_position: 7
|
||||
title: "Subagent Delegation"
|
||||
description: "Spawn isolated child agents for parallel workstreams with delegate_task"
|
||||
---
|
||||
|
||||
# Subagent Delegation
|
||||
|
||||
The `delegate_task` tool spawns child AIAgent instances with isolated context, restricted toolsets, and their own terminal sessions. Each child gets a fresh conversation and works independently — only its final summary enters the parent's context.
|
||||
|
||||
## Single Task
|
||||
|
||||
```python
|
||||
delegate_task(
|
||||
goal="Debug why tests fail",
|
||||
context="Error: assertion in test_foo.py line 42",
|
||||
toolsets=["terminal", "file"]
|
||||
)
|
||||
```
|
||||
|
||||
## Parallel Batch
|
||||
|
||||
Up to 3 concurrent subagents:
|
||||
|
||||
```python
|
||||
delegate_task(tasks=[
|
||||
{"goal": "Research topic A", "toolsets": ["web"]},
|
||||
{"goal": "Research topic B", "toolsets": ["web"]},
|
||||
{"goal": "Fix the build", "toolsets": ["terminal", "file"]}
|
||||
])
|
||||
```
|
||||
|
||||
## How Subagent Context Works
|
||||
|
||||
:::warning Critical: Subagents Know Nothing
|
||||
Subagents start with a **completely fresh conversation**. They have zero knowledge of the parent's conversation history, prior tool calls, or anything discussed before delegation. The subagent's only context comes from the `goal` and `context` fields you provide.
|
||||
:::
|
||||
|
||||
This means you must pass **everything** the subagent needs:
|
||||
|
||||
```python
|
||||
# BAD - subagent has no idea what "the error" is
|
||||
delegate_task(goal="Fix the error")
|
||||
|
||||
# GOOD - subagent has all context it needs
|
||||
delegate_task(
|
||||
goal="Fix the TypeError in api/handlers.py",
|
||||
context="""The file api/handlers.py has a TypeError on line 47:
|
||||
'NoneType' object has no attribute 'get'.
|
||||
The function process_request() receives a dict from parse_body(),
|
||||
but parse_body() returns None when Content-Type is missing.
|
||||
The project is at /home/user/myproject and uses Python 3.11."""
|
||||
)
|
||||
```
|
||||
|
||||
The subagent receives a focused system prompt built from your goal and context, instructing it to complete the task and provide a structured summary of what it did, what it found, any files modified, and any issues encountered.
|
||||
|
||||
## Practical Examples
|
||||
|
||||
### Parallel Research
|
||||
|
||||
Research multiple topics simultaneously and collect summaries:
|
||||
|
||||
```python
|
||||
delegate_task(tasks=[
|
||||
{
|
||||
"goal": "Research the current state of WebAssembly in 2025",
|
||||
"context": "Focus on: browser support, non-browser runtimes, language support",
|
||||
"toolsets": ["web"]
|
||||
},
|
||||
{
|
||||
"goal": "Research the current state of RISC-V adoption in 2025",
|
||||
"context": "Focus on: server chips, embedded systems, software ecosystem",
|
||||
"toolsets": ["web"]
|
||||
},
|
||||
{
|
||||
"goal": "Research quantum computing progress in 2025",
|
||||
"context": "Focus on: error correction breakthroughs, practical applications, key players",
|
||||
"toolsets": ["web"]
|
||||
}
|
||||
])
|
||||
```
|
||||
|
||||
### Code Review + Fix
|
||||
|
||||
Delegate a review-and-fix workflow to a fresh context:
|
||||
|
||||
```python
|
||||
delegate_task(
|
||||
goal="Review the authentication module for security issues and fix any found",
|
||||
context="""Project at /home/user/webapp.
|
||||
Auth module files: src/auth/login.py, src/auth/jwt.py, src/auth/middleware.py.
|
||||
The project uses Flask, PyJWT, and bcrypt.
|
||||
Focus on: SQL injection, JWT validation, password handling, session management.
|
||||
Fix any issues found and run the test suite (pytest tests/auth/).""",
|
||||
toolsets=["terminal", "file"]
|
||||
)
|
||||
```
|
||||
|
||||
### Multi-File Refactoring
|
||||
|
||||
Delegate a large refactoring task that would flood the parent's context:
|
||||
|
||||
```python
|
||||
delegate_task(
|
||||
goal="Refactor all Python files in src/ to replace print() with proper logging",
|
||||
context="""Project at /home/user/myproject.
|
||||
Use the 'logging' module with logger = logging.getLogger(__name__).
|
||||
Replace print() calls with appropriate log levels:
|
||||
- print(f"Error: ...") -> logger.error(...)
|
||||
- print(f"Warning: ...") -> logger.warning(...)
|
||||
- print(f"Debug: ...") -> logger.debug(...)
|
||||
- Other prints -> logger.info(...)
|
||||
Don't change print() in test files or CLI output.
|
||||
Run pytest after to verify nothing broke.""",
|
||||
toolsets=["terminal", "file"]
|
||||
)
|
||||
```
|
||||
|
||||
## Batch Mode Details
|
||||
|
||||
When you provide a `tasks` array, subagents run in **parallel** using a thread pool:
|
||||
|
||||
- **Maximum concurrency:** 3 tasks (the `tasks` array is truncated to 3 if longer)
|
||||
- **Thread pool:** Uses `ThreadPoolExecutor` with `MAX_CONCURRENT_CHILDREN = 3` workers
|
||||
- **Progress display:** In CLI mode, a tree-view shows tool calls from each subagent in real-time with per-task completion lines. In gateway mode, progress is batched and relayed to the parent's progress callback
|
||||
- **Result ordering:** Results are sorted by task index to match input order regardless of completion order
|
||||
- **Interrupt propagation:** Interrupting the parent (e.g., sending a new message) interrupts all active children
|
||||
|
||||
Single-task delegation runs directly without thread pool overhead.
|
||||
|
||||
## Model Override
|
||||
|
||||
You can configure a different model for subagents via `config.yaml` — useful for delegating simple tasks to cheaper/faster models:
|
||||
|
||||
```yaml
|
||||
# In ~/.hermes/config.yaml
|
||||
delegation:
|
||||
model: "google/gemini-flash-2.0" # Cheaper model for subagents
|
||||
provider: "openrouter" # Optional: route subagents to a different provider
|
||||
```
|
||||
|
||||
If omitted, subagents use the same model as the parent.
|
||||
|
||||
## Toolset Selection Tips
|
||||
|
||||
The `toolsets` parameter controls what tools the subagent has access to. Choose based on the task:
|
||||
|
||||
| Toolset Pattern | Use Case |
|
||||
|----------------|----------|
|
||||
| `["terminal", "file"]` | Code work, debugging, file editing, builds |
|
||||
| `["web"]` | Research, fact-checking, documentation lookup |
|
||||
| `["terminal", "file", "web"]` | Full-stack tasks (default) |
|
||||
| `["file"]` | Read-only analysis, code review without execution |
|
||||
| `["terminal"]` | System administration, process management |
|
||||
|
||||
Certain toolsets are **always blocked** for subagents regardless of what you specify:
|
||||
- `delegation` — no recursive delegation (prevents infinite spawning)
|
||||
- `clarify` — subagents cannot interact with the user
|
||||
- `memory` — no writes to shared persistent memory
|
||||
- `code_execution` — children should reason step-by-step
|
||||
- `send_message` — no cross-platform side effects (e.g., sending Telegram messages)
|
||||
|
||||
## Max Iterations
|
||||
|
||||
Each subagent has an iteration limit (default: 50) that controls how many tool-calling turns it can take:
|
||||
|
||||
```python
|
||||
delegate_task(
|
||||
goal="Quick file check",
|
||||
context="Check if /etc/nginx/nginx.conf exists and print its first 10 lines",
|
||||
max_iterations=10 # Simple task, don't need many turns
|
||||
)
|
||||
```
|
||||
|
||||
## Depth Limit
|
||||
|
||||
Delegation has a **depth limit of 2** — a parent (depth 0) can spawn children (depth 1), but children cannot delegate further. This prevents runaway recursive delegation chains.
|
||||
|
||||
## Key Properties
|
||||
|
||||
- Each subagent gets its **own terminal session** (separate from the parent)
|
||||
- **No nested delegation** — children cannot delegate further (no grandchildren)
|
||||
- Subagents **cannot** call: `delegate_task`, `clarify`, `memory`, `send_message`, `execute_code`
|
||||
- **Interrupt propagation** — interrupting the parent interrupts all active children
|
||||
- Only the final summary enters the parent's context, keeping token usage efficient
|
||||
- Subagents inherit the parent's **API key and provider configuration**
|
||||
|
||||
## Delegation vs execute_code
|
||||
|
||||
| Factor | delegate_task | execute_code |
|
||||
|--------|--------------|-------------|
|
||||
| **Reasoning** | Full LLM reasoning loop | Just Python code execution |
|
||||
| **Context** | Fresh isolated conversation | No conversation, just script |
|
||||
| **Tool access** | All non-blocked tools with reasoning | 7 tools via RPC, no reasoning |
|
||||
| **Parallelism** | Up to 3 concurrent subagents | Single script |
|
||||
| **Best for** | Complex tasks needing judgment | Mechanical multi-step pipelines |
|
||||
| **Token cost** | Higher (full LLM loop) | Lower (only stdout returned) |
|
||||
| **User interaction** | None (subagents can't clarify) | None |
|
||||
|
||||
**Rule of thumb:** Use `delegate_task` when the subtask requires reasoning, judgment, or multi-step problem solving. Use `execute_code` when you need mechanical data processing or scripted workflows.
|
||||
|
||||
## Configuration
|
||||
|
||||
```yaml
|
||||
# In ~/.hermes/config.yaml
|
||||
delegation:
|
||||
max_iterations: 50 # Max turns per child (default: 50)
|
||||
default_toolsets: ["terminal", "file", "web"] # Default toolsets
|
||||
model: "google/gemini-3-flash-preview" # Optional provider/model override
|
||||
provider: "openrouter" # Optional built-in provider
|
||||
|
||||
# Or use a direct custom endpoint instead of provider:
|
||||
delegation:
|
||||
model: "qwen2.5-coder"
|
||||
base_url: "http://localhost:1234/v1"
|
||||
api_key: "local-key"
|
||||
```
|
||||
|
||||
:::tip
|
||||
The agent handles delegation automatically based on the task complexity. You don't need to explicitly ask it to delegate — it will do so when it makes sense.
|
||||
:::
|
||||
|
|
@ -0,0 +1,323 @@
|
|||
---
|
||||
title: Fallback Providers
|
||||
description: Configure automatic failover to backup LLM providers when your primary model is unavailable.
|
||||
sidebar_label: Fallback Providers
|
||||
sidebar_position: 8
|
||||
---
|
||||
|
||||
# Fallback Providers
|
||||
|
||||
Hermes Agent has two separate fallback systems that keep your sessions running when providers hit issues:
|
||||
|
||||
1. **Primary model fallback** — automatically switches to a backup provider:model when your main model fails
|
||||
2. **Auxiliary task fallback** — independent provider resolution for side tasks like vision, compression, and web extraction
|
||||
|
||||
Both are optional and work independently.
|
||||
|
||||
## Primary Model Fallback
|
||||
|
||||
When your main LLM provider encounters errors — rate limits, server overload, auth failures, connection drops — Hermes can automatically switch to a backup provider:model pair mid-session without losing your conversation.
|
||||
|
||||
### Configuration
|
||||
|
||||
Add a `fallback_model` section to `~/.hermes/config.yaml`:
|
||||
|
||||
```yaml
|
||||
fallback_model:
|
||||
provider: openrouter
|
||||
model: anthropic/claude-sonnet-4
|
||||
```
|
||||
|
||||
Both `provider` and `model` are **required**. If either is missing, the fallback is disabled.
|
||||
|
||||
### Supported Providers
|
||||
|
||||
| Provider | Value | Requirements |
|
||||
|----------|-------|-------------|
|
||||
| AI Gateway | `ai-gateway` | `AI_GATEWAY_API_KEY` |
|
||||
| OpenRouter | `openrouter` | `OPENROUTER_API_KEY` |
|
||||
| Nous Portal | `nous` | `hermes login` (OAuth) |
|
||||
| OpenAI Codex | `openai-codex` | `hermes model` (ChatGPT OAuth) |
|
||||
| Anthropic | `anthropic` | `ANTHROPIC_API_KEY` or Claude Code credentials |
|
||||
| z.ai / GLM | `zai` | `GLM_API_KEY` |
|
||||
| Kimi / Moonshot | `kimi-coding` | `KIMI_API_KEY` |
|
||||
| MiniMax | `minimax` | `MINIMAX_API_KEY` |
|
||||
| MiniMax (China) | `minimax-cn` | `MINIMAX_CN_API_KEY` |
|
||||
| Kilo Code | `kilocode` | `KILOCODE_API_KEY` |
|
||||
| Custom endpoint | `custom` | `base_url` + `api_key_env` (see below) |
|
||||
|
||||
### Custom Endpoint Fallback
|
||||
|
||||
For a custom OpenAI-compatible endpoint, add `base_url` and optionally `api_key_env`:
|
||||
|
||||
```yaml
|
||||
fallback_model:
|
||||
provider: custom
|
||||
model: my-local-model
|
||||
base_url: http://localhost:8000/v1
|
||||
api_key_env: MY_LOCAL_KEY # env var name containing the API key
|
||||
```
|
||||
|
||||
### When Fallback Triggers
|
||||
|
||||
The fallback activates automatically when the primary model fails with:
|
||||
|
||||
- **Rate limits** (HTTP 429) — after exhausting retry attempts
|
||||
- **Server errors** (HTTP 500, 502, 503) — after exhausting retry attempts
|
||||
- **Auth failures** (HTTP 401, 403) — immediately (no point retrying)
|
||||
- **Not found** (HTTP 404) — immediately
|
||||
- **Invalid responses** — when the API returns malformed or empty responses repeatedly
|
||||
|
||||
When triggered, Hermes:
|
||||
|
||||
1. Resolves credentials for the fallback provider
|
||||
2. Builds a new API client
|
||||
3. Swaps the model, provider, and client in-place
|
||||
4. Resets the retry counter and continues the conversation
|
||||
|
||||
The switch is seamless — your conversation history, tool calls, and context are preserved. The agent continues from exactly where it left off, just using a different model.
|
||||
|
||||
:::info One-Shot
|
||||
Fallback activates **at most once** per session. If the fallback provider also fails, normal error handling takes over (retries, then error message). This prevents cascading failover loops.
|
||||
:::
|
||||
|
||||
### Examples
|
||||
|
||||
**OpenRouter as fallback for Anthropic native:**
|
||||
```yaml
|
||||
model:
|
||||
provider: anthropic
|
||||
default: claude-sonnet-4-6
|
||||
|
||||
fallback_model:
|
||||
provider: openrouter
|
||||
model: anthropic/claude-sonnet-4
|
||||
```
|
||||
|
||||
**Nous Portal as fallback for OpenRouter:**
|
||||
```yaml
|
||||
model:
|
||||
provider: openrouter
|
||||
default: anthropic/claude-opus-4
|
||||
|
||||
fallback_model:
|
||||
provider: nous
|
||||
model: nous-hermes-3
|
||||
```
|
||||
|
||||
**Local model as fallback for cloud:**
|
||||
```yaml
|
||||
fallback_model:
|
||||
provider: custom
|
||||
model: llama-3.1-70b
|
||||
base_url: http://localhost:8000/v1
|
||||
api_key_env: LOCAL_API_KEY
|
||||
```
|
||||
|
||||
**Codex OAuth as fallback:**
|
||||
```yaml
|
||||
fallback_model:
|
||||
provider: openai-codex
|
||||
model: gpt-5.3-codex
|
||||
```
|
||||
|
||||
### Where Fallback Works
|
||||
|
||||
| Context | Fallback Supported |
|
||||
|---------|-------------------|
|
||||
| CLI sessions | ✔ |
|
||||
| Messaging gateway (Telegram, Discord, etc.) | ✔ |
|
||||
| Subagent delegation | ✘ (subagents do not inherit fallback config) |
|
||||
| Cron jobs | ✘ (run with a fixed provider) |
|
||||
| Auxiliary tasks (vision, compression) | ✘ (use their own provider chain — see below) |
|
||||
|
||||
:::tip
|
||||
There are no environment variables for `fallback_model` — it is configured exclusively through `config.yaml`. This is intentional: fallback configuration is a deliberate choice, not something a stale shell export should override.
|
||||
:::
|
||||
|
||||
---
|
||||
|
||||
## Auxiliary Task Fallback
|
||||
|
||||
Hermes uses separate lightweight models for side tasks. Each task has its own provider resolution chain that acts as a built-in fallback system.
|
||||
|
||||
### Tasks with Independent Provider Resolution
|
||||
|
||||
| Task | What It Does | Config Key |
|
||||
|------|-------------|-----------|
|
||||
| Vision | Image analysis, browser screenshots | `auxiliary.vision` |
|
||||
| Web Extract | Web page summarization | `auxiliary.web_extract` |
|
||||
| Compression | Context compression summaries | `auxiliary.compression` or `compression.summary_provider` |
|
||||
| Session Search | Past session summarization | `auxiliary.session_search` |
|
||||
| Skills Hub | Skill search and discovery | `auxiliary.skills_hub` |
|
||||
| MCP | MCP helper operations | `auxiliary.mcp` |
|
||||
| Memory Flush | Memory consolidation | `auxiliary.flush_memories` |
|
||||
|
||||
### Auto-Detection Chain
|
||||
|
||||
When a task's provider is set to `"auto"` (the default), Hermes tries providers in order until one works:
|
||||
|
||||
**For text tasks (compression, web extract, etc.):**
|
||||
|
||||
```text
|
||||
OpenRouter → Nous Portal → Custom endpoint → Codex OAuth →
|
||||
API-key providers (z.ai, Kimi, MiniMax, Anthropic) → give up
|
||||
```
|
||||
|
||||
**For vision tasks:**
|
||||
|
||||
```text
|
||||
Main provider (if vision-capable) → OpenRouter → Nous Portal →
|
||||
Codex OAuth → Anthropic → Custom endpoint → give up
|
||||
```
|
||||
|
||||
If the resolved provider fails at call time, Hermes also has an internal retry: if the provider is not OpenRouter and no explicit `base_url` is set, it tries OpenRouter as a last-resort fallback.
|
||||
|
||||
### Configuring Auxiliary Providers
|
||||
|
||||
Each task can be configured independently in `config.yaml`:
|
||||
|
||||
```yaml
|
||||
auxiliary:
|
||||
vision:
|
||||
provider: "auto" # auto | openrouter | nous | codex | main | anthropic
|
||||
model: "" # e.g. "openai/gpt-4o"
|
||||
base_url: "" # direct endpoint (takes precedence over provider)
|
||||
api_key: "" # API key for base_url
|
||||
|
||||
web_extract:
|
||||
provider: "auto"
|
||||
model: ""
|
||||
|
||||
compression:
|
||||
provider: "auto"
|
||||
model: ""
|
||||
|
||||
session_search:
|
||||
provider: "auto"
|
||||
model: ""
|
||||
|
||||
skills_hub:
|
||||
provider: "auto"
|
||||
model: ""
|
||||
|
||||
mcp:
|
||||
provider: "auto"
|
||||
model: ""
|
||||
|
||||
flush_memories:
|
||||
provider: "auto"
|
||||
model: ""
|
||||
```
|
||||
|
||||
Every task above follows the same **provider / model / base_url** pattern. Context compression uses its own top-level block:
|
||||
|
||||
```yaml
|
||||
compression:
|
||||
summary_provider: main # Same provider options as auxiliary tasks
|
||||
summary_model: google/gemini-3-flash-preview
|
||||
summary_base_url: null # Custom OpenAI-compatible endpoint
|
||||
```
|
||||
|
||||
And the fallback model uses:
|
||||
|
||||
```yaml
|
||||
fallback_model:
|
||||
provider: openrouter
|
||||
model: anthropic/claude-sonnet-4
|
||||
# base_url: http://localhost:8000/v1 # Optional custom endpoint
|
||||
```
|
||||
|
||||
All three — auxiliary, compression, fallback — work the same way: set `provider` to pick who handles the request, `model` to pick which model, and `base_url` to point at a custom endpoint (overrides provider).
|
||||
|
||||
### Provider Options for Auxiliary Tasks
|
||||
|
||||
| Provider | Description | Requirements |
|
||||
|----------|-------------|-------------|
|
||||
| `"auto"` | Try providers in order until one works (default) | At least one provider configured |
|
||||
| `"openrouter"` | Force OpenRouter | `OPENROUTER_API_KEY` |
|
||||
| `"nous"` | Force Nous Portal | `hermes login` |
|
||||
| `"codex"` | Force Codex OAuth | `hermes model` → Codex |
|
||||
| `"main"` | Use whatever provider the main agent uses | Active main provider configured |
|
||||
| `"anthropic"` | Force Anthropic native | `ANTHROPIC_API_KEY` or Claude Code credentials |
|
||||
|
||||
### Direct Endpoint Override
|
||||
|
||||
For any auxiliary task, setting `base_url` bypasses provider resolution entirely and sends requests directly to that endpoint:
|
||||
|
||||
```yaml
|
||||
auxiliary:
|
||||
vision:
|
||||
base_url: "http://localhost:1234/v1"
|
||||
api_key: "local-key"
|
||||
model: "qwen2.5-vl"
|
||||
```
|
||||
|
||||
`base_url` takes precedence over `provider`. Hermes uses the configured `api_key` for authentication, falling back to `OPENAI_API_KEY` if not set. It does **not** reuse `OPENROUTER_API_KEY` for custom endpoints.
|
||||
|
||||
---
|
||||
|
||||
## Context Compression Fallback
|
||||
|
||||
Context compression has a legacy configuration path in addition to the auxiliary system:
|
||||
|
||||
```yaml
|
||||
compression:
|
||||
summary_provider: "auto" # auto | openrouter | nous | main
|
||||
summary_model: "google/gemini-3-flash-preview"
|
||||
```
|
||||
|
||||
This is equivalent to configuring `auxiliary.compression.provider` and `auxiliary.compression.model`. If both are set, the `auxiliary.compression` values take precedence.
|
||||
|
||||
If no provider is available for compression, Hermes drops middle conversation turns without generating a summary rather than failing the session.
|
||||
|
||||
---
|
||||
|
||||
## Delegation Provider Override
|
||||
|
||||
Subagents spawned by `delegate_task` do **not** use the primary fallback model. However, they can be routed to a different provider:model pair for cost optimization:
|
||||
|
||||
```yaml
|
||||
delegation:
|
||||
provider: "openrouter" # override provider for all subagents
|
||||
model: "google/gemini-3-flash-preview" # override model
|
||||
# base_url: "http://localhost:1234/v1" # or use a direct endpoint
|
||||
# api_key: "local-key"
|
||||
```
|
||||
|
||||
See [Subagent Delegation](/docs/user-guide/features/delegation) for full configuration details.
|
||||
|
||||
---
|
||||
|
||||
## Cron Job Providers
|
||||
|
||||
Cron jobs run with whatever provider is configured at execution time. They do not support a fallback model. To use a different provider for cron jobs, configure `provider` and `model` overrides on the cron job itself:
|
||||
|
||||
```python
|
||||
cronjob(
|
||||
action="create",
|
||||
schedule="every 2h",
|
||||
prompt="Check server status",
|
||||
provider="openrouter",
|
||||
model="google/gemini-3-flash-preview"
|
||||
)
|
||||
```
|
||||
|
||||
See [Scheduled Tasks (Cron)](/docs/user-guide/features/cron) for full configuration details.
|
||||
|
||||
---
|
||||
|
||||
## Summary
|
||||
|
||||
| Feature | Fallback Mechanism | Config Location |
|
||||
|---------|-------------------|----------------|
|
||||
| Main agent model | `fallback_model` in config.yaml — one-shot failover on errors | `fallback_model:` (top-level) |
|
||||
| Vision | Auto-detection chain + internal OpenRouter retry | `auxiliary.vision` |
|
||||
| Web extraction | Auto-detection chain + internal OpenRouter retry | `auxiliary.web_extract` |
|
||||
| Context compression | Auto-detection chain, degrades to no-summary if unavailable | `auxiliary.compression` or `compression.summary_provider` |
|
||||
| Session search | Auto-detection chain | `auxiliary.session_search` |
|
||||
| Skills hub | Auto-detection chain | `auxiliary.skills_hub` |
|
||||
| MCP helpers | Auto-detection chain | `auxiliary.mcp` |
|
||||
| Memory flush | Auto-detection chain | `auxiliary.flush_memories` |
|
||||
| Delegation | Provider override only (no automatic fallback) | `delegation.provider` / `delegation.model` |
|
||||
| Cron jobs | Per-job provider override only (no automatic fallback) | Per-job `provider` / `model` |
|
||||
404
hermes_code/website/docs/user-guide/features/honcho.md
Normal file
404
hermes_code/website/docs/user-guide/features/honcho.md
Normal file
|
|
@ -0,0 +1,404 @@
|
|||
---
|
||||
title: Honcho Memory
|
||||
description: AI-native persistent memory for cross-session user modeling and personalization.
|
||||
sidebar_label: Honcho Memory
|
||||
sidebar_position: 8
|
||||
---
|
||||
|
||||
# Honcho Memory
|
||||
|
||||
[Honcho](https://honcho.dev) is an AI-native memory system that gives Hermes persistent, cross-session understanding of users. While Hermes has built-in memory (`MEMORY.md` and `USER.md`), Honcho adds a deeper layer of **user modeling** — learning preferences, goals, communication style, and context across conversations via a dual-peer architecture where both the user and the AI build representations over time.
|
||||
|
||||
## Works Alongside Built-in Memory
|
||||
|
||||
Hermes has two memory systems that can work together or be configured separately. In `hybrid` mode (the default), both run side by side — Honcho adds cross-session user modeling while local files handle agent-level notes.
|
||||
|
||||
| Feature | Built-in Memory | Honcho Memory |
|
||||
|---------|----------------|---------------|
|
||||
| Storage | Local files (`~/.hermes/memories/`) | Cloud-hosted Honcho API |
|
||||
| Scope | Agent-level notes and user profile | Deep user modeling via dialectic reasoning |
|
||||
| Persistence | Across sessions on same machine | Across sessions, machines, and platforms |
|
||||
| Query | Injected into system prompt automatically | Prefetched + on-demand via tools |
|
||||
| Content | Manually curated by the agent | Automatically learned from conversations |
|
||||
| Write surface | `memory` tool (add/replace/remove) | `honcho_conclude` tool (persist facts) |
|
||||
|
||||
Set `memoryMode` to `honcho` to use Honcho exclusively. See [Memory Modes](#memory-modes) for per-peer configuration.
|
||||
|
||||
|
||||
## Self-hosted / Docker
|
||||
|
||||
Hermes supports a local Honcho instance (e.g. via Docker) in addition to the hosted API. Point it at your instance using `HONCHO_BASE_URL` — no API key required.
|
||||
|
||||
**Via `hermes config`:**
|
||||
|
||||
```bash
|
||||
hermes config set HONCHO_BASE_URL http://localhost:8000
|
||||
```
|
||||
|
||||
**Via `~/.honcho/config.json`:**
|
||||
|
||||
```json
|
||||
{
|
||||
"hosts": {
|
||||
"hermes": {
|
||||
"base_url": "http://localhost:8000",
|
||||
"enabled": true
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Hermes auto-enables Honcho when either `apiKey` or `base_url` is present, so no further configuration is needed for a local instance.
|
||||
|
||||
To run Honcho locally, refer to the [Honcho self-hosting docs](https://docs.honcho.dev).
|
||||
|
||||
## Setup
|
||||
|
||||
### Interactive Setup
|
||||
|
||||
```bash
|
||||
hermes honcho setup
|
||||
```
|
||||
|
||||
The setup wizard walks through API key, peer names, workspace, memory mode, write frequency, recall mode, and session strategy. It offers to install `honcho-ai` if missing.
|
||||
|
||||
### Manual Setup
|
||||
|
||||
#### 1. Install the Client Library
|
||||
|
||||
```bash
|
||||
pip install 'honcho-ai>=2.0.1'
|
||||
```
|
||||
|
||||
#### 2. Get an API Key
|
||||
|
||||
Go to [app.honcho.dev](https://app.honcho.dev) > Settings > API Keys.
|
||||
|
||||
#### 3. Configure
|
||||
|
||||
Honcho reads from `~/.honcho/config.json` (shared across all Honcho-enabled applications):
|
||||
|
||||
```json
|
||||
{
|
||||
"apiKey": "your-honcho-api-key",
|
||||
"hosts": {
|
||||
"hermes": {
|
||||
"workspace": "hermes",
|
||||
"peerName": "your-name",
|
||||
"aiPeer": "hermes",
|
||||
"memoryMode": "hybrid",
|
||||
"writeFrequency": "async",
|
||||
"recallMode": "hybrid",
|
||||
"sessionStrategy": "per-session",
|
||||
"enabled": true
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
`apiKey` lives at the root because it is a shared credential across all Honcho-enabled tools. All other settings are scoped under `hosts.hermes`. The `hermes honcho setup` wizard writes this structure automatically.
|
||||
|
||||
Or set the API key as an environment variable:
|
||||
|
||||
```bash
|
||||
hermes config set HONCHO_API_KEY your-key
|
||||
```
|
||||
|
||||
:::info
|
||||
When an API key is present (either in `~/.honcho/config.json` or as `HONCHO_API_KEY`), Honcho auto-enables unless explicitly set to `"enabled": false`.
|
||||
:::
|
||||
|
||||
## Configuration
|
||||
|
||||
### Global Config (`~/.honcho/config.json`)
|
||||
|
||||
Settings are scoped to `hosts.hermes` and fall back to root-level globals when the host field is absent. Root-level keys are managed by the user or the honcho CLI -- Hermes only writes to its own host block (except `apiKey`, which is a shared credential at root).
|
||||
|
||||
**Root-level (shared)**
|
||||
|
||||
| Field | Default | Description |
|
||||
|-------|---------|-------------|
|
||||
| `apiKey` | — | Honcho API key (required, shared across all hosts) |
|
||||
| `sessions` | `{}` | Manual session name overrides per directory (shared) |
|
||||
|
||||
**Host-level (`hosts.hermes`)**
|
||||
|
||||
| Field | Default | Description |
|
||||
|-------|---------|-------------|
|
||||
| `workspace` | `"hermes"` | Workspace identifier |
|
||||
| `peerName` | *(derived)* | Your identity name for user modeling |
|
||||
| `aiPeer` | `"hermes"` | AI assistant identity name |
|
||||
| `environment` | `"production"` | Honcho environment |
|
||||
| `enabled` | *(auto)* | Auto-enables when API key is present |
|
||||
| `saveMessages` | `true` | Whether to sync messages to Honcho |
|
||||
| `memoryMode` | `"hybrid"` | Memory mode: `hybrid` or `honcho` |
|
||||
| `writeFrequency` | `"async"` | When to write: `async`, `turn`, `session`, or integer N |
|
||||
| `recallMode` | `"hybrid"` | Retrieval strategy: `hybrid`, `context`, or `tools` |
|
||||
| `sessionStrategy` | `"per-session"` | How sessions are scoped |
|
||||
| `sessionPeerPrefix` | `false` | Prefix session names with peer name |
|
||||
| `contextTokens` | *(Honcho default)* | Max tokens for auto-injected context |
|
||||
| `dialecticReasoningLevel` | `"low"` | Floor for dialectic reasoning: `minimal` / `low` / `medium` / `high` / `max` |
|
||||
| `dialecticMaxChars` | `600` | Char cap on dialectic results injected into system prompt |
|
||||
| `linkedHosts` | `[]` | Other host keys whose workspaces to cross-reference |
|
||||
|
||||
All host-level fields fall back to the equivalent root-level key if not set under `hosts.hermes`. Existing configs with settings at root level continue to work.
|
||||
|
||||
### Memory Modes
|
||||
|
||||
| Mode | Effect |
|
||||
|------|--------|
|
||||
| `hybrid` | Write to both Honcho and local files (default) |
|
||||
| `honcho` | Honcho only — skip local file writes |
|
||||
|
||||
Memory mode can be set globally or per-peer (user, agent1, agent2, etc):
|
||||
|
||||
```json
|
||||
{
|
||||
"memoryMode": {
|
||||
"default": "hybrid",
|
||||
"hermes": "honcho"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
To disable Honcho entirely, set `enabled: false` or remove the API key.
|
||||
|
||||
### Recall Modes
|
||||
|
||||
Controls how Honcho context reaches the agent:
|
||||
|
||||
| Mode | Behavior |
|
||||
|------|----------|
|
||||
| `hybrid` | Auto-injected context + Honcho tools available (default) |
|
||||
| `context` | Auto-injected context only — Honcho tools hidden |
|
||||
| `tools` | Honcho tools only — no auto-injected context |
|
||||
|
||||
### Write Frequency
|
||||
|
||||
| Setting | Behavior |
|
||||
|---------|----------|
|
||||
| `async` | Background thread writes (zero blocking, default) |
|
||||
| `turn` | Synchronous write after each turn |
|
||||
| `session` | Batched write at session end |
|
||||
| *integer N* | Write every N turns |
|
||||
|
||||
### Session Strategies
|
||||
|
||||
| Strategy | Session key | Use case |
|
||||
|----------|-------------|----------|
|
||||
| `per-session` | Unique per run | Default. Fresh session every time. |
|
||||
| `per-directory` | CWD basename | Each project gets its own session. |
|
||||
| `per-repo` | Git repo root name | Groups subdirectories under one session. |
|
||||
| `global` | Fixed `"global"` | Single cross-project session. |
|
||||
|
||||
Resolution order: manual map > session title > strategy-derived key > platform key.
|
||||
|
||||
### Multi-host Configuration
|
||||
|
||||
Multiple Honcho-enabled tools share `~/.honcho/config.json`. Each tool writes only to its own host block, reads its host block first, and falls back to root-level globals:
|
||||
|
||||
```json
|
||||
{
|
||||
"apiKey": "your-key",
|
||||
"peerName": "eri",
|
||||
"hosts": {
|
||||
"hermes": {
|
||||
"workspace": "my-workspace",
|
||||
"aiPeer": "hermes-assistant",
|
||||
"memoryMode": "honcho",
|
||||
"linkedHosts": ["claude-code"],
|
||||
"contextTokens": 2000,
|
||||
"dialecticReasoningLevel": "medium"
|
||||
},
|
||||
"claude-code": {
|
||||
"workspace": "my-workspace",
|
||||
"aiPeer": "clawd"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Resolution: `hosts.<tool>` field > root-level field > default. In this example, both tools share the root `apiKey` and `peerName`, but each has its own `aiPeer` and workspace settings.
|
||||
|
||||
### Hermes Config (`~/.hermes/config.yaml`)
|
||||
|
||||
Intentionally minimal — most configuration comes from `~/.honcho/config.json`:
|
||||
|
||||
```yaml
|
||||
honcho: {}
|
||||
```
|
||||
|
||||
## How It Works
|
||||
|
||||
### Async Context Pipeline
|
||||
|
||||
Honcho context is fetched asynchronously to avoid blocking the response path:
|
||||
|
||||
```mermaid
|
||||
flowchart TD
|
||||
user["User message"] --> cache["Consume cached Honcho context<br/>from the previous turn"]
|
||||
cache --> prompt["Inject user, AI, and dialectic context<br/>into the system prompt"]
|
||||
prompt --> llm["LLM call"]
|
||||
llm --> response["Assistant response"]
|
||||
response --> fetch["Start background fetch for Turn N+1"]
|
||||
fetch --> ctx["Fetch context"]
|
||||
fetch --> dia["Fetch dialectic"]
|
||||
ctx --> next["Cache for the next turn"]
|
||||
dia --> next
|
||||
```
|
||||
|
||||
Turn 1 is a cold start (no cache). All subsequent turns consume cached results with zero HTTP latency on the response path. The system prompt on turn 1 uses only static context to preserve prefix cache hits at the LLM provider.
|
||||
|
||||
### Dual-Peer Architecture
|
||||
|
||||
Both the user and AI have peer representations in Honcho:
|
||||
|
||||
- **User peer** — observed from user messages. Honcho learns preferences, goals, communication style.
|
||||
- **AI peer** — observed from assistant messages (`observe_me=True`). Honcho builds a representation of the agent's knowledge and behavior.
|
||||
|
||||
Both representations are injected into the system prompt when available.
|
||||
|
||||
### Dynamic Reasoning Level
|
||||
|
||||
Dialectic queries scale reasoning effort with message complexity:
|
||||
|
||||
| Message length | Reasoning level |
|
||||
|----------------|-----------------|
|
||||
| < 120 chars | Config default (typically `low`) |
|
||||
| 120-400 chars | One level above default (cap: `high`) |
|
||||
| > 400 chars | Two levels above default (cap: `high`) |
|
||||
|
||||
`max` is never selected automatically.
|
||||
|
||||
### Gateway Integration
|
||||
|
||||
The gateway creates short-lived `AIAgent` instances per request. Honcho managers are owned at the gateway session layer (`_honcho_managers` dict) so they persist across requests within the same session and flush at real session boundaries (reset, resume, expiry, server stop).
|
||||
|
||||
#### Session Isolation
|
||||
|
||||
Each gateway session (e.g., a Telegram chat, a Discord channel) gets its own Honcho session context. The session key — derived from the platform and chat ID — is threaded through the entire tool dispatch chain so that Honcho tool calls always execute against the correct session, even when multiple users are messaging concurrently.
|
||||
|
||||
This means:
|
||||
- **`honcho_profile`**, **`honcho_search`**, **`honcho_context`**, and **`honcho_conclude`** all resolve the correct session at call time, not at startup
|
||||
- Background memory flushes (triggered by `/reset`, `/resume`, or session expiry) preserve the original session key so they write to the correct Honcho session
|
||||
- Synthetic flush turns (where the agent saves memories before context is lost) skip Honcho sync to avoid polluting conversation history with internal bookkeeping
|
||||
|
||||
#### Session Lifecycle
|
||||
|
||||
| Event | What happens to Honcho |
|
||||
|-------|------------------------|
|
||||
| New message arrives | Agent inherits the gateway's Honcho manager + session key |
|
||||
| `/reset` | Memory flush fires with the old session key, then Honcho manager shuts down |
|
||||
| `/resume` | Current session is flushed, then the resumed session's Honcho context loads |
|
||||
| Session expiry | Automatic flush + shutdown after the configured idle timeout |
|
||||
| Gateway stop | All active Honcho managers are flushed and shut down gracefully |
|
||||
|
||||
## Tools
|
||||
|
||||
When Honcho is active, four tools become available. Availability is gated dynamically — they are invisible when Honcho is disabled.
|
||||
|
||||
### `honcho_profile`
|
||||
|
||||
Fast peer card retrieval (no LLM). Returns a curated list of key facts about the user.
|
||||
|
||||
### `honcho_search`
|
||||
|
||||
Semantic search over memory (no LLM). Returns raw excerpts ranked by relevance. Cheaper and faster than `honcho_context` — good for factual lookups.
|
||||
|
||||
Parameters:
|
||||
- `query` (string) — search query
|
||||
- `max_tokens` (integer, optional) — result token budget
|
||||
|
||||
### `honcho_context`
|
||||
|
||||
Dialectic Q&A powered by Honcho's LLM. Synthesizes an answer from accumulated conversation history.
|
||||
|
||||
Parameters:
|
||||
- `query` (string) — natural language question
|
||||
- `peer` (string, optional) — `"user"` (default) or `"ai"`. Querying `"ai"` asks about the assistant's own history and identity.
|
||||
|
||||
Example queries the agent might make:
|
||||
|
||||
```
|
||||
"What are this user's main goals?"
|
||||
"What communication style does this user prefer?"
|
||||
"What topics has this user discussed recently?"
|
||||
"What is this user's technical expertise level?"
|
||||
```
|
||||
|
||||
### `honcho_conclude`
|
||||
|
||||
Writes a fact to Honcho memory. Use when the user explicitly states a preference, correction, or project context worth remembering. Feeds into the user's peer card and representation.
|
||||
|
||||
Parameters:
|
||||
- `conclusion` (string) — the fact to persist
|
||||
|
||||
## CLI Commands
|
||||
|
||||
```
|
||||
hermes honcho setup # Interactive setup wizard
|
||||
hermes honcho status # Show config and connection status
|
||||
hermes honcho sessions # List directory → session name mappings
|
||||
hermes honcho map <name> # Map current directory to a session name
|
||||
hermes honcho peer # Show peer names and dialectic settings
|
||||
hermes honcho peer --user NAME # Set user peer name
|
||||
hermes honcho peer --ai NAME # Set AI peer name
|
||||
hermes honcho peer --reasoning LEVEL # Set dialectic reasoning level
|
||||
hermes honcho mode # Show current memory mode
|
||||
hermes honcho mode [hybrid|honcho|local] # Set memory mode
|
||||
hermes honcho tokens # Show token budget settings
|
||||
hermes honcho tokens --context N # Set context token cap
|
||||
hermes honcho tokens --dialectic N # Set dialectic char cap
|
||||
hermes honcho identity # Show AI peer identity
|
||||
hermes honcho identity <file> # Seed AI peer identity from file (SOUL.md, etc.)
|
||||
hermes honcho migrate # Migration guide: OpenClaw → Hermes + Honcho
|
||||
```
|
||||
|
||||
### Doctor Integration
|
||||
|
||||
`hermes doctor` includes a Honcho section that validates config, API key, and connection status.
|
||||
|
||||
## Migration
|
||||
|
||||
### From Local Memory
|
||||
|
||||
When Honcho activates on an instance with existing local history, migration runs automatically:
|
||||
|
||||
1. **Conversation history** — prior messages are uploaded as an XML transcript file
|
||||
2. **Memory files** — existing `MEMORY.md`, `USER.md`, and `SOUL.md` are uploaded for context
|
||||
|
||||
### From OpenClaw
|
||||
|
||||
```bash
|
||||
hermes honcho migrate
|
||||
```
|
||||
|
||||
Walks through converting an OpenClaw native Honcho setup to the shared `~/.honcho/config.json` format.
|
||||
|
||||
## AI Peer Identity
|
||||
|
||||
Honcho can build a representation of the AI assistant over time (via `observe_me=True`). You can also seed the AI peer explicitly:
|
||||
|
||||
```bash
|
||||
hermes honcho identity ~/.hermes/SOUL.md
|
||||
```
|
||||
|
||||
This uploads the file content through Honcho's observation pipeline. The AI peer representation is then injected into the system prompt alongside the user's, giving the agent awareness of its own accumulated identity.
|
||||
|
||||
```bash
|
||||
hermes honcho identity --show
|
||||
```
|
||||
|
||||
Shows the current AI peer representation from Honcho.
|
||||
|
||||
## Use Cases
|
||||
|
||||
- **Personalized responses** — Honcho learns how each user prefers to communicate
|
||||
- **Goal tracking** — remembers what users are working toward across sessions
|
||||
- **Expertise adaptation** — adjusts technical depth based on user's background
|
||||
- **Cross-platform memory** — same user understanding across CLI, Telegram, Discord, etc.
|
||||
- **Multi-user support** — each user (via messaging platforms) gets their own user model
|
||||
|
||||
:::tip
|
||||
Honcho is fully opt-in — zero behavior change when disabled or unconfigured. All Honcho calls are non-fatal; if the service is unreachable, the agent continues normally.
|
||||
:::
|
||||
182
hermes_code/website/docs/user-guide/features/hooks.md
Normal file
182
hermes_code/website/docs/user-guide/features/hooks.md
Normal file
|
|
@ -0,0 +1,182 @@
|
|||
---
|
||||
sidebar_position: 6
|
||||
title: "Event Hooks"
|
||||
description: "Run custom code at key lifecycle points — log activity, send alerts, post to webhooks"
|
||||
---
|
||||
|
||||
# Event Hooks
|
||||
|
||||
The hooks system lets you run custom code at key points in the agent lifecycle — session creation, slash commands, each tool-calling step, and more. Hooks fire automatically during gateway operation without blocking the main agent pipeline.
|
||||
|
||||
## Creating a Hook
|
||||
|
||||
Each hook is a directory under `~/.hermes/hooks/` containing two files:
|
||||
|
||||
```text
|
||||
~/.hermes/hooks/
|
||||
└── my-hook/
|
||||
├── HOOK.yaml # Declares which events to listen for
|
||||
└── handler.py # Python handler function
|
||||
```
|
||||
|
||||
### HOOK.yaml
|
||||
|
||||
```yaml
|
||||
name: my-hook
|
||||
description: Log all agent activity to a file
|
||||
events:
|
||||
- agent:start
|
||||
- agent:end
|
||||
- agent:step
|
||||
```
|
||||
|
||||
The `events` list determines which events trigger your handler. You can subscribe to any combination of events, including wildcards like `command:*`.
|
||||
|
||||
### handler.py
|
||||
|
||||
```python
|
||||
import json
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
|
||||
LOG_FILE = Path.home() / ".hermes" / "hooks" / "my-hook" / "activity.log"
|
||||
|
||||
async def handle(event_type: str, context: dict):
|
||||
"""Called for each subscribed event. Must be named 'handle'."""
|
||||
entry = {
|
||||
"timestamp": datetime.now().isoformat(),
|
||||
"event": event_type,
|
||||
**context,
|
||||
}
|
||||
with open(LOG_FILE, "a") as f:
|
||||
f.write(json.dumps(entry) + "\n")
|
||||
```
|
||||
|
||||
**Handler rules:**
|
||||
- Must be named `handle`
|
||||
- Receives `event_type` (string) and `context` (dict)
|
||||
- Can be `async def` or regular `def` — both work
|
||||
- Errors are caught and logged, never crashing the agent
|
||||
|
||||
## Available Events
|
||||
|
||||
| Event | When it fires | Context keys |
|
||||
|-------|---------------|--------------|
|
||||
| `gateway:startup` | Gateway process starts | `platforms` (list of active platform names) |
|
||||
| `session:start` | New messaging session created | `platform`, `user_id`, `session_id`, `session_key` |
|
||||
| `session:reset` | User ran `/new` or `/reset` | `platform`, `user_id`, `session_key` |
|
||||
| `agent:start` | Agent begins processing a message | `platform`, `user_id`, `session_id`, `message` |
|
||||
| `agent:step` | Each iteration of the tool-calling loop | `platform`, `user_id`, `session_id`, `iteration`, `tool_names` |
|
||||
| `agent:end` | Agent finishes processing | `platform`, `user_id`, `session_id`, `message`, `response` |
|
||||
| `command:*` | Any slash command executed | `platform`, `user_id`, `command`, `args` |
|
||||
|
||||
### Wildcard Matching
|
||||
|
||||
Handlers registered for `command:*` fire for any `command:` event (`command:model`, `command:reset`, etc.). Monitor all slash commands with a single subscription.
|
||||
|
||||
## Examples
|
||||
|
||||
### Telegram Alert on Long Tasks
|
||||
|
||||
Send yourself a message when the agent takes more than 10 steps:
|
||||
|
||||
```yaml
|
||||
# ~/.hermes/hooks/long-task-alert/HOOK.yaml
|
||||
name: long-task-alert
|
||||
description: Alert when agent is taking many steps
|
||||
events:
|
||||
- agent:step
|
||||
```
|
||||
|
||||
```python
|
||||
# ~/.hermes/hooks/long-task-alert/handler.py
|
||||
import os
|
||||
import httpx
|
||||
|
||||
THRESHOLD = 10
|
||||
BOT_TOKEN = os.getenv("TELEGRAM_BOT_TOKEN")
|
||||
CHAT_ID = os.getenv("TELEGRAM_HOME_CHANNEL")
|
||||
|
||||
async def handle(event_type: str, context: dict):
|
||||
iteration = context.get("iteration", 0)
|
||||
if iteration == THRESHOLD and BOT_TOKEN and CHAT_ID:
|
||||
tools = ", ".join(context.get("tool_names", []))
|
||||
text = f"⚠️ Agent has been running for {iteration} steps. Last tools: {tools}"
|
||||
async with httpx.AsyncClient() as client:
|
||||
await client.post(
|
||||
f"https://api.telegram.org/bot{BOT_TOKEN}/sendMessage",
|
||||
json={"chat_id": CHAT_ID, "text": text},
|
||||
)
|
||||
```
|
||||
|
||||
### Command Usage Logger
|
||||
|
||||
Track which slash commands are used:
|
||||
|
||||
```yaml
|
||||
# ~/.hermes/hooks/command-logger/HOOK.yaml
|
||||
name: command-logger
|
||||
description: Log slash command usage
|
||||
events:
|
||||
- command:*
|
||||
```
|
||||
|
||||
```python
|
||||
# ~/.hermes/hooks/command-logger/handler.py
|
||||
import json
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
|
||||
LOG = Path.home() / ".hermes" / "logs" / "command_usage.jsonl"
|
||||
|
||||
def handle(event_type: str, context: dict):
|
||||
LOG.parent.mkdir(parents=True, exist_ok=True)
|
||||
entry = {
|
||||
"ts": datetime.now().isoformat(),
|
||||
"command": context.get("command"),
|
||||
"args": context.get("args"),
|
||||
"platform": context.get("platform"),
|
||||
"user": context.get("user_id"),
|
||||
}
|
||||
with open(LOG, "a") as f:
|
||||
f.write(json.dumps(entry) + "\n")
|
||||
```
|
||||
|
||||
### Session Start Webhook
|
||||
|
||||
POST to an external service on new sessions:
|
||||
|
||||
```yaml
|
||||
# ~/.hermes/hooks/session-webhook/HOOK.yaml
|
||||
name: session-webhook
|
||||
description: Notify external service on new sessions
|
||||
events:
|
||||
- session:start
|
||||
- session:reset
|
||||
```
|
||||
|
||||
```python
|
||||
# ~/.hermes/hooks/session-webhook/handler.py
|
||||
import httpx
|
||||
|
||||
WEBHOOK_URL = "https://your-service.example.com/hermes-events"
|
||||
|
||||
async def handle(event_type: str, context: dict):
|
||||
async with httpx.AsyncClient() as client:
|
||||
await client.post(WEBHOOK_URL, json={
|
||||
"event": event_type,
|
||||
**context,
|
||||
}, timeout=5)
|
||||
```
|
||||
|
||||
## How It Works
|
||||
|
||||
1. On gateway startup, `HookRegistry.discover_and_load()` scans `~/.hermes/hooks/`
|
||||
2. Each subdirectory with `HOOK.yaml` + `handler.py` is loaded dynamically
|
||||
3. Handlers are registered for their declared events
|
||||
4. At each lifecycle point, `hooks.emit()` fires all matching handlers
|
||||
5. Errors in any handler are caught and logged — a broken hook never crashes the agent
|
||||
|
||||
:::info
|
||||
Hooks only fire in the **gateway** (Telegram, Discord, Slack, WhatsApp). The CLI does not currently load hooks.
|
||||
:::
|
||||
150
hermes_code/website/docs/user-guide/features/image-generation.md
Normal file
150
hermes_code/website/docs/user-guide/features/image-generation.md
Normal file
|
|
@ -0,0 +1,150 @@
|
|||
---
|
||||
title: Image Generation
|
||||
description: Generate high-quality images using FLUX 2 Pro with automatic upscaling via FAL.ai.
|
||||
sidebar_label: Image Generation
|
||||
sidebar_position: 6
|
||||
---
|
||||
|
||||
# Image Generation
|
||||
|
||||
Hermes Agent can generate images from text prompts using FAL.ai's **FLUX 2 Pro** model with automatic 2x upscaling via the **Clarity Upscaler** for enhanced quality.
|
||||
|
||||
## Setup
|
||||
|
||||
### Get a FAL API Key
|
||||
|
||||
1. Sign up at [fal.ai](https://fal.ai/)
|
||||
2. Generate an API key from your dashboard
|
||||
|
||||
### Configure the Key
|
||||
|
||||
```bash
|
||||
# Add to ~/.hermes/.env
|
||||
FAL_KEY=your-fal-api-key-here
|
||||
```
|
||||
|
||||
### Install the Client Library
|
||||
|
||||
```bash
|
||||
pip install fal-client
|
||||
```
|
||||
|
||||
:::info
|
||||
The image generation tool is automatically available when `FAL_KEY` is set. No additional toolset configuration is needed.
|
||||
:::
|
||||
|
||||
## How It Works
|
||||
|
||||
When you ask Hermes to generate an image:
|
||||
|
||||
1. **Generation** — Your prompt is sent to the FLUX 2 Pro model (`fal-ai/flux-2-pro`)
|
||||
2. **Upscaling** — The generated image is automatically upscaled 2x using the Clarity Upscaler (`fal-ai/clarity-upscaler`)
|
||||
3. **Delivery** — The upscaled image URL is returned
|
||||
|
||||
If upscaling fails for any reason, the original image is returned as a fallback.
|
||||
|
||||
## Usage
|
||||
|
||||
Simply ask Hermes to create an image:
|
||||
|
||||
```
|
||||
Generate an image of a serene mountain landscape with cherry blossoms
|
||||
```
|
||||
|
||||
```
|
||||
Create a portrait of a wise old owl perched on an ancient tree branch
|
||||
```
|
||||
|
||||
```
|
||||
Make me a futuristic cityscape with flying cars and neon lights
|
||||
```
|
||||
|
||||
## Parameters
|
||||
|
||||
The `image_generate_tool` accepts these parameters:
|
||||
|
||||
| Parameter | Default | Range | Description |
|
||||
|-----------|---------|-------|-------------|
|
||||
| `prompt` | *(required)* | — | Text description of the desired image |
|
||||
| `aspect_ratio` | `"landscape"` | `landscape`, `square`, `portrait` | Image aspect ratio |
|
||||
| `num_inference_steps` | `50` | 1–100 | Number of denoising steps (more = higher quality, slower) |
|
||||
| `guidance_scale` | `4.5` | 0.1–20.0 | How closely to follow the prompt |
|
||||
| `num_images` | `1` | 1–4 | Number of images to generate |
|
||||
| `output_format` | `"png"` | `png`, `jpeg` | Image file format |
|
||||
| `seed` | *(random)* | any integer | Random seed for reproducible results |
|
||||
|
||||
## Aspect Ratios
|
||||
|
||||
The tool uses simplified aspect ratio names that map to FLUX 2 Pro image sizes:
|
||||
|
||||
| Aspect Ratio | Maps To | Best For |
|
||||
|-------------|---------|----------|
|
||||
| `landscape` | `landscape_16_9` | Wallpapers, banners, scenes |
|
||||
| `square` | `square_hd` | Profile pictures, social media posts |
|
||||
| `portrait` | `portrait_16_9` | Character art, phone wallpapers |
|
||||
|
||||
:::tip
|
||||
You can also use the raw FLUX 2 Pro size presets directly: `square_hd`, `square`, `portrait_4_3`, `portrait_16_9`, `landscape_4_3`, `landscape_16_9`. Custom sizes up to 2048x2048 are also supported.
|
||||
:::
|
||||
|
||||
## Automatic Upscaling
|
||||
|
||||
Every generated image is automatically upscaled 2x using FAL.ai's Clarity Upscaler with these settings:
|
||||
|
||||
| Setting | Value |
|
||||
|---------|-------|
|
||||
| Upscale Factor | 2x |
|
||||
| Creativity | 0.35 |
|
||||
| Resemblance | 0.6 |
|
||||
| Guidance Scale | 4 |
|
||||
| Inference Steps | 18 |
|
||||
| Positive Prompt | `"masterpiece, best quality, highres"` + your original prompt |
|
||||
| Negative Prompt | `"(worst quality, low quality, normal quality:2)"` |
|
||||
|
||||
The upscaler enhances detail and resolution while preserving the original composition. If the upscaler fails (network issue, rate limit), the original resolution image is returned automatically.
|
||||
|
||||
## Example Prompts
|
||||
|
||||
Here are some effective prompts to try:
|
||||
|
||||
```
|
||||
A candid street photo of a woman with a pink bob and bold eyeliner
|
||||
```
|
||||
|
||||
```
|
||||
Modern architecture building with glass facade, sunset lighting
|
||||
```
|
||||
|
||||
```
|
||||
Abstract art with vibrant colors and geometric patterns
|
||||
```
|
||||
|
||||
```
|
||||
Portrait of a wise old owl perched on ancient tree branch
|
||||
```
|
||||
|
||||
```
|
||||
Futuristic cityscape with flying cars and neon lights
|
||||
```
|
||||
|
||||
## Debugging
|
||||
|
||||
Enable debug logging for image generation:
|
||||
|
||||
```bash
|
||||
export IMAGE_TOOLS_DEBUG=true
|
||||
```
|
||||
|
||||
Debug logs are saved to `./logs/image_tools_debug_<session_id>.json` with details about each generation request, parameters, timing, and any errors.
|
||||
|
||||
## Safety Settings
|
||||
|
||||
The image generation tool runs with safety checks disabled by default (`safety_tolerance: 5`, the most permissive setting). This is configured at the code level and is not user-adjustable.
|
||||
|
||||
## Limitations
|
||||
|
||||
- **Requires FAL API key** — image generation incurs API costs on your FAL.ai account
|
||||
- **No image editing** — this is text-to-image only, no inpainting or img2img
|
||||
- **URL-based delivery** — images are returned as temporary FAL.ai URLs, not saved locally
|
||||
- **Upscaling adds latency** — the automatic 2x upscale step adds processing time
|
||||
- **Max 4 images per request** — `num_images` is capped at 4
|
||||
411
hermes_code/website/docs/user-guide/features/mcp.md
Normal file
411
hermes_code/website/docs/user-guide/features/mcp.md
Normal file
|
|
@ -0,0 +1,411 @@
|
|||
---
|
||||
sidebar_position: 4
|
||||
title: "MCP (Model Context Protocol)"
|
||||
description: "Connect Hermes Agent to external tool servers via MCP — and control exactly which MCP tools Hermes loads"
|
||||
---
|
||||
|
||||
# MCP (Model Context Protocol)
|
||||
|
||||
MCP lets Hermes Agent connect to external tool servers so the agent can use tools that live outside Hermes itself — GitHub, databases, file systems, browser stacks, internal APIs, and more.
|
||||
|
||||
If you have ever wanted Hermes to use a tool that already exists somewhere else, MCP is usually the cleanest way to do it.
|
||||
|
||||
## What MCP gives you
|
||||
|
||||
- Access to external tool ecosystems without writing a native Hermes tool first
|
||||
- Local stdio servers and remote HTTP MCP servers in the same config
|
||||
- Automatic tool discovery and registration at startup
|
||||
- Utility wrappers for MCP resources and prompts when supported by the server
|
||||
- Per-server filtering so you can expose only the MCP tools you actually want Hermes to see
|
||||
|
||||
## Quick start
|
||||
|
||||
1. Install MCP support (already included if you used the standard install script):
|
||||
|
||||
```bash
|
||||
cd ~/.hermes/hermes-agent
|
||||
uv pip install -e ".[mcp]"
|
||||
```
|
||||
|
||||
2. Add an MCP server to `~/.hermes/config.yaml`:
|
||||
|
||||
```yaml
|
||||
mcp_servers:
|
||||
filesystem:
|
||||
command: "npx"
|
||||
args: ["-y", "@modelcontextprotocol/server-filesystem", "/home/user/projects"]
|
||||
```
|
||||
|
||||
3. Start Hermes:
|
||||
|
||||
```bash
|
||||
hermes chat
|
||||
```
|
||||
|
||||
4. Ask Hermes to use the MCP-backed capability.
|
||||
|
||||
For example:
|
||||
|
||||
```text
|
||||
List the files in /home/user/projects and summarize the repo structure.
|
||||
```
|
||||
|
||||
Hermes will discover the MCP server's tools and use them like any other tool.
|
||||
|
||||
## Two kinds of MCP servers
|
||||
|
||||
### Stdio servers
|
||||
|
||||
Stdio servers run as local subprocesses and talk over stdin/stdout.
|
||||
|
||||
```yaml
|
||||
mcp_servers:
|
||||
github:
|
||||
command: "npx"
|
||||
args: ["-y", "@modelcontextprotocol/server-github"]
|
||||
env:
|
||||
GITHUB_PERSONAL_ACCESS_TOKEN: "***"
|
||||
```
|
||||
|
||||
Use stdio servers when:
|
||||
- the server is installed locally
|
||||
- you want low-latency access to local resources
|
||||
- you are following MCP server docs that show `command`, `args`, and `env`
|
||||
|
||||
### HTTP servers
|
||||
|
||||
HTTP MCP servers are remote endpoints Hermes connects to directly.
|
||||
|
||||
```yaml
|
||||
mcp_servers:
|
||||
remote_api:
|
||||
url: "https://mcp.example.com/mcp"
|
||||
headers:
|
||||
Authorization: "Bearer ***"
|
||||
```
|
||||
|
||||
Use HTTP servers when:
|
||||
- the MCP server is hosted elsewhere
|
||||
- your organization exposes internal MCP endpoints
|
||||
- you do not want Hermes spawning a local subprocess for that integration
|
||||
|
||||
## Basic configuration reference
|
||||
|
||||
Hermes reads MCP config from `~/.hermes/config.yaml` under `mcp_servers`.
|
||||
|
||||
### Common keys
|
||||
|
||||
| Key | Type | Meaning |
|
||||
|---|---|---|
|
||||
| `command` | string | Executable for a stdio MCP server |
|
||||
| `args` | list | Arguments for the stdio server |
|
||||
| `env` | mapping | Environment variables passed to the stdio server |
|
||||
| `url` | string | HTTP MCP endpoint |
|
||||
| `headers` | mapping | HTTP headers for remote servers |
|
||||
| `timeout` | number | Tool call timeout |
|
||||
| `connect_timeout` | number | Initial connection timeout |
|
||||
| `enabled` | bool | If `false`, Hermes skips the server entirely |
|
||||
| `tools` | mapping | Per-server tool filtering and utility policy |
|
||||
|
||||
### Minimal stdio example
|
||||
|
||||
```yaml
|
||||
mcp_servers:
|
||||
filesystem:
|
||||
command: "npx"
|
||||
args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
|
||||
```
|
||||
|
||||
### Minimal HTTP example
|
||||
|
||||
```yaml
|
||||
mcp_servers:
|
||||
company_api:
|
||||
url: "https://mcp.internal.example.com"
|
||||
headers:
|
||||
Authorization: "Bearer ***"
|
||||
```
|
||||
|
||||
## How Hermes registers MCP tools
|
||||
|
||||
Hermes prefixes MCP tools so they do not collide with built-in names:
|
||||
|
||||
```text
|
||||
mcp_<server_name>_<tool_name>
|
||||
```
|
||||
|
||||
Examples:
|
||||
|
||||
| Server | MCP tool | Registered name |
|
||||
|---|---|---|
|
||||
| `filesystem` | `read_file` | `mcp_filesystem_read_file` |
|
||||
| `github` | `create-issue` | `mcp_github_create_issue` |
|
||||
| `my-api` | `query.data` | `mcp_my_api_query_data` |
|
||||
|
||||
In practice, you usually do not need to call the prefixed name manually — Hermes sees the tool and chooses it during normal reasoning.
|
||||
|
||||
## MCP utility tools
|
||||
|
||||
When supported, Hermes also registers utility tools around MCP resources and prompts:
|
||||
|
||||
- `list_resources`
|
||||
- `read_resource`
|
||||
- `list_prompts`
|
||||
- `get_prompt`
|
||||
|
||||
These are registered per server with the same prefix pattern, for example:
|
||||
|
||||
- `mcp_github_list_resources`
|
||||
- `mcp_github_get_prompt`
|
||||
|
||||
### Important
|
||||
|
||||
These utility tools are now capability-aware:
|
||||
- Hermes only registers resource utilities if the MCP session actually supports resource operations
|
||||
- Hermes only registers prompt utilities if the MCP session actually supports prompt operations
|
||||
|
||||
So a server that exposes callable tools but no resources/prompts will not get those extra wrappers.
|
||||
|
||||
## Per-server filtering
|
||||
|
||||
This is the main feature added by the PR work.
|
||||
|
||||
You can now control which tools each MCP server contributes to Hermes.
|
||||
|
||||
### Disable a server entirely
|
||||
|
||||
```yaml
|
||||
mcp_servers:
|
||||
legacy:
|
||||
url: "https://mcp.legacy.internal"
|
||||
enabled: false
|
||||
```
|
||||
|
||||
If `enabled: false`, Hermes skips the server completely and does not even attempt a connection.
|
||||
|
||||
### Whitelist server tools
|
||||
|
||||
```yaml
|
||||
mcp_servers:
|
||||
github:
|
||||
command: "npx"
|
||||
args: ["-y", "@modelcontextprotocol/server-github"]
|
||||
env:
|
||||
GITHUB_PERSONAL_ACCESS_TOKEN: "***"
|
||||
tools:
|
||||
include: [create_issue, list_issues]
|
||||
```
|
||||
|
||||
Only those MCP server tools are registered.
|
||||
|
||||
### Blacklist server tools
|
||||
|
||||
```yaml
|
||||
mcp_servers:
|
||||
stripe:
|
||||
url: "https://mcp.stripe.com"
|
||||
tools:
|
||||
exclude: [delete_customer]
|
||||
```
|
||||
|
||||
All server tools are registered except the excluded ones.
|
||||
|
||||
### Precedence rule
|
||||
|
||||
If both are present:
|
||||
|
||||
```yaml
|
||||
tools:
|
||||
include: [create_issue]
|
||||
exclude: [create_issue, delete_issue]
|
||||
```
|
||||
|
||||
`include` wins.
|
||||
|
||||
### Filter utility tools too
|
||||
|
||||
You can also separately disable Hermes-added utility wrappers:
|
||||
|
||||
```yaml
|
||||
mcp_servers:
|
||||
docs:
|
||||
url: "https://mcp.docs.example.com"
|
||||
tools:
|
||||
prompts: false
|
||||
resources: false
|
||||
```
|
||||
|
||||
That means:
|
||||
- `tools.resources: false` disables `list_resources` and `read_resource`
|
||||
- `tools.prompts: false` disables `list_prompts` and `get_prompt`
|
||||
|
||||
### Full example
|
||||
|
||||
```yaml
|
||||
mcp_servers:
|
||||
github:
|
||||
command: "npx"
|
||||
args: ["-y", "@modelcontextprotocol/server-github"]
|
||||
env:
|
||||
GITHUB_PERSONAL_ACCESS_TOKEN: "***"
|
||||
tools:
|
||||
include: [create_issue, list_issues, search_code]
|
||||
prompts: false
|
||||
|
||||
stripe:
|
||||
url: "https://mcp.stripe.com"
|
||||
headers:
|
||||
Authorization: "Bearer ***"
|
||||
tools:
|
||||
exclude: [delete_customer]
|
||||
resources: false
|
||||
|
||||
legacy:
|
||||
url: "https://mcp.legacy.internal"
|
||||
enabled: false
|
||||
```
|
||||
|
||||
## What happens if everything is filtered out?
|
||||
|
||||
If your config filters out all callable tools and disables or omits all supported utilities, Hermes does not create an empty runtime MCP toolset for that server.
|
||||
|
||||
That keeps the tool list clean.
|
||||
|
||||
## Runtime behavior
|
||||
|
||||
### Discovery time
|
||||
|
||||
Hermes discovers MCP servers at startup and registers their tools into the normal tool registry.
|
||||
|
||||
### Reloading
|
||||
|
||||
If you change MCP config, use:
|
||||
|
||||
```text
|
||||
/reload-mcp
|
||||
```
|
||||
|
||||
This reloads MCP servers from config and refreshes the available tool list.
|
||||
|
||||
### Toolsets
|
||||
|
||||
Each configured MCP server also creates a runtime toolset when it contributes at least one registered tool:
|
||||
|
||||
```text
|
||||
mcp-<server>
|
||||
```
|
||||
|
||||
That makes MCP servers easier to reason about at the toolset level.
|
||||
|
||||
## Security model
|
||||
|
||||
### Stdio env filtering
|
||||
|
||||
For stdio servers, Hermes does not blindly pass your full shell environment.
|
||||
|
||||
Only explicitly configured `env` plus a safe baseline are passed through. This reduces accidental secret leakage.
|
||||
|
||||
### Config-level exposure control
|
||||
|
||||
The new filtering support is also a security control:
|
||||
- disable dangerous tools you do not want the model to see
|
||||
- expose only a minimal whitelist for a sensitive server
|
||||
- disable resource/prompt wrappers when you do not want that surface exposed
|
||||
|
||||
## Example use cases
|
||||
|
||||
### GitHub server with a minimal issue-management surface
|
||||
|
||||
```yaml
|
||||
mcp_servers:
|
||||
github:
|
||||
command: "npx"
|
||||
args: ["-y", "@modelcontextprotocol/server-github"]
|
||||
env:
|
||||
GITHUB_PERSONAL_ACCESS_TOKEN: "***"
|
||||
tools:
|
||||
include: [list_issues, create_issue, update_issue]
|
||||
prompts: false
|
||||
resources: false
|
||||
```
|
||||
|
||||
Use it like:
|
||||
|
||||
```text
|
||||
Show me open issues labeled bug, then draft a new issue for the flaky MCP reconnection behavior.
|
||||
```
|
||||
|
||||
### Stripe server with dangerous actions removed
|
||||
|
||||
```yaml
|
||||
mcp_servers:
|
||||
stripe:
|
||||
url: "https://mcp.stripe.com"
|
||||
headers:
|
||||
Authorization: "Bearer ***"
|
||||
tools:
|
||||
exclude: [delete_customer, refund_payment]
|
||||
```
|
||||
|
||||
Use it like:
|
||||
|
||||
```text
|
||||
Look up the last 10 failed payments and summarize common failure reasons.
|
||||
```
|
||||
|
||||
### Filesystem server for a single project root
|
||||
|
||||
```yaml
|
||||
mcp_servers:
|
||||
project_fs:
|
||||
command: "npx"
|
||||
args: ["-y", "@modelcontextprotocol/server-filesystem", "/home/user/my-project"]
|
||||
```
|
||||
|
||||
Use it like:
|
||||
|
||||
```text
|
||||
Inspect the project root and explain the directory layout.
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### MCP server not connecting
|
||||
|
||||
Check:
|
||||
|
||||
```bash
|
||||
# Verify MCP deps are installed (already included in standard install)
|
||||
cd ~/.hermes/hermes-agent && uv pip install -e ".[mcp]"
|
||||
|
||||
node --version
|
||||
npx --version
|
||||
```
|
||||
|
||||
Then verify your config and restart Hermes.
|
||||
|
||||
### Tools not appearing
|
||||
|
||||
Possible causes:
|
||||
- the server failed to connect
|
||||
- discovery failed
|
||||
- your filter config excluded the tools
|
||||
- the utility capability does not exist on that server
|
||||
- the server is disabled with `enabled: false`
|
||||
|
||||
If you are intentionally filtering, this is expected.
|
||||
|
||||
### Why didn't resource or prompt utilities appear?
|
||||
|
||||
Because Hermes now only registers those wrappers when both are true:
|
||||
1. your config allows them
|
||||
2. the server session actually supports the capability
|
||||
|
||||
This is intentional and keeps the tool list honest.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Use MCP with Hermes](/docs/guides/use-mcp-with-hermes)
|
||||
- [CLI Commands](/docs/reference/cli-commands)
|
||||
- [Slash Commands](/docs/reference/slash-commands)
|
||||
- [FAQ](/docs/reference/faq)
|
||||
218
hermes_code/website/docs/user-guide/features/memory.md
Normal file
218
hermes_code/website/docs/user-guide/features/memory.md
Normal file
|
|
@ -0,0 +1,218 @@
|
|||
---
|
||||
sidebar_position: 3
|
||||
title: "Persistent Memory"
|
||||
description: "How Hermes Agent remembers across sessions — MEMORY.md, USER.md, and session search"
|
||||
---
|
||||
|
||||
# Persistent Memory
|
||||
|
||||
Hermes Agent has bounded, curated memory that persists across sessions. This lets it remember your preferences, your projects, your environment, and things it has learned.
|
||||
|
||||
## How It Works
|
||||
|
||||
Two files make up the agent's memory:
|
||||
|
||||
| File | Purpose | Char Limit |
|
||||
|------|---------|------------|
|
||||
| **MEMORY.md** | Agent's personal notes — environment facts, conventions, things learned | 2,200 chars (~800 tokens) |
|
||||
| **USER.md** | User profile — your preferences, communication style, expectations | 1,375 chars (~500 tokens) |
|
||||
|
||||
Both are stored in `~/.hermes/memories/` and are injected into the system prompt as a frozen snapshot at session start. The agent manages its own memory via the `memory` tool — it can add, replace, or remove entries.
|
||||
|
||||
:::info
|
||||
Character limits keep memory focused. When memory is full, the agent consolidates or replaces entries to make room for new information.
|
||||
:::
|
||||
|
||||
## How Memory Appears in the System Prompt
|
||||
|
||||
At the start of every session, memory entries are loaded from disk and rendered into the system prompt as a frozen block:
|
||||
|
||||
```
|
||||
══════════════════════════════════════════════
|
||||
MEMORY (your personal notes) [67% — 1,474/2,200 chars]
|
||||
══════════════════════════════════════════════
|
||||
User's project is a Rust web service at ~/code/myapi using Axum + SQLx
|
||||
§
|
||||
This machine runs Ubuntu 22.04, has Docker and Podman installed
|
||||
§
|
||||
User prefers concise responses, dislikes verbose explanations
|
||||
```
|
||||
|
||||
The format includes:
|
||||
- A header showing which store (MEMORY or USER PROFILE)
|
||||
- Usage percentage and character counts so the agent knows capacity
|
||||
- Individual entries separated by `§` (section sign) delimiters
|
||||
- Entries can be multiline
|
||||
|
||||
**Frozen snapshot pattern:** The system prompt injection is captured once at session start and never changes mid-session. This is intentional — it preserves the LLM's prefix cache for performance. When the agent adds/removes memory entries during a session, the changes are persisted to disk immediately but won't appear in the system prompt until the next session starts. Tool responses always show the live state.
|
||||
|
||||
## Memory Tool Actions
|
||||
|
||||
The agent uses the `memory` tool with these actions:
|
||||
|
||||
- **add** — Add a new memory entry
|
||||
- **replace** — Replace an existing entry with updated content (uses substring matching via `old_text`)
|
||||
- **remove** — Remove an entry that's no longer relevant (uses substring matching via `old_text`)
|
||||
|
||||
There is no `read` action — memory content is automatically injected into the system prompt at session start. The agent sees its memories as part of its conversation context.
|
||||
|
||||
### Substring Matching
|
||||
|
||||
The `replace` and `remove` actions use short unique substring matching — you don't need the full entry text. The `old_text` parameter just needs to be a unique substring that identifies exactly one entry:
|
||||
|
||||
```python
|
||||
# If memory contains "User prefers dark mode in all editors"
|
||||
memory(action="replace", target="memory",
|
||||
old_text="dark mode",
|
||||
content="User prefers light mode in VS Code, dark mode in terminal")
|
||||
```
|
||||
|
||||
If the substring matches multiple entries, an error is returned asking for a more specific match.
|
||||
|
||||
## Two Targets Explained
|
||||
|
||||
### `memory` — Agent's Personal Notes
|
||||
|
||||
For information the agent needs to remember about the environment, workflows, and lessons learned:
|
||||
|
||||
- Environment facts (OS, tools, project structure)
|
||||
- Project conventions and configuration
|
||||
- Tool quirks and workarounds discovered
|
||||
- Completed task diary entries
|
||||
- Skills and techniques that worked
|
||||
|
||||
### `user` — User Profile
|
||||
|
||||
For information about the user's identity, preferences, and communication style:
|
||||
|
||||
- Name, role, timezone
|
||||
- Communication preferences (concise vs detailed, format preferences)
|
||||
- Pet peeves and things to avoid
|
||||
- Workflow habits
|
||||
- Technical skill level
|
||||
|
||||
## What to Save vs Skip
|
||||
|
||||
### Save These (Proactively)
|
||||
|
||||
The agent saves automatically — you don't need to ask. It saves when it learns:
|
||||
|
||||
- **User preferences:** "I prefer TypeScript over JavaScript" → save to `user`
|
||||
- **Environment facts:** "This server runs Debian 12 with PostgreSQL 16" → save to `memory`
|
||||
- **Corrections:** "Don't use `sudo` for Docker commands, user is in docker group" → save to `memory`
|
||||
- **Conventions:** "Project uses tabs, 120-char line width, Google-style docstrings" → save to `memory`
|
||||
- **Completed work:** "Migrated database from MySQL to PostgreSQL on 2026-01-15" → save to `memory`
|
||||
- **Explicit requests:** "Remember that my API key rotation happens monthly" → save to `memory`
|
||||
|
||||
### Skip These
|
||||
|
||||
- **Trivial/obvious info:** "User asked about Python" — too vague to be useful
|
||||
- **Easily re-discovered facts:** "Python 3.12 supports f-string nesting" — can web search this
|
||||
- **Raw data dumps:** Large code blocks, log files, data tables — too big for memory
|
||||
- **Session-specific ephemera:** Temporary file paths, one-off debugging context
|
||||
- **Information already in context files:** SOUL.md and AGENTS.md content
|
||||
|
||||
## Capacity Management
|
||||
|
||||
Memory has strict character limits to keep system prompts bounded:
|
||||
|
||||
| Store | Limit | Typical entries |
|
||||
|-------|-------|----------------|
|
||||
| memory | 2,200 chars | 8-15 entries |
|
||||
| user | 1,375 chars | 5-10 entries |
|
||||
|
||||
### What Happens When Memory is Full
|
||||
|
||||
When you try to add an entry that would exceed the limit, the tool returns an error:
|
||||
|
||||
```json
|
||||
{
|
||||
"success": false,
|
||||
"error": "Memory at 2,100/2,200 chars. Adding this entry (250 chars) would exceed the limit. Replace or remove existing entries first.",
|
||||
"current_entries": ["..."],
|
||||
"usage": "2,100/2,200"
|
||||
}
|
||||
```
|
||||
|
||||
The agent should then:
|
||||
1. Read the current entries (shown in the error response)
|
||||
2. Identify entries that can be removed or consolidated
|
||||
3. Use `replace` to merge related entries into shorter versions
|
||||
4. Then `add` the new entry
|
||||
|
||||
**Best practice:** When memory is above 80% capacity (visible in the system prompt header), consolidate entries before adding new ones. For example, merge three separate "project uses X" entries into one comprehensive project description entry.
|
||||
|
||||
### Practical Examples of Good Memory Entries
|
||||
|
||||
**Compact, information-dense entries work best:**
|
||||
|
||||
```
|
||||
# Good: Packs multiple related facts
|
||||
User runs macOS 14 Sonoma, uses Homebrew, has Docker Desktop and Podman. Shell: zsh with oh-my-zsh. Editor: VS Code with Vim keybindings.
|
||||
|
||||
# Good: Specific, actionable convention
|
||||
Project ~/code/api uses Go 1.22, sqlc for DB queries, chi router. Run tests with 'make test'. CI via GitHub Actions.
|
||||
|
||||
# Good: Lesson learned with context
|
||||
The staging server (10.0.1.50) needs SSH port 2222, not 22. Key is at ~/.ssh/staging_ed25519.
|
||||
|
||||
# Bad: Too vague
|
||||
User has a project.
|
||||
|
||||
# Bad: Too verbose
|
||||
On January 5th, 2026, the user asked me to look at their project which is
|
||||
located at ~/code/api. I discovered it uses Go version 1.22 and...
|
||||
```
|
||||
|
||||
## Duplicate Prevention
|
||||
|
||||
The memory system automatically rejects exact duplicate entries. If you try to add content that already exists, it returns success with a "no duplicate added" message.
|
||||
|
||||
## Security Scanning
|
||||
|
||||
Memory entries are scanned for injection and exfiltration patterns before being accepted, since they're injected into the system prompt. Content matching threat patterns (prompt injection, credential exfiltration, SSH backdoors) or containing invisible Unicode characters is blocked.
|
||||
|
||||
## Session Search
|
||||
|
||||
Beyond MEMORY.md and USER.md, the agent can search its past conversations using the `session_search` tool:
|
||||
|
||||
- All CLI and messaging sessions are stored in SQLite (`~/.hermes/state.db`) with FTS5 full-text search
|
||||
- Search queries return relevant past conversations with Gemini Flash summarization
|
||||
- The agent can find things it discussed weeks ago, even if they're not in its active memory
|
||||
|
||||
```bash
|
||||
hermes sessions list # Browse past sessions
|
||||
```
|
||||
|
||||
### session_search vs memory
|
||||
|
||||
| Feature | Persistent Memory | Session Search |
|
||||
|---------|------------------|----------------|
|
||||
| **Capacity** | ~1,300 tokens total | Unlimited (all sessions) |
|
||||
| **Speed** | Instant (in system prompt) | Requires search + LLM summarization |
|
||||
| **Use case** | Key facts always available | Finding specific past conversations |
|
||||
| **Management** | Manually curated by agent | Automatic — all sessions stored |
|
||||
| **Token cost** | Fixed per session (~1,300 tokens) | On-demand (searched when needed) |
|
||||
|
||||
**Memory** is for critical facts that should always be in context. **Session search** is for "did we discuss X last week?" queries where the agent needs to recall specifics from past conversations.
|
||||
|
||||
## Configuration
|
||||
|
||||
```yaml
|
||||
# In ~/.hermes/config.yaml
|
||||
memory:
|
||||
memory_enabled: true
|
||||
user_profile_enabled: true
|
||||
memory_char_limit: 2200 # ~800 tokens
|
||||
user_char_limit: 1375 # ~500 tokens
|
||||
```
|
||||
|
||||
## Honcho Integration (Cross-Session User Modeling)
|
||||
|
||||
For deeper, AI-generated user understanding that works across sessions and platforms, you can enable [Honcho Memory](./honcho.md). Honcho runs alongside built-in memory in `hybrid` mode (the default) — `MEMORY.md` and `USER.md` stay as-is, and Honcho adds a persistent user modeling layer on top.
|
||||
|
||||
```bash
|
||||
hermes honcho setup
|
||||
```
|
||||
|
||||
See the [Honcho Memory](./honcho.md) docs for full configuration, tools, and CLI reference.
|
||||
271
hermes_code/website/docs/user-guide/features/personality.md
Normal file
271
hermes_code/website/docs/user-guide/features/personality.md
Normal file
|
|
@ -0,0 +1,271 @@
|
|||
---
|
||||
sidebar_position: 9
|
||||
title: "Personality & SOUL.md"
|
||||
description: "Customize Hermes Agent's personality with a global SOUL.md, built-in personalities, and custom persona definitions"
|
||||
---
|
||||
|
||||
# Personality & SOUL.md
|
||||
|
||||
Hermes Agent's personality is fully customizable. `SOUL.md` is the **primary identity** — it's the first thing in the system prompt and defines who the agent is.
|
||||
|
||||
- `SOUL.md` — a durable persona file that lives in `HERMES_HOME` and serves as the agent's identity (slot #1 in the system prompt)
|
||||
- built-in or custom `/personality` presets — session-level system-prompt overlays
|
||||
|
||||
If you want to change who Hermes is — or replace it with an entirely different agent persona — edit `SOUL.md`.
|
||||
|
||||
## How SOUL.md works now
|
||||
|
||||
Hermes now seeds a default `SOUL.md` automatically in:
|
||||
|
||||
```text
|
||||
~/.hermes/SOUL.md
|
||||
```
|
||||
|
||||
More precisely, it uses the current instance's `HERMES_HOME`, so if you run Hermes with a custom home directory, it will use:
|
||||
|
||||
```text
|
||||
$HERMES_HOME/SOUL.md
|
||||
```
|
||||
|
||||
### Important behavior
|
||||
|
||||
- **SOUL.md is the agent's primary identity.** It occupies slot #1 in the system prompt, replacing the hardcoded default identity.
|
||||
- Hermes creates a starter `SOUL.md` automatically if one does not exist yet
|
||||
- Existing user `SOUL.md` files are never overwritten
|
||||
- Hermes loads `SOUL.md` only from `HERMES_HOME`
|
||||
- Hermes does not look in the current working directory for `SOUL.md`
|
||||
- If `SOUL.md` exists but is empty, or cannot be loaded, Hermes falls back to a built-in default identity
|
||||
- If `SOUL.md` has content, that content is injected verbatim after security scanning and truncation
|
||||
- SOUL.md is **not** duplicated in the context files section — it appears only once, as the identity
|
||||
|
||||
That makes `SOUL.md` a true per-user or per-instance identity, not just an additive layer.
|
||||
|
||||
## Why this design
|
||||
|
||||
This keeps personality predictable.
|
||||
|
||||
If Hermes loaded `SOUL.md` from whatever directory you happened to launch it in, your personality could change unexpectedly between projects. By loading only from `HERMES_HOME`, the personality belongs to the Hermes instance itself.
|
||||
|
||||
That also makes it easier to teach users:
|
||||
- "Edit `~/.hermes/SOUL.md` to change Hermes' default personality."
|
||||
|
||||
## Where to edit it
|
||||
|
||||
For most users:
|
||||
|
||||
```bash
|
||||
~/.hermes/SOUL.md
|
||||
```
|
||||
|
||||
If you use a custom home:
|
||||
|
||||
```bash
|
||||
$HERMES_HOME/SOUL.md
|
||||
```
|
||||
|
||||
## What should go in SOUL.md?
|
||||
|
||||
Use it for durable voice and personality guidance, such as:
|
||||
- tone
|
||||
- communication style
|
||||
- level of directness
|
||||
- default interaction style
|
||||
- what to avoid stylistically
|
||||
- how Hermes should handle uncertainty, disagreement, or ambiguity
|
||||
|
||||
Use it less for:
|
||||
- one-off project instructions
|
||||
- file paths
|
||||
- repo conventions
|
||||
- temporary workflow details
|
||||
|
||||
Those belong in `AGENTS.md`, not `SOUL.md`.
|
||||
|
||||
## Good SOUL.md content
|
||||
|
||||
A good SOUL file is:
|
||||
- stable across contexts
|
||||
- broad enough to apply in many conversations
|
||||
- specific enough to materially shape the voice
|
||||
- focused on communication and identity, not task-specific instructions
|
||||
|
||||
### Example
|
||||
|
||||
```markdown
|
||||
# Personality
|
||||
|
||||
You are a pragmatic senior engineer with strong taste.
|
||||
You optimize for truth, clarity, and usefulness over politeness theater.
|
||||
|
||||
## Style
|
||||
- Be direct without being cold
|
||||
- Prefer substance over filler
|
||||
- Push back when something is a bad idea
|
||||
- Admit uncertainty plainly
|
||||
- Keep explanations compact unless depth is useful
|
||||
|
||||
## What to avoid
|
||||
- Sycophancy
|
||||
- Hype language
|
||||
- Repeating the user's framing if it's wrong
|
||||
- Overexplaining obvious things
|
||||
|
||||
## Technical posture
|
||||
- Prefer simple systems over clever systems
|
||||
- Care about operational reality, not idealized architecture
|
||||
- Treat edge cases as part of the design, not cleanup
|
||||
```
|
||||
|
||||
## What Hermes injects into the prompt
|
||||
|
||||
`SOUL.md` content goes directly into slot #1 of the system prompt — the agent identity position. No wrapper language is added around it.
|
||||
|
||||
The content goes through:
|
||||
- prompt-injection scanning
|
||||
- truncation if it is too large
|
||||
|
||||
If the file is empty, whitespace-only, or cannot be read, Hermes falls back to a built-in default identity ("You are Hermes Agent, an intelligent AI assistant created by Nous Research..."). This fallback also applies when `skip_context_files` is set (e.g., in subagent/delegation contexts).
|
||||
|
||||
## Security scanning
|
||||
|
||||
`SOUL.md` is scanned like other context-bearing files for prompt injection patterns before inclusion.
|
||||
|
||||
That means you should still keep it focused on persona/voice rather than trying to sneak in strange meta-instructions.
|
||||
|
||||
## SOUL.md vs AGENTS.md
|
||||
|
||||
This is the most important distinction.
|
||||
|
||||
### SOUL.md
|
||||
Use for:
|
||||
- identity
|
||||
- tone
|
||||
- style
|
||||
- communication defaults
|
||||
- personality-level behavior
|
||||
|
||||
### AGENTS.md
|
||||
Use for:
|
||||
- project architecture
|
||||
- coding conventions
|
||||
- tool preferences
|
||||
- repo-specific workflows
|
||||
- commands, ports, paths, deployment notes
|
||||
|
||||
A useful rule:
|
||||
- if it should follow you everywhere, it belongs in `SOUL.md`
|
||||
- if it belongs to a project, it belongs in `AGENTS.md`
|
||||
|
||||
## SOUL.md vs `/personality`
|
||||
|
||||
`SOUL.md` is your durable default personality.
|
||||
|
||||
`/personality` is a session-level overlay that changes or supplements the current system prompt.
|
||||
|
||||
So:
|
||||
- `SOUL.md` = baseline voice
|
||||
- `/personality` = temporary mode switch
|
||||
|
||||
Examples:
|
||||
- keep a pragmatic default SOUL, then use `/personality teacher` for a tutoring conversation
|
||||
- keep a concise SOUL, then use `/personality creative` for brainstorming
|
||||
|
||||
## Built-in personalities
|
||||
|
||||
Hermes ships with built-in personalities you can switch to with `/personality`.
|
||||
|
||||
| Name | Description |
|
||||
|------|-------------|
|
||||
| **helpful** | Friendly, general-purpose assistant |
|
||||
| **concise** | Brief, to-the-point responses |
|
||||
| **technical** | Detailed, accurate technical expert |
|
||||
| **creative** | Innovative, outside-the-box thinking |
|
||||
| **teacher** | Patient educator with clear examples |
|
||||
| **kawaii** | Cute expressions, sparkles, and enthusiasm ★ |
|
||||
| **catgirl** | Neko-chan with cat-like expressions, nya~ |
|
||||
| **pirate** | Captain Hermes, tech-savvy buccaneer |
|
||||
| **shakespeare** | Bardic prose with dramatic flair |
|
||||
| **surfer** | Totally chill bro vibes |
|
||||
| **noir** | Hard-boiled detective narration |
|
||||
| **uwu** | Maximum cute with uwu-speak |
|
||||
| **philosopher** | Deep contemplation on every query |
|
||||
| **hype** | MAXIMUM ENERGY AND ENTHUSIASM!!! |
|
||||
|
||||
## Switching personalities with commands
|
||||
|
||||
### CLI
|
||||
|
||||
```text
|
||||
/personality
|
||||
/personality concise
|
||||
/personality technical
|
||||
```
|
||||
|
||||
### Messaging platforms
|
||||
|
||||
```text
|
||||
/personality teacher
|
||||
```
|
||||
|
||||
These are convenient overlays, but your global `SOUL.md` still gives Hermes its persistent default personality unless the overlay meaningfully changes it.
|
||||
|
||||
## Custom personalities in config
|
||||
|
||||
You can also define named custom personalities in `~/.hermes/config.yaml` under `agent.personalities`.
|
||||
|
||||
```yaml
|
||||
agent:
|
||||
personalities:
|
||||
codereviewer: >
|
||||
You are a meticulous code reviewer. Identify bugs, security issues,
|
||||
performance concerns, and unclear design choices. Be precise and constructive.
|
||||
```
|
||||
|
||||
Then switch to it with:
|
||||
|
||||
```text
|
||||
/personality codereviewer
|
||||
```
|
||||
|
||||
## Recommended workflow
|
||||
|
||||
A strong default setup is:
|
||||
|
||||
1. Keep a thoughtful global `SOUL.md` in `~/.hermes/SOUL.md`
|
||||
2. Put project instructions in `AGENTS.md`
|
||||
3. Use `/personality` only when you want a temporary mode shift
|
||||
|
||||
That gives you:
|
||||
- a stable voice
|
||||
- project-specific behavior where it belongs
|
||||
- temporary control when needed
|
||||
|
||||
## How personality interacts with the full prompt
|
||||
|
||||
At a high level, the prompt stack includes:
|
||||
1. **SOUL.md** (agent identity — or built-in fallback if SOUL.md is unavailable)
|
||||
2. tool-aware behavior guidance
|
||||
3. memory/user context
|
||||
4. skills guidance
|
||||
5. context files (`AGENTS.md`, `.cursorrules`)
|
||||
6. timestamp
|
||||
7. platform-specific formatting hints
|
||||
8. optional system-prompt overlays such as `/personality`
|
||||
|
||||
`SOUL.md` is the foundation — everything else builds on top of it.
|
||||
|
||||
## Related docs
|
||||
|
||||
- [Context Files](/docs/user-guide/features/context-files)
|
||||
- [Configuration](/docs/user-guide/configuration)
|
||||
- [Tips & Best Practices](/docs/guides/tips)
|
||||
- [SOUL.md Guide](/docs/guides/use-soul-with-hermes)
|
||||
|
||||
## CLI appearance vs conversational personality
|
||||
|
||||
Conversational personality and CLI appearance are separate:
|
||||
|
||||
- `SOUL.md`, `agent.system_prompt`, and `/personality` affect how Hermes speaks
|
||||
- `display.skin` and `/skin` affect how Hermes looks in the terminal
|
||||
|
||||
For terminal appearance, see [Skins & Themes](./skins.md).
|
||||
92
hermes_code/website/docs/user-guide/features/plugins.md
Normal file
92
hermes_code/website/docs/user-guide/features/plugins.md
Normal file
|
|
@ -0,0 +1,92 @@
|
|||
---
|
||||
sidebar_position: 20
|
||||
---
|
||||
|
||||
# Plugins
|
||||
|
||||
Hermes has a plugin system for adding custom tools, hooks, slash commands, and integrations without modifying core code.
|
||||
|
||||
**→ [Build a Hermes Plugin](/docs/guides/build-a-hermes-plugin)** — step-by-step guide with a complete working example.
|
||||
|
||||
## Quick overview
|
||||
|
||||
Drop a directory into `~/.hermes/plugins/` with a `plugin.yaml` and Python code:
|
||||
|
||||
```
|
||||
~/.hermes/plugins/my-plugin/
|
||||
├── plugin.yaml # manifest
|
||||
├── __init__.py # register() — wires schemas to handlers
|
||||
├── schemas.py # tool schemas (what the LLM sees)
|
||||
└── tools.py # tool handlers (what runs when called)
|
||||
```
|
||||
|
||||
Start Hermes — your tools appear alongside built-in tools. The model can call them immediately.
|
||||
|
||||
Project-local plugins under `./.hermes/plugins/` are disabled by default. Enable them only for trusted repositories by setting `HERMES_ENABLE_PROJECT_PLUGINS=true` before starting Hermes.
|
||||
|
||||
## What plugins can do
|
||||
|
||||
| Capability | How |
|
||||
|-----------|-----|
|
||||
| Add tools | `ctx.register_tool(name, schema, handler)` |
|
||||
| Add hooks | `ctx.register_hook("post_tool_call", callback)` |
|
||||
| Add slash commands | `ctx.register_command("mycommand", handler)` |
|
||||
| Ship data files | `Path(__file__).parent / "data" / "file.yaml"` |
|
||||
| Bundle skills | Copy `skill.md` to `~/.hermes/skills/` at load time |
|
||||
| Gate on env vars | `requires_env: [API_KEY]` in plugin.yaml |
|
||||
| Distribute via pip | `[project.entry-points."hermes_agent.plugins"]` |
|
||||
|
||||
## Plugin discovery
|
||||
|
||||
| Source | Path | Use case |
|
||||
|--------|------|----------|
|
||||
| User | `~/.hermes/plugins/` | Personal plugins |
|
||||
| Project | `.hermes/plugins/` | Project-specific plugins (requires `HERMES_ENABLE_PROJECT_PLUGINS=true`) |
|
||||
| pip | `hermes_agent.plugins` entry_points | Distributed packages |
|
||||
|
||||
## Available hooks
|
||||
|
||||
| Hook | Fires when |
|
||||
|------|-----------|
|
||||
| `pre_tool_call` | Before any tool executes |
|
||||
| `post_tool_call` | After any tool returns |
|
||||
| `pre_llm_call` | Before LLM API request |
|
||||
| `post_llm_call` | After LLM API response |
|
||||
| `on_session_start` | Session begins |
|
||||
| `on_session_end` | Session ends |
|
||||
|
||||
## Slash commands
|
||||
|
||||
Plugins can register slash commands that work in both CLI and messaging platforms:
|
||||
|
||||
```python
|
||||
def register(ctx):
|
||||
ctx.register_command(
|
||||
name="greet",
|
||||
handler=lambda args: f"Hello, {args or 'world'}!",
|
||||
description="Greet someone",
|
||||
args_hint="[name]",
|
||||
aliases=("hi",),
|
||||
)
|
||||
```
|
||||
|
||||
The handler receives the argument string (everything after `/greet`) and returns a string to display. Registered commands automatically appear in `/help`, tab autocomplete, Telegram bot menu, and Slack subcommand mapping.
|
||||
|
||||
| Parameter | Description |
|
||||
|-----------|-------------|
|
||||
| `name` | Command name without slash |
|
||||
| `handler` | Callable that takes `args: str` and returns `str | None` |
|
||||
| `description` | Shown in `/help` |
|
||||
| `args_hint` | Usage hint, e.g. `"[name]"` |
|
||||
| `aliases` | Tuple of alternative names |
|
||||
| `cli_only` | Only available in CLI |
|
||||
| `gateway_only` | Only available in messaging platforms |
|
||||
|
||||
## Managing plugins
|
||||
|
||||
```
|
||||
/plugins # list loaded plugins in a session
|
||||
hermes config set display.show_cost true # show cost in status bar
|
||||
```
|
||||
|
||||
See the **[full guide](/docs/guides/build-a-hermes-plugin)** for handler contracts, schema format, hook behavior, error handling, and common mistakes.
|
||||
200
hermes_code/website/docs/user-guide/features/provider-routing.md
Normal file
200
hermes_code/website/docs/user-guide/features/provider-routing.md
Normal file
|
|
@ -0,0 +1,200 @@
|
|||
---
|
||||
title: Provider Routing
|
||||
description: Configure OpenRouter provider preferences to optimize for cost, speed, or quality.
|
||||
sidebar_label: Provider Routing
|
||||
sidebar_position: 7
|
||||
---
|
||||
|
||||
# Provider Routing
|
||||
|
||||
When using [OpenRouter](https://openrouter.ai) as your LLM provider, Hermes Agent supports **provider routing** — fine-grained control over which underlying AI providers handle your requests and how they're prioritized.
|
||||
|
||||
OpenRouter routes requests to many providers (e.g., Anthropic, Google, AWS Bedrock, Together AI). Provider routing lets you optimize for cost, speed, quality, or enforce specific provider requirements.
|
||||
|
||||
## Configuration
|
||||
|
||||
Add a `provider_routing` section to your `~/.hermes/config.yaml`:
|
||||
|
||||
```yaml
|
||||
provider_routing:
|
||||
sort: "price" # How to rank providers
|
||||
only: [] # Whitelist: only use these providers
|
||||
ignore: [] # Blacklist: never use these providers
|
||||
order: [] # Explicit provider priority order
|
||||
require_parameters: false # Only use providers that support all parameters
|
||||
data_collection: null # Control data collection ("allow" or "deny")
|
||||
```
|
||||
|
||||
:::info
|
||||
Provider routing only applies when using OpenRouter. It has no effect with direct provider connections (e.g., connecting directly to the Anthropic API).
|
||||
:::
|
||||
|
||||
## Options
|
||||
|
||||
### `sort`
|
||||
|
||||
Controls how OpenRouter ranks available providers for your request.
|
||||
|
||||
| Value | Description |
|
||||
|-------|-------------|
|
||||
| `"price"` | Cheapest provider first |
|
||||
| `"throughput"` | Fastest tokens-per-second first |
|
||||
| `"latency"` | Lowest time-to-first-token first |
|
||||
|
||||
```yaml
|
||||
provider_routing:
|
||||
sort: "price"
|
||||
```
|
||||
|
||||
### `only`
|
||||
|
||||
Whitelist of provider names. When set, **only** these providers will be used. All others are excluded.
|
||||
|
||||
```yaml
|
||||
provider_routing:
|
||||
only:
|
||||
- "Anthropic"
|
||||
- "Google"
|
||||
```
|
||||
|
||||
### `ignore`
|
||||
|
||||
Blacklist of provider names. These providers will **never** be used, even if they offer the cheapest or fastest option.
|
||||
|
||||
```yaml
|
||||
provider_routing:
|
||||
ignore:
|
||||
- "Together"
|
||||
- "DeepInfra"
|
||||
```
|
||||
|
||||
### `order`
|
||||
|
||||
Explicit priority order. Providers listed first are preferred. Unlisted providers are used as fallbacks.
|
||||
|
||||
```yaml
|
||||
provider_routing:
|
||||
order:
|
||||
- "Anthropic"
|
||||
- "Google"
|
||||
- "AWS Bedrock"
|
||||
```
|
||||
|
||||
### `require_parameters`
|
||||
|
||||
When `true`, OpenRouter will only route to providers that support **all** parameters in your request (like `temperature`, `top_p`, `tools`, etc.). This avoids silent parameter drops.
|
||||
|
||||
```yaml
|
||||
provider_routing:
|
||||
require_parameters: true
|
||||
```
|
||||
|
||||
### `data_collection`
|
||||
|
||||
Controls whether providers can use your prompts for training. Options are `"allow"` or `"deny"`.
|
||||
|
||||
```yaml
|
||||
provider_routing:
|
||||
data_collection: "deny"
|
||||
```
|
||||
|
||||
## Practical Examples
|
||||
|
||||
### Optimize for Cost
|
||||
|
||||
Route to the cheapest available provider. Good for high-volume usage and development:
|
||||
|
||||
```yaml
|
||||
provider_routing:
|
||||
sort: "price"
|
||||
```
|
||||
|
||||
### Optimize for Speed
|
||||
|
||||
Prioritize low-latency providers for interactive use:
|
||||
|
||||
```yaml
|
||||
provider_routing:
|
||||
sort: "latency"
|
||||
```
|
||||
|
||||
### Optimize for Throughput
|
||||
|
||||
Best for long-form generation where tokens-per-second matters:
|
||||
|
||||
```yaml
|
||||
provider_routing:
|
||||
sort: "throughput"
|
||||
```
|
||||
|
||||
### Lock to Specific Providers
|
||||
|
||||
Ensure all requests go through a specific provider for consistency:
|
||||
|
||||
```yaml
|
||||
provider_routing:
|
||||
only:
|
||||
- "Anthropic"
|
||||
```
|
||||
|
||||
### Avoid Specific Providers
|
||||
|
||||
Exclude providers you don't want to use (e.g., for data privacy):
|
||||
|
||||
```yaml
|
||||
provider_routing:
|
||||
ignore:
|
||||
- "Together"
|
||||
- "Lepton"
|
||||
data_collection: "deny"
|
||||
```
|
||||
|
||||
### Preferred Order with Fallbacks
|
||||
|
||||
Try your preferred providers first, fall back to others if unavailable:
|
||||
|
||||
```yaml
|
||||
provider_routing:
|
||||
order:
|
||||
- "Anthropic"
|
||||
- "Google"
|
||||
require_parameters: true
|
||||
```
|
||||
|
||||
## How It Works
|
||||
|
||||
Provider routing preferences are passed to the OpenRouter API via the `extra_body.provider` field on every API call. This applies to both:
|
||||
|
||||
- **CLI mode** — configured in `~/.hermes/config.yaml`, loaded at startup
|
||||
- **Gateway mode** — same config file, loaded when the gateway starts
|
||||
|
||||
The routing config is read from `config.yaml` and passed as parameters when creating the `AIAgent`:
|
||||
|
||||
```
|
||||
providers_allowed ← from provider_routing.only
|
||||
providers_ignored ← from provider_routing.ignore
|
||||
providers_order ← from provider_routing.order
|
||||
provider_sort ← from provider_routing.sort
|
||||
provider_require_parameters ← from provider_routing.require_parameters
|
||||
provider_data_collection ← from provider_routing.data_collection
|
||||
```
|
||||
|
||||
:::tip
|
||||
You can combine multiple options. For example, sort by price but exclude certain providers and require parameter support:
|
||||
|
||||
```yaml
|
||||
provider_routing:
|
||||
sort: "price"
|
||||
ignore: ["Together"]
|
||||
require_parameters: true
|
||||
data_collection: "deny"
|
||||
```
|
||||
:::
|
||||
|
||||
## Default Behavior
|
||||
|
||||
When no `provider_routing` section is configured (the default), OpenRouter uses its own default routing logic, which generally balances cost and availability automatically.
|
||||
|
||||
:::tip Provider Routing vs. Fallback Models
|
||||
Provider routing controls which **sub-providers within OpenRouter** handle your requests. For automatic failover to an entirely different provider when your primary model fails, see [Fallback Providers](/docs/user-guide/features/fallback-providers).
|
||||
:::
|
||||
234
hermes_code/website/docs/user-guide/features/rl-training.md
Normal file
234
hermes_code/website/docs/user-guide/features/rl-training.md
Normal file
|
|
@ -0,0 +1,234 @@
|
|||
---
|
||||
sidebar_position: 13
|
||||
title: "RL Training"
|
||||
description: "Reinforcement learning on agent behaviors with Tinker-Atropos — environment discovery, training, and evaluation"
|
||||
---
|
||||
|
||||
# RL Training
|
||||
|
||||
Hermes Agent includes an integrated RL (Reinforcement Learning) training pipeline built on **Tinker-Atropos**. This enables training language models on environment-specific tasks using GRPO (Group Relative Policy Optimization) with LoRA adapters, orchestrated entirely through the agent's tool interface.
|
||||
|
||||
## Overview
|
||||
|
||||
The RL training system consists of three components:
|
||||
|
||||
1. **Atropos** — A trajectory API server that coordinates environment interactions, manages rollout groups, and computes advantages
|
||||
2. **Tinker** — A training service that handles model weights, LoRA training, sampling/inference, and optimizer steps
|
||||
3. **Environments** — Python classes that define tasks, scoring, and reward functions (e.g., GSM8K math problems)
|
||||
|
||||
The agent can discover environments, configure training parameters, launch training runs, and monitor metrics — all through a set of `rl_*` tools.
|
||||
|
||||
## Requirements
|
||||
|
||||
RL training requires:
|
||||
|
||||
- **Python >= 3.11** (Tinker package requirement)
|
||||
- **TINKER_API_KEY** — API key for the Tinker training service
|
||||
- **WANDB_API_KEY** — API key for Weights & Biases metrics tracking
|
||||
- The `tinker-atropos` submodule (at `tinker-atropos/` relative to the Hermes root)
|
||||
|
||||
```bash
|
||||
# Set up API keys
|
||||
hermes config set TINKER_API_KEY your-tinker-key
|
||||
hermes config set WANDB_API_KEY your-wandb-key
|
||||
```
|
||||
|
||||
When both keys are present and Python >= 3.11 is available, the `rl` toolset is automatically enabled.
|
||||
|
||||
## Available Tools
|
||||
|
||||
| Tool | Description |
|
||||
|------|-------------|
|
||||
| `rl_list_environments` | Discover available RL environments |
|
||||
| `rl_select_environment` | Select an environment and load its config |
|
||||
| `rl_get_current_config` | View configurable and locked fields |
|
||||
| `rl_edit_config` | Modify configurable training parameters |
|
||||
| `rl_start_training` | Launch a training run (spawns 3 processes) |
|
||||
| `rl_check_status` | Monitor training progress and WandB metrics |
|
||||
| `rl_stop_training` | Stop a running training job |
|
||||
| `rl_get_results` | Get final metrics and model weights path |
|
||||
| `rl_list_runs` | List all active and completed runs |
|
||||
| `rl_test_inference` | Quick inference test using OpenRouter |
|
||||
|
||||
## Workflow
|
||||
|
||||
### 1. Discover Environments
|
||||
|
||||
```
|
||||
List the available RL environments
|
||||
```
|
||||
|
||||
The agent calls `rl_list_environments()` which scans `tinker-atropos/tinker_atropos/environments/` using AST parsing to find Python classes inheriting from `BaseEnv`. Each environment defines:
|
||||
|
||||
- **Dataset loading** — where training data comes from (e.g., HuggingFace datasets)
|
||||
- **Prompt construction** — how to format items for the model
|
||||
- **Scoring/verification** — how to evaluate model outputs and assign rewards
|
||||
|
||||
### 2. Select and Configure
|
||||
|
||||
```
|
||||
Select the GSM8K environment and show me the configuration
|
||||
```
|
||||
|
||||
The agent calls `rl_select_environment("gsm8k_tinker")`, then `rl_get_current_config()` to see all parameters.
|
||||
|
||||
Configuration fields are divided into two categories:
|
||||
|
||||
**Configurable fields** (can be modified):
|
||||
- `group_size` — Number of completions per item (default: 16)
|
||||
- `batch_size` — Training batch size (default: 128)
|
||||
- `wandb_name` — WandB run name (auto-set to `{env}-{timestamp}`)
|
||||
- Other environment-specific parameters
|
||||
|
||||
**Locked fields** (infrastructure settings, cannot be changed):
|
||||
- `tokenizer_name` — Model tokenizer (e.g., `Qwen/Qwen3-8B`)
|
||||
- `rollout_server_url` — Atropos API URL (`http://localhost:8000`)
|
||||
- `max_token_length` — Maximum token length (8192)
|
||||
- `max_num_workers` — Maximum parallel workers (2048)
|
||||
- `total_steps` — Total training steps (2500)
|
||||
- `lora_rank` — LoRA adapter rank (32)
|
||||
- `learning_rate` — Learning rate (4e-5)
|
||||
- `max_token_trainer_length` — Max tokens for trainer (9000)
|
||||
|
||||
### 3. Start Training
|
||||
|
||||
```
|
||||
Start the training run
|
||||
```
|
||||
|
||||
The agent calls `rl_start_training()` which:
|
||||
|
||||
1. Generates a YAML config file merging locked settings with configurable overrides
|
||||
2. Creates a unique run ID
|
||||
3. Spawns three processes:
|
||||
- **Atropos API server** (`run-api`) — trajectory coordination
|
||||
- **Tinker trainer** (`launch_training.py`) — LoRA training + FastAPI inference server on port 8001
|
||||
- **Environment** (`environment.py serve`) — the selected environment connecting to Atropos
|
||||
|
||||
The processes start with staggered delays (5s for API, 30s for trainer, 90s more for environment) to ensure proper initialization order.
|
||||
|
||||
### 4. Monitor Progress
|
||||
|
||||
```
|
||||
Check the status of training run abc12345
|
||||
```
|
||||
|
||||
The agent calls `rl_check_status(run_id)` which reports:
|
||||
|
||||
- Process status (running/exited for each of the 3 processes)
|
||||
- Running time
|
||||
- WandB metrics (step, reward mean, percent correct, eval accuracy)
|
||||
- Log file locations for debugging
|
||||
|
||||
:::note Rate Limiting
|
||||
Status checks are rate-limited to once every **30 minutes** per run ID. This prevents excessive polling during long-running training jobs that take hours.
|
||||
:::
|
||||
|
||||
### 5. Stop or Get Results
|
||||
|
||||
```
|
||||
Stop the training run
|
||||
# or
|
||||
Get the final results for run abc12345
|
||||
```
|
||||
|
||||
`rl_stop_training()` terminates all three processes in reverse order (environment → trainer → API). `rl_get_results()` retrieves final WandB metrics and training history.
|
||||
|
||||
## Inference Testing
|
||||
|
||||
Before committing to a full training run, you can test if an environment works correctly using `rl_test_inference`. This runs a few steps of inference and scoring using OpenRouter — no Tinker API needed, just an `OPENROUTER_API_KEY`.
|
||||
|
||||
```
|
||||
Test the selected environment with inference
|
||||
```
|
||||
|
||||
Default configuration:
|
||||
- **3 steps × 16 completions = 48 rollouts per model**
|
||||
- Tests 3 models at different scales for robustness:
|
||||
- `qwen/qwen3-8b` (small)
|
||||
- `z-ai/glm-4.7-flash` (medium)
|
||||
- `minimax/minimax-m2.7` (large)
|
||||
- Total: ~144 rollouts
|
||||
|
||||
This validates:
|
||||
- Environment loads correctly
|
||||
- Prompt construction works
|
||||
- Inference response parsing is robust across model scales
|
||||
- Verifier/scoring logic produces valid rewards
|
||||
|
||||
## Tinker API Integration
|
||||
|
||||
The trainer uses the [Tinker](https://tinker.computer) API for model training operations:
|
||||
|
||||
- **ServiceClient** — Creates training and sampling clients
|
||||
- **Training client** — Handles forward-backward passes with importance sampling loss, optimizer steps (Adam), and weight checkpointing
|
||||
- **Sampling client** — Provides inference using the latest trained weights
|
||||
|
||||
The training loop:
|
||||
1. Fetches a batch of rollouts from Atropos (prompt + completions + scores)
|
||||
2. Converts to Tinker Datum objects with padded logprobs and advantages
|
||||
3. Runs forward-backward pass with importance sampling loss
|
||||
4. Takes an optimizer step (Adam: lr=4e-5, β1=0.9, β2=0.95)
|
||||
5. Saves weights and creates a new sampling client for next-step inference
|
||||
6. Logs metrics to WandB
|
||||
|
||||
## Architecture Diagram
|
||||
|
||||
```mermaid
|
||||
flowchart LR
|
||||
api["Atropos API<br/>run-api<br/>port 8000"]
|
||||
env["Environment<br/>BaseEnv implementation"]
|
||||
infer["OpenAI / sglang<br/>inference API<br/>port 8001"]
|
||||
trainer["Tinker Trainer<br/>LoRA training + FastAPI"]
|
||||
|
||||
env <--> api
|
||||
env --> infer
|
||||
api -->|"batches: tokens, scores, logprobs"| trainer
|
||||
trainer -->|"serves inference"| infer
|
||||
```
|
||||
|
||||
## Creating Custom Environments
|
||||
|
||||
To create a new RL environment:
|
||||
|
||||
1. Create a Python file in `tinker-atropos/tinker_atropos/environments/`
|
||||
2. Define a class that inherits from `BaseEnv`
|
||||
3. Implement the required methods:
|
||||
- `load_dataset()` — Load your training data
|
||||
- `get_next_item()` — Provide the next item to the model
|
||||
- `score_answer()` — Score model outputs and assign rewards
|
||||
- `collect_trajectories()` — Collect and return trajectories
|
||||
4. Optionally define a custom config class inheriting from `BaseEnvConfig`
|
||||
|
||||
Study the existing `gsm8k_tinker.py` as a template. The agent can help you create new environments — it can read existing environment files, inspect HuggingFace datasets, and write new environment code.
|
||||
|
||||
## WandB Metrics
|
||||
|
||||
Training runs log to Weights & Biases with these key metrics:
|
||||
|
||||
| Metric | Description |
|
||||
|--------|-------------|
|
||||
| `train/loss` | Training loss (importance sampling) |
|
||||
| `train/learning_rate` | Current learning rate |
|
||||
| `reward/mean` | Mean reward across groups |
|
||||
| `logprobs/mean` | Mean reference logprobs |
|
||||
| `logprobs/mean_training` | Mean training logprobs |
|
||||
| `logprobs/diff` | Logprob drift (reference - training) |
|
||||
| `advantages/mean` | Mean advantage values |
|
||||
| `advantages/std` | Advantage standard deviation |
|
||||
|
||||
## Log Files
|
||||
|
||||
Each training run generates log files in `~/.hermes/logs/rl_training/`:
|
||||
|
||||
```
|
||||
logs/
|
||||
├── api_{run_id}.log # Atropos API server logs
|
||||
├── trainer_{run_id}.log # Tinker trainer logs
|
||||
├── env_{run_id}.log # Environment process logs
|
||||
└── inference_tests/ # Inference test results
|
||||
├── test_{env}_{model}.jsonl
|
||||
└── test_{env}_{model}.log
|
||||
```
|
||||
|
||||
These are invaluable for debugging when training fails or produces unexpected results.
|
||||
375
hermes_code/website/docs/user-guide/features/skills.md
Normal file
375
hermes_code/website/docs/user-guide/features/skills.md
Normal file
|
|
@ -0,0 +1,375 @@
|
|||
---
|
||||
sidebar_position: 2
|
||||
title: "Skills System"
|
||||
description: "On-demand knowledge documents — progressive disclosure, agent-managed skills, and the Skills Hub"
|
||||
---
|
||||
|
||||
# Skills System
|
||||
|
||||
Skills are on-demand knowledge documents the agent can load when needed. They follow a **progressive disclosure** pattern to minimize token usage and are compatible with the [agentskills.io](https://agentskills.io/specification) open standard.
|
||||
|
||||
All skills live in **`~/.hermes/skills/`** — a single directory that serves as the source of truth. On fresh install, bundled skills are copied from the repo. Hub-installed and agent-created skills also go here. The agent can modify or delete any skill.
|
||||
|
||||
See also:
|
||||
|
||||
- [Bundled Skills Catalog](/docs/reference/skills-catalog)
|
||||
- [Official Optional Skills Catalog](/docs/reference/optional-skills-catalog)
|
||||
|
||||
## Using Skills
|
||||
|
||||
Every installed skill is automatically available as a slash command:
|
||||
|
||||
```bash
|
||||
# In the CLI or any messaging platform:
|
||||
/gif-search funny cats
|
||||
/axolotl help me fine-tune Llama 3 on my dataset
|
||||
/github-pr-workflow create a PR for the auth refactor
|
||||
/plan design a rollout for migrating our auth provider
|
||||
|
||||
# Just the skill name loads it and lets the agent ask what you need:
|
||||
/excalidraw
|
||||
```
|
||||
|
||||
The bundled `plan` skill is a good example of a skill-backed slash command with custom behavior. Running `/plan [request]` tells Hermes to inspect context if needed, write a markdown implementation plan instead of executing the task, and save the result under `.hermes/plans/` relative to the active workspace/backend working directory.
|
||||
|
||||
You can also interact with skills through natural conversation:
|
||||
|
||||
```bash
|
||||
hermes chat --toolsets skills -q "What skills do you have?"
|
||||
hermes chat --toolsets skills -q "Show me the axolotl skill"
|
||||
```
|
||||
|
||||
## Progressive Disclosure
|
||||
|
||||
Skills use a token-efficient loading pattern:
|
||||
|
||||
```
|
||||
Level 0: skills_list() → [{name, description, category}, ...] (~3k tokens)
|
||||
Level 1: skill_view(name) → Full content + metadata (varies)
|
||||
Level 2: skill_view(name, path) → Specific reference file (varies)
|
||||
```
|
||||
|
||||
The agent only loads the full skill content when it actually needs it.
|
||||
|
||||
## SKILL.md Format
|
||||
|
||||
```markdown
|
||||
---
|
||||
name: my-skill
|
||||
description: Brief description of what this skill does
|
||||
version: 1.0.0
|
||||
platforms: [macos, linux] # Optional — restrict to specific OS platforms
|
||||
metadata:
|
||||
hermes:
|
||||
tags: [python, automation]
|
||||
category: devops
|
||||
fallback_for_toolsets: [web] # Optional — conditional activation (see below)
|
||||
requires_toolsets: [terminal] # Optional — conditional activation (see below)
|
||||
---
|
||||
|
||||
# Skill Title
|
||||
|
||||
## When to Use
|
||||
Trigger conditions for this skill.
|
||||
|
||||
## Procedure
|
||||
1. Step one
|
||||
2. Step two
|
||||
|
||||
## Pitfalls
|
||||
- Known failure modes and fixes
|
||||
|
||||
## Verification
|
||||
How to confirm it worked.
|
||||
```
|
||||
|
||||
### Platform-Specific Skills
|
||||
|
||||
Skills can restrict themselves to specific operating systems using the `platforms` field:
|
||||
|
||||
| Value | Matches |
|
||||
|-------|---------|
|
||||
| `macos` | macOS (Darwin) |
|
||||
| `linux` | Linux |
|
||||
| `windows` | Windows |
|
||||
|
||||
```yaml
|
||||
platforms: [macos] # macOS only (e.g., iMessage, Apple Reminders, FindMy)
|
||||
platforms: [macos, linux] # macOS and Linux
|
||||
```
|
||||
|
||||
When set, the skill is automatically hidden from the system prompt, `skills_list()`, and slash commands on incompatible platforms. If omitted, the skill loads on all platforms.
|
||||
|
||||
### Conditional Activation (Fallback Skills)
|
||||
|
||||
Skills can automatically show or hide themselves based on which tools are available in the current session. This is most useful for **fallback skills** — free or local alternatives that should only appear when a premium tool is unavailable.
|
||||
|
||||
```yaml
|
||||
metadata:
|
||||
hermes:
|
||||
fallback_for_toolsets: [web] # Show ONLY when these toolsets are unavailable
|
||||
requires_toolsets: [terminal] # Show ONLY when these toolsets are available
|
||||
fallback_for_tools: [web_search] # Show ONLY when these specific tools are unavailable
|
||||
requires_tools: [terminal] # Show ONLY when these specific tools are available
|
||||
```
|
||||
|
||||
| Field | Behavior |
|
||||
|-------|----------|
|
||||
| `fallback_for_toolsets` | Skill is **hidden** when the listed toolsets are available. Shown when they're missing. |
|
||||
| `fallback_for_tools` | Same, but checks individual tools instead of toolsets. |
|
||||
| `requires_toolsets` | Skill is **hidden** when the listed toolsets are unavailable. Shown when they're present. |
|
||||
| `requires_tools` | Same, but checks individual tools. |
|
||||
|
||||
**Example:** The built-in `duckduckgo-search` skill uses `fallback_for_toolsets: [web]`. When you have `FIRECRAWL_API_KEY` set, the web toolset is available and the agent uses `web_search` — the DuckDuckGo skill stays hidden. If the API key is missing, the web toolset is unavailable and the DuckDuckGo skill automatically appears as a fallback.
|
||||
|
||||
Skills without any conditional fields behave exactly as before — they're always shown.
|
||||
|
||||
## Secure Setup on Load
|
||||
|
||||
Skills can declare required environment variables without disappearing from discovery:
|
||||
|
||||
```yaml
|
||||
required_environment_variables:
|
||||
- name: TENOR_API_KEY
|
||||
prompt: Tenor API key
|
||||
help: Get a key from https://developers.google.com/tenor
|
||||
required_for: full functionality
|
||||
```
|
||||
|
||||
When a missing value is encountered, Hermes asks for it securely only when the skill is actually loaded in the local CLI. You can skip setup and keep using the skill. Messaging surfaces never ask for secrets in chat — they tell you to use `hermes setup` or `~/.hermes/.env` locally instead.
|
||||
|
||||
Once set, declared env vars are **automatically passed through** to `execute_code` and `terminal` sandboxes — the skill's scripts can use `$TENOR_API_KEY` directly. For non-skill env vars, use the `terminal.env_passthrough` config option. See [Environment Variable Passthrough](/docs/user-guide/security#environment-variable-passthrough) for details.
|
||||
|
||||
## Skill Directory Structure
|
||||
|
||||
```text
|
||||
~/.hermes/skills/ # Single source of truth
|
||||
├── mlops/ # Category directory
|
||||
│ ├── axolotl/
|
||||
│ │ ├── SKILL.md # Main instructions (required)
|
||||
│ │ ├── references/ # Additional docs
|
||||
│ │ ├── templates/ # Output formats
|
||||
│ │ ├── scripts/ # Helper scripts callable from the skill
|
||||
│ │ └── assets/ # Supplementary files
|
||||
│ └── vllm/
|
||||
│ └── SKILL.md
|
||||
├── devops/
|
||||
│ └── deploy-k8s/ # Agent-created skill
|
||||
│ ├── SKILL.md
|
||||
│ └── references/
|
||||
├── .hub/ # Skills Hub state
|
||||
│ ├── lock.json
|
||||
│ ├── quarantine/
|
||||
│ └── audit.log
|
||||
└── .bundled_manifest # Tracks seeded bundled skills
|
||||
```
|
||||
|
||||
## Agent-Managed Skills (skill_manage tool)
|
||||
|
||||
The agent can create, update, and delete its own skills via the `skill_manage` tool. This is the agent's **procedural memory** — when it figures out a non-trivial workflow, it saves the approach as a skill for future reuse.
|
||||
|
||||
### When the Agent Creates Skills
|
||||
|
||||
- After completing a complex task (5+ tool calls) successfully
|
||||
- When it hit errors or dead ends and found the working path
|
||||
- When the user corrected its approach
|
||||
- When it discovered a non-trivial workflow
|
||||
|
||||
### Actions
|
||||
|
||||
| Action | Use for | Key params |
|
||||
|--------|---------|------------|
|
||||
| `create` | New skill from scratch | `name`, `content` (full SKILL.md), optional `category` |
|
||||
| `patch` | Targeted fixes (preferred) | `name`, `old_string`, `new_string` |
|
||||
| `edit` | Major structural rewrites | `name`, `content` (full SKILL.md replacement) |
|
||||
| `delete` | Remove a skill entirely | `name` |
|
||||
| `write_file` | Add/update supporting files | `name`, `file_path`, `file_content` |
|
||||
| `remove_file` | Remove a supporting file | `name`, `file_path` |
|
||||
|
||||
:::tip
|
||||
The `patch` action is preferred for updates — it's more token-efficient than `edit` because only the changed text appears in the tool call.
|
||||
:::
|
||||
|
||||
## Skills Hub
|
||||
|
||||
Browse, search, install, and manage skills from online registries, `skills.sh`, direct well-known skill endpoints, and official optional skills.
|
||||
|
||||
### Common commands
|
||||
|
||||
```bash
|
||||
hermes skills browse # Browse all hub skills (official first)
|
||||
hermes skills browse --source official # Browse only official optional skills
|
||||
hermes skills search kubernetes # Search all sources
|
||||
hermes skills search react --source skills-sh # Search the skills.sh directory
|
||||
hermes skills search https://mintlify.com/docs --source well-known
|
||||
hermes skills inspect openai/skills/k8s # Preview before installing
|
||||
hermes skills install openai/skills/k8s # Install with security scan
|
||||
hermes skills install official/security/1password
|
||||
hermes skills install skills-sh/vercel-labs/json-render/json-render-react --force
|
||||
hermes skills install well-known:https://mintlify.com/docs/.well-known/skills/mintlify
|
||||
hermes skills list --source hub # List hub-installed skills
|
||||
hermes skills check # Check installed hub skills for upstream updates
|
||||
hermes skills update # Reinstall hub skills with upstream changes when needed
|
||||
hermes skills audit # Re-scan all hub skills for security
|
||||
hermes skills uninstall k8s # Remove a hub skill
|
||||
hermes skills publish skills/my-skill --to github --repo owner/repo
|
||||
hermes skills snapshot export setup.json # Export skill config
|
||||
hermes skills tap add myorg/skills-repo # Add a custom GitHub source
|
||||
```
|
||||
|
||||
### Supported hub sources
|
||||
|
||||
| Source | Example | Notes |
|
||||
|--------|---------|-------|
|
||||
| `official` | `official/security/1password` | Optional skills shipped with Hermes. |
|
||||
| `skills-sh` | `skills-sh/vercel-labs/agent-skills/vercel-react-best-practices` | Searchable via `hermes skills search <query> --source skills-sh`. Hermes resolves alias-style skills when the skills.sh slug differs from the repo folder. |
|
||||
| `well-known` | `well-known:https://mintlify.com/docs/.well-known/skills/mintlify` | Skills served directly from `/.well-known/skills/index.json` on a website. Search using the site or docs URL. |
|
||||
| `github` | `openai/skills/k8s` | Direct GitHub repo/path installs and custom taps. |
|
||||
| `clawhub`, `lobehub`, `claude-marketplace` | Source-specific identifiers | Community or marketplace integrations. |
|
||||
|
||||
### Integrated hubs and registries
|
||||
|
||||
Hermes currently integrates with these skills ecosystems and discovery sources:
|
||||
|
||||
#### 1. Official optional skills (`official`)
|
||||
|
||||
These are maintained in the Hermes repository itself and install with builtin trust.
|
||||
|
||||
- Catalog: [Official Optional Skills Catalog](../../reference/optional-skills-catalog)
|
||||
- Source in repo: `optional-skills/`
|
||||
- Example:
|
||||
|
||||
```bash
|
||||
hermes skills browse --source official
|
||||
hermes skills install official/security/1password
|
||||
```
|
||||
|
||||
#### 2. skills.sh (`skills-sh`)
|
||||
|
||||
This is Vercel's public skills directory. Hermes can search it directly, inspect skill detail pages, resolve alias-style slugs, and install from the underlying source repo.
|
||||
|
||||
- Directory: [skills.sh](https://skills.sh/)
|
||||
- CLI/tooling repo: [vercel-labs/skills](https://github.com/vercel-labs/skills)
|
||||
- Official Vercel skills repo: [vercel-labs/agent-skills](https://github.com/vercel-labs/agent-skills)
|
||||
- Example:
|
||||
|
||||
```bash
|
||||
hermes skills search react --source skills-sh
|
||||
hermes skills inspect skills-sh/vercel-labs/json-render/json-render-react
|
||||
hermes skills install skills-sh/vercel-labs/json-render/json-render-react --force
|
||||
```
|
||||
|
||||
#### 3. Well-known skill endpoints (`well-known`)
|
||||
|
||||
This is URL-based discovery from sites that publish `/.well-known/skills/index.json`. It is not a single centralized hub — it is a web discovery convention.
|
||||
|
||||
- Example live endpoint: [Mintlify docs skills index](https://mintlify.com/docs/.well-known/skills/index.json)
|
||||
- Reference server implementation: [vercel-labs/skills-handler](https://github.com/vercel-labs/skills-handler)
|
||||
- Example:
|
||||
|
||||
```bash
|
||||
hermes skills search https://mintlify.com/docs --source well-known
|
||||
hermes skills inspect well-known:https://mintlify.com/docs/.well-known/skills/mintlify
|
||||
hermes skills install well-known:https://mintlify.com/docs/.well-known/skills/mintlify
|
||||
```
|
||||
|
||||
#### 4. Direct GitHub skills (`github`)
|
||||
|
||||
Hermes can install directly from GitHub repositories and GitHub-based taps. This is useful when you already know the repo/path or want to add your own custom source repo.
|
||||
|
||||
- OpenAI skills: [openai/skills](https://github.com/openai/skills)
|
||||
- Anthropic skills: [anthropics/skills](https://github.com/anthropics/skills)
|
||||
- Example community tap source: [VoltAgent/awesome-agent-skills](https://github.com/VoltAgent/awesome-agent-skills)
|
||||
- Example:
|
||||
|
||||
```bash
|
||||
hermes skills install openai/skills/k8s
|
||||
hermes skills tap add myorg/skills-repo
|
||||
```
|
||||
|
||||
#### 5. ClawHub (`clawhub`)
|
||||
|
||||
A third-party skills marketplace integrated as a community source.
|
||||
|
||||
- Site: [clawhub.ai](https://clawhub.ai/)
|
||||
- Hermes source id: `clawhub`
|
||||
|
||||
#### 6. Claude marketplace-style repos (`claude-marketplace`)
|
||||
|
||||
Hermes supports marketplace repos that publish Claude-compatible plugin/marketplace manifests.
|
||||
|
||||
Known integrated sources include:
|
||||
- [anthropics/skills](https://github.com/anthropics/skills)
|
||||
- [aiskillstore/marketplace](https://github.com/aiskillstore/marketplace)
|
||||
|
||||
Hermes source id: `claude-marketplace`
|
||||
|
||||
#### 7. LobeHub (`lobehub`)
|
||||
|
||||
Hermes can search and convert agent entries from LobeHub's public catalog into installable Hermes skills.
|
||||
|
||||
- Site: [LobeHub](https://lobehub.com/)
|
||||
- Public agents index: [chat-agents.lobehub.com](https://chat-agents.lobehub.com/)
|
||||
- Backing repo: [lobehub/lobe-chat-agents](https://github.com/lobehub/lobe-chat-agents)
|
||||
- Hermes source id: `lobehub`
|
||||
|
||||
### Security scanning and `--force`
|
||||
|
||||
All hub-installed skills go through a **security scanner** that checks for data exfiltration, prompt injection, destructive commands, supply-chain signals, and other threats.
|
||||
|
||||
`hermes skills inspect ...` now also surfaces upstream metadata when available:
|
||||
- repo URL
|
||||
- skills.sh detail page URL
|
||||
- install command
|
||||
- weekly installs
|
||||
- upstream security audit statuses
|
||||
- well-known index/endpoint URLs
|
||||
|
||||
Use `--force` when you have reviewed a third-party skill and want to override a non-dangerous policy block:
|
||||
|
||||
```bash
|
||||
hermes skills install skills-sh/anthropics/skills/pdf --force
|
||||
```
|
||||
|
||||
Important behavior:
|
||||
- `--force` can override policy blocks for caution/warn-style findings.
|
||||
- `--force` does **not** override a `dangerous` scan verdict.
|
||||
- Official optional skills (`official/...`) are treated as builtin trust and do not show the third-party warning panel.
|
||||
|
||||
### Trust levels
|
||||
|
||||
| Level | Source | Policy |
|
||||
|-------|--------|--------|
|
||||
| `builtin` | Ships with Hermes | Always trusted |
|
||||
| `official` | `optional-skills/` in the repo | Builtin trust, no third-party warning |
|
||||
| `trusted` | Trusted registries/repos such as `openai/skills`, `anthropics/skills` | More permissive policy than community sources |
|
||||
| `community` | Everything else (`skills.sh`, well-known endpoints, custom GitHub repos, most marketplaces) | Non-dangerous findings can be overridden with `--force`; `dangerous` verdicts stay blocked |
|
||||
|
||||
### Update lifecycle
|
||||
|
||||
The hub now tracks enough provenance to re-check upstream copies of installed skills:
|
||||
|
||||
```bash
|
||||
hermes skills check # Report which installed hub skills changed upstream
|
||||
hermes skills update # Reinstall only the skills with updates available
|
||||
hermes skills update react # Update one specific installed hub skill
|
||||
```
|
||||
|
||||
This uses the stored source identifier plus the current upstream bundle content hash to detect drift.
|
||||
|
||||
### Slash commands (inside chat)
|
||||
|
||||
All the same commands work with `/skills`:
|
||||
|
||||
```text
|
||||
/skills browse
|
||||
/skills search react --source skills-sh
|
||||
/skills search https://mintlify.com/docs --source well-known
|
||||
/skills inspect skills-sh/vercel-labs/json-render/json-render-react
|
||||
/skills install openai/skills/skill-creator --force
|
||||
/skills check
|
||||
/skills update
|
||||
/skills list
|
||||
```
|
||||
|
||||
Official optional skills still use identifiers like `official/security/1password` and `official/migration/openclaw-migration`.
|
||||
81
hermes_code/website/docs/user-guide/features/skins.md
Normal file
81
hermes_code/website/docs/user-guide/features/skins.md
Normal file
|
|
@ -0,0 +1,81 @@
|
|||
---
|
||||
sidebar_position: 10
|
||||
title: "Skins & Themes"
|
||||
description: "Customize the Hermes CLI with built-in and user-defined skins"
|
||||
---
|
||||
|
||||
# Skins & Themes
|
||||
|
||||
Skins control the **visual presentation** of the Hermes CLI: banner colors, spinner faces and verbs, response-box labels, branding text, and the tool activity prefix.
|
||||
|
||||
Conversational style and visual style are separate concepts:
|
||||
|
||||
- **Personality** changes the agent's tone and wording.
|
||||
- **Skin** changes the CLI's appearance.
|
||||
|
||||
## Change skins
|
||||
|
||||
```bash
|
||||
/skin # show the current skin and list available skins
|
||||
/skin ares # switch to a built-in skin
|
||||
/skin mytheme # switch to a custom skin from ~/.hermes/skins/mytheme.yaml
|
||||
```
|
||||
|
||||
Or set the default skin in `~/.hermes/config.yaml`:
|
||||
|
||||
```yaml
|
||||
display:
|
||||
skin: default
|
||||
```
|
||||
|
||||
## Built-in skins
|
||||
|
||||
| Skin | Description | Agent branding |
|
||||
|------|-------------|----------------|
|
||||
| `default` | Classic Hermes — gold and kawaii | `Hermes Agent` |
|
||||
| `ares` | War-god theme — crimson and bronze | `Ares Agent` |
|
||||
| `mono` | Monochrome — clean grayscale | `Hermes Agent` |
|
||||
| `slate` | Cool blue — developer-focused | `Hermes Agent` |
|
||||
| `poseidon` | Ocean-god theme — deep blue and seafoam | `Poseidon Agent` |
|
||||
| `sisyphus` | Sisyphean theme — austere grayscale with persistence | `Sisyphus Agent` |
|
||||
| `charizard` | Volcanic theme — burnt orange and ember | `Charizard Agent` |
|
||||
|
||||
## What a skin can customize
|
||||
|
||||
| Area | Keys |
|
||||
|------|------|
|
||||
| Banner + response colors | `colors.banner_*`, `colors.response_border` |
|
||||
| Spinner animation | `spinner.waiting_faces`, `spinner.thinking_faces`, `spinner.thinking_verbs`, `spinner.wings` |
|
||||
| Branding text | `branding.agent_name`, `branding.welcome`, `branding.response_label`, `branding.prompt_symbol` |
|
||||
| Tool activity prefix | `tool_prefix` |
|
||||
|
||||
## Custom skins
|
||||
|
||||
Create YAML files under `~/.hermes/skins/`. User skins inherit missing values from the built-in `default` skin.
|
||||
|
||||
```yaml
|
||||
name: cyberpunk
|
||||
description: Neon terminal theme
|
||||
|
||||
colors:
|
||||
banner_border: "#FF00FF"
|
||||
banner_title: "#00FFFF"
|
||||
banner_accent: "#FF1493"
|
||||
|
||||
spinner:
|
||||
thinking_verbs: ["jacking in", "decrypting", "uploading"]
|
||||
wings:
|
||||
- ["⟨⚡", "⚡⟩"]
|
||||
|
||||
branding:
|
||||
agent_name: "Cyber Agent"
|
||||
response_label: " ⚡ Cyber "
|
||||
|
||||
tool_prefix: "▏"
|
||||
```
|
||||
|
||||
## Operational notes
|
||||
|
||||
- Built-in skins load from `hermes_cli/skin_engine.py`.
|
||||
- Unknown skins automatically fall back to `default`.
|
||||
- `/skin` updates the active CLI theme immediately for the current session.
|
||||
165
hermes_code/website/docs/user-guide/features/tools.md
Normal file
165
hermes_code/website/docs/user-guide/features/tools.md
Normal file
|
|
@ -0,0 +1,165 @@
|
|||
---
|
||||
sidebar_position: 1
|
||||
title: "Tools & Toolsets"
|
||||
description: "Overview of Hermes Agent's tools — what's available, how toolsets work, and terminal backends"
|
||||
---
|
||||
|
||||
# Tools & Toolsets
|
||||
|
||||
Tools are functions that extend the agent's capabilities. They're organized into logical **toolsets** that can be enabled or disabled per platform.
|
||||
|
||||
## Available Tools
|
||||
|
||||
Hermes ships with a broad built-in tool registry covering web search, browser automation, terminal execution, file editing, memory, delegation, RL training, messaging delivery, Home Assistant, Honcho memory, and more.
|
||||
|
||||
High-level categories:
|
||||
|
||||
| Category | Examples | Description |
|
||||
|----------|----------|-------------|
|
||||
| **Web** | `web_search`, `web_extract` | Search the web and extract page content. |
|
||||
| **Terminal & Files** | `terminal`, `process`, `read_file`, `patch` | Execute commands and manipulate files. |
|
||||
| **Browser** | `browser_navigate`, `browser_snapshot`, `browser_vision` | Interactive browser automation with text and vision support. |
|
||||
| **Media** | `vision_analyze`, `image_generate`, `text_to_speech` | Multimodal analysis and generation. |
|
||||
| **Agent orchestration** | `todo`, `clarify`, `execute_code`, `delegate_task` | Planning, clarification, code execution, and subagent delegation. |
|
||||
| **Memory & recall** | `memory`, `session_search`, `honcho_*` | Persistent memory, session search, and Honcho cross-session context. |
|
||||
| **Automation & delivery** | `cronjob`, `send_message` | Scheduled tasks with create/list/update/pause/resume/run/remove actions, plus outbound messaging delivery. |
|
||||
| **Integrations** | `ha_*`, MCP server tools, `rl_*` | Home Assistant, MCP, RL training, and other integrations. |
|
||||
|
||||
For the authoritative code-derived registry, see [Built-in Tools Reference](/docs/reference/tools-reference) and [Toolsets Reference](/docs/reference/toolsets-reference).
|
||||
|
||||
## Using Toolsets
|
||||
|
||||
```bash
|
||||
# Use specific toolsets
|
||||
hermes chat --toolsets "web,terminal"
|
||||
|
||||
# See all available tools
|
||||
hermes tools
|
||||
|
||||
# Configure tools per platform (interactive)
|
||||
hermes tools
|
||||
```
|
||||
|
||||
Common toolsets include `web`, `terminal`, `file`, `browser`, `vision`, `image_gen`, `moa`, `skills`, `tts`, `todo`, `memory`, `session_search`, `cronjob`, `code_execution`, `delegation`, `clarify`, `honcho`, `homeassistant`, and `rl`.
|
||||
|
||||
See [Toolsets Reference](/docs/reference/toolsets-reference) for the full set, including platform presets such as `hermes-cli`, `hermes-telegram`, and dynamic MCP toolsets like `mcp-<server>`.
|
||||
|
||||
## Terminal Backends
|
||||
|
||||
The terminal tool can execute commands in different environments:
|
||||
|
||||
| Backend | Description | Use Case |
|
||||
|---------|-------------|----------|
|
||||
| `local` | Run on your machine (default) | Development, trusted tasks |
|
||||
| `docker` | Isolated containers | Security, reproducibility |
|
||||
| `ssh` | Remote server | Sandboxing, keep agent away from its own code |
|
||||
| `singularity` | HPC containers | Cluster computing, rootless |
|
||||
| `modal` | Cloud execution | Serverless, scale |
|
||||
| `daytona` | Cloud sandbox workspace | Persistent remote dev environments |
|
||||
|
||||
### Configuration
|
||||
|
||||
```yaml
|
||||
# In ~/.hermes/config.yaml
|
||||
terminal:
|
||||
backend: local # or: docker, ssh, singularity, modal, daytona
|
||||
cwd: "." # Working directory
|
||||
timeout: 180 # Command timeout in seconds
|
||||
```
|
||||
|
||||
### Docker Backend
|
||||
|
||||
```yaml
|
||||
terminal:
|
||||
backend: docker
|
||||
docker_image: python:3.11-slim
|
||||
```
|
||||
|
||||
### SSH Backend
|
||||
|
||||
Recommended for security — agent can't modify its own code:
|
||||
|
||||
```yaml
|
||||
terminal:
|
||||
backend: ssh
|
||||
```
|
||||
```bash
|
||||
# Set credentials in ~/.hermes/.env
|
||||
TERMINAL_SSH_HOST=my-server.example.com
|
||||
TERMINAL_SSH_USER=myuser
|
||||
TERMINAL_SSH_KEY=~/.ssh/id_rsa
|
||||
```
|
||||
|
||||
### Singularity/Apptainer
|
||||
|
||||
```bash
|
||||
# Pre-build SIF for parallel workers
|
||||
apptainer build ~/python.sif docker://python:3.11-slim
|
||||
|
||||
# Configure
|
||||
hermes config set terminal.backend singularity
|
||||
hermes config set terminal.singularity_image ~/python.sif
|
||||
```
|
||||
|
||||
### Modal (Serverless Cloud)
|
||||
|
||||
```bash
|
||||
uv pip install "swe-rex[modal]"
|
||||
modal setup
|
||||
hermes config set terminal.backend modal
|
||||
```
|
||||
|
||||
### Container Resources
|
||||
|
||||
Configure CPU, memory, disk, and persistence for all container backends:
|
||||
|
||||
```yaml
|
||||
terminal:
|
||||
backend: docker # or singularity, modal, daytona
|
||||
container_cpu: 1 # CPU cores (default: 1)
|
||||
container_memory: 5120 # Memory in MB (default: 5GB)
|
||||
container_disk: 51200 # Disk in MB (default: 50GB)
|
||||
container_persistent: true # Persist filesystem across sessions (default: true)
|
||||
```
|
||||
|
||||
When `container_persistent: true`, installed packages, files, and config survive across sessions.
|
||||
|
||||
### Container Security
|
||||
|
||||
All container backends run with security hardening:
|
||||
|
||||
- Read-only root filesystem (Docker)
|
||||
- All Linux capabilities dropped
|
||||
- No privilege escalation
|
||||
- PID limits (256 processes)
|
||||
- Full namespace isolation
|
||||
- Persistent workspace via volumes, not writable root layer
|
||||
|
||||
Docker can optionally receive an explicit env allowlist via `terminal.docker_forward_env`, but forwarded variables are visible to commands inside the container and should be treated as exposed to that session.
|
||||
|
||||
## Background Process Management
|
||||
|
||||
Start background processes and manage them:
|
||||
|
||||
```python
|
||||
terminal(command="pytest -v tests/", background=true)
|
||||
# Returns: {"session_id": "proc_abc123", "pid": 12345}
|
||||
|
||||
# Then manage with the process tool:
|
||||
process(action="list") # Show all running processes
|
||||
process(action="poll", session_id="proc_abc123") # Check status
|
||||
process(action="wait", session_id="proc_abc123") # Block until done
|
||||
process(action="log", session_id="proc_abc123") # Full output
|
||||
process(action="kill", session_id="proc_abc123") # Terminate
|
||||
process(action="write", session_id="proc_abc123", data="y") # Send input
|
||||
```
|
||||
|
||||
PTY mode (`pty=true`) enables interactive CLI tools like Codex and Claude Code.
|
||||
|
||||
## Sudo Support
|
||||
|
||||
If a command needs sudo, you'll be prompted for your password (cached for the session). Or set `SUDO_PASSWORD` in `~/.hermes/.env`.
|
||||
|
||||
:::warning
|
||||
On messaging platforms, if sudo fails, the output includes a tip to add `SUDO_PASSWORD` to `~/.hermes/.env`.
|
||||
:::
|
||||
128
hermes_code/website/docs/user-guide/features/tts.md
Normal file
128
hermes_code/website/docs/user-guide/features/tts.md
Normal file
|
|
@ -0,0 +1,128 @@
|
|||
---
|
||||
sidebar_position: 9
|
||||
title: "Voice & TTS"
|
||||
description: "Text-to-speech and voice message transcription across all platforms"
|
||||
---
|
||||
|
||||
# Voice & TTS
|
||||
|
||||
Hermes Agent supports both text-to-speech output and voice message transcription across all messaging platforms.
|
||||
|
||||
## Text-to-Speech
|
||||
|
||||
Convert text to speech with four providers:
|
||||
|
||||
| Provider | Quality | Cost | API Key |
|
||||
|----------|---------|------|---------|
|
||||
| **Edge TTS** (default) | Good | Free | None needed |
|
||||
| **ElevenLabs** | Excellent | Paid | `ELEVENLABS_API_KEY` |
|
||||
| **OpenAI TTS** | Good | Paid | `VOICE_TOOLS_OPENAI_KEY` |
|
||||
| **NeuTTS** | Good | Free | None needed |
|
||||
|
||||
### Platform Delivery
|
||||
|
||||
| Platform | Delivery | Format |
|
||||
|----------|----------|--------|
|
||||
| Telegram | Voice bubble (plays inline) | Opus `.ogg` |
|
||||
| Discord | Voice bubble (Opus/OGG), falls back to file attachment | Opus/MP3 |
|
||||
| WhatsApp | Audio file attachment | MP3 |
|
||||
| CLI | Saved to `~/.hermes/audio_cache/` | MP3 |
|
||||
|
||||
### Configuration
|
||||
|
||||
```yaml
|
||||
# In ~/.hermes/config.yaml
|
||||
tts:
|
||||
provider: "edge" # "edge" | "elevenlabs" | "openai" | "neutts"
|
||||
edge:
|
||||
voice: "en-US-AriaNeural" # 322 voices, 74 languages
|
||||
elevenlabs:
|
||||
voice_id: "pNInz6obpgDQGcFmaJgB" # Adam
|
||||
model_id: "eleven_multilingual_v2"
|
||||
openai:
|
||||
model: "gpt-4o-mini-tts"
|
||||
voice: "alloy" # alloy, echo, fable, onyx, nova, shimmer
|
||||
base_url: "https://api.openai.com/v1" # Override for OpenAI-compatible TTS endpoints
|
||||
neutts:
|
||||
ref_audio: ''
|
||||
ref_text: ''
|
||||
model: neuphonic/neutts-air-q4-gguf
|
||||
device: cpu
|
||||
```
|
||||
|
||||
### Telegram Voice Bubbles & ffmpeg
|
||||
|
||||
Telegram voice bubbles require Opus/OGG audio format:
|
||||
|
||||
- **OpenAI and ElevenLabs** produce Opus natively — no extra setup
|
||||
- **Edge TTS** (default) outputs MP3 and needs **ffmpeg** to convert:
|
||||
- **NeuTTS** outputs WAV and also needs **ffmpeg** to convert for Telegram voice bubbles
|
||||
|
||||
```bash
|
||||
# Ubuntu/Debian
|
||||
sudo apt install ffmpeg
|
||||
|
||||
# macOS
|
||||
brew install ffmpeg
|
||||
|
||||
# Fedora
|
||||
sudo dnf install ffmpeg
|
||||
```
|
||||
|
||||
Without ffmpeg, Edge TTS and NeuTTS audio are sent as regular audio files (playable, but shown as a rectangular player instead of a voice bubble).
|
||||
|
||||
:::tip
|
||||
If you want voice bubbles without installing ffmpeg, switch to the OpenAI or ElevenLabs provider.
|
||||
:::
|
||||
|
||||
## Voice Message Transcription (STT)
|
||||
|
||||
Voice messages sent on Telegram, Discord, WhatsApp, Slack, or Signal are automatically transcribed and injected as text into the conversation. The agent sees the transcript as normal text.
|
||||
|
||||
| Provider | Quality | Cost | API Key |
|
||||
|----------|---------|------|---------|
|
||||
| **Local Whisper** (default) | Good | Free | None needed |
|
||||
| **Groq Whisper API** | Good–Best | Free tier | `GROQ_API_KEY` |
|
||||
| **OpenAI Whisper API** | Good–Best | Paid | `VOICE_TOOLS_OPENAI_KEY` or `OPENAI_API_KEY` |
|
||||
|
||||
:::info Zero Config
|
||||
Local transcription works out of the box when `faster-whisper` is installed. If that's unavailable, Hermes can also use a local `whisper` CLI from common install locations (like `/opt/homebrew/bin`) or a custom command via `HERMES_LOCAL_STT_COMMAND`.
|
||||
:::
|
||||
|
||||
### Configuration
|
||||
|
||||
```yaml
|
||||
# In ~/.hermes/config.yaml
|
||||
stt:
|
||||
provider: "local" # "local" | "groq" | "openai"
|
||||
local:
|
||||
model: "base" # tiny, base, small, medium, large-v3
|
||||
openai:
|
||||
model: "whisper-1" # whisper-1, gpt-4o-mini-transcribe, gpt-4o-transcribe
|
||||
```
|
||||
|
||||
### Provider Details
|
||||
|
||||
**Local (faster-whisper)** — Runs Whisper locally via [faster-whisper](https://github.com/SYSTRAN/faster-whisper). Uses CPU by default, GPU if available. Model sizes:
|
||||
|
||||
| Model | Size | Speed | Quality |
|
||||
|-------|------|-------|---------|
|
||||
| `tiny` | ~75 MB | Fastest | Basic |
|
||||
| `base` | ~150 MB | Fast | Good (default) |
|
||||
| `small` | ~500 MB | Medium | Better |
|
||||
| `medium` | ~1.5 GB | Slower | Great |
|
||||
| `large-v3` | ~3 GB | Slowest | Best |
|
||||
|
||||
**Groq API** — Requires `GROQ_API_KEY`. Good cloud fallback when you want a free hosted STT option.
|
||||
|
||||
**OpenAI API** — Accepts `VOICE_TOOLS_OPENAI_KEY` first and falls back to `OPENAI_API_KEY`. Supports `whisper-1`, `gpt-4o-mini-transcribe`, and `gpt-4o-transcribe`.
|
||||
|
||||
**Custom local CLI fallback** — Set `HERMES_LOCAL_STT_COMMAND` if you want Hermes to call a local transcription command directly. The command template supports `{input_path}`, `{output_dir}`, `{language}`, and `{model}` placeholders.
|
||||
|
||||
### Fallback Behavior
|
||||
|
||||
If your configured provider isn't available, Hermes automatically falls back:
|
||||
- **Local faster-whisper unavailable** → Tries a local `whisper` CLI or `HERMES_LOCAL_STT_COMMAND` before cloud providers
|
||||
- **Groq key not set** → Falls back to local transcription, then OpenAI
|
||||
- **OpenAI key not set** → Falls back to local transcription, then Groq
|
||||
- **Nothing available** → Voice messages pass through with an accurate note to the user
|
||||
187
hermes_code/website/docs/user-guide/features/vision.md
Normal file
187
hermes_code/website/docs/user-guide/features/vision.md
Normal file
|
|
@ -0,0 +1,187 @@
|
|||
---
|
||||
title: Vision & Image Paste
|
||||
description: Paste images from your clipboard into the Hermes CLI for multimodal vision analysis.
|
||||
sidebar_label: Vision & Image Paste
|
||||
sidebar_position: 7
|
||||
---
|
||||
|
||||
# Vision & Image Paste
|
||||
|
||||
Hermes Agent supports **multimodal vision** — you can paste images from your clipboard directly into the CLI and ask the agent to analyze, describe, or work with them. Images are sent to the model as base64-encoded content blocks, so any vision-capable model can process them.
|
||||
|
||||
## How It Works
|
||||
|
||||
1. Copy an image to your clipboard (screenshot, browser image, etc.)
|
||||
2. Attach it using one of the methods below
|
||||
3. Type your question and press Enter
|
||||
4. The image appears as a `[📎 Image #1]` badge above the input
|
||||
5. On submit, the image is sent to the model as a vision content block
|
||||
|
||||
You can attach multiple images before sending — each gets its own badge. Press `Ctrl+C` to clear all attached images.
|
||||
|
||||
Images are saved to `~/.hermes/images/` as PNG files with timestamped filenames.
|
||||
|
||||
## Paste Methods
|
||||
|
||||
How you attach an image depends on your terminal environment. Not all methods work everywhere — here's the full breakdown:
|
||||
|
||||
### `/paste` Command
|
||||
|
||||
**The most reliable method. Works everywhere.**
|
||||
|
||||
```
|
||||
/paste
|
||||
```
|
||||
|
||||
Type `/paste` and press Enter. Hermes checks your clipboard for an image and attaches it. This works in every environment because it explicitly calls the clipboard backend — no terminal keybinding interception to worry about.
|
||||
|
||||
### Ctrl+V / Cmd+V (Bracketed Paste)
|
||||
|
||||
When you paste text that's on the clipboard alongside an image, Hermes automatically checks for an image too. This works when:
|
||||
- Your clipboard contains **both text and an image** (some apps put both on the clipboard when you copy)
|
||||
- Your terminal supports bracketed paste (most modern terminals do)
|
||||
|
||||
:::warning
|
||||
If your clipboard has **only an image** (no text), Ctrl+V does nothing in most terminals. Terminals can only paste text — there's no standard mechanism to paste binary image data. Use `/paste` or Alt+V instead.
|
||||
:::
|
||||
|
||||
### Alt+V
|
||||
|
||||
Alt key combinations pass through most terminal emulators (they're sent as ESC + key rather than being intercepted). Press `Alt+V` to check the clipboard for an image.
|
||||
|
||||
:::caution
|
||||
**Does not work in VSCode's integrated terminal.** VSCode intercepts many Alt+key combos for its own UI. Use `/paste` instead.
|
||||
:::
|
||||
|
||||
### Ctrl+V (Raw — Linux Only)
|
||||
|
||||
On Linux desktop terminals (GNOME Terminal, Konsole, Alacritty, etc.), `Ctrl+V` is **not** the paste shortcut — `Ctrl+Shift+V` is. So `Ctrl+V` sends a raw byte to the application, and Hermes catches it to check the clipboard. This only works on Linux desktop terminals with X11 or Wayland clipboard access.
|
||||
|
||||
## Platform Compatibility
|
||||
|
||||
| Environment | `/paste` | Ctrl+V text+image | Alt+V | Notes |
|
||||
|---|:---:|:---:|:---:|---|
|
||||
| **macOS Terminal / iTerm2** | ✅ | ✅ | ✅ | Best experience — `osascript` always available |
|
||||
| **Linux X11 desktop** | ✅ | ✅ | ✅ | Requires `xclip` (`apt install xclip`) |
|
||||
| **Linux Wayland desktop** | ✅ | ✅ | ✅ | Requires `wl-paste` (`apt install wl-clipboard`) |
|
||||
| **WSL2 (Windows Terminal)** | ✅ | ✅¹ | ✅ | Uses `powershell.exe` — no extra install needed |
|
||||
| **VSCode Terminal (local)** | ✅ | ✅¹ | ❌ | VSCode intercepts Alt+key |
|
||||
| **VSCode Terminal (SSH)** | ❌² | ❌² | ❌ | Remote clipboard not accessible |
|
||||
| **SSH terminal (any)** | ❌² | ❌² | ❌² | Remote clipboard not accessible |
|
||||
|
||||
¹ Only when clipboard has both text and an image (image-only clipboard = nothing happens)
|
||||
² See [SSH & Remote Sessions](#ssh--remote-sessions) below
|
||||
|
||||
## Platform-Specific Setup
|
||||
|
||||
### macOS
|
||||
|
||||
**No setup required.** Hermes uses `osascript` (built into macOS) to read the clipboard. For faster performance, optionally install `pngpaste`:
|
||||
|
||||
```bash
|
||||
brew install pngpaste
|
||||
```
|
||||
|
||||
### Linux (X11)
|
||||
|
||||
Install `xclip`:
|
||||
|
||||
```bash
|
||||
# Ubuntu/Debian
|
||||
sudo apt install xclip
|
||||
|
||||
# Fedora
|
||||
sudo dnf install xclip
|
||||
|
||||
# Arch
|
||||
sudo pacman -S xclip
|
||||
```
|
||||
|
||||
### Linux (Wayland)
|
||||
|
||||
Modern Linux desktops (Ubuntu 22.04+, Fedora 34+) often use Wayland by default. Install `wl-clipboard`:
|
||||
|
||||
```bash
|
||||
# Ubuntu/Debian
|
||||
sudo apt install wl-clipboard
|
||||
|
||||
# Fedora
|
||||
sudo dnf install wl-clipboard
|
||||
|
||||
# Arch
|
||||
sudo pacman -S wl-clipboard
|
||||
```
|
||||
|
||||
:::tip How to check if you're on Wayland
|
||||
```bash
|
||||
echo $XDG_SESSION_TYPE
|
||||
# "wayland" = Wayland, "x11" = X11, "tty" = no display server
|
||||
```
|
||||
:::
|
||||
|
||||
### WSL2
|
||||
|
||||
**No extra setup required.** Hermes detects WSL2 automatically (via `/proc/version`) and uses `powershell.exe` to access the Windows clipboard through .NET's `System.Windows.Forms.Clipboard`. This is built into WSL2's Windows interop — `powershell.exe` is available by default.
|
||||
|
||||
The clipboard data is transferred as base64-encoded PNG over stdout, so no file path conversion or temp files are needed.
|
||||
|
||||
:::info WSLg Note
|
||||
If you're running WSLg (WSL2 with GUI support), Hermes tries the PowerShell path first, then falls back to `wl-paste`. WSLg's clipboard bridge only supports BMP format for images — Hermes auto-converts BMP to PNG using Pillow (if installed) or ImageMagick's `convert` command.
|
||||
:::
|
||||
|
||||
#### Verify WSL2 clipboard access
|
||||
|
||||
```bash
|
||||
# 1. Check WSL detection
|
||||
grep -i microsoft /proc/version
|
||||
|
||||
# 2. Check PowerShell is accessible
|
||||
which powershell.exe
|
||||
|
||||
# 3. Copy an image, then check
|
||||
powershell.exe -NoProfile -Command "Add-Type -AssemblyName System.Windows.Forms; [System.Windows.Forms.Clipboard]::ContainsImage()"
|
||||
# Should print "True"
|
||||
```
|
||||
|
||||
## SSH & Remote Sessions
|
||||
|
||||
**Clipboard paste does not work over SSH.** When you SSH into a remote machine, the Hermes CLI runs on the remote host. All clipboard tools (`xclip`, `wl-paste`, `powershell.exe`, `osascript`) read the clipboard of the machine they run on — which is the remote server, not your local machine. Your local clipboard is inaccessible from the remote side.
|
||||
|
||||
### Workarounds for SSH
|
||||
|
||||
1. **Upload the image file** — Save the image locally, upload it to the remote server via `scp`, VSCode's file explorer (drag-and-drop), or any file transfer method. Then reference it by path. *(A `/attach <filepath>` command is planned for a future release.)*
|
||||
|
||||
2. **Use a URL** — If the image is accessible online, just paste the URL in your message. The agent can use `vision_analyze` to look at any image URL directly.
|
||||
|
||||
3. **X11 forwarding** — Connect with `ssh -X` to forward X11. This lets `xclip` on the remote machine access your local X11 clipboard. Requires an X server running locally (XQuartz on macOS, built-in on Linux X11 desktops). Slow for large images.
|
||||
|
||||
4. **Use a messaging platform** — Send images to Hermes via Telegram, Discord, Slack, or WhatsApp. These platforms handle image upload natively and are not affected by clipboard/terminal limitations.
|
||||
|
||||
## Why Terminals Can't Paste Images
|
||||
|
||||
This is a common source of confusion, so here's the technical explanation:
|
||||
|
||||
Terminals are **text-based** interfaces. When you press Ctrl+V (or Cmd+V), the terminal emulator:
|
||||
|
||||
1. Reads the clipboard for **text content**
|
||||
2. Wraps it in [bracketed paste](https://en.wikipedia.org/wiki/Bracketed-paste) escape sequences
|
||||
3. Sends it to the application through the terminal's text stream
|
||||
|
||||
If the clipboard contains only an image (no text), the terminal has nothing to send. There is no standard terminal escape sequence for binary image data. The terminal simply does nothing.
|
||||
|
||||
This is why Hermes uses a separate clipboard check — instead of receiving image data through the terminal paste event, it calls OS-level tools (`osascript`, `powershell.exe`, `xclip`, `wl-paste`) directly via subprocess to read the clipboard independently.
|
||||
|
||||
## Supported Models
|
||||
|
||||
Image paste works with any vision-capable model. The image is sent as a base64-encoded data URL in the OpenAI vision content format:
|
||||
|
||||
```json
|
||||
{
|
||||
"type": "image_url",
|
||||
"image_url": {
|
||||
"url": "data:image/png;base64,..."
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Most modern models support this format, including GPT-4 Vision, Claude (with vision), Gemini, and open-source multimodal models served through OpenRouter.
|
||||
508
hermes_code/website/docs/user-guide/features/voice-mode.md
Normal file
508
hermes_code/website/docs/user-guide/features/voice-mode.md
Normal file
|
|
@ -0,0 +1,508 @@
|
|||
---
|
||||
sidebar_position: 10
|
||||
title: "Voice Mode"
|
||||
description: "Real-time voice conversations with Hermes Agent — CLI, Telegram, Discord (DMs, text channels, and voice channels)"
|
||||
---
|
||||
|
||||
# Voice Mode
|
||||
|
||||
Hermes Agent supports full voice interaction across CLI and messaging platforms. Talk to the agent using your microphone, hear spoken replies, and have live voice conversations in Discord voice channels.
|
||||
|
||||
If you want a practical setup walkthrough with recommended configurations and real usage patterns, see [Use Voice Mode with Hermes](/docs/guides/use-voice-mode-with-hermes).
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Before using voice features, make sure you have:
|
||||
|
||||
1. **Hermes Agent installed** — `pip install hermes-agent` (see [Installation](/docs/getting-started/installation))
|
||||
2. **An LLM provider configured** — run `hermes model` or set your preferred provider credentials in `~/.hermes/.env`
|
||||
3. **A working base setup** — run `hermes` to verify the agent responds to text before enabling voice
|
||||
|
||||
:::tip
|
||||
The `~/.hermes/` directory and default `config.yaml` are created automatically the first time you run `hermes`. You only need to create `~/.hermes/.env` manually for API keys.
|
||||
:::
|
||||
|
||||
## Overview
|
||||
|
||||
| Feature | Platform | Description |
|
||||
|---------|----------|-------------|
|
||||
| **Interactive Voice** | CLI | Press Ctrl+B to record, agent auto-detects silence and responds |
|
||||
| **Auto Voice Reply** | Telegram, Discord | Agent sends spoken audio alongside text responses |
|
||||
| **Voice Channel** | Discord | Bot joins VC, listens to users speaking, speaks replies back |
|
||||
|
||||
## Requirements
|
||||
|
||||
### Python Packages
|
||||
|
||||
```bash
|
||||
# CLI voice mode (microphone + audio playback)
|
||||
pip install "hermes-agent[voice]"
|
||||
|
||||
# Discord + Telegram messaging (includes discord.py[voice] for VC support)
|
||||
pip install "hermes-agent[messaging]"
|
||||
|
||||
# Premium TTS (ElevenLabs)
|
||||
pip install "hermes-agent[tts-premium]"
|
||||
|
||||
# Local TTS (NeuTTS, optional)
|
||||
python -m pip install -U neutts[all]
|
||||
|
||||
# Everything at once
|
||||
pip install "hermes-agent[all]"
|
||||
```
|
||||
|
||||
| Extra | Packages | Required For |
|
||||
|-------|----------|-------------|
|
||||
| `voice` | `sounddevice`, `numpy` | CLI voice mode |
|
||||
| `messaging` | `discord.py[voice]`, `python-telegram-bot`, `aiohttp` | Discord & Telegram bots |
|
||||
| `tts-premium` | `elevenlabs` | ElevenLabs TTS provider |
|
||||
|
||||
Optional local TTS provider: install `neutts` separately with `python -m pip install -U neutts[all]`. On first use it downloads the model automatically.
|
||||
|
||||
:::info
|
||||
`discord.py[voice]` installs **PyNaCl** (for voice encryption) and **opus bindings** automatically. This is required for Discord voice channel support.
|
||||
:::
|
||||
|
||||
### System Dependencies
|
||||
|
||||
```bash
|
||||
# macOS
|
||||
brew install portaudio ffmpeg opus
|
||||
brew install espeak-ng # for NeuTTS
|
||||
|
||||
# Ubuntu/Debian
|
||||
sudo apt install portaudio19-dev ffmpeg libopus0
|
||||
sudo apt install espeak-ng # for NeuTTS
|
||||
```
|
||||
|
||||
| Dependency | Purpose | Required For |
|
||||
|-----------|---------|-------------|
|
||||
| **PortAudio** | Microphone input and audio playback | CLI voice mode |
|
||||
| **ffmpeg** | Audio format conversion (MP3 → Opus, PCM → WAV) | All platforms |
|
||||
| **Opus** | Discord voice codec | Discord voice channels |
|
||||
| **espeak-ng** | Phonemizer backend | Local NeuTTS provider |
|
||||
|
||||
### API Keys
|
||||
|
||||
Add to `~/.hermes/.env`:
|
||||
|
||||
```bash
|
||||
# Speech-to-Text — local provider needs NO key at all
|
||||
# pip install faster-whisper # Free, runs locally, recommended
|
||||
GROQ_API_KEY=your-key # Groq Whisper — fast, free tier (cloud)
|
||||
VOICE_TOOLS_OPENAI_KEY=your-key # OpenAI Whisper — paid (cloud)
|
||||
|
||||
# Text-to-Speech (optional — Edge TTS and NeuTTS work without any key)
|
||||
ELEVENLABS_API_KEY=*** # ElevenLabs — premium quality
|
||||
# VOICE_TOOLS_OPENAI_KEY above also enables OpenAI TTS
|
||||
```
|
||||
|
||||
:::tip
|
||||
If `faster-whisper` is installed, voice mode works with **zero API keys** for STT. The model (~150 MB for `base`) downloads automatically on first use.
|
||||
:::
|
||||
|
||||
---
|
||||
|
||||
## CLI Voice Mode
|
||||
|
||||
### Quick Start
|
||||
|
||||
Start the CLI and enable voice mode:
|
||||
|
||||
```bash
|
||||
hermes # Start the interactive CLI
|
||||
```
|
||||
|
||||
Then use these commands inside the CLI:
|
||||
|
||||
```
|
||||
/voice Toggle voice mode on/off
|
||||
/voice on Enable voice mode
|
||||
/voice off Disable voice mode
|
||||
/voice tts Toggle TTS output
|
||||
/voice status Show current state
|
||||
```
|
||||
|
||||
### How It Works
|
||||
|
||||
1. Start the CLI with `hermes` and enable voice mode with `/voice on`
|
||||
2. **Press Ctrl+B** — a beep plays (880Hz), recording starts
|
||||
3. **Speak** — a live audio level bar shows your input: `● [▁▂▃▅▇▇▅▂] ❯`
|
||||
4. **Stop speaking** — after 3 seconds of silence, recording auto-stops
|
||||
5. **Two beeps** play (660Hz) confirming the recording ended
|
||||
6. Audio is transcribed via Whisper and sent to the agent
|
||||
7. If TTS is enabled, the agent's reply is spoken aloud
|
||||
8. Recording **automatically restarts** — speak again without pressing any key
|
||||
|
||||
This loop continues until you press **Ctrl+B** during recording (exits continuous mode) or 3 consecutive recordings detect no speech.
|
||||
|
||||
:::tip
|
||||
The record key is configurable via `voice.record_key` in `~/.hermes/config.yaml` (default: `ctrl+b`).
|
||||
:::
|
||||
|
||||
### Silence Detection
|
||||
|
||||
Two-stage algorithm detects when you've finished speaking:
|
||||
|
||||
1. **Speech confirmation** — waits for audio above the RMS threshold (200) for at least 0.3s, tolerating brief dips between syllables
|
||||
2. **End detection** — once speech is confirmed, triggers after 3.0 seconds of continuous silence
|
||||
|
||||
If no speech is detected at all for 15 seconds, recording stops automatically.
|
||||
|
||||
Both `silence_threshold` and `silence_duration` are configurable in `config.yaml`.
|
||||
|
||||
### Streaming TTS
|
||||
|
||||
When TTS is enabled, the agent speaks its reply **sentence-by-sentence** as it generates text — you don't wait for the full response:
|
||||
|
||||
1. Buffers text deltas into complete sentences (min 20 chars)
|
||||
2. Strips markdown formatting and `<think>` blocks
|
||||
3. Generates and plays audio per sentence in real-time
|
||||
|
||||
### Hallucination Filter
|
||||
|
||||
Whisper sometimes generates phantom text from silence or background noise ("Thank you for watching", "Subscribe", etc.). The agent filters these out using a set of 26 known hallucination phrases across multiple languages, plus a regex pattern that catches repetitive variations.
|
||||
|
||||
---
|
||||
|
||||
## Gateway Voice Reply (Telegram & Discord)
|
||||
|
||||
If you haven't set up your messaging bots yet, see the platform-specific guides:
|
||||
- [Telegram Setup Guide](../messaging/telegram.md)
|
||||
- [Discord Setup Guide](../messaging/discord.md)
|
||||
|
||||
Start the gateway to connect to your messaging platforms:
|
||||
|
||||
```bash
|
||||
hermes gateway # Start the gateway (connects to configured platforms)
|
||||
hermes gateway setup # Interactive setup wizard for first-time configuration
|
||||
```
|
||||
|
||||
### Discord: Channels vs DMs
|
||||
|
||||
The bot supports two interaction modes on Discord:
|
||||
|
||||
| Mode | How to Talk | Mention Required | Setup |
|
||||
|------|------------|-----------------|-------|
|
||||
| **Direct Message (DM)** | Open the bot's profile → "Message" | No | Works immediately |
|
||||
| **Server Channel** | Type in a text channel where the bot is present | Yes (`@botname`) | Bot must be invited to the server |
|
||||
|
||||
**DM (recommended for personal use):** Just open a DM with the bot and type — no @mention needed. Voice replies and all commands work the same as in channels.
|
||||
|
||||
**Server channels:** The bot only responds when you @mention it (e.g. `@hermesbyt4 hello`). Make sure you select the **bot user** from the mention popup, not the role with the same name.
|
||||
|
||||
:::tip
|
||||
To disable the mention requirement in server channels, add to `~/.hermes/.env`:
|
||||
```bash
|
||||
DISCORD_REQUIRE_MENTION=false
|
||||
```
|
||||
Or set specific channels as free-response (no mention needed):
|
||||
```bash
|
||||
DISCORD_FREE_RESPONSE_CHANNELS=123456789,987654321
|
||||
```
|
||||
:::
|
||||
|
||||
### Commands
|
||||
|
||||
These work in both Telegram and Discord (DMs and text channels):
|
||||
|
||||
```
|
||||
/voice Toggle voice mode on/off
|
||||
/voice on Voice replies only when you send a voice message
|
||||
/voice tts Voice replies for ALL messages
|
||||
/voice off Disable voice replies
|
||||
/voice status Show current setting
|
||||
```
|
||||
|
||||
### Modes
|
||||
|
||||
| Mode | Command | Behavior |
|
||||
|------|---------|----------|
|
||||
| `off` | `/voice off` | Text only (default) |
|
||||
| `voice_only` | `/voice on` | Speaks reply only when you send a voice message |
|
||||
| `all` | `/voice tts` | Speaks reply to every message |
|
||||
|
||||
Voice mode setting is persisted across gateway restarts.
|
||||
|
||||
### Platform Delivery
|
||||
|
||||
| Platform | Format | Notes |
|
||||
|----------|--------|-------|
|
||||
| **Telegram** | Voice bubble (Opus/OGG) | Plays inline in chat. ffmpeg converts MP3 → Opus if needed |
|
||||
| **Discord** | Native voice bubble (Opus/OGG) | Plays inline like a user voice message. Falls back to file attachment if voice bubble API fails |
|
||||
|
||||
---
|
||||
|
||||
## Discord Voice Channels
|
||||
|
||||
The most immersive voice feature: the bot joins a Discord voice channel, listens to users speaking, transcribes their speech, processes through the agent, and speaks the reply back in the voice channel.
|
||||
|
||||
### Setup
|
||||
|
||||
#### 1. Discord Bot Permissions
|
||||
|
||||
If you already have a Discord bot set up for text (see [Discord Setup Guide](../messaging/discord.md)), you need to add voice permissions.
|
||||
|
||||
Go to the [Discord Developer Portal](https://discord.com/developers/applications) → your application → **Installation** → **Default Install Settings** → **Guild Install**:
|
||||
|
||||
**Add these permissions to the existing text permissions:**
|
||||
|
||||
| Permission | Purpose | Required |
|
||||
|-----------|---------|----------|
|
||||
| **Connect** | Join voice channels | Yes |
|
||||
| **Speak** | Play TTS audio in voice channels | Yes |
|
||||
| **Use Voice Activity** | Detect when users are speaking | Recommended |
|
||||
|
||||
**Updated Permissions Integer:**
|
||||
|
||||
| Level | Integer | What's Included |
|
||||
|-------|---------|----------------|
|
||||
| Text only | `274878286912` | View Channels, Send Messages, Read History, Embeds, Attachments, Threads, Reactions |
|
||||
| Text + Voice | `274881432640` | All above + Connect, Speak |
|
||||
|
||||
**Re-invite the bot** with the updated permissions URL:
|
||||
|
||||
```
|
||||
https://discord.com/oauth2/authorize?client_id=YOUR_APP_ID&scope=bot+applications.commands&permissions=274881432640
|
||||
```
|
||||
|
||||
Replace `YOUR_APP_ID` with your Application ID from the Developer Portal.
|
||||
|
||||
:::warning
|
||||
Re-inviting the bot to a server it's already in will update its permissions without removing it. You won't lose any data or configuration.
|
||||
:::
|
||||
|
||||
#### 2. Privileged Gateway Intents
|
||||
|
||||
In the [Developer Portal](https://discord.com/developers/applications) → your application → **Bot** → **Privileged Gateway Intents**, enable all three:
|
||||
|
||||
| Intent | Purpose |
|
||||
|--------|---------|
|
||||
| **Presence Intent** | Detect user online/offline status |
|
||||
| **Server Members Intent** | Map voice SSRC identifiers to Discord user IDs |
|
||||
| **Message Content Intent** | Read text message content in channels |
|
||||
|
||||
All three are required for full voice channel functionality. **Server Members Intent** is especially critical — without it, the bot cannot identify who is speaking in the voice channel.
|
||||
|
||||
#### 3. Opus Codec
|
||||
|
||||
The Opus codec library must be installed on the machine running the gateway:
|
||||
|
||||
```bash
|
||||
# macOS (Homebrew)
|
||||
brew install opus
|
||||
|
||||
# Ubuntu/Debian
|
||||
sudo apt install libopus0
|
||||
```
|
||||
|
||||
The bot auto-loads the codec from:
|
||||
- **macOS:** `/opt/homebrew/lib/libopus.dylib`
|
||||
- **Linux:** `libopus.so.0`
|
||||
|
||||
#### 4. Environment Variables
|
||||
|
||||
```bash
|
||||
# ~/.hermes/.env
|
||||
|
||||
# Discord bot (already configured for text)
|
||||
DISCORD_BOT_TOKEN=your-bot-token
|
||||
DISCORD_ALLOWED_USERS=your-user-id
|
||||
|
||||
# STT — local provider needs no key (pip install faster-whisper)
|
||||
# GROQ_API_KEY=your-key # Alternative: cloud-based, fast, free tier
|
||||
|
||||
# TTS — optional. Edge TTS and NeuTTS need no key.
|
||||
# ELEVENLABS_API_KEY=*** # Premium quality
|
||||
# VOICE_TOOLS_OPENAI_KEY=*** # OpenAI TTS / Whisper
|
||||
```
|
||||
|
||||
### Start the Gateway
|
||||
|
||||
```bash
|
||||
hermes gateway # Start with existing configuration
|
||||
```
|
||||
|
||||
The bot should come online in Discord within a few seconds.
|
||||
|
||||
### Commands
|
||||
|
||||
Use these in the Discord text channel where the bot is present:
|
||||
|
||||
```
|
||||
/voice join Bot joins your current voice channel
|
||||
/voice channel Alias for /voice join
|
||||
/voice leave Bot disconnects from voice channel
|
||||
/voice status Show voice mode and connected channel
|
||||
```
|
||||
|
||||
:::info
|
||||
You must be in a voice channel before running `/voice join`. The bot joins the same VC you're in.
|
||||
:::
|
||||
|
||||
### How It Works
|
||||
|
||||
When the bot joins a voice channel, it:
|
||||
|
||||
1. **Listens** to each user's audio stream independently
|
||||
2. **Detects silence** — 1.5s of silence after at least 0.5s of speech triggers processing
|
||||
3. **Transcribes** the audio via Whisper STT (local, Groq, or OpenAI)
|
||||
4. **Processes** through the full agent pipeline (session, tools, memory)
|
||||
5. **Speaks** the reply back in the voice channel via TTS
|
||||
|
||||
### Text Channel Integration
|
||||
|
||||
When the bot is in a voice channel:
|
||||
|
||||
- Transcripts appear in the text channel: `[Voice] @user: what you said`
|
||||
- Agent responses are sent as text in the channel AND spoken in the VC
|
||||
- The text channel is the one where `/voice join` was issued
|
||||
|
||||
### Echo Prevention
|
||||
|
||||
The bot automatically pauses its audio listener while playing TTS replies, preventing it from hearing and re-processing its own output.
|
||||
|
||||
### Access Control
|
||||
|
||||
Only users listed in `DISCORD_ALLOWED_USERS` can interact via voice. Other users' audio is silently ignored.
|
||||
|
||||
```bash
|
||||
# ~/.hermes/.env
|
||||
DISCORD_ALLOWED_USERS=284102345871466496
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Configuration Reference
|
||||
|
||||
### config.yaml
|
||||
|
||||
```yaml
|
||||
# Voice recording (CLI)
|
||||
voice:
|
||||
record_key: "ctrl+b" # Key to start/stop recording
|
||||
max_recording_seconds: 120 # Maximum recording length
|
||||
auto_tts: false # Auto-enable TTS when voice mode starts
|
||||
silence_threshold: 200 # RMS level (0-32767) below which counts as silence
|
||||
silence_duration: 3.0 # Seconds of silence before auto-stop
|
||||
|
||||
# Speech-to-Text
|
||||
stt:
|
||||
provider: "local" # "local" (free) | "groq" | "openai"
|
||||
local:
|
||||
model: "base" # tiny, base, small, medium, large-v3
|
||||
# model: "whisper-1" # Legacy: used when provider is not set
|
||||
|
||||
# Text-to-Speech
|
||||
tts:
|
||||
provider: "edge" # "edge" (free) | "elevenlabs" | "openai" | "neutts"
|
||||
edge:
|
||||
voice: "en-US-AriaNeural" # 322 voices, 74 languages
|
||||
elevenlabs:
|
||||
voice_id: "pNInz6obpgDQGcFmaJgB" # Adam
|
||||
model_id: "eleven_multilingual_v2"
|
||||
openai:
|
||||
model: "gpt-4o-mini-tts"
|
||||
voice: "alloy" # alloy, echo, fable, onyx, nova, shimmer
|
||||
base_url: "https://api.openai.com/v1" # optional: override for self-hosted or OpenAI-compatible endpoints
|
||||
neutts:
|
||||
ref_audio: ''
|
||||
ref_text: ''
|
||||
model: neuphonic/neutts-air-q4-gguf
|
||||
device: cpu
|
||||
```
|
||||
|
||||
### Environment Variables
|
||||
|
||||
```bash
|
||||
# Speech-to-Text providers (local needs no key)
|
||||
# pip install faster-whisper # Free local STT — no API key needed
|
||||
GROQ_API_KEY=... # Groq Whisper (fast, free tier)
|
||||
VOICE_TOOLS_OPENAI_KEY=... # OpenAI Whisper (paid)
|
||||
|
||||
# STT advanced overrides (optional)
|
||||
STT_GROQ_MODEL=whisper-large-v3-turbo # Override default Groq STT model
|
||||
STT_OPENAI_MODEL=whisper-1 # Override default OpenAI STT model
|
||||
GROQ_BASE_URL=https://api.groq.com/openai/v1 # Custom Groq endpoint
|
||||
STT_OPENAI_BASE_URL=https://api.openai.com/v1 # Custom OpenAI STT endpoint
|
||||
|
||||
# Text-to-Speech providers (Edge TTS and NeuTTS need no key)
|
||||
ELEVENLABS_API_KEY=*** # ElevenLabs (premium quality)
|
||||
# VOICE_TOOLS_OPENAI_KEY above also enables OpenAI TTS
|
||||
|
||||
# Discord voice channel
|
||||
DISCORD_BOT_TOKEN=...
|
||||
DISCORD_ALLOWED_USERS=...
|
||||
```
|
||||
|
||||
### STT Provider Comparison
|
||||
|
||||
| Provider | Model | Speed | Quality | Cost | API Key |
|
||||
|----------|-------|-------|---------|------|---------|
|
||||
| **Local** | `base` | Fast (depends on CPU/GPU) | Good | Free | No |
|
||||
| **Local** | `small` | Medium | Better | Free | No |
|
||||
| **Local** | `large-v3` | Slow | Best | Free | No |
|
||||
| **Groq** | `whisper-large-v3-turbo` | Very fast (~0.5s) | Good | Free tier | Yes |
|
||||
| **Groq** | `whisper-large-v3` | Fast (~1s) | Better | Free tier | Yes |
|
||||
| **OpenAI** | `whisper-1` | Fast (~1s) | Good | Paid | Yes |
|
||||
| **OpenAI** | `gpt-4o-transcribe` | Medium (~2s) | Best | Paid | Yes |
|
||||
|
||||
Provider priority (automatic fallback): **local** > **groq** > **openai**
|
||||
|
||||
### TTS Provider Comparison
|
||||
|
||||
| Provider | Quality | Cost | Latency | Key Required |
|
||||
|----------|---------|------|---------|-------------|
|
||||
| **Edge TTS** | Good | Free | ~1s | No |
|
||||
| **ElevenLabs** | Excellent | Paid | ~2s | Yes |
|
||||
| **OpenAI TTS** | Good | Paid | ~1.5s | Yes |
|
||||
| **NeuTTS** | Good | Free | Depends on CPU/GPU | No |
|
||||
|
||||
NeuTTS uses the `tts.neutts` config block above.
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### "No audio device found" (CLI)
|
||||
|
||||
PortAudio is not installed:
|
||||
|
||||
```bash
|
||||
brew install portaudio # macOS
|
||||
sudo apt install portaudio19-dev # Ubuntu
|
||||
```
|
||||
|
||||
### Bot doesn't respond in Discord server channels
|
||||
|
||||
The bot requires an @mention by default in server channels. Make sure you:
|
||||
|
||||
1. Type `@` and select the **bot user** (with the #discriminator), not the **role** with the same name
|
||||
2. Or use DMs instead — no mention needed
|
||||
3. Or set `DISCORD_REQUIRE_MENTION=false` in `~/.hermes/.env`
|
||||
|
||||
### Bot joins VC but doesn't hear me
|
||||
|
||||
- Check your Discord user ID is in `DISCORD_ALLOWED_USERS`
|
||||
- Make sure you're not muted in Discord
|
||||
- The bot needs a SPEAKING event from Discord before it can map your audio — start speaking within a few seconds of joining
|
||||
|
||||
### Bot hears me but doesn't respond
|
||||
|
||||
- Verify STT is available: install `faster-whisper` (no key needed) or set `GROQ_API_KEY` / `VOICE_TOOLS_OPENAI_KEY`
|
||||
- Check the LLM model is configured and accessible
|
||||
- Review gateway logs: `tail -f ~/.hermes/logs/gateway.log`
|
||||
|
||||
### Bot responds in text but not in voice channel
|
||||
|
||||
- TTS provider may be failing — check API key and quota
|
||||
- Edge TTS (free, no key) is the default fallback
|
||||
- Check logs for TTS errors
|
||||
|
||||
### Whisper returns garbage text
|
||||
|
||||
The hallucination filter catches most cases automatically. If you're still getting phantom transcripts:
|
||||
|
||||
- Use a quieter environment
|
||||
- Adjust `silence_threshold` in config (higher = less sensitive)
|
||||
- Try a different STT model
|
||||
173
hermes_code/website/docs/user-guide/git-worktrees.md
Normal file
173
hermes_code/website/docs/user-guide/git-worktrees.md
Normal file
|
|
@ -0,0 +1,173 @@
|
|||
---
|
||||
sidebar_position: 9
|
||||
title: "Git Worktrees"
|
||||
description: "Run multiple Hermes agents safely on the same repository using git worktrees and isolated checkouts"
|
||||
---
|
||||
|
||||
# Git Worktrees
|
||||
|
||||
Hermes Agent is often used on large, long‑lived repositories. When you want to:
|
||||
|
||||
- Run **multiple agents in parallel** on the same project, or
|
||||
- Keep experimental refactors isolated from your main branch,
|
||||
|
||||
Git **worktrees** are the safest way to give each agent its own checkout without duplicating the entire repository.
|
||||
|
||||
This page shows how to combine worktrees with Hermes so each session has a clean, isolated working directory.
|
||||
|
||||
## Why Use Worktrees with Hermes?
|
||||
|
||||
Hermes treats the **current working directory** as the project root:
|
||||
|
||||
- CLI: the directory where you run `hermes` or `hermes chat`
|
||||
- Messaging gateways: the directory set by `MESSAGING_CWD`
|
||||
|
||||
If you run multiple agents in the **same checkout**, their changes can interfere with each other:
|
||||
|
||||
- One agent may delete or rewrite files the other is using.
|
||||
- It becomes harder to understand which changes belong to which experiment.
|
||||
|
||||
With worktrees, each agent gets:
|
||||
|
||||
- Its **own branch and working directory**
|
||||
- Its **own Checkpoint Manager history** for `/rollback`
|
||||
|
||||
See also: [Checkpoints and /rollback](./checkpoints-and-rollback.md).
|
||||
|
||||
## Quick Start: Creating a Worktree
|
||||
|
||||
From your main repository (containing `.git/`), create a new worktree for a feature branch:
|
||||
|
||||
```bash
|
||||
# From the main repo root
|
||||
cd /path/to/your/repo
|
||||
|
||||
# Create a new branch and worktree in ../repo-feature
|
||||
git worktree add ../repo-feature feature/hermes-experiment
|
||||
```
|
||||
|
||||
This creates:
|
||||
|
||||
- A new directory: `../repo-feature`
|
||||
- A new branch: `feature/hermes-experiment` checked out in that directory
|
||||
|
||||
Now you can `cd` into the new worktree and run Hermes there:
|
||||
|
||||
```bash
|
||||
cd ../repo-feature
|
||||
|
||||
# Start Hermes in the worktree
|
||||
hermes
|
||||
```
|
||||
|
||||
Hermes will:
|
||||
|
||||
- See `../repo-feature` as the project root.
|
||||
- Use that directory for context files, code edits, and tools.
|
||||
- Use a **separate checkpoint history** for `/rollback` scoped to this worktree.
|
||||
|
||||
## Running Multiple Agents in Parallel
|
||||
|
||||
You can create multiple worktrees, each with its own branch:
|
||||
|
||||
```bash
|
||||
cd /path/to/your/repo
|
||||
|
||||
git worktree add ../repo-experiment-a feature/hermes-a
|
||||
git worktree add ../repo-experiment-b feature/hermes-b
|
||||
```
|
||||
|
||||
In separate terminals:
|
||||
|
||||
```bash
|
||||
# Terminal 1
|
||||
cd ../repo-experiment-a
|
||||
hermes
|
||||
|
||||
# Terminal 2
|
||||
cd ../repo-experiment-b
|
||||
hermes
|
||||
```
|
||||
|
||||
Each Hermes process:
|
||||
|
||||
- Works on its own branch (`feature/hermes-a` vs `feature/hermes-b`).
|
||||
- Writes checkpoints under a different shadow repo hash (derived from the worktree path).
|
||||
- Can use `/rollback` independently without affecting the other.
|
||||
|
||||
This is especially useful when:
|
||||
|
||||
- Running batch refactors.
|
||||
- Trying different approaches to the same task.
|
||||
- Pairing CLI + gateway sessions against the same upstream repo.
|
||||
|
||||
## Cleaning Up Worktrees Safely
|
||||
|
||||
When you are done with an experiment:
|
||||
|
||||
1. Decide whether to keep or discard the work.
|
||||
2. If you want to keep it:
|
||||
- Merge the branch into your main branch as usual.
|
||||
3. Remove the worktree:
|
||||
|
||||
```bash
|
||||
cd /path/to/your/repo
|
||||
|
||||
# Remove the worktree directory and its reference
|
||||
git worktree remove ../repo-feature
|
||||
```
|
||||
|
||||
Notes:
|
||||
|
||||
- `git worktree remove` will refuse to remove a worktree with uncommitted changes unless you force it.
|
||||
- Removing a worktree does **not** automatically delete the branch; you can delete or keep the branch using normal `git branch` commands.
|
||||
- Hermes checkpoint data under `~/.hermes/checkpoints/` is not automatically pruned when you remove a worktree, but it is usually very small.
|
||||
|
||||
## Best Practices
|
||||
|
||||
- **One worktree per Hermes experiment**
|
||||
- Create a dedicated branch/worktree for each substantial change.
|
||||
- This keeps diffs focused and PRs small and reviewable.
|
||||
- **Name branches after the experiment**
|
||||
- e.g. `feature/hermes-checkpoints-docs`, `feature/hermes-refactor-tests`.
|
||||
- **Commit frequently**
|
||||
- Use git commits for high‑level milestones.
|
||||
- Use [checkpoints and /rollback](./checkpoints-and-rollback.md) as a safety net for tool‑driven edits in between.
|
||||
- **Avoid running Hermes from the bare repo root when using worktrees**
|
||||
- Prefer the worktree directories instead, so each agent has a clear scope.
|
||||
|
||||
## Using `hermes -w` (Automatic Worktree Mode)
|
||||
|
||||
Hermes has a built‑in `-w` flag that **automatically creates a disposable git worktree** with its own branch. You don't need to set up worktrees manually — just `cd` into your repo and run:
|
||||
|
||||
```bash
|
||||
cd /path/to/your/repo
|
||||
hermes -w
|
||||
```
|
||||
|
||||
Hermes will:
|
||||
|
||||
- Create a temporary worktree under `.worktrees/` inside your repo.
|
||||
- Check out an isolated branch (e.g. `hermes/hermes-<hash>`).
|
||||
- Run the full CLI session inside that worktree.
|
||||
|
||||
This is the easiest way to get worktree isolation. You can also combine it with a single query:
|
||||
|
||||
```bash
|
||||
hermes -w -q "Fix issue #123"
|
||||
```
|
||||
|
||||
For parallel agents, open multiple terminals and run `hermes -w` in each — every invocation gets its own worktree and branch automatically.
|
||||
|
||||
## Putting It All Together
|
||||
|
||||
- Use **git worktrees** to give each Hermes session its own clean checkout.
|
||||
- Use **branches** to capture the high‑level history of your experiments.
|
||||
- Use **checkpoints + `/rollback`** to recover from mistakes inside each worktree.
|
||||
|
||||
This combination gives you:
|
||||
|
||||
- Strong guarantees that different agents and experiments do not step on each other.
|
||||
- Fast iteration cycles with easy recovery from bad edits.
|
||||
- Clean, reviewable pull requests.
|
||||
|
||||
|
|
@ -0,0 +1,8 @@
|
|||
{
|
||||
"label": "Messaging Gateway",
|
||||
"position": 3,
|
||||
"link": {
|
||||
"type": "doc",
|
||||
"id": "user-guide/messaging/index"
|
||||
}
|
||||
}
|
||||
192
hermes_code/website/docs/user-guide/messaging/dingtalk.md
Normal file
192
hermes_code/website/docs/user-guide/messaging/dingtalk.md
Normal file
|
|
@ -0,0 +1,192 @@
|
|||
---
|
||||
sidebar_position: 10
|
||||
title: "DingTalk"
|
||||
description: "Set up Hermes Agent as a DingTalk chatbot"
|
||||
---
|
||||
|
||||
# DingTalk Setup
|
||||
|
||||
Hermes Agent integrates with DingTalk (钉钉) as a chatbot, letting you chat with your AI assistant through direct messages or group chats. The bot connects via DingTalk's Stream Mode — a long-lived WebSocket connection that requires no public URL or webhook server — and replies using markdown-formatted messages through DingTalk's session webhook API.
|
||||
|
||||
Before setup, here's the part most people want to know: how Hermes behaves once it's in your DingTalk workspace.
|
||||
|
||||
## How Hermes Behaves
|
||||
|
||||
| Context | Behavior |
|
||||
|---------|----------|
|
||||
| **DMs (1:1 chat)** | Hermes responds to every message. No `@mention` needed. Each DM has its own session. |
|
||||
| **Group chats** | Hermes responds when you `@mention` it. Without a mention, Hermes ignores the message. |
|
||||
| **Shared groups with multiple users** | By default, Hermes isolates session history per user inside the group. Two people talking in the same group do not share one transcript unless you explicitly disable that. |
|
||||
|
||||
### Session Model in DingTalk
|
||||
|
||||
By default:
|
||||
|
||||
- each DM gets its own session
|
||||
- each user in a shared group chat gets their own session inside that group
|
||||
|
||||
This is controlled by `config.yaml`:
|
||||
|
||||
```yaml
|
||||
group_sessions_per_user: true
|
||||
```
|
||||
|
||||
Set it to `false` only if you explicitly want one shared conversation for the entire group:
|
||||
|
||||
```yaml
|
||||
group_sessions_per_user: false
|
||||
```
|
||||
|
||||
This guide walks you through the full setup process — from creating your DingTalk bot to sending your first message.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Install the required Python packages:
|
||||
|
||||
```bash
|
||||
pip install dingtalk-stream httpx
|
||||
```
|
||||
|
||||
- `dingtalk-stream` — DingTalk's official SDK for Stream Mode (WebSocket-based real-time messaging)
|
||||
- `httpx` — async HTTP client used for sending replies via session webhooks
|
||||
|
||||
## Step 1: Create a DingTalk App
|
||||
|
||||
1. Go to the [DingTalk Developer Console](https://open-dev.dingtalk.com/).
|
||||
2. Log in with your DingTalk admin account.
|
||||
3. Click **Application Development** → **Custom Apps** → **Create App via H5 Micro-App** (or **Robot** depending on your console version).
|
||||
4. Fill in:
|
||||
- **App Name**: e.g., `Hermes Agent`
|
||||
- **Description**: optional
|
||||
5. After creating, navigate to **Credentials & Basic Info** to find your **Client ID** (AppKey) and **Client Secret** (AppSecret). Copy both.
|
||||
|
||||
:::warning[Credentials shown only once]
|
||||
The Client Secret is only displayed once when you create the app. If you lose it, you'll need to regenerate it. Never share these credentials publicly or commit them to Git.
|
||||
:::
|
||||
|
||||
## Step 2: Enable the Robot Capability
|
||||
|
||||
1. In your app's settings page, go to **Add Capability** → **Robot**.
|
||||
2. Enable the robot capability.
|
||||
3. Under **Message Reception Mode**, select **Stream Mode** (recommended — no public URL needed).
|
||||
|
||||
:::tip
|
||||
Stream Mode is the recommended setup. It uses a long-lived WebSocket connection initiated from your machine, so you don't need a public IP, domain name, or webhook endpoint. This works behind NAT, firewalls, and on local machines.
|
||||
:::
|
||||
|
||||
## Step 3: Find Your DingTalk User ID
|
||||
|
||||
Hermes Agent uses your DingTalk User ID to control who can interact with the bot. DingTalk User IDs are alphanumeric strings set by your organization's admin.
|
||||
|
||||
To find yours:
|
||||
|
||||
1. Ask your DingTalk organization admin — User IDs are configured in the DingTalk admin console under **Contacts** → **Members**.
|
||||
2. Alternatively, the bot logs the `sender_id` for each incoming message. Start the gateway, send the bot a message, then check the logs for your ID.
|
||||
|
||||
## Step 4: Configure Hermes Agent
|
||||
|
||||
### Option A: Interactive Setup (Recommended)
|
||||
|
||||
Run the guided setup command:
|
||||
|
||||
```bash
|
||||
hermes gateway setup
|
||||
```
|
||||
|
||||
Select **DingTalk** when prompted, then paste your Client ID, Client Secret, and allowed user IDs when asked.
|
||||
|
||||
### Option B: Manual Configuration
|
||||
|
||||
Add the following to your `~/.hermes/.env` file:
|
||||
|
||||
```bash
|
||||
# Required
|
||||
DINGTALK_CLIENT_ID=your-app-key
|
||||
DINGTALK_CLIENT_SECRET=your-app-secret
|
||||
|
||||
# Security: restrict who can interact with the bot
|
||||
DINGTALK_ALLOWED_USERS=user-id-1
|
||||
|
||||
# Multiple allowed users (comma-separated)
|
||||
# DINGTALK_ALLOWED_USERS=user-id-1,user-id-2
|
||||
```
|
||||
|
||||
Optional behavior settings in `~/.hermes/config.yaml`:
|
||||
|
||||
```yaml
|
||||
group_sessions_per_user: true
|
||||
```
|
||||
|
||||
- `group_sessions_per_user: true` keeps each participant's context isolated inside shared group chats
|
||||
|
||||
### Start the Gateway
|
||||
|
||||
Once configured, start the DingTalk gateway:
|
||||
|
||||
```bash
|
||||
hermes gateway
|
||||
```
|
||||
|
||||
The bot should connect to DingTalk's Stream Mode within a few seconds. Send it a message — either a DM or in a group where it's been added — to test.
|
||||
|
||||
:::tip
|
||||
You can run `hermes gateway` in the background or as a systemd service for persistent operation. See the deployment docs for details.
|
||||
:::
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Bot is not responding to messages
|
||||
|
||||
**Cause**: The robot capability isn't enabled, or `DINGTALK_ALLOWED_USERS` doesn't include your User ID.
|
||||
|
||||
**Fix**: Verify the robot capability is enabled in your app settings and that Stream Mode is selected. Check that your User ID is in `DINGTALK_ALLOWED_USERS`. Restart the gateway.
|
||||
|
||||
### "dingtalk-stream not installed" error
|
||||
|
||||
**Cause**: The `dingtalk-stream` Python package is not installed.
|
||||
|
||||
**Fix**: Install it:
|
||||
|
||||
```bash
|
||||
pip install dingtalk-stream httpx
|
||||
```
|
||||
|
||||
### "DINGTALK_CLIENT_ID and DINGTALK_CLIENT_SECRET required"
|
||||
|
||||
**Cause**: The credentials aren't set in your environment or `.env` file.
|
||||
|
||||
**Fix**: Verify `DINGTALK_CLIENT_ID` and `DINGTALK_CLIENT_SECRET` are set correctly in `~/.hermes/.env`. The Client ID is your AppKey, and the Client Secret is your AppSecret from the DingTalk Developer Console.
|
||||
|
||||
### Stream disconnects / reconnection loops
|
||||
|
||||
**Cause**: Network instability, DingTalk platform maintenance, or credential issues.
|
||||
|
||||
**Fix**: The adapter automatically reconnects with exponential backoff (2s → 5s → 10s → 30s → 60s). Check that your credentials are valid and your app hasn't been deactivated. Verify your network allows outbound WebSocket connections.
|
||||
|
||||
### Bot is offline
|
||||
|
||||
**Cause**: The Hermes gateway isn't running, or it failed to connect.
|
||||
|
||||
**Fix**: Check that `hermes gateway` is running. Look at the terminal output for error messages. Common issues: wrong credentials, app deactivated, `dingtalk-stream` or `httpx` not installed.
|
||||
|
||||
### "No session_webhook available"
|
||||
|
||||
**Cause**: The bot tried to reply but doesn't have a session webhook URL. This typically happens if the webhook expired or the bot was restarted between receiving the message and sending the reply.
|
||||
|
||||
**Fix**: Send a new message to the bot — each incoming message provides a fresh session webhook for replies. This is a normal DingTalk limitation; the bot can only reply to messages it has received recently.
|
||||
|
||||
## Security
|
||||
|
||||
:::warning
|
||||
Always set `DINGTALK_ALLOWED_USERS` to restrict who can interact with the bot. Without it, the gateway denies all users by default as a safety measure. Only add User IDs of people you trust — authorized users have full access to the agent's capabilities, including tool use and system access.
|
||||
:::
|
||||
|
||||
For more information on securing your Hermes Agent deployment, see the [Security Guide](../security.md).
|
||||
|
||||
## Notes
|
||||
|
||||
- **Stream Mode**: No public URL, domain name, or webhook server needed. The connection is initiated from your machine via WebSocket, so it works behind NAT and firewalls.
|
||||
- **Markdown responses**: Replies are formatted in DingTalk's markdown format for rich text display.
|
||||
- **Message deduplication**: The adapter deduplicates messages with a 5-minute window to prevent processing the same message twice.
|
||||
- **Auto-reconnection**: If the stream connection drops, the adapter automatically reconnects with exponential backoff.
|
||||
- **Message length limit**: Responses are capped at 20,000 characters per message. Longer responses are truncated.
|
||||
363
hermes_code/website/docs/user-guide/messaging/discord.md
Normal file
363
hermes_code/website/docs/user-guide/messaging/discord.md
Normal file
|
|
@ -0,0 +1,363 @@
|
|||
---
|
||||
sidebar_position: 3
|
||||
title: "Discord"
|
||||
description: "Set up Hermes Agent as a Discord bot"
|
||||
---
|
||||
|
||||
# Discord Setup
|
||||
|
||||
Hermes Agent integrates with Discord as a bot, letting you chat with your AI assistant through direct messages or server channels. The bot receives your messages, processes them through the Hermes Agent pipeline (including tool use, memory, and reasoning), and responds in real time. It supports text, voice messages, file attachments, and slash commands.
|
||||
|
||||
Before setup, here's the part most people want to know: how Hermes behaves once it's in your server.
|
||||
|
||||
## How Hermes Behaves
|
||||
|
||||
| Context | Behavior |
|
||||
|---------|----------|
|
||||
| **DMs** | Hermes responds to every message. No `@mention` needed. Each DM has its own session. |
|
||||
| **Server channels** | By default, Hermes only responds when you `@mention` it. If you post in a channel without mentioning it, Hermes ignores the message. |
|
||||
| **Free-response channels** | You can make specific channels mention-free with `DISCORD_FREE_RESPONSE_CHANNELS`, or disable mentions globally with `DISCORD_REQUIRE_MENTION=false`. |
|
||||
| **Threads** | Hermes replies in the same thread. Mention rules still apply unless that thread or its parent channel is configured as free-response. Threads stay isolated from the parent channel for session history. |
|
||||
| **Shared channels with multiple users** | By default, Hermes isolates session history per user inside the channel for safety and clarity. Two people talking in the same channel do not share one transcript unless you explicitly disable that. |
|
||||
|
||||
:::tip
|
||||
If you want a normal bot-help channel where people can talk to Hermes without tagging it every time, add that channel to `DISCORD_FREE_RESPONSE_CHANNELS`.
|
||||
:::
|
||||
|
||||
### Discord Gateway Model
|
||||
|
||||
Hermes on Discord is not a webhook that replies statelessly. It runs through the full messaging gateway, which means each incoming message goes through:
|
||||
|
||||
1. authorization (`DISCORD_ALLOWED_USERS`)
|
||||
2. mention / free-response checks
|
||||
3. session lookup
|
||||
4. session transcript loading
|
||||
5. normal Hermes agent execution, including tools, memory, and slash commands
|
||||
6. response delivery back to Discord
|
||||
|
||||
That matters because behavior in a busy server depends on both Discord routing and Hermes session policy.
|
||||
|
||||
### Session Model in Discord
|
||||
|
||||
By default:
|
||||
|
||||
- each DM gets its own session
|
||||
- each server thread gets its own session namespace
|
||||
- each user in a shared channel gets their own session inside that channel
|
||||
|
||||
So if Alice and Bob both talk to Hermes in `#research`, Hermes treats those as separate conversations by default even though they are using the same visible Discord channel.
|
||||
|
||||
This is controlled by `config.yaml`:
|
||||
|
||||
```yaml
|
||||
group_sessions_per_user: true
|
||||
```
|
||||
|
||||
Set it to `false` only if you explicitly want one shared conversation for the entire room:
|
||||
|
||||
```yaml
|
||||
group_sessions_per_user: false
|
||||
```
|
||||
|
||||
Shared sessions can be useful for a collaborative room, but they also mean:
|
||||
|
||||
- users share context growth and token costs
|
||||
- one person's long tool-heavy task can bloat everyone else's context
|
||||
- one person's in-flight run can interrupt another person's follow-up in the same room
|
||||
|
||||
### Interrupts and Concurrency
|
||||
|
||||
Hermes tracks running agents by session key.
|
||||
|
||||
With the default `group_sessions_per_user: true`:
|
||||
|
||||
- Alice interrupting her own in-flight request only affects Alice's session in that channel
|
||||
- Bob can keep talking in the same channel without inheriting Alice's history or interrupting Alice's run
|
||||
|
||||
With `group_sessions_per_user: false`:
|
||||
|
||||
- the whole room shares one running-agent slot for that channel/thread
|
||||
- follow-up messages from different people can interrupt or queue behind each other
|
||||
|
||||
This guide walks you through the full setup process — from creating your bot on Discord's Developer Portal to sending your first message.
|
||||
|
||||
## Step 1: Create a Discord Application
|
||||
|
||||
1. Go to the [Discord Developer Portal](https://discord.com/developers/applications) and sign in with your Discord account.
|
||||
2. Click **New Application** in the top-right corner.
|
||||
3. Enter a name for your application (e.g., "Hermes Agent") and accept the Developer Terms of Service.
|
||||
4. Click **Create**.
|
||||
|
||||
You'll land on the **General Information** page. Note the **Application ID** — you'll need it later to build the invite URL.
|
||||
|
||||
## Step 2: Create the Bot
|
||||
|
||||
1. In the left sidebar, click **Bot**.
|
||||
2. Discord automatically creates a bot user for your application. You'll see the bot's username, which you can customize.
|
||||
3. Under **Authorization Flow**:
|
||||
- Set **Public Bot** to **OFF** — this prevents other people from inviting your bot to their servers.
|
||||
- Leave **Require OAuth2 Code Grant** set to **OFF**.
|
||||
|
||||
:::tip
|
||||
You can set a custom avatar and banner for your bot on this page. This is what users will see in Discord.
|
||||
:::
|
||||
|
||||
## Step 3: Enable Privileged Gateway Intents
|
||||
|
||||
This is the most critical step in the entire setup. Without the correct intents enabled, your bot will connect to Discord but **will not be able to read message content**.
|
||||
|
||||
On the **Bot** page, scroll down to **Privileged Gateway Intents**. You'll see three toggles:
|
||||
|
||||
| Intent | Purpose | Required? |
|
||||
|--------|---------|-----------|
|
||||
| **Presence Intent** | See user online/offline status | Optional |
|
||||
| **Server Members Intent** | Access the member list, resolve usernames | **Required** |
|
||||
| **Message Content Intent** | Read the text content of messages | **Required** |
|
||||
|
||||
**Enable both Server Members Intent and Message Content Intent** by toggling them **ON**.
|
||||
|
||||
- Without **Message Content Intent**, your bot receives message events but the message text is empty — the bot literally cannot see what you typed.
|
||||
- Without **Server Members Intent**, the bot cannot resolve usernames for the allowed users list and may fail to identify who is messaging it.
|
||||
|
||||
:::warning[This is the #1 reason Discord bots don't work]
|
||||
If your bot is online but never responds to messages, the **Message Content Intent** is almost certainly disabled. Go back to the [Developer Portal](https://discord.com/developers/applications), select your application → Bot → Privileged Gateway Intents, and make sure **Message Content Intent** is toggled ON. Click **Save Changes**.
|
||||
:::
|
||||
|
||||
**Regarding server count:**
|
||||
- If your bot is in **fewer than 100 servers**, you can simply toggle intents on and off freely.
|
||||
- If your bot is in **100 or more servers**, Discord requires you to submit a verification application to use privileged intents. For personal use, this is not a concern.
|
||||
|
||||
Click **Save Changes** at the bottom of the page.
|
||||
|
||||
## Step 4: Get the Bot Token
|
||||
|
||||
The bot token is the credential Hermes Agent uses to log in as your bot. Still on the **Bot** page:
|
||||
|
||||
1. Under the **Token** section, click **Reset Token**.
|
||||
2. If you have two-factor authentication enabled on your Discord account, enter your 2FA code.
|
||||
3. Discord will display your new token. **Copy it immediately.**
|
||||
|
||||
:::warning[Token shown only once]
|
||||
The token is only displayed once. If you lose it, you'll need to reset it and generate a new one. Never share your token publicly or commit it to Git — anyone with this token has full control of your bot.
|
||||
:::
|
||||
|
||||
Store the token somewhere safe (a password manager, for example). You'll need it in Step 8.
|
||||
|
||||
## Step 5: Generate the Invite URL
|
||||
|
||||
You need an OAuth2 URL to invite the bot to your server. There are two ways to do this:
|
||||
|
||||
### Option A: Using the Installation Tab (Recommended)
|
||||
|
||||
1. In the left sidebar, click **Installation**.
|
||||
2. Under **Installation Contexts**, enable **Guild Install**.
|
||||
3. For **Install Link**, select **Discord Provided Link**.
|
||||
4. Under **Default Install Settings** for Guild Install:
|
||||
- **Scopes**: select `bot` and `applications.commands`
|
||||
- **Permissions**: select the permissions listed below.
|
||||
|
||||
### Option B: Manual URL
|
||||
|
||||
You can construct the invite URL directly using this format:
|
||||
|
||||
```
|
||||
https://discord.com/oauth2/authorize?client_id=YOUR_APP_ID&scope=bot+applications.commands&permissions=274878286912
|
||||
```
|
||||
|
||||
Replace `YOUR_APP_ID` with the Application ID from Step 1.
|
||||
|
||||
### Required Permissions
|
||||
|
||||
These are the minimum permissions your bot needs:
|
||||
|
||||
- **View Channels** — see the channels it has access to
|
||||
- **Send Messages** — respond to your messages
|
||||
- **Embed Links** — format rich responses
|
||||
- **Attach Files** — send images, audio, and file outputs
|
||||
- **Read Message History** — maintain conversation context
|
||||
|
||||
### Recommended Additional Permissions
|
||||
|
||||
- **Send Messages in Threads** — respond in thread conversations
|
||||
- **Add Reactions** — react to messages for acknowledgment
|
||||
|
||||
### Permission Integers
|
||||
|
||||
| Level | Permissions Integer | What's Included |
|
||||
|-------|-------------------|-----------------|
|
||||
| Minimal | `117760` | View Channels, Send Messages, Read Message History, Attach Files |
|
||||
| Recommended | `274878286912` | All of the above plus Embed Links, Send Messages in Threads, Add Reactions |
|
||||
|
||||
## Step 6: Invite to Your Server
|
||||
|
||||
1. Open the invite URL in your browser (from the Installation tab or the manual URL you constructed).
|
||||
2. In the **Add to Server** dropdown, select your server.
|
||||
3. Click **Continue**, then **Authorize**.
|
||||
4. Complete the CAPTCHA if prompted.
|
||||
|
||||
:::info
|
||||
You need the **Manage Server** permission on the Discord server to invite a bot. If you don't see your server in the dropdown, ask a server admin to use the invite link instead.
|
||||
:::
|
||||
|
||||
After authorizing, the bot will appear in your server's member list (it will show as offline until you start the Hermes gateway).
|
||||
|
||||
## Step 7: Find Your Discord User ID
|
||||
|
||||
Hermes Agent uses your Discord User ID to control who can interact with the bot. To find it:
|
||||
|
||||
1. Open Discord (desktop or web app).
|
||||
2. Go to **Settings** → **Advanced** → toggle **Developer Mode** to **ON**.
|
||||
3. Close settings.
|
||||
4. Right-click your own username (in a message, the member list, or your profile) → **Copy User ID**.
|
||||
|
||||
Your User ID is a long number like `284102345871466496`.
|
||||
|
||||
:::tip
|
||||
Developer Mode also lets you copy **Channel IDs** and **Server IDs** the same way — right-click the channel or server name and select Copy ID. You'll need a Channel ID if you want to set a home channel manually.
|
||||
:::
|
||||
|
||||
## Step 8: Configure Hermes Agent
|
||||
|
||||
### Option A: Interactive Setup (Recommended)
|
||||
|
||||
Run the guided setup command:
|
||||
|
||||
```bash
|
||||
hermes gateway setup
|
||||
```
|
||||
|
||||
Select **Discord** when prompted, then paste your bot token and user ID when asked.
|
||||
|
||||
### Option B: Manual Configuration
|
||||
|
||||
Add the following to your `~/.hermes/.env` file:
|
||||
|
||||
```bash
|
||||
# Required
|
||||
DISCORD_BOT_TOKEN=your-bot-token
|
||||
DISCORD_ALLOWED_USERS=284102345871466496
|
||||
|
||||
# Multiple allowed users (comma-separated)
|
||||
# DISCORD_ALLOWED_USERS=284102345871466496,198765432109876543
|
||||
|
||||
# Optional: respond without @mention (default: true = require mention)
|
||||
# DISCORD_REQUIRE_MENTION=false
|
||||
|
||||
# Optional: channels where bot responds without @mention (comma-separated channel IDs)
|
||||
# DISCORD_FREE_RESPONSE_CHANNELS=1234567890,9876543210
|
||||
```
|
||||
|
||||
Optional behavior settings in `~/.hermes/config.yaml`:
|
||||
|
||||
```yaml
|
||||
discord:
|
||||
require_mention: true
|
||||
|
||||
group_sessions_per_user: true
|
||||
```
|
||||
|
||||
- `discord.require_mention: true` keeps Hermes quiet in normal server traffic unless mentioned
|
||||
- `group_sessions_per_user: true` keeps each participant's context isolated inside shared channels and threads
|
||||
|
||||
### Start the Gateway
|
||||
|
||||
Once configured, start the Discord gateway:
|
||||
|
||||
```bash
|
||||
hermes gateway
|
||||
```
|
||||
|
||||
The bot should come online in Discord within a few seconds. Send it a message — either a DM or in a channel it can see — to test.
|
||||
|
||||
:::tip
|
||||
You can run `hermes gateway` in the background or as a systemd service for persistent operation. See the deployment docs for details.
|
||||
:::
|
||||
|
||||
## Home Channel
|
||||
|
||||
You can designate a "home channel" where the bot sends proactive messages (such as cron job output, reminders, and notifications). There are two ways to set it:
|
||||
|
||||
### Using the Slash Command
|
||||
|
||||
Type `/sethome` in any Discord channel where the bot is present. That channel becomes the home channel.
|
||||
|
||||
### Manual Configuration
|
||||
|
||||
Add these to your `~/.hermes/.env`:
|
||||
|
||||
```bash
|
||||
DISCORD_HOME_CHANNEL=123456789012345678
|
||||
DISCORD_HOME_CHANNEL_NAME="#bot-updates"
|
||||
```
|
||||
|
||||
Replace the ID with the actual channel ID (right-click → Copy Channel ID with Developer Mode on).
|
||||
|
||||
## Voice Messages
|
||||
|
||||
Hermes Agent supports Discord voice messages:
|
||||
|
||||
- **Incoming voice messages** are automatically transcribed using the configured STT provider: local `faster-whisper` (no key), Groq Whisper (`GROQ_API_KEY`), or OpenAI Whisper (`VOICE_TOOLS_OPENAI_KEY`).
|
||||
- **Text-to-speech**: Use `/voice tts` to have the bot send spoken audio responses alongside text replies.
|
||||
- **Discord voice channels**: Hermes can also join a voice channel, listen to users speaking, and talk back in the channel.
|
||||
|
||||
For the full setup and operational guide, see:
|
||||
- [Voice Mode](/docs/user-guide/features/voice-mode)
|
||||
- [Use Voice Mode with Hermes](/docs/guides/use-voice-mode-with-hermes)
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Bot is online but not responding to messages
|
||||
|
||||
**Cause**: Message Content Intent is disabled.
|
||||
|
||||
**Fix**: Go to [Developer Portal](https://discord.com/developers/applications) → your app → Bot → Privileged Gateway Intents → enable **Message Content Intent** → Save Changes. Restart the gateway.
|
||||
|
||||
### "Disallowed Intents" error on startup
|
||||
|
||||
**Cause**: Your code requests intents that aren't enabled in the Developer Portal.
|
||||
|
||||
**Fix**: Enable all three Privileged Gateway Intents (Presence, Server Members, Message Content) in the Bot settings, then restart.
|
||||
|
||||
### Bot can't see messages in a specific channel
|
||||
|
||||
**Cause**: The bot's role doesn't have permission to view that channel.
|
||||
|
||||
**Fix**: In Discord, go to the channel's settings → Permissions → add the bot's role with **View Channel** and **Read Message History** enabled.
|
||||
|
||||
### 403 Forbidden errors
|
||||
|
||||
**Cause**: The bot is missing required permissions.
|
||||
|
||||
**Fix**: Re-invite the bot with the correct permissions using the URL from Step 5, or manually adjust the bot's role permissions in Server Settings → Roles.
|
||||
|
||||
### Bot is offline
|
||||
|
||||
**Cause**: The Hermes gateway isn't running, or the token is incorrect.
|
||||
|
||||
**Fix**: Check that `hermes gateway` is running. Verify `DISCORD_BOT_TOKEN` in your `.env` file. If you recently reset the token, update it.
|
||||
|
||||
### "User not allowed" / Bot ignores you
|
||||
|
||||
**Cause**: Your User ID isn't in `DISCORD_ALLOWED_USERS`.
|
||||
|
||||
**Fix**: Add your User ID to `DISCORD_ALLOWED_USERS` in `~/.hermes/.env` and restart the gateway.
|
||||
|
||||
### People in the same channel are sharing context unexpectedly
|
||||
|
||||
**Cause**: `group_sessions_per_user` is disabled, or the platform cannot provide a user ID for the messages in that context.
|
||||
|
||||
**Fix**: Set this in `~/.hermes/config.yaml` and restart the gateway:
|
||||
|
||||
```yaml
|
||||
group_sessions_per_user: true
|
||||
```
|
||||
|
||||
If you intentionally want a shared room conversation, leave it off — just expect shared transcript history and shared interrupt behavior.
|
||||
|
||||
## Security
|
||||
|
||||
:::warning
|
||||
Always set `DISCORD_ALLOWED_USERS` to restrict who can interact with the bot. Without it, the gateway denies all users by default as a safety measure. Only add User IDs of people you trust — authorized users have full access to the agent's capabilities, including tool use and system access.
|
||||
:::
|
||||
|
||||
For more information on securing your Hermes Agent deployment, see the [Security Guide](../security.md).
|
||||
189
hermes_code/website/docs/user-guide/messaging/email.md
Normal file
189
hermes_code/website/docs/user-guide/messaging/email.md
Normal file
|
|
@ -0,0 +1,189 @@
|
|||
---
|
||||
sidebar_position: 7
|
||||
title: "Email"
|
||||
description: "Set up Hermes Agent as an email assistant via IMAP/SMTP"
|
||||
---
|
||||
|
||||
# Email Setup
|
||||
|
||||
Hermes can receive and reply to emails using standard IMAP and SMTP protocols. Send an email to the agent's address and it replies in-thread — no special client or bot API needed. Works with Gmail, Outlook, Yahoo, Fastmail, or any provider that supports IMAP/SMTP.
|
||||
|
||||
:::info No External Dependencies
|
||||
The Email adapter uses Python's built-in `imaplib`, `smtplib`, and `email` modules. No additional packages or external services are required.
|
||||
:::
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- **A dedicated email account** for your Hermes agent (don't use your personal email)
|
||||
- **IMAP enabled** on the email account
|
||||
- **An app password** if using Gmail or another provider with 2FA
|
||||
|
||||
### Gmail Setup
|
||||
|
||||
1. Enable 2-Factor Authentication on your Google Account
|
||||
2. Go to [App Passwords](https://myaccount.google.com/apppasswords)
|
||||
3. Create a new App Password (select "Mail" or "Other")
|
||||
4. Copy the 16-character password — you'll use this instead of your regular password
|
||||
|
||||
### Outlook / Microsoft 365
|
||||
|
||||
1. Go to [Security Settings](https://account.microsoft.com/security)
|
||||
2. Enable 2FA if not already active
|
||||
3. Create an App Password under "Additional security options"
|
||||
4. IMAP host: `outlook.office365.com`, SMTP host: `smtp.office365.com`
|
||||
|
||||
### Other Providers
|
||||
|
||||
Most email providers support IMAP/SMTP. Check your provider's documentation for:
|
||||
- IMAP host and port (usually port 993 with SSL)
|
||||
- SMTP host and port (usually port 587 with STARTTLS)
|
||||
- Whether app passwords are required
|
||||
|
||||
---
|
||||
|
||||
## Step 1: Configure Hermes
|
||||
|
||||
The easiest way:
|
||||
|
||||
```bash
|
||||
hermes gateway setup
|
||||
```
|
||||
|
||||
Select **Email** from the platform menu. The wizard prompts for your email address, password, IMAP/SMTP hosts, and allowed senders.
|
||||
|
||||
### Manual Configuration
|
||||
|
||||
Add to `~/.hermes/.env`:
|
||||
|
||||
```bash
|
||||
# Required
|
||||
EMAIL_ADDRESS=hermes@gmail.com
|
||||
EMAIL_PASSWORD=abcd efgh ijkl mnop # App password (not your regular password)
|
||||
EMAIL_IMAP_HOST=imap.gmail.com
|
||||
EMAIL_SMTP_HOST=smtp.gmail.com
|
||||
|
||||
# Security (recommended)
|
||||
EMAIL_ALLOWED_USERS=your@email.com,colleague@work.com
|
||||
|
||||
# Optional
|
||||
EMAIL_IMAP_PORT=993 # Default: 993 (IMAP SSL)
|
||||
EMAIL_SMTP_PORT=587 # Default: 587 (SMTP STARTTLS)
|
||||
EMAIL_POLL_INTERVAL=15 # Seconds between inbox checks (default: 15)
|
||||
EMAIL_HOME_ADDRESS=your@email.com # Default delivery target for cron jobs
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Step 2: Start the Gateway
|
||||
|
||||
```bash
|
||||
hermes gateway # Run in foreground
|
||||
hermes gateway install # Install as a user service
|
||||
sudo hermes gateway install --system # Linux only: boot-time system service
|
||||
```
|
||||
|
||||
On startup, the adapter:
|
||||
1. Tests IMAP and SMTP connections
|
||||
2. Marks all existing inbox messages as "seen" (only processes new emails)
|
||||
3. Starts polling for new messages
|
||||
|
||||
---
|
||||
|
||||
## How It Works
|
||||
|
||||
### Receiving Messages
|
||||
|
||||
The adapter polls the IMAP inbox for UNSEEN messages at a configurable interval (default: 15 seconds). For each new email:
|
||||
|
||||
- **Subject line** is included as context (e.g., `[Subject: Deploy to production]`)
|
||||
- **Reply emails** (subject starting with `Re:`) skip the subject prefix — the thread context is already established
|
||||
- **Attachments** are cached locally:
|
||||
- Images (JPEG, PNG, GIF, WebP) → available to the vision tool
|
||||
- Documents (PDF, ZIP, etc.) → available for file access
|
||||
- **HTML-only emails** have tags stripped for plain text extraction
|
||||
- **Self-messages** are filtered out to prevent reply loops
|
||||
|
||||
### Sending Replies
|
||||
|
||||
Replies are sent via SMTP with proper email threading:
|
||||
|
||||
- **In-Reply-To** and **References** headers maintain the thread
|
||||
- **Subject line** preserved with `Re:` prefix (no double `Re: Re:`)
|
||||
- **Message-ID** generated with the agent's domain
|
||||
- Responses are sent as plain text (UTF-8)
|
||||
|
||||
### File Attachments
|
||||
|
||||
The agent can send file attachments in replies. Include `MEDIA:/path/to/file` in the response and the file is attached to the outgoing email.
|
||||
|
||||
### Skipping Attachments
|
||||
|
||||
To ignore all incoming attachments (for malware protection or bandwidth savings), add to your `config.yaml`:
|
||||
|
||||
```yaml
|
||||
platforms:
|
||||
email:
|
||||
skip_attachments: true
|
||||
```
|
||||
|
||||
When enabled, attachment and inline parts are skipped before payload decoding. The email body text is still processed normally.
|
||||
|
||||
---
|
||||
|
||||
## Access Control
|
||||
|
||||
Email access follows the same pattern as all other Hermes platforms:
|
||||
|
||||
1. **`EMAIL_ALLOWED_USERS` set** → only emails from those addresses are processed
|
||||
2. **No allowlist set** → unknown senders get a pairing code
|
||||
3. **`EMAIL_ALLOW_ALL_USERS=true`** → any sender is accepted (use with caution)
|
||||
|
||||
:::warning
|
||||
**Always configure `EMAIL_ALLOWED_USERS`.** Without it, anyone who knows the agent's email address could send commands. The agent has terminal access by default.
|
||||
:::
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
| Problem | Solution |
|
||||
|---------|----------|
|
||||
| **"IMAP connection failed"** at startup | Verify `EMAIL_IMAP_HOST` and `EMAIL_IMAP_PORT`. Ensure IMAP is enabled on the account. For Gmail, enable it in Settings → Forwarding and POP/IMAP. |
|
||||
| **"SMTP connection failed"** at startup | Verify `EMAIL_SMTP_HOST` and `EMAIL_SMTP_PORT`. Check that your password is correct (use App Password for Gmail). |
|
||||
| **Messages not received** | Check `EMAIL_ALLOWED_USERS` includes the sender's email. Check spam folder — some providers flag automated replies. |
|
||||
| **"Authentication failed"** | For Gmail, you must use an App Password, not your regular password. Ensure 2FA is enabled first. |
|
||||
| **Duplicate replies** | Ensure only one gateway instance is running. Check `hermes gateway status`. |
|
||||
| **Slow response** | The default poll interval is 15 seconds. Reduce with `EMAIL_POLL_INTERVAL=5` for faster response (but more IMAP connections). |
|
||||
| **Replies not threading** | The adapter uses In-Reply-To headers. Some email clients (especially web-based) may not thread correctly with automated messages. |
|
||||
|
||||
---
|
||||
|
||||
## Security
|
||||
|
||||
:::warning
|
||||
**Use a dedicated email account.** Don't use your personal email — the agent stores the password in `.env` and has full inbox access via IMAP.
|
||||
:::
|
||||
|
||||
- Use **App Passwords** instead of your main password (required for Gmail with 2FA)
|
||||
- Set `EMAIL_ALLOWED_USERS` to restrict who can interact with the agent
|
||||
- The password is stored in `~/.hermes/.env` — protect this file (`chmod 600`)
|
||||
- IMAP uses SSL (port 993) and SMTP uses STARTTLS (port 587) by default — connections are encrypted
|
||||
|
||||
---
|
||||
|
||||
## Environment Variables Reference
|
||||
|
||||
| Variable | Required | Default | Description |
|
||||
|----------|----------|---------|-------------|
|
||||
| `EMAIL_ADDRESS` | Yes | — | Agent's email address |
|
||||
| `EMAIL_PASSWORD` | Yes | — | Email password or app password |
|
||||
| `EMAIL_IMAP_HOST` | Yes | — | IMAP server host (e.g., `imap.gmail.com`) |
|
||||
| `EMAIL_SMTP_HOST` | Yes | — | SMTP server host (e.g., `smtp.gmail.com`) |
|
||||
| `EMAIL_IMAP_PORT` | No | `993` | IMAP server port |
|
||||
| `EMAIL_SMTP_PORT` | No | `587` | SMTP server port |
|
||||
| `EMAIL_POLL_INTERVAL` | No | `15` | Seconds between inbox checks |
|
||||
| `EMAIL_ALLOWED_USERS` | No | — | Comma-separated allowed sender addresses |
|
||||
| `EMAIL_HOME_ADDRESS` | No | — | Default delivery target for cron jobs |
|
||||
| `EMAIL_ALLOW_ALL_USERS` | No | `false` | Allow all senders (not recommended) |
|
||||
249
hermes_code/website/docs/user-guide/messaging/homeassistant.md
Normal file
249
hermes_code/website/docs/user-guide/messaging/homeassistant.md
Normal file
|
|
@ -0,0 +1,249 @@
|
|||
---
|
||||
title: Home Assistant
|
||||
description: Control your smart home with Hermes Agent via Home Assistant integration.
|
||||
sidebar_label: Home Assistant
|
||||
sidebar_position: 5
|
||||
---
|
||||
|
||||
# Home Assistant Integration
|
||||
|
||||
Hermes Agent integrates with [Home Assistant](https://www.home-assistant.io/) in two ways:
|
||||
|
||||
1. **Gateway platform** — subscribes to real-time state changes via WebSocket and responds to events
|
||||
2. **Smart home tools** — four LLM-callable tools for querying and controlling devices via the REST API
|
||||
|
||||
## Setup
|
||||
|
||||
### 1. Create a Long-Lived Access Token
|
||||
|
||||
1. Open your Home Assistant instance
|
||||
2. Go to your **Profile** (click your name in the sidebar)
|
||||
3. Scroll to **Long-Lived Access Tokens**
|
||||
4. Click **Create Token**, give it a name like "Hermes Agent"
|
||||
5. Copy the token
|
||||
|
||||
### 2. Configure Environment Variables
|
||||
|
||||
```bash
|
||||
# Add to ~/.hermes/.env
|
||||
|
||||
# Required: your Long-Lived Access Token
|
||||
HASS_TOKEN=your-long-lived-access-token
|
||||
|
||||
# Optional: HA URL (default: http://homeassistant.local:8123)
|
||||
HASS_URL=http://192.168.1.100:8123
|
||||
```
|
||||
|
||||
:::info
|
||||
The `homeassistant` toolset is automatically enabled when `HASS_TOKEN` is set. Both the gateway platform and the device control tools activate from this single token.
|
||||
:::
|
||||
|
||||
### 3. Start the Gateway
|
||||
|
||||
```bash
|
||||
hermes gateway
|
||||
```
|
||||
|
||||
Home Assistant will appear as a connected platform alongside any other messaging platforms (Telegram, Discord, etc.).
|
||||
|
||||
## Available Tools
|
||||
|
||||
Hermes Agent registers four tools for smart home control:
|
||||
|
||||
### `ha_list_entities`
|
||||
|
||||
List Home Assistant entities, optionally filtered by domain or area.
|
||||
|
||||
**Parameters:**
|
||||
- `domain` *(optional)* — Filter by entity domain: `light`, `switch`, `climate`, `sensor`, `binary_sensor`, `cover`, `fan`, `media_player`, etc.
|
||||
- `area` *(optional)* — Filter by area/room name (matches against friendly names): `living room`, `kitchen`, `bedroom`, etc.
|
||||
|
||||
**Example:**
|
||||
```
|
||||
List all lights in the living room
|
||||
```
|
||||
|
||||
Returns entity IDs, states, and friendly names.
|
||||
|
||||
### `ha_get_state`
|
||||
|
||||
Get detailed state of a single entity, including all attributes (brightness, color, temperature setpoint, sensor readings, etc.).
|
||||
|
||||
**Parameters:**
|
||||
- `entity_id` *(required)* — The entity to query, e.g., `light.living_room`, `climate.thermostat`, `sensor.temperature`
|
||||
|
||||
**Example:**
|
||||
```
|
||||
What's the current state of climate.thermostat?
|
||||
```
|
||||
|
||||
Returns: state, all attributes, last changed/updated timestamps.
|
||||
|
||||
### `ha_list_services`
|
||||
|
||||
List available services (actions) for device control. Shows what actions can be performed on each device type and what parameters they accept.
|
||||
|
||||
**Parameters:**
|
||||
- `domain` *(optional)* — Filter by domain, e.g., `light`, `climate`, `switch`
|
||||
|
||||
**Example:**
|
||||
```
|
||||
What services are available for climate devices?
|
||||
```
|
||||
|
||||
### `ha_call_service`
|
||||
|
||||
Call a Home Assistant service to control a device.
|
||||
|
||||
**Parameters:**
|
||||
- `domain` *(required)* — Service domain: `light`, `switch`, `climate`, `cover`, `media_player`, `fan`, `scene`, `script`
|
||||
- `service` *(required)* — Service name: `turn_on`, `turn_off`, `toggle`, `set_temperature`, `set_hvac_mode`, `open_cover`, `close_cover`, `set_volume_level`
|
||||
- `entity_id` *(optional)* — Target entity, e.g., `light.living_room`
|
||||
- `data` *(optional)* — Additional parameters as a JSON object
|
||||
|
||||
**Examples:**
|
||||
|
||||
```
|
||||
Turn on the living room lights
|
||||
→ ha_call_service(domain="light", service="turn_on", entity_id="light.living_room")
|
||||
```
|
||||
|
||||
```
|
||||
Set the thermostat to 22 degrees in heat mode
|
||||
→ ha_call_service(domain="climate", service="set_temperature",
|
||||
entity_id="climate.thermostat", data={"temperature": 22, "hvac_mode": "heat"})
|
||||
```
|
||||
|
||||
```
|
||||
Set living room lights to blue at 50% brightness
|
||||
→ ha_call_service(domain="light", service="turn_on",
|
||||
entity_id="light.living_room", data={"brightness": 128, "color_name": "blue"})
|
||||
```
|
||||
|
||||
## Gateway Platform: Real-Time Events
|
||||
|
||||
The Home Assistant gateway adapter connects via WebSocket and subscribes to `state_changed` events. When a device state changes and matches your filters, it's forwarded to the agent as a message.
|
||||
|
||||
### Event Filtering
|
||||
|
||||
:::warning Required Configuration
|
||||
By default, **no events are forwarded**. You must configure at least one of `watch_domains`, `watch_entities`, or `watch_all` to receive events. Without filters, a warning is logged at startup and all state changes are silently dropped.
|
||||
:::
|
||||
|
||||
Configure which events the agent sees in `~/.hermes/gateway.json` under the Home Assistant platform's `extra` section:
|
||||
|
||||
```json
|
||||
{
|
||||
"platforms": {
|
||||
"homeassistant": {
|
||||
"enabled": true,
|
||||
"extra": {
|
||||
"watch_domains": ["climate", "binary_sensor", "alarm_control_panel", "light"],
|
||||
"watch_entities": ["sensor.front_door_battery"],
|
||||
"ignore_entities": ["sensor.uptime", "sensor.cpu_usage", "sensor.memory_usage"],
|
||||
"cooldown_seconds": 30
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
| Setting | Default | Description |
|
||||
|---------|---------|-------------|
|
||||
| `watch_domains` | *(none)* | Only watch these entity domains (e.g., `climate`, `light`, `binary_sensor`) |
|
||||
| `watch_entities` | *(none)* | Only watch these specific entity IDs |
|
||||
| `watch_all` | `false` | Set to `true` to receive **all** state changes (not recommended for most setups) |
|
||||
| `ignore_entities` | *(none)* | Always ignore these entities (applied before domain/entity filters) |
|
||||
| `cooldown_seconds` | `30` | Minimum seconds between events for the same entity |
|
||||
|
||||
:::tip
|
||||
Start with a focused set of domains — `climate`, `binary_sensor`, and `alarm_control_panel` cover the most useful automations. Add more as needed. Use `ignore_entities` to suppress noisy sensors like CPU temperature or uptime counters.
|
||||
:::
|
||||
|
||||
### Event Formatting
|
||||
|
||||
State changes are formatted as human-readable messages based on domain:
|
||||
|
||||
| Domain | Format |
|
||||
|--------|--------|
|
||||
| `climate` | "HVAC mode changed from 'off' to 'heat' (current: 21, target: 23)" |
|
||||
| `sensor` | "changed from 21°C to 22°C" |
|
||||
| `binary_sensor` | "triggered" / "cleared" |
|
||||
| `light`, `switch`, `fan` | "turned on" / "turned off" |
|
||||
| `alarm_control_panel` | "alarm state changed from 'armed_away' to 'triggered'" |
|
||||
| *(other)* | "changed from 'old' to 'new'" |
|
||||
|
||||
### Agent Responses
|
||||
|
||||
Outbound messages from the agent are delivered as **Home Assistant persistent notifications** (via `persistent_notification.create`). These appear in the HA notification panel with the title "Hermes Agent".
|
||||
|
||||
### Connection Management
|
||||
|
||||
- **WebSocket** with 30-second heartbeat for real-time events
|
||||
- **Automatic reconnection** with backoff: 5s → 10s → 30s → 60s
|
||||
- **REST API** for outbound notifications (separate session to avoid WebSocket conflicts)
|
||||
- **Authorization** — HA events are always authorized (no user allowlist needed, since the `HASS_TOKEN` authenticates the connection)
|
||||
|
||||
## Security
|
||||
|
||||
The Home Assistant tools enforce security restrictions:
|
||||
|
||||
:::warning Blocked Domains
|
||||
The following service domains are **blocked** to prevent arbitrary code execution on the HA host:
|
||||
|
||||
- `shell_command` — arbitrary shell commands
|
||||
- `command_line` — sensors/switches that execute commands
|
||||
- `python_script` — scripted Python execution
|
||||
- `pyscript` — broader scripting integration
|
||||
- `hassio` — addon control, host shutdown/reboot
|
||||
- `rest_command` — HTTP requests from HA server (SSRF vector)
|
||||
|
||||
Attempting to call services in these domains returns an error.
|
||||
:::
|
||||
|
||||
Entity IDs are validated against the pattern `^[a-z_][a-z0-9_]*\.[a-z0-9_]+$` to prevent injection attacks.
|
||||
|
||||
## Example Automations
|
||||
|
||||
### Morning Routine
|
||||
|
||||
```
|
||||
User: Start my morning routine
|
||||
|
||||
Agent:
|
||||
1. ha_call_service(domain="light", service="turn_on",
|
||||
entity_id="light.bedroom", data={"brightness": 128})
|
||||
2. ha_call_service(domain="climate", service="set_temperature",
|
||||
entity_id="climate.thermostat", data={"temperature": 22})
|
||||
3. ha_call_service(domain="media_player", service="turn_on",
|
||||
entity_id="media_player.kitchen_speaker")
|
||||
```
|
||||
|
||||
### Security Check
|
||||
|
||||
```
|
||||
User: Is the house secure?
|
||||
|
||||
Agent:
|
||||
1. ha_list_entities(domain="binary_sensor")
|
||||
→ checks door/window sensors
|
||||
2. ha_get_state(entity_id="alarm_control_panel.home")
|
||||
→ checks alarm status
|
||||
3. ha_list_entities(domain="lock")
|
||||
→ checks lock states
|
||||
4. Reports: "All doors closed, alarm is armed_away, all locks engaged."
|
||||
```
|
||||
|
||||
### Reactive Automation (via Gateway Events)
|
||||
|
||||
When connected as a gateway platform, the agent can react to events:
|
||||
|
||||
```
|
||||
[Home Assistant] Front Door: triggered (was cleared)
|
||||
|
||||
Agent automatically:
|
||||
1. ha_get_state(entity_id="binary_sensor.front_door")
|
||||
2. ha_call_service(domain="light", service="turn_on",
|
||||
entity_id="light.hallway")
|
||||
3. Sends notification: "Front door opened. Hallway lights turned on."
|
||||
```
|
||||
332
hermes_code/website/docs/user-guide/messaging/index.md
Normal file
332
hermes_code/website/docs/user-guide/messaging/index.md
Normal file
|
|
@ -0,0 +1,332 @@
|
|||
---
|
||||
sidebar_position: 1
|
||||
title: "Messaging Gateway"
|
||||
description: "Chat with Hermes from Telegram, Discord, Slack, WhatsApp, Signal, SMS, Email, Home Assistant, Mattermost, Matrix, DingTalk, Webhooks, or any OpenAI-compatible frontend via the API server — architecture and setup overview"
|
||||
---
|
||||
|
||||
# Messaging Gateway
|
||||
|
||||
Chat with Hermes from Telegram, Discord, Slack, WhatsApp, Signal, SMS, Email, Home Assistant, Mattermost, Matrix, DingTalk, or your browser. The gateway is a single background process that connects to all your configured platforms, handles sessions, runs cron jobs, and delivers voice messages.
|
||||
|
||||
For the full voice feature set — including CLI microphone mode, spoken replies in messaging, and Discord voice-channel conversations — see [Voice Mode](/docs/user-guide/features/voice-mode) and [Use Voice Mode with Hermes](/docs/guides/use-voice-mode-with-hermes).
|
||||
|
||||
## Architecture
|
||||
|
||||
```mermaid
|
||||
flowchart TB
|
||||
subgraph Gateway["Hermes Gateway"]
|
||||
subgraph Adapters["Platform adapters"]
|
||||
tg[Telegram]
|
||||
dc[Discord]
|
||||
wa[WhatsApp]
|
||||
sl[Slack]
|
||||
sig[Signal]
|
||||
sms[SMS]
|
||||
em[Email]
|
||||
ha[Home Assistant]
|
||||
mm[Mattermost]
|
||||
mx[Matrix]
|
||||
dt[DingTalk]
|
||||
api["API Server<br/>(OpenAI-compatible)"]
|
||||
wh[Webhooks]
|
||||
end
|
||||
|
||||
store["Session store<br/>per chat"]
|
||||
agent["AIAgent<br/>run_agent.py"]
|
||||
cron["Cron scheduler<br/>ticks every 60s"]
|
||||
end
|
||||
|
||||
tg --> store
|
||||
dc --> store
|
||||
wa --> store
|
||||
sl --> store
|
||||
sig --> store
|
||||
sms --> store
|
||||
em --> store
|
||||
ha --> store
|
||||
mm --> store
|
||||
mx --> store
|
||||
dt --> store
|
||||
api --> store
|
||||
wh --> store
|
||||
store --> agent
|
||||
cron --> store
|
||||
```
|
||||
|
||||
Each platform adapter receives messages, routes them through a per-chat session store, and dispatches them to the AIAgent for processing. The gateway also runs the cron scheduler, ticking every 60 seconds to execute any due jobs.
|
||||
|
||||
## Quick Setup
|
||||
|
||||
The easiest way to configure messaging platforms is the interactive wizard:
|
||||
|
||||
```bash
|
||||
hermes gateway setup # Interactive setup for all messaging platforms
|
||||
```
|
||||
|
||||
This walks you through configuring each platform with arrow-key selection, shows which platforms are already configured, and offers to start/restart the gateway when done.
|
||||
|
||||
## Gateway Commands
|
||||
|
||||
```bash
|
||||
hermes gateway # Run in foreground
|
||||
hermes gateway setup # Configure messaging platforms interactively
|
||||
hermes gateway install # Install as a user service (Linux) / launchd service (macOS)
|
||||
sudo hermes gateway install --system # Linux only: install a boot-time system service
|
||||
hermes gateway start # Start the default service
|
||||
hermes gateway stop # Stop the default service
|
||||
hermes gateway status # Check default service status
|
||||
hermes gateway status --system # Linux only: inspect the system service explicitly
|
||||
```
|
||||
|
||||
## Chat Commands (Inside Messaging)
|
||||
|
||||
| Command | Description |
|
||||
|---------|-------------|
|
||||
| `/new` or `/reset` | Start a fresh conversation |
|
||||
| `/model [provider:model]` | Show or change the model (supports `provider:model` syntax) |
|
||||
| `/provider` | Show available providers with auth status |
|
||||
| `/personality [name]` | Set a personality |
|
||||
| `/retry` | Retry the last message |
|
||||
| `/undo` | Remove the last exchange |
|
||||
| `/status` | Show session info |
|
||||
| `/stop` | Stop the running agent |
|
||||
| `/approve` | Approve a pending dangerous command |
|
||||
| `/deny` | Reject a pending dangerous command |
|
||||
| `/sethome` | Set this chat as the home channel |
|
||||
| `/compress` | Manually compress conversation context |
|
||||
| `/title [name]` | Set or show the session title |
|
||||
| `/resume [name]` | Resume a previously named session |
|
||||
| `/usage` | Show token usage for this session |
|
||||
| `/insights [days]` | Show usage insights and analytics |
|
||||
| `/reasoning [level\|show\|hide]` | Change reasoning effort or toggle reasoning display |
|
||||
| `/voice [on\|off\|tts\|join\|leave\|status]` | Control messaging voice replies and Discord voice-channel behavior |
|
||||
| `/rollback [number]` | List or restore filesystem checkpoints |
|
||||
| `/background <prompt>` | Run a prompt in a separate background session |
|
||||
| `/reload-mcp` | Reload MCP servers from config |
|
||||
| `/update` | Update Hermes Agent to the latest version |
|
||||
| `/help` | Show available commands |
|
||||
| `/<skill-name>` | Invoke any installed skill |
|
||||
|
||||
## Session Management
|
||||
|
||||
### Session Persistence
|
||||
|
||||
Sessions persist across messages until they reset. The agent remembers your conversation context.
|
||||
|
||||
### Reset Policies
|
||||
|
||||
Sessions reset based on configurable policies:
|
||||
|
||||
| Policy | Default | Description |
|
||||
|--------|---------|-------------|
|
||||
| Daily | 4:00 AM | Reset at a specific hour each day |
|
||||
| Idle | 1440 min | Reset after N minutes of inactivity |
|
||||
| Both | (combined) | Whichever triggers first |
|
||||
|
||||
Configure per-platform overrides in `~/.hermes/gateway.json`:
|
||||
|
||||
```json
|
||||
{
|
||||
"reset_by_platform": {
|
||||
"telegram": { "mode": "idle", "idle_minutes": 240 },
|
||||
"discord": { "mode": "idle", "idle_minutes": 60 }
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Security
|
||||
|
||||
**By default, the gateway denies all users who are not in an allowlist or paired via DM.** This is the safe default for a bot with terminal access.
|
||||
|
||||
```bash
|
||||
# Restrict to specific users (recommended):
|
||||
TELEGRAM_ALLOWED_USERS=123456789,987654321
|
||||
DISCORD_ALLOWED_USERS=123456789012345678
|
||||
SIGNAL_ALLOWED_USERS=+155****4567,+155****6543
|
||||
SMS_ALLOWED_USERS=+155****4567,+155****6543
|
||||
EMAIL_ALLOWED_USERS=trusted@example.com,colleague@work.com
|
||||
MATTERMOST_ALLOWED_USERS=3uo8dkh1p7g1mfk49ear5fzs5c
|
||||
MATRIX_ALLOWED_USERS=@alice:matrix.org
|
||||
DINGTALK_ALLOWED_USERS=user-id-1
|
||||
|
||||
# Or allow
|
||||
GATEWAY_ALLOWED_USERS=123456789,987654321
|
||||
|
||||
# Or explicitly allow all users (NOT recommended for bots with terminal access):
|
||||
GATEWAY_ALLOW_ALL_USERS=true
|
||||
```
|
||||
|
||||
### DM Pairing (Alternative to Allowlists)
|
||||
|
||||
Instead of manually configuring user IDs, unknown users receive a one-time pairing code when they DM the bot:
|
||||
|
||||
```bash
|
||||
# The user sees: "Pairing code: XKGH5N7P"
|
||||
# You approve them with:
|
||||
hermes pairing approve telegram XKGH5N7P
|
||||
|
||||
# Other pairing commands:
|
||||
hermes pairing list # View pending + approved users
|
||||
hermes pairing revoke telegram 123456789 # Remove access
|
||||
```
|
||||
|
||||
Pairing codes expire after 1 hour, are rate-limited, and use cryptographic randomness.
|
||||
|
||||
## Interrupting the Agent
|
||||
|
||||
Send any message while the agent is working to interrupt it. Key behaviors:
|
||||
|
||||
- **In-progress terminal commands are killed immediately** (SIGTERM, then SIGKILL after 1s)
|
||||
- **Tool calls are cancelled** — only the currently-executing one runs, the rest are skipped
|
||||
- **Multiple messages are combined** — messages sent during interruption are joined into one prompt
|
||||
- **`/stop` command** — interrupts without queuing a follow-up message
|
||||
|
||||
## Tool Progress Notifications
|
||||
|
||||
Control how much tool activity is displayed in `~/.hermes/config.yaml`:
|
||||
|
||||
```yaml
|
||||
display:
|
||||
tool_progress: all # off | new | all | verbose
|
||||
```
|
||||
|
||||
When enabled, the bot sends status messages as it works:
|
||||
|
||||
```text
|
||||
💻 `ls -la`...
|
||||
🔍 web_search...
|
||||
📄 web_extract...
|
||||
🐍 execute_code...
|
||||
```
|
||||
|
||||
## Background Sessions
|
||||
|
||||
Run a prompt in a separate background session so the agent works on it independently while your main chat stays responsive:
|
||||
|
||||
```
|
||||
/background Check all servers in the cluster and report any that are down
|
||||
```
|
||||
|
||||
Hermes confirms immediately:
|
||||
|
||||
```
|
||||
🔄 Background task started: "Check all servers in the cluster..."
|
||||
Task ID: bg_143022_a1b2c3
|
||||
```
|
||||
|
||||
### How It Works
|
||||
|
||||
Each `/background` prompt spawns a **separate agent instance** that runs asynchronously:
|
||||
|
||||
- **Isolated session** — the background agent has its own session with its own conversation history. It has no knowledge of your current chat context and receives only the prompt you provide.
|
||||
- **Same configuration** — inherits your model, provider, toolsets, reasoning settings, and provider routing from the current gateway setup.
|
||||
- **Non-blocking** — your main chat stays fully interactive. Send messages, run other commands, or start more background tasks while it works.
|
||||
- **Result delivery** — when the task finishes, the result is sent back to the **same chat or channel** where you issued the command, prefixed with "✅ Background task complete". If it fails, you'll see "❌ Background task failed" with the error.
|
||||
|
||||
### Background Process Notifications
|
||||
|
||||
When the agent running a background session uses `terminal(background=true)` to start long-running processes (servers, builds, etc.), the gateway can push status updates to your chat. Control this with `display.background_process_notifications` in `~/.hermes/config.yaml`:
|
||||
|
||||
```yaml
|
||||
display:
|
||||
background_process_notifications: all # all | result | error | off
|
||||
```
|
||||
|
||||
| Mode | What you receive |
|
||||
|------|-----------------|
|
||||
| `all` | Running-output updates **and** the final completion message (default) |
|
||||
| `result` | Only the final completion message (regardless of exit code) |
|
||||
| `error` | Only the final message when the exit code is non-zero |
|
||||
| `off` | No process watcher messages at all |
|
||||
|
||||
You can also set this via environment variable:
|
||||
|
||||
```bash
|
||||
HERMES_BACKGROUND_NOTIFICATIONS=result
|
||||
```
|
||||
|
||||
### Use Cases
|
||||
|
||||
- **Server monitoring** — "/background Check the health of all services and alert me if anything is down"
|
||||
- **Long builds** — "/background Build and deploy the staging environment" while you continue chatting
|
||||
- **Research tasks** — "/background Research competitor pricing and summarize in a table"
|
||||
- **File operations** — "/background Organize the photos in ~/Downloads by date into folders"
|
||||
|
||||
:::tip
|
||||
Background tasks on messaging platforms are fire-and-forget — you don't need to wait or check on them. Results arrive in the same chat automatically when the task finishes.
|
||||
:::
|
||||
|
||||
## Service Management
|
||||
|
||||
### Linux (systemd)
|
||||
|
||||
```bash
|
||||
hermes gateway install # Install as user service
|
||||
hermes gateway start # Start the service
|
||||
hermes gateway stop # Stop the service
|
||||
hermes gateway status # Check status
|
||||
journalctl --user -u hermes-gateway -f # View logs
|
||||
|
||||
# Enable lingering (keeps running after logout)
|
||||
sudo loginctl enable-linger $USER
|
||||
|
||||
# Or install a boot-time system service that still runs as your user
|
||||
sudo hermes gateway install --system
|
||||
sudo hermes gateway start --system
|
||||
sudo hermes gateway status --system
|
||||
journalctl -u hermes-gateway -f
|
||||
```
|
||||
|
||||
Use the user service on laptops and dev boxes. Use the system service on VPS or headless hosts that should come back at boot without relying on systemd linger.
|
||||
|
||||
Avoid keeping both the user and system gateway units installed at once unless you really mean to. Hermes will warn if it detects both because start/stop/status behavior gets ambiguous.
|
||||
|
||||
:::info Multiple installations
|
||||
If you run multiple Hermes installations on the same machine (with different `HERMES_HOME` directories), each gets its own systemd service name. The default `~/.hermes` uses `hermes-gateway`; other installations use `hermes-gateway-<hash>`. The `hermes gateway` commands automatically target the correct service for your current `HERMES_HOME`.
|
||||
:::
|
||||
|
||||
### macOS (launchd)
|
||||
|
||||
```bash
|
||||
hermes gateway install
|
||||
launchctl start ai.hermes.gateway
|
||||
launchctl stop ai.hermes.gateway
|
||||
tail -f ~/.hermes/logs/gateway.log
|
||||
```
|
||||
|
||||
## Platform-Specific Toolsets
|
||||
|
||||
Each platform has its own toolset:
|
||||
|
||||
| Platform | Toolset | Capabilities |
|
||||
|----------|---------|--------------|
|
||||
| CLI | `hermes-cli` | Full access |
|
||||
| Telegram | `hermes-telegram` | Full tools including terminal |
|
||||
| Discord | `hermes-discord` | Full tools including terminal |
|
||||
| WhatsApp | `hermes-whatsapp` | Full tools including terminal |
|
||||
| Slack | `hermes-slack` | Full tools including terminal |
|
||||
| Signal | `hermes-signal` | Full tools including terminal |
|
||||
| SMS | `hermes-sms` | Full tools including terminal |
|
||||
| Email | `hermes-email` | Full tools including terminal |
|
||||
| Home Assistant | `hermes-homeassistant` | Full tools + HA device control (ha_list_entities, ha_get_state, ha_call_service, ha_list_services) |
|
||||
| Mattermost | `hermes-mattermost` | Full tools including terminal |
|
||||
| Matrix | `hermes-matrix` | Full tools including terminal |
|
||||
| DingTalk | `hermes-dingtalk` | Full tools including terminal |
|
||||
| API Server | `hermes` (default) | Full tools including terminal |
|
||||
| Webhooks | `hermes-webhook` | Full tools including terminal |
|
||||
|
||||
## Next Steps
|
||||
|
||||
- [Telegram Setup](telegram.md)
|
||||
- [Discord Setup](discord.md)
|
||||
- [Slack Setup](slack.md)
|
||||
- [WhatsApp Setup](whatsapp.md)
|
||||
- [Signal Setup](signal.md)
|
||||
- [SMS Setup (Twilio)](sms.md)
|
||||
- [Email Setup](email.md)
|
||||
- [Home Assistant Integration](homeassistant.md)
|
||||
- [Mattermost Setup](mattermost.md)
|
||||
- [Matrix Setup](matrix.md)
|
||||
- [DingTalk Setup](dingtalk.md)
|
||||
- [Open WebUI + API Server](open-webui.md)
|
||||
- [Webhooks](webhooks.md)
|
||||
354
hermes_code/website/docs/user-guide/messaging/matrix.md
Normal file
354
hermes_code/website/docs/user-guide/messaging/matrix.md
Normal file
|
|
@ -0,0 +1,354 @@
|
|||
---
|
||||
sidebar_position: 9
|
||||
title: "Matrix"
|
||||
description: "Set up Hermes Agent as a Matrix bot"
|
||||
---
|
||||
|
||||
# Matrix Setup
|
||||
|
||||
Hermes Agent integrates with Matrix, the open, federated messaging protocol. Matrix lets you run your own homeserver or use a public one like matrix.org — either way, you keep control of your communications. The bot connects via the `matrix-nio` Python SDK, processes messages through the Hermes Agent pipeline (including tool use, memory, and reasoning), and responds in real time. It supports text, file attachments, images, audio, video, and optional end-to-end encryption (E2EE).
|
||||
|
||||
Hermes works with any Matrix homeserver — Synapse, Conduit, Dendrite, or matrix.org.
|
||||
|
||||
Before setup, here's the part most people want to know: how Hermes behaves once it's connected.
|
||||
|
||||
## How Hermes Behaves
|
||||
|
||||
| Context | Behavior |
|
||||
|---------|----------|
|
||||
| **DMs** | Hermes responds to every message. No `@mention` needed. Each DM has its own session. |
|
||||
| **Rooms** | Hermes responds to all messages in rooms it has joined. Room invites are auto-accepted. |
|
||||
| **Threads** | Hermes supports Matrix threads (MSC3440). If you reply in a thread, Hermes keeps the thread context isolated from the main room timeline. |
|
||||
| **Shared rooms with multiple users** | By default, Hermes isolates session history per user inside the room. Two people talking in the same room do not share one transcript unless you explicitly disable that. |
|
||||
|
||||
:::tip
|
||||
The bot automatically joins rooms when invited. Just invite the bot's Matrix user to any room and it will join and start responding.
|
||||
:::
|
||||
|
||||
### Session Model in Matrix
|
||||
|
||||
By default:
|
||||
|
||||
- each DM gets its own session
|
||||
- each thread gets its own session namespace
|
||||
- each user in a shared room gets their own session inside that room
|
||||
|
||||
This is controlled by `config.yaml`:
|
||||
|
||||
```yaml
|
||||
group_sessions_per_user: true
|
||||
```
|
||||
|
||||
Set it to `false` only if you explicitly want one shared conversation for the entire room:
|
||||
|
||||
```yaml
|
||||
group_sessions_per_user: false
|
||||
```
|
||||
|
||||
Shared sessions can be useful for a collaborative room, but they also mean:
|
||||
|
||||
- users share context growth and token costs
|
||||
- one person's long tool-heavy task can bloat everyone else's context
|
||||
- one person's in-flight run can interrupt another person's follow-up in the same room
|
||||
|
||||
This guide walks you through the full setup process — from creating your bot account to sending your first message.
|
||||
|
||||
## Step 1: Create a Bot Account
|
||||
|
||||
You need a Matrix user account for the bot. There are several ways to do this:
|
||||
|
||||
### Option A: Register on Your Homeserver (Recommended)
|
||||
|
||||
If you run your own homeserver (Synapse, Conduit, Dendrite):
|
||||
|
||||
1. Use the admin API or registration tool to create a new user:
|
||||
|
||||
```bash
|
||||
# Synapse example
|
||||
register_new_matrix_user -c /etc/synapse/homeserver.yaml http://localhost:8008
|
||||
```
|
||||
|
||||
2. Choose a username like `hermes` — the full user ID will be `@hermes:your-server.org`.
|
||||
|
||||
### Option B: Use matrix.org or Another Public Homeserver
|
||||
|
||||
1. Go to [Element Web](https://app.element.io) and create a new account.
|
||||
2. Pick a username for your bot (e.g., `hermes-bot`).
|
||||
|
||||
### Option C: Use Your Own Account
|
||||
|
||||
You can also run Hermes as your own user. This means the bot posts as you — useful for personal assistants.
|
||||
|
||||
## Step 2: Get an Access Token
|
||||
|
||||
Hermes needs an access token to authenticate with the homeserver. You have two options:
|
||||
|
||||
### Option A: Access Token (Recommended)
|
||||
|
||||
The most reliable way to get a token:
|
||||
|
||||
**Via Element:**
|
||||
1. Log in to [Element](https://app.element.io) with the bot account.
|
||||
2. Go to **Settings** → **Help & About**.
|
||||
3. Scroll down and expand **Advanced** — the access token is displayed there.
|
||||
4. **Copy it immediately.**
|
||||
|
||||
**Via the API:**
|
||||
|
||||
```bash
|
||||
curl -X POST https://your-server/_matrix/client/v3/login \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"type": "m.login.password",
|
||||
"user": "@hermes:your-server.org",
|
||||
"password": "your-password"
|
||||
}'
|
||||
```
|
||||
|
||||
The response includes an `access_token` field — copy it.
|
||||
|
||||
:::warning[Keep your access token safe]
|
||||
The access token gives full access to the bot's Matrix account. Never share it publicly or commit it to Git. If compromised, revoke it by logging out all sessions for that user.
|
||||
:::
|
||||
|
||||
### Option B: Password Login
|
||||
|
||||
Instead of providing an access token, you can give Hermes the bot's user ID and password. Hermes will log in automatically on startup. This is simpler but means the password is stored in your `.env` file.
|
||||
|
||||
```bash
|
||||
MATRIX_USER_ID=@hermes:your-server.org
|
||||
MATRIX_PASSWORD=your-password
|
||||
```
|
||||
|
||||
## Step 3: Find Your Matrix User ID
|
||||
|
||||
Hermes Agent uses your Matrix User ID to control who can interact with the bot. Matrix User IDs follow the format `@username:server`.
|
||||
|
||||
To find yours:
|
||||
|
||||
1. Open [Element](https://app.element.io) (or your preferred Matrix client).
|
||||
2. Click your avatar → **Settings**.
|
||||
3. Your User ID is displayed at the top of the profile (e.g., `@alice:matrix.org`).
|
||||
|
||||
:::tip
|
||||
Matrix User IDs always start with `@` and contain a `:` followed by the server name. For example: `@alice:matrix.org`, `@bob:your-server.com`.
|
||||
:::
|
||||
|
||||
## Step 4: Configure Hermes Agent
|
||||
|
||||
### Option A: Interactive Setup (Recommended)
|
||||
|
||||
Run the guided setup command:
|
||||
|
||||
```bash
|
||||
hermes gateway setup
|
||||
```
|
||||
|
||||
Select **Matrix** when prompted, then provide your homeserver URL, access token (or user ID + password), and allowed user IDs when asked.
|
||||
|
||||
### Option B: Manual Configuration
|
||||
|
||||
Add the following to your `~/.hermes/.env` file:
|
||||
|
||||
**Using an access token:**
|
||||
|
||||
```bash
|
||||
# Required
|
||||
MATRIX_HOMESERVER=https://matrix.example.org
|
||||
MATRIX_ACCESS_TOKEN=***
|
||||
|
||||
# Optional: user ID (auto-detected from token if omitted)
|
||||
# MATRIX_USER_ID=@hermes:matrix.example.org
|
||||
|
||||
# Security: restrict who can interact with the bot
|
||||
MATRIX_ALLOWED_USERS=@alice:matrix.example.org
|
||||
|
||||
# Multiple allowed users (comma-separated)
|
||||
# MATRIX_ALLOWED_USERS=@alice:matrix.example.org,@bob:matrix.example.org
|
||||
```
|
||||
|
||||
**Using password login:**
|
||||
|
||||
```bash
|
||||
# Required
|
||||
MATRIX_HOMESERVER=https://matrix.example.org
|
||||
MATRIX_USER_ID=@hermes:matrix.example.org
|
||||
MATRIX_PASSWORD=***
|
||||
|
||||
# Security
|
||||
MATRIX_ALLOWED_USERS=@alice:matrix.example.org
|
||||
```
|
||||
|
||||
Optional behavior settings in `~/.hermes/config.yaml`:
|
||||
|
||||
```yaml
|
||||
group_sessions_per_user: true
|
||||
```
|
||||
|
||||
- `group_sessions_per_user: true` keeps each participant's context isolated inside shared rooms
|
||||
|
||||
### Start the Gateway
|
||||
|
||||
Once configured, start the Matrix gateway:
|
||||
|
||||
```bash
|
||||
hermes gateway
|
||||
```
|
||||
|
||||
The bot should connect to your homeserver and start syncing within a few seconds. Send it a message — either a DM or in a room it has joined — to test.
|
||||
|
||||
:::tip
|
||||
You can run `hermes gateway` in the background or as a systemd service for persistent operation. See the deployment docs for details.
|
||||
:::
|
||||
|
||||
## End-to-End Encryption (E2EE)
|
||||
|
||||
Hermes supports Matrix end-to-end encryption, so you can chat with your bot in encrypted rooms.
|
||||
|
||||
### Requirements
|
||||
|
||||
E2EE requires the `matrix-nio` library with encryption extras and the `libolm` C library:
|
||||
|
||||
```bash
|
||||
# Install matrix-nio with E2EE support
|
||||
pip install 'matrix-nio[e2e]'
|
||||
|
||||
# Or install with hermes extras
|
||||
pip install 'hermes-agent[matrix]'
|
||||
```
|
||||
|
||||
You also need `libolm` installed on your system:
|
||||
|
||||
```bash
|
||||
# Debian/Ubuntu
|
||||
sudo apt install libolm-dev
|
||||
|
||||
# macOS
|
||||
brew install libolm
|
||||
|
||||
# Fedora
|
||||
sudo dnf install libolm-devel
|
||||
```
|
||||
|
||||
### Enable E2EE
|
||||
|
||||
Add to your `~/.hermes/.env`:
|
||||
|
||||
```bash
|
||||
MATRIX_ENCRYPTION=true
|
||||
```
|
||||
|
||||
When E2EE is enabled, Hermes:
|
||||
|
||||
- Stores encryption keys in `~/.hermes/matrix/store/`
|
||||
- Uploads device keys on first connection
|
||||
- Decrypts incoming messages and encrypts outgoing messages automatically
|
||||
- Auto-joins encrypted rooms when invited
|
||||
|
||||
:::warning
|
||||
If you delete the `~/.hermes/matrix/store/` directory, the bot loses its encryption keys. You'll need to verify the device again in your Matrix client. Back up this directory if you want to preserve encrypted sessions.
|
||||
:::
|
||||
|
||||
:::info
|
||||
If `matrix-nio[e2e]` is not installed or `libolm` is missing, the bot falls back to a plain (unencrypted) client automatically. You'll see a warning in the logs.
|
||||
:::
|
||||
|
||||
## Home Room
|
||||
|
||||
You can designate a "home room" where the bot sends proactive messages (such as cron job output, reminders, and notifications). There are two ways to set it:
|
||||
|
||||
### Using the Slash Command
|
||||
|
||||
Type `/sethome` in any Matrix room where the bot is present. That room becomes the home room.
|
||||
|
||||
### Manual Configuration
|
||||
|
||||
Add this to your `~/.hermes/.env`:
|
||||
|
||||
```bash
|
||||
MATRIX_HOME_ROOM=!abc123def456:matrix.example.org
|
||||
```
|
||||
|
||||
:::tip
|
||||
To find a Room ID: in Element, go to the room → **Settings** → **Advanced** → the **Internal room ID** is shown there (starts with `!`).
|
||||
:::
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Bot is not responding to messages
|
||||
|
||||
**Cause**: The bot hasn't joined the room, or `MATRIX_ALLOWED_USERS` doesn't include your User ID.
|
||||
|
||||
**Fix**: Invite the bot to the room — it auto-joins on invite. Verify your User ID is in `MATRIX_ALLOWED_USERS` (use the full `@user:server` format). Restart the gateway.
|
||||
|
||||
### "Failed to authenticate" / "whoami failed" on startup
|
||||
|
||||
**Cause**: The access token or homeserver URL is incorrect.
|
||||
|
||||
**Fix**: Verify `MATRIX_HOMESERVER` points to your homeserver (include `https://`, no trailing slash). Check that `MATRIX_ACCESS_TOKEN` is valid — try it with curl:
|
||||
|
||||
```bash
|
||||
curl -H "Authorization: Bearer YOUR_TOKEN" \
|
||||
https://your-server/_matrix/client/v3/account/whoami
|
||||
```
|
||||
|
||||
If this returns your user info, the token is valid. If it returns an error, generate a new token.
|
||||
|
||||
### "matrix-nio not installed" error
|
||||
|
||||
**Cause**: The `matrix-nio` Python package is not installed.
|
||||
|
||||
**Fix**: Install it:
|
||||
|
||||
```bash
|
||||
pip install 'matrix-nio[e2e]'
|
||||
```
|
||||
|
||||
Or with Hermes extras:
|
||||
|
||||
```bash
|
||||
pip install 'hermes-agent[matrix]'
|
||||
```
|
||||
|
||||
### Encryption errors / "could not decrypt event"
|
||||
|
||||
**Cause**: Missing encryption keys, `libolm` not installed, or the bot's device isn't trusted.
|
||||
|
||||
**Fix**:
|
||||
1. Verify `libolm` is installed on your system (see the E2EE section above).
|
||||
2. Make sure `MATRIX_ENCRYPTION=true` is set in your `.env`.
|
||||
3. In your Matrix client (Element), go to the bot's profile → **Sessions** → verify/trust the bot's device.
|
||||
4. If the bot just joined an encrypted room, it can only decrypt messages sent *after* it joined. Older messages are inaccessible.
|
||||
|
||||
### Sync issues / bot falls behind
|
||||
|
||||
**Cause**: Long-running tool executions can delay the sync loop, or the homeserver is slow.
|
||||
|
||||
**Fix**: The sync loop automatically retries every 5 seconds on error. Check the Hermes logs for sync-related warnings. If the bot consistently falls behind, ensure your homeserver has adequate resources.
|
||||
|
||||
### Bot is offline
|
||||
|
||||
**Cause**: The Hermes gateway isn't running, or it failed to connect.
|
||||
|
||||
**Fix**: Check that `hermes gateway` is running. Look at the terminal output for error messages. Common issues: wrong homeserver URL, expired access token, homeserver unreachable.
|
||||
|
||||
### "User not allowed" / Bot ignores you
|
||||
|
||||
**Cause**: Your User ID isn't in `MATRIX_ALLOWED_USERS`.
|
||||
|
||||
**Fix**: Add your User ID to `MATRIX_ALLOWED_USERS` in `~/.hermes/.env` and restart the gateway. Use the full `@user:server` format.
|
||||
|
||||
## Security
|
||||
|
||||
:::warning
|
||||
Always set `MATRIX_ALLOWED_USERS` to restrict who can interact with the bot. Without it, the gateway denies all users by default as a safety measure. Only add User IDs of people you trust — authorized users have full access to the agent's capabilities, including tool use and system access.
|
||||
:::
|
||||
|
||||
For more information on securing your Hermes Agent deployment, see the [Security Guide](../security.md).
|
||||
|
||||
## Notes
|
||||
|
||||
- **Any homeserver**: Works with Synapse, Conduit, Dendrite, matrix.org, or any spec-compliant Matrix homeserver. No specific homeserver software required.
|
||||
- **Federation**: If you're on a federated homeserver, the bot can communicate with users from other servers — just add their full `@user:server` IDs to `MATRIX_ALLOWED_USERS`.
|
||||
- **Auto-join**: The bot automatically accepts room invites and joins. It starts responding immediately after joining.
|
||||
- **Media support**: Hermes can send and receive images, audio, video, and file attachments. Media is uploaded to your homeserver using the Matrix content repository API.
|
||||
277
hermes_code/website/docs/user-guide/messaging/mattermost.md
Normal file
277
hermes_code/website/docs/user-guide/messaging/mattermost.md
Normal file
|
|
@ -0,0 +1,277 @@
|
|||
---
|
||||
sidebar_position: 8
|
||||
title: "Mattermost"
|
||||
description: "Set up Hermes Agent as a Mattermost bot"
|
||||
---
|
||||
|
||||
# Mattermost Setup
|
||||
|
||||
Hermes Agent integrates with Mattermost as a bot, letting you chat with your AI assistant through direct messages or team channels. Mattermost is a self-hosted, open-source Slack alternative — you run it on your own infrastructure, keeping full control of your data. The bot connects via Mattermost's REST API (v4) and WebSocket for real-time events, processes messages through the Hermes Agent pipeline (including tool use, memory, and reasoning), and responds in real time. It supports text, file attachments, images, and slash commands.
|
||||
|
||||
No external Mattermost library is required — the adapter uses `aiohttp`, which is already a Hermes dependency.
|
||||
|
||||
Before setup, here's the part most people want to know: how Hermes behaves once it's in your Mattermost instance.
|
||||
|
||||
## How Hermes Behaves
|
||||
|
||||
| Context | Behavior |
|
||||
|---------|----------|
|
||||
| **DMs** | Hermes responds to every message. No `@mention` needed. Each DM has its own session. |
|
||||
| **Public/private channels** | Hermes responds when you `@mention` it. Without a mention, Hermes ignores the message. |
|
||||
| **Threads** | If `MATTERMOST_REPLY_MODE=thread`, Hermes replies in a thread under your message. Thread context stays isolated from the parent channel. |
|
||||
| **Shared channels with multiple users** | By default, Hermes isolates session history per user inside the channel. Two people talking in the same channel do not share one transcript unless you explicitly disable that. |
|
||||
|
||||
:::tip
|
||||
If you want Hermes to reply as threaded conversations (nested under your original message), set `MATTERMOST_REPLY_MODE=thread`. The default is `off`, which sends flat messages in the channel.
|
||||
:::
|
||||
|
||||
### Session Model in Mattermost
|
||||
|
||||
By default:
|
||||
|
||||
- each DM gets its own session
|
||||
- each thread gets its own session namespace
|
||||
- each user in a shared channel gets their own session inside that channel
|
||||
|
||||
This is controlled by `config.yaml`:
|
||||
|
||||
```yaml
|
||||
group_sessions_per_user: true
|
||||
```
|
||||
|
||||
Set it to `false` only if you explicitly want one shared conversation for the entire channel:
|
||||
|
||||
```yaml
|
||||
group_sessions_per_user: false
|
||||
```
|
||||
|
||||
Shared sessions can be useful for a collaborative channel, but they also mean:
|
||||
|
||||
- users share context growth and token costs
|
||||
- one person's long tool-heavy task can bloat everyone else's context
|
||||
- one person's in-flight run can interrupt another person's follow-up in the same channel
|
||||
|
||||
This guide walks you through the full setup process — from creating your bot on Mattermost to sending your first message.
|
||||
|
||||
## Step 1: Enable Bot Accounts
|
||||
|
||||
Bot accounts must be enabled on your Mattermost server before you can create one.
|
||||
|
||||
1. Log in to Mattermost as a **System Admin**.
|
||||
2. Go to **System Console** → **Integrations** → **Bot Accounts**.
|
||||
3. Set **Enable Bot Account Creation** to **true**.
|
||||
4. Click **Save**.
|
||||
|
||||
:::info
|
||||
If you don't have System Admin access, ask your Mattermost administrator to enable bot accounts and create one for you.
|
||||
:::
|
||||
|
||||
## Step 2: Create a Bot Account
|
||||
|
||||
1. In Mattermost, click the **☰** menu (top-left) → **Integrations** → **Bot Accounts**.
|
||||
2. Click **Add Bot Account**.
|
||||
3. Fill in the details:
|
||||
- **Username**: e.g., `hermes`
|
||||
- **Display Name**: e.g., `Hermes Agent`
|
||||
- **Description**: optional
|
||||
- **Role**: `Member` is sufficient
|
||||
4. Click **Create Bot Account**.
|
||||
5. Mattermost will display the **bot token**. **Copy it immediately.**
|
||||
|
||||
:::warning[Token shown only once]
|
||||
The bot token is only displayed once when you create the bot account. If you lose it, you'll need to regenerate it from the bot account settings. Never share your token publicly or commit it to Git — anyone with this token has full control of the bot.
|
||||
:::
|
||||
|
||||
Store the token somewhere safe (a password manager, for example). You'll need it in Step 5.
|
||||
|
||||
:::tip
|
||||
You can also use a **personal access token** instead of a bot account. Go to **Profile** → **Security** → **Personal Access Tokens** → **Create Token**. This is useful if you want Hermes to post as your own user rather than a separate bot user.
|
||||
:::
|
||||
|
||||
## Step 3: Add the Bot to Channels
|
||||
|
||||
The bot needs to be a member of any channel where you want it to respond:
|
||||
|
||||
1. Open the channel where you want the bot.
|
||||
2. Click the channel name → **Add Members**.
|
||||
3. Search for your bot username (e.g., `hermes`) and add it.
|
||||
|
||||
For DMs, simply open a direct message with the bot — it will be able to respond immediately.
|
||||
|
||||
## Step 4: Find Your Mattermost User ID
|
||||
|
||||
Hermes Agent uses your Mattermost User ID to control who can interact with the bot. To find it:
|
||||
|
||||
1. Click your **avatar** (top-left corner) → **Profile**.
|
||||
2. Your User ID is displayed in the profile dialog — click it to copy.
|
||||
|
||||
Your User ID is a 26-character alphanumeric string like `3uo8dkh1p7g1mfk49ear5fzs5c`.
|
||||
|
||||
:::warning
|
||||
Your User ID is **not** your username. The username is what appears after `@` (e.g., `@alice`). The User ID is a long alphanumeric identifier that Mattermost uses internally.
|
||||
:::
|
||||
|
||||
**Alternative**: You can also get your User ID via the API:
|
||||
|
||||
```bash
|
||||
curl -H "Authorization: Bearer YOUR_TOKEN" \
|
||||
https://your-mattermost-server/api/v4/users/me | jq .id
|
||||
```
|
||||
|
||||
:::tip
|
||||
To get a **Channel ID**: click the channel name → **View Info**. The Channel ID is shown in the info panel. You'll need this if you want to set a home channel manually.
|
||||
:::
|
||||
|
||||
## Step 5: Configure Hermes Agent
|
||||
|
||||
### Option A: Interactive Setup (Recommended)
|
||||
|
||||
Run the guided setup command:
|
||||
|
||||
```bash
|
||||
hermes gateway setup
|
||||
```
|
||||
|
||||
Select **Mattermost** when prompted, then paste your server URL, bot token, and user ID when asked.
|
||||
|
||||
### Option B: Manual Configuration
|
||||
|
||||
Add the following to your `~/.hermes/.env` file:
|
||||
|
||||
```bash
|
||||
# Required
|
||||
MATTERMOST_URL=https://mm.example.com
|
||||
MATTERMOST_TOKEN=***
|
||||
MATTERMOST_ALLOWED_USERS=3uo8dkh1p7g1mfk49ear5fzs5c
|
||||
|
||||
# Multiple allowed users (comma-separated)
|
||||
# MATTERMOST_ALLOWED_USERS=3uo8dkh1p7g1mfk49ear5fzs5c,8fk2jd9s0a7bncm1xqw4tp6r3e
|
||||
|
||||
# Optional: reply mode (thread or off, default: off)
|
||||
# MATTERMOST_REPLY_MODE=thread
|
||||
```
|
||||
|
||||
Optional behavior settings in `~/.hermes/config.yaml`:
|
||||
|
||||
```yaml
|
||||
group_sessions_per_user: true
|
||||
```
|
||||
|
||||
- `group_sessions_per_user: true` keeps each participant's context isolated inside shared channels and threads
|
||||
|
||||
### Start the Gateway
|
||||
|
||||
Once configured, start the Mattermost gateway:
|
||||
|
||||
```bash
|
||||
hermes gateway
|
||||
```
|
||||
|
||||
The bot should connect to your Mattermost server within a few seconds. Send it a message — either a DM or in a channel where it's been added — to test.
|
||||
|
||||
:::tip
|
||||
You can run `hermes gateway` in the background or as a systemd service for persistent operation. See the deployment docs for details.
|
||||
:::
|
||||
|
||||
## Home Channel
|
||||
|
||||
You can designate a "home channel" where the bot sends proactive messages (such as cron job output, reminders, and notifications). There are two ways to set it:
|
||||
|
||||
### Using the Slash Command
|
||||
|
||||
Type `/sethome` in any Mattermost channel where the bot is present. That channel becomes the home channel.
|
||||
|
||||
### Manual Configuration
|
||||
|
||||
Add this to your `~/.hermes/.env`:
|
||||
|
||||
```bash
|
||||
MATTERMOST_HOME_CHANNEL=abc123def456ghi789jkl012mn
|
||||
```
|
||||
|
||||
Replace the ID with the actual channel ID (click the channel name → View Info → copy the ID).
|
||||
|
||||
## Reply Mode
|
||||
|
||||
The `MATTERMOST_REPLY_MODE` setting controls how Hermes posts responses:
|
||||
|
||||
| Mode | Behavior |
|
||||
|------|----------|
|
||||
| `off` (default) | Hermes posts flat messages in the channel, like a normal user. |
|
||||
| `thread` | Hermes replies in a thread under your original message. Keeps channels clean when there's lots of back-and-forth. |
|
||||
|
||||
Set it in your `~/.hermes/.env`:
|
||||
|
||||
```bash
|
||||
MATTERMOST_REPLY_MODE=thread
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Bot is not responding to messages
|
||||
|
||||
**Cause**: The bot is not a member of the channel, or `MATTERMOST_ALLOWED_USERS` doesn't include your User ID.
|
||||
|
||||
**Fix**: Add the bot to the channel (channel name → Add Members → search for the bot). Verify your User ID is in `MATTERMOST_ALLOWED_USERS`. Restart the gateway.
|
||||
|
||||
### 403 Forbidden errors
|
||||
|
||||
**Cause**: The bot token is invalid, or the bot doesn't have permission to post in the channel.
|
||||
|
||||
**Fix**: Check that `MATTERMOST_TOKEN` in your `.env` file is correct. Make sure the bot account hasn't been deactivated. Verify the bot has been added to the channel. If using a personal access token, ensure your account has the required permissions.
|
||||
|
||||
### WebSocket disconnects / reconnection loops
|
||||
|
||||
**Cause**: Network instability, Mattermost server restarts, or firewall/proxy issues with WebSocket connections.
|
||||
|
||||
**Fix**: The adapter automatically reconnects with exponential backoff (2s → 60s). Check your server's WebSocket configuration — reverse proxies (nginx, Apache) need WebSocket upgrade headers configured. Verify no firewall is blocking WebSocket connections on your Mattermost server.
|
||||
|
||||
For nginx, ensure your config includes:
|
||||
|
||||
```nginx
|
||||
location /api/v4/websocket {
|
||||
proxy_pass http://mattermost-backend;
|
||||
proxy_set_header Upgrade $http_upgrade;
|
||||
proxy_set_header Connection "upgrade";
|
||||
proxy_read_timeout 600s;
|
||||
}
|
||||
```
|
||||
|
||||
### "Failed to authenticate" on startup
|
||||
|
||||
**Cause**: The token or server URL is incorrect.
|
||||
|
||||
**Fix**: Verify `MATTERMOST_URL` points to your Mattermost server (include `https://`, no trailing slash). Check that `MATTERMOST_TOKEN` is valid — try it with curl:
|
||||
|
||||
```bash
|
||||
curl -H "Authorization: Bearer YOUR_TOKEN" \
|
||||
https://your-server/api/v4/users/me
|
||||
```
|
||||
|
||||
If this returns your bot's user info, the token is valid. If it returns an error, regenerate the token.
|
||||
|
||||
### Bot is offline
|
||||
|
||||
**Cause**: The Hermes gateway isn't running, or it failed to connect.
|
||||
|
||||
**Fix**: Check that `hermes gateway` is running. Look at the terminal output for error messages. Common issues: wrong URL, expired token, Mattermost server unreachable.
|
||||
|
||||
### "User not allowed" / Bot ignores you
|
||||
|
||||
**Cause**: Your User ID isn't in `MATTERMOST_ALLOWED_USERS`.
|
||||
|
||||
**Fix**: Add your User ID to `MATTERMOST_ALLOWED_USERS` in `~/.hermes/.env` and restart the gateway. Remember: the User ID is a 26-character alphanumeric string, not your `@username`.
|
||||
|
||||
## Security
|
||||
|
||||
:::warning
|
||||
Always set `MATTERMOST_ALLOWED_USERS` to restrict who can interact with the bot. Without it, the gateway denies all users by default as a safety measure. Only add User IDs of people you trust — authorized users have full access to the agent's capabilities, including tool use and system access.
|
||||
:::
|
||||
|
||||
For more information on securing your Hermes Agent deployment, see the [Security Guide](../security.md).
|
||||
|
||||
## Notes
|
||||
|
||||
- **Self-hosted friendly**: Works with any self-hosted Mattermost instance. No Mattermost Cloud account or subscription required.
|
||||
- **No extra dependencies**: The adapter uses `aiohttp` for HTTP and WebSocket, which is already included with Hermes Agent.
|
||||
- **Team Edition compatible**: Works with both Mattermost Team Edition (free) and Enterprise Edition.
|
||||
208
hermes_code/website/docs/user-guide/messaging/open-webui.md
Normal file
208
hermes_code/website/docs/user-guide/messaging/open-webui.md
Normal file
|
|
@ -0,0 +1,208 @@
|
|||
---
|
||||
sidebar_position: 8
|
||||
title: "Open WebUI"
|
||||
description: "Connect Open WebUI to Hermes Agent via the OpenAI-compatible API server"
|
||||
---
|
||||
|
||||
# Open WebUI Integration
|
||||
|
||||
[Open WebUI](https://github.com/open-webui/open-webui) (126k★) is the most popular self-hosted chat interface for AI. With Hermes Agent's built-in API server, you can use Open WebUI as a polished web frontend for your agent — complete with conversation management, user accounts, and a modern chat interface.
|
||||
|
||||
## Architecture
|
||||
|
||||
```mermaid
|
||||
flowchart LR
|
||||
A["Open WebUI<br/>browser UI<br/>port 3000"]
|
||||
B["hermes-agent<br/>gateway API server<br/>port 8642"]
|
||||
A -->|POST /v1/chat/completions| B
|
||||
B -->|SSE streaming response| A
|
||||
```
|
||||
|
||||
Open WebUI connects to Hermes Agent's API server just like it would connect to OpenAI. Your agent handles the requests with its full toolset — terminal, file operations, web search, memory, skills — and returns the final response.
|
||||
|
||||
Open WebUI talks to Hermes server-to-server, so you do not need `API_SERVER_CORS_ORIGINS` for this integration.
|
||||
|
||||
## Quick Setup
|
||||
|
||||
### 1. Enable the API server
|
||||
|
||||
Add to `~/.hermes/.env`:
|
||||
|
||||
```bash
|
||||
API_SERVER_ENABLED=true
|
||||
API_SERVER_KEY=your-secret-key
|
||||
```
|
||||
|
||||
### 2. Start Hermes Agent gateway
|
||||
|
||||
```bash
|
||||
hermes gateway
|
||||
```
|
||||
|
||||
You should see:
|
||||
|
||||
```
|
||||
[API Server] API server listening on http://127.0.0.1:8642
|
||||
```
|
||||
|
||||
### 3. Start Open WebUI
|
||||
|
||||
```bash
|
||||
docker run -d -p 3000:8080 \
|
||||
-e OPENAI_API_BASE_URL=http://host.docker.internal:8642/v1 \
|
||||
-e OPENAI_API_KEY=your-secret-key \
|
||||
--add-host=host.docker.internal:host-gateway \
|
||||
-v open-webui:/app/backend/data \
|
||||
--name open-webui \
|
||||
--restart always \
|
||||
ghcr.io/open-webui/open-webui:main
|
||||
```
|
||||
|
||||
### 4. Open the UI
|
||||
|
||||
Go to **http://localhost:3000**. Create your admin account (the first user becomes admin). You should see **hermes-agent** in the model dropdown. Start chatting!
|
||||
|
||||
## Docker Compose Setup
|
||||
|
||||
For a more permanent setup, create a `docker-compose.yml`:
|
||||
|
||||
```yaml
|
||||
services:
|
||||
open-webui:
|
||||
image: ghcr.io/open-webui/open-webui:main
|
||||
ports:
|
||||
- "3000:8080"
|
||||
volumes:
|
||||
- open-webui:/app/backend/data
|
||||
environment:
|
||||
- OPENAI_API_BASE_URL=http://host.docker.internal:8642/v1
|
||||
- OPENAI_API_KEY=your-secret-key
|
||||
extra_hosts:
|
||||
- "host.docker.internal:host-gateway"
|
||||
restart: always
|
||||
|
||||
volumes:
|
||||
open-webui:
|
||||
```
|
||||
|
||||
Then:
|
||||
|
||||
```bash
|
||||
docker compose up -d
|
||||
```
|
||||
|
||||
## Configuring via the Admin UI
|
||||
|
||||
If you prefer to configure the connection through the UI instead of environment variables:
|
||||
|
||||
1. Log in to Open WebUI at **http://localhost:3000**
|
||||
2. Click your **profile avatar** → **Admin Settings**
|
||||
3. Go to **Connections**
|
||||
4. Under **OpenAI API**, click the **wrench icon** (Manage)
|
||||
5. Click **+ Add New Connection**
|
||||
6. Enter:
|
||||
- **URL**: `http://host.docker.internal:8642/v1`
|
||||
- **API Key**: your key or any non-empty value (e.g., `not-needed`)
|
||||
7. Click the **checkmark** to verify the connection
|
||||
8. **Save**
|
||||
|
||||
The **hermes-agent** model should now appear in the model dropdown.
|
||||
|
||||
:::warning
|
||||
Environment variables only take effect on Open WebUI's **first launch**. After that, connection settings are stored in its internal database. To change them later, use the Admin UI or delete the Docker volume and start fresh.
|
||||
:::
|
||||
|
||||
## API Type: Chat Completions vs Responses
|
||||
|
||||
Open WebUI supports two API modes when connecting to a backend:
|
||||
|
||||
| Mode | Format | When to use |
|
||||
|------|--------|-------------|
|
||||
| **Chat Completions** (default) | `/v1/chat/completions` | Recommended. Works out of the box. |
|
||||
| **Responses** (experimental) | `/v1/responses` | For server-side conversation state via `previous_response_id`. |
|
||||
|
||||
### Using Chat Completions (recommended)
|
||||
|
||||
This is the default and requires no extra configuration. Open WebUI sends standard OpenAI-format requests and Hermes Agent responds accordingly. Each request includes the full conversation history.
|
||||
|
||||
### Using Responses API
|
||||
|
||||
To use the Responses API mode:
|
||||
|
||||
1. Go to **Admin Settings** → **Connections** → **OpenAI** → **Manage**
|
||||
2. Edit your hermes-agent connection
|
||||
3. Change **API Type** from "Chat Completions" to **"Responses (Experimental)"**
|
||||
4. Save
|
||||
|
||||
With the Responses API, Open WebUI sends requests in the Responses format (`input` array + `instructions`), and Hermes Agent can preserve full tool call history across turns via `previous_response_id`.
|
||||
|
||||
:::note
|
||||
Open WebUI currently manages conversation history client-side even in Responses mode — it sends the full message history in each request rather than using `previous_response_id`. The Responses API mode is mainly useful for future compatibility as frontends evolve.
|
||||
:::
|
||||
|
||||
## How It Works
|
||||
|
||||
When you send a message in Open WebUI:
|
||||
|
||||
1. Open WebUI sends a `POST /v1/chat/completions` request with your message and conversation history
|
||||
2. Hermes Agent creates an AIAgent instance with its full toolset
|
||||
3. The agent processes your request — it may call tools (terminal, file operations, web search, etc.)
|
||||
4. Tool calls happen invisibly server-side
|
||||
5. The agent's final text response is returned to Open WebUI
|
||||
6. Open WebUI displays the response in its chat interface
|
||||
|
||||
Your agent has access to all the same tools and capabilities as when using the CLI or Telegram — the only difference is the frontend.
|
||||
|
||||
## Configuration Reference
|
||||
|
||||
### Hermes Agent (API server)
|
||||
|
||||
| Variable | Default | Description |
|
||||
|----------|---------|-------------|
|
||||
| `API_SERVER_ENABLED` | `false` | Enable the API server |
|
||||
| `API_SERVER_PORT` | `8642` | HTTP server port |
|
||||
| `API_SERVER_HOST` | `127.0.0.1` | Bind address |
|
||||
| `API_SERVER_KEY` | _(required)_ | Bearer token for auth. Match `OPENAI_API_KEY`. |
|
||||
|
||||
### Open WebUI
|
||||
|
||||
| Variable | Description |
|
||||
|----------|-------------|
|
||||
| `OPENAI_API_BASE_URL` | Hermes Agent's API URL (include `/v1`) |
|
||||
| `OPENAI_API_KEY` | Must be non-empty. Match your `API_SERVER_KEY`. |
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### No models appear in the dropdown
|
||||
|
||||
- **Check the URL has `/v1` suffix**: `http://host.docker.internal:8642/v1` (not just `:8642`)
|
||||
- **Verify the gateway is running**: `curl http://localhost:8642/health` should return `{"status": "ok"}`
|
||||
- **Check model listing**: `curl http://localhost:8642/v1/models` should return a list with `hermes-agent`
|
||||
- **Docker networking**: From inside Docker, `localhost` means the container, not your host. Use `host.docker.internal` or `--network=host`.
|
||||
|
||||
### Connection test passes but no models load
|
||||
|
||||
This is almost always the missing `/v1` suffix. Open WebUI's connection test is a basic connectivity check — it doesn't verify model listing works.
|
||||
|
||||
### Response takes a long time
|
||||
|
||||
Hermes Agent may be executing multiple tool calls (reading files, running commands, searching the web) before producing its final response. This is normal for complex queries. The response appears all at once when the agent finishes.
|
||||
|
||||
### "Invalid API key" errors
|
||||
|
||||
Make sure your `OPENAI_API_KEY` in Open WebUI matches the `API_SERVER_KEY` in Hermes Agent.
|
||||
|
||||
## Linux Docker (no Docker Desktop)
|
||||
|
||||
On Linux without Docker Desktop, `host.docker.internal` doesn't resolve by default. Options:
|
||||
|
||||
```bash
|
||||
# Option 1: Add host mapping
|
||||
docker run --add-host=host.docker.internal:host-gateway ...
|
||||
|
||||
# Option 2: Use host networking
|
||||
docker run --network=host -e OPENAI_API_BASE_URL=http://localhost:8642/v1 ...
|
||||
|
||||
# Option 3: Use Docker bridge IP
|
||||
docker run -e OPENAI_API_BASE_URL=http://172.17.0.1:8642/v1 ...
|
||||
```
|
||||
238
hermes_code/website/docs/user-guide/messaging/signal.md
Normal file
238
hermes_code/website/docs/user-guide/messaging/signal.md
Normal file
|
|
@ -0,0 +1,238 @@
|
|||
---
|
||||
sidebar_position: 6
|
||||
title: "Signal"
|
||||
description: "Set up Hermes Agent as a Signal messenger bot via signal-cli daemon"
|
||||
---
|
||||
|
||||
# Signal Setup
|
||||
|
||||
Hermes connects to Signal through the [signal-cli](https://github.com/AsamK/signal-cli) daemon running in HTTP mode. The adapter streams messages in real-time via SSE (Server-Sent Events) and sends responses via JSON-RPC.
|
||||
|
||||
Signal is the most privacy-focused mainstream messenger — end-to-end encrypted by default, open-source protocol, minimal metadata collection. This makes it ideal for security-sensitive agent workflows.
|
||||
|
||||
:::info No New Python Dependencies
|
||||
The Signal adapter uses `httpx` (already a core Hermes dependency) for all communication. No additional Python packages are required. You just need signal-cli installed externally.
|
||||
:::
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- **signal-cli** — Java-based Signal client ([GitHub](https://github.com/AsamK/signal-cli))
|
||||
- **Java 17+** runtime — required by signal-cli
|
||||
- **A phone number** with Signal installed (for linking as a secondary device)
|
||||
|
||||
### Installing signal-cli
|
||||
|
||||
```bash
|
||||
# Linux (Debian/Ubuntu)
|
||||
sudo apt install signal-cli
|
||||
|
||||
# macOS
|
||||
brew install signal-cli
|
||||
|
||||
# Manual install (any platform)
|
||||
# Download from https://github.com/AsamK/signal-cli/releases
|
||||
# Extract and add to PATH
|
||||
```
|
||||
|
||||
### Alternative: Docker (signal-cli-rest-api)
|
||||
|
||||
If you prefer Docker, use the [signal-cli-rest-api](https://github.com/bbernhard/signal-cli-rest-api) container:
|
||||
|
||||
```bash
|
||||
docker run -d --name signal-cli \
|
||||
-p 8080:8080 \
|
||||
-v $HOME/.local/share/signal-cli:/home/.local/share/signal-cli \
|
||||
-e MODE=json-rpc \
|
||||
bbernhard/signal-cli-rest-api
|
||||
```
|
||||
|
||||
:::tip
|
||||
Use `MODE=json-rpc` for best performance. The `normal` mode spawns a JVM per request and is much slower.
|
||||
:::
|
||||
|
||||
---
|
||||
|
||||
## Step 1: Link Your Signal Account
|
||||
|
||||
Signal-cli works as a **linked device** — like WhatsApp Web, but for Signal. Your phone stays the primary device.
|
||||
|
||||
```bash
|
||||
# Generate a linking URI (displays a QR code or link)
|
||||
signal-cli link -n "HermesAgent"
|
||||
```
|
||||
|
||||
1. Open **Signal** on your phone
|
||||
2. Go to **Settings → Linked Devices**
|
||||
3. Tap **Link New Device**
|
||||
4. Scan the QR code or enter the URI
|
||||
|
||||
---
|
||||
|
||||
## Step 2: Start the signal-cli Daemon
|
||||
|
||||
```bash
|
||||
# Replace +1234567890 with your Signal phone number (E.164 format)
|
||||
signal-cli --account +1234567890 daemon --http 127.0.0.1:8080
|
||||
```
|
||||
|
||||
:::tip
|
||||
Keep this running in the background. You can use `systemd`, `tmux`, `screen`, or run it as a service.
|
||||
:::
|
||||
|
||||
Verify it's running:
|
||||
|
||||
```bash
|
||||
curl http://127.0.0.1:8080/api/v1/check
|
||||
# Should return: {"versions":{"signal-cli":...}}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Step 3: Configure Hermes
|
||||
|
||||
The easiest way:
|
||||
|
||||
```bash
|
||||
hermes gateway setup
|
||||
```
|
||||
|
||||
Select **Signal** from the platform menu. The wizard will:
|
||||
|
||||
1. Check if signal-cli is installed
|
||||
2. Prompt for the HTTP URL (default: `http://127.0.0.1:8080`)
|
||||
3. Test connectivity to the daemon
|
||||
4. Ask for your account phone number
|
||||
5. Configure allowed users and access policies
|
||||
|
||||
### Manual Configuration
|
||||
|
||||
Add to `~/.hermes/.env`:
|
||||
|
||||
```bash
|
||||
# Required
|
||||
SIGNAL_HTTP_URL=http://127.0.0.1:8080
|
||||
SIGNAL_ACCOUNT=+1234567890
|
||||
|
||||
# Security (recommended)
|
||||
SIGNAL_ALLOWED_USERS=+1234567890,+0987654321 # Comma-separated E.164 numbers or UUIDs
|
||||
|
||||
# Optional
|
||||
SIGNAL_GROUP_ALLOWED_USERS=groupId1,groupId2 # Enable groups (omit to disable, * for all)
|
||||
SIGNAL_HOME_CHANNEL=+1234567890 # Default delivery target for cron jobs
|
||||
```
|
||||
|
||||
Then start the gateway:
|
||||
|
||||
```bash
|
||||
hermes gateway # Foreground
|
||||
hermes gateway install # Install as a user service
|
||||
sudo hermes gateway install --system # Linux only: boot-time system service
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Access Control
|
||||
|
||||
### DM Access
|
||||
|
||||
DM access follows the same pattern as all other Hermes platforms:
|
||||
|
||||
1. **`SIGNAL_ALLOWED_USERS` set** → only those users can message
|
||||
2. **No allowlist set** → unknown users get a DM pairing code (approve via `hermes pairing approve signal CODE`)
|
||||
3. **`SIGNAL_ALLOW_ALL_USERS=true`** → anyone can message (use with caution)
|
||||
|
||||
### Group Access
|
||||
|
||||
Group access is controlled by the `SIGNAL_GROUP_ALLOWED_USERS` env var:
|
||||
|
||||
| Configuration | Behavior |
|
||||
|---------------|----------|
|
||||
| Not set (default) | All group messages are ignored. The bot only responds to DMs. |
|
||||
| Set with group IDs | Only listed groups are monitored (e.g., `groupId1,groupId2`). |
|
||||
| Set to `*` | The bot responds in any group it's a member of. |
|
||||
|
||||
---
|
||||
|
||||
## Features
|
||||
|
||||
### Attachments
|
||||
|
||||
The adapter supports sending and receiving:
|
||||
|
||||
- **Images** — PNG, JPEG, GIF, WebP (auto-detected via magic bytes)
|
||||
- **Audio** — MP3, OGG, WAV, M4A (voice messages transcribed if Whisper is configured)
|
||||
- **Documents** — PDF, ZIP, and other file types
|
||||
|
||||
Attachment size limit: **100 MB**.
|
||||
|
||||
### Typing Indicators
|
||||
|
||||
The bot sends typing indicators while processing messages, refreshing every 8 seconds.
|
||||
|
||||
### Phone Number Redaction
|
||||
|
||||
All phone numbers are automatically redacted in logs:
|
||||
- `+15551234567` → `+155****4567`
|
||||
- This applies to both Hermes gateway logs and the global redaction system
|
||||
|
||||
### Note to Self (Single-Number Setup)
|
||||
|
||||
If you run signal-cli as a **linked secondary device** on your own phone number (rather than a separate bot number), you can interact with Hermes through Signal's "Note to Self" feature.
|
||||
|
||||
Just send a message to yourself from your phone — signal-cli picks it up and Hermes responds in the same conversation.
|
||||
|
||||
**How it works:**
|
||||
- "Note to Self" messages arrive as `syncMessage.sentMessage` envelopes
|
||||
- The adapter detects when these are addressed to the bot's own account and processes them as regular inbound messages
|
||||
- Echo-back protection (sent-timestamp tracking) prevents infinite loops — the bot's own replies are filtered out automatically
|
||||
|
||||
**No extra configuration needed.** This works automatically as long as `SIGNAL_ACCOUNT` matches your phone number.
|
||||
|
||||
### Health Monitoring
|
||||
|
||||
The adapter monitors the SSE connection and automatically reconnects if:
|
||||
- The connection drops (with exponential backoff: 2s → 60s)
|
||||
- No activity is detected for 120 seconds (pings signal-cli to verify)
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
| Problem | Solution |
|
||||
|---------|----------|
|
||||
| **"Cannot reach signal-cli"** during setup | Ensure signal-cli daemon is running: `signal-cli --account +YOUR_NUMBER daemon --http 127.0.0.1:8080` |
|
||||
| **Messages not received** | Check that `SIGNAL_ALLOWED_USERS` includes the sender's number in E.164 format (with `+` prefix) |
|
||||
| **"signal-cli not found on PATH"** | Install signal-cli and ensure it's in your PATH, or use Docker |
|
||||
| **Connection keeps dropping** | Check signal-cli logs for errors. Ensure Java 17+ is installed. |
|
||||
| **Group messages ignored** | Configure `SIGNAL_GROUP_ALLOWED_USERS` with specific group IDs, or `*` to allow all groups. |
|
||||
| **Bot responds to no one** | Configure `SIGNAL_ALLOWED_USERS`, use DM pairing, or explicitly allow all users through gateway policy if you want broader access. |
|
||||
| **Duplicate messages** | Ensure only one signal-cli instance is listening on your phone number |
|
||||
|
||||
---
|
||||
|
||||
## Security
|
||||
|
||||
:::warning
|
||||
**Always configure access controls.** The bot has terminal access by default. Without `SIGNAL_ALLOWED_USERS` or DM pairing, the gateway denies all incoming messages as a safety measure.
|
||||
:::
|
||||
|
||||
- Phone numbers are redacted in all log output
|
||||
- Use DM pairing or explicit allowlists for safe onboarding of new users
|
||||
- Keep groups disabled unless you specifically need group support, or allowlist only the groups you trust
|
||||
- Signal's end-to-end encryption protects message content in transit
|
||||
- The signal-cli session data in `~/.local/share/signal-cli/` contains account credentials — protect it like a password
|
||||
|
||||
---
|
||||
|
||||
## Environment Variables Reference
|
||||
|
||||
| Variable | Required | Default | Description |
|
||||
|----------|----------|---------|-------------|
|
||||
| `SIGNAL_HTTP_URL` | Yes | — | signal-cli HTTP endpoint |
|
||||
| `SIGNAL_ACCOUNT` | Yes | — | Bot phone number (E.164) |
|
||||
| `SIGNAL_ALLOWED_USERS` | No | — | Comma-separated phone numbers/UUIDs |
|
||||
| `SIGNAL_GROUP_ALLOWED_USERS` | No | — | Group IDs to monitor, or `*` for all (omit to disable groups) |
|
||||
| `SIGNAL_ALLOW_ALL_USERS` | No | `false` | Allow any user to interact (skip allowlist) |
|
||||
| `SIGNAL_HOME_CHANNEL` | No | — | Default delivery target for cron jobs |
|
||||
274
hermes_code/website/docs/user-guide/messaging/slack.md
Normal file
274
hermes_code/website/docs/user-guide/messaging/slack.md
Normal file
|
|
@ -0,0 +1,274 @@
|
|||
---
|
||||
sidebar_position: 4
|
||||
title: "Slack"
|
||||
description: "Set up Hermes Agent as a Slack bot using Socket Mode"
|
||||
---
|
||||
|
||||
# Slack Setup
|
||||
|
||||
Connect Hermes Agent to Slack as a bot using Socket Mode. Socket Mode uses WebSockets instead of
|
||||
public HTTP endpoints, so your Hermes instance doesn't need to be publicly accessible — it works
|
||||
behind firewalls, on your laptop, or on a private server.
|
||||
|
||||
:::warning Classic Slack Apps Deprecated
|
||||
Classic Slack apps (using RTM API) were **fully deprecated in March 2025**. Hermes uses the modern
|
||||
Bolt SDK with Socket Mode. If you have an old classic app, you must create a new one following
|
||||
the steps below.
|
||||
:::
|
||||
|
||||
## Overview
|
||||
|
||||
| Component | Value |
|
||||
|-----------|-------|
|
||||
| **Library** | `slack-bolt` / `slack_sdk` for Python (Socket Mode) |
|
||||
| **Connection** | WebSocket — no public URL required |
|
||||
| **Auth tokens needed** | Bot Token (`xoxb-`) + App-Level Token (`xapp-`) |
|
||||
| **User identification** | Slack Member IDs (e.g., `U01ABC2DEF3`) |
|
||||
|
||||
---
|
||||
|
||||
## Step 1: Create a Slack App
|
||||
|
||||
1. Go to [https://api.slack.com/apps](https://api.slack.com/apps)
|
||||
2. Click **Create New App**
|
||||
3. Choose **From scratch**
|
||||
4. Enter an app name (e.g., "Hermes Agent") and select your workspace
|
||||
5. Click **Create App**
|
||||
|
||||
You'll land on the app's **Basic Information** page.
|
||||
|
||||
---
|
||||
|
||||
## Step 2: Configure Bot Token Scopes
|
||||
|
||||
Navigate to **Features → OAuth & Permissions** in the sidebar. Scroll to **Scopes → Bot Token Scopes** and add the following:
|
||||
|
||||
| Scope | Purpose |
|
||||
|-------|---------|
|
||||
| `chat:write` | Send messages as the bot |
|
||||
| `app_mentions:read` | Detect when @mentioned in channels |
|
||||
| `channels:history` | Read messages in public channels the bot is in |
|
||||
| `channels:read` | List and get info about public channels |
|
||||
| `groups:history` | Read messages in private channels the bot is invited to |
|
||||
| `im:history` | Read direct message history |
|
||||
| `im:read` | View basic DM info |
|
||||
| `im:write` | Open and manage DMs |
|
||||
| `users:read` | Look up user information |
|
||||
| `files:write` | Upload files (images, audio, documents) |
|
||||
|
||||
:::caution Missing scopes = missing features
|
||||
Without `channels:history` and `groups:history`, the bot **will not receive messages in channels** —
|
||||
it will only work in DMs. These are the most commonly missed scopes.
|
||||
:::
|
||||
|
||||
**Optional scopes:**
|
||||
|
||||
| Scope | Purpose |
|
||||
|-------|---------|
|
||||
| `groups:read` | List and get info about private channels |
|
||||
|
||||
---
|
||||
|
||||
## Step 3: Enable Socket Mode
|
||||
|
||||
Socket Mode lets the bot connect via WebSocket instead of requiring a public URL.
|
||||
|
||||
1. In the sidebar, go to **Settings → Socket Mode**
|
||||
2. Toggle **Enable Socket Mode** to ON
|
||||
3. You'll be prompted to create an **App-Level Token**:
|
||||
- Name it something like `hermes-socket` (the name doesn't matter)
|
||||
- Add the **`connections:write`** scope
|
||||
- Click **Generate**
|
||||
4. **Copy the token** — it starts with `xapp-`. This is your `SLACK_APP_TOKEN`
|
||||
|
||||
:::tip
|
||||
You can always find or regenerate app-level tokens under **Settings → Basic Information → App-Level Tokens**.
|
||||
:::
|
||||
|
||||
---
|
||||
|
||||
## Step 4: Subscribe to Events
|
||||
|
||||
This step is critical — it controls what messages the bot can see.
|
||||
|
||||
|
||||
1. In the sidebar, go to **Features → Event Subscriptions**
|
||||
2. Toggle **Enable Events** to ON
|
||||
3. Expand **Subscribe to bot events** and add:
|
||||
|
||||
| Event | Required? | Purpose |
|
||||
|-------|-----------|---------|
|
||||
| `message.im` | **Yes** | Bot receives direct messages |
|
||||
| `message.channels` | **Yes** | Bot receives messages in **public** channels it's added to |
|
||||
| `message.groups` | **Recommended** | Bot receives messages in **private** channels it's invited to |
|
||||
| `app_mention` | **Yes** | Prevents Bolt SDK errors when bot is @mentioned |
|
||||
|
||||
4. Click **Save Changes** at the bottom of the page
|
||||
|
||||
:::danger Missing event subscriptions is the #1 setup issue
|
||||
If the bot works in DMs but **not in channels**, you almost certainly forgot to add
|
||||
`message.channels` (for public channels) and/or `message.groups` (for private channels).
|
||||
Without these events, Slack simply never delivers channel messages to the bot.
|
||||
:::
|
||||
|
||||
|
||||
---
|
||||
|
||||
## Step 5: Install App to Workspace
|
||||
|
||||
1. In the sidebar, go to **Settings → Install App**
|
||||
2. Click **Install to Workspace**
|
||||
3. Review the permissions and click **Allow**
|
||||
4. After authorization, you'll see a **Bot User OAuth Token** starting with `xoxb-`
|
||||
5. **Copy this token** — this is your `SLACK_BOT_TOKEN`
|
||||
|
||||
:::tip
|
||||
If you change scopes or event subscriptions later, you **must reinstall the app** for the changes
|
||||
to take effect. The Install App page will show a banner prompting you to do so.
|
||||
:::
|
||||
|
||||
---
|
||||
|
||||
## Step 6: Find User IDs for the Allowlist
|
||||
|
||||
Hermes uses Slack **Member IDs** (not usernames or display names) for the allowlist.
|
||||
|
||||
To find a Member ID:
|
||||
|
||||
1. In Slack, click on the user's name or avatar
|
||||
2. Click **View full profile**
|
||||
3. Click the **⋮** (more) button
|
||||
4. Select **Copy member ID**
|
||||
|
||||
Member IDs look like `U01ABC2DEF3`. You need your own Member ID at minimum.
|
||||
|
||||
---
|
||||
|
||||
## Step 7: Configure Hermes
|
||||
|
||||
Add the following to your `~/.hermes/.env` file:
|
||||
|
||||
```bash
|
||||
# Required
|
||||
SLACK_BOT_TOKEN=xoxb-your-bot-token-here
|
||||
SLACK_APP_TOKEN=xapp-your-app-token-here
|
||||
SLACK_ALLOWED_USERS=U01ABC2DEF3 # Comma-separated Member IDs
|
||||
|
||||
# Optional
|
||||
SLACK_HOME_CHANNEL=C01234567890 # Default channel for cron/scheduled messages
|
||||
SLACK_HOME_CHANNEL_NAME=general # Human-readable name for the home channel (optional)
|
||||
```
|
||||
|
||||
Or run the interactive setup:
|
||||
|
||||
```bash
|
||||
hermes gateway setup # Select Slack when prompted
|
||||
```
|
||||
|
||||
Then start the gateway:
|
||||
|
||||
```bash
|
||||
hermes gateway # Foreground
|
||||
hermes gateway install # Install as a user service
|
||||
sudo hermes gateway install --system # Linux only: boot-time system service
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Step 8: Invite the Bot to Channels
|
||||
|
||||
After starting the gateway, you need to **invite the bot** to any channel where you want it to respond:
|
||||
|
||||
```
|
||||
/invite @Hermes Agent
|
||||
```
|
||||
|
||||
The bot will **not** automatically join channels. You must invite it to each channel individually.
|
||||
|
||||
---
|
||||
|
||||
## How the Bot Responds
|
||||
|
||||
Understanding how Hermes behaves in different contexts:
|
||||
|
||||
| Context | Behavior |
|
||||
|---------|----------|
|
||||
| **DMs** | Bot responds to every message — no @mention needed |
|
||||
| **Channels** | Bot **only responds when @mentioned** (e.g., `@Hermes Agent what time is it?`). In channels, Hermes replies in a thread attached to that message. |
|
||||
| **Threads** | If you @mention Hermes inside an existing thread, it replies in that same thread. |
|
||||
|
||||
:::tip
|
||||
In channels, always @mention the bot. Simply typing a message without mentioning it will be ignored.
|
||||
This is intentional — it prevents the bot from responding to every message in busy channels.
|
||||
:::
|
||||
|
||||
---
|
||||
|
||||
|
||||
## Home Channel
|
||||
|
||||
Set `SLACK_HOME_CHANNEL` to a channel ID where Hermes will deliver scheduled messages,
|
||||
cron job results, and other proactive notifications. To find a channel ID:
|
||||
|
||||
1. Right-click the channel name in Slack
|
||||
2. Click **View channel details**
|
||||
3. Scroll to the bottom — the Channel ID is shown there
|
||||
|
||||
```bash
|
||||
SLACK_HOME_CHANNEL=C01234567890
|
||||
```
|
||||
|
||||
Make sure the bot has been **invited to the channel** (`/invite @Hermes Agent`).
|
||||
|
||||
---
|
||||
|
||||
## Voice Messages
|
||||
|
||||
Hermes supports voice on Slack:
|
||||
|
||||
- **Incoming:** Voice/audio messages are automatically transcribed using the configured STT provider: local `faster-whisper`, Groq Whisper (`GROQ_API_KEY`), or OpenAI Whisper (`VOICE_TOOLS_OPENAI_KEY`)
|
||||
- **Outgoing:** TTS responses are sent as audio file attachments
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
| Problem | Solution |
|
||||
|---------|----------|
|
||||
| Bot doesn't respond to DMs | Verify `message.im` is in your event subscriptions and the app is reinstalled |
|
||||
| Bot works in DMs but not in channels | **Most common issue.** Add `message.channels` and `message.groups` to event subscriptions, reinstall the app, and invite the bot to the channel with `/invite @Hermes Agent` |
|
||||
| Bot doesn't respond to @mentions in channels | 1) Check `message.channels` event is subscribed. 2) Bot must be invited to the channel. 3) Ensure `channels:history` scope is added. 4) Reinstall the app after scope/event changes |
|
||||
| Bot ignores messages in private channels | Add both the `message.groups` event subscription and `groups:history` scope, then reinstall the app and `/invite` the bot |
|
||||
| "not_authed" or "invalid_auth" errors | Regenerate your Bot Token and App Token, update `.env` |
|
||||
| Bot responds but can't post in a channel | Invite the bot to the channel with `/invite @Hermes Agent` |
|
||||
| "missing_scope" error | Add the required scope in OAuth & Permissions, then **reinstall** the app |
|
||||
| Socket disconnects frequently | Check your network; Bolt auto-reconnects but unstable connections cause lag |
|
||||
| Changed scopes/events but nothing changed | You **must reinstall** the app to your workspace after any scope or event subscription change |
|
||||
|
||||
### Quick Checklist
|
||||
|
||||
If the bot isn't working in channels, verify **all** of the following:
|
||||
|
||||
1. ✅ `message.channels` event is subscribed (for public channels)
|
||||
2. ✅ `message.groups` event is subscribed (for private channels)
|
||||
3. ✅ `app_mention` event is subscribed
|
||||
4. ✅ `channels:history` scope is added (for public channels)
|
||||
5. ✅ `groups:history` scope is added (for private channels)
|
||||
6. ✅ App was **reinstalled** after adding scopes/events
|
||||
7. ✅ Bot was **invited** to the channel (`/invite @Hermes Agent`)
|
||||
8. ✅ You are **@mentioning** the bot in your message
|
||||
|
||||
---
|
||||
|
||||
## Security
|
||||
|
||||
:::warning
|
||||
**Always set `SLACK_ALLOWED_USERS`** with the Member IDs of authorized users. Without this setting,
|
||||
the gateway will **deny all messages** by default as a safety measure. Never share your bot tokens —
|
||||
treat them like passwords.
|
||||
:::
|
||||
|
||||
- Tokens should be stored in `~/.hermes/.env` (file permissions `600`)
|
||||
- Rotate tokens periodically via the Slack app settings
|
||||
- Audit who has access to your Hermes config directory
|
||||
- Socket Mode means no public endpoint is exposed — one less attack surface
|
||||
175
hermes_code/website/docs/user-guide/messaging/sms.md
Normal file
175
hermes_code/website/docs/user-guide/messaging/sms.md
Normal file
|
|
@ -0,0 +1,175 @@
|
|||
---
|
||||
sidebar_position: 8
|
||||
title: "SMS (Twilio)"
|
||||
description: "Set up Hermes Agent as an SMS chatbot via Twilio"
|
||||
---
|
||||
|
||||
# SMS Setup (Twilio)
|
||||
|
||||
Hermes connects to SMS through the [Twilio](https://www.twilio.com/) API. People text your Twilio phone number and get AI responses back — same conversational experience as Telegram or Discord, but over standard text messages.
|
||||
|
||||
:::info Shared Credentials
|
||||
The SMS gateway shares credentials with the optional [telephony skill](/docs/reference/skills-catalog). If you've already set up Twilio for voice calls or one-off SMS, the gateway works with the same `TWILIO_ACCOUNT_SID`, `TWILIO_AUTH_TOKEN`, and `TWILIO_PHONE_NUMBER`.
|
||||
:::
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- **Twilio account** — [Sign up at twilio.com](https://www.twilio.com/try-twilio) (free trial available)
|
||||
- **A Twilio phone number** with SMS capability
|
||||
- **A publicly accessible server** — Twilio sends webhooks to your server when SMS arrives
|
||||
- **aiohttp** — `pip install 'hermes-agent[sms]'`
|
||||
|
||||
---
|
||||
|
||||
## Step 1: Get Your Twilio Credentials
|
||||
|
||||
1. Go to the [Twilio Console](https://console.twilio.com/)
|
||||
2. Copy your **Account SID** and **Auth Token** from the dashboard
|
||||
3. Go to **Phone Numbers → Manage → Active Numbers** — note your phone number in E.164 format (e.g., `+15551234567`)
|
||||
|
||||
---
|
||||
|
||||
## Step 2: Configure Hermes
|
||||
|
||||
### Interactive setup (recommended)
|
||||
|
||||
```bash
|
||||
hermes gateway setup
|
||||
```
|
||||
|
||||
Select **SMS (Twilio)** from the platform list. The wizard will prompt for your credentials.
|
||||
|
||||
### Manual setup
|
||||
|
||||
Add to `~/.hermes/.env`:
|
||||
|
||||
```bash
|
||||
TWILIO_ACCOUNT_SID=ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
|
||||
TWILIO_AUTH_TOKEN=your_auth_token_here
|
||||
TWILIO_PHONE_NUMBER=+15551234567
|
||||
|
||||
# Security: restrict to specific phone numbers (recommended)
|
||||
SMS_ALLOWED_USERS=+15559876543,+15551112222
|
||||
|
||||
# Optional: set a home channel for cron job delivery
|
||||
SMS_HOME_CHANNEL=+15559876543
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Step 3: Configure Twilio Webhook
|
||||
|
||||
Twilio needs to know where to send incoming messages. In the [Twilio Console](https://console.twilio.com/):
|
||||
|
||||
1. Go to **Phone Numbers → Manage → Active Numbers**
|
||||
2. Click your phone number
|
||||
3. Under **Messaging → A MESSAGE COMES IN**, set:
|
||||
- **Webhook**: `https://your-server:8080/webhooks/twilio`
|
||||
- **HTTP Method**: `POST`
|
||||
|
||||
:::tip Exposing Your Webhook
|
||||
If you're running Hermes locally, use a tunnel to expose the webhook:
|
||||
|
||||
```bash
|
||||
# Using cloudflared
|
||||
cloudflared tunnel --url http://localhost:8080
|
||||
|
||||
# Using ngrok
|
||||
ngrok http 8080
|
||||
```
|
||||
|
||||
Set the resulting public URL as your Twilio webhook.
|
||||
:::
|
||||
|
||||
The webhook port defaults to `8080`. Override with:
|
||||
|
||||
```bash
|
||||
SMS_WEBHOOK_PORT=3000
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Step 4: Start the Gateway
|
||||
|
||||
```bash
|
||||
hermes gateway
|
||||
```
|
||||
|
||||
You should see:
|
||||
|
||||
```
|
||||
[sms] Twilio webhook server listening on port 8080, from: +1555***4567
|
||||
```
|
||||
|
||||
Text your Twilio number — Hermes will respond via SMS.
|
||||
|
||||
---
|
||||
|
||||
## Environment Variables
|
||||
|
||||
| Variable | Required | Description |
|
||||
|----------|----------|-------------|
|
||||
| `TWILIO_ACCOUNT_SID` | Yes | Twilio Account SID (starts with `AC`) |
|
||||
| `TWILIO_AUTH_TOKEN` | Yes | Twilio Auth Token |
|
||||
| `TWILIO_PHONE_NUMBER` | Yes | Your Twilio phone number (E.164 format) |
|
||||
| `SMS_WEBHOOK_PORT` | No | Webhook listener port (default: `8080`) |
|
||||
| `SMS_ALLOWED_USERS` | No | Comma-separated E.164 phone numbers allowed to chat |
|
||||
| `SMS_ALLOW_ALL_USERS` | No | Set to `true` to allow anyone (not recommended) |
|
||||
| `SMS_HOME_CHANNEL` | No | Phone number for cron job / notification delivery |
|
||||
| `SMS_HOME_CHANNEL_NAME` | No | Display name for the home channel (default: `Home`) |
|
||||
|
||||
---
|
||||
|
||||
## SMS-Specific Behavior
|
||||
|
||||
- **Plain text only** — Markdown is automatically stripped since SMS renders it as literal characters
|
||||
- **1600 character limit** — Longer responses are split across multiple messages at natural boundaries (newlines, then spaces)
|
||||
- **Echo prevention** — Messages from your own Twilio number are ignored to prevent loops
|
||||
- **Phone number redaction** — Phone numbers are redacted in logs for privacy
|
||||
|
||||
---
|
||||
|
||||
## Security
|
||||
|
||||
**The gateway denies all users by default.** Configure an allowlist:
|
||||
|
||||
```bash
|
||||
# Recommended: restrict to specific phone numbers
|
||||
SMS_ALLOWED_USERS=+15559876543,+15551112222
|
||||
|
||||
# Or allow all (NOT recommended for bots with terminal access)
|
||||
SMS_ALLOW_ALL_USERS=true
|
||||
```
|
||||
|
||||
:::warning
|
||||
SMS has no built-in encryption. Don't use SMS for sensitive operations unless you understand the security implications. For sensitive use cases, prefer Signal or Telegram.
|
||||
:::
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Messages not arriving
|
||||
|
||||
1. Check your Twilio webhook URL is correct and publicly accessible
|
||||
2. Verify `TWILIO_ACCOUNT_SID` and `TWILIO_AUTH_TOKEN` are correct
|
||||
3. Check the Twilio Console → **Monitor → Logs → Messaging** for delivery errors
|
||||
4. Ensure your phone number is in `SMS_ALLOWED_USERS` (or `SMS_ALLOW_ALL_USERS=true`)
|
||||
|
||||
### Replies not sending
|
||||
|
||||
1. Check `TWILIO_PHONE_NUMBER` is set correctly (E.164 format with `+`)
|
||||
2. Verify your Twilio account has SMS-capable numbers
|
||||
3. Check Hermes gateway logs for Twilio API errors
|
||||
|
||||
### Webhook port conflicts
|
||||
|
||||
If port 8080 is already in use, change it:
|
||||
|
||||
```bash
|
||||
SMS_WEBHOOK_PORT=3001
|
||||
```
|
||||
|
||||
Update the webhook URL in Twilio Console to match.
|
||||
200
hermes_code/website/docs/user-guide/messaging/telegram.md
Normal file
200
hermes_code/website/docs/user-guide/messaging/telegram.md
Normal file
|
|
@ -0,0 +1,200 @@
|
|||
---
|
||||
sidebar_position: 1
|
||||
title: "Telegram"
|
||||
description: "Set up Hermes Agent as a Telegram bot"
|
||||
---
|
||||
|
||||
# Telegram Setup
|
||||
|
||||
Hermes Agent integrates with Telegram as a full-featured conversational bot. Once connected, you can chat with your agent from any device, send voice memos that get auto-transcribed, receive scheduled task results, and use the agent in group chats. The integration is built on [python-telegram-bot](https://python-telegram-bot.org/) and supports text, voice, images, and file attachments.
|
||||
|
||||
## Step 1: Create a Bot via BotFather
|
||||
|
||||
Every Telegram bot requires an API token issued by [@BotFather](https://t.me/BotFather), Telegram's official bot management tool.
|
||||
|
||||
1. Open Telegram and search for **@BotFather**, or visit [t.me/BotFather](https://t.me/BotFather)
|
||||
2. Send `/newbot`
|
||||
3. Choose a **display name** (e.g., "Hermes Agent") — this can be anything
|
||||
4. Choose a **username** — this must be unique and end in `bot` (e.g., `my_hermes_bot`)
|
||||
5. BotFather replies with your **API token**. It looks like this:
|
||||
|
||||
```
|
||||
123456789:ABCdefGHIjklMNOpqrSTUvwxYZ
|
||||
```
|
||||
|
||||
:::warning
|
||||
Keep your bot token secret. Anyone with this token can control your bot. If it leaks, revoke it immediately via `/revoke` in BotFather.
|
||||
:::
|
||||
|
||||
## Step 2: Customize Your Bot (Optional)
|
||||
|
||||
These BotFather commands improve the user experience. Message @BotFather and use:
|
||||
|
||||
| Command | Purpose |
|
||||
|---------|---------|
|
||||
| `/setdescription` | The "What can this bot do?" text shown before a user starts chatting |
|
||||
| `/setabouttext` | Short text on the bot's profile page |
|
||||
| `/setuserpic` | Upload an avatar for your bot |
|
||||
| `/setcommands` | Define the command menu (the `/` button in chat) |
|
||||
| `/setprivacy` | Control whether the bot sees all group messages (see Step 3) |
|
||||
|
||||
:::tip
|
||||
For `/setcommands`, a useful starting set:
|
||||
|
||||
```
|
||||
help - Show help information
|
||||
new - Start a new conversation
|
||||
sethome - Set this chat as the home channel
|
||||
```
|
||||
:::
|
||||
|
||||
## Step 3: Privacy Mode (Critical for Groups)
|
||||
|
||||
Telegram bots have a **privacy mode** that is **enabled by default**. This is the single most common source of confusion when using bots in groups.
|
||||
|
||||
**With privacy mode ON**, your bot can only see:
|
||||
- Messages that start with a `/` command
|
||||
- Replies directly to the bot's own messages
|
||||
- Service messages (member joins/leaves, pinned messages, etc.)
|
||||
- Messages in channels where the bot is an admin
|
||||
|
||||
**With privacy mode OFF**, the bot receives every message in the group.
|
||||
|
||||
### How to disable privacy mode
|
||||
|
||||
1. Message **@BotFather**
|
||||
2. Send `/mybots`
|
||||
3. Select your bot
|
||||
4. Go to **Bot Settings → Group Privacy → Turn off**
|
||||
|
||||
:::warning
|
||||
**You must remove and re-add the bot to any group** after changing the privacy setting. Telegram caches the privacy state when a bot joins a group, and it will not update until the bot is removed and re-added.
|
||||
:::
|
||||
|
||||
:::tip
|
||||
An alternative to disabling privacy mode: promote the bot to **group admin**. Admin bots always receive all messages regardless of the privacy setting, and this avoids needing to toggle the global privacy mode.
|
||||
:::
|
||||
|
||||
## Step 4: Find Your User ID
|
||||
|
||||
Hermes Agent uses numeric Telegram user IDs to control access. Your user ID is **not** your username — it's a number like `123456789`.
|
||||
|
||||
**Method 1 (recommended):** Message [@userinfobot](https://t.me/userinfobot) — it instantly replies with your user ID.
|
||||
|
||||
**Method 2:** Message [@get_id_bot](https://t.me/get_id_bot) — another reliable option.
|
||||
|
||||
Save this number; you'll need it for the next step.
|
||||
|
||||
## Step 5: Configure Hermes
|
||||
|
||||
### Option A: Interactive Setup (Recommended)
|
||||
|
||||
```bash
|
||||
hermes gateway setup
|
||||
```
|
||||
|
||||
Select **Telegram** when prompted. The wizard asks for your bot token and allowed user IDs, then writes the configuration for you.
|
||||
|
||||
### Option B: Manual Configuration
|
||||
|
||||
Add the following to `~/.hermes/.env`:
|
||||
|
||||
```bash
|
||||
TELEGRAM_BOT_TOKEN=123456789:ABCdefGHIjklMNOpqrSTUvwxYZ
|
||||
TELEGRAM_ALLOWED_USERS=123456789 # Comma-separated for multiple users
|
||||
```
|
||||
|
||||
### Start the Gateway
|
||||
|
||||
```bash
|
||||
hermes gateway
|
||||
```
|
||||
|
||||
The bot should come online within seconds. Send it a message on Telegram to verify.
|
||||
|
||||
## Home Channel
|
||||
|
||||
Use the `/sethome` command in any Telegram chat (DM or group) to designate it as the **home channel**. Scheduled tasks (cron jobs) deliver their results to this channel.
|
||||
|
||||
You can also set it manually in `~/.hermes/.env`:
|
||||
|
||||
```bash
|
||||
TELEGRAM_HOME_CHANNEL=-1001234567890
|
||||
TELEGRAM_HOME_CHANNEL_NAME="My Notes"
|
||||
```
|
||||
|
||||
:::tip
|
||||
Group chat IDs are negative numbers (e.g., `-1001234567890`). Your personal DM chat ID is the same as your user ID.
|
||||
:::
|
||||
|
||||
## Voice Messages
|
||||
|
||||
### Incoming Voice (Speech-to-Text)
|
||||
|
||||
Voice messages you send on Telegram are automatically transcribed by Hermes's configured STT provider and injected as text into the conversation.
|
||||
|
||||
- `local` uses `faster-whisper` on the machine running Hermes — no API key required
|
||||
- `groq` uses Groq Whisper and requires `GROQ_API_KEY`
|
||||
- `openai` uses OpenAI Whisper and requires `VOICE_TOOLS_OPENAI_KEY`
|
||||
|
||||
### Outgoing Voice (Text-to-Speech)
|
||||
|
||||
When the agent generates audio via TTS, it's delivered as native Telegram **voice bubbles** — the round, inline-playable kind.
|
||||
|
||||
- **OpenAI and ElevenLabs** produce Opus natively — no extra setup needed
|
||||
- **Edge TTS** (the default free provider) outputs MP3 and requires **ffmpeg** to convert to Opus:
|
||||
|
||||
```bash
|
||||
# Ubuntu/Debian
|
||||
sudo apt install ffmpeg
|
||||
|
||||
# macOS
|
||||
brew install ffmpeg
|
||||
```
|
||||
|
||||
Without ffmpeg, Edge TTS audio is sent as a regular audio file (still playable, but uses the rectangular player instead of a voice bubble).
|
||||
|
||||
Configure the TTS provider in your `config.yaml` under the `tts.provider` key.
|
||||
|
||||
## Group Chat Usage
|
||||
|
||||
Hermes Agent works in Telegram group chats with a few considerations:
|
||||
|
||||
- **Privacy mode** determines what messages the bot can see (see [Step 3](#step-3-privacy-mode-critical-for-groups))
|
||||
- When privacy mode is on, **@mention the bot** (e.g., `@my_hermes_bot what's the weather?`) or **reply to its messages** to interact
|
||||
- When privacy mode is off (or bot is admin), the bot sees all messages and can participate naturally
|
||||
- `TELEGRAM_ALLOWED_USERS` still applies — only authorized users can trigger the bot, even in groups
|
||||
|
||||
## Recent Bot API Features (2024–2025)
|
||||
|
||||
- **Privacy policy:** Telegram now requires bots to have a privacy policy. Set one via BotFather with `/setprivacy_policy`, or Telegram may auto-generate a placeholder. This is particularly important if your bot is public-facing.
|
||||
- **Message streaming:** Bot API 9.x added support for streaming long responses, which can improve perceived latency for lengthy agent replies.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
| Problem | Solution |
|
||||
|---------|----------|
|
||||
| Bot not responding at all | Verify `TELEGRAM_BOT_TOKEN` is correct. Check `hermes gateway` logs for errors. |
|
||||
| Bot responds with "unauthorized" | Your user ID is not in `TELEGRAM_ALLOWED_USERS`. Double-check with @userinfobot. |
|
||||
| Bot ignores group messages | Privacy mode is likely on. Disable it (Step 3) or make the bot a group admin. **Remember to remove and re-add the bot after changing privacy.** |
|
||||
| Voice messages not transcribed | Verify STT is available: install `faster-whisper` for local transcription, or set `GROQ_API_KEY` / `VOICE_TOOLS_OPENAI_KEY` in `~/.hermes/.env`. |
|
||||
| Voice replies are files, not bubbles | Install `ffmpeg` (needed for Edge TTS Opus conversion). |
|
||||
| Bot token revoked/invalid | Generate a new token via `/revoke` then `/newbot` or `/token` in BotFather. Update your `.env` file. |
|
||||
|
||||
## Exec Approval
|
||||
|
||||
When the agent tries to run a potentially dangerous command, it asks you for approval in the chat:
|
||||
|
||||
> ⚠️ This command is potentially dangerous (recursive delete). Reply "yes" to approve.
|
||||
|
||||
Reply "yes"/"y" to approve or "no"/"n" to deny.
|
||||
|
||||
## Security
|
||||
|
||||
:::warning
|
||||
Always set `TELEGRAM_ALLOWED_USERS` to restrict who can interact with your bot. Without it, the gateway denies all users by default as a safety measure.
|
||||
:::
|
||||
|
||||
Never share your bot token publicly. If compromised, revoke it immediately via BotFather's `/revoke` command.
|
||||
|
||||
For more details, see the [Security documentation](/user-guide/security). You can also use [DM pairing](/user-guide/messaging#dm-pairing-alternative-to-allowlists) for a more dynamic approach to user authorization.
|
||||
310
hermes_code/website/docs/user-guide/messaging/webhooks.md
Normal file
310
hermes_code/website/docs/user-guide/messaging/webhooks.md
Normal file
|
|
@ -0,0 +1,310 @@
|
|||
---
|
||||
sidebar_position: 13
|
||||
title: "Webhooks"
|
||||
description: "Receive events from GitHub, GitLab, and other services to trigger Hermes agent runs"
|
||||
---
|
||||
|
||||
# Webhooks
|
||||
|
||||
Receive events from external services (GitHub, GitLab, JIRA, Stripe, etc.) and trigger Hermes agent runs automatically. The webhook adapter runs an HTTP server that accepts POST requests, validates HMAC signatures, transforms payloads into agent prompts, and routes responses back to the source or to another configured platform.
|
||||
|
||||
The agent processes the event and can respond by posting comments on PRs, sending messages to Telegram/Discord, or logging the result.
|
||||
|
||||
---
|
||||
|
||||
## Quick Start
|
||||
|
||||
1. Enable via `hermes gateway setup` or environment variables
|
||||
2. Define webhook routes in `config.yaml`
|
||||
3. Point your service at `http://your-server:8644/webhooks/<route-name>`
|
||||
|
||||
---
|
||||
|
||||
## Setup
|
||||
|
||||
There are two ways to enable the webhook adapter.
|
||||
|
||||
### Via setup wizard
|
||||
|
||||
```bash
|
||||
hermes gateway setup
|
||||
```
|
||||
|
||||
Follow the prompts to enable webhooks, set the port, and set a global HMAC secret.
|
||||
|
||||
### Via environment variables
|
||||
|
||||
Add to `~/.hermes/.env`:
|
||||
|
||||
```bash
|
||||
WEBHOOK_ENABLED=true
|
||||
WEBHOOK_PORT=8644 # default
|
||||
WEBHOOK_SECRET=your-global-secret
|
||||
```
|
||||
|
||||
### Verify the server
|
||||
|
||||
Once the gateway is running:
|
||||
|
||||
```bash
|
||||
curl http://localhost:8644/health
|
||||
```
|
||||
|
||||
Expected response:
|
||||
|
||||
```json
|
||||
{"status": "ok", "platform": "webhook"}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Configuring Routes {#configuring-routes}
|
||||
|
||||
Routes define how different webhook sources are handled. Each route is a named entry under `platforms.webhook.extra.routes` in your `config.yaml`.
|
||||
|
||||
### Route properties
|
||||
|
||||
| Property | Required | Description |
|
||||
|----------|----------|-------------|
|
||||
| `events` | No | List of event types to accept (e.g. `["pull_request"]`). If empty, all events are accepted. Event type is read from `X-GitHub-Event`, `X-GitLab-Event`, or `event_type` in the payload. |
|
||||
| `secret` | **Yes** | HMAC secret for signature validation. Falls back to the global `secret` if not set on the route. Set to `"INSECURE_NO_AUTH"` for testing only (skips validation). |
|
||||
| `prompt` | No | Template string with dot-notation payload access (e.g. `{pull_request.title}`). If omitted, the full JSON payload is dumped into the prompt. |
|
||||
| `skills` | No | List of skill names to load for the agent run. |
|
||||
| `deliver` | No | Where to send the response: `github_comment`, `telegram`, `discord`, `slack`, `signal`, `sms`, or `log` (default). |
|
||||
| `deliver_extra` | No | Additional delivery config — keys depend on `deliver` type (e.g. `repo`, `pr_number`, `chat_id`). Values support the same `{dot.notation}` templates as `prompt`. |
|
||||
|
||||
### Full example
|
||||
|
||||
```yaml
|
||||
platforms:
|
||||
webhook:
|
||||
enabled: true
|
||||
extra:
|
||||
port: 8644
|
||||
secret: "global-fallback-secret"
|
||||
routes:
|
||||
github-pr:
|
||||
events: ["pull_request"]
|
||||
secret: "github-webhook-secret"
|
||||
prompt: |
|
||||
Review this pull request:
|
||||
Repository: {repository.full_name}
|
||||
PR #{number}: {pull_request.title}
|
||||
Author: {pull_request.user.login}
|
||||
URL: {pull_request.html_url}
|
||||
Diff URL: {pull_request.diff_url}
|
||||
Action: {action}
|
||||
skills: ["github-code-review"]
|
||||
deliver: "github_comment"
|
||||
deliver_extra:
|
||||
repo: "{repository.full_name}"
|
||||
pr_number: "{number}"
|
||||
deploy-notify:
|
||||
events: ["push"]
|
||||
secret: "deploy-secret"
|
||||
prompt: "New push to {repository.full_name} branch {ref}: {head_commit.message}"
|
||||
deliver: "telegram"
|
||||
```
|
||||
|
||||
### Prompt Templates
|
||||
|
||||
Prompts use dot-notation to access nested fields in the webhook payload:
|
||||
|
||||
- `{pull_request.title}` resolves to `payload["pull_request"]["title"]`
|
||||
- `{repository.full_name}` resolves to `payload["repository"]["full_name"]`
|
||||
- Missing keys are left as the literal `{key}` string (no error)
|
||||
- Nested dicts and lists are JSON-serialized and truncated at 2000 characters
|
||||
|
||||
If no `prompt` template is configured for a route, the entire payload is dumped as indented JSON (truncated at 4000 characters).
|
||||
|
||||
The same dot-notation templates work in `deliver_extra` values.
|
||||
|
||||
---
|
||||
|
||||
## GitHub PR Review (Step by Step) {#github-pr-review}
|
||||
|
||||
This walkthrough sets up automatic code review on every pull request.
|
||||
|
||||
### 1. Create the webhook in GitHub
|
||||
|
||||
1. Go to your repository → **Settings** → **Webhooks** → **Add webhook**
|
||||
2. Set **Payload URL** to `http://your-server:8644/webhooks/github-pr`
|
||||
3. Set **Content type** to `application/json`
|
||||
4. Set **Secret** to match your route config (e.g. `github-webhook-secret`)
|
||||
5. Under **Which events?**, select **Let me select individual events** and check **Pull requests**
|
||||
6. Click **Add webhook**
|
||||
|
||||
### 2. Add the route config
|
||||
|
||||
Add the `github-pr` route to your `~/.hermes/config.yaml` as shown in the example above.
|
||||
|
||||
### 3. Ensure `gh` CLI is authenticated
|
||||
|
||||
The `github_comment` delivery type uses the GitHub CLI to post comments:
|
||||
|
||||
```bash
|
||||
gh auth login
|
||||
```
|
||||
|
||||
### 4. Test it
|
||||
|
||||
Open a pull request on the repository. The webhook fires, Hermes processes the event, and posts a review comment on the PR.
|
||||
|
||||
---
|
||||
|
||||
## GitLab Webhook Setup {#gitlab-webhook-setup}
|
||||
|
||||
GitLab webhooks work similarly but use a different authentication mechanism. GitLab sends the secret as a plain `X-Gitlab-Token` header (exact string match, not HMAC).
|
||||
|
||||
### 1. Create the webhook in GitLab
|
||||
|
||||
1. Go to your project → **Settings** → **Webhooks**
|
||||
2. Set the **URL** to `http://your-server:8644/webhooks/gitlab-mr`
|
||||
3. Enter your **Secret token**
|
||||
4. Select **Merge request events** (and any other events you want)
|
||||
5. Click **Add webhook**
|
||||
|
||||
### 2. Add the route config
|
||||
|
||||
```yaml
|
||||
platforms:
|
||||
webhook:
|
||||
enabled: true
|
||||
extra:
|
||||
routes:
|
||||
gitlab-mr:
|
||||
events: ["merge_request"]
|
||||
secret: "your-gitlab-secret-token"
|
||||
prompt: |
|
||||
Review this merge request:
|
||||
Project: {project.path_with_namespace}
|
||||
MR !{object_attributes.iid}: {object_attributes.title}
|
||||
Author: {object_attributes.last_commit.author.name}
|
||||
URL: {object_attributes.url}
|
||||
Action: {object_attributes.action}
|
||||
deliver: "log"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Delivery Options {#delivery-options}
|
||||
|
||||
The `deliver` field controls where the agent's response goes after processing the webhook event.
|
||||
|
||||
| Deliver Type | Description |
|
||||
|-------------|-------------|
|
||||
| `log` | Logs the response to the gateway log output. This is the default and is useful for testing. |
|
||||
| `github_comment` | Posts the response as a PR/issue comment via the `gh` CLI. Requires `deliver_extra.repo` and `deliver_extra.pr_number`. The `gh` CLI must be installed and authenticated on the gateway host (`gh auth login`). |
|
||||
| `telegram` | Routes the response to Telegram. Uses the home channel, or specify `chat_id` in `deliver_extra`. |
|
||||
| `discord` | Routes the response to Discord. Uses the home channel, or specify `chat_id` in `deliver_extra`. |
|
||||
| `slack` | Routes the response to Slack. Uses the home channel, or specify `chat_id` in `deliver_extra`. |
|
||||
| `signal` | Routes the response to Signal. Uses the home channel, or specify `chat_id` in `deliver_extra`. |
|
||||
| `sms` | Routes the response to SMS via Twilio. Uses the home channel, or specify `chat_id` in `deliver_extra`. |
|
||||
|
||||
For cross-platform delivery (telegram, discord, slack, signal, sms), the target platform must also be enabled and connected in the gateway. If no `chat_id` is provided in `deliver_extra`, the response is sent to that platform's configured home channel.
|
||||
|
||||
---
|
||||
|
||||
## Security {#security}
|
||||
|
||||
The webhook adapter includes multiple layers of security:
|
||||
|
||||
### HMAC signature validation
|
||||
|
||||
The adapter validates incoming webhook signatures using the appropriate method for each source:
|
||||
|
||||
- **GitHub**: `X-Hub-Signature-256` header — HMAC-SHA256 hex digest prefixed with `sha256=`
|
||||
- **GitLab**: `X-Gitlab-Token` header — plain secret string match
|
||||
- **Generic**: `X-Webhook-Signature` header — raw HMAC-SHA256 hex digest
|
||||
|
||||
If a secret is configured but no recognized signature header is present, the request is rejected.
|
||||
|
||||
### Secret is required
|
||||
|
||||
Every route must have a secret — either set directly on the route or inherited from the global `secret`. Routes without a secret cause the adapter to fail at startup with an error. For development/testing only, you can set the secret to `"INSECURE_NO_AUTH"` to skip validation entirely.
|
||||
|
||||
### Rate limiting
|
||||
|
||||
Each route is rate-limited to **30 requests per minute** by default (fixed-window). Configure this globally:
|
||||
|
||||
```yaml
|
||||
platforms:
|
||||
webhook:
|
||||
extra:
|
||||
rate_limit: 60 # requests per minute
|
||||
```
|
||||
|
||||
Requests exceeding the limit receive a `429 Too Many Requests` response.
|
||||
|
||||
### Idempotency
|
||||
|
||||
Delivery IDs (from `X-GitHub-Delivery`, `X-Request-ID`, or a timestamp fallback) are cached for **1 hour**. Duplicate deliveries (e.g. webhook retries) are silently skipped with a `200` response, preventing duplicate agent runs.
|
||||
|
||||
### Body size limits
|
||||
|
||||
Payloads exceeding **1 MB** are rejected before the body is read. Configure this:
|
||||
|
||||
```yaml
|
||||
platforms:
|
||||
webhook:
|
||||
extra:
|
||||
max_body_bytes: 2097152 # 2 MB
|
||||
```
|
||||
|
||||
### Prompt injection risk
|
||||
|
||||
:::warning
|
||||
Webhook payloads contain attacker-controlled data — PR titles, commit messages, issue descriptions, etc. can all contain malicious instructions. Run the gateway in a sandboxed environment (Docker, VM) when exposed to the internet. Consider using the Docker or SSH terminal backend for isolation.
|
||||
:::
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting {#troubleshooting}
|
||||
|
||||
### Webhook not arriving
|
||||
|
||||
- Verify the port is exposed and accessible from the webhook source
|
||||
- Check firewall rules — port `8644` (or your configured port) must be open
|
||||
- Verify the URL path matches: `http://your-server:8644/webhooks/<route-name>`
|
||||
- Use the `/health` endpoint to confirm the server is running
|
||||
|
||||
### Signature validation failing
|
||||
|
||||
- Ensure the secret in your route config exactly matches the secret configured in the webhook source
|
||||
- For GitHub, the secret is HMAC-based — check `X-Hub-Signature-256`
|
||||
- For GitLab, the secret is a plain token match — check `X-Gitlab-Token`
|
||||
- Check gateway logs for `Invalid signature` warnings
|
||||
|
||||
### Event being ignored
|
||||
|
||||
- Check that the event type is in your route's `events` list
|
||||
- GitHub events use values like `pull_request`, `push`, `issues` (the `X-GitHub-Event` header value)
|
||||
- GitLab events use values like `merge_request`, `push` (the `X-GitLab-Event` header value)
|
||||
- If `events` is empty or not set, all events are accepted
|
||||
|
||||
### Agent not responding
|
||||
|
||||
- Run the gateway in foreground to see logs: `hermes gateway run`
|
||||
- Check that the prompt template is rendering correctly
|
||||
- Verify the delivery target is configured and connected
|
||||
|
||||
### Duplicate responses
|
||||
|
||||
- The idempotency cache should prevent this — check that the webhook source is sending a delivery ID header (`X-GitHub-Delivery` or `X-Request-ID`)
|
||||
- Delivery IDs are cached for 1 hour
|
||||
|
||||
### `gh` CLI errors (GitHub comment delivery)
|
||||
|
||||
- Run `gh auth login` on the gateway host
|
||||
- Ensure the authenticated GitHub user has write access to the repository
|
||||
- Check that `gh` is installed and on the PATH
|
||||
|
||||
---
|
||||
|
||||
## Environment Variables {#environment-variables}
|
||||
|
||||
| Variable | Description | Default |
|
||||
|----------|-------------|---------|
|
||||
| `WEBHOOK_ENABLED` | Enable the webhook platform adapter | `false` |
|
||||
| `WEBHOOK_PORT` | HTTP server port for receiving webhooks | `8644` |
|
||||
| `WEBHOOK_SECRET` | Global HMAC secret (used as fallback when routes don't specify their own) | _(none)_ |
|
||||
200
hermes_code/website/docs/user-guide/messaging/whatsapp.md
Normal file
200
hermes_code/website/docs/user-guide/messaging/whatsapp.md
Normal file
|
|
@ -0,0 +1,200 @@
|
|||
---
|
||||
sidebar_position: 5
|
||||
title: "WhatsApp"
|
||||
description: "Set up Hermes Agent as a WhatsApp bot via the built-in Baileys bridge"
|
||||
---
|
||||
|
||||
# WhatsApp Setup
|
||||
|
||||
Hermes connects to WhatsApp through a built-in bridge based on **Baileys**. This works by emulating a WhatsApp Web session — **not** through the official WhatsApp Business API. No Meta developer account or Business verification is required.
|
||||
|
||||
:::warning Unofficial API — Ban Risk
|
||||
WhatsApp does **not** officially support third-party bots outside the Business API. Using a third-party bridge carries a small risk of account restrictions. To minimize risk:
|
||||
- **Use a dedicated phone number** for the bot (not your personal number)
|
||||
- **Don't send bulk/spam messages** — keep usage conversational
|
||||
- **Don't automate outbound messaging** to people who haven't messaged first
|
||||
:::
|
||||
|
||||
:::warning WhatsApp Web Protocol Updates
|
||||
WhatsApp periodically updates their Web protocol, which can temporarily break compatibility
|
||||
with third-party bridges. When this happens, Hermes will update the bridge dependency. If the
|
||||
bot stops working after a WhatsApp update, pull the latest Hermes version and re-pair.
|
||||
:::
|
||||
|
||||
## Two Modes
|
||||
|
||||
| Mode | How it works | Best for |
|
||||
|------|-------------|----------|
|
||||
| **Separate bot number** (recommended) | Dedicate a phone number to the bot. People message that number directly. | Clean UX, multiple users, lower ban risk |
|
||||
| **Personal self-chat** | Use your own WhatsApp. You message yourself to talk to the agent. | Quick setup, single user, testing |
|
||||
|
||||
---
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- **Node.js v18+** and **npm** — the WhatsApp bridge runs as a Node.js process
|
||||
- **A phone with WhatsApp** installed (for scanning the QR code)
|
||||
|
||||
Unlike older browser-driven bridges, the current Baileys-based bridge does **not** require a local Chromium or Puppeteer dependency stack.
|
||||
|
||||
---
|
||||
|
||||
## Step 1: Run the Setup Wizard
|
||||
|
||||
```bash
|
||||
hermes whatsapp
|
||||
```
|
||||
|
||||
The wizard will:
|
||||
|
||||
1. Ask which mode you want (**bot** or **self-chat**)
|
||||
2. Install bridge dependencies if needed
|
||||
3. Display a **QR code** in your terminal
|
||||
4. Wait for you to scan it
|
||||
|
||||
**To scan the QR code:**
|
||||
|
||||
1. Open WhatsApp on your phone
|
||||
2. Go to **Settings → Linked Devices**
|
||||
3. Tap **Link a Device**
|
||||
4. Point your camera at the terminal QR code
|
||||
|
||||
Once paired, the wizard confirms the connection and exits. Your session is saved automatically.
|
||||
|
||||
:::tip
|
||||
If the QR code looks garbled, make sure your terminal is at least 60 columns wide and supports
|
||||
Unicode. You can also try a different terminal emulator.
|
||||
:::
|
||||
|
||||
---
|
||||
|
||||
## Step 2: Getting a Second Phone Number (Bot Mode)
|
||||
|
||||
For bot mode, you need a phone number that isn't already registered with WhatsApp. Three options:
|
||||
|
||||
| Option | Cost | Notes |
|
||||
|--------|------|-------|
|
||||
| **Google Voice** | Free | US only. Get a number at [voice.google.com](https://voice.google.com). Verify WhatsApp via SMS through the Google Voice app. |
|
||||
| **Prepaid SIM** | $5–15 one-time | Any carrier. Activate, verify WhatsApp, then the SIM can sit in a drawer. Number must stay active (make a call every 90 days). |
|
||||
| **VoIP services** | Free–$5/month | TextNow, TextFree, or similar. Some VoIP numbers are blocked by WhatsApp — try a few if the first doesn't work. |
|
||||
|
||||
After getting the number:
|
||||
|
||||
1. Install WhatsApp on a phone (or use WhatsApp Business app with dual-SIM)
|
||||
2. Register the new number with WhatsApp
|
||||
3. Run `hermes whatsapp` and scan the QR code from that WhatsApp account
|
||||
|
||||
---
|
||||
|
||||
## Step 3: Configure Hermes
|
||||
|
||||
Add the following to your `~/.hermes/.env` file:
|
||||
|
||||
```bash
|
||||
# Required
|
||||
WHATSAPP_ENABLED=true
|
||||
WHATSAPP_MODE=bot # "bot" or "self-chat"
|
||||
WHATSAPP_ALLOWED_USERS=15551234567 # Comma-separated phone numbers (with country code, no +)
|
||||
```
|
||||
|
||||
Optional behavior settings in `~/.hermes/config.yaml`:
|
||||
|
||||
```yaml
|
||||
unauthorized_dm_behavior: pair
|
||||
|
||||
whatsapp:
|
||||
unauthorized_dm_behavior: ignore
|
||||
```
|
||||
|
||||
- `unauthorized_dm_behavior: pair` is the global default. Unknown DM senders get a pairing code.
|
||||
- `whatsapp.unauthorized_dm_behavior: ignore` makes WhatsApp stay silent for unauthorized DMs, which is usually the better choice for a private number.
|
||||
|
||||
Then start the gateway:
|
||||
|
||||
```bash
|
||||
hermes gateway # Foreground
|
||||
hermes gateway install # Install as a user service
|
||||
sudo hermes gateway install --system # Linux only: boot-time system service
|
||||
```
|
||||
|
||||
The gateway starts the WhatsApp bridge automatically using the saved session.
|
||||
|
||||
---
|
||||
|
||||
## Session Persistence
|
||||
|
||||
The Baileys bridge saves its session under `~/.hermes/whatsapp/session`. This means:
|
||||
|
||||
- **Sessions survive restarts** — you don't need to re-scan the QR code every time
|
||||
- The session data includes encryption keys and device credentials
|
||||
- **Do not share or commit this session directory** — it grants full access to the WhatsApp account
|
||||
|
||||
---
|
||||
|
||||
## Re-pairing
|
||||
|
||||
If the session breaks (phone reset, WhatsApp update, manually unlinked), you'll see connection
|
||||
errors in the gateway logs. To fix it:
|
||||
|
||||
```bash
|
||||
hermes whatsapp
|
||||
```
|
||||
|
||||
This generates a fresh QR code. Scan it again and the session is re-established. The gateway
|
||||
handles **temporary** disconnections (network blips, phone going offline briefly) automatically
|
||||
with reconnection logic.
|
||||
|
||||
---
|
||||
|
||||
## Voice Messages
|
||||
|
||||
Hermes supports voice on WhatsApp:
|
||||
|
||||
- **Incoming:** Voice messages (`.ogg` opus) are automatically transcribed using the configured STT provider: local `faster-whisper`, Groq Whisper (`GROQ_API_KEY`), or OpenAI Whisper (`VOICE_TOOLS_OPENAI_KEY`)
|
||||
- **Outgoing:** TTS responses are sent as MP3 audio file attachments
|
||||
- Agent responses are prefixed with "⚕ **Hermes Agent**" by default. You can customize or disable this in `config.yaml`:
|
||||
|
||||
```yaml
|
||||
# ~/.hermes/config.yaml
|
||||
whatsapp:
|
||||
reply_prefix: "" # Empty string disables the header
|
||||
# reply_prefix: "🤖 *My Bot*\n──────\n" # Custom prefix (supports \n for newlines)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
| Problem | Solution |
|
||||
|---------|----------|
|
||||
| **QR code not scanning** | Ensure terminal is wide enough (60+ columns). Try a different terminal. Make sure you're scanning from the correct WhatsApp account (bot number, not personal). |
|
||||
| **QR code expires** | QR codes refresh every ~20 seconds. If it times out, restart `hermes whatsapp`. |
|
||||
| **Session not persisting** | Check that `~/.hermes/whatsapp/session` exists and is writable. If containerized, mount it as a persistent volume. |
|
||||
| **Logged out unexpectedly** | WhatsApp unlinks devices after long inactivity. Keep the phone on and connected to the network, then re-pair with `hermes whatsapp` if needed. |
|
||||
| **Bridge crashes or reconnect loops** | Restart the gateway, update Hermes, and re-pair if the session was invalidated by a WhatsApp protocol change. |
|
||||
| **Bot stops working after WhatsApp update** | Update Hermes to get the latest bridge version, then re-pair. |
|
||||
| **Messages not being received** | Verify `WHATSAPP_ALLOWED_USERS` includes the sender's number (with country code, no `+` or spaces). |
|
||||
| **Bot replies to strangers with a pairing code** | Set `whatsapp.unauthorized_dm_behavior: ignore` in `~/.hermes/config.yaml` if you want unauthorized DMs to be silently ignored instead. |
|
||||
|
||||
---
|
||||
|
||||
## Security
|
||||
|
||||
:::warning
|
||||
**Always set `WHATSAPP_ALLOWED_USERS`** with phone numbers (including country code, without the `+`)
|
||||
of authorized users. Without this setting, the gateway will **deny all incoming messages** as a
|
||||
safety measure.
|
||||
:::
|
||||
|
||||
By default, unauthorized DMs still receive a pairing code reply. If you want a private WhatsApp number to stay completely silent to strangers, set:
|
||||
|
||||
```yaml
|
||||
whatsapp:
|
||||
unauthorized_dm_behavior: ignore
|
||||
```
|
||||
|
||||
- The `~/.hermes/whatsapp/session` directory contains full session credentials — protect it like a password
|
||||
- Set file permissions: `chmod 700 ~/.hermes/whatsapp/session`
|
||||
- Use a **dedicated phone number** for the bot to isolate risk from your personal account
|
||||
- If you suspect compromise, unlink the device from WhatsApp → Settings → Linked Devices
|
||||
- Phone numbers in logs are partially redacted, but review your log retention policy
|
||||
450
hermes_code/website/docs/user-guide/security.md
Normal file
450
hermes_code/website/docs/user-guide/security.md
Normal file
|
|
@ -0,0 +1,450 @@
|
|||
---
|
||||
sidebar_position: 8
|
||||
title: "Security"
|
||||
description: "Security model, dangerous command approval, user authorization, container isolation, and production deployment best practices"
|
||||
---
|
||||
|
||||
# Security
|
||||
|
||||
Hermes Agent is designed with a defense-in-depth security model. This page covers every security boundary — from command approval to container isolation to user authorization on messaging platforms.
|
||||
|
||||
## Overview
|
||||
|
||||
The security model has five layers:
|
||||
|
||||
1. **User authorization** — who can talk to the agent (allowlists, DM pairing)
|
||||
2. **Dangerous command approval** — human-in-the-loop for destructive operations
|
||||
3. **Container isolation** — Docker/Singularity/Modal sandboxing with hardened settings
|
||||
4. **MCP credential filtering** — environment variable isolation for MCP subprocesses
|
||||
5. **Context file scanning** — prompt injection detection in project files
|
||||
|
||||
## Dangerous Command Approval
|
||||
|
||||
Before executing any command, Hermes checks it against a curated list of dangerous patterns. If a match is found, the user must explicitly approve it.
|
||||
|
||||
### What Triggers Approval
|
||||
|
||||
The following patterns trigger approval prompts (defined in `tools/approval.py`):
|
||||
|
||||
| Pattern | Description |
|
||||
|---------|-------------|
|
||||
| `rm -r` / `rm --recursive` | Recursive delete |
|
||||
| `rm ... /` | Delete in root path |
|
||||
| `chmod 777` | World-writable permissions |
|
||||
| `mkfs` | Format filesystem |
|
||||
| `dd if=` | Disk copy |
|
||||
| `DROP TABLE/DATABASE` | SQL DROP |
|
||||
| `DELETE FROM` (without WHERE) | SQL DELETE without WHERE |
|
||||
| `TRUNCATE TABLE` | SQL TRUNCATE |
|
||||
| `> /etc/` | Overwrite system config |
|
||||
| `systemctl stop/disable/mask` | Stop/disable system services |
|
||||
| `kill -9 -1` | Kill all processes |
|
||||
| `curl ... \| sh` | Pipe remote content to shell |
|
||||
| `bash -c`, `python -e` | Shell/script execution via flags |
|
||||
| `find -exec rm`, `find -delete` | Find with destructive actions |
|
||||
| Fork bomb patterns | Fork bombs |
|
||||
|
||||
:::info
|
||||
**Container bypass**: When running in `docker`, `singularity`, `modal`, or `daytona` backends, dangerous command checks are **skipped** because the container itself is the security boundary. Destructive commands inside a container can't harm the host.
|
||||
:::
|
||||
|
||||
### Approval Flow (CLI)
|
||||
|
||||
In the interactive CLI, dangerous commands show an inline approval prompt:
|
||||
|
||||
```
|
||||
⚠️ DANGEROUS COMMAND: recursive delete
|
||||
rm -rf /tmp/old-project
|
||||
|
||||
[o]nce | [s]ession | [a]lways | [d]eny
|
||||
|
||||
Choice [o/s/a/D]:
|
||||
```
|
||||
|
||||
The four options:
|
||||
|
||||
- **once** — allow this single execution
|
||||
- **session** — allow this pattern for the rest of the session
|
||||
- **always** — add to permanent allowlist (saved to `config.yaml`)
|
||||
- **deny** (default) — block the command
|
||||
|
||||
### Approval Flow (Gateway/Messaging)
|
||||
|
||||
On messaging platforms, the agent sends the dangerous command details to the chat and waits for the user to reply:
|
||||
|
||||
- Reply **yes**, **y**, **approve**, **ok**, or **go** to approve
|
||||
- Reply **no**, **n**, **deny**, or **cancel** to deny
|
||||
|
||||
The `HERMES_EXEC_ASK=1` environment variable is automatically set when running the gateway.
|
||||
|
||||
### Permanent Allowlist
|
||||
|
||||
Commands approved with "always" are saved to `~/.hermes/config.yaml`:
|
||||
|
||||
```yaml
|
||||
# Permanently allowed dangerous command patterns
|
||||
command_allowlist:
|
||||
- rm
|
||||
- systemctl
|
||||
```
|
||||
|
||||
These patterns are loaded at startup and silently approved in all future sessions.
|
||||
|
||||
:::tip
|
||||
Use `hermes config edit` to review or remove patterns from your permanent allowlist.
|
||||
:::
|
||||
|
||||
## User Authorization (Gateway)
|
||||
|
||||
When running the messaging gateway, Hermes controls who can interact with the bot through a layered authorization system.
|
||||
|
||||
### Authorization Check Order
|
||||
|
||||
The `_is_user_authorized()` method checks in this order:
|
||||
|
||||
1. **Per-platform allow-all flag** (e.g., `DISCORD_ALLOW_ALL_USERS=true`)
|
||||
2. **DM pairing approved list** (users approved via pairing codes)
|
||||
3. **Platform-specific allowlists** (e.g., `TELEGRAM_ALLOWED_USERS=12345,67890`)
|
||||
4. **Global allowlist** (`GATEWAY_ALLOWED_USERS=12345,67890`)
|
||||
5. **Global allow-all** (`GATEWAY_ALLOW_ALL_USERS=true`)
|
||||
6. **Default: deny**
|
||||
|
||||
### Platform Allowlists
|
||||
|
||||
Set allowed user IDs as comma-separated values in `~/.hermes/.env`:
|
||||
|
||||
```bash
|
||||
# Platform-specific allowlists
|
||||
TELEGRAM_ALLOWED_USERS=123456789,987654321
|
||||
DISCORD_ALLOWED_USERS=111222333444555666
|
||||
WHATSAPP_ALLOWED_USERS=15551234567
|
||||
SLACK_ALLOWED_USERS=U01ABC123
|
||||
|
||||
# Cross-platform allowlist (checked for all platforms)
|
||||
GATEWAY_ALLOWED_USERS=123456789
|
||||
|
||||
# Per-platform allow-all (use with caution)
|
||||
DISCORD_ALLOW_ALL_USERS=true
|
||||
|
||||
# Global allow-all (use with extreme caution)
|
||||
GATEWAY_ALLOW_ALL_USERS=true
|
||||
```
|
||||
|
||||
:::warning
|
||||
If **no allowlists are configured** and `GATEWAY_ALLOW_ALL_USERS` is not set, **all users are denied**. The gateway logs a warning at startup:
|
||||
|
||||
```
|
||||
No user allowlists configured. All unauthorized users will be denied.
|
||||
Set GATEWAY_ALLOW_ALL_USERS=true in ~/.hermes/.env to allow open access,
|
||||
or configure platform allowlists (e.g., TELEGRAM_ALLOWED_USERS=your_id).
|
||||
```
|
||||
:::
|
||||
|
||||
### DM Pairing System
|
||||
|
||||
For more flexible authorization, Hermes includes a code-based pairing system. Instead of requiring user IDs upfront, unknown users receive a one-time pairing code that the bot owner approves via the CLI.
|
||||
|
||||
**How it works:**
|
||||
|
||||
1. An unknown user sends a DM to the bot
|
||||
2. The bot replies with an 8-character pairing code
|
||||
3. The bot owner runs `hermes pairing approve <platform> <code>` on the CLI
|
||||
4. The user is permanently approved for that platform
|
||||
|
||||
Control how unauthorized direct messages are handled in `~/.hermes/config.yaml`:
|
||||
|
||||
```yaml
|
||||
unauthorized_dm_behavior: pair
|
||||
|
||||
whatsapp:
|
||||
unauthorized_dm_behavior: ignore
|
||||
```
|
||||
|
||||
- `pair` is the default. Unauthorized DMs get a pairing code reply.
|
||||
- `ignore` silently drops unauthorized DMs.
|
||||
- Platform sections override the global default, so you can keep pairing on Telegram while keeping WhatsApp silent.
|
||||
|
||||
**Security features** (based on OWASP + NIST SP 800-63-4 guidance):
|
||||
|
||||
| Feature | Details |
|
||||
|---------|---------|
|
||||
| Code format | 8-char from 32-char unambiguous alphabet (no 0/O/1/I) |
|
||||
| Randomness | Cryptographic (`secrets.choice()`) |
|
||||
| Code TTL | 1 hour expiry |
|
||||
| Rate limiting | 1 request per user per 10 minutes |
|
||||
| Pending limit | Max 3 pending codes per platform |
|
||||
| Lockout | 5 failed approval attempts → 1-hour lockout |
|
||||
| File security | `chmod 0600` on all pairing data files |
|
||||
| Logging | Codes are never logged to stdout |
|
||||
|
||||
**Pairing CLI commands:**
|
||||
|
||||
```bash
|
||||
# List pending and approved users
|
||||
hermes pairing list
|
||||
|
||||
# Approve a pairing code
|
||||
hermes pairing approve telegram ABC12DEF
|
||||
|
||||
# Revoke a user's access
|
||||
hermes pairing revoke telegram 123456789
|
||||
|
||||
# Clear all pending codes
|
||||
hermes pairing clear-pending
|
||||
```
|
||||
|
||||
**Storage:** Pairing data is stored in `~/.hermes/pairing/` with per-platform JSON files:
|
||||
- `{platform}-pending.json` — pending pairing requests
|
||||
- `{platform}-approved.json` — approved users
|
||||
- `_rate_limits.json` — rate limit and lockout tracking
|
||||
|
||||
## Container Isolation
|
||||
|
||||
When using the `docker` terminal backend, Hermes applies strict security hardening to every container.
|
||||
|
||||
### Docker Security Flags
|
||||
|
||||
Every container runs with these flags (defined in `tools/environments/docker.py`):
|
||||
|
||||
```python
|
||||
_SECURITY_ARGS = [
|
||||
"--cap-drop", "ALL", # Drop ALL Linux capabilities
|
||||
"--security-opt", "no-new-privileges", # Block privilege escalation
|
||||
"--pids-limit", "256", # Limit process count
|
||||
"--tmpfs", "/tmp:rw,nosuid,size=512m", # Size-limited /tmp
|
||||
"--tmpfs", "/var/tmp:rw,noexec,nosuid,size=256m", # No-exec /var/tmp
|
||||
"--tmpfs", "/run:rw,noexec,nosuid,size=64m", # No-exec /run
|
||||
]
|
||||
```
|
||||
|
||||
### Resource Limits
|
||||
|
||||
Container resources are configurable in `~/.hermes/config.yaml`:
|
||||
|
||||
```yaml
|
||||
terminal:
|
||||
backend: docker
|
||||
docker_image: "nikolaik/python-nodejs:python3.11-nodejs20"
|
||||
docker_forward_env: [] # Explicit allowlist only; empty keeps secrets out of the container
|
||||
container_cpu: 1 # CPU cores
|
||||
container_memory: 5120 # MB (default 5GB)
|
||||
container_disk: 51200 # MB (default 50GB, requires overlay2 on XFS)
|
||||
container_persistent: true # Persist filesystem across sessions
|
||||
```
|
||||
|
||||
### Filesystem Persistence
|
||||
|
||||
- **Persistent mode** (`container_persistent: true`): Bind-mounts `/workspace` and `/root` from `~/.hermes/sandboxes/docker/<task_id>/`
|
||||
- **Ephemeral mode** (`container_persistent: false`): Uses tmpfs for workspace — everything is lost on cleanup
|
||||
|
||||
:::tip
|
||||
For production gateway deployments, use `docker`, `modal`, or `daytona` backend to isolate agent commands from your host system. This eliminates the need for dangerous command approval entirely.
|
||||
:::
|
||||
|
||||
:::warning
|
||||
If you add names to `terminal.docker_forward_env`, those variables are intentionally injected into the container for terminal commands. This is useful for task-specific credentials like `GITHUB_TOKEN`, but it also means code running in the container can read and exfiltrate them.
|
||||
:::
|
||||
|
||||
## Terminal Backend Security Comparison
|
||||
|
||||
| Backend | Isolation | Dangerous Cmd Check | Best For |
|
||||
|---------|-----------|-------------------|----------|
|
||||
| **local** | None — runs on host | ✅ Yes | Development, trusted users |
|
||||
| **ssh** | Remote machine | ✅ Yes | Running on a separate server |
|
||||
| **docker** | Container | ❌ Skipped (container is boundary) | Production gateway |
|
||||
| **singularity** | Container | ❌ Skipped | HPC environments |
|
||||
| **modal** | Cloud sandbox | ❌ Skipped | Scalable cloud isolation |
|
||||
| **daytona** | Cloud sandbox | ❌ Skipped | Persistent cloud workspaces |
|
||||
|
||||
## Environment Variable Passthrough {#environment-variable-passthrough}
|
||||
|
||||
Both `execute_code` and `terminal` strip sensitive environment variables from child processes to prevent credential exfiltration by LLM-generated code. However, skills that declare `required_environment_variables` legitimately need access to those vars.
|
||||
|
||||
### How It Works
|
||||
|
||||
Two mechanisms allow specific variables through the sandbox filters:
|
||||
|
||||
**1. Skill-scoped passthrough (automatic)**
|
||||
|
||||
When a skill is loaded (via `skill_view` or the `/skill` command) and declares `required_environment_variables`, any of those vars that are actually set in the environment are automatically registered as passthrough. Missing vars (still in setup-needed state) are **not** registered.
|
||||
|
||||
```yaml
|
||||
# In a skill's SKILL.md frontmatter
|
||||
required_environment_variables:
|
||||
- name: TENOR_API_KEY
|
||||
prompt: Tenor API key
|
||||
help: Get a key from https://developers.google.com/tenor
|
||||
```
|
||||
|
||||
After loading this skill, `TENOR_API_KEY` passes through to both `execute_code` and `terminal` subprocesses — no manual configuration needed.
|
||||
|
||||
**2. Config-based passthrough (manual)**
|
||||
|
||||
For env vars not declared by any skill, add them to `terminal.env_passthrough` in `config.yaml`:
|
||||
|
||||
```yaml
|
||||
terminal:
|
||||
env_passthrough:
|
||||
- MY_CUSTOM_KEY
|
||||
- ANOTHER_TOKEN
|
||||
```
|
||||
|
||||
### What Each Sandbox Filters
|
||||
|
||||
| Sandbox | Default Filter | Passthrough Override |
|
||||
|---------|---------------|---------------------|
|
||||
| **execute_code** | Blocks vars containing `KEY`, `TOKEN`, `SECRET`, `PASSWORD`, `CREDENTIAL`, `PASSWD`, `AUTH` in name; only allows safe-prefix vars through | ✅ Passthrough vars bypass both checks |
|
||||
| **terminal** (local) | Blocks explicit Hermes infrastructure vars (provider keys, gateway tokens, tool API keys) | ✅ Passthrough vars bypass the blocklist |
|
||||
| **MCP** | Blocks everything except safe system vars + explicitly configured `env` | ❌ Not affected by passthrough (use MCP `env` config instead) |
|
||||
|
||||
### Security Considerations
|
||||
|
||||
- The passthrough only affects vars you or your skills explicitly declare — the default security posture is unchanged for arbitrary LLM-generated code
|
||||
- Skills Guard scans skill content for suspicious env access patterns before installation
|
||||
- Missing/unset vars are never registered (you can't leak what doesn't exist)
|
||||
- Hermes infrastructure secrets (provider API keys, gateway tokens) should never be added to `env_passthrough` — they have dedicated mechanisms
|
||||
|
||||
## MCP Credential Handling
|
||||
|
||||
MCP (Model Context Protocol) server subprocesses receive a **filtered environment** to prevent accidental credential leakage.
|
||||
|
||||
### Safe Environment Variables
|
||||
|
||||
Only these variables are passed through from the host to MCP stdio subprocesses:
|
||||
|
||||
```
|
||||
PATH, HOME, USER, LANG, LC_ALL, TERM, SHELL, TMPDIR
|
||||
```
|
||||
|
||||
Plus any `XDG_*` variables. All other environment variables (API keys, tokens, secrets) are **stripped**.
|
||||
|
||||
Variables explicitly defined in the MCP server's `env` config are passed through:
|
||||
|
||||
```yaml
|
||||
mcp_servers:
|
||||
github:
|
||||
command: "npx"
|
||||
args: ["-y", "@modelcontextprotocol/server-github"]
|
||||
env:
|
||||
GITHUB_PERSONAL_ACCESS_TOKEN: "ghp_..." # Only this is passed
|
||||
```
|
||||
|
||||
### Credential Redaction
|
||||
|
||||
Error messages from MCP tools are sanitized before being returned to the LLM. The following patterns are replaced with `[REDACTED]`:
|
||||
|
||||
- GitHub PATs (`ghp_...`)
|
||||
- OpenAI-style keys (`sk-...`)
|
||||
- Bearer tokens
|
||||
- `token=`, `key=`, `API_KEY=`, `password=`, `secret=` parameters
|
||||
|
||||
### Website Access Policy
|
||||
|
||||
You can restrict which websites the agent can access through its web and browser tools. This is useful for preventing the agent from accessing internal services, admin panels, or other sensitive URLs.
|
||||
|
||||
```yaml
|
||||
# In ~/.hermes/config.yaml
|
||||
security:
|
||||
website_blocklist:
|
||||
enabled: true
|
||||
domains:
|
||||
- "*.internal.company.com"
|
||||
- "admin.example.com"
|
||||
shared_files:
|
||||
- "/etc/hermes/blocked-sites.txt"
|
||||
```
|
||||
|
||||
When a blocked URL is requested, the tool returns an error explaining the domain is blocked by policy. The blocklist is enforced across `web_search`, `web_extract`, `browser_navigate`, and all URL-capable tools.
|
||||
|
||||
See [Website Blocklist](/docs/user-guide/configuration#website-blocklist) in the configuration guide for full details.
|
||||
|
||||
### SSRF Protection
|
||||
|
||||
All URL-capable tools (web search, web extract, vision, browser) validate URLs before fetching them to prevent Server-Side Request Forgery (SSRF) attacks. Blocked addresses include:
|
||||
|
||||
- **Private networks** (RFC 1918): `10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`
|
||||
- **Loopback**: `127.0.0.0/8`, `::1`
|
||||
- **Link-local**: `169.254.0.0/16` (includes cloud metadata at `169.254.169.254`)
|
||||
- **CGNAT / shared address space** (RFC 6598): `100.64.0.0/10` (Tailscale, WireGuard VPNs)
|
||||
- **Cloud metadata hostnames**: `metadata.google.internal`, `metadata.goog`
|
||||
- **Reserved, multicast, and unspecified addresses**
|
||||
|
||||
SSRF protection is always active and cannot be disabled. DNS failures are treated as blocked (fail-closed). Redirect chains are re-validated at each hop to prevent redirect-based bypasses.
|
||||
|
||||
### Tirith Pre-Exec Security Scanning
|
||||
|
||||
Hermes integrates [tirith](https://github.com/sheeki03/tirith) for content-level command scanning before execution. Tirith detects threats that pattern matching alone misses:
|
||||
|
||||
- Homograph URL spoofing (internationalized domain attacks)
|
||||
- Pipe-to-interpreter patterns (`curl | bash`, `wget | sh`)
|
||||
- Terminal injection attacks
|
||||
|
||||
Tirith auto-installs from GitHub releases on first use with SHA-256 checksum verification (and cosign provenance verification if cosign is available).
|
||||
|
||||
```yaml
|
||||
# In ~/.hermes/config.yaml
|
||||
security:
|
||||
tirith_enabled: true # Enable/disable tirith scanning (default: true)
|
||||
tirith_path: "tirith" # Path to tirith binary (default: PATH lookup)
|
||||
tirith_timeout: 5 # Subprocess timeout in seconds
|
||||
tirith_fail_open: true # Allow execution when tirith is unavailable (default: true)
|
||||
```
|
||||
|
||||
When `tirith_fail_open` is `true` (default), commands proceed if tirith is not installed or times out. Set to `false` in high-security environments to block commands when tirith is unavailable.
|
||||
|
||||
Tirith's verdict integrates with the approval flow: safe commands pass through, suspicious commands trigger user approval, and dangerous commands are blocked.
|
||||
|
||||
### Context File Injection Protection
|
||||
|
||||
Context files (AGENTS.md, .cursorrules, SOUL.md) are scanned for prompt injection before being included in the system prompt. The scanner checks for:
|
||||
|
||||
- Instructions to ignore/disregard prior instructions
|
||||
- Hidden HTML comments with suspicious keywords
|
||||
- Attempts to read secrets (`.env`, `credentials`, `.netrc`)
|
||||
- Credential exfiltration via `curl`
|
||||
- Invisible Unicode characters (zero-width spaces, bidirectional overrides)
|
||||
|
||||
Blocked files show a warning:
|
||||
|
||||
```
|
||||
[BLOCKED: AGENTS.md contained potential prompt injection (prompt_injection). Content not loaded.]
|
||||
```
|
||||
|
||||
## Best Practices for Production Deployment
|
||||
|
||||
### Gateway Deployment Checklist
|
||||
|
||||
1. **Set explicit allowlists** — never use `GATEWAY_ALLOW_ALL_USERS=true` in production
|
||||
2. **Use container backend** — set `terminal.backend: docker` in config.yaml
|
||||
3. **Restrict resource limits** — set appropriate CPU, memory, and disk limits
|
||||
4. **Store secrets securely** — keep API keys in `~/.hermes/.env` with proper file permissions
|
||||
5. **Enable DM pairing** — use pairing codes instead of hardcoding user IDs when possible
|
||||
6. **Review command allowlist** — periodically audit `command_allowlist` in config.yaml
|
||||
7. **Set `MESSAGING_CWD`** — don't let the agent operate from sensitive directories
|
||||
8. **Run as non-root** — never run the gateway as root
|
||||
9. **Monitor logs** — check `~/.hermes/logs/` for unauthorized access attempts
|
||||
10. **Keep updated** — run `hermes update` regularly for security patches
|
||||
|
||||
### Securing API Keys
|
||||
|
||||
```bash
|
||||
# Set proper permissions on the .env file
|
||||
chmod 600 ~/.hermes/.env
|
||||
|
||||
# Keep separate keys for different services
|
||||
# Never commit .env files to version control
|
||||
```
|
||||
|
||||
### Network Isolation
|
||||
|
||||
For maximum security, run the gateway on a separate machine or VM:
|
||||
|
||||
```yaml
|
||||
terminal:
|
||||
backend: ssh
|
||||
ssh_host: "agent-worker.local"
|
||||
ssh_user: "hermes"
|
||||
ssh_key: "~/.ssh/hermes_agent_key"
|
||||
```
|
||||
|
||||
This keeps the gateway's messaging connections separate from the agent's command execution.
|
||||
390
hermes_code/website/docs/user-guide/sessions.md
Normal file
390
hermes_code/website/docs/user-guide/sessions.md
Normal file
|
|
@ -0,0 +1,390 @@
|
|||
---
|
||||
sidebar_position: 7
|
||||
title: "Sessions"
|
||||
description: "Session persistence, resume, search, management, and per-platform session tracking"
|
||||
---
|
||||
|
||||
# Sessions
|
||||
|
||||
Hermes Agent automatically saves every conversation as a session. Sessions enable conversation resume, cross-session search, and full conversation history management.
|
||||
|
||||
## How Sessions Work
|
||||
|
||||
Every conversation — whether from the CLI, Telegram, Discord, WhatsApp, or Slack — is stored as a session with full message history. Sessions are tracked in two complementary systems:
|
||||
|
||||
1. **SQLite database** (`~/.hermes/state.db`) — structured session metadata with FTS5 full-text search
|
||||
2. **JSONL transcripts** (`~/.hermes/sessions/`) — raw conversation transcripts including tool calls (gateway)
|
||||
|
||||
The SQLite database stores:
|
||||
- Session ID, source platform, user ID
|
||||
- **Session title** (unique, human-readable name)
|
||||
- Model name and configuration
|
||||
- System prompt snapshot
|
||||
- Full message history (role, content, tool calls, tool results)
|
||||
- Token counts (input/output)
|
||||
- Timestamps (started_at, ended_at)
|
||||
- Parent session ID (for compression-triggered session splitting)
|
||||
|
||||
### Session Sources
|
||||
|
||||
Each session is tagged with its source platform:
|
||||
|
||||
| Source | Description |
|
||||
|--------|-------------|
|
||||
| `cli` | Interactive CLI (`hermes` or `hermes chat`) |
|
||||
| `telegram` | Telegram messenger |
|
||||
| `discord` | Discord server/DM |
|
||||
| `whatsapp` | WhatsApp messenger |
|
||||
| `slack` | Slack workspace |
|
||||
|
||||
## CLI Session Resume
|
||||
|
||||
Resume previous conversations from the CLI using `--continue` or `--resume`:
|
||||
|
||||
### Continue Last Session
|
||||
|
||||
```bash
|
||||
# Resume the most recent CLI session
|
||||
hermes --continue
|
||||
hermes -c
|
||||
|
||||
# Or with the chat subcommand
|
||||
hermes chat --continue
|
||||
hermes chat -c
|
||||
```
|
||||
|
||||
This looks up the most recent `cli` session from the SQLite database and loads its full conversation history.
|
||||
|
||||
### Resume by Name
|
||||
|
||||
If you've given a session a title (see [Session Naming](#session-naming) below), you can resume it by name:
|
||||
|
||||
```bash
|
||||
# Resume a named session
|
||||
hermes -c "my project"
|
||||
|
||||
# If there are lineage variants (my project, my project #2, my project #3),
|
||||
# this automatically resumes the most recent one
|
||||
hermes -c "my project" # → resumes "my project #3"
|
||||
```
|
||||
|
||||
### Resume Specific Session
|
||||
|
||||
```bash
|
||||
# Resume a specific session by ID
|
||||
hermes --resume 20250305_091523_a1b2c3d4
|
||||
hermes -r 20250305_091523_a1b2c3d4
|
||||
|
||||
# Resume by title
|
||||
hermes --resume "refactoring auth"
|
||||
|
||||
# Or with the chat subcommand
|
||||
hermes chat --resume 20250305_091523_a1b2c3d4
|
||||
```
|
||||
|
||||
Session IDs are shown when you exit a CLI session, and can be found with `hermes sessions list`.
|
||||
|
||||
### Conversation Recap on Resume
|
||||
|
||||
When you resume a session, Hermes displays a compact recap of the previous conversation in a styled panel before the input prompt:
|
||||
|
||||
<img className="docs-terminal-figure" src="/img/docs/session-recap.svg" alt="Stylized preview of the Previous Conversation recap panel shown when resuming a Hermes session." />
|
||||
<p className="docs-figure-caption">Resume mode shows a compact recap panel with recent user and assistant turns before returning you to the live prompt.</p>
|
||||
|
||||
The recap:
|
||||
- Shows **user messages** (gold `●`) and **assistant responses** (green `◆`)
|
||||
- **Truncates** long messages (300 chars for user, 200 chars / 3 lines for assistant)
|
||||
- **Collapses tool calls** to a count with tool names (e.g., `[3 tool calls: terminal, web_search]`)
|
||||
- **Hides** system messages, tool results, and internal reasoning
|
||||
- **Caps** at the last 10 exchanges with a "... N earlier messages ..." indicator
|
||||
- Uses **dim styling** to distinguish from the active conversation
|
||||
|
||||
To disable the recap and keep the minimal one-liner behavior, set in `~/.hermes/config.yaml`:
|
||||
|
||||
```yaml
|
||||
display:
|
||||
resume_display: minimal # default: full
|
||||
```
|
||||
|
||||
:::tip
|
||||
Session IDs follow the format `YYYYMMDD_HHMMSS_<8-char-hex>`, e.g. `20250305_091523_a1b2c3d4`. You can resume by ID or by title — both work with `-c` and `-r`.
|
||||
:::
|
||||
|
||||
## Session Naming
|
||||
|
||||
Give sessions human-readable titles so you can find and resume them easily.
|
||||
|
||||
### Auto-Generated Titles
|
||||
|
||||
Hermes automatically generates a short descriptive title (3–7 words) for each session after the first exchange. This runs in a background thread using a fast auxiliary model, so it adds no latency. You'll see auto-generated titles when browsing sessions with `hermes sessions list` or `hermes sessions browse`.
|
||||
|
||||
Auto-titling only fires once per session and is skipped if you've already set a title manually.
|
||||
|
||||
### Setting a Title Manually
|
||||
|
||||
Use the `/title` slash command inside any chat session (CLI or gateway):
|
||||
|
||||
```
|
||||
/title my research project
|
||||
```
|
||||
|
||||
The title is applied immediately. If the session hasn't been created in the database yet (e.g., you run `/title` before sending your first message), it's queued and applied once the session starts.
|
||||
|
||||
You can also rename existing sessions from the command line:
|
||||
|
||||
```bash
|
||||
hermes sessions rename 20250305_091523_a1b2c3d4 "refactoring auth module"
|
||||
```
|
||||
|
||||
### Title Rules
|
||||
|
||||
- **Unique** — no two sessions can share the same title
|
||||
- **Max 100 characters** — keeps listing output clean
|
||||
- **Sanitized** — control characters, zero-width chars, and RTL overrides are stripped automatically
|
||||
- **Normal Unicode is fine** — emoji, CJK, accented characters all work
|
||||
|
||||
### Auto-Lineage on Compression
|
||||
|
||||
When a session's context is compressed (manually via `/compress` or automatically), Hermes creates a new continuation session. If the original had a title, the new session automatically gets a numbered title:
|
||||
|
||||
```
|
||||
"my project" → "my project #2" → "my project #3"
|
||||
```
|
||||
|
||||
When you resume by name (`hermes -c "my project"`), it automatically picks the most recent session in the lineage.
|
||||
|
||||
### /title in Messaging Platforms
|
||||
|
||||
The `/title` command works in all gateway platforms (Telegram, Discord, Slack, WhatsApp):
|
||||
|
||||
- `/title My Research` — set the session title
|
||||
- `/title` — show the current title
|
||||
|
||||
## Session Management Commands
|
||||
|
||||
Hermes provides a full set of session management commands via `hermes sessions`:
|
||||
|
||||
### List Sessions
|
||||
|
||||
```bash
|
||||
# List recent sessions (default: last 20)
|
||||
hermes sessions list
|
||||
|
||||
# Filter by platform
|
||||
hermes sessions list --source telegram
|
||||
|
||||
# Show more sessions
|
||||
hermes sessions list --limit 50
|
||||
```
|
||||
|
||||
When sessions have titles, the output shows titles, previews, and relative timestamps:
|
||||
|
||||
```
|
||||
Title Preview Last Active ID
|
||||
────────────────────────────────────────────────────────────────────────────────────────────────
|
||||
refactoring auth Help me refactor the auth module please 2h ago 20250305_091523_a
|
||||
my project #3 Can you check the test failures? yesterday 20250304_143022_e
|
||||
— What's the weather in Las Vegas? 3d ago 20250303_101500_f
|
||||
```
|
||||
|
||||
When no sessions have titles, a simpler format is used:
|
||||
|
||||
```
|
||||
Preview Last Active Src ID
|
||||
──────────────────────────────────────────────────────────────────────────────────────
|
||||
Help me refactor the auth module please 2h ago cli 20250305_091523_a
|
||||
What's the weather in Las Vegas? 3d ago tele 20250303_101500_f
|
||||
```
|
||||
|
||||
### Export Sessions
|
||||
|
||||
```bash
|
||||
# Export all sessions to a JSONL file
|
||||
hermes sessions export backup.jsonl
|
||||
|
||||
# Export sessions from a specific platform
|
||||
hermes sessions export telegram-history.jsonl --source telegram
|
||||
|
||||
# Export a single session
|
||||
hermes sessions export session.jsonl --session-id 20250305_091523_a1b2c3d4
|
||||
```
|
||||
|
||||
Exported files contain one JSON object per line with full session metadata and all messages.
|
||||
|
||||
### Delete a Session
|
||||
|
||||
```bash
|
||||
# Delete a specific session (with confirmation)
|
||||
hermes sessions delete 20250305_091523_a1b2c3d4
|
||||
|
||||
# Delete without confirmation
|
||||
hermes sessions delete 20250305_091523_a1b2c3d4 --yes
|
||||
```
|
||||
|
||||
### Rename a Session
|
||||
|
||||
```bash
|
||||
# Set or change a session's title
|
||||
hermes sessions rename 20250305_091523_a1b2c3d4 "debugging auth flow"
|
||||
|
||||
# Multi-word titles don't need quotes in the CLI
|
||||
hermes sessions rename 20250305_091523_a1b2c3d4 debugging auth flow
|
||||
```
|
||||
|
||||
If the title is already in use by another session, an error is shown.
|
||||
|
||||
### Prune Old Sessions
|
||||
|
||||
```bash
|
||||
# Delete ended sessions older than 90 days (default)
|
||||
hermes sessions prune
|
||||
|
||||
# Custom age threshold
|
||||
hermes sessions prune --older-than 30
|
||||
|
||||
# Only prune sessions from a specific platform
|
||||
hermes sessions prune --source telegram --older-than 60
|
||||
|
||||
# Skip confirmation
|
||||
hermes sessions prune --older-than 30 --yes
|
||||
```
|
||||
|
||||
:::info
|
||||
Pruning only deletes **ended** sessions (sessions that have been explicitly ended or auto-reset). Active sessions are never pruned.
|
||||
:::
|
||||
|
||||
### Session Statistics
|
||||
|
||||
```bash
|
||||
hermes sessions stats
|
||||
```
|
||||
|
||||
Output:
|
||||
|
||||
```
|
||||
Total sessions: 142
|
||||
Total messages: 3847
|
||||
cli: 89 sessions
|
||||
telegram: 38 sessions
|
||||
discord: 15 sessions
|
||||
Database size: 12.4 MB
|
||||
```
|
||||
|
||||
For deeper analytics — token usage, cost estimates, tool breakdown, and activity patterns — use [`hermes insights`](/docs/reference/cli-commands#hermes-insights).
|
||||
|
||||
## Session Search Tool
|
||||
|
||||
The agent has a built-in `session_search` tool that performs full-text search across all past conversations using SQLite's FTS5 engine.
|
||||
|
||||
### How It Works
|
||||
|
||||
1. FTS5 searches matching messages ranked by relevance
|
||||
2. Groups results by session, takes the top N unique sessions (default 3)
|
||||
3. Loads each session's conversation, truncates to ~100K chars centered on matches
|
||||
4. Sends to a fast summarization model for focused summaries
|
||||
5. Returns per-session summaries with metadata and surrounding context
|
||||
|
||||
### FTS5 Query Syntax
|
||||
|
||||
The search supports standard FTS5 query syntax:
|
||||
|
||||
- Simple keywords: `docker deployment`
|
||||
- Phrases: `"exact phrase"`
|
||||
- Boolean: `docker OR kubernetes`, `python NOT java`
|
||||
- Prefix: `deploy*`
|
||||
|
||||
### When It's Used
|
||||
|
||||
The agent is prompted to use session search automatically:
|
||||
|
||||
> *"When the user references something from a past conversation or you suspect relevant prior context exists, use session_search to recall it before asking them to repeat themselves."*
|
||||
|
||||
## Per-Platform Session Tracking
|
||||
|
||||
### Gateway Sessions
|
||||
|
||||
On messaging platforms, sessions are keyed by a deterministic session key built from the message source:
|
||||
|
||||
| Chat Type | Default Key Format | Behavior |
|
||||
|-----------|--------------------|----------|
|
||||
| Telegram DM | `agent:main:telegram:dm:<chat_id>` | One session per DM chat |
|
||||
| Discord DM | `agent:main:discord:dm:<chat_id>` | One session per DM chat |
|
||||
| WhatsApp DM | `agent:main:whatsapp:dm:<chat_id>` | One session per DM chat |
|
||||
| Group chat | `agent:main:<platform>:group:<chat_id>:<user_id>` | Per-user inside the group when the platform exposes a user ID |
|
||||
| Group thread/topic | `agent:main:<platform>:group:<chat_id>:<thread_id>:<user_id>` | Per-user inside that thread/topic |
|
||||
| Channel | `agent:main:<platform>:channel:<chat_id>:<user_id>` | Per-user inside the channel when the platform exposes a user ID |
|
||||
|
||||
When Hermes cannot get a participant identifier for a shared chat, it falls back to one shared session for that room.
|
||||
|
||||
### Shared vs Isolated Group Sessions
|
||||
|
||||
By default, Hermes uses `group_sessions_per_user: true` in `config.yaml`. That means:
|
||||
|
||||
- Alice and Bob can both talk to Hermes in the same Discord channel without sharing transcript history
|
||||
- one user's long tool-heavy task does not pollute another user's context window
|
||||
- interrupt handling also stays per-user because the running-agent key matches the isolated session key
|
||||
|
||||
If you want one shared "room brain" instead, set:
|
||||
|
||||
```yaml
|
||||
group_sessions_per_user: false
|
||||
```
|
||||
|
||||
That reverts groups/channels to a single shared session per room, which preserves shared conversational context but also shares token costs, interrupt state, and context growth.
|
||||
|
||||
### Session Reset Policies
|
||||
|
||||
Gateway sessions are automatically reset based on configurable policies:
|
||||
|
||||
- **idle** — reset after N minutes of inactivity
|
||||
- **daily** — reset at a specific hour each day
|
||||
- **both** — reset on whichever comes first (idle or daily)
|
||||
- **none** — never auto-reset
|
||||
|
||||
Before a session is auto-reset, the agent is given a turn to save any important memories or skills from the conversation.
|
||||
|
||||
Sessions with **active background processes** are never auto-reset, regardless of policy.
|
||||
|
||||
## Storage Locations
|
||||
|
||||
| What | Path | Description |
|
||||
|------|------|-------------|
|
||||
| SQLite database | `~/.hermes/state.db` | All session metadata + messages with FTS5 |
|
||||
| Gateway transcripts | `~/.hermes/sessions/` | JSONL transcripts per session + sessions.json index |
|
||||
| Gateway index | `~/.hermes/sessions/sessions.json` | Maps session keys to active session IDs |
|
||||
|
||||
The SQLite database uses WAL mode for concurrent readers and a single writer, which suits the gateway's multi-platform architecture well.
|
||||
|
||||
### Database Schema
|
||||
|
||||
Key tables in `state.db`:
|
||||
|
||||
- **sessions** — session metadata (id, source, user_id, model, title, timestamps, token counts). Titles have a unique index (NULL titles allowed, only non-NULL must be unique).
|
||||
- **messages** — full message history (role, content, tool_calls, tool_name, token_count)
|
||||
- **messages_fts** — FTS5 virtual table for full-text search across message content
|
||||
|
||||
## Session Expiry and Cleanup
|
||||
|
||||
### Automatic Cleanup
|
||||
|
||||
- Gateway sessions auto-reset based on the configured reset policy
|
||||
- Before reset, the agent saves memories and skills from the expiring session
|
||||
- Ended sessions remain in the database until pruned
|
||||
|
||||
### Manual Cleanup
|
||||
|
||||
```bash
|
||||
# Prune sessions older than 90 days
|
||||
hermes sessions prune
|
||||
|
||||
# Delete a specific session
|
||||
hermes sessions delete <session_id>
|
||||
|
||||
# Export before pruning (backup)
|
||||
hermes sessions export backup.jsonl
|
||||
hermes sessions prune --older-than 30 --yes
|
||||
```
|
||||
|
||||
:::tip
|
||||
The database grows slowly (typical: 10-15 MB for hundreds of sessions). Pruning is mainly useful for removing old conversations you no longer need for search recall.
|
||||
:::
|
||||
Loading…
Add table
Add a link
Reference in a new issue