feature/api-subagent-browser-isolation #15
- Introduced a new cron job system allowing users to schedule automated tasks via the CLI, supporting one-time reminders and recurring jobs.
- Added commands for managing cron jobs: `/cron` to list jobs, `/cron add` to create new jobs, and `/cron remove` to delete jobs.
- Implemented job storage in `~/.hermes/cron/jobs.json` with output saved to `~/.hermes/cron/output/{job_id}/{timestamp}.md`.
- Enhanced the CLI and README documentation to include detailed usage instructions and examples for cron job management.
- Integrated cron job tools into the hermes-cli toolset, ensuring they are only available in interactive CLI mode.
- Added support for cron expression parsing with the `croniter` package, enabling flexible scheduling options.

- Changed the default LLM model in the setup wizard and example environment file to 'anthropic/claude-opus-4.6'.
- Updated terminal working directory settings in CLI and related files to use the current directory ('.') instead of '/tmp'.
- Enhanced documentation comments for clarity on terminal configuration and working directory behavior.

- Updated the README to include a new banner image and changed the title emoji from 🦋 to ⚕.
- Modified various CLI outputs and scripts to reflect the new branding, ensuring consistency in the use of the ⚕ emoji.
- Added a new banner image asset for enhanced visual appeal during installation and setup processes.

On macOS, /etc is a symlink to /private/etc. The _is_write_denied() function resolves the input path with os.path.realpath(), but the deny list entries were stored as literal strings ("/etc/shadow"). The resolved path "/private/etc/shadow" therefore never matched, allowing writes to sensitive system files on macOS. Fix: apply os.path.realpath() to deny list entries at module load time so both sides of the comparison use resolved paths.
Adds 19 regression tests in tests/tools/test_write_deny.py.

The sudo password was embedded in shell commands via single-quote interpolation: echo '{password}' | sudo -S. If the password contained shell metacharacters (single quotes, $(), backticks), they would be interpreted by the shell, enabling arbitrary command execution. Fix: use shlex.quote(), which properly escapes all shell-special characters, ensuring the password is always treated as a literal string argument to echo.

When running via the gateway (e.g. Telegram), the session_search tool returned: {"error": "session_search must be handled by the agent loop"}

Root cause:
- gateway/run.py creates AIAgent without passing session_db=
- self._session_db is None in the agent instance
- The dispatch condition "elif function_name == 'session_search' and self._session_db" skips when _session_db is None, falling through to the generic error

This fix:
1. Initializes self._session_db in GatewayRunner.__init__()
2. Passes session_db to all AIAgent instantiations in gateway/run.py
3. Adds a defensive fallback in run_agent.py to return a clear error when session_db is unavailable, instead of falling through

Fixes #105

The OpenAI API returns content: null on assistant messages that only contain tool calls. msg.get('content', '') returns None (not '') when the key exists with value None, causing TypeError on len() and string concatenation in _generate_summary and compress. Fix: msg.get('content') or '' — handles both missing keys and None. Tests from PR #216 (@Farukest). Fix also in PR #215 (@cutepawss). Both PRs had stale branches and couldn't be merged directly. Closes #211

The OpenAI API returns content: null on assistant messages with tool calls. msg.get('content', '') returns None when the key exists with value None, causing TypeError on len(), string concatenation, and .strip() in downstream code paths.
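The `msg.get('content') or ''` pattern in a nutshell (helper name is hypothetical):

```python
def safe_content(msg):
    # msg.get('content', '') still returns None when the key exists with
    # value None (as the OpenAI API sends for tool-call-only assistant
    # messages); `or ''` normalizes both missing and None to ''.
    return msg.get("content") or ""
```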
Fixed 4 locations that process conversation messages:
- agent/auxiliary_client.py:84 — None passed to API calls
- cli.py:1288 — crash on content[:200] and len(content)
- run_agent.py:3444 — crash on None.strip()
- honcho_integration/session.py:445 — 'None' rendered in transcript

13 other instances were verified safe (already protected, only process user/tool messages, or use the safe pattern). Pattern: msg.get('content', '') → msg.get('content') or ''. Fixes #276

The `for pattern in [f"hermes-*{task_id[:8]}*"]` was a loop over a single-element list — replaced with a plain variable.

Connect to external MCP servers via stdio transport, discover their tools at startup, and register them into the hermes-agent tool registry.
- New tools/mcp_tool.py: config loading, server connection via background event loop, tool handler factories, discovery, and graceful shutdown
- model_tools.py: trigger MCP discovery after built-in tool imports
- cli.py: call shutdown_mcp_servers in _run_cleanup
- pyproject.toml: add mcp>=1.2.0 as optional dependency
- 27 unit tests covering config, schema conversion, handlers, registration, SDK interaction, toolset injection, graceful fallback, and shutdown

Config format (in ~/.hermes/config.yaml):

mcp_servers:
  filesystem:
    command: "npx"
    args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]

- Add threading.Lock protecting all shared state (_servers, _mcp_loop, _mcp_thread)
- Fix deadlock in shutdown_mcp_servers: _stop_mcp_loop was called inside a _lock block but also acquires _lock (non-reentrant)
- Fix race condition in _ensure_mcp_loop with concurrent callers
- Change idempotency to per-server (retry failed servers, skip connected)
- Dynamic toolset injection via startswith("hermes-") instead of a hardcoded list
- Parallel shutdown via asyncio.gather instead of a sequential loop
- Add tests for partial failure retry, parallel shutdown, dynamic injection

Git for Windows can fail during clone when copying hook template files from the system
templates directory. The error:

fatal: cannot copy '.../templates/hooks/fsmonitor-watchman.sample' to '.git/hooks/...': Invalid argument

The script already set windows.appendAtomically=false, but only AFTER clone — too late, since clone itself triggers the error. Fix:
- Set git config --global windows.appendAtomically false BEFORE clone
- Add a third fallback: clone with --template='' to skip hook template copying entirely (the hooks are optional .sample files)

In _handle_max_iterations, the codex_responses path set tools=None to prevent tool calls during summarization. However, the OpenAI SDK's _make_tools() treats None as a valid value (not its Omit sentinel) and tries to iterate over it, causing TypeError: 'NoneType' object is not iterable. Fix: use codex_kwargs.pop('tools', None) to remove the key entirely, so the SDK never receives it and uses its default omit behavior. Fixes #300

Refactored image paste internals for testability:
- Extracted _try_attach_clipboard_image() method (clipboard → state)
- Extracted _build_multimodal_content() method (images → OpenAI format)
- chat() now delegates to these instead of inline logic

Tests organized in 4 levels:
- Level 1 (19 tests): Clipboard module — every platform path with realistic subprocess simulation (tools writing files, timeouts, empty files, cleanup on failure)
- Level 2 (8 tests): _build_multimodal_content — base64 encoding, MIME types (png/jpg/webp/unknown), missing files, multiple images, default question for empty text
- Level 3 (5 tests): _try_attach_clipboard_image — state management, counter increment/rollback, naming convention, mixed success/failure
- Level 4 (5 tests): Queue routing — tuple unpacking, command detection, images-only payloads, text-only payloads

model_metadata tests (61 tests, was 39):
- Token estimation: concrete value assertions, unicode, tool_call messages, vision multimodal content, additive verification
- Context length resolution: cache-over-API priority, no-base_url skips cache, missing context_length key in API response
- API metadata fetch: canonical_slug aliasing, TTL expiry with time mock, stale cache fallback on API failure, malformed JSON resilience
- Probe tiers: above-max returns 2M, zero returns None
- Error parsing: Anthropic format ('X > Y maximum'), LM Studio, empty string, unreasonably large numbers — also fixed the parser to handle the Anthropic format
- Cache: corruption resilience (garbage YAML, wrong structure), value updates, special chars in model names

Firecrawl config tests (8 tests, was 4):
- Singleton caching (core purpose — verified constructor called once)
- Constructor failure recovery (retry after exception)
- Return value actually asserted (not just constructor args)
- Empty string env vars treated as absent
- Proper setup/teardown for env var isolation

test_skill_manager_tool.py (20 weak → 0):
- Validation error messages verified against exact strings
- Name validation: checks specific invalid name echoed in error
- Frontmatter validation: exact error text for missing fields, unclosed markers, empty content, invalid YAML
- File path validation: traversal, disallowed dirs, root-level

test_memory_tool.py (13 weak → 0):
- Security scan tests verify both 'Blocked' prefix AND specific threat pattern ID (prompt_injection, exfil_curl, etc.)
- Invisible unicode tests verify exact codepoint strings
- Snapshot test verifies type, header, content, and isolation

The ripgrep/grep output parser uses `split(':', 2)` to extract file:lineno:content from match lines. On Windows, absolute paths contain a drive letter colon (e.g. `C:\Users\foo\bar.py:42:content`), so `split(':', 2)` produces `["C", "\Users\...", "42:content"]`. `int(parts[1])` then raises ValueError and the match is silently dropped. All search results are lost on Windows. Same category as #390 — string-based path parsing that fails on Windows. Replace `split()` with a regex that optionally captures the drive letter prefix: `^([A-Za-z]:)?(.*?):(\d+):(.*)$`.
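A sketch of the regex-based parsing (the helper name is hypothetical; the regex is the one quoted above):

```python
import re

# Optionally capture a drive-letter prefix so Windows absolute paths
# survive the colon split.
_MATCH_RE = re.compile(r"^([A-Za-z]:)?(.*?):(\d+):(.*)$")

def parse_match_line(line):
    # Returns (path, lineno, content) for a file:lineno:content match
    # line, or None if the line doesn't match.
    m = _MATCH_RE.match(line)
    if not m:
        return None
    drive, path, lineno, content = m.groups()
    return ((drive or "") + path, int(lineno), content)
```

The lazy `(.*?)` lets the path keep internal colons-free segments while `(\d+)` anchors on the line-number field, so content containing colons still parses correctly.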
Applied to both `_search_with_rg` and `_search_with_grep`.

1. cron/jobs.py: respect the HERMES_HOME env var for the job storage path. scheduler.py already uses os.getenv("HERMES_HOME", ...) but jobs.py hardcodes Path.home() / ".hermes", causing a path mismatch when HERMES_HOME is set.
2. gateway/run.py: add Platform.HOMEASSISTANT to default_toolset_map and platform_config_key. The adapter and hermes-homeassistant toolset both exist but the mapping dicts omit it, so HomeAssistant events silently fall back to the Telegram toolset.
3. tools/environments/daytona.py: use time.monotonic() for the deadline instead of float subtraction. All other backends (docker, ssh, singularity, local) use the monotonic clock for timeout tracking. The accumulator pattern (deadline -= 0.2) drifts because t.join(0.2) plus interrupt checks take longer than 0.2s per iteration.

Issues found and fixed during deep code path review:
1. CRITICAL: Prefix matching returned wrong prices for dated model names
   - 'gpt-4o-mini-2024-07-18' matched gpt-4o ($2.50) instead of gpt-4o-mini ($0.15)
   - Same for o3-mini→o3 (9x), gpt-4.1-mini→gpt-4.1 (5x), gpt-4.1-nano→gpt-4.1 (20x)
   - Fix: use a longest-match-wins strategy instead of first-match
   - Removed dangerous key.startswith(bare) reverse matching
2. CRITICAL: Top Tools section was empty for CLI sessions
   - run_agent.py doesn't set tool_name on tool response messages (pre-existing)
   - Insights now also extracts tool names from tool_calls JSON on assistant messages, which IS populated for all sessions
   - Uses a max() merge strategy to avoid double-counting between sources
3. SELECT * replaced with an explicit column list
   - Skips system_prompt and model_config blobs (can be thousands of chars)
   - Reduces memory and I/O for large session counts
4. Sets in overview dict converted to sorted lists
   - models_with_pricing / models_without_pricing were Python sets
   - Sets aren't JSON-serializable — would crash json.dumps()
5. Negative duration guard — an end > start check prevents negative durations from clock drift
6. Model breakdown sort fallback — when all tokens are 0, now sorts by session count instead of arbitrary order
7. Removed unused timedelta import

Added 6 new tests: dated model pricing (4), tool_calls JSON extraction, JSON serialization safety. Total: 69 tests.

Two bugs fixed:
1. search_messages() crashes with OperationalError when user queries contain FTS5 special characters (+, ", (, {, dangling AND/OR, etc). Added _sanitize_fts5_query() to strip dangerous operators and a fallback try-except for edge cases.
2. _append_to_sqlite() in mirror.py creates a new SessionDB per call but never closes it, leaking SQLite connections. Added a finally block to ensure db.close() is always called.

config['model'] can be a dict (old format: {default, base_url, provider}) or a string (new format). The setup wizard was showing the raw dict in 'Keep current' and 'Model set to' messages. It now extracts the model name from either format.

_approval_callback() had no return statement after the timeout break, causing it to return None. Callers expect a string ("once", "session", "always", or "deny"), so None could lead to undefined behavior when approving dangerous commands.

Kimi Code (platform.kimi.ai) issues API keys prefixed sk-kimi- that require:
1. A different base URL: api.kimi.com/coding/v1 (not api.moonshot.ai/v1)
2. A User-Agent header identifying a recognized coding agent

Without this fix, sk-kimi- keys fail with 401 (wrong endpoint) or 403 ('only available for Coding Agents') errors.
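The routing rules can be sketched as follows (hypothetical helper; the real code lives in platforms.py and also sets the User-Agent header):

```python
def resolve_kimi_base_url(api_key, env_override=None):
    # An explicit KIMI_BASE_URL override always wins; sk-kimi- keys
    # route to the Kimi Code endpoint; legacy Moonshot keys unchanged.
    if env_override:
        return env_override
    if api_key.startswith("sk-kimi-"):
        return "https://api.kimi.com/coding/v1"
    return "https://api.moonshot.ai/v1"
```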
Changes:
- Auto-detect the sk-kimi- key prefix and route to api.kimi.com/coding/v1
- Send a User-Agent: KimiCLI/1.0 header for Kimi Code endpoints
- Legacy Moonshot keys (api.moonshot.ai) continue to work unchanged
- KIMI_BASE_URL env var override still takes priority over auto-detection
- Updated .env.example with correct docs and all endpoint options
- Fixed the doctor.py health check for Kimi Code keys

Reference: https://github.com/MoonshotAI/kimi-cli (platforms.py)

The web_extract_tool was stripping the 'url' key during its output trimming step, but documentation in 3 places claimed it was present. This caused KeyError when accessing result['url'] in execute_code scripts, especially when extracting from multiple URLs. Changes:
- web_tools.py: Add 'url' back to the trimmed_results output
- code_execution_tool.py: Add 'title' to the _TOOL_STUBS docstring and _TOOL_DOC_LINES so the docs match the actual {url, title, content, error} response format

Long-lived gateway sessions can accumulate enough history that every new message rehydrates an oversized transcript, causing repeated truncation failures (finish_reason=length). Add a session hygiene check in _handle_message that runs right after loading the transcript and before invoking the agent:
1. Estimate the message count and rough token count of the transcript
2. If above configurable thresholds (default: 200 msgs or 100K tokens), auto-compress the transcript proactively
3. Notify the user about the compression with before/after stats
4. If still above the warn threshold (default: 200K tokens) after compression, suggest /reset
5. If compression fails on a dangerously large session, warn the user to use /compress or /reset manually

Thresholds are configurable via config.yaml:

session_hygiene:
  auto_compress_tokens: 100000
  auto_compress_messages: 200
  warn_tokens: 200000

This complements the agent's existing preflight compression (which runs inside run_conversation) by catching pathological sessions at the gateway layer before the agent is even created. Includes 12 tests for threshold detection and token estimation.

Two fixes:
1. Gateway CWD override: TERMINAL_CWD from config.yaml was being unconditionally overwritten by the messaging_cwd fallback (line 114). Explicit paths in config.yaml are now respected — only '.' / 'auto' / 'cwd' (or unset) fall back to MESSAGING_CWD or the home directory.
2. sandbox_dir config: Added terminal.sandbox_dir to the config.yaml bridge in gateway/run.py, cli.py, and hermes_cli/config.py. Maps to the TERMINAL_SANDBOX_DIR env var, which get_sandbox_dir() reads to determine where Docker/Singularity sandbox data is stored (default: ~/.hermes/sandboxes/). Users can now set: hermes config set terminal.sandbox_dir /data/hermes-sandboxes

- normalize_provider('auto') now returns 'openrouter' (the default) so /model shows the curated model list instead of nothing
- CLI /model display uses normalize_provider before looking up labels
- Gateway /model handler now uses the same validation logic as the CLI: live API probe, provider:model syntax, curated model list display

/provider command (CLI + gateway): shows all providers with auth status (✓/✗), aliases, and an active marker. Users can now discover what provider names work with provider:model syntax.
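A minimal sketch of the normalize_provider behavior described above (the alias table here is illustrative, not the full list):

```python
# 'auto' resolves to the default provider so /model can show the
# curated model list instead of nothing.
_ALIASES = {"auto": "openrouter", "or": "openrouter"}

def normalize_provider(name):
    name = name.strip().lower()
    return _ALIASES.get(name, name)
```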
Gateway bugs fixed:
- Config was saved even when validation.persist=False (told the user 'session only' but actually persisted the unvalidated model)
- HERMES_INFERENCE_PROVIDER env var not set on provider switch, causing the switch to be silently overridden if that env var was already set

parse_model_input hardened:
- A colon is only treated as a provider delimiter if the left side is a recognized provider name or alias. 'anthropic/claude-3.5-sonnet:beta' now passes through as a model name instead of trying provider='anthropic/claude-3.5-sonnet'.
- HTTP URLs and random colons are no longer misinterpreted.

56 tests passing across model validation, CLI commands, and integration.

The Codex Responses API (chatgpt.com/backend-api/codex) supports vision via gpt-5.3-codex. This was verified with real API calls using image analysis.

Changes to _CodexCompletionsAdapter:
- Added _convert_content_for_responses() to translate the chat.completions multimodal format to the Responses API format:
  - {type: 'text'} → {type: 'input_text'}
  - {type: 'image_url', image_url: {url: '...'}} → {type: 'input_image', image_url: '...'}
- Fixed: removed 'stream' from resp_kwargs (responses.stream() handles it)
- Fixed: removed max_output_tokens and temperature (the Codex endpoint rejects them)

Provider changes:
- Added 'codex' as an explicit auxiliary provider option
- Vision auto-fallback now includes Codex (OpenRouter → Nous → Codex) since gpt-5.3-codex supports multimodal input
- Updated docs with Codex OAuth examples

Tested with a real Codex OAuth token + ~/.hermes/image2.png — confirmed working end-to-end through the full adapter pipeline. Tests: 2459 passed.

Adds a simple config option to play the terminal bell (\a) when the agent finishes a response. Useful for long-running tasks — switch to another window and your terminal will ding when done. Works over SSH since the bell character propagates through the connection.
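The bell itself is a one-liner; a sketch assuming the display.bell_on_complete config key (function name and the out parameter are hypothetical, added for testability):

```python
import sys

def ring_bell(config, out=None):
    # Emit the BEL character when display.bell_on_complete is enabled.
    out = out or sys.stdout
    if config.get("display", {}).get("bell_on_complete", False):
        out.write("\a")
        out.flush()
```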
Most terminal emulators can be configured to flash the taskbar, play a sound, or show a visual indicator on bell. Config (default: off):

display:
  bell_on_complete: true

Closes #318

The gateway had a SEPARATE compression system ('session hygiene') with hardcoded thresholds (100k tokens / 200 messages) that were completely disconnected from the model's context length and the user's compression config in config.yaml. This caused premature auto-compression on Telegram/Discord — triggering at ~60k tokens (from the 200-message threshold) or inconsistent token counts.

Changes:
- Gateway hygiene now reads the model name from config.yaml and uses get_model_context_length() to derive the actual context limit
- The compression threshold comes from compression.threshold in config.yaml (default 0.85), same as the agent's ContextCompressor
- Removed the message-count-based trigger (it was redundant and caused false positives in tool-heavy sessions)
- Removed the undocumented session_hygiene config section — the standard compression.* config now controls everything
- Env var overrides (CONTEXT_COMPRESSION_THRESHOLD, CONTEXT_COMPRESSION_ENABLED) are respected
- The warn threshold is now 95% of model context (was hardcoded 200k)
- Updated tests to verify model-aware thresholds, scaling across models, and that message count alone no longer triggers compression

For claude-opus-4.6 (200k context) at the 0.85 threshold, gateway hygiene now triggers at 170k tokens instead of the old 100k.

When the primary model/provider fails after retries (rate limit, overload, auth errors, connection failures), Hermes automatically switches to a configured fallback model for the remainder of the session. Config (in ~/.hermes/config.yaml):

fallback_model:
  provider: openrouter
  model: anthropic/claude-sonnet-4

Supports all major providers: OpenRouter, OpenAI, Nous, DeepSeek, Together, Groq, Fireworks, Mistral, Gemini — plus custom endpoints via base_url and api_key_env overrides.
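The one-shot switching behavior can be sketched as follows (names are hypothetical, not the actual run_agent.py implementation):

```python
class FallbackState:
    # Switch once when the primary fails; never ping-pong back for
    # the rest of the session.
    def __init__(self, fallback_model):
        self.fallback_model = fallback_model
        self.switched = False

    def maybe_switch(self, current_model):
        # Called when max retries are exhausted or a non-retryable
        # client error occurs.
        if self.fallback_model and not self.switched:
            self.switched = True
            return self.fallback_model
        return current_model
```

A second failure after switching does not trigger another switch, matching the one-shot principle.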
Design principles:
- Dead simple: one fallback model, not a chain
- One-shot: switches once, doesn't ping-pong back
- Zero new dependencies: uses the existing OpenAI client
- Minimal code: ~100 lines in run_agent.py, ~5 lines in cli.py/gateway
- Three trigger points: max retries exhausted, non-retryable client errors, and invalid response exhaustion

Does NOT trigger on context overflow or payload-too-large errors (those are handled by the existing compression system). Addresses #737. 25 new tests, 2492 total passing.

Add interactive skill configuration via the `hermes skills` command, mirroring the existing `hermes tools` pattern. Changes:
- hermes_cli/skills_config.py (new): skills_command() entry point with curses checklist UI + numbered fallback. Supports global and per-platform disable lists, individual skill toggle, and category toggle.
- hermes_cli/main.py: register the `hermes skills` subcommand
- tools/skills_tool.py: add _is_skill_disabled() and filter disabled skills in _find_all_skills(). Resolves the platform from the argument, then the HERMES_PLATFORM env var, then falls back to the global disabled list.

Config schema (config.yaml):

skills:
  disabled: [skill-a]        # global
  platform_disabled:
    telegram: [skill-b]      # per-platform override

22 unit tests, 2489 passed, 0 failed. Closes #642

Implements config-driven quick commands for both CLI and gateway that execute locally without invoking the LLM. Config example (~/.hermes/config.yaml):

quick_commands:
  limits:
    type: exec
    command: /home/user/.local/bin/hermes-limits
  dn:
    type: exec
    command: echo daily-note

Changes:
- hermes_cli/config.py: add quick_commands: {} default
- cli.py: check quick_commands before skill commands in process_command()
- gateway/run.py: check quick_commands before skill commands in _handle_message()
- tests/test_quick_commands.py: 11 tests covering exec, timeout, unsupported type, missing command, and priority over skills

Closes #744

New config option:

security:
  redact_secrets: false   # default: true

When set to false, API keys, tokens, and passwords are shown in full in read_file, search_files, and terminal output. Useful for debugging auth issues where you need to verify the actual key value.
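A sketch of the opt-out gate (the secret pattern here is illustrative; the real redact_sensitive_text covers many key formats):

```python
import os
import re

_SECRET_RE = re.compile(r"sk-[A-Za-z0-9_-]{8,}")

def redact_sensitive_text(text):
    # The opt-out check lives inside the redactor itself, so every
    # call site respects it without changes.
    if os.getenv("HERMES_REDACT_SECRETS", "true").lower() == "false":
        return text  # user opted out: show keys in full
    return _SECRET_RE.sub(lambda m: m.group(0)[:6] + "***", text)
```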
Bridged to both CLI and gateway via the HERMES_REDACT_SECRETS env var. The check is in redact_sensitive_text() itself, so all call sites (terminal, file tools, log formatter) respect it.

Users with multiple local servers or custom endpoints can now define them all in config.yaml and switch between them from the model selection menu:

custom_providers:
  - name: 'Local Llama 70B'
    base_url: 'http://localhost:8000/v1'
    api_key: 'not-needed'
  - name: 'RunPod vLLM'
    base_url: 'https://xyz.runpod.ai/v1'
    api_key: 'rp_xxxxx'

These appear in `hermes model` provider selection alongside the built-in providers. When selected, the endpoint's /models API is probed to show available models in a selection menu. Previously only a single 'Custom endpoint' option existed, requiring manual URL entry each time you wanted to switch between local servers. Requested by @ZiarnoBobu on Twitter.

Validate that the username portion of ~username paths contains only valid characters (alphanumeric, dot, hyphen, underscore) before passing it to shell echo for expansion. Previously, paths like '~; rm -rf /' would be passed unquoted to self._exec(f'echo {path}'), allowing arbitrary command execution. The approach validates the username rather than using shlex.quote(), which would prevent tilde expansion from working at all, since echo '~user' outputs the literal string instead of expanding it. Added tests for injection blocking and valid ~username/path expansion. Credit to @alireza78a for reporting (PR #442, issue #442).

Authored by Himess. Replaces split(':', 2) with a regex that optionally captures the Windows drive-letter prefix in rg/grep output parsing. Fixes search_files returning zero results on Windows, where paths like C:\path\file.py:42:content were misparsed by naive colon splitting. No behavior change on Unix/Mac.

Authored by alireza78a. Replaces open('w') + json.dump with the tempfile.mkstemp + os.replace atomic write pattern, matching the existing pattern in cron/jobs.py.
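The mkstemp + os.replace pattern, sketched with a hypothetical helper name:

```python
import json
import os
import tempfile

def atomic_json_write(path, data):
    # Write to a temp file in the same directory, then atomically swap
    # it in. A crash mid-write leaves the original file untouched:
    # either the old file exists or the new one does.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
        os.replace(tmp, path)  # atomic on POSIX and Windows
    except BaseException:
        os.unlink(tmp)
        raise
```

Writing the temp file into the same directory matters: os.replace is only atomic within a single filesystem.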
Prevents silent session loss if the process crashes or gets OOM-killed mid-write. Resolved conflict: kept encoding='utf-8' from HEAD in the new fdopen call.

build_execute_code_schema(set()) produced "from hermes_tools import , ..." in the code property description — invalid Python syntax shown to the model. This triggers when a user enables only the code_execution toolset without any of the sandbox-allowed tools (e.g. `hermes tools code_execution`), because SANDBOX_ALLOWED_TOOLS & {"execute_code"} = empty set. Also adds 29 unit tests covering build_execute_code_schema, environment variable filtering, execute_code edge cases, and interrupt handling.

Authored by alireza78a. Adds atomic_yaml_write() to utils.py (mirrors the existing atomic_json_write pattern), replaces bare open('w') in save_config(). Integrated with max_turns normalization and commented sections via the extra_content param. 3 new tests for crash safety.

- Organize COMMANDS into a COMMANDS_BY_CATEGORY dict
- Group commands: Session, Configuration, Tools & Skills, Info, Exit
- Add visual category headers with spacing
- Maintain backwards compat via the flat COMMANDS dict
- Better visual hierarchy and scannability

Before:
/help - Show this help message
/tools - List available tools
... (dense list)

After:
── Session ──
/new    Start a new conversation
/reset  Reset conversation only
...
── Configuration ──
/config Show current configuration
...

Closes #640

Two-tier warning system that nudges the LLM as it approaches max_iterations, injected into the last tool result JSON rather than as a separate system message:
- Caution (70%): {"_budget_warning": "[BUDGET: 42/60...]"}
- Warning (90%): {"_budget_warning": "[BUDGET WARNING: 54/60...]"}

For JSON tool results, adds a _budget_warning field to the existing dict. For plain text results, appends the warning as text.
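The injection logic can be sketched as follows (hypothetical helper; the real warning strings carry the iteration counts shown above):

```python
import json

def inject_budget_warning(result, warning):
    # JSON dict results get a _budget_warning field; everything else
    # gets the warning appended as plain text. Mutating the existing
    # tool result (rather than adding a system message) keeps the
    # message structure and prompt cache intact.
    try:
        data = json.loads(result)
    except (ValueError, TypeError):
        return f"{result}\n{warning}"
    if isinstance(data, dict):
        data["_budget_warning"] = warning
        return json.dumps(data)
    return f"{result}\n{warning}"
```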
Key properties:
- No system messages injected mid-conversation
- No changes to message structure
- Prompt cache stays valid
- Configurable thresholds (0.7 / 0.9)
- Can be disabled: _budget_pressure_enabled = False

Inspired by PR #421 (@Bartok9) and issue #414. 8 tests covering thresholds, edge cases, JSON and text injection.

HermesCLI.__init__ never assigned self.config, causing an AttributeError ('HermesCLI' object has no attribute 'config') whenever an unrecognized slash command fell through to the quick_commands check (line 2832). This broke skill slash commands like /x-thread-creation, since the quick_commands lookup runs before the skill command check. Set self.config = CLI_CONFIG in __init__, matching the pattern used by the gateway (run.py:199).

HermesCLI.__init__ never assigned self.config, causing an AttributeError ('HermesCLI' object has no attribute 'config') whenever an unrecognized slash command fell through to the quick_commands check on line 2838. This affected skill slash commands like /x-thread-creation, since the quick_commands lookup runs before the skill command check. Set self.config = CLI_CONFIG in __init__ to match the pattern used by the gateway (run.py:199).

save_env_value() used bare open('w'), which truncates .env immediately. A crash or OOM kill between truncation and completed write silently wipes every credential in the file. The write now goes to a temp file first, then os.replace() swaps it in atomically. Either the old .env exists or the new one does — never a truncated half-write. Same pattern used in cron/jobs.py. Cherry-picked from PR #842 by alireza78a, rebased onto current main with conflict resolution (_secure_file refactor).

Co-authored-by: alireza78a <alireza78a@users.noreply.github.com>

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
Add resolve_provider_client(provider, model, async_mode) — a single entry point for creating properly configured clients.
Given a provider name and optional model, it handles auth lookup (env vars, OAuth tokens, auth.json), base URL resolution, provider-specific headers, and API format differences (Chat Completions vs Responses API for Codex). All auxiliary consumers should route through this instead of ad-hoc env var lookups. Refactored get_text_auxiliary_client, get_async_text_auxiliary_client, and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
vision_tools.py was constructing a raw AsyncOpenAI client from the sync vision client's api_key/base_url, completely bypassing the Codex Responses API adapter. When the vision provider resolved to Codex, the raw client would hit chatgpt.com/backend-api/codex with chat.completions.create(), but that endpoint only supports the Responses API. Fix: added get_async_vision_auxiliary_client(), which properly wraps Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
- context_compressor.py: Removed _get_fallback_client(), which blindly looked for OPENAI_API_KEY + OPENAI_BASE_URL (this fails for Codex OAuth, API-key providers, and users without OPENAI_BASE_URL set). Replaced with a fallback loop through resolve_provider_client() for each known provider, with same-provider dedup.
- vision_tools.py: Added error detection for vision capability failures. Returns a clear message to the model when the configured model doesn't support vision, instead of a generic error.

Addresses #886

Route all remaining ad-hoc auxiliary LLM call sites through resolve_provider_client() so auth, headers, and API format (Chat Completions vs Responses API) are handled consistently in one place. Files changed:
- tools/openrouter_client.py: Replace manual AsyncOpenAI construction with resolve_provider_client('openrouter', async_mode=True). The shared client module now delegates entirely to the router.
- tools/skills_guard.py: Replace inline OpenAI client construction (hardcoded OpenRouter base_url, manual api_key lookup, manual headers) with resolve_provider_client('openrouter'). Remove unused OPENROUTER_BASE_URL import.
- trajectory_compressor.py: Add _detect_provider() to map config base_url to a provider name, then route through resolve_provider_client. Falls back to raw construction for unrecognized custom endpoints.
- mini_swe_runner.py: Route default case (no explicit api_key/base_url) through resolve_provider_client('openrouter') with auto-detection fallback. Preserves direct construction when explicit creds are passed via CLI args.
- agent/auxiliary_client.py: Fix stale module docstring — vision auto mode now correctly documents that Codex and custom endpoints are tried (not skipped).

Model selection now comes exclusively from config.yaml (set via 'hermes model' or 'hermes setup'). The LLM_MODEL env var is no longer read or written anywhere in production code. Why: env vars are per-process/per-user and would conflict in multi-agent or multi-tenant setups. Config.yaml is file-based and can be scoped per-user or eventually per-session.

Changes:
- cli.py: Read model from CLI_CONFIG only, not LLM_MODEL/OPENAI_MODEL
- hermes_cli/auth.py: _save_model_choice() no longer writes LLM_MODEL to .env
- hermes_cli/setup.py: Remove 12 save_env_value('LLM_MODEL', ...) calls from all provider setup flows
- gateway/run.py: Remove LLM_MODEL fallback (HERMES_MODEL still works for gateway process runtime)
- cron/scheduler.py: Same
- agent/auxiliary_client.py: Remove LLM_MODEL from custom endpoint model detection

Both /model and /provider now show the same unified display:

    Current: anthropic/claude-opus-4.6 via OpenRouter

    Authenticated providers & models:
      [openrouter]  ← active
        anthropic/claude-opus-4.6  ← current
        anthropic/claude-sonnet-4.5
        ...
      [nous]
        claude-opus-4-6
        gemini-3-flash
        ...
      [openai-codex]
        gpt-5.2-codex
        gpt-5.1-codex-mini
        ...

    Not configured: Z.AI / GLM, Kimi / Moonshot, ...

    Switch model:    /model <model-name>
    Switch provider: /model <provider>:<model-name>
    Example:         /model nous:claude-opus-4-6

Users can see all authenticated providers and their models at a glance, making it easy to switch mid-conversation. Also added curated model lists for Nous Portal and OpenAI Codex to hermes_cli/models.py.

* fix: /reasoning command output ordering, display, and inline think extraction

Three issues with the /reasoning command:

1. Output interleaving: The command echo used print() while feedback used _cprint(), causing them to render out-of-order under prompt_toolkit's patch_stdout. Changed echo to use _cprint() so all output renders through the same path in correct order.

2. Reasoning display not working: /reasoning show toggled a flag but reasoning never appeared for models that embed thinking in inline <think> blocks rather than structured API fields. Added fallback extraction in _build_assistant_message to capture <think> block content as reasoning when no structured reasoning fields (reasoning, reasoning_content, reasoning_details) are present. This feeds into both the reasoning callback (during tool loops) and the post-response reasoning box display.

3. Feedback clarity: Added checkmarks to confirm actions, persisted show/hide to config (was session-only before), and aligned the status display for readability.

Tests: 7 new tests for inline think block extraction (41 total).

* feat: add /reasoning command to gateway (Telegram/Discord/etc)

The /reasoning command only existed in the CLI — messaging platforms had no way to view or change reasoning settings. This adds:

1. /reasoning command handler in the gateway: - No args: shows current effort level and display state - /reasoning <level>: sets reasoning effort (none/low/medium/high/xhigh) - /reasoning show|hide: toggles reasoning display in responses - All changes saved to config.yaml immediately

2.
Reasoning display in gateway responses: - When show_reasoning is enabled, prepends a 'Reasoning' block with the model's last_reasoning content before the response - Collapses long reasoning (>15 lines) to keep messages readable - Uses last_reasoning from run_conversation result dict

3. Plumbing: - Added _show_reasoning attribute loaded from config at startup - Propagated last_reasoning through _run_agent return dict - Added /reasoning to help text and known_commands set - Uses getattr for _show_reasoning to handle test stubs

* fix: improve Kimi model selection — auto-detect endpoint, add missing models

Kimi Coding Plan setup: - New dedicated _model_flow_kimi() replaces the generic API-key flow for kimi-coding. Removes the confusing 'Base URL' prompt entirely — the endpoint is auto-detected from the API key prefix: sk-kimi-* → api.kimi.com/coding/v1 (Kimi Coding Plan) other → api.moonshot.ai/v1 (legacy Moonshot) - Shows appropriate models for each endpoint: Coding Plan: kimi-for-coding, kimi-k2.5, kimi-k2-thinking, kimi-k2-thinking-turbo Moonshot: full model catalog - Clears any stale KIMI_BASE_URL override so runtime auto-detection via _resolve_kimi_base_url() works correctly.

Model catalog updates: - Added kimi-for-coding (primary Coding Plan model) and kimi-k2-thinking-turbo to models.py, main.py _PROVIDER_MODELS, and model_metadata.py context windows. - Updated User-Agent from KimiCLI/1.0 to KimiCLI/1.3 (Kimi's coding endpoint whitelists known coding agents via User-Agent sniffing).

Root cause: two issues combined to create visual spam on Telegram/Discord:

1. build_tool_preview() preserved newlines from tool arguments.
A preview like 'import os\nprint("...")' rendered as 2+ visual lines per progress entry on messaging platforms. This affected execute_code most (code always has newlines), but could also hit terminal, memory, send_message, session_search, and process tools.

2. No deduplication of identical progress messages. When models iterate with execute_code using the same boilerplate code (common pattern), each call produced an identical progress line. 9 calls x 2 visual lines = 18 lines of identical spam in one message bubble.

Fixes:
- Added _oneline() helper to collapse all whitespace (newlines, tabs) to single spaces. Applied to ALL code paths in build_tool_preview() — both the generic path and every early-return path that touches user content (memory, session_search, send_message, process).
- Added dedup in gateway progress_callback: consecutive identical messages are collapsed with a repeat counter, e.g. 'execute_code: ... (x9)' instead of 9 identical lines. The send_progress_messages async loop handles dedup tuples by updating the last progress_line in-place.

Three bugs fixed in the Slack adapter:

1. Tool progress messages leaked to main channel instead of thread. Root cause: metadata key mismatch — gateway uses 'thread_id' but the Slack adapter checked for 'thread_ts'. Added _resolve_thread_ts() helper that checks both keys with correct precedence.

2. Bot responses could escape threads for replies. Root cause: reply_to was set to the child message's ts, but the Slack API needs the parent message's ts for thread_ts. Now metadata thread_id (always the parent ts) takes priority over reply_to.

3. All Slack DMs shared one session key ('agent:main:slack:dm'), so a long-running task blocked all other DM conversations. Fix: DMs with thread_id now get per-thread session keys. Top-level DMs still share one session for conversation continuity.
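The resulting session-key scheme might be sketched as follows; the key format and helper name are hypothetical, and only the shared DM key 'agent:main:slack:dm' comes from the description above:

```python
from typing import Optional

def slack_session_key(chat_id: str, thread_ts: Optional[str], is_dm: bool) -> str:
    # Hypothetical sketch of the per-thread session-key scheme; the
    # gateway's real key format may differ.
    if is_dm:
        # Threaded DMs get isolated sessions; top-level DMs share one.
        return f"agent:main:slack:dm:{thread_ts}" if thread_ts else "agent:main:slack:dm"
    # Channel mentions and thread replies: the thread is the session.
    return f"agent:main:slack:{chat_id}:{thread_ts or 'top'}"
```

The key point is that thread_ts is always the parent message's ts, so every reply in a thread maps to the same session while separate threads stay isolated.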
Additional fix: All Slack media methods (send_image, send_voice, send_video, send_document, send_image_file) now accept a metadata parameter for thread routing. Previously they only accepted reply_to, which caused media to silently fail to post in threads.

Session key behavior after this change:
- Slack channel @mention: creates thread, thread = session
- Slack thread reply: stays in thread, same session
- Slack DM (top-level): one continuous session
- Slack DM (threaded): per-thread session
- Other platforms: unchanged

Feedback fixes:

1. Revert _convert_vision_content — vision is handled by the vision_analyze tool, not by converting image blocks inline in conversation messages. Removed the function and its tests.

2. Add Anthropic to 'hermes model' (cmd_model in main.py): - Added to provider_labels dict - Added to providers selection list - Added _model_flow_anthropic() with Claude Code credential auto-detection, API key prompting, and model selection from catalog.

3. Wire up Anthropic as a vision-capable auxiliary provider: - Added _try_anthropic() to auxiliary_client.py using claude-sonnet-4 as the vision model (Claude natively supports multimodal) - Added to the get_vision_auxiliary_client() auto-detection chain (after OpenRouter/Nous, before Codex/custom)

Cache tracking note: the Anthropic cache metrics branch in run_agent.py (cache_read_input_tokens / cache_creation_input_tokens) is in the correct place — it's response-level parsing, same location as the existing OpenRouter cache tracking. auxiliary_client.py has no cache tracking.

Both 'hermes model' and 'hermes setup model' now present a clear two-option auth flow when no credentials are found:

1. Claude Pro/Max subscription (setup-token) - Step-by-step instructions to run 'claude setup-token' - User pastes the resulting sk-ant-oat01-... token

2. Anthropic API key (pay-per-token) - Link to console.anthropic.com/settings/keys - User pastes sk-ant-api03-...
key

Also handles:
- Auto-detection of existing Claude Code creds (~/.claude/.credentials.json)
- Existing credentials shown with option to update
- Consistent UX between 'hermes model' and 'hermes setup model'

For Claude 4.6 models (Opus and Sonnet), the Anthropic API rejects budget_tokens when thinking.type is 'adaptive'. This was causing a 400 error: 'thinking.adaptive.budget_tokens: Extra inputs are not permitted'. Changes: - Send thinking: {type: 'adaptive'} without budget_tokens for 4.6 - Move effort control to output_config: {effort: ...} per Anthropic docs - Map Hermes effort levels to Anthropic effort levels (xhigh->max, etc.) - Narrow adaptive detection to 4.6 models only (4.5 still uses manual) - Add tests for adaptive thinking on 4.6 and manual thinking on pre-4.6 Fixes #1126

save_job_output() used bare open('w') which truncates the output file immediately. A crash or OOM kill between truncation and the completed write would silently wipe the job output. Write now goes to a temp file first, then os.replace() swaps it atomically — matching the existing save_jobs() pattern in the same file. Preserves _secure_file() permissions and uses safe cleanup on error. Cherry-picked from PR #874 by alireza78a, rebased onto current main with conflict resolution and fixes: - Kept _secure_dir/_secure_file security calls from PR #757 - Used except BaseException (not bare except) to match save_jobs pattern - Wrapped os.unlink in try/except OSError to avoid masking errors Co-authored-by: alireza78a <alireza78a@users.noreply.github.com>

Several files resolved paths via Path.home() / ".hermes" or os.path.expanduser("~/.hermes/..."), bypassing the HERMES_HOME environment variable. This broke isolation when running multiple Hermes instances with distinct HERMES_HOME directories. Replace all hardcoded paths with calls to get_hermes_home() from hermes_cli.config, consistent with the rest of the codebase.
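A minimal sketch of the pattern (the real get_hermes_home() lives in hermes_cli.config and may differ in detail):

```python
import os
from pathlib import Path

def get_hermes_home() -> Path:
    # Honor HERMES_HOME when set, so multiple Hermes instances can run
    # with distinct home directories; otherwise fall back to ~/.hermes.
    override = os.environ.get("HERMES_HOME")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".hermes"
```

Every hardcoded `Path.home() / ".hermes"` bypasses the override, which is exactly the isolation bug being fixed.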
Files fixed:
- tools/process_registry.py (processes.json)
- gateway/pairing.py (pairing/)
- gateway/sticker_cache.py (sticker_cache.json)
- gateway/channel_directory.py (channel_directory.json, sessions.json)
- gateway/config.py (gateway.json, config.yaml, sessions_dir)
- gateway/mirror.py (sessions/)
- gateway/hooks.py (hooks/)
- gateway/platforms/base.py (image_cache/, audio_cache/, document_cache/)
- gateway/platforms/whatsapp.py (whatsapp/session)
- gateway/delivery.py (cron/output)
- agent/auxiliary_client.py (auth.json)
- agent/prompt_builder.py (SOUL.md)
- cli.py (config.yaml, images/, pastes/, history)
- run_agent.py (logs/)
- tools/environments/base.py (sandboxes/)
- tools/environments/modal.py (modal_snapshots.json)
- tools/environments/singularity.py (singularity_snapshots.json)
- tools/tts_tool.py (audio_cache)
- hermes_cli/status.py (cron/jobs.json, sessions.json)
- hermes_cli/gateway.py (logs/, whatsapp session)
- hermes_cli/main.py (whatsapp/session)

Tests updated to use HERMES_HOME env var instead of patching Path.home(). Closes #892 (cherry picked from commit 78ac1bba43b8b74a934c6172f2c29bb4d03164b9)

- Skip silent recordings before STT call (RMS check in AudioRecorder.stop)
- Filter known Whisper hallucinations ("Thank you.", "Bye." etc.)
- Continuous mode: Ctrl+R starts loop, Ctrl+R during recording exits it
- Wait for TTS to finish before auto-restart to avoid recording speaker
- Silence timeout increased to 3s for natural pauses
- Tests: hallucination filter, silent recording skip, real speech passthrough

- AudioRecorder.start() now catches InputStream errors gracefully with a clear error message about microphone availability
- Fix config key mismatch: cli.py was reading "push_to_talk_key" but config.py defines "record_key" -- now consistent
- Add format conversion from config format ("ctrl+b") to prompt_toolkit format ("c-b")

- Change RTP packet logging from INFO to DEBUG level to reduce noise (SPEAKING events remain at INFO as they are important lifecycle events)
- Use per-session chat_id (web_{session_id}) instead of shared "web" to isolate conversation context between simultaneous web users

1. Gate _streaming_api_call to chat_completions mode only — Anthropic and Codex fall back to _interruptible_api_call. Preserve Anthropic base_url across all client rebuild paths (interrupt, fallback, 401 refresh).

2. Discord VC synthetic events now use chat_type="channel" instead of defaulting to "dm" — prevents session bleed into DM context. Authorization runs before echoing transcript. Sanitize @everyone/@here in voice transcripts.

3. CLI voice prefix ("[Voice input...]") is now API-call-local only — stripped from returned history so it never persists to session DB or resumed sessions.

4. /voice off now disables base adapter auto-TTS via _auto_tts_disabled_chats set — voice input no longer triggers TTS when voice mode is off.

Tests were still mocking imap.search() and imap.fetch() but the implementation was changed to use imap.uid("search", ...) and imap.uid("fetch", ...)
for proper UID-based IMAP operations.

- detect Telegram getUpdates conflicts and stop polling cleanly instead of retry-spamming forever
- add a machine-local token-scoped lock so different HERMES_HOME profiles on the same host can't poll the same bot token at once
- persist gateway runtime health/fatal adapter state and surface it in the status output, e.g.:

    ● hermes-gateway.service - Hermes Agent Gateway - Messaging Platform Integration
         Loaded: loaded (/home/teknium/.config/systemd/user/hermes-gateway.service; enabled; preset: enabled)
         Active: active (running) since Sat 2026-03-14 09:25:35 PDT; 2h 45min ago
     Invocation: 8879379b25994201b98381f4bd80c2af
       Main PID: 1147926 (python)
          Tasks: 16 (limit: 76757)
         Memory: 151.4M (peak: 168.1M)
            CPU: 47.883s
         CGroup: /user.slice/user-1000.slice/user@1000.service/app.slice/hermes-gateway.service
                 ├─1147926 /home/teknium/.hermes/hermes-agent/venv/bin/python -m hermes_cli.main gateway run --replace
                 └─1147966 node /home/teknium/.hermes/hermes-agent/scripts/whatsapp-bridge/bridge.js --port 3000 --session /home/teknium/.hermes/whatsapp/session --mode self-chat

    Mar 14 09:27:03 teknium-dev python[1147926]: 🔄 Retrying API call (2/3)...
    Mar 14 09:27:04 teknium-dev python[1147926]: [409B blob data]
    Mar 14 09:27:04 teknium-dev python[1147926]: Content: ''
    Mar 14 09:27:04 teknium-dev python[1147926]: ❌ Max retries (3) for empty content exceeded.
    Mar 14 09:27:07 teknium-dev python[1147926]: Content: ''
    Mar 14 09:27:07 teknium-dev python[1147926]: 🔄 Retrying API call (1/3)...
    Mar 14 09:27:12 teknium-dev python[1147926]: Content: ''
    Mar 14 09:27:12 teknium-dev python[1147926]: 🔄 Retrying API call (2/3)...

    ⚠ Installed gateway service definition is outdated
      Run: hermes gateway restart  # auto-refreshes the unit
    ✓ Gateway service is running
    ✓ Systemd linger is enabled (service survives logout)

- cleanly exit non-retryable startup conflicts without triggering service restart loops

Tests:
- gateway status runtime-state helpers
- Telegram token-lock and polling-conflict behavior
- GatewayRunner clean exit on non-retryable startup conflict
- CLI runtime health summary

When shutil.which('hermes') returns None, _resolve_hermes_bin() now tries sys.executable -m hermes_cli.main as a fallback. This handles setups where Hermes is launched via a venv or module invocation and the hermes symlink is not on PATH for the gateway process. Fixes #1049

`hermes update` crashed with CalledProcessError when run on a local-only branch (e.g. fix/stoicneko) because `git rev-list HEAD..origin/{branch}` fails when origin/{branch} doesn't exist. Now verifies the remote branch exists first and falls back to origin/main.

When the Responses API returns tool call arguments as a dict, str(dict) produces Python repr with single quotes (e.g. {'key': 'val'}) which is invalid JSON. Downstream json.loads() fails silently and the tool gets called with empty arguments, losing all parameters. Affects both function_call and custom_tool_call item types in _normalize_codex_response().

The fork bomb regex used `()` (empty capture group) and unescaped `{}` instead of literal `\(\)` and `\{\}`. This meant the classic fork bomb `:(){ :|:& };:` was never detected.
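A corrected pattern along these lines (a sketch, not the project's exact regex) escapes the parens and braces and tolerates whitespace:

```python
import re

# Escaped \(\) and \{\} make the fork-bomb punctuation literal instead of
# regex syntax; \s* tolerates spacing variants. Sketch only; the project's
# actual pattern may differ in detail.
FORK_BOMB = re.compile(r":\(\)\s*\{\s*:\|:\s*&\s*\}\s*;\s*:")

def looks_like_fork_bomb(command: str) -> bool:
    return FORK_BOMB.search(command) is not None
```

Without the escapes, `()` matches the empty string and `|` acts as alternation, so the pattern never matches the literal `:(){ :|:& };:` text.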
Also added `\s*` between `:` and `&` and between `;` and trailing `:` to catch whitespace variants.

Resolve session IDs by exact match or unique prefix for sessions delete/export/rename, so IDs copied from the sessions table view work even when the view truncates them:

    Preview                                            Last Active  Src   ID
    ─────────────────────────────────────────────────────────────────────────────────────
    Search for GitHub/GitLab source repositories for   11m ago      cli   20260315_034720_8e1f
    [SYSTEM: The user has invoked the "minecraft-atm   1m ago       cli   20260315_034035_57b6
                                                       1h ago       cron  cron_job-1_20260315_
    [SYSTEM: The user has invoked the "hermes-agent-   9m ago       cli   20260315_014304_652a
    ...

Add SessionDB prefix-resolution coverage and a CLI regression test for deleting by listed prefix.

When a cronjob is created from within a Telegram or Slack thread, deliver=origin was posting to the parent channel instead of the thread.
Root cause: the gateway never set HERMES_SESSION_THREAD_ID in the session environment, so cronjob_tools.py could not capture thread_id into the job's origin metadata — even though the scheduler already reads origin.get('thread_id').

Fix:
- gateway/run.py: set HERMES_SESSION_THREAD_ID when thread_id is present on the session context, and clear it in _clear_session_env
- tools/cronjob_tools.py: read HERMES_SESSION_THREAD_ID into origin

Closes #1219

Two changes to align Discord behavior with Slack:

1. Auto-thread on @mention (default: true) - When someone @mentions the bot in a server channel, a thread is automatically created from their message and the response goes there. - Each thread gets its own isolated session (like Slack). - Configurable via discord.auto_thread in config.yaml (default: true) or DISCORD_AUTO_THREAD env var (env takes precedence). - DMs and existing threads are unaffected.

2. Skip @mention in bot-participated threads - Once the bot has responded in a thread (auto-created or manually entered), subsequent messages in that thread no longer require @mention. Users can just type normally. - Tracked via in-memory set (_bot_participated_threads). After a gateway restart, users need to @mention once to re-establish. - Threads the bot hasn't participated in still require @mention.

Config change:

    discord:
      auto_thread: true   # new, added to DEFAULT_CONFIG

Tests: 7 new tests covering auto-thread default, disable, bot thread participation tracking, and mention skip logic. All 903 gateway tests pass.

Complete the YAML null handling for all three SessionResetPolicy fields. at_hour and idle_minutes already had null coalescing; mode was still using data.get('mode', 'both') which returns None when the key exists with an explicit null value. Add regression test covering all-null input. Based on PR #1120 by stablegenius49.

* feat(email): add skip_attachments option via config.yaml

Adds a config.yaml-driven option to skip email attachments in the gateway email adapter.
Useful for malware protection and bandwidth savings. Configure in config.yaml:

    platforms:
      email:
        skip_attachments: true

Based on PR #1521 by @an420eth, changed from env var to config.yaml (via PlatformConfig.extra) to match the project's config-first pattern.

* docs: document skip_attachments option for email adapter

Stage 3 of streaming support. Gateway now streams tokens to messaging platforms:
- StreamingConfig dataclass (enabled, transport, edit_interval, buffer_threshold, cursor) on GatewayConfig with from_dict/to_dict serialization
- GatewayStreamConsumer: async queue-based consumer that progressively edits a single message on the target platform (edit transport)
- on_delta() → queue → run() async task → send_or_edit() with rate limiting
- already_sent propagation: when streaming delivered the response, handler returns None so base adapter skips duplicate send()
- stream_delta_callback wired into AIAgent constructor in _run_agent
- Consumer lifecycle: started as asyncio task, awaited with timeout in finally

Config (config.yaml):

    streaming:
      enabled: true
      transport: edit        # progressive editMessageText
      edit_interval: 0.3     # seconds between edits
      buffer_threshold: 40   # chars before forcing flush
      cursor: ' ▉'

Credit: jobless0x (#774, #1312), OutThisLife (#798), clicksingh (#697).

The persistent status bar now shows context %, token counts, and duration but NOT $ cost by default. Cost display is opt-in via:

    display:
      show_cost: true

in config.yaml, or: hermes config set display.show_cost true

The /usage command still shows full cost breakdown since the user explicitly asked for it — this only affects the always-visible bar.

Status bar without cost:           ⚕ claude-sonnet-4 │ 12K/200K │ 6% │ 15m
Status bar with show_cost: true:   ⚕ claude-sonnet-4 │ 12K/200K │ 6% │ $0.06 │ 15m

Plugin system for extending Hermes with custom tools, hooks, and integrations — no source code changes required.
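To make the idea concrete, here is a self-contained sketch of what a plugin module and its registration could look like; the PluginContext stub and register_tool signature are hypothetical stand-ins, not the real hermes_cli/plugins.py API:

```python
# Hypothetical stand-in for the real PluginContext, included so this
# sketch runs on its own.
class PluginContext:
    def __init__(self):
        self.tools = {}

    def register_tool(self, name, description, parameters, handler):
        self.tools[name] = {"description": description,
                            "parameters": parameters,
                            "handler": handler}

# --- what a user's ~/.hermes/plugins/calc.py might contain ---
def register(ctx):
    def add(args):
        return {"result": args["a"] + args["b"]}

    ctx.register_tool(
        name="calc_add",
        description="Add two numbers",
        parameters={"type": "object",
                    "properties": {"a": {"type": "number"},
                                   "b": {"type": "number"}},
                    "required": ["a", "b"]},
        handler=add,
    )

ctx = PluginContext()
register(ctx)
```

The agent-side loader would discover such modules, call register(ctx), and expose the registered handlers as tools alongside the built-in and MCP ones.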
Core system (hermes_cli/plugins.py): - Plugin discovery from ~/.hermes/plugins/, .hermes/plugins/, and pip entry_points (hermes_agent.plugins group) - PluginContext with register_tool() and register_hook() - 6 lifecycle hooks: pre/post tool_call, pre/post llm_call, on_session_start/end - Namespace package handling for relative imports in plugins - Graceful error isolation — broken plugins never crash the agent

Integration (model_tools.py): - Plugin discovery runs after built-in + MCP tools - Plugin tools bypass toolset filter via get_plugin_tool_names() - Pre/post tool call hooks fire in handle_function_call()

CLI: - /plugins command shows loaded plugins, tool counts, status - Added to COMMANDS dict for autocomplete

Docs: - Getting started guide (build-a-hermes-plugin.md) — full tutorial building a calculator plugin step by step - Reference page (features/plugins.md) — quick overview + tables - Covers: file structure, schemas, handlers, hooks, data files, bundled skills, env var gating, pip distribution, common mistakes

Tests: 16 tests covering discovery, loading, hooks, tool visibility.

Streaming is now off by default for both CLI and gateway. Users opt in:

CLI (config.yaml):

    display:
      streaming: true

Gateway (config.yaml):

    streaming:
      enabled: true

This lets early adopters test streaming while existing users see zero change. Once we have enough field validation, we flip the default to true in a subsequent release.

* fix: Anthropic OAuth compatibility — Claude Code identity fingerprinting

Anthropic routes OAuth/subscription requests based on Claude Code's identity markers. Without them, requests get intermittent 500 errors (~25% failure rate observed). This matches what pi-ai (clawdbot) and OpenCode both implement for OAuth compatibility. Changes (OAuth tokens only — API key users unaffected): 1. Headers: user-agent 'claude-cli/2.1.2 (external, cli)' + x-app 'cli' 2. System prompt: prepend 'You are Claude Code, Anthropic's official CLI' 3.
System prompt sanitization: replace Hermes/Nous references 4. Tool names: prefix with 'mcp_' (Claude Code convention for non-native tools) 5. Tool name stripping: remove 'mcp_' prefix from response tool calls

Before: 9/12 OK, 1 hard fail, 4 needed retries (~25% error rate)
After: 16/16 OK, 0 failures, 0 retries (0% error rate)

* fix: three gateway issues from user error logs

1. send_animation missing metadata kwarg (base.py) - Base class send_animation lacked the metadata parameter that the call site in base.py line 917 passes. Telegram's override accepted it, but any platform without an override (Discord, Slack, etc.) hit TypeError. Added metadata to base class signature.

2. MarkdownV2 split-inside-inline-code (base.py truncate_message) - truncate_message could split at a space inside an inline code span (e.g. `function(arg1, arg2)`), leaving an unpaired backtick and unescaped parentheses in the chunk. Telegram rejects with 'character ( is reserved'. Added inline code awareness to the split-point finder — detects odd backtick counts and moves the split before the code span.

3. tirith auto-install without cosign (tirith_security.py) - Previously required cosign on PATH for auto-install, blocking install entirely with a warning if missing. Now proceeds with SHA-256 checksum verification only when cosign is unavailable. Cosign is still used for full supply chain verification when present. If cosign IS present but verification explicitly fails, install is still aborted (tampered release).

* fix(tools): remove unnecessary crontab requirement from cronjob tool

The hermes cron system is internal — it uses a JSON-based scheduler ticked by the gateway (cron/scheduler.py), not system crontab. The check for shutil.which('crontab') was preventing the cronjob tool from being available in environments without crontab installed (e.g. minimal Ubuntu containers).
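The internal scheduler's tick can be imagined along these lines; the job schema and helper are hypothetical, while the real logic lives in cron/scheduler.py against ~/.hermes/cron/jobs.json:

```python
import json

def due_jobs(jobs_json, now):
    # Hypothetical sketch: the gateway periodically ticks over a JSON job
    # list and runs whatever is due; no system crontab is involved.
    # Field names here are illustrative, not the real jobs.json schema.
    jobs = json.loads(jobs_json)
    return [j["id"] for j in jobs if j.get("next_run", 0) <= now]

jobs = json.dumps([
    {"id": "job-1", "next_run": 100.0},        # due by now=150
    {"id": "job-2", "next_run": 9_999_999.0},  # not yet due
])
```

Because scheduling is a pure function of file state plus the clock, nothing about it requires the crontab binary to exist.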
Changes: - Remove shutil.which('crontab') check from check_cronjob_requirements() - Remove unused shutil import - Update docstring to clarify internal scheduler is used - Update tests to reflect new behavior and add coverage for all session modes (interactive, gateway, exec_ask) Fixes #1589

* test: add HERMES_EXEC_ASK coverage for cronjob requirements

Adds missing test for the exec_ask session mode, complementing the cherry-picked fix from PR #1633.

--------- Co-authored-by: Bartok9 <bartokmagic@proton.me>

* add base support

* fix: correct skill author attribution to youssefea

* fix(tools): chunk long messages in send_message_tool before platform dispatch

- Convert BasePlatformAdapter.truncate_message() to @staticmethod
- Apply truncate_message() in _send_to_platform() with per-platform max lengths
- Remove naive character split in _send_discord()
- Attach media files to last chunk only for Telegram
- Add regression tests for chunking and media placement

--------- Co-authored-by: youssefea <youcefea99@gmail.com> Co-authored-by: llbn <46884939+llbn@users.noreply.github.com>

Adds security.website_blocklist config for user-managed domain blocking across URL-capable tools. Enforced at the tool level (not monkey-patching) so it's safe and predictable.

- tools/website_policy.py: shared policy loader with domain normalization, wildcard support (*.tracking.example), shared file imports, and structured block metadata
- web_extract: pre-fetch URL check + post-redirect recheck
- web_crawl: pre-crawl URL check + per-page URL recheck
- browser_navigate: pre-navigation URL check
- Blocked responses include blocked_by_policy metadata so the agent can explain exactly what was denied

Config:

    security:
      website_blocklist:
        enabled: true
        domains: ["evil.com", "*.tracking.example"]
        shared_files: ["team-blocklist.txt"]

Salvaged from PR #1086 by @kshitijk4poor. Browser post-redirect checks deferred (browser_tool was fully rewritten since the PR branched).
Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>Add DingTalk as a messaging platform using the dingtalk-stream SDK for real-time message reception via Stream Mode (no webhook needed). Replies are sent via session webhook using markdown format. Features: - Stream Mode connection (long-lived WebSocket, no public URL needed) - Text and rich text message support - DM and group chat support - Message deduplication with 5-minute window - Auto-reconnection with exponential backoff - Session webhook caching for reply routing Configuration: export DINGTALK_CLIENT_ID=your-app-key export DINGTALK_CLIENT_SECRET=your-app-secret # or in config.yaml: platforms: dingtalk: enabled: true extra: client_id: your-app-key client_secret: your-app-secret Files: - gateway/platforms/dingtalk.py (340 lines) — adapter implementation - gateway/config.py — add DINGTALK to Platform enum - gateway/run.py — add DingTalk to _create_adapter - hermes_cli/config.py — add env vars to _EXTRA_ENV_KEYS - hermes_cli/tools_config.py — add dingtalk to PLATFORMS - tests/gateway/test_dingtalk.py — 21 tests* feat(gateway): add DingTalk platform adapter Add DingTalk as a messaging platform using the dingtalk-stream SDK for real-time message reception via Stream Mode (no webhook needed). Replies are sent via session webhook using markdown format. 
* docs: add Alibaba Cloud and DingTalk to setup wizard and docs

  Wire Alibaba Cloud (DashScope) into hermes setup and hermes model provider selection flows. Add DingTalk env vars to documentation.

  Changes:
  - setup.py: Add Alibaba Cloud as provider choice (index 11) with DASHSCOPE_API_KEY prompt and model studio link
  - main.py: Add alibaba to provider_labels, providers list, and model flow dispatch
  - environment-variables.md: Add DASHSCOPE_API_KEY, DINGTALK_CLIENT_ID, DINGTALK_CLIENT_SECRET, and alibaba to HERMES_INFERENCE_PROVIDER

fetch_nous_models() uses keyword-only parameters (the * separator in its signature), but models.py called it with positional args and in the wrong order (api_key first, base_url second). This always raised TypeError, silently caught by except Exception: pass. Result: the Nous provider model list was completely broken — /model autocomplete and provider_model_ids('nous') always fell back to the static model catalog instead of fetching live models.

1. sms.py: Replace per-send aiohttp.ClientSession with a persistent session created in connect() and closed in disconnect(). Each outbound SMS no longer pays the TCP+TLS handshake cost.
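The persistent-session pattern can be sketched as below. This is an illustrative shape, not the actual sms.py adapter; the class and method names are assumptions.

```python
# Sketch of the persistent aiohttp session pattern: one long-lived
# ClientSession reuses pooled connections, avoiding a fresh TCP+TLS
# handshake per outbound message. Names are illustrative.
from typing import Optional

import aiohttp

class SmsAdapter:
    def __init__(self) -> None:
        self._session: Optional[aiohttp.ClientSession] = None

    async def connect(self) -> None:
        # Created once; reused by every send().
        self._session = aiohttp.ClientSession()

    async def disconnect(self) -> None:
        if self._session is not None:
            await self._session.close()
            self._session = None

    async def send(self, url: str, payload: dict) -> int:
        if self._session is not None:
            async with self._session.post(url, json=payload) as resp:
                return resp.status
        # Fallback: temporary session if the persistent one isn't available.
        async with aiohttp.ClientSession() as session:
            async with session.post(url, json=payload) as resp:
                return resp.status
```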
Falls back to a temporary session if the persistent one isn't available.

2. matrix.py: Use proper MIME types (image/png, audio/ogg, video/mp4) instead of bare category words (image, audio, video). The gateway's media processing checks startswith('image/') and startswith('audio/'), so bare words caused Matrix images to skip vision enrichment and Matrix audio to skip transcription. Now extracts the actual MIME type from the nio event's content info when available.

MDX v2+ interprets curly braces in regular markdown as JSX expressions. The headings 'GET /v1/responses/{id}' and 'DELETE /v1/responses/{id}' caused a ReferenceError during Docusaurus static site generation because 'id' is not a defined JavaScript variable. Escaped with backslashes.

Co-authored-by: Test <test@test.com>

Each configured MCP server now registers as its own toolset in TOOLSETS (e.g. TOOLSETS['github'] = {tools: ['mcp_github_list_files', ...]}), making raw server names resolvable in platform_toolsets overrides. Previously MCP tools were only injected into hermes-* umbrella toolsets, so gateway sessions using raw toolset names like ['terminal', 'github'] in platform_toolsets couldn't resolve MCP tools. Skips server names that collide with built-in toolsets. Also handles idempotent reloads (syncs toolsets even when no new servers connect). Inspired by PR #1876 by @kshitijk4poor. Adds 2 tests (standalone toolset creation + built-in collision guard).

find_one is being deprecated. Primary lookup now uses get() with a deterministic sandbox name (hermes-{task_id}). A legacy fallback via list(labels=...) ensures sandboxes created before this migration are still resumable.

Authored by Lovre Pešut (rovle). Migrates from deprecated find_one(labels=...) to get(sandbox_name) with deterministic naming (hermes-{task_id}), plus legacy fallback via list(labels=...)
for pre-migration sandboxes.

Custom endpoints (LM Studio, Ollama, vLLM, llama.cpp) silently fall back to 2M tokens when /v1/models doesn't include context_length. Adds _query_local_context_length() which queries server-specific APIs:
- LM Studio: /api/v1/models (max_context_length + loaded instances)
- Ollama: /api/show (model_info + num_ctx parameters)
- llama.cpp: /props (n_ctx from default_generation_settings)
- vLLM: /v1/models/{model} (max_model_len)

Prefers loaded instance context over max (e.g., 122K loaded vs 1M max). Results are cached via save_context_length() to avoid repeated queries. Also fixes detect_local_server_type() misidentifying LM Studio as Ollama (LM Studio returns 200 for /api/tags with an error body).

When an API call fails, the error output now shows the provider name, model, and endpoint URL so users can immediately identify which service rejected their request. Auth errors (401/403) get actionable guidance: check key validity, model access, and OpenRouter credits link.

Before:

    API call failed (attempt 1/3): PermissionDeniedError

After:

    API call failed (attempt 1/3): PermissionDeniedError
    Provider: openrouter
    Model: anthropic/claude-sonnet-4
    Endpoint: https://openrouter.ai/api/v1
    Your API key was rejected by the provider. Check:
    • Is the key valid? Run: hermes setup
    • Does your account have access to anthropic/claude-sonnet-4?
    • Check credits: https://openrouter.ai/settings/credits

Bare strings like "image", "audio", "document" were appended to media_types, but downstream run.py checks mtype.startswith("image/") and mtype.startswith("audio/"), which never matched. This caused all Mattermost file attachments to be silently dropped from vision/STT processing. Use the actual MIME type from file_info instead.

When the model produces malformed JSON in tool call arguments, the agent loop was setting args={} and dispatching the tool anyway, wasting an iteration and producing a confusing downstream error.
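The fix for malformed tool-call arguments can be sketched as follows; this is a minimal illustration of the pattern, not the actual agent-loop code, and the helper name is an assumption.

```python
# Sketch: instead of dispatching the tool with args={} on a parse
# failure, surface the JSON error as the tool result so the model can
# retry with valid JSON. Function name is illustrative.
import json

def parse_tool_args(raw_args: str):
    """Return (args, error); exactly one is non-None."""
    try:
        return json.loads(raw_args), None
    except json.JSONDecodeError as e:
        return None, (
            f"Invalid JSON in tool arguments: {e}. "
            "Please retry with valid JSON."
        )

# A truncated payload yields an error string, which is returned
# directly as the tool result rather than dispatched as empty args.
args, err = parse_tool_args('{"path": "/tmp/x"')
```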
Now the error is returned directly as the tool result so the model can retry with valid JSON.

Co-authored-by: alireza78a <alireza78.crypto@gmail.com>

The MarkdownV2 format_message conversion left unescaped ( ) { } in edge cases where placeholder processing didn't cover them (e.g. partial link matches, URLs with parens). This caused Telegram to reject the message with 'character ( is reserved and must be escaped' and fall back to plain text — losing all formatting. Added a safety-net pass (step 12) after placeholder restoration that escapes any remaining bare ( ) { } outside code blocks and valid MarkdownV2 link syntax.

auth_type was "***" instead of "api_key" and api_key_env_vars was ("OPEN...",) instead of ("OPENCODE_GO_API_KEY",). This was introduced in 35d948b6 when a secret redaction tool masked these values during the Kilo Code provider commit. The OpenCode Go provider was completely broken as a result.

CRUD + actions for cron jobs on the existing API server (port 8642):

    GET    /api/jobs             — list jobs
    POST   /api/jobs             — create job
    GET    /api/jobs/{id}        — get job
    PATCH  /api/jobs/{id}        — update job
    DELETE /api/jobs/{id}        — delete job
    POST   /api/jobs/{id}/pause  — pause job
    POST   /api/jobs/{id}/resume — resume job
    POST   /api/jobs/{id}/run    — trigger immediate run

All endpoints use existing API_SERVER_KEY auth. Job ID format validated (12 hex chars). Logic ported from PR #2111 by nock4, adapted from FastAPI to aiohttp on the existing API server.

imap.uid('search') can return data=[] when the mailbox is empty or has no matching messages. Accessing data[0] without checking its length first raises IndexError: list index out of range. Fixed at both call sites in gateway/platforms/email.py:
- Line 233 (connect): ALL search on startup
- Line 298 (fetch): UNSEEN search in the polling loop

Closes #2137

Add hermes mcp add/remove/list/test/configure CLI for managing MCP server connections interactively.
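The guard from the email fix above can be sketched like this. The real fix is inline at the two call sites in gateway/platforms/email.py; the helper name and shape here are assumptions for illustration.

```python
# Sketch of the IndexError guard: imap.uid('search', ...) can return
# data=[] (or [b'']) for an empty mailbox, so check before data[0].
# Helper name is illustrative, not the actual email.py code.
def extract_uids(status: str, data: list) -> list[bytes]:
    if status != "OK" or not data or not data[0]:
        return []
    return data[0].split()

extract_uids("OK", [])             # → [] instead of IndexError
extract_uids("OK", [b"12 15 99"])  # → [b'12', b'15', b'99']
```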
Discovery-first 'add' flow connects, discovers tools, and lets users select which to enable via a curses checklist.

Add OAuth 2.1 PKCE authentication for MCP HTTP servers (RFC 7636). Supports browser-based and manual (headless) authorization, token caching with 0600 permissions, automatic refresh. Zero external deps.

Add ${ENV_VAR} interpolation in MCP server config values, resolved from os.environ + ~/.hermes/.env at load time.

Core OAuth module from PR #2021 by @imnotdev25. CLI and mcp_tool wiring rewritten against current main.

Closes #497, #690.

Reads auxiliary.vision.timeout from config.yaml (default: 30s) and passes it to async_call_llm. Useful for slow local vision models that need more than 30 seconds. Setting is in config.yaml (not .env) since it's not a secret:

    auxiliary:
      vision:
        timeout: 120

Based on PR #2306.

Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>

When a session expires (daily schedule or idle timeout) and is automatically reset, send a notification to the user explaining what happened:

    ◐ Session automatically reset (inactive for 24h). Conversation history cleared.
    Use /resume to browse and restore a previous session.
    Adjust reset timing in config.yaml under session_reset.
Notifications are suppressed when:
- The expired session had no activity (no tokens used)
- The platform is excluded (api_server, webhook by default)
- notify: false in config

Changes:
- session.py: _should_reset() returns reason string ('idle'/'daily') instead of bool; SessionEntry gains auto_reset_reason and reset_had_activity fields; old entry's total_tokens checked
- config.py: SessionResetPolicy gains notify (bool, default: true) and notify_exclude_platforms (default: api_server, webhook)
- run.py: sends notification via adapter.send() before processing the user's message, with activity + platform checks
- 13 new tests

Config (config.yaml):

    session_reset:
      notify: true
      notify_exclude_platforms: [api_server, webhook]

Three bugs in gateway session hygiene pre-compression caused 'Session too large' errors for ~200K context models like GLM-5-turbo on z.ai:

1. Gateway hygiene called get_model_context_length(model) without passing config_context_length, provider, or base_url — so user overrides like model.context_length: 180000 were ignored, and provider-aware detection (models.dev, z.ai endpoint) couldn't fire. The agent's own compressor correctly passed all three (run_agent.py line 1038).

2. The 1.4x safety factor on rough token estimates pushed the compression threshold above the model's actual context limit:

       200K * 0.85 * 1.4 = 238K > 200K (model limit)

   So hygiene never compressed, sessions grew past the limit, and the API rejected the request.

3. Same issue for the warn threshold: 200K * 0.95 * 1.4 = 266K.

Fix:
- Read model.context_length, provider, and base_url from config.yaml (same as run_agent.py does) and pass them to get_model_context_length()
- Resolve provider/base_url from runtime when not in config
- Cap the 1.4x-adjusted compress threshold at 95% of context_length
- Cap the 1.4x-adjusted warn threshold at context_length

Affects: z.ai GLM-5/GLM-5-turbo, any ~200K or smaller context model where the 1.4x factor would push 85% above 100%.
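The capping fix can be shown with a short worked sketch. The 0.85 ratio and 1.4x estimate factor come from the description above; the helper name and exact rounding are assumptions.

```python
# Worked sketch of the compress-threshold cap: the rough-estimate
# threshold is clamped to 95% of the real context length so hygiene
# always fires before the model limit. Helper name is illustrative.
def compress_threshold(context_length: int, ratio: float = 0.85,
                       estimate_factor: float = 1.4) -> int:
    raw = int(context_length * ratio * estimate_factor)
    return min(raw, int(context_length * 0.95))

# Uncapped: 200_000 * 0.85 * 1.4 = 238_000, above the 200K model
# limit, so compression never triggered. Capped, it fires at 190_000.
compress_threshold(200_000)  # → 190000
```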
Ref: Discord report from Ddox — glm-5-turbo on z.ai coding plan

Path('~/.hermes/image.png').is_file() returns False because Path doesn't expand the tilde. This caused the tool to fall through to URL validation, which also failed, producing a confusing error: 'Invalid image source. Provide an HTTP/HTTPS URL or a valid local file path.' Fix: use os.path.expanduser() before constructing the Path object. Added two tests for tilde expansion (success and nonexistent file).

* feat(config): support ${ENV_VAR} substitution in config.yaml

* fix: extend env var expansion to CLI and gateway config loaders

  The original PR (#2680) only wired _expand_env_vars into load_config(), which is used by 'hermes tools' and 'hermes setup'. The two primary config paths were missed:
  - load_cli_config() in cli.py (interactive CLI)
  - Module-level _cfg in gateway/run.py (gateway — bridges api_keys to env vars)

  Also:
  - Remove redundant 'import re' (already imported at module level)
  - Add missing blank lines between top-level functions (PEP 8)
  - Add tests for load_cli_config() expansion

---------

Co-authored-by: teyrebaz33 <hakanerten02@hotmail.com>

resolve_provider('custom') was silently returning 'openrouter', causing users who set provider: custom in config.yaml to unknowingly route through OpenRouter instead of their local/custom endpoint. The display showed 'via openrouter' even when the user explicitly chose custom.
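A minimal sketch of the corrected branch follows. The real resolve_provider in auth.py handles many providers; this isolates only the 'custom' path, and the fallback shown is an illustrative assumption.

```python
# Sketch: 'custom' is returned as-is instead of falling through to the
# default. The final fallback here is illustrative, not the real logic.
def resolve_provider(provider: str) -> str:
    if provider == "custom":
        # Before the fix, this case fell through and came back as
        # 'openrouter', silently re-routing local/custom endpoints.
        return "custom"
    return provider if provider else "openrouter"

resolve_provider("custom")  # → 'custom', not 'openrouter'
```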
Changes:
- auth.py: Split the conditional so 'custom' returns 'custom' as-is
- runtime_provider.py: _resolve_named_custom_runtime now returns provider='custom' instead of 'openrouter'
- runtime_provider.py: _resolve_openrouter_runtime returns provider='custom' when that was explicitly requested
- Add 'no-key-required' placeholder for keyless local servers
- Update existing test + add 5 new tests covering the fix

Fixes #2562

New documentation for features that existed in code but had no docs:

New page:
- context-references.md: Full docs for @-syntax inline context injection (@file:, @folder:, @diff, @staged, @git:, @url:) with line ranges, CLI autocomplete, size limits, sensitive path blocking, and error handling

configuration.md additions:
- Environment variable substitution: ${VAR_NAME} syntax in config.yaml with expansion, fallback, and multi-reference support
- Gateway streaming: Progressive token delivery on messaging platforms via message editing (StreamingConfig: enabled, transport, edit_interval, buffer_threshold, cursor) with platform support matrix
- Web search backends: Three providers (Firecrawl, Parallel, Tavily) with web.backend config key, capability matrix, auto-detection from API keys, self-hosted Firecrawl, and Parallel search modes

security.md additions:
- SSRF protection: Always-on URL validation blocking private networks, loopback, link-local, CGNAT, cloud metadata hostnames, with fail-closed DNS and redirect chain re-validation
- Tirith pre-exec security scanning: Content-level command scanning for homograph URLs, pipe-to-interpreter, terminal injection with auto-install, SHA-256/cosign verification, config options, and fail-open/fail-closed modes

sessions.md addition:
- Auto-generated session titles: Background LLM-powered title generation after first exchange

creating-skills.md additions:
- Conditional skill activation: requires_toolsets, requires_tools, fallback_for_toolsets, fallback_for_tools frontmatter fields with matching logic and use cases
- Environment variable requirements: required_environment_variables frontmatter for automatic env passthrough to sandboxed execution, plus terminal.env_passthrough user config
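The ${VAR_NAME} substitution documented above can be sketched as a small expander. This is a hedged illustration: the real _expand_env_vars helper may differ, notably in its fallback syntax and in also reading ~/.hermes/.env.

```python
# Sketch of ${VAR_NAME} expansion for config values: each reference is
# replaced from the environment; unset variables are left as-is here.
# Function and regex are illustrative, not the real loader code.
import os
import re

_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

def expand_env_vars(value: str) -> str:
    return _VAR.sub(lambda m: os.environ.get(m.group(1), m.group(0)), value)

os.environ["HERMES_KEY"] = "abc123"
expand_env_vars("api_key: ${HERMES_KEY}")  # → 'api_key: abc123'
```

Multiple references in one string expand independently, which matches the multi-reference support noted above.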