BrowserUse_and_ComputerUse_.../tests
Teknium 57be18c026
feat: smart approvals + /stop command (inspired by OpenAI Codex)
* feat: smart approvals — LLM-based risk assessment for dangerous commands

Adds a 'smart' approval mode that uses the auxiliary LLM to assess
whether a flagged command is genuinely dangerous or a false positive,
auto-approving low-risk commands without prompting the user.

Inspired by OpenAI Codex's Smart Approvals guardian subagent
(openai/codex#13860).

Config (config.yaml):
  approvals:
    mode: manual   # manual (default), smart, off

Modes:
- manual — current behavior, always prompt the user
- smart  — aux LLM evaluates risk: APPROVE (auto-allow), DENY (block),
           or ESCALATE (fall through to manual prompt)
- off    — skip all approval prompts (equivalent to --yolo)

When smart mode auto-approves, the pattern gets session-level approval
so subsequent uses of the same pattern don't trigger another LLM call.
When it denies, the command is blocked without user prompt. When
uncertain, it escalates to the normal manual approval flow.

The LLM prompt is carefully scoped: it sees only the command text and
the flagged reason, assesses actual risk vs false positive, and returns
a single-word verdict.

* feat: make smart approval model configurable via config.yaml

Adds auxiliary.approval section to config.yaml with the same
provider/model/base_url/api_key pattern as other aux tasks (vision,
web_extract, compression, etc.).

Config:
  auxiliary:
    approval:
      provider: auto
      model: ''        # fast/cheap model recommended
      base_url: ''
      api_key: ''

Bridged to env vars in both CLI and gateway paths so the aux client
picks them up automatically.

* feat: add /stop command to kill all background processes

Adds a /stop slash command that kills all running background processes
at once. Currently users have to process(list) then process(kill) for
each one individually.

Inspired by OpenAI Codex's separation of interrupt (Ctrl+C stops current
turn) from /stop (cleans up background processes). See openai/codex#14602.

Ctrl+C continues to only interrupt the active agent turn — background
dev servers, watchers, etc. are preserved. /stop is the explicit way
to clean them all up.
2026-03-16 06:20:11 -07:00
..
acp feat(acp): support slash commands in ACP adapter (#1532) 2026-03-16 05:19:36 -07:00
agent feat(tools): centralize tool emoji metadata in registry + skin integration 2026-03-15 20:21:21 -07:00
cron fix(cron): support per-job runtime overrides 2026-03-14 22:22:31 -07:00
fakes fix: streaming tool call parsing, error handling, and fake HA state mutation 2026-03-14 14:27:20 +03:00
gateway fix(privacy): skip PII redaction on Discord/Slack (mentions need real IDs) 2026-03-16 05:58:34 -07:00
hermes_cli feat: smart approvals + /stop command (inspired by OpenAI Codex) 2026-03-16 06:20:11 -07:00
honcho_integration refactor(honcho): remove local memory mode 2026-03-12 16:23:34 -04:00
integration test(voice): add integration tests with real NaCl crypto and Opus codec 2026-03-15 05:20:17 -07:00
skills fix: persist google oauth pkce for headless auth 2026-03-14 22:11:34 -07:00
tools test: fake minisweagent for docker cwd mount regressions 2026-03-16 05:40:05 -07:00
__init__.py A bit of restructuring for simplicity and organization 2025-10-01 23:29:25 +00:00
conftest.py feat: add direct endpoint overrides for auxiliary and delegation 2026-03-14 21:11:37 -07:00
run_interrupt_test.py fix(honcho): isolate session routing for multi-user gateway (#1500) 2026-03-16 00:23:47 -07:00
test_413_compression.py feat: improve context compaction handoff summaries (#1273) 2026-03-14 02:33:31 -07:00
test_860_dedup.py fix: eliminate 3x SQLite message duplication in gateway sessions (#860) 2026-03-10 15:22:44 -07:00
test_agent_loop.py fix: salvage gateway dedup and executor cleanup from PR #993 2026-03-14 11:03:20 -07:00
test_agent_loop_tool_calling.py fix: skip hanging tests + add global test timeout 2026-03-12 01:23:28 -07:00
test_agent_loop_vllm.py test: restore vllm integration coverage and add dict-args regression 2026-03-15 08:02:29 -07:00
test_anthropic_adapter.py Merge origin/main into hermes/hermes-daa73839 2026-03-14 23:44:47 -07:00
test_anthropic_oauth_flow.py fix: preflight Anthropic auth and prefer Claude store 2026-03-14 19:38:55 -07:00
test_anthropic_provider_persistence.py fix: preflight Anthropic auth and prefer Claude store 2026-03-14 19:38:55 -07:00
test_api_key_providers.py fix: exclude Coding Plan-only models from Moonshot model selection 2026-03-14 20:42:30 -07:00
test_atomic_json_write.py test: cover atomic temp cleanup on interrupts 2026-03-14 22:31:51 -07:00
test_atomic_yaml_write.py test: cover atomic temp cleanup on interrupts 2026-03-14 22:31:51 -07:00
test_auth_codex_provider.py refactor(auth): transition Codex OAuth tokens to Hermes auth store 2026-03-01 19:59:24 -08:00
test_auth_nous_provider.py Fix nous refresh token rotation failure in case where api key mint/retrieval fails 2026-03-02 17:18:15 +11:00
test_auxiliary_config_bridge.py feat: add direct endpoint overrides for auxiliary and delegation 2026-03-14 21:11:37 -07:00
test_batch_runner_checkpoint.py fix: sanitize chat payloads and provider precedence 2026-03-13 23:59:12 -07:00
test_cli_approval_ui.py fix(cli): repair dangerous command approval UI 2026-03-14 11:57:44 -07:00
test_cli_init.py fix: initialize CLI voice state for single-query mode 2026-03-14 06:31:32 -07:00
test_cli_interrupt_subagent.py fix: use session_key instead of chat_id for adapter interrupt lookups 2026-03-12 08:35:45 -07:00
test_cli_loading_indicator.py fix(cli): add loading indicators for slow slash commands 2026-03-10 17:31:00 -07:00
test_cli_mcp_config_watch.py fix: auto-reload MCP tools when mcp_servers config changes without restart (#1474) 2026-03-15 19:03:34 -07:00
test_cli_model_command.py feat: auto-detect provider when switching models via /model (#1506) 2026-03-16 04:34:45 -07:00
test_cli_new_session.py fix(cli): make /new, /reset, and /clear start real fresh sessions 2026-03-13 21:53:54 -07:00
test_cli_plan_command.py fix: save /plan output in workspace (#1381) 2026-03-14 21:28:51 -07:00
test_cli_prefix_matching.py fix(test): add missing session_id and _pending_input to _make_cli fixture 2026-03-14 10:33:58 -07:00
test_cli_preloaded_skills.py feat: preload CLI skills on launch (#1359) 2026-03-14 19:33:59 -07:00
test_cli_provider_resolution.py fix(custom-endpoint): verify /models and suggest working /v1 base URL (#1480) 2026-03-15 20:09:50 -07:00
test_cli_retry.py test: lock retry replacement semantics 2026-03-14 21:19:22 -07:00
test_cli_secret_capture.py feat: secure skill env setup on load (core #688) 2026-03-13 03:14:04 -07:00
test_cli_skin_integration.py fix(test): add missing voice state attrs to CLI stub in skin tests 2026-03-14 15:00:45 +03:00
test_cli_status_bar.py feat: add persistent CLI status bar and usage details (#1522) 2026-03-16 04:42:48 -07:00
test_codex_execution_paths.py feat: simple fallback model for provider resilience 2026-03-08 20:22:33 -07:00
test_codex_models.py fix: add codex forward-compat model listing 2026-03-13 21:34:01 -07:00
test_dict_tool_call_args.py test: restore vllm integration coverage and add dict-args regression 2026-03-15 08:02:29 -07:00
test_display.py fix: add upstream guard for non-dict function_args + tests for build_tool_preview 2026-03-09 21:01:40 -07:00
test_evidence_store.py feat: add OSS Security Forensics skill (Skills Hub) (#1482) 2026-03-15 21:59:53 -07:00
test_external_credential_detection.py refactor(auth): transition Codex OAuth tokens to Hermes auth store 2026-03-01 19:59:24 -08:00
test_fallback_model.py refactor: route main agent client + fallback through centralized router 2026-03-11 21:38:29 -07:00
test_file_permissions.py security: enforce 0600/0700 file permissions on sensitive files (inspired by openclaw) 2026-03-09 02:19:32 -07:00
test_flush_memories_codex.py fix: update all test mocks for call_llm migration 2026-03-11 21:06:54 -07:00
test_hermes_state.py fix(cli): accept session ID prefixes for session actions 2026-03-15 04:01:56 -07:00
test_honcho_client_config.py fix(honcho): auto-enable when API key is present 2026-03-01 03:12:37 -05:00
test_insights.py feat: add persistent CLI status bar and usage details (#1522) 2026-03-16 04:42:48 -07:00
test_interactive_interrupt.py fix(honcho): isolate session routing for multi-user gateway (#1500) 2026-03-16 00:23:47 -07:00
test_interrupt_propagation.py fix: use session_key instead of chat_id for adapter interrupt lookups 2026-03-12 08:35:45 -07:00
test_managed_server_tool_support.py test: fix stale CI assumptions in parser and quick-command coverage (#1236) 2026-03-13 21:56:12 -07:00
test_minisweagent_path.py fix: worktree-aware minisweagent path discovery + clean up requirements check (#1248) 2026-03-13 23:39:51 -07:00
test_model_provider_persistence.py fix: provider selection not persisting when switching via hermes model 2026-03-10 17:12:34 -07:00
test_model_tools.py test: strengthen assertions across 3 more test files (batch 2) 2026-03-05 18:46:30 -08:00
test_openai_client_lifecycle.py fix(honcho): isolate session routing for multi-user gateway (#1500) 2026-03-16 00:23:47 -07:00
test_personality_none.py feat(cli,gateway): add /personality none and custom personality support 2026-03-09 17:31:54 +03:00
test_provider_parity.py fix: convert anthropic image content blocks 2026-03-14 23:41:20 -07:00
test_quick_commands.py fix(gateway): support quick commands from GatewayConfig 2026-03-14 03:51:28 -07:00
test_real_interrupt_subagent.py fix(test): patch correct method in subagent interrupt test 2026-03-12 15:05:42 -04:00
test_reasoning_command.py fix: /reasoning command — add gateway support, fix display, persist settings (#1031) 2026-03-12 05:38:19 -07:00
test_redirect_stdout_issue.py fix: use session_key instead of chat_id for adapter interrupt lookups 2026-03-12 08:35:45 -07:00
test_resume_display.py feat: display previous messages when resuming a session in CLI 2026-03-08 17:45:45 -07:00
test_run_agent.py fix(honcho): isolate session routing for multi-user gateway (#1500) 2026-03-16 00:23:47 -07:00
test_run_agent_codex_responses.py fix: add missing Responses API parameters for Codex provider 2026-03-11 04:28:31 -07:00
test_runtime_provider_resolution.py fix: restore config-saved custom endpoint resolution 2026-03-14 20:58:12 -07:00
test_setup_model_selection.py fix(setup): remove dead code causing is_coding_plan NameError crash 2026-03-13 04:42:26 +03:00
test_timezone.py test(cron): add cross-timezone naive timestamp regression 2026-03-14 10:33:32 -07:00
test_tool_call_parsers.py fix: use non-greedy regex in DeepSeek V3 parser for multi-tool calls (#1300) 2026-03-14 06:19:28 -07:00
test_toolset_distributions.py test: add unit tests for 8 modules (batch 2) 2026-02-26 13:54:20 +03:00
test_toolsets.py fix: add missing Platform.SIGNAL to toolset mappings, update test + config docs 2026-03-09 23:27:19 -07:00
test_trajectory_compressor.py fix: harden trajectory compressor summary content handling 2026-03-14 11:03:25 -07:00
test_worktree.py fix: harden salvaged worktree include checks 2026-03-14 21:51:27 -07:00
test_worktree_security.py fix: harden salvaged worktree include checks 2026-03-14 21:51:27 -07:00