APEX/BrowserUse_and_ComputerUse_skills

Author	SHA1	Message	Date
0xbyt4	143cc68946	fix(test): add /voice to EXPECTED_COMMANDS set in test_commands.py	2026-03-14 14:27:20 +03:00
0xbyt4	46db7aeffd	fix: streaming tool call parsing, error handling, and fake HA state mutation - Fix Gemini streaming tool call merge bug: multiple tool calls with same index but different IDs are now parsed as separate calls instead of concatenating names (e.g. ha_call_serviceha_call_service) - Handle partial results in voice mode: show error and stop continuous mode when agent returns partial/failed results with empty response - Fix error display during streaming TTS: error messages are shown in full response box even when streaming box was already opened - Add duplicate sentence filter in TTS: skip near-duplicate sentences from LLM repetition - Fix fake HA server state mutation: turn_on/turn_off/set_temperature correctly update entity states; temperature sensor simulates change when thermostat is adjusted	2026-03-14 14:27:20 +03:00
0xbyt4	404123aea7	feat: add persistent voice mode status bar below input area Shows voice state (recording, transcribing, TTS/continuous toggles) as a persistent toolbar using prompt_toolkit ConditionalContainer.	2026-03-14 14:27:20 +03:00
0xbyt4	b00c5949fc	fix: suppress verbose logs during streaming TTS, improve hallucination filter, stop continuous mode on errors - Add _vprint() helper to suppress log output when stream_callback is active - Expand Whisper hallucination filter with multi-language phrases and regex pattern for repetitive text - Stop continuous voice mode when agent returns a failed result (e.g. 429 rate limit)	2026-03-14 14:26:55 +03:00
0xbyt4	3a1b35ed92	fix: voice mode race conditions, temp file leak, think tag parsing - Atomic check-and-set for _voice_recording flag with _voice_lock - Guard _voice_stop_and_transcribe against concurrent invocation - Remove premature flag clearing from Ctrl+R handler - Clean up temp WAV files in finally block (_play_via_tempfile) - Use buffer-level regex for <think> block filtering (handles chunked tags) - Prevent /voice on prompt accumulation on repeated calls - Include Groq in STT key error message	2026-03-14 14:26:55 +03:00
0xbyt4	7d4b4e95f1	feat: sync text display with TTS audio playback Move screen output from stream_callback to display_callback called by TTS consumer thread. Text now appears sentence-by-sentence in sync with audio instead of streaming ahead at LLM speed. Removes quiet_mode hack.	2026-03-14 14:26:55 +03:00
0xbyt4	a15fa85248	fix: catch OSError on sounddevice import in voice_mode.py Same PortAudio fix as tts_tool.py — sounddevice raises OSError when the native library is missing on CI runners.	2026-03-14 14:26:30 +03:00
0xbyt4	fd4f229eab	fix: catch OSError on sounddevice import for CI without PortAudio sounddevice raises OSError (not ImportError) when the PortAudio C library is missing. This broke test collection on CI runners that have the Python package installed but lack the native library.	2026-03-14 14:26:30 +03:00
0xbyt4	179d9e1a22	feat: add streaming sentence-by-sentence TTS via ElevenLabs Stream audio to speaker as the agent generates tokens instead of waiting for the full response. First sentence plays within ~1-2s of agent starting to respond. - run_agent: add stream_callback to run_conversation/chat, streaming path in _interruptible_api_call accumulates chunks into mock ChatCompletion while forwarding content deltas to callback - tts_tool: add stream_tts_to_speaker() with sentence buffering, think block filtering, markdown stripping, ElevenLabs pcm_24000 streaming to sounddevice OutputStream - cli: wire up streaming TTS pipeline in chat(), detect elevenlabs provider + sounddevice availability, skip batch TTS when streaming is active, signal stop on interrupt Falls back to batch TTS for Edge/OpenAI providers or when elevenlabs/sounddevice are not available. Zero impact on non-voice mode (callback defaults to None).	2026-03-14 14:26:30 +03:00
0xbyt4	d7425343ee	fix: fix voice recording stuck in continuous mode - Track submitted state locally instead of using racy qsize() check - Allow Ctrl+R to stop recording even while agent is running - Add double-start guard to prevent concurrent recording attempts	2026-03-14 14:26:30 +03:00
0xbyt4	dad865e920	fix: fix silence detection bugs and add Phase 4 voice mode features Fix 3 critical bugs in silence detection: - Micro-pause tolerance now tracks dip duration (not time since speech start) - Peak RMS check in stop() prevents discarding recordings with real speech - Reduced min_speech_duration from 0.5s to 0.3s for reliable speech confirmation Phase 4 features: configurable silence params, visual audio level indicator, voice system prompt, tool call audio cues, TTS interrupt, continuous mode auto-restart, interruptable playback via Popen tracking.	2026-03-14 14:26:30 +03:00
0xbyt4	32b033c11c	feat: add silence filter, hallucination guard, and continuous mode control - Skip silent recordings before STT call (RMS check in AudioRecorder.stop) - Filter known Whisper hallucinations ("Thank you.", "Bye." etc.) - Continuous mode: Ctrl+R starts loop, Ctrl+R during recording exits it - Wait for TTS to finish before auto-restart to avoid recording speaker - Silence timeout increased to 3s for natural pauses - Tests: hallucination filter, silent recording skip, real speech passthrough	2026-03-14 14:25:28 +03:00
0xbyt4	bfd9c97705	feat: add Phase 4 low-latency features for voice mode - Audio cues: beep on record start (880Hz), double beep on stop (660Hz) - Silence detection: auto-stop recording after 3s of silence (RMS-based) - Continuous mode: auto-restart recording after agent responds - Ctrl+R starts continuous mode, Ctrl+R during recording exits it - Waits for TTS to finish before restarting to avoid recording speaker - Tests: 7 new tests for beep generation and silence detection	2026-03-14 14:25:28 +03:00
0xbyt4	a69bd55b5a	fix: isolate GROQ_API_KEY in test_missing_stt_key test The test was failing because GROQ_API_KEY leaked from the environment. Now both VOICE_TOOLS_OPENAI_KEY and GROQ_API_KEY are removed to properly test the "no STT key" scenario.	2026-03-14 14:25:28 +03:00
0xbyt4	c23928d089	fix: improve voice mode robustness and add integration tests - Show TTS errors to user instead of silently logging - Improve markdown stripping: code blocks, URLs, links, horizontal rules - Fix stripping order: process markdown links before removing URLs - Add threading.Lock for voice state variables (cross-thread safety) - Add 14 CLI integration tests (markdown stripping, command parsing, thread safety) - Total: 47 voice-related tests	2026-03-14 14:25:28 +03:00
0xbyt4	37b01ab964	test: add transcription_tools tests for multi-provider STT - Provider resolution: OpenAI priority, Groq fallback, no keys - Model auto-correction: Groq corrects OpenAI models and vice versa - Success path: transcription, API errors, whitespace stripping - 12 new tests, 33 total voice-related tests	2026-03-14 14:25:28 +03:00
0xbyt4	ea5b89825a	fix: voice mode TTS playback and keybinding issues - Change record key from c-@ to c-r (Ctrl+R) for macOS compatibility - Add missing tempfile and time imports that caused silent TTS crash - Use MP3 output for CLI TTS playback (afplay doesn't handle OGG well) - Strip markdown formatting from text before sending to TTS - Remove duplicate transcript echo in voice pipeline	2026-03-14 14:25:28 +03:00
0xbyt4	ec32e9a540	feat: add Groq STT support and fix voice mode keybinding - Add multi-provider STT support (OpenAI > Groq fallback) in transcription_tools - Auto-correct model selection when provider doesn't support the configured model - Change voice record key from Ctrl+Space to Ctrl+R (macOS compatibility) - Fix duplicate transcript echo in voice pipeline - Add GROQ_API_KEY to .env.example	2026-03-14 14:25:28 +03:00
0xbyt4	1a6fbef8a9	feat: add voice mode with push-to-talk and TTS output for CLI Implements Issue #314 Phase 2 & 3: - /voice command to toggle voice mode (on/off/tts/status) - Ctrl+Space push-to-talk recording via sounddevice - Whisper STT transcription via existing transcription_tools - Optional TTS response playback via existing tts_tool - Visual indicators in prompt (recording/transcribing/voice) - 21 unit tests, all mocked (no real mic/API) - Optional deps: sounddevice, numpy (pip install hermes-agent[voice])	2026-03-14 14:25:28 +03:00
Teknium	1a857123b3	feat(skills): add optional telephony skill with Twilio, SMS, and AI calls (#1289 ) * feat: improve context compaction handoff summaries Adapt PR #916 onto current main by replacing the old context summary marker with a clearer handoff wrapper, updating the summarization prompt for resume-oriented summaries, and preserving the current call_llm-based compression path. * fix: clearer error when docker backend is unavailable * fix: preserve docker discovery in backend preflight Follow up on salvaged PR #940 by reusing find_docker() during the new availability check so non-PATH Docker Desktop installs still work. Add a regression test covering the resolved executable path. * test: make gateway async tests xdist-safe Replace sync test usage of asyncio.get_event_loop().run_until_complete() with asyncio.run() so tests do not depend on an ambient current event loop. Also create the email disconnect poll task inside a running loop. This fixes xdist/CI failures where workers have no current loop in MainThread. * feat(skills): add phone-calls skill for outbound AI voice calls Reformulated from core tool (PR #847 feedback) into a skill with a standalone helper script. No new dependencies — uses only Python stdlib. Two providers supported: - Bland.ai (default): simple setup, one API key - Vapi: flexible, better voice quality via ElevenLabs/Deepgram + Twilio Includes: - SKILL.md with full procedure, safety rules, provider docs, pitfalls - scripts/phone_call.py CLI helper (call, status, diagnose commands) * feat(skills): expand phone-calls into optional telephony skill Follow up on salvaged PR #965 by moving the capability into optional-skills and broadening it from outbound AI calling to a full telephony skill. Add Twilio number provisioning, env/state persistence, SMS/MMS, inbound SMS polling, Vapi import helpers, and a provider decision tree while keeping telephony out of core runtime code. * docs(skills): clarify Hermes TTS telephony workflow --------- Co-authored-by: aydnOktay <xaydinoktay@gmail.com> Co-authored-by: mormio <morganemoss@gmai.com>	2026-03-14 04:16:48 -07:00
Teknium	02752c83b4	Merge pull request #1287 from NousResearch/hermes/hermes-cc060dd9 fix(gateway): avoid slash-command crash with GatewayConfig	2026-03-14 04:13:56 -07:00
Teknium	a48ebc68f4	Merge pull request #1288 from NousResearch/hermes/hermes-de3d4e49-pr976 fix: reliably notify gateway users when updates finish	2026-03-14 04:13:13 -07:00
Teknium	b42ee3050e	Merge pull request #1290 from NousResearch/hermes/hermes-f48b210a fix(send_message): salvage and complete MEDIA delivery from #971	2026-03-14 04:12:54 -07:00
teknium1	5c9a84219d	fix: complete send_message MEDIA delivery salvage - prevent raw MEDIA tag leakage outside the gateway pipeline - make extract_media handle quoted/backticked paths and optional whitespace - send Telegram media natively with explicit error/warning handling - add regression tests for Telegram media dispatch and MEDIA parsing	2026-03-14 04:02:03 -07:00
quabug	50d6659392	fix: handle MEDIA tags in send_message tool for native file delivery The send_message tool's _send_telegram() sent MEDIA:<path> tags as literal text instead of delivering actual files. This fixes it by extracting MEDIA tags via BasePlatformAdapter.extract_media() and routing files to the appropriate Telegram Bot API method by extension. Changes: - send_message_tool: extract MEDIA tags and send files natively as photo/video/voice/audio/document based on file extension - send_message_tool: add per-file error handling and missing-file logging - send_message_tool: use cleaned text in fallback to avoid leaking tags - base.py extract_media: handle optional space after MEDIA: colon - base.py extract_media: strip surrounding backticks/quotes from paths Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-14 04:02:03 -07:00
Teknium	9525db913f	feat(skills): add X/Twitter xitter skill via upstream x-cli (#1285 ) * feat(skills): salvage xitter skill from PR #1065 Adapt the X/Twitter skill onto current main without vendoring an external CLI. Use upstream x-cli installation instructions, add a social-media category, and align credential/setup guidance with Hermes conventions. * docs(skills): explain X credential requirements in xitter skill Clarify why the official X flow needs five credentials and call out the setup/cost friction explicitly.	2026-03-14 04:00:27 -07:00
clabbe-bot	3126c60885	fix: notify gateway users when updates finish or fail	2026-03-14 03:59:05 -07:00
Teknium	cac238c2a3	Merge pull request #1286 from NousResearch/hermes/hermes-315847fd fix(patch): avoid corrupting pipe chars in v4a patch apply	2026-03-14 03:58:27 -07:00
teknium1	7e52e8eb54	fix(gateway): bridge quick commands into GatewayConfig runtime Follow-up on salvaged PR #975. Bridge quick_commands from config.yaml into load_gateway_config(), normalize non-dict quick command config at runtime, and add coverage for GatewayConfig round-trips plus config.yaml bridging. This makes the GatewayConfig quick-command fix complete for the real user-facing config path implicated by issue #973.	2026-03-14 03:57:25 -07:00
teknium1	96c250e538	test: cover pipe characters in v4a patch apply Add a regression test for apply_v4a_operations when read content contains a literal pipe character outside a line-number prefix.	2026-03-14 03:54:46 -07:00
stablegenius49	ce56b45514	fix(gateway): support quick commands from GatewayConfig	2026-03-14 03:51:28 -07:00
alireza78a	1182aeea00	fix(patch): use regex to detect line-number prefix to avoid corrupting pipe chars	2026-03-14 03:47:13 -07:00
Teknium	cf3dceafe1	Merge pull request #1284 from NousResearch/hermes/hermes-de3d4e49-pr964 fix: show effective model and provider in status	2026-03-14 03:42:16 -07:00
teknium1	b5a7e807d0	test: cover provider label formatting	2026-03-14 03:39:12 -07:00
luisv-1	c2c37ef158	Show configured model and provider in status output Made-with: Cursor	2026-03-14 03:35:37 -07:00
Teknium	2f8dbe4e77	Merge pull request #1283 from NousResearch/hermes/hermes-f48b210a fix(setup): salvage keep-current provider handling from #951	2026-03-14 03:26:44 -07:00
Teknium	95d49401ee	Merge pull request #1282 from NousResearch/hermes/hermes-cc060dd9 fix(cli): make TUI prompt and accent output skin-aware	2026-03-14 03:24:24 -07:00
StefanIsMe	26f8b790c9	fix(setup): persist provider when switching model endpoints	2026-03-14 03:21:46 -07:00
Teknium	7901d863dd	Merge pull request #1280 from NousResearch/hermes/hermes-de3d4e49-pr944 fix: make session log writes reuse shared atomic JSON helper	2026-03-14 03:15:52 -07:00
teknium1	e9a7441c9b	test: restore default event loop for sync tests	2026-03-14 03:14:34 -07:00
Wayne	41f22de20f	fix(cli): make TUI prompt and accent output skin-aware Salvaged from PR #932 by Wayne onto current main. Apply skin-aware prompt symbols and live prompt_toolkit color refresh, replace lingering hardcoded accent output with active-skin colors, keep ANSI-safe response rendering, preserve secret-capture and approval-prompt state handling, and add integration coverage for prompt state and style refresh behavior.	2026-03-14 03:12:52 -07:00
Teknium	b91cac7b4b	test: make gateway async tests xdist-safe (#1281 ) * feat: improve context compaction handoff summaries Adapt PR #916 onto current main by replacing the old context summary marker with a clearer handoff wrapper, updating the summarization prompt for resume-oriented summaries, and preserving the current call_llm-based compression path. * fix: clearer error when docker backend is unavailable * fix: preserve docker discovery in backend preflight Follow up on salvaged PR #940 by reusing find_docker() during the new availability check so non-PATH Docker Desktop installs still work. Add a regression test covering the resolved executable path. * test: make gateway async tests xdist-safe Replace sync test usage of asyncio.get_event_loop().run_until_complete() with asyncio.run() so tests do not depend on an ambient current event loop. Also create the email disconnect poll task inside a running loop. This fixes xdist/CI failures where workers have no current loop in MainThread. --------- Co-authored-by: aydnOktay <xaydinoktay@gmail.com>	2026-03-14 03:12:15 -07:00
Teknium	29312a23d9	Merge pull request #1279 from NousResearch/hermes/hermes-315847fd refactor: salvage adapter and CLI cleanup from PR #939	2026-03-14 03:10:01 -07:00
kshitij	0bb7ed1d95	refactor: salvage adapter and CLI cleanup from PR #939 Salvaged from PR #939 by kshitij. - deduplicate Discord slash command dispatch and local file send helpers - deduplicate Slack file uploads while preserving thread metadata - extract shared CLI session relative-time formatting - hoist browser PATH cleanup constants and throttle screenshot pruning - tidy small type and import cleanups	2026-03-14 03:07:11 -07:00
Teknium	f279bb004f	Merge pull request #1278 from NousResearch/hermes/hermes-f48b210a test: fix gateway async tests without implicit event loop	2026-03-14 02:57:47 -07:00
teknium1	cbbba87099	fix: reuse shared atomic session log helper	2026-03-14 02:56:13 -07:00
Teknium	6036793f60	fix: clearer docker backend preflight errors (#1276 ) * feat: improve context compaction handoff summaries Adapt PR #916 onto current main by replacing the old context summary marker with a clearer handoff wrapper, updating the summarization prompt for resume-oriented summaries, and preserving the current call_llm-based compression path. * fix: clearer error when docker backend is unavailable * fix: preserve docker discovery in backend preflight Follow up on salvaged PR #940 by reusing find_docker() during the new availability check so non-PATH Docker Desktop installs still work. Add a regression test covering the resolved executable path. --------- Co-authored-by: aydnOktay <xaydinoktay@gmail.com>	2026-03-14 02:53:02 -07:00
alireza78a	f685741481	fix(agent): use atomic write in _save_session_log to prevent data loss	2026-03-14 02:53:01 -07:00
teknium1	115dd17b3c	test: fix gateway async test event loop usage Use asyncio.run in sync tests that were relying on an implicit current event loop. This makes the gateway send-image and Slack connect tests pass reliably under Python 3.11+ and xdist workers.	2026-03-14 02:52:47 -07:00
Teknium	486cb772b8	Merge pull request #1275 from NousResearch/hermes/hermes-f48b210a feat(gateway): salvage reasoning hot reload from #938	2026-03-14 02:47:11 -07:00

... 5 6 7 8 9 ...

1924 commits