fix: use session_key instead of chat_id for adapter interrupt lookups

* fix: use session_key instead of chat_id for adapter interrupt lookups

monitor_for_interrupt() in _run_agent was using source.chat_id to query
the adapter's has_pending_interrupt() and get_pending_message() methods.
But the adapter stores interrupt events under build_session_key(source),
which produces a different string (e.g. 'agent:main:telegram:dm' vs '123456').

This key mismatch meant the interrupt was never detected through the
adapter path, which is the only active interrupt path for all adapter-based
platforms (Telegram, Discord, Slack, etc.). The gateway-level interrupt
path (in dispatch_message) is unreachable because the adapter intercepts
the 2nd message in handle_message() before it reaches dispatch_message().

Result: sending a new message while subagents were running had no effect —
the interrupt was silently lost.

Fix: replace all source.chat_id references in the interrupt-related code
within _run_agent() with the session_key parameter, which matches the
adapter's storage keys.

Also adds regression tests verifying session_key vs chat_id consistency.

* debug: add file-based logging to CLI interrupt path

Temporary instrumentation to diagnose why message-based interrupts
don't seem to work during subagent execution. Logs to
~/.hermes/interrupt_debug.log (immune to redirect_stdout).

Two log points:
1. When Enter handler puts message into _interrupt_queue
2. When chat() reads it and calls agent.interrupt()

This will reveal whether the message reaches the queue and
whether the interrupt is actually fired.
This commit is contained in:
Teknium 2026-03-12 08:35:45 -07:00 committed by GitHub
parent 5c54128475
commit e004c094ea
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
9 changed files with 1045 additions and 9 deletions

View file

@ -3418,17 +3418,19 @@ class GatewayRunner:
# Monitor for interrupts from the adapter (new messages arriving)
async def monitor_for_interrupt():
adapter = self.adapters.get(source.platform)
if not adapter:
if not adapter or not session_key:
return
chat_id = source.chat_id
while True:
await asyncio.sleep(0.2) # Check every 200ms
# Check if adapter has a pending interrupt for this session
if hasattr(adapter, 'has_pending_interrupt') and adapter.has_pending_interrupt(chat_id):
# Check if adapter has a pending interrupt for this session.
# Must use session_key (build_session_key output) — NOT
# source.chat_id — because the adapter stores interrupt events
# under the full session key.
if hasattr(adapter, 'has_pending_interrupt') and adapter.has_pending_interrupt(session_key):
agent = agent_holder[0]
if agent:
pending_event = adapter.get_pending_message(chat_id)
pending_event = adapter.get_pending_message(session_key)
pending_text = pending_event.text if pending_event else None
logger.debug("Interrupt detected from adapter, signaling agent...")
agent.interrupt(pending_text)
@ -3445,10 +3447,11 @@ class GatewayRunner:
result = result_holder[0]
adapter = self.adapters.get(source.platform)
# Get pending message from adapter if interrupted
# Get pending message from adapter if interrupted.
# Use session_key (not source.chat_id) to match adapter's storage keys.
pending = None
if result and result.get("interrupted") and adapter:
pending_event = adapter.get_pending_message(source.chat_id)
pending_event = adapter.get_pending_message(session_key) if session_key else None
if pending_event:
pending = pending_event.text
elif result.get("interrupt_message"):
@ -3460,8 +3463,8 @@ class GatewayRunner:
# Clear the adapter's interrupt event so the next _run_agent call
# doesn't immediately re-trigger the interrupt before the new agent
# even makes its first API call (this was causing an infinite loop).
if adapter and hasattr(adapter, '_active_sessions') and source.chat_id in adapter._active_sessions:
adapter._active_sessions[source.chat_id].clear()
if adapter and hasattr(adapter, '_active_sessions') and session_key and session_key in adapter._active_sessions:
adapter._active_sessions[session_key].clear()
# Don't send the interrupted response to the user — it's just noise
# like "Operation interrupted." They already know they sent a new