fix: audit fixes — 5 bugs found and resolved

Thorough code review found 5 issues across run_agent.py, cli.py, and gateway/:

1. CRITICAL - Gateway stream consumer task never started:
   stream_consumer_holder was checked BEFORE run_sync populated it.
   Fixed with async polling pattern (same as track_agent).

2. MEDIUM-HIGH - Streaming fallback after partial delivery caused
   double-response: if streaming failed after some tokens were delivered,
   the fallback would re-deliver the full response. Now tracks
   deltas_were_sent and only falls back when no tokens reached consumers yet.

3. MEDIUM - Codex mode lost on_first_delta spinner callback:
   _run_codex_stream now accepts on_first_delta parameter, fires it on
   first text delta. Passed through from _interruptible_streaming_api_call
   via _codex_on_first_delta instance attribute.

4. MEDIUM - CLI close-tag after-text bypassed tag filtering: text after
   a reasoning close tag was sent directly to _emit_stream_text, skipping
   open-tag detection. Now routes through _stream_delta for full filtering.

5. LOW - Removed 140 lines of dead code: old _streaming_api_call method
   (superseded by _interruptible_streaming_api_call). Updated 13 tests in
   test_run_agent.py and test_openai_client_lifecycle.py to use the new
   method name and signature.

4573 tests passing.
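
The guard described in item 2 can be sketched roughly as follows. This is an illustrative sketch only, not the code in run_agent.py: the function name `deliver_with_fallback` and its three callable parameters are hypothetical, and re-raising after partial delivery is an assumption (the commit message only says the fallback is skipped in that case).

```python
from typing import Callable, Iterable, List


def deliver_with_fallback(
    stream_chunks: Callable[[], Iterable[str]],  # hypothetical: yields text deltas
    fetch_full: Callable[[], str],               # hypothetical: non-streaming call
    emit: Callable[[str], None],                 # hypothetical: delivers text to consumers
) -> str:
    """Stream deltas to `emit`; fall back only if nothing was delivered yet."""
    deltas_were_sent = False
    parts: List[str] = []
    try:
        for chunk in stream_chunks():
            emit(chunk)
            deltas_were_sent = True
            parts.append(chunk)
        return "".join(parts)
    except Exception:
        if deltas_were_sent:
            # Consumers already saw partial output. Re-delivering the full
            # response would duplicate it, so surface the error instead
            # (assumed behaviour for this sketch).
            raise
        # Nothing reached consumers, so a non-streaming retry is safe.
        full = fetch_full()
        emit(full)
        return full
```

The flag is set only after `emit` returns, so a failure before the first delivered delta still takes the fallback path.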
2026-03-16 06:35:46 -07:00 · 8e07f9ca56
commit 8e07f9ca56
parent 99369b926c
5 changed files with 75 additions and 176 deletions
--- a/cli.py
+++ b/cli.py
@@ -1474,9 +1474,10 @@ class HermesCLI:
                    self._in_reasoning_block = False
                    after = self._stream_prefilt[idx + len(tag):]
                    self._stream_prefilt = ""
-                    # Process remaining text after close tag
+                    # Process remaining text after close tag through full
+                    # filtering (it could contain another open tag)
                    if after:
-                        self._emit_stream_text(after)
+                        self._stream_delta(after)
                    return
            # Still inside reasoning block — keep only the tail that could
            # be a partial close tag prefix (save memory on long blocks).
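
To illustrate why the text after a close tag has to go back through the full filter, here is a simplified, self-contained sketch. It is not the HermesCLI implementation: the class `ReasoningFilter`, its method names, and the `<think>` / `</think>` tag strings are all assumptions made for the example.

```python
from typing import List

OPEN_TAG = "<think>"    # assumed tag names, for illustration only
CLOSE_TAG = "</think>"


class ReasoningFilter:
    """Hypothetical stand-in for a streaming filter that hides reasoning blocks."""

    def __init__(self) -> None:
        self.in_reasoning_block = False
        self.buffer = ""
        self.visible: List[str] = []

    def emit(self, text: str) -> None:
        if text:
            self.visible.append(text)

    def feed(self, text: str) -> None:
        self.buffer += text
        if self.in_reasoning_block:
            idx = self.buffer.find(CLOSE_TAG)
            if idx == -1:
                # Keep only the tail that could be a partial close tag.
                self.buffer = self.buffer[-(len(CLOSE_TAG) - 1):]
                return
            after = self.buffer[idx + len(CLOSE_TAG):]
            self.buffer = ""
            self.in_reasoning_block = False
            if after:
                # Re-enter the full filter: `after` may contain another
                # open tag, so emitting it directly would leak reasoning.
                self.feed(after)
            return
        idx = self.buffer.find(OPEN_TAG)
        if idx != -1:
            self.emit(self.buffer[:idx])
            rest = self.buffer[idx + len(OPEN_TAG):]
            self.buffer = ""
            self.in_reasoning_block = True
            if rest:
                self.feed(rest)
            return
        # No complete open tag: hold back a tail that might be the start of one.
        keep = 0
        for n in range(min(len(OPEN_TAG) - 1, len(self.buffer)), 0, -1):
            if OPEN_TAG.startswith(self.buffer[-n:]):
                keep = n
                break
        cut = len(self.buffer) - keep
        self.emit(self.buffer[:cut])
        self.buffer = self.buffer[cut:]
```

If the `self.feed(after)` call in the close-tag branch were replaced with `self.emit(after)`, a second reasoning block arriving in the same chunk as the first close tag would be shown to the user verbatim, which is the behaviour this hunk removes.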