fix: audit fixes — 5 bugs found and resolved

Thorough code review found 5 issues across run_agent.py, cli.py, and gateway/:

1. CRITICAL — Gateway stream consumer task never started: stream_consumer_holder
   was checked BEFORE run_sync populated it. Fixed with async polling pattern
   (same as track_agent).

2. MEDIUM-HIGH — Streaming fallback after partial delivery caused double-response:
   if streaming failed after some tokens were delivered, the fallback would
   re-deliver the full response. Now tracks deltas_were_sent and only falls
   back when no tokens reached consumers yet.

3. MEDIUM — Codex mode lost on_first_delta spinner callback: _run_codex_stream
   now accepts on_first_delta parameter, fires it on first text delta. Passed
   through from _interruptible_streaming_api_call via _codex_on_first_delta
   instance attribute.

4. MEDIUM — CLI close-tag after-text bypassed tag filtering: text after a
   reasoning close tag was sent directly to _emit_stream_text, skipping
   open-tag detection. Now routes through _stream_delta for full filtering.

5. LOW — Removed 140 lines of dead code: old _streaming_api_call method
   (superseded by _interruptible_streaming_api_call). Updated 13 tests in
   test_run_agent.py and test_openai_client_lifecycle.py to use the new
   method name and signature.

4573 tests passing.
This commit is contained in:
teknium1 2026-03-16 06:35:46 -07:00
parent 99369b926c
commit 8e07f9ca56
5 changed files with 75 additions and 176 deletions

5
cli.py
View file

@ -1474,9 +1474,10 @@ class HermesCLI:
self._in_reasoning_block = False
after = self._stream_prefilt[idx + len(tag):]
self._stream_prefilt = ""
# Process remaining text after close tag
# Process remaining text after close tag through full
# filtering (it could contain another open tag)
if after:
self._emit_stream_text(after)
self._stream_delta(after)
return
# Still inside reasoning block — keep only the tail that could
# be a partial close tag prefix (save memory on long blocks).