fix(codex): handle reasoning-only responses and replay path (#2070)

* fix(codex): treat reasoning-only responses as incomplete, not stop When a Codex Responses API response contains only reasoning items (encrypted thinking state) with no message text or tool calls, the _normalize_codex_response method was setting finish_reason='stop'. This sent the response into the empty-content retry loop, which burned 3 retries and then failed — exactly the pattern Nester reported in Discord. Two fixes: 1. _normalize_codex_response: reasoning-only responses (reasoning_items_raw non-empty but no final_text) now get finish_reason='incomplete', routing them to the Codex continuation path instead of the retry loop. 2. Incomplete handling: also checks for codex_reasoning_items when deciding whether to preserve an interim message, so encrypted reasoning state is not silently dropped when there is no visible reasoning text. Adds 4 regression tests covering: - Unit: reasoning-only → incomplete, reasoning+content → stop - E2E: reasoning-only → continuation → final answer succeeds - E2E: encrypted reasoning items preserved in interim messages * fix(codex): ensure reasoning items have required following item in API input Follow-up to the reasoning-only response fix. Three additional issues found by tracing the full replay path: 1. _chat_messages_to_responses_input: when a reasoning-only interim message was converted to Responses API input, the reasoning items were emitted as the last items with no following item. The Responses API requires a following item after each reasoning item (otherwise: 'missing_following_item' error, as seen in OpenHands #11406). Now emits an empty assistant message as the required following item when content is empty but reasoning items were added. 2. Duplicate detection: two consecutive reasoning-only incomplete messages with identical empty content/reasoning but different encrypted codex_reasoning_items were incorrectly treated as duplicates, silently dropping the second response's reasoning state. Now includes codex_reasoning_items in the duplicate comparison. 3. Added tests for both the API input conversion path and the duplicate detection edge case. Research context: verified against OpenCode (uses Vercel AI SDK, no retry loop so avoids the issue), Clawdbot (drops orphaned reasoning blocks entirely), and OpenHands (hit the missing_following_item error). Our approach preserves reasoning continuity while satisfying the API constraint. --------- Co-authored-by: Test <test@test.com>
2026-03-19 10:34:44 -07:00 · 2026-03-19 10:34:44 -07:00 · e84d952dc0
commit e84d952dc0
parent 388130a122
2 changed files with 236 additions and 1 deletions
--- a/run_agent.py
+++ b/run_agent.py
@ -2356,13 +2356,22 @@ class AIAgent:
                    # Replay encrypted reasoning items from previous turns
                    # so the API can maintain coherent reasoning chains.
                    codex_reasoning = msg.get("codex_reasoning_items")
+                    has_codex_reasoning = False
                    if isinstance(codex_reasoning, list):
                        for ri in codex_reasoning:
                            if isinstance(ri, dict) and ri.get("encrypted_content"):
                                items.append(ri)
+                                has_codex_reasoning = True

                    if content_text.strip():
                        items.append({"role": "assistant", "content": content_text})
+                    elif has_codex_reasoning:
+                        # The Responses API requires a following item after each
+                        # reasoning item (otherwise: missing_following_item error).
+                        # When the assistant produced only reasoning with no visible
+                        # content, emit an empty assistant message as the required
+                        # following item.
+                        items.append({"role": "assistant", "content": ""})

                    tool_calls = msg.get("tool_calls")
                    if isinstance(tool_calls, list):
@ -2804,6 +2813,14 @@ class AIAgent:
            finish_reason = "tool_calls"
        elif has_incomplete_items or (saw_commentary_phase and not saw_final_answer_phase):
            finish_reason = "incomplete"
+        elif reasoning_items_raw and not final_text:
+            # Response contains only reasoning (encrypted thinking state) with
+            # no visible content or tool calls.  The model is still thinking and
+            # needs another turn to produce the actual answer.  Marking this as
+            # "stop" would send it into the empty-content retry loop which burns
+            # 3 retries then fails — treat it as incomplete instead so the Codex
+            # continuation path handles it correctly.
+            finish_reason = "incomplete"
        else:
            finish_reason = "stop"
        return assistant_message, finish_reason
@ -6214,15 +6231,24 @@ class AIAgent:
                    interim_msg = self._build_assistant_message(assistant_message, finish_reason)
                    interim_has_content = bool((interim_msg.get("content") or "").strip())
                    interim_has_reasoning = bool(interim_msg.get("reasoning", "").strip()) if isinstance(interim_msg.get("reasoning"), str) else False
+                    interim_has_codex_reasoning = bool(interim_msg.get("codex_reasoning_items"))

-                    if interim_has_content or interim_has_reasoning:
+                    if interim_has_content or interim_has_reasoning or interim_has_codex_reasoning:
                        last_msg = messages[-1] if messages else None
+                        # Duplicate detection: two consecutive incomplete assistant
+                        # messages with identical content AND reasoning are collapsed.
+                        # For reasoning-only messages (codex_reasoning_items differ but
+                        # visible content/reasoning are both empty), we also compare
+                        # the encrypted items to avoid silently dropping new state.
+                        last_codex_items = last_msg.get("codex_reasoning_items") if isinstance(last_msg, dict) else None
+                        interim_codex_items = interim_msg.get("codex_reasoning_items")
                        duplicate_interim = (
                            isinstance(last_msg, dict)
                            and last_msg.get("role") == "assistant"
                            and last_msg.get("finish_reason") == "incomplete"
                            and (last_msg.get("content") or "") == (interim_msg.get("content") or "")
                            and (last_msg.get("reasoning") or "") == (interim_msg.get("reasoning") or "")
+                            and last_codex_items == interim_codex_items
                        )
                        if not duplicate_interim:
                            messages.append(interim_msg)