The Codex Responses API (chatgpt.com/backend-api/codex) supports
vision via gpt-5.3-codex. This was verified with real API calls
that submitted an image for analysis.

Changes to _CodexCompletionsAdapter:
- Added _convert_content_for_responses() to translate chat.completions
  multimodal format to Responses API format:
  - {type: 'text'} → {type: 'input_text'}
  - {type: 'image_url', image_url: {url: '...'}} → {type: 'input_image', image_url: '...'}
- Fixed: removed 'stream' from resp_kwargs (responses.stream() handles it)
- Fixed: removed max_output_tokens and temperature (Codex endpoint rejects them)
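
The adapter changes listed above can be illustrated with a minimal sketch. This is not the actual code from _CodexCompletionsAdapter: the function names, the pass-through handling of plain strings and unknown part types, and the kwargs filter are all assumptions made for illustration. Only the two type mappings and the three removed keys come from this commit message.

```python
from typing import Any

# Keys the commit message says were removed from resp_kwargs.
UNSUPPORTED_KEYS = ("stream", "max_output_tokens", "temperature")


def convert_content_for_responses(content: Any) -> Any:
    """Hypothetical stand-in for _convert_content_for_responses()."""
    if isinstance(content, str):
        # Assumption: plain string content needs no translation.
        return content

    converted = []
    for part in content:
        kind = part.get("type")
        if kind == "text":
            converted.append({"type": "input_text", "text": part.get("text", "")})
        elif kind == "image_url":
            image = part.get("image_url")
            # chat.completions nests the URL in a dict; flatten it to a string.
            url = image.get("url", "") if isinstance(image, dict) else (image or "")
            converted.append({"type": "input_image", "image_url": url})
        else:
            # Assumption: unknown part types are passed through unchanged.
            converted.append(part)
    return converted


def strip_unsupported_kwargs(kwargs: dict) -> dict:
    """Hypothetical helper: drop the keys listed in UNSUPPORTED_KEYS."""
    return {k: v for k, v in kwargs.items() if k not in UNSUPPORTED_KEYS}
```

For example, a message containing one text part and one image_url part would come out as one input_text part and one input_image part whose image_url is a plain string.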

Provider changes:
- Added 'codex' as explicit auxiliary provider option
- Vision auto-fallback now includes Codex (OpenRouter → Nous → Codex)
  since gpt-5.3-codex supports multimodal input
- Updated docs with Codex OAuth examples
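
The fallback order described above can be sketched as a simple first-match search. This is only an illustration under stated assumptions: the function name, the provider keys, and the availability callables are hypothetical and do not reflect the real auxiliary_client.py interface.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical type: a provider name paired with a zero-argument availability check.
Provider = Tuple[str, Callable[[], bool]]


def pick_vision_provider(providers: Sequence[Provider]) -> Optional[str]:
    """Return the first provider whose availability check passes, else None."""
    for name, is_available in providers:
        if is_available():
            return name
    return None


# Order taken from the commit message: OpenRouter, then Nous, then Codex.
# The lambdas are placeholders for real credential checks (assumption).
chain = [
    ("openrouter", lambda: False),
    ("nous", lambda: False),
    ("codex", lambda: True),
]
selected = pick_vision_provider(chain)
```

With only the Codex check passing, the search falls through the first two entries and selects 'codex'.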

Tested with a real Codex OAuth token and ~/.hermes/image2.png; confirmed
working end-to-end through the full adapter pipeline.

Tests: 2459 passed.

| Name |
|---|
| __init__.py |
| auxiliary_client.py |
| context_compressor.py |
| display.py |
| insights.py |
| model_metadata.py |
| prompt_builder.py |
| prompt_caching.py |
| redact.py |
| skill_commands.py |
| trajectory.py |