Root cause of aggressive gateway compression vs CLI:

- CLI: a single AIAgent persists across the conversation and uses the real API-reported prompt_tokens for compression decisions — accurate.
- Gateway: each message creates a fresh AIAgent; the token count is discarded afterwards, so the next message's pre-check falls back to a rough str(msg)//4 estimate, which overestimates by 30-50% on tool-heavy conversations.

Fix:

- Add a last_prompt_tokens field to SessionEntry that stores the actual API-reported prompt token count from the most recent agent turn.
- After run_conversation(), extract context_compressor.last_prompt_tokens and persist it via update_session().
- The gateway pre-check now uses the stored actual token count when available (the same accuracy as the CLI), falling back to the rough estimate with a 1.4x safety factor only for the first message of a session.

This makes gateway compression behave identically to CLI compression for all turns after the first.

Reported by TigerHix.
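A minimal sketch of the pre-check logic described above. Only last_prompt_tokens, the str(msg)//4 fallback, and the 1.4x safety factor come from this change; the SessionEntry shape, field names, and helper functions are illustrative assumptions, not the actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SessionEntry:
    # Hypothetical shape; only last_prompt_tokens is taken from the
    # change description above.
    session_id: str
    last_prompt_tokens: Optional[int] = None  # actual API-reported count from the last turn


def rough_estimate(messages: List[object]) -> int:
    # Fallback heuristic (~4 chars per token), known to overestimate
    # by 30-50% on tool-heavy conversations.
    return sum(len(str(msg)) for msg in messages) // 4


def precheck_tokens(entry: SessionEntry, messages: List[object]) -> int:
    # Prefer the stored actual count (same accuracy as the CLI path).
    if entry.last_prompt_tokens is not None:
        return entry.last_prompt_tokens
    # First message of a session: no actual count persisted yet, so use
    # the rough estimate padded by the 1.4x safety factor.
    return int(rough_estimate(messages) * 1.4)


# After run_conversation() (illustrative names), the real count would be
# persisted so the next gateway message skips the rough estimate:
#   entry.last_prompt_tokens = context_compressor.last_prompt_tokens
#   update_session(entry)
```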