fix: context counter shows cached token count in status bar
Anthropic prompt caching splits input usage into cache_read_input_tokens,
cache_creation_input_tokens, and the non-cached input_tokens. The context
counter read only input_tokens (the non-cached portion), so it showed
~3 tokens instead of the real ~18K total.

The counter now adds the cached portions, for the native Anthropic
provider only; other providers (OpenAI, OpenRouter, Codex) already
include cached tokens in their prompt_tokens field.

Before: 3/200K | 0%
After:  17.7K/200K | 9%
parent cfa87e77a9
commit 8d0a96a8bf
2 changed files with 124 additions and 0 deletions
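For reference, prompt caching splits the usage block on an Anthropic Messages
response three ways. A mostly-cached request can report something like the
following (the field names are Anthropic's; the counts are hypothetical and
mirror the numbers above):

    usage = {
        "input_tokens": 3,                   # non-cached portion only
        "cache_creation_input_tokens": 200,  # written to the prompt cache
        "cache_read_input_tokens": 17_500,   # read from the prompt cache
        "output_tokens": 512,
    }

Reading input_tokens alone gives 3; the real prompt size is
3 + 200 + 17,500 = 17,703 ≈ 17.7K.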
@@ -5256,6 +5256,15 @@ class AIAgent:
         if hasattr(response, 'usage') and response.usage:
             if self.api_mode in ("codex_responses", "anthropic_messages"):
                 prompt_tokens = getattr(response.usage, 'input_tokens', 0) or 0
+                if self.api_mode == "anthropic_messages":
+                    # Anthropic splits input into cache_read + cache_creation
+                    # + non-cached input_tokens. Without adding the cached
+                    # portions, the context bar shows only the tiny non-cached
+                    # portion (e.g. 3 tokens) instead of the real total (~18K).
+                    # Other providers (OpenAI/Codex) already include cached
+                    # tokens in their input_tokens/prompt_tokens field.
+                    prompt_tokens += getattr(response.usage, 'cache_read_input_tokens', 0) or 0
+                    prompt_tokens += getattr(response.usage, 'cache_creation_input_tokens', 0) or 0
                 completion_tokens = getattr(response.usage, 'output_tokens', 0) or 0
                 total_tokens = (
                     getattr(response.usage, 'total_tokens', None)
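As a sanity check, here is a minimal standalone sketch of the same accounting,
with types.SimpleNamespace standing in for the real response.usage object
(token counts are the hypothetical ones from the example above):

    from types import SimpleNamespace

    usage = SimpleNamespace(
        input_tokens=3,                   # non-cached portion only
        cache_read_input_tokens=17_500,   # read from the prompt cache
        cache_creation_input_tokens=200,  # written to the prompt cache
        output_tokens=512,
    )

    # Non-cached portion plus both cached portions, as in the diff above.
    prompt_tokens = getattr(usage, 'input_tokens', 0) or 0
    prompt_tokens += getattr(usage, 'cache_read_input_tokens', 0) or 0
    prompt_tokens += getattr(usage, 'cache_creation_input_tokens', 0) or 0

    print(f"{prompt_tokens / 1000:.1f}K/200K | {prompt_tokens / 200_000:.0%}")
    # -> 17.7K/200K | 9%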