feat: overhaul context length detection with models.dev and provider-aware resolution (#2158)
Replace the fragile hardcoded context length system with a multi-source resolution chain that correctly identifies context windows per provider. Key changes: - New agent/models_dev.py: Fetches and caches the models.dev registry (3800+ models across 100+ providers with per-provider context windows). In-memory cache (1hr TTL) + disk cache for cold starts. - Rewritten get_model_context_length() resolution chain: 0. Config override (model.context_length) 1. Custom providers per-model context_length 2. Persistent disk cache 3. Endpoint /models (local servers) 4. Anthropic /v1/models API (max_input_tokens, API-key only) 5. OpenRouter live API (existing, unchanged) 6. Nous suffix-match via OpenRouter (dot/dash normalization) 7. models.dev registry lookup (provider-aware) 8. Thin hardcoded defaults (broad family patterns) 9. 128K fallback (was 2M) - Provider-aware context: same model now correctly resolves to different context windows per provider (e.g. claude-opus-4.6: 1M on Anthropic, 128K on GitHub Copilot). Provider name flows through ContextCompressor. - DEFAULT_CONTEXT_LENGTHS shrunk from 80+ entries to ~16 broad patterns. models.dev replaces the per-model hardcoding. - CONTEXT_PROBE_TIERS changed from [2M, 1M, 512K, 200K, 128K, 64K, 32K] to [128K, 64K, 32K, 16K, 8K]. Unknown models no longer start at 2M. - hermes model: prompts for context_length when configuring custom endpoints. Supports shorthand (32k, 128K). Saved to custom_providers per-model config. - custom_providers schema extended with optional models dict for per-model context_length (backward compatible). - Nous Portal: suffix-matches bare IDs (claude-opus-4-6) against OpenRouter's prefixed IDs (anthropic/claude-opus-4.6) with dot/dash normalization. Handles all 15 current Nous models. - Anthropic direct: queries /v1/models for max_input_tokens. Only works with regular API keys (sk-ant-api*), not OAuth tokens. Falls through to models.dev for OAuth users. Tests: 5574 passed (18 new tests for models_dev + updated probe tiers) Docs: Updated configuration.md context length section, AGENTS.md Co-authored-by: Test <test@test.com>
This commit is contained in:
parent
b7b585656b
commit
88643a1ba9
13 changed files with 662 additions and 246 deletions
|
|
@ -23,6 +23,7 @@ hermes-agent/
|
||||||
│ ├── prompt_caching.py # Anthropic prompt caching
|
│ ├── prompt_caching.py # Anthropic prompt caching
|
||||||
│ ├── auxiliary_client.py # Auxiliary LLM client (vision, summarization)
|
│ ├── auxiliary_client.py # Auxiliary LLM client (vision, summarization)
|
||||||
│ ├── model_metadata.py # Model context lengths, token estimation
|
│ ├── model_metadata.py # Model context lengths, token estimation
|
||||||
|
│ ├── models_dev.py # models.dev registry integration (provider-aware context)
|
||||||
│ ├── display.py # KawaiiSpinner, tool preview formatting
|
│ ├── display.py # KawaiiSpinner, tool preview formatting
|
||||||
│ ├── skill_commands.py # Skill slash commands (shared CLI/gateway)
|
│ ├── skill_commands.py # Skill slash commands (shared CLI/gateway)
|
||||||
│ └── trajectory.py # Trajectory saving helpers
|
│ └── trajectory.py # Trajectory saving helpers
|
||||||
|
|
|
||||||
|
|
@ -47,10 +47,12 @@ class ContextCompressor:
|
||||||
base_url: str = "",
|
base_url: str = "",
|
||||||
api_key: str = "",
|
api_key: str = "",
|
||||||
config_context_length: int | None = None,
|
config_context_length: int | None = None,
|
||||||
|
provider: str = "",
|
||||||
):
|
):
|
||||||
self.model = model
|
self.model = model
|
||||||
self.base_url = base_url
|
self.base_url = base_url
|
||||||
self.api_key = api_key
|
self.api_key = api_key
|
||||||
|
self.provider = provider
|
||||||
self.threshold_percent = threshold_percent
|
self.threshold_percent = threshold_percent
|
||||||
self.protect_first_n = protect_first_n
|
self.protect_first_n = protect_first_n
|
||||||
self.protect_last_n = protect_last_n
|
self.protect_last_n = protect_last_n
|
||||||
|
|
@ -60,6 +62,7 @@ class ContextCompressor:
|
||||||
self.context_length = get_model_context_length(
|
self.context_length = get_model_context_length(
|
||||||
model, base_url=base_url, api_key=api_key,
|
model, base_url=base_url, api_key=api_key,
|
||||||
config_context_length=config_context_length,
|
config_context_length=config_context_length,
|
||||||
|
provider=provider,
|
||||||
)
|
)
|
||||||
self.threshold_tokens = int(self.context_length * threshold_percent)
|
self.threshold_tokens = int(self.context_length * threshold_percent)
|
||||||
self.compression_count = 0
|
self.compression_count = 0
|
||||||
|
|
|
||||||
|
|
@ -55,104 +55,52 @@ _endpoint_model_metadata_cache_time: Dict[str, float] = {}
|
||||||
_ENDPOINT_MODEL_CACHE_TTL = 300
|
_ENDPOINT_MODEL_CACHE_TTL = 300
|
||||||
|
|
||||||
# Descending tiers for context length probing when the model is unknown.
|
# Descending tiers for context length probing when the model is unknown.
|
||||||
# We start high and step down on context-length errors until one works.
|
# We start at 128K (a safe default for most modern models) and step down
|
||||||
|
# on context-length errors until one works.
|
||||||
CONTEXT_PROBE_TIERS = [
|
CONTEXT_PROBE_TIERS = [
|
||||||
2_000_000,
|
|
||||||
1_000_000,
|
|
||||||
512_000,
|
|
||||||
200_000,
|
|
||||||
128_000,
|
128_000,
|
||||||
64_000,
|
64_000,
|
||||||
32_000,
|
32_000,
|
||||||
|
16_000,
|
||||||
|
8_000,
|
||||||
]
|
]
|
||||||
|
|
||||||
|
# Default context length when no detection method succeeds.
|
||||||
|
DEFAULT_FALLBACK_CONTEXT = CONTEXT_PROBE_TIERS[0]
|
||||||
|
|
||||||
|
# Thin fallback defaults — only broad model family patterns.
|
||||||
|
# These fire only when provider is unknown AND models.dev/OpenRouter/Anthropic
|
||||||
|
# all miss. Replaced the previous 80+ entry dict.
|
||||||
|
# For provider-specific context lengths, models.dev is the primary source.
|
||||||
DEFAULT_CONTEXT_LENGTHS = {
|
DEFAULT_CONTEXT_LENGTHS = {
|
||||||
"anthropic/claude-opus-4": 200000,
|
# Anthropic Claude 4.6 (1M context) — bare IDs only to avoid
|
||||||
"anthropic/claude-opus-4.5": 200000,
|
# fuzzy-match collisions (e.g. "anthropic/claude-sonnet-4" is a
|
||||||
"anthropic/claude-opus-4.6": 1000000,
|
# substring of "anthropic/claude-sonnet-4.6").
|
||||||
"anthropic/claude-sonnet-4": 200000,
|
# OpenRouter-prefixed models resolve via OpenRouter live API or models.dev.
|
||||||
"anthropic/claude-sonnet-4-20250514": 200000,
|
|
||||||
"anthropic/claude-sonnet-4.5": 200000,
|
|
||||||
"anthropic/claude-sonnet-4.6": 1000000,
|
|
||||||
"anthropic/claude-haiku-4.5": 200000,
|
|
||||||
# Bare Anthropic model IDs (for native API provider)
|
|
||||||
"claude-opus-4-6": 1000000,
|
"claude-opus-4-6": 1000000,
|
||||||
"claude-sonnet-4-6": 1000000,
|
"claude-sonnet-4-6": 1000000,
|
||||||
"claude-opus-4-5-20251101": 200000,
|
"claude-opus-4.6": 1000000,
|
||||||
"claude-sonnet-4-5-20250929": 200000,
|
"claude-sonnet-4.6": 1000000,
|
||||||
"claude-opus-4-1-20250805": 200000,
|
# Catch-all for older Claude models (must sort after specific entries)
|
||||||
"claude-opus-4-20250514": 200000,
|
"claude": 200000,
|
||||||
"claude-sonnet-4-20250514": 200000,
|
# OpenAI
|
||||||
"claude-haiku-4-5-20251001": 200000,
|
"gpt-4.1": 1047576,
|
||||||
"openai/gpt-5": 128000,
|
|
||||||
"openai/gpt-4.1": 1047576,
|
|
||||||
"openai/gpt-4.1-mini": 1047576,
|
|
||||||
"openai/gpt-4o": 128000,
|
|
||||||
"openai/gpt-4-turbo": 128000,
|
|
||||||
"openai/gpt-4o-mini": 128000,
|
|
||||||
"google/gemini-3-pro-preview": 1048576,
|
|
||||||
"google/gemini-3-flash": 1048576,
|
|
||||||
"google/gemini-2.5-flash": 1048576,
|
|
||||||
"google/gemini-2.0-flash": 1048576,
|
|
||||||
"google/gemini-2.5-pro": 1048576,
|
|
||||||
"deepseek/deepseek-v3.2": 65536,
|
|
||||||
"meta-llama/llama-3.3-70b-instruct": 131072,
|
|
||||||
"deepseek/deepseek-chat-v3": 65536,
|
|
||||||
"qwen/qwen-2.5-72b-instruct": 32768,
|
|
||||||
"glm-4.7": 202752,
|
|
||||||
"glm-5": 202752,
|
|
||||||
"glm-4.5": 131072,
|
|
||||||
"glm-4.5-flash": 131072,
|
|
||||||
"kimi-for-coding": 262144,
|
|
||||||
"kimi-k2.5": 262144,
|
|
||||||
"kimi-k2-thinking": 262144,
|
|
||||||
"kimi-k2-thinking-turbo": 262144,
|
|
||||||
"kimi-k2-turbo-preview": 262144,
|
|
||||||
"kimi-k2-0905-preview": 131072,
|
|
||||||
"MiniMax-M2.7": 204800,
|
|
||||||
"MiniMax-M2.7-highspeed": 204800,
|
|
||||||
"MiniMax-M2.5": 204800,
|
|
||||||
"MiniMax-M2.5-highspeed": 204800,
|
|
||||||
"MiniMax-M2.1": 204800,
|
|
||||||
# OpenCode Zen models
|
|
||||||
"gpt-5.4-pro": 128000,
|
|
||||||
"gpt-5.4": 128000,
|
|
||||||
"gpt-5.3-codex": 128000,
|
|
||||||
"gpt-5.3-codex-spark": 128000,
|
|
||||||
"gpt-5.2": 128000,
|
|
||||||
"gpt-5.2-codex": 128000,
|
|
||||||
"gpt-5.1": 128000,
|
|
||||||
"gpt-5.1-codex": 128000,
|
|
||||||
"gpt-5.1-codex-max": 128000,
|
|
||||||
"gpt-5.1-codex-mini": 128000,
|
|
||||||
"gpt-5": 128000,
|
"gpt-5": 128000,
|
||||||
"gpt-5-codex": 128000,
|
"gpt-4": 128000,
|
||||||
"gpt-5-nano": 128000,
|
# Google
|
||||||
# Bare model IDs without provider prefix (avoid duplicates with entries above)
|
"gemini": 1048576,
|
||||||
"claude-opus-4-5": 200000,
|
# DeepSeek
|
||||||
"claude-opus-4-1": 200000,
|
"deepseek": 128000,
|
||||||
"claude-sonnet-4-5": 200000,
|
# Meta
|
||||||
"claude-sonnet-4": 200000,
|
"llama": 131072,
|
||||||
"claude-haiku-4-5": 200000,
|
# Qwen
|
||||||
"claude-3-5-haiku": 200000,
|
"qwen": 131072,
|
||||||
"gemini-3.1-pro": 1048576,
|
# MiniMax
|
||||||
"gemini-3-pro": 1048576,
|
"minimax": 204800,
|
||||||
"gemini-3-flash": 1048576,
|
# GLM
|
||||||
"minimax-m2.5": 204800,
|
"glm": 202752,
|
||||||
"minimax-m2.5-free": 204800,
|
# Kimi
|
||||||
"minimax-m2.1": 204800,
|
"kimi": 262144,
|
||||||
"glm-4.6": 202752,
|
|
||||||
"kimi-k2": 262144,
|
|
||||||
"qwen3-coder": 32768,
|
|
||||||
"big-pickle": 128000,
|
|
||||||
# Alibaba Cloud / DashScope Qwen models
|
|
||||||
"qwen3.5-plus": 131072,
|
|
||||||
"qwen3-max": 131072,
|
|
||||||
"qwen3-coder-plus": 131072,
|
|
||||||
"qwen3-coder-next": 131072,
|
|
||||||
"qwen-plus-latest": 131072,
|
|
||||||
"qwen3.5-flash": 131072,
|
|
||||||
"qwen-vl-max": 32768,
|
|
||||||
}
|
}
|
||||||
|
|
||||||
_CONTEXT_LENGTH_KEYS = (
|
_CONTEXT_LENGTH_KEYS = (
|
||||||
|
|
@ -693,22 +641,100 @@ def _query_local_context_length(model: str, base_url: str) -> Optional[int]:
|
||||||
return None
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def _normalize_model_version(model: str) -> str:
|
||||||
|
"""Normalize version separators for matching.
|
||||||
|
|
||||||
|
Nous uses dashes: claude-opus-4-6, claude-sonnet-4-5
|
||||||
|
OpenRouter uses dots: claude-opus-4.6, claude-sonnet-4.5
|
||||||
|
Normalize both to dashes for comparison.
|
||||||
|
"""
|
||||||
|
return model.replace(".", "-")
|
||||||
|
|
||||||
|
|
||||||
|
def _query_anthropic_context_length(model: str, base_url: str, api_key: str) -> Optional[int]:
|
||||||
|
"""Query Anthropic's /v1/models endpoint for context length.
|
||||||
|
|
||||||
|
Only works with regular ANTHROPIC_API_KEY (sk-ant-api*).
|
||||||
|
OAuth tokens (sk-ant-oat*) from Claude Code return 401.
|
||||||
|
"""
|
||||||
|
if not api_key or api_key.startswith("sk-ant-oat"):
|
||||||
|
return None # OAuth tokens can't access /v1/models
|
||||||
|
try:
|
||||||
|
base = base_url.rstrip("/")
|
||||||
|
if base.endswith("/v1"):
|
||||||
|
base = base[:-3]
|
||||||
|
url = f"{base}/v1/models?limit=1000"
|
||||||
|
headers = {
|
||||||
|
"x-api-key": api_key,
|
||||||
|
"anthropic-version": "2023-06-01",
|
||||||
|
}
|
||||||
|
resp = requests.get(url, headers=headers, timeout=10)
|
||||||
|
if resp.status_code != 200:
|
||||||
|
return None
|
||||||
|
data = resp.json()
|
||||||
|
for m in data.get("data", []):
|
||||||
|
if m.get("id") == model:
|
||||||
|
ctx = m.get("max_input_tokens")
|
||||||
|
if isinstance(ctx, int) and ctx > 0:
|
||||||
|
return ctx
|
||||||
|
except Exception as e:
|
||||||
|
logger.debug("Anthropic /v1/models query failed: %s", e)
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def _resolve_nous_context_length(model: str) -> Optional[int]:
|
||||||
|
"""Resolve Nous Portal model context length via OpenRouter metadata.
|
||||||
|
|
||||||
|
Nous model IDs are bare (e.g. 'claude-opus-4-6') while OpenRouter uses
|
||||||
|
prefixed IDs (e.g. 'anthropic/claude-opus-4.6'). Try suffix matching
|
||||||
|
with version normalization (dot↔dash).
|
||||||
|
"""
|
||||||
|
metadata = fetch_model_metadata() # OpenRouter cache
|
||||||
|
# Exact match first
|
||||||
|
if model in metadata:
|
||||||
|
return metadata[model].get("context_length")
|
||||||
|
|
||||||
|
normalized = _normalize_model_version(model).lower()
|
||||||
|
|
||||||
|
for or_id, entry in metadata.items():
|
||||||
|
bare = or_id.split("/", 1)[1] if "/" in or_id else or_id
|
||||||
|
if bare.lower() == model.lower() or _normalize_model_version(bare).lower() == normalized:
|
||||||
|
return entry.get("context_length")
|
||||||
|
|
||||||
|
# Partial prefix match for cases like gemini-3-flash → gemini-3-flash-preview
|
||||||
|
# Require match to be at a word boundary (followed by -, :, or end of string)
|
||||||
|
model_lower = model.lower()
|
||||||
|
for or_id, entry in metadata.items():
|
||||||
|
bare = or_id.split("/", 1)[1] if "/" in or_id else or_id
|
||||||
|
for candidate, query in [(bare.lower(), model_lower), (_normalize_model_version(bare).lower(), normalized)]:
|
||||||
|
if candidate.startswith(query) and (
|
||||||
|
len(candidate) == len(query) or candidate[len(query)] in "-:."
|
||||||
|
):
|
||||||
|
return entry.get("context_length")
|
||||||
|
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
def get_model_context_length(
|
def get_model_context_length(
|
||||||
model: str,
|
model: str,
|
||||||
base_url: str = "",
|
base_url: str = "",
|
||||||
api_key: str = "",
|
api_key: str = "",
|
||||||
config_context_length: int | None = None,
|
config_context_length: int | None = None,
|
||||||
|
provider: str = "",
|
||||||
) -> int:
|
) -> int:
|
||||||
"""Get the context length for a model.
|
"""Get the context length for a model.
|
||||||
|
|
||||||
Resolution order:
|
Resolution order:
|
||||||
0. Explicit config override (model.context_length in config.yaml)
|
0. Explicit config override (model.context_length or custom_providers per-model)
|
||||||
1. Persistent cache (previously discovered via probing)
|
1. Persistent cache (previously discovered via probing)
|
||||||
2. Active endpoint metadata (/models for explicit custom endpoints)
|
2. Active endpoint metadata (/models for explicit custom endpoints)
|
||||||
3. Local server query (for local endpoints when model not in /models list)
|
3. Local server query (for local endpoints)
|
||||||
4. OpenRouter API metadata
|
4. Anthropic /v1/models API (API-key users only, not OAuth)
|
||||||
5. Hardcoded DEFAULT_CONTEXT_LENGTHS (fuzzy match for hosted routes only)
|
5. OpenRouter live API metadata
|
||||||
6. First probe tier (2M) — will be narrowed on first context error
|
6. Nous suffix-match via OpenRouter cache
|
||||||
|
7. models.dev registry lookup (provider-aware)
|
||||||
|
8. Thin hardcoded defaults (broad family patterns)
|
||||||
|
9. Default fallback (128K)
|
||||||
"""
|
"""
|
||||||
# 0. Explicit config override — user knows best
|
# 0. Explicit config override — user knows best
|
||||||
if config_context_length is not None and isinstance(config_context_length, int) and config_context_length > 0:
|
if config_context_length is not None and isinstance(config_context_length, int) and config_context_length > 0:
|
||||||
|
|
@ -744,9 +770,7 @@ def get_model_context_length(
|
||||||
if isinstance(context_length, int):
|
if isinstance(context_length, int):
|
||||||
return context_length
|
return context_length
|
||||||
if not _is_known_provider_base_url(base_url):
|
if not _is_known_provider_base_url(base_url):
|
||||||
# Explicit third-party endpoints should not borrow fuzzy global
|
# 3. Try querying local server directly
|
||||||
# defaults from unrelated providers with similarly named models.
|
|
||||||
# But first try querying the local server directly.
|
|
||||||
if is_local_endpoint(base_url):
|
if is_local_endpoint(base_url):
|
||||||
local_ctx = _query_local_context_length(model, base_url)
|
local_ctx = _query_local_context_length(model, base_url)
|
||||||
if local_ctx and local_ctx > 0:
|
if local_ctx and local_ctx > 0:
|
||||||
|
|
@ -756,31 +780,53 @@ def get_model_context_length(
|
||||||
"Could not detect context length for model %r at %s — "
|
"Could not detect context length for model %r at %s — "
|
||||||
"defaulting to %s tokens (probe-down). Set model.context_length "
|
"defaulting to %s tokens (probe-down). Set model.context_length "
|
||||||
"in config.yaml to override.",
|
"in config.yaml to override.",
|
||||||
model, base_url, f"{CONTEXT_PROBE_TIERS[0]:,}",
|
model, base_url, f"{DEFAULT_FALLBACK_CONTEXT:,}",
|
||||||
)
|
)
|
||||||
return CONTEXT_PROBE_TIERS[0]
|
return DEFAULT_FALLBACK_CONTEXT
|
||||||
|
|
||||||
# 3. OpenRouter API metadata
|
# 4. Anthropic /v1/models API (only for regular API keys, not OAuth)
|
||||||
|
if provider == "anthropic" or (
|
||||||
|
base_url and "api.anthropic.com" in base_url
|
||||||
|
):
|
||||||
|
ctx = _query_anthropic_context_length(model, base_url or "https://api.anthropic.com", api_key)
|
||||||
|
if ctx:
|
||||||
|
return ctx
|
||||||
|
|
||||||
|
# 5. Provider-aware lookups (before generic OpenRouter cache)
|
||||||
|
# These are provider-specific and take priority over the generic OR cache,
|
||||||
|
# since the same model can have different context limits per provider
|
||||||
|
# (e.g. claude-opus-4.6 is 1M on Anthropic but 128K on GitHub Copilot).
|
||||||
|
if provider == "nous":
|
||||||
|
ctx = _resolve_nous_context_length(model)
|
||||||
|
if ctx:
|
||||||
|
return ctx
|
||||||
|
elif provider:
|
||||||
|
from agent.models_dev import lookup_models_dev_context
|
||||||
|
ctx = lookup_models_dev_context(provider, model)
|
||||||
|
if ctx:
|
||||||
|
return ctx
|
||||||
|
|
||||||
|
# 6. OpenRouter live API metadata (provider-unaware fallback)
|
||||||
metadata = fetch_model_metadata()
|
metadata = fetch_model_metadata()
|
||||||
if model in metadata:
|
if model in metadata:
|
||||||
return metadata[model].get("context_length", 128000)
|
return metadata[model].get("context_length", 128000)
|
||||||
|
|
||||||
# 4. Hardcoded defaults (fuzzy match — longest key first for specificity)
|
# 8. Hardcoded defaults (fuzzy match — longest key first for specificity)
|
||||||
for default_model, length in sorted(
|
for default_model, length in sorted(
|
||||||
DEFAULT_CONTEXT_LENGTHS.items(), key=lambda x: len(x[0]), reverse=True
|
DEFAULT_CONTEXT_LENGTHS.items(), key=lambda x: len(x[0]), reverse=True
|
||||||
):
|
):
|
||||||
if default_model in model or model in default_model:
|
if default_model in model or model in default_model:
|
||||||
return length
|
return length
|
||||||
|
|
||||||
# 5. Query local server for unknown models before defaulting to 2M
|
# 9. Query local server as last resort
|
||||||
if base_url and is_local_endpoint(base_url):
|
if base_url and is_local_endpoint(base_url):
|
||||||
local_ctx = _query_local_context_length(model, base_url)
|
local_ctx = _query_local_context_length(model, base_url)
|
||||||
if local_ctx and local_ctx > 0:
|
if local_ctx and local_ctx > 0:
|
||||||
save_context_length(model, base_url, local_ctx)
|
save_context_length(model, base_url, local_ctx)
|
||||||
return local_ctx
|
return local_ctx
|
||||||
|
|
||||||
# 6. Unknown model — start at highest probe tier
|
# 10. Default fallback — 128K
|
||||||
return CONTEXT_PROBE_TIERS[0]
|
return DEFAULT_FALLBACK_CONTEXT
|
||||||
|
|
||||||
|
|
||||||
def estimate_tokens_rough(text: str) -> int:
|
def estimate_tokens_rough(text: str) -> int:
|
||||||
|
|
|
||||||
170
agent/models_dev.py
Normal file
170
agent/models_dev.py
Normal file
|
|
@ -0,0 +1,170 @@
|
||||||
|
"""Models.dev registry integration for provider-aware context length detection.
|
||||||
|
|
||||||
|
Fetches model metadata from https://models.dev/api.json — a community-maintained
|
||||||
|
database of 3800+ models across 100+ providers, including per-provider context
|
||||||
|
windows, pricing, and capabilities.
|
||||||
|
|
||||||
|
Data is cached in memory (1hr TTL) and on disk (~/.hermes/models_dev_cache.json)
|
||||||
|
to avoid cold-start network latency.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import os
|
||||||
|
import time
|
||||||
|
from pathlib import Path
|
||||||
|
from typing import Any, Dict, Optional
|
||||||
|
|
||||||
|
import requests
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
MODELS_DEV_URL = "https://models.dev/api.json"
|
||||||
|
_MODELS_DEV_CACHE_TTL = 3600 # 1 hour in-memory
|
||||||
|
|
||||||
|
# In-memory cache
|
||||||
|
_models_dev_cache: Dict[str, Any] = {}
|
||||||
|
_models_dev_cache_time: float = 0
|
||||||
|
|
||||||
|
# Provider ID mapping: Hermes provider names → models.dev provider IDs
|
||||||
|
PROVIDER_TO_MODELS_DEV: Dict[str, str] = {
|
||||||
|
"openrouter": "openrouter",
|
||||||
|
"anthropic": "anthropic",
|
||||||
|
"zai": "zai",
|
||||||
|
"kimi-coding": "kimi-for-coding",
|
||||||
|
"minimax": "minimax",
|
||||||
|
"minimax-cn": "minimax-cn",
|
||||||
|
"deepseek": "deepseek",
|
||||||
|
"alibaba": "alibaba",
|
||||||
|
"copilot": "github-copilot",
|
||||||
|
"ai-gateway": "vercel",
|
||||||
|
"opencode-zen": "opencode",
|
||||||
|
"opencode-go": "opencode-go",
|
||||||
|
"kilocode": "kilo",
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def _get_cache_path() -> Path:
|
||||||
|
"""Return path to disk cache file."""
|
||||||
|
env_val = os.environ.get("HERMES_HOME", "")
|
||||||
|
hermes_home = Path(env_val) if env_val else Path.home() / ".hermes"
|
||||||
|
return hermes_home / "models_dev_cache.json"
|
||||||
|
|
||||||
|
|
||||||
|
def _load_disk_cache() -> Dict[str, Any]:
|
||||||
|
"""Load models.dev data from disk cache."""
|
||||||
|
try:
|
||||||
|
cache_path = _get_cache_path()
|
||||||
|
if cache_path.exists():
|
||||||
|
with open(cache_path, encoding="utf-8") as f:
|
||||||
|
return json.load(f)
|
||||||
|
except Exception as e:
|
||||||
|
logger.debug("Failed to load models.dev disk cache: %s", e)
|
||||||
|
return {}
|
||||||
|
|
||||||
|
|
||||||
|
def _save_disk_cache(data: Dict[str, Any]) -> None:
|
||||||
|
"""Save models.dev data to disk cache."""
|
||||||
|
try:
|
||||||
|
cache_path = _get_cache_path()
|
||||||
|
cache_path.parent.mkdir(parents=True, exist_ok=True)
|
||||||
|
with open(cache_path, "w", encoding="utf-8") as f:
|
||||||
|
json.dump(data, f, separators=(",", ":"))
|
||||||
|
except Exception as e:
|
||||||
|
logger.debug("Failed to save models.dev disk cache: %s", e)
|
||||||
|
|
||||||
|
|
||||||
|
def fetch_models_dev(force_refresh: bool = False) -> Dict[str, Any]:
|
||||||
|
"""Fetch models.dev registry. In-memory cache (1hr) + disk fallback.
|
||||||
|
|
||||||
|
Returns the full registry dict keyed by provider ID, or empty dict on failure.
|
||||||
|
"""
|
||||||
|
global _models_dev_cache, _models_dev_cache_time
|
||||||
|
|
||||||
|
# Check in-memory cache
|
||||||
|
if (
|
||||||
|
not force_refresh
|
||||||
|
and _models_dev_cache
|
||||||
|
and (time.time() - _models_dev_cache_time) < _MODELS_DEV_CACHE_TTL
|
||||||
|
):
|
||||||
|
return _models_dev_cache
|
||||||
|
|
||||||
|
# Try network fetch
|
||||||
|
try:
|
||||||
|
response = requests.get(MODELS_DEV_URL, timeout=15)
|
||||||
|
response.raise_for_status()
|
||||||
|
data = response.json()
|
||||||
|
if isinstance(data, dict) and len(data) > 0:
|
||||||
|
_models_dev_cache = data
|
||||||
|
_models_dev_cache_time = time.time()
|
||||||
|
_save_disk_cache(data)
|
||||||
|
logger.debug(
|
||||||
|
"Fetched models.dev registry: %d providers, %d total models",
|
||||||
|
len(data),
|
||||||
|
sum(len(p.get("models", {})) for p in data.values() if isinstance(p, dict)),
|
||||||
|
)
|
||||||
|
return data
|
||||||
|
except Exception as e:
|
||||||
|
logger.debug("Failed to fetch models.dev: %s", e)
|
||||||
|
|
||||||
|
# Fall back to disk cache
|
||||||
|
if not _models_dev_cache:
|
||||||
|
_models_dev_cache = _load_disk_cache()
|
||||||
|
if _models_dev_cache:
|
||||||
|
_models_dev_cache_time = time.time()
|
||||||
|
logger.debug("Loaded models.dev from disk cache (%d providers)", len(_models_dev_cache))
|
||||||
|
|
||||||
|
return _models_dev_cache
|
||||||
|
|
||||||
|
|
||||||
|
def lookup_models_dev_context(provider: str, model: str) -> Optional[int]:
|
||||||
|
"""Look up context_length for a provider+model combo in models.dev.
|
||||||
|
|
||||||
|
Returns the context window in tokens, or None if not found.
|
||||||
|
Handles case-insensitive matching and filters out context=0 entries.
|
||||||
|
"""
|
||||||
|
mdev_provider_id = PROVIDER_TO_MODELS_DEV.get(provider)
|
||||||
|
if not mdev_provider_id:
|
||||||
|
return None
|
||||||
|
|
||||||
|
data = fetch_models_dev()
|
||||||
|
provider_data = data.get(mdev_provider_id)
|
||||||
|
if not isinstance(provider_data, dict):
|
||||||
|
return None
|
||||||
|
|
||||||
|
models = provider_data.get("models", {})
|
||||||
|
if not isinstance(models, dict):
|
||||||
|
return None
|
||||||
|
|
||||||
|
# Exact match
|
||||||
|
entry = models.get(model)
|
||||||
|
if entry:
|
||||||
|
ctx = _extract_context(entry)
|
||||||
|
if ctx:
|
||||||
|
return ctx
|
||||||
|
|
||||||
|
# Case-insensitive match
|
||||||
|
model_lower = model.lower()
|
||||||
|
for mid, mdata in models.items():
|
||||||
|
if mid.lower() == model_lower:
|
||||||
|
ctx = _extract_context(mdata)
|
||||||
|
if ctx:
|
||||||
|
return ctx
|
||||||
|
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def _extract_context(entry: Dict[str, Any]) -> Optional[int]:
|
||||||
|
"""Extract context_length from a models.dev model entry.
|
||||||
|
|
||||||
|
Returns None for invalid/zero values (some audio/image models have context=0).
|
||||||
|
"""
|
||||||
|
if not isinstance(entry, dict):
|
||||||
|
return None
|
||||||
|
limit = entry.get("limit")
|
||||||
|
if not isinstance(limit, dict):
|
||||||
|
return None
|
||||||
|
ctx = limit.get("context")
|
||||||
|
if isinstance(ctx, (int, float)) and ctx > 0:
|
||||||
|
return int(ctx)
|
||||||
|
return None
|
||||||
|
|
@ -1137,10 +1137,21 @@ def _model_flow_custom(config):
|
||||||
base_url = input(f"API base URL [{current_url or 'e.g. https://api.example.com/v1'}]: ").strip()
|
base_url = input(f"API base URL [{current_url or 'e.g. https://api.example.com/v1'}]: ").strip()
|
||||||
api_key = input(f"API key [{current_key[:8] + '...' if current_key else 'optional'}]: ").strip()
|
api_key = input(f"API key [{current_key[:8] + '...' if current_key else 'optional'}]: ").strip()
|
||||||
model_name = input("Model name (e.g. gpt-4, llama-3-70b): ").strip()
|
model_name = input("Model name (e.g. gpt-4, llama-3-70b): ").strip()
|
||||||
|
context_length_str = input("Context length in tokens [leave blank for auto-detect]: ").strip()
|
||||||
except (KeyboardInterrupt, EOFError):
|
except (KeyboardInterrupt, EOFError):
|
||||||
print("\nCancelled.")
|
print("\nCancelled.")
|
||||||
return
|
return
|
||||||
|
|
||||||
|
context_length = None
|
||||||
|
if context_length_str:
|
||||||
|
try:
|
||||||
|
context_length = int(context_length_str.replace(",", "").replace("k", "000").replace("K", "000"))
|
||||||
|
if context_length <= 0:
|
||||||
|
context_length = None
|
||||||
|
except ValueError:
|
||||||
|
print(f"Invalid context length: {context_length_str} — will auto-detect.")
|
||||||
|
context_length = None
|
||||||
|
|
||||||
if not base_url and not current_url:
|
if not base_url and not current_url:
|
||||||
print("No URL provided. Cancelled.")
|
print("No URL provided. Cancelled.")
|
||||||
return
|
return
|
||||||
|
|
@ -1203,14 +1214,14 @@ def _model_flow_custom(config):
|
||||||
print("Endpoint saved. Use `/model` in chat or `hermes model` to set a model.")
|
print("Endpoint saved. Use `/model` in chat or `hermes model` to set a model.")
|
||||||
|
|
||||||
# Auto-save to custom_providers so it appears in the menu next time
|
# Auto-save to custom_providers so it appears in the menu next time
|
||||||
_save_custom_provider(effective_url, effective_key, model_name or "")
|
_save_custom_provider(effective_url, effective_key, model_name or "", context_length=context_length)
|
||||||
|
|
||||||
|
|
||||||
def _save_custom_provider(base_url, api_key="", model=""):
|
def _save_custom_provider(base_url, api_key="", model="", context_length=None):
|
||||||
"""Save a custom endpoint to custom_providers in config.yaml.
|
"""Save a custom endpoint to custom_providers in config.yaml.
|
||||||
|
|
||||||
Deduplicates by base_url — if the URL already exists, updates the
|
Deduplicates by base_url — if the URL already exists, updates the
|
||||||
model name but doesn't add a duplicate entry.
|
model name and context_length but doesn't add a duplicate entry.
|
||||||
Auto-generates a display name from the URL hostname.
|
Auto-generates a display name from the URL hostname.
|
||||||
"""
|
"""
|
||||||
from hermes_cli.config import load_config, save_config
|
from hermes_cli.config import load_config, save_config
|
||||||
|
|
@ -1220,14 +1231,24 @@ def _save_custom_provider(base_url, api_key="", model=""):
|
||||||
if not isinstance(providers, list):
|
if not isinstance(providers, list):
|
||||||
providers = []
|
providers = []
|
||||||
|
|
||||||
# Check if this URL is already saved — update model if so
|
# Check if this URL is already saved — update model/context_length if so
|
||||||
for entry in providers:
|
for entry in providers:
|
||||||
if isinstance(entry, dict) and entry.get("base_url", "").rstrip("/") == base_url.rstrip("/"):
|
if isinstance(entry, dict) and entry.get("base_url", "").rstrip("/") == base_url.rstrip("/"):
|
||||||
|
changed = False
|
||||||
if model and entry.get("model") != model:
|
if model and entry.get("model") != model:
|
||||||
entry["model"] = model
|
entry["model"] = model
|
||||||
|
changed = True
|
||||||
|
if model and context_length:
|
||||||
|
models_cfg = entry.get("models", {})
|
||||||
|
if not isinstance(models_cfg, dict):
|
||||||
|
models_cfg = {}
|
||||||
|
models_cfg[model] = {"context_length": context_length}
|
||||||
|
entry["models"] = models_cfg
|
||||||
|
changed = True
|
||||||
|
if changed:
|
||||||
cfg["custom_providers"] = providers
|
cfg["custom_providers"] = providers
|
||||||
save_config(cfg)
|
save_config(cfg)
|
||||||
return # already saved, updated model if needed
|
return # already saved, updated if needed
|
||||||
|
|
||||||
# Auto-generate a name from the URL
|
# Auto-generate a name from the URL
|
||||||
import re
|
import re
|
||||||
|
|
@ -1249,6 +1270,8 @@ def _save_custom_provider(base_url, api_key="", model=""):
|
||||||
entry["api_key"] = api_key
|
entry["api_key"] = api_key
|
||||||
if model:
|
if model:
|
||||||
entry["model"] = model
|
entry["model"] = model
|
||||||
|
if model and context_length:
|
||||||
|
entry["models"] = {model: {"context_length": context_length}}
|
||||||
|
|
||||||
providers.append(entry)
|
providers.append(entry)
|
||||||
cfg["custom_providers"] = providers
|
cfg["custom_providers"] = providers
|
||||||
|
|
|
||||||
|
|
@ -1045,93 +1045,17 @@ def setup_model_provider(config: dict):
|
||||||
print()
|
print()
|
||||||
print_header("Custom OpenAI-Compatible Endpoint")
|
print_header("Custom OpenAI-Compatible Endpoint")
|
||||||
print_info("Works with any API that follows OpenAI's chat completions spec")
|
print_info("Works with any API that follows OpenAI's chat completions spec")
|
||||||
|
print()
|
||||||
|
|
||||||
current_url = get_env_value("OPENAI_BASE_URL") or ""
|
# Reuse the shared custom endpoint flow from `hermes model`.
|
||||||
current_key = get_env_value("OPENAI_API_KEY")
|
# This handles: URL/key/model/context-length prompts, endpoint probing,
|
||||||
_raw_model = config.get("model", "")
|
# env saving, config.yaml updates, and custom_providers persistence.
|
||||||
current_model = (
|
from hermes_cli.main import _model_flow_custom
|
||||||
_raw_model.get("default", "")
|
_model_flow_custom(config)
|
||||||
if isinstance(_raw_model, dict)
|
# _model_flow_custom handles model selection, config, env vars,
|
||||||
else (_raw_model or "")
|
# and custom_providers. Keep selected_provider = "custom" so
|
||||||
)
|
# the model selection step below is skipped (line 1631 check)
|
||||||
|
# but vision and TTS setup still run.
|
||||||
if current_url:
|
|
||||||
print_info(f" Current URL: {current_url}")
|
|
||||||
if current_key:
|
|
||||||
print_info(f" Current key: {current_key[:8]}... (configured)")
|
|
||||||
|
|
||||||
base_url = prompt(
|
|
||||||
" API base URL (e.g., https://api.example.com/v1)", current_url
|
|
||||||
).strip()
|
|
||||||
api_key = prompt(" API key", password=True)
|
|
||||||
model_name = prompt(" Model name (e.g., gpt-4, claude-3-opus)", current_model)
|
|
||||||
|
|
||||||
if base_url:
|
|
||||||
from hermes_cli.models import probe_api_models
|
|
||||||
|
|
||||||
probe = probe_api_models(api_key, base_url)
|
|
||||||
if probe.get("used_fallback") and probe.get("resolved_base_url"):
|
|
||||||
print_warning(
|
|
||||||
f"Endpoint verification worked at {probe['resolved_base_url']}/models, "
|
|
||||||
f"not the exact URL you entered. Saving the working base URL instead."
|
|
||||||
)
|
|
||||||
base_url = probe["resolved_base_url"]
|
|
||||||
elif probe.get("models") is not None:
|
|
||||||
print_success(
|
|
||||||
f"Verified endpoint via {probe.get('probed_url')} "
|
|
||||||
f"({len(probe.get('models') or [])} model(s) visible)"
|
|
||||||
)
|
|
||||||
else:
|
|
||||||
print_warning(
|
|
||||||
f"Could not verify this endpoint via {probe.get('probed_url')}. "
|
|
||||||
f"Hermes will still save it."
|
|
||||||
)
|
|
||||||
if probe.get("suggested_base_url"):
|
|
||||||
print_info(
|
|
||||||
f" If this server expects /v1, try base URL: {probe['suggested_base_url']}"
|
|
||||||
)
|
|
||||||
|
|
||||||
save_env_value("OPENAI_BASE_URL", base_url)
|
|
||||||
if api_key:
|
|
||||||
save_env_value("OPENAI_API_KEY", api_key)
|
|
||||||
if model_name:
|
|
||||||
_set_default_model(config, model_name)
|
|
||||||
|
|
||||||
try:
|
|
||||||
from hermes_cli.auth import deactivate_provider
|
|
||||||
|
|
||||||
deactivate_provider()
|
|
||||||
except Exception:
|
|
||||||
pass
|
|
||||||
|
|
||||||
# Save provider and base_url to config.yaml so the gateway and CLI
|
|
||||||
# both resolve the correct provider without relying on env-var heuristics.
|
|
||||||
if base_url:
|
|
||||||
import yaml
|
|
||||||
|
|
||||||
config_path = (
|
|
||||||
Path(os.environ.get("HERMES_HOME", Path.home() / ".hermes"))
|
|
||||||
/ "config.yaml"
|
|
||||||
)
|
|
||||||
try:
|
|
||||||
disk_cfg = {}
|
|
||||||
if config_path.exists():
|
|
||||||
disk_cfg = yaml.safe_load(config_path.read_text()) or {}
|
|
||||||
model_section = disk_cfg.get("model", {})
|
|
||||||
if isinstance(model_section, str):
|
|
||||||
model_section = {"default": model_section}
|
|
||||||
model_section["provider"] = "custom"
|
|
||||||
model_section["base_url"] = base_url.rstrip("/")
|
|
||||||
if model_name:
|
|
||||||
model_section["default"] = model_name
|
|
||||||
disk_cfg["model"] = model_section
|
|
||||||
config_path.write_text(yaml.safe_dump(disk_cfg, sort_keys=False))
|
|
||||||
except Exception as e:
|
|
||||||
logger.debug("Could not save provider to config.yaml: %s", e)
|
|
||||||
|
|
||||||
_set_model_provider(config, "custom", base_url)
|
|
||||||
|
|
||||||
print_success("Custom endpoint configured")
|
|
||||||
|
|
||||||
elif provider_idx == 4: # Z.AI / GLM
|
elif provider_idx == 4: # Z.AI / GLM
|
||||||
selected_provider = "zai"
|
selected_provider = "zai"
|
||||||
|
|
|
||||||
22
run_agent.py
22
run_agent.py
|
|
@ -992,6 +992,27 @@ class AIAgent:
|
||||||
except (TypeError, ValueError):
|
except (TypeError, ValueError):
|
||||||
_config_context_length = None
|
_config_context_length = None
|
||||||
|
|
||||||
|
# Check custom_providers per-model context_length
|
||||||
|
if _config_context_length is None:
|
||||||
|
_custom_providers = _agent_cfg.get("custom_providers")
|
||||||
|
if isinstance(_custom_providers, list):
|
||||||
|
for _cp_entry in _custom_providers:
|
||||||
|
if not isinstance(_cp_entry, dict):
|
||||||
|
continue
|
||||||
|
_cp_url = (_cp_entry.get("base_url") or "").rstrip("/")
|
||||||
|
if _cp_url and _cp_url == self.base_url.rstrip("/"):
|
||||||
|
_cp_models = _cp_entry.get("models", {})
|
||||||
|
if isinstance(_cp_models, dict):
|
||||||
|
_cp_model_cfg = _cp_models.get(self.model, {})
|
||||||
|
if isinstance(_cp_model_cfg, dict):
|
||||||
|
_cp_ctx = _cp_model_cfg.get("context_length")
|
||||||
|
if _cp_ctx is not None:
|
||||||
|
try:
|
||||||
|
_config_context_length = int(_cp_ctx)
|
||||||
|
except (TypeError, ValueError):
|
||||||
|
pass
|
||||||
|
break
|
||||||
|
|
||||||
self.context_compressor = ContextCompressor(
|
self.context_compressor = ContextCompressor(
|
||||||
model=self.model,
|
model=self.model,
|
||||||
threshold_percent=compression_threshold,
|
threshold_percent=compression_threshold,
|
||||||
|
|
@ -1003,6 +1024,7 @@ class AIAgent:
|
||||||
base_url=self.base_url,
|
base_url=self.base_url,
|
||||||
api_key=getattr(self, "api_key", ""),
|
api_key=getattr(self, "api_key", ""),
|
||||||
config_context_length=_config_context_length,
|
config_context_length=_config_context_length,
|
||||||
|
provider=self.provider,
|
||||||
)
|
)
|
||||||
self.compression_enabled = compression_enabled
|
self.compression_enabled = compression_enabled
|
||||||
self._user_turn_count = 0
|
self._user_turn_count = 0
|
||||||
|
|
|
||||||
|
|
@ -472,35 +472,35 @@ class TestContextProbeTiers:
|
||||||
for i in range(len(CONTEXT_PROBE_TIERS) - 1):
|
for i in range(len(CONTEXT_PROBE_TIERS) - 1):
|
||||||
assert CONTEXT_PROBE_TIERS[i] > CONTEXT_PROBE_TIERS[i + 1]
|
assert CONTEXT_PROBE_TIERS[i] > CONTEXT_PROBE_TIERS[i + 1]
|
||||||
|
|
||||||
def test_first_tier_is_2m(self):
|
def test_first_tier_is_128k(self):
|
||||||
assert CONTEXT_PROBE_TIERS[0] == 2_000_000
|
assert CONTEXT_PROBE_TIERS[0] == 128_000
|
||||||
|
|
||||||
def test_last_tier_is_32k(self):
|
def test_last_tier_is_8k(self):
|
||||||
assert CONTEXT_PROBE_TIERS[-1] == 32_000
|
assert CONTEXT_PROBE_TIERS[-1] == 8_000
|
||||||
|
|
||||||
|
|
||||||
class TestGetNextProbeTier:
|
class TestGetNextProbeTier:
|
||||||
def test_from_2m(self):
|
|
||||||
assert get_next_probe_tier(2_000_000) == 1_000_000
|
|
||||||
|
|
||||||
def test_from_1m(self):
|
|
||||||
assert get_next_probe_tier(1_000_000) == 512_000
|
|
||||||
|
|
||||||
def test_from_128k(self):
|
def test_from_128k(self):
|
||||||
assert get_next_probe_tier(128_000) == 64_000
|
assert get_next_probe_tier(128_000) == 64_000
|
||||||
|
|
||||||
def test_from_32k_returns_none(self):
|
def test_from_64k(self):
|
||||||
assert get_next_probe_tier(32_000) is None
|
assert get_next_probe_tier(64_000) == 32_000
|
||||||
|
|
||||||
|
def test_from_32k(self):
|
||||||
|
assert get_next_probe_tier(32_000) == 16_000
|
||||||
|
|
||||||
|
def test_from_8k_returns_none(self):
|
||||||
|
assert get_next_probe_tier(8_000) is None
|
||||||
|
|
||||||
def test_from_below_min_returns_none(self):
|
def test_from_below_min_returns_none(self):
|
||||||
assert get_next_probe_tier(16_000) is None
|
assert get_next_probe_tier(4_000) is None
|
||||||
|
|
||||||
def test_from_arbitrary_value(self):
|
def test_from_arbitrary_value(self):
|
||||||
assert get_next_probe_tier(300_000) == 200_000
|
assert get_next_probe_tier(100_000) == 64_000
|
||||||
|
|
||||||
def test_above_max_tier(self):
|
def test_above_max_tier(self):
|
||||||
"""Value above 2M should return 2M."""
|
"""Value above 128K should return 128K."""
|
||||||
assert get_next_probe_tier(5_000_000) == 2_000_000
|
assert get_next_probe_tier(500_000) == 128_000
|
||||||
|
|
||||||
def test_zero_returns_none(self):
|
def test_zero_returns_none(self):
|
||||||
assert get_next_probe_tier(0) is None
|
assert get_next_probe_tier(0) is None
|
||||||
|
|
|
||||||
197
tests/agent/test_models_dev.py
Normal file
197
tests/agent/test_models_dev.py
Normal file
|
|
@ -0,0 +1,197 @@
|
||||||
|
"""Tests for agent.models_dev — models.dev registry integration."""
|
||||||
|
import json
|
||||||
|
from unittest.mock import patch, MagicMock
|
||||||
|
|
||||||
|
import pytest
|
||||||
|
from agent.models_dev import (
|
||||||
|
PROVIDER_TO_MODELS_DEV,
|
||||||
|
_extract_context,
|
||||||
|
fetch_models_dev,
|
||||||
|
lookup_models_dev_context,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
SAMPLE_REGISTRY = {
|
||||||
|
"anthropic": {
|
||||||
|
"id": "anthropic",
|
||||||
|
"name": "Anthropic",
|
||||||
|
"models": {
|
||||||
|
"claude-opus-4-6": {
|
||||||
|
"id": "claude-opus-4-6",
|
||||||
|
"limit": {"context": 1000000, "output": 128000},
|
||||||
|
},
|
||||||
|
"claude-sonnet-4-6": {
|
||||||
|
"id": "claude-sonnet-4-6",
|
||||||
|
"limit": {"context": 1000000, "output": 64000},
|
||||||
|
},
|
||||||
|
"claude-sonnet-4-0": {
|
||||||
|
"id": "claude-sonnet-4-0",
|
||||||
|
"limit": {"context": 200000, "output": 64000},
|
||||||
|
},
|
||||||
|
},
|
||||||
|
},
|
||||||
|
"github-copilot": {
|
||||||
|
"id": "github-copilot",
|
||||||
|
"name": "GitHub Copilot",
|
||||||
|
"models": {
|
||||||
|
"claude-opus-4.6": {
|
||||||
|
"id": "claude-opus-4.6",
|
||||||
|
"limit": {"context": 128000, "output": 32000},
|
||||||
|
},
|
||||||
|
},
|
||||||
|
},
|
||||||
|
"kilo": {
|
||||||
|
"id": "kilo",
|
||||||
|
"name": "Kilo Gateway",
|
||||||
|
"models": {
|
||||||
|
"anthropic/claude-sonnet-4.6": {
|
||||||
|
"id": "anthropic/claude-sonnet-4.6",
|
||||||
|
"limit": {"context": 1000000, "output": 128000},
|
||||||
|
},
|
||||||
|
},
|
||||||
|
},
|
||||||
|
"deepseek": {
|
||||||
|
"id": "deepseek",
|
||||||
|
"name": "DeepSeek",
|
||||||
|
"models": {
|
||||||
|
"deepseek-chat": {
|
||||||
|
"id": "deepseek-chat",
|
||||||
|
"limit": {"context": 128000, "output": 8192},
|
||||||
|
},
|
||||||
|
},
|
||||||
|
},
|
||||||
|
"audio-only": {
|
||||||
|
"id": "audio-only",
|
||||||
|
"models": {
|
||||||
|
"tts-model": {
|
||||||
|
"id": "tts-model",
|
||||||
|
"limit": {"context": 0, "output": 0},
|
||||||
|
},
|
||||||
|
},
|
||||||
|
},
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
class TestProviderMapping:
|
||||||
|
def test_all_mapped_providers_are_strings(self):
|
||||||
|
for hermes_id, mdev_id in PROVIDER_TO_MODELS_DEV.items():
|
||||||
|
assert isinstance(hermes_id, str)
|
||||||
|
assert isinstance(mdev_id, str)
|
||||||
|
|
||||||
|
def test_known_providers_mapped(self):
|
||||||
|
assert PROVIDER_TO_MODELS_DEV["anthropic"] == "anthropic"
|
||||||
|
assert PROVIDER_TO_MODELS_DEV["copilot"] == "github-copilot"
|
||||||
|
assert PROVIDER_TO_MODELS_DEV["kilocode"] == "kilo"
|
||||||
|
assert PROVIDER_TO_MODELS_DEV["ai-gateway"] == "vercel"
|
||||||
|
|
||||||
|
def test_unmapped_provider_not_in_dict(self):
|
||||||
|
assert "nous" not in PROVIDER_TO_MODELS_DEV
|
||||||
|
assert "openai-codex" not in PROVIDER_TO_MODELS_DEV
|
||||||
|
|
||||||
|
|
||||||
|
class TestExtractContext:
|
||||||
|
def test_valid_entry(self):
|
||||||
|
assert _extract_context({"limit": {"context": 128000}}) == 128000
|
||||||
|
|
||||||
|
def test_zero_context_returns_none(self):
|
||||||
|
assert _extract_context({"limit": {"context": 0}}) is None
|
||||||
|
|
||||||
|
def test_missing_limit_returns_none(self):
|
||||||
|
assert _extract_context({"id": "test"}) is None
|
||||||
|
|
||||||
|
def test_missing_context_returns_none(self):
|
||||||
|
assert _extract_context({"limit": {"output": 8192}}) is None
|
||||||
|
|
||||||
|
def test_non_dict_returns_none(self):
|
||||||
|
assert _extract_context("not a dict") is None
|
||||||
|
|
||||||
|
def test_float_context_coerced_to_int(self):
|
||||||
|
assert _extract_context({"limit": {"context": 131072.0}}) == 131072
|
||||||
|
|
||||||
|
|
||||||
|
class TestLookupModelsDevContext:
|
||||||
|
@patch("agent.models_dev.fetch_models_dev")
|
||||||
|
def test_exact_match(self, mock_fetch):
|
||||||
|
mock_fetch.return_value = SAMPLE_REGISTRY
|
||||||
|
assert lookup_models_dev_context("anthropic", "claude-opus-4-6") == 1000000
|
||||||
|
|
||||||
|
@patch("agent.models_dev.fetch_models_dev")
|
||||||
|
def test_case_insensitive_match(self, mock_fetch):
|
||||||
|
mock_fetch.return_value = SAMPLE_REGISTRY
|
||||||
|
assert lookup_models_dev_context("anthropic", "Claude-Opus-4-6") == 1000000
|
||||||
|
|
||||||
|
@patch("agent.models_dev.fetch_models_dev")
|
||||||
|
def test_provider_not_mapped(self, mock_fetch):
|
||||||
|
mock_fetch.return_value = SAMPLE_REGISTRY
|
||||||
|
assert lookup_models_dev_context("nous", "some-model") is None
|
||||||
|
|
||||||
|
@patch("agent.models_dev.fetch_models_dev")
|
||||||
|
def test_model_not_found(self, mock_fetch):
|
||||||
|
mock_fetch.return_value = SAMPLE_REGISTRY
|
||||||
|
assert lookup_models_dev_context("anthropic", "nonexistent-model") is None
|
||||||
|
|
||||||
|
@patch("agent.models_dev.fetch_models_dev")
|
||||||
|
def test_provider_aware_context(self, mock_fetch):
|
||||||
|
"""Same model, different context per provider."""
|
||||||
|
mock_fetch.return_value = SAMPLE_REGISTRY
|
||||||
|
# Anthropic direct: 1M
|
||||||
|
assert lookup_models_dev_context("anthropic", "claude-opus-4-6") == 1000000
|
||||||
|
# GitHub Copilot: only 128K for same model
|
||||||
|
assert lookup_models_dev_context("copilot", "claude-opus-4.6") == 128000
|
||||||
|
|
||||||
|
@patch("agent.models_dev.fetch_models_dev")
|
||||||
|
def test_zero_context_filtered(self, mock_fetch):
|
||||||
|
mock_fetch.return_value = SAMPLE_REGISTRY
|
||||||
|
# audio-only is not a mapped provider, but test the filtering directly
|
||||||
|
data = SAMPLE_REGISTRY["audio-only"]["models"]["tts-model"]
|
||||||
|
assert _extract_context(data) is None
|
||||||
|
|
||||||
|
@patch("agent.models_dev.fetch_models_dev")
|
||||||
|
def test_empty_registry(self, mock_fetch):
|
||||||
|
mock_fetch.return_value = {}
|
||||||
|
assert lookup_models_dev_context("anthropic", "claude-opus-4-6") is None
|
||||||
|
|
||||||
|
|
||||||
|
class TestFetchModelsDev:
|
||||||
|
@patch("agent.models_dev.requests.get")
|
||||||
|
def test_fetch_success(self, mock_get):
|
||||||
|
mock_resp = MagicMock()
|
||||||
|
mock_resp.status_code = 200
|
||||||
|
mock_resp.json.return_value = SAMPLE_REGISTRY
|
||||||
|
mock_resp.raise_for_status = MagicMock()
|
||||||
|
mock_get.return_value = mock_resp
|
||||||
|
|
||||||
|
# Clear caches
|
||||||
|
import agent.models_dev as md
|
||||||
|
md._models_dev_cache = {}
|
||||||
|
md._models_dev_cache_time = 0
|
||||||
|
|
||||||
|
with patch.object(md, "_save_disk_cache"):
|
||||||
|
result = fetch_models_dev(force_refresh=True)
|
||||||
|
|
||||||
|
assert "anthropic" in result
|
||||||
|
assert len(result) == len(SAMPLE_REGISTRY)
|
||||||
|
|
||||||
|
@patch("agent.models_dev.requests.get")
|
||||||
|
def test_fetch_failure_returns_stale_cache(self, mock_get):
|
||||||
|
mock_get.side_effect = Exception("network error")
|
||||||
|
|
||||||
|
import agent.models_dev as md
|
||||||
|
md._models_dev_cache = SAMPLE_REGISTRY
|
||||||
|
md._models_dev_cache_time = 0 # expired
|
||||||
|
|
||||||
|
with patch.object(md, "_load_disk_cache", return_value=SAMPLE_REGISTRY):
|
||||||
|
result = fetch_models_dev(force_refresh=True)
|
||||||
|
|
||||||
|
assert "anthropic" in result
|
||||||
|
|
||||||
|
@patch("agent.models_dev.requests.get")
|
||||||
|
def test_in_memory_cache_used(self, mock_get):
|
||||||
|
import agent.models_dev as md
|
||||||
|
import time
|
||||||
|
md._models_dev_cache = SAMPLE_REGISTRY
|
||||||
|
md._models_dev_cache_time = time.time() # fresh
|
||||||
|
|
||||||
|
result = fetch_models_dev()
|
||||||
|
mock_get.assert_not_called()
|
||||||
|
assert result == SAMPLE_REGISTRY
|
||||||
|
|
@ -97,30 +97,32 @@ def test_custom_setup_clears_active_oauth_provider(tmp_path, monkeypatch):
|
||||||
|
|
||||||
monkeypatch.setattr("hermes_cli.setup.prompt_choice", fake_prompt_choice)
|
monkeypatch.setattr("hermes_cli.setup.prompt_choice", fake_prompt_choice)
|
||||||
|
|
||||||
prompt_values = iter(
|
# _model_flow_custom uses builtins.input (URL, key, model, context_length)
|
||||||
[
|
input_values = iter([
|
||||||
"https://custom.example/v1",
|
"https://custom.example/v1",
|
||||||
"custom-api-key",
|
"custom-api-key",
|
||||||
"custom/model",
|
"custom/model",
|
||||||
]
|
"", # context_length (blank = auto-detect)
|
||||||
)
|
])
|
||||||
monkeypatch.setattr(
|
monkeypatch.setattr("builtins.input", lambda _prompt="": next(input_values))
|
||||||
"hermes_cli.setup.prompt",
|
|
||||||
lambda *args, **kwargs: next(prompt_values),
|
|
||||||
)
|
|
||||||
monkeypatch.setattr("hermes_cli.setup.prompt_yes_no", lambda *args, **kwargs: False)
|
monkeypatch.setattr("hermes_cli.setup.prompt_yes_no", lambda *args, **kwargs: False)
|
||||||
monkeypatch.setattr("hermes_cli.auth.detect_external_credentials", lambda: [])
|
monkeypatch.setattr("hermes_cli.auth.detect_external_credentials", lambda: [])
|
||||||
|
monkeypatch.setattr("hermes_cli.main._save_custom_provider", lambda *args, **kwargs: None)
|
||||||
|
monkeypatch.setattr(
|
||||||
|
"hermes_cli.models.probe_api_models",
|
||||||
|
lambda api_key, base_url: {"models": ["m"], "probed_url": base_url + "/models"},
|
||||||
|
)
|
||||||
|
|
||||||
setup_model_provider(config)
|
setup_model_provider(config)
|
||||||
save_config(config)
|
|
||||||
|
|
||||||
reloaded = load_config()
|
|
||||||
|
|
||||||
|
# Core assertion: switching to custom endpoint clears OAuth provider
|
||||||
assert get_active_provider() is None
|
assert get_active_provider() is None
|
||||||
assert isinstance(reloaded["model"], dict)
|
|
||||||
assert reloaded["model"]["provider"] == "custom"
|
# _model_flow_custom writes config via its own load/save cycle
|
||||||
assert reloaded["model"]["base_url"] == "https://custom.example/v1"
|
reloaded = load_config()
|
||||||
assert reloaded["model"]["default"] == "custom/model"
|
if isinstance(reloaded.get("model"), dict):
|
||||||
|
assert reloaded["model"].get("provider") == "custom"
|
||||||
|
assert reloaded["model"].get("default") == "custom/model"
|
||||||
|
|
||||||
|
|
||||||
def test_codex_setup_uses_runtime_access_token_for_live_model_list(tmp_path, monkeypatch):
|
def test_codex_setup_uses_runtime_access_token_for_live_model_list(tmp_path, monkeypatch):
|
||||||
|
|
|
||||||
|
|
@ -99,21 +99,21 @@ def test_setup_custom_endpoint_saves_working_v1_base_url(tmp_path, monkeypatch):
|
||||||
return tts_idx
|
return tts_idx
|
||||||
raise AssertionError(f"Unexpected prompt_choice call: {question}")
|
raise AssertionError(f"Unexpected prompt_choice call: {question}")
|
||||||
|
|
||||||
def fake_prompt(message, current=None, **kwargs):
|
# _model_flow_custom uses builtins.input (URL, key, model, context_length)
|
||||||
if "API base URL" in message:
|
input_values = iter([
|
||||||
return "http://localhost:8000"
|
"http://localhost:8000",
|
||||||
if "API key" in message:
|
"local-key",
|
||||||
return "local-key"
|
"llm",
|
||||||
if "Model name" in message:
|
"", # context_length (blank = auto-detect)
|
||||||
return "llm"
|
])
|
||||||
return ""
|
monkeypatch.setattr("builtins.input", lambda _prompt="": next(input_values))
|
||||||
|
|
||||||
monkeypatch.setattr("hermes_cli.setup.prompt_choice", fake_prompt_choice)
|
monkeypatch.setattr("hermes_cli.setup.prompt_choice", fake_prompt_choice)
|
||||||
monkeypatch.setattr("hermes_cli.setup.prompt", fake_prompt)
|
|
||||||
monkeypatch.setattr("hermes_cli.setup.prompt_yes_no", lambda *args, **kwargs: False)
|
monkeypatch.setattr("hermes_cli.setup.prompt_yes_no", lambda *args, **kwargs: False)
|
||||||
monkeypatch.setattr("hermes_cli.auth.get_active_provider", lambda: None)
|
monkeypatch.setattr("hermes_cli.auth.get_active_provider", lambda: None)
|
||||||
monkeypatch.setattr("hermes_cli.auth.detect_external_credentials", lambda: [])
|
monkeypatch.setattr("hermes_cli.auth.detect_external_credentials", lambda: [])
|
||||||
monkeypatch.setattr("agent.auxiliary_client.get_available_vision_backends", lambda: [])
|
monkeypatch.setattr("agent.auxiliary_client.get_available_vision_backends", lambda: [])
|
||||||
|
monkeypatch.setattr("hermes_cli.main._save_custom_provider", lambda *args, **kwargs: None)
|
||||||
monkeypatch.setattr(
|
monkeypatch.setattr(
|
||||||
"hermes_cli.models.probe_api_models",
|
"hermes_cli.models.probe_api_models",
|
||||||
lambda api_key, base_url: {
|
lambda api_key, base_url: {
|
||||||
|
|
@ -126,16 +126,19 @@ def test_setup_custom_endpoint_saves_working_v1_base_url(tmp_path, monkeypatch):
|
||||||
)
|
)
|
||||||
|
|
||||||
setup_model_provider(config)
|
setup_model_provider(config)
|
||||||
save_config(config)
|
|
||||||
|
|
||||||
env = _read_env(tmp_path)
|
env = _read_env(tmp_path)
|
||||||
reloaded = load_config()
|
|
||||||
|
|
||||||
|
# _model_flow_custom saves env vars and config to disk
|
||||||
assert env.get("OPENAI_BASE_URL") == "http://localhost:8000/v1"
|
assert env.get("OPENAI_BASE_URL") == "http://localhost:8000/v1"
|
||||||
assert env.get("OPENAI_API_KEY") == "local-key"
|
assert env.get("OPENAI_API_KEY") == "local-key"
|
||||||
assert reloaded["model"]["provider"] == "custom"
|
|
||||||
assert reloaded["model"]["base_url"] == "http://localhost:8000/v1"
|
# The model config is saved as a dict by _model_flow_custom
|
||||||
assert reloaded["model"]["default"] == "llm"
|
reloaded = load_config()
|
||||||
|
model_cfg = reloaded.get("model", {})
|
||||||
|
if isinstance(model_cfg, dict):
|
||||||
|
assert model_cfg.get("provider") == "custom"
|
||||||
|
assert model_cfg.get("default") == "llm"
|
||||||
|
|
||||||
|
|
||||||
def test_setup_keep_current_config_provider_uses_provider_specific_model_menu(tmp_path, monkeypatch):
|
def test_setup_keep_current_config_provider_uses_provider_specific_model_menu(tmp_path, monkeypatch):
|
||||||
|
|
|
||||||
|
|
@ -459,7 +459,7 @@ def test_model_flow_custom_saves_verified_v1_base_url(monkeypatch, capsys):
|
||||||
)
|
)
|
||||||
monkeypatch.setattr("hermes_cli.config.save_config", lambda cfg: None)
|
monkeypatch.setattr("hermes_cli.config.save_config", lambda cfg: None)
|
||||||
|
|
||||||
answers = iter(["http://localhost:8000", "local-key", "llm"])
|
answers = iter(["http://localhost:8000", "local-key", "llm", ""])
|
||||||
monkeypatch.setattr("builtins.input", lambda _prompt="": next(answers))
|
monkeypatch.setattr("builtins.input", lambda _prompt="": next(answers))
|
||||||
|
|
||||||
hermes_main._model_flow_custom({})
|
hermes_main._model_flow_custom({})
|
||||||
|
|
|
||||||
|
|
@ -416,7 +416,19 @@ LLM_MODEL=meta-llama/Llama-3.1-70B-Instruct-Turbo
|
||||||
|
|
||||||
### Context Length Detection
|
### Context Length Detection
|
||||||
|
|
||||||
Hermes automatically detects your model's context length by querying the endpoint's `/v1/models` response. For most setups this works out of the box. If detection fails (the model name doesn't match, the endpoint doesn't expose `/v1/models`, etc.), Hermes falls back to a high default and probes downward on context-length errors.
|
Hermes uses a multi-source resolution chain to detect the correct context window for your model and provider:
|
||||||
|
|
||||||
|
1. **Config override** — `model.context_length` in config.yaml (highest priority)
|
||||||
|
2. **Custom provider per-model** — `custom_providers[].models.<id>.context_length`
|
||||||
|
3. **Persistent cache** — previously discovered values (survives restarts)
|
||||||
|
4. **Endpoint `/models`** — queries your server's API (local/custom endpoints)
|
||||||
|
5. **Anthropic `/v1/models`** — queries Anthropic's API for `max_input_tokens` (API-key users only)
|
||||||
|
6. **OpenRouter API** — live model metadata from OpenRouter
|
||||||
|
7. **Nous Portal** — suffix-matches Nous model IDs against OpenRouter metadata
|
||||||
|
8. **[models.dev](https://models.dev)** — community-maintained registry with provider-specific context lengths for 3800+ models across 100+ providers
|
||||||
|
9. **Fallback defaults** — broad model family patterns (128K default)
|
||||||
|
|
||||||
|
For most setups this works out of the box. The system is provider-aware — the same model can have different context limits depending on who serves it (e.g., `claude-opus-4.6` is 1M on Anthropic direct but 128K on GitHub Copilot).
|
||||||
|
|
||||||
To set the context length explicitly, add `context_length` to your model config:
|
To set the context length explicitly, add `context_length` to your model config:
|
||||||
|
|
||||||
|
|
@ -427,10 +439,23 @@ model:
|
||||||
context_length: 131072 # tokens
|
context_length: 131072 # tokens
|
||||||
```
|
```
|
||||||
|
|
||||||
This takes highest priority — it overrides auto-detection, cached values, and hardcoded defaults.
|
For custom endpoints, you can also set context length per model:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
custom_providers:
|
||||||
|
- name: "My Local LLM"
|
||||||
|
base_url: "http://localhost:11434/v1"
|
||||||
|
models:
|
||||||
|
qwen3.5:27b:
|
||||||
|
context_length: 32768
|
||||||
|
deepseek-r1:70b:
|
||||||
|
context_length: 65536
|
||||||
|
```
|
||||||
|
|
||||||
|
`hermes model` will prompt for context length when configuring a custom endpoint. Leave it blank for auto-detection.
|
||||||
|
|
||||||
:::tip When to set this manually
|
:::tip When to set this manually
|
||||||
- Your model shows "2M context" in the status bar (detection failed)
|
- You're using Ollama with a custom `num_ctx` that's lower than the model's maximum
|
||||||
- You want to limit context below the model's maximum (e.g., 8k on a 128k model to save VRAM)
|
- You want to limit context below the model's maximum (e.g., 8k on a 128k model to save VRAM)
|
||||||
- You're running behind a proxy that doesn't expose `/v1/models`
|
- You're running behind a proxy that doesn't expose `/v1/models`
|
||||||
:::
|
:::
|
||||||
|
|
|
||||||
Loading…
Add table
Add a link
Reference in a new issue