Instead of defaulting to 2M for unknown local models, query the server API for the real context length. Supports Ollama (`/api/show`), vLLM (`max_model_len`), and LM Studio (`/v1/models`). Results are cached to avoid repeated queries.
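A minimal sketch of that probing logic in Python, under stated assumptions: Ollama's `/api/show` response carries the context window in its `model_info` map (keyed as `<architecture>.context_length`), and vLLM's OpenAI-style `/v1/models` listing includes a `max_model_len` field; the `max_context_length` fallback for LM Studio, the function names, and the in-process cache are illustrative assumptions, not the actual code in `model_metadata.py`.

```python
import json
import urllib.request

# (base_url, model) -> context length, so each server is probed only once
_context_cache: dict[str, int] = {}


def _get_json(url: str, payload: dict | None = None) -> dict:
    """GET (payload=None) or POST a JSON endpoint and decode the response."""
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp)


def query_context_length(base_url: str, model: str) -> int | None:
    """Ask a local server for the model's real context length.

    Tries Ollama's /api/show first, then an OpenAI-compatible /v1/models
    listing (vLLM reports max_model_len there; LM Studio is assumed to
    expose a similar field). Returns None if nothing answers.
    """
    key = f"{base_url}/{model}"
    if key in _context_cache:
        return _context_cache[key]

    length = None

    # Ollama: model_info keys are architecture-prefixed,
    # e.g. "llama.context_length".
    try:
        info = _get_json(f"{base_url}/api/show", {"model": model})
        for k, v in info.get("model_info", {}).items():
            if k.endswith(".context_length"):
                length = int(v)
                break
    except (OSError, ValueError):
        pass

    # vLLM / LM Studio: OpenAI-style model listing.
    if length is None:
        try:
            listing = _get_json(f"{base_url}/v1/models")
            for entry in listing.get("data", []):
                if entry.get("id") == model:
                    length = entry.get("max_model_len") or entry.get(
                        "max_context_length"  # assumed LM Studio field
                    )
                    break
        except (OSError, ValueError):
            pass

    if length is not None:
        _context_cache[key] = int(length)
    return length
```

Each probe swallows network and decode errors and falls through to the next one, so an unreachable endpoint just means the caller sees `None` and can apply its own default.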
| File |
|---|
| __init__.py |
| anthropic_adapter.py |
| auxiliary_client.py |
| context_compressor.py |
| copilot_acp_client.py |
| display.py |
| insights.py |
| model_metadata.py |
| prompt_builder.py |
| prompt_caching.py |
| redact.py |
| skill_commands.py |
| smart_model_routing.py |
| title_generator.py |
| trajectory.py |
| usage_pricing.py |