fix(anthropic): retry 429/529 errors and surface error details to users

- 429 rate limit and 529 overloaded were incorrectly treated as
  non-retryable client errors, causing immediate failure instead of
  exponential backoff retry. Users hitting Anthropic rate limits got
  silent failures or no response at all.
- Generic "Sorry, I encountered an unexpected error" now includes
  error type, details, and status-specific hints (auth, rate limit,
  overloaded).
- Failed agent with final_response=None now surfaces the actual
  error message instead of returning an empty response.
This commit is contained in:
0xbyt4 2026-03-17 00:30:26 +03:00
parent 5e5c92663d
commit d998cac319
3 changed files with 506 additions and 4 deletions

View file

@ -5396,10 +5396,13 @@ class AIAgent:
# These indicate a problem with the request itself (bad model ID,
# invalid API key, forbidden, etc.) and will never succeed on retry.
# Note: 413 and context-length errors are excluded — handled above.
# 429 (rate limit) is transient and MUST be retried with backoff.
# 529 (Anthropic overloaded) is also transient.
# Also catch local validation errors (ValueError, TypeError) — these
# are programming bugs, not transient failures.
_RETRYABLE_STATUS_CODES = {413, 429, 529}
is_local_validation_error = isinstance(api_error, (ValueError, TypeError))
is_client_status_error = isinstance(status_code, int) and 400 <= status_code < 500 and status_code != 413
is_client_status_error = isinstance(status_code, int) and 400 <= status_code < 500 and status_code not in _RETRYABLE_STATUS_CODES
is_client_error = (is_local_validation_error or is_client_status_error or any(phrase in error_msg for phrase in [
'error code: 401', 'error code: 403',
'error code: 404', 'error code: 422',