Diagnose and fix AI provider issues. Config: services/ai/config.yaml
Help: { "command": "core playbooks show domain_ai-troubleshooting" } via systemprompt_help
Requires: Active session -> See Session
Diagnostic Checklist
{ "command": "infra services status" }
{ "command": "admin config show --section ai" }
{ "command": "cloud secrets list" }
{ "command": "infra logs --context ai --limit 50" }
{ "command": "plugins mcp status" }
Issue: Provider Authentication Failed
Symptoms: "Invalid API key", "Unauthorized", requests fail immediately
Step 1: Check secret is set
{ "command": "cloud secrets list" }
Step 2: Check config uses ${VAR_NAME} syntax
{ "command": "admin config show --section ai" }
Step 3: View error logs
{ "command": "infra logs --context ai --level error --limit 20" }
Solutions:
Secret not set:
{ "command": "cloud secrets set ANTHROPIC_API_KEY \"sk-ant-api03-...\"" }
Secret invalid (update with correct key):
{ "command": "cloud secrets set ANTHROPIC_API_KEY \"correct-api-key\"" }
Key prefixes:
- Anthropic: sk-ant-
- OpenAI: sk-
- Gemini: AIza
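A quick way to spot a key stored under the wrong provider is to compare its prefix with the list above. The sketch below is illustrative only: `guess_provider` and `key_looks_right` are hypothetical helper names, not part of the CLI, and a prefix match says nothing about whether the key is still valid.

```python
from typing import Optional

# Prefixes copied from the list above. Providers can change key formats,
# so a mismatch is a hint to re-check the secret, not proof it is wrong.
KEY_PREFIXES = {
    "anthropic": "sk-ant-",
    "openai": "sk-",
    "gemini": "AIza",
}


def guess_provider(api_key: str) -> Optional[str]:
    """Return the provider whose prefix matches, preferring the longest match.

    An Anthropic key ("sk-ant-...") also starts with the OpenAI prefix
    ("sk-"), so the longest matching prefix wins.
    """
    key = api_key.strip()
    matches = [name for name, prefix in KEY_PREFIXES.items() if key.startswith(prefix)]
    if not matches:
        return None
    return max(matches, key=lambda name: len(KEY_PREFIXES[name]))


def key_looks_right(provider: str, api_key: str) -> bool:
    """True if the key's prefix points at the expected provider."""
    return guess_provider(api_key) == provider
```

For example, `key_looks_right("anthropic", "sk-example")` returns `False`, which suggests an OpenAI-style key was stored under ANTHROPIC_API_KEY.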
Issue: Rate Limiting
Symptoms: "Rate limit exceeded", "Too many requests", 429 errors
Step 1: Check request volume
{ "command": "analytics ai --period hour" }
Step 2: Check fallback config
{ "command": "admin config show --section ai" }
Solutions:
Enable fallback in services/ai/config.yaml:
ai:
  sampling:
    fallback_enabled: true
  providers:
    anthropic:
      enabled: true
    gemini:
      enabled: true
Issue: Model Not Available
Symptoms: "Model not found", "Invalid model"
{ "command": "admin config show --section ai" }
Valid model names:
providers:
  anthropic:
    default_model: claude-sonnet-4-20250514
  openai:
    default_model: gpt-4-turbo
  gemini:
    default_model: gemini-2.5-flash
Anthropic: claude-opus-4-20250514, claude-sonnet-4-20250514, claude-3-haiku-20240307
OpenAI: gpt-4-turbo, gpt-4o, gpt-4o-mini, gpt-3.5-turbo
Gemini: gemini-2.5-flash, gemini-2.5-pro, gemini-1.5-flash
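A "Model not found" error is often a typo or a model name configured under the wrong provider. The following sketch checks a configured name against a subset of the names listed above and suggests near matches. The helper names are hypothetical, and the list should be kept in step with what each provider currently accepts.

```python
import difflib
from typing import List

# A subset of the model names listed above, keyed by provider.
KNOWN_MODELS = {
    "anthropic": ["claude-opus-4-20250514", "claude-sonnet-4-20250514"],
    "openai": ["gpt-4-turbo", "gpt-4o", "gpt-4o-mini", "gpt-3.5-turbo"],
    "gemini": ["gemini-2.5-flash", "gemini-2.5-pro", "gemini-1.5-flash"],
}


def is_known_model(provider: str, model: str) -> bool:
    """True if the model name is listed for this provider."""
    return model in KNOWN_MODELS.get(provider, [])


def suggest_models(provider: str, model: str) -> List[str]:
    """Return up to three listed names that closely resemble the given one."""
    candidates = KNOWN_MODELS.get(provider, [])
    return difflib.get_close_matches(model, candidates, n=3, cutoff=0.6)
```

Calling `suggest_models("anthropic", "claude-sonnet-4")` puts `claude-sonnet-4-20250514` first, which covers the common case of a missing date suffix.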
Issue: Token Limit Exceeded
Symptoms: "Token limit exceeded", "Input too long", responses cut off
{ "command": "admin config show --section ai" }
Token limits:
| Provider | Model | Input | Output |
|---|---|---|---|
| Anthropic | claude-opus-4 | 200K | 32K |
| Anthropic | claude-sonnet-4 | 200K | 16K |
| OpenAI | gpt-4-turbo | 128K | 4K |
| Gemini | gemini-2.5-flash | 1M | 8K |
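To decide which model can take a given request, the table can be turned into a lookup. This is a rough sketch under two assumptions: the table's figures are rounded, with "K" read as 1,000 tokens and "M" as 1,000,000, and input and output limits are checked independently, as the table presents them. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

# (input_limit, output_limit) in tokens, taken from the table above.
# Assumption: "K" means 1,000 and "M" means 1,000,000.
TOKEN_LIMITS: Dict[str, Tuple[int, int]] = {
    "claude-opus-4": (200_000, 32_000),
    "claude-sonnet-4": (200_000, 16_000),
    "gpt-4-turbo": (128_000, 4_000),
    "gemini-2.5-flash": (1_000_000, 8_000),
}


def fits(model: str, input_tokens: int, max_output_tokens: int) -> bool:
    """True if both the input size and the requested output size are within limits."""
    input_limit, output_limit = TOKEN_LIMITS[model]
    return input_tokens <= input_limit and max_output_tokens <= output_limit


def models_that_fit(input_tokens: int, max_output_tokens: int) -> List[str]:
    """List the models from the table that can handle the request, in table order."""
    return [m for m in TOKEN_LIMITS if fits(m, input_tokens, max_output_tokens)]
```

A 500,000-token input with a 4,000-token output budget leaves only `gemini-2.5-flash`, which matches the "use larger context model" advice in this section.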
Solutions:
Increase output limit in services/ai/config.yaml:
ai:
  default_max_output_tokens: 16384
Use larger context model:
providers:
  gemini:
    enabled: true
    default_model: gemini-2.5-flash
Issue: Tool Execution Timeout
Symptoms: "Tool execution timed out", long delays
Step 1: Check timeout config
{ "command": "admin config show --section ai" }
Step 2: Check MCP status
{ "command": "plugins mcp status" }
Step 3: View MCP logs
{ "command": "plugins mcp logs <server_name>" }
Solutions:
Increase timeout in services/ai/config.yaml:
ai:
  mcp:
    execution_timeout_ms: 60000
    retry_attempts: 3
Restart MCP:
{ "command": "plugins mcp restart <server_name>" }
Issue: No Providers Available
Symptoms: "No providers available", all requests fail
{ "command": "admin config show --section ai" }
{ "command": "cloud secrets list" }
Solution: Enable at least one provider:
providers:
  anthropic:
    enabled: true
    api_key: ${ANTHROPIC_API_KEY}
{ "command": "cloud secrets set ANTHROPIC_API_KEY \"sk-ant-...\"" }
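The two checks in this section (a provider is enabled, and its secret is set) can be combined into one pass. The sketch below assumes the `providers` section has already been loaded into a dict and that the secret names from `cloud secrets list` have been collected into a list by hand. The output format of that command is not specified here, so no parsing is attempted. `usable_providers` is a hypothetical helper.

```python
import re
from typing import Dict, List

# Matches a whole-value reference such as ${ANTHROPIC_API_KEY}.
VAR_REF = re.compile(r"^\$\{([A-Za-z_][A-Za-z0-9_]*)\}$")


def usable_providers(providers: Dict[str, dict], secret_names: List[str]) -> List[str]:
    """Return providers that are enabled and whose api_key references a set secret."""
    usable = []
    for name, cfg in providers.items():
        if not cfg.get("enabled", False):
            continue  # disabled providers are never selected
        match = VAR_REF.match(str(cfg.get("api_key", "")))
        if match is None:
            continue  # api_key missing or not in ${VAR_NAME} form
        if match.group(1) in secret_names:
            usable.append(name)
    return usable
```

An empty result corresponds to the "No providers available" symptom: either nothing is enabled, or every enabled provider points at a secret that has not been set.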
Issue: Smart Routing Not Working
Symptoms: Requests always go to default provider
{ "command": "admin config show --section ai" }
Solution: Enable smart routing with multiple providers:
ai:
  sampling:
    enable_smart_routing: true
  providers:
    anthropic:
      enabled: true
    openai:
      enabled: true
    gemini:
      enabled: true
Issue: Fallback Not Working
Symptoms: Primary fails, no fallback occurs
{ "command": "admin config show --section ai" }
Solution: Enable fallback with backup providers:
ai:
  sampling:
    fallback_enabled: true
  providers:
    anthropic:
      enabled: true
      api_key: ${ANTHROPIC_API_KEY}
    openai:
      enabled: true
      api_key: ${OPENAI_API_KEY}
Set all keys:
{ "command": "cloud secrets set ANTHROPIC_API_KEY \"...\"" }
{ "command": "cloud secrets set OPENAI_API_KEY \"...\"" }
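Fallback needs three things at once: the flag, at least two enabled providers, and a key for each. A minimal sketch of that check follows, assuming the `sampling` and `providers` sections have been loaded into plain dicts. `fallback_problems` is a hypothetical name, and the real service may apply further conditions not covered here.

```python
from typing import List


def fallback_problems(sampling: dict, providers: dict) -> List[str]:
    """Return human-readable reasons why fallback cannot trigger (empty if none found)."""
    problems = []
    if not sampling.get("fallback_enabled", False):
        problems.append("sampling.fallback_enabled is not true")

    enabled = [name for name, cfg in providers.items() if cfg.get("enabled", False)]
    if len(enabled) < 2:
        problems.append("fewer than two providers enabled, nothing to fall back to")

    # Every enabled provider needs a key, or it fails the same way the primary did.
    for name in enabled:
        if not providers[name].get("api_key"):
            problems.append(f"{name}: api_key is missing")
    return problems
```

An empty list means the configuration shown above is in place. Anything else names the first thing to fix before looking at logs.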
Issue: Slow Responses
Symptoms: Long response times, timeouts on complex queries
{ "command": "infra logs --context ai --limit 50" }
{ "command": "analytics ai --period hour" }
Solutions:
Use faster model:
providers:
  gemini:
    enabled: true
    default_model: gemini-2.5-flash
Enable smart routing:
sampling:
  enable_smart_routing: true
Log Messages
| Message | Meaning |
|---|---|
| Provider request started | Request sent |
| Provider response received | Success |
| Provider error: rate_limit | Rate limited |
| Provider error: auth_failed | Invalid key |
| Tool execution started | MCP tool called |
| Tool execution timeout | Tool too slow |
| Fallback triggered | Trying backup |
{ "command": "infra logs --context ai --level error" }
{ "command": "infra logs --follow" }
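When scanning a long log dump, it can help to count how often each known message appears. The sketch below matches the message strings from the table above as substrings. The surrounding log line format (timestamps, levels) is not documented here, so the code makes no assumption about it, and the helper names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional

# Message -> meaning, copied from the Log Messages table.
LOG_MEANINGS = {
    "Provider request started": "Request sent",
    "Provider response received": "Success",
    "Provider error: rate_limit": "Rate limited",
    "Provider error: auth_failed": "Invalid key",
    "Tool execution started": "MCP tool called",
    "Tool execution timeout": "Tool too slow",
    "Fallback triggered": "Trying backup",
}


def explain(line: str) -> Optional[str]:
    """Return the meaning of the first known message found in the line, if any."""
    for message, meaning in LOG_MEANINGS.items():
        if message in line:
            return meaning
    return None


def summarize(lines: Iterable[str]) -> Counter:
    """Count occurrences of each known message across the given log lines."""
    counts: Counter = Counter()
    for line in lines:
        for message in LOG_MEANINGS:
            if message in line:
                counts[message] += 1
                break  # count each line once
    return counts
```

A high count for `Provider error: rate_limit` alongside few `Fallback triggered` entries, for instance, points back to the fallback configuration rather than to the provider itself.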
Quick Reference
| Problem | First Command |
|---|---|
| Auth failures | cloud secrets list |
| Rate limiting | analytics ai --period hour |
| Model errors | admin config show --section ai |
| Token limits | admin config show --section ai |
| Tool timeouts | plugins mcp status |
| No providers | admin config show --section ai |
| Any issue | infra logs --context ai --level error |
Related
-> See AI Providers
-> See MCP Troubleshooting
-> See Agent Troubleshooting
-> See AI Service