LLM Providers
GoClaw supports multiple LLM providers through a unified registry system. This enables flexible model selection, automatic failover, and purpose-specific provider chains.
Supported Providers
| Provider | Type | Use Cases |
|---|---|---|
| Anthropic | Cloud | Agent responses (Claude), extended thinking, prompt caching |
| OpenAI | Cloud/Local | GPT models, OpenAI-compatible APIs (LM Studio, LocalAI) |
| Ollama | Local | Local inference, embeddings, summarization |
| xAI | Cloud | Grok models, stateful conversations, server-side tools |
| Hugot | Local | Built-in embeddings-only provider for semantic search |
Quick Setup
Minimal Config (Single Provider)
For basic usage with Anthropic:
{
  "llm": {
    "providers": {
      "anthropic": {
        "driver": "anthropic",
        "apiKey": "sk-ant-...",
        "promptCaching": true
      }
    },
    "agent": {
      "models": ["anthropic/claude-sonnet-4-20250514"]
    }
  }
}
If you leave the embeddings chain empty, GoClaw automatically restores the built-in hugot-local provider and seeds the default local embeddings model.
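Based on the default model shown later in this page, that restored default should be equivalent to configuring the embeddings chain explicitly like this:

```json
{
  "llm": {
    "embeddings": {
      "models": ["hugot-local/KnightsAnalytics/all-MiniLM-L6-v2"]
    }
  }
}
```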
Multi-Provider Setup
For advanced setups with multiple providers and purpose-specific chains:
{
  "llm": {
    "providers": {
      "claude": {
        "driver": "anthropic",
        "apiKey": "sk-ant-...",
        "promptCaching": true
      },
      "ollama-qwen": {
        "driver": "ollama",
        "url": "http://localhost:11434"
      },
      "hugot-local": {
        "driver": "hugot",
        "embeddingOnly": true
      }
    },
    "agent": {
      "models": ["claude/claude-sonnet-4-20250514"]
    },
    "summarization": {
      "models": ["ollama-qwen/qwen2.5:7b", "claude/claude-3-haiku-20240307"]
    },
    "embeddings": {
      "models": ["hugot-local/KnightsAnalytics/all-MiniLM-L6-v2"]
    }
  }
}
Purpose Chains
GoClaw routes LLM requests based on purpose:
| Purpose | Config Key | Used For |
|---|---|---|
| agent | agent | Main conversation, tool use |
| summarization | summarization | Compaction summaries, checkpoints |
| embeddings | embeddings | Semantic search (memory, transcripts, Memory Graph) |
| heartbeat | heartbeat | Periodic heartbeat tasks |
| cron | cron | Scheduled cron jobs |
| hass | hass | Home Assistant queries |
| memory_extraction | memoryExtraction | Memory Graph entity extraction |
If a purpose has no models configured, it falls back to the agent chain.
Each purpose has a model chain — the first model is primary, others are fallbacks:
{
  "llm": {
    "summarization": {
      "models": [
        "ollama-qwen/qwen2.5:7b",
        "claude/claude-3-haiku-20240307"
      ]
    }
  }
}
The first model (ollama-qwen/qwen2.5:7b) is the primary; if it fails, the next model in the chain is used as a fallback.
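The purpose-to-chain resolution described above can be sketched roughly as follows. This is an illustrative model, not GoClaw's actual API; the names `chains` and `resolve` are hypothetical:

```go
package main

import "fmt"

// chains maps a purpose (agent, summarization, embeddings, ...)
// to its configured model chain.
type chains map[string][]string

// resolve returns the model chain to try for a purpose: primary first,
// fallbacks after. A purpose with no configured models falls back to
// the agent chain, as described above.
func (c chains) resolve(purpose string) []string {
	if models := c[purpose]; len(models) > 0 {
		return models
	}
	return c["agent"]
}

func main() {
	cfg := chains{
		"agent":         {"claude/claude-sonnet-4-20250514"},
		"summarization": {"ollama-qwen/qwen2.5:7b", "claude/claude-3-haiku-20240307"},
	}
	fmt.Println(cfg.resolve("summarization")) // configured chain, primary first
	fmt.Println(cfg.resolve("heartbeat"))     // no heartbeat chain: agent fallback
}
```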
Automatic Failover
When a provider fails:
- The error is classified (rate limit, auth, timeout, server error)
- The provider enters a cooldown with exponential backoff
- The next model in the chain is tried
- After the cooldown expires, the original provider is retried
Check provider status with /llm command:
LLM Provider Status
claude: healthy
ollama-qwen: cooldown (rate_limit), retry in 2m30s
ollama-embed: healthy
Thinking Levels
Extended thinking/reasoning can be enabled for supported models. This tells the LLM to “think through” complex problems before responding.
Available Levels
| Level | Description | Anthropic Tokens |
|---|---|---|
| off | No extended thinking | 0 |
| minimal | Quick responses | 1,024 |
| low | Light reasoning | 4,096 |
| medium | Balanced (default) | 10,000 |
| high | Deep reasoning | 25,000 |
| xhigh | Maximum effort | 50,000 |
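For Anthropic, the table above amounts to a simple level-to-budget lookup. A minimal sketch (hypothetical function name, not GoClaw's internal table):

```go
package main

import "fmt"

// thinkingBudget maps a thinking level to an Anthropic token budget,
// mirroring the table above. Unknown levels yield 0, i.e. thinking off.
func thinkingBudget(level string) int {
	budgets := map[string]int{
		"off":     0,
		"minimal": 1024,
		"low":     4096,
		"medium":  10000,
		"high":    25000,
		"xhigh":   50000,
	}
	return budgets[level] // missing keys return the zero value, 0
}

func main() {
	fmt.Println(thinkingBudget("medium")) // 10000
}
```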
Configuration
Per-user in users.json:
{
  "users": [
    {
      "name": "Alice",
      "role": "owner",
      "thinking": true,
      "thinkingLevel": "medium"
    }
  ]
}
Or dynamically via Telegram/TUI settings.
Provider Support
| Provider | Thinking Support |
|---|---|
| Anthropic | Yes (Claude 3.5+), token budget |
| OpenAI | Via OpenRouter reasoning |
| Ollama | Model-dependent |
| xAI | Yes (grok-3-mini), effort levels |
Provider Configuration
Common Options
All providers support:
{
  "driver": "anthropic",    // Required: provider driver
  "apiKey": "...",          // API key (or env var)
  "maxTokens": 8192,        // Output limit override
  "contextTokens": 200000,  // Context window override
  "timeoutSeconds": 300,    // Request timeout
  "trace": true,            // Enable request tracing
  "dumpOnSuccess": false    // Keep request dumps on success
}
Provider-Specific Options
Anthropic:
{
  "driver": "anthropic",
  "promptCaching": true // Enable prompt caching (reduces cost)
}
OpenAI:
{
  "driver": "openai",
  "baseURL": "https://api.openai.com/v1" // Or compatible endpoint
}
Ollama:
{
  "driver": "ollama",
  "url": "http://localhost:11434",
  "embeddingOnly": true // Skip chat availability check
}
Hugot (embeddings only):
{
  "driver": "hugot",
  "embeddingOnly": true
}
Hugot is the built-in local embeddings provider. It is intended for the embeddings purpose, not for agent or summarization.
xAI:
{
  "driver": "xai",
  "serverToolsAllowed": ["web_search"], // Server-side tools
  "maxTurns": 5 // Max agentic turns
}
Model Reference Format
Models are referenced as provider/model:
claude/claude-sonnet-4-20250514
ollama-qwen/qwen2.5:7b
openai/gpt-4o
xai/grok-3
hugot-local/KnightsAnalytics/all-MiniLM-L6-v2
The provider name is the key from your providers config, not the provider type.
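Note that a model name can itself contain slashes (as in hugot-local/KnightsAnalytics/all-MiniLM-L6-v2), so a reference must be split on the first slash only. A sketch of that parsing (hypothetical helper, not GoClaw's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// splitModelRef splits a "provider/model" reference on the FIRST slash only,
// because model names may themselves contain slashes.
func splitModelRef(ref string) (provider, model string) {
	provider, model, _ = strings.Cut(ref, "/")
	return provider, model
}

func main() {
	p, m := splitModelRef("hugot-local/KnightsAnalytics/all-MiniLM-L6-v2")
	fmt.Println(p) // hugot-local
	fmt.Println(m) // KnightsAnalytics/all-MiniLM-L6-v2
}
```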
Cooldown Management
View Status
/llm
Shows all providers, their status, and any cooldowns.
Reset Cooldowns
/llm reset
Clears active provider cooldowns so model chains can retry immediately.
Cooldown Behavior
| Error Type | Initial Cooldown | Max Cooldown |
|---|---|---|
| Rate limit | 30s | 5 min |
| Auth error | 1 hour | 1 hour |
| Server error | 1 min | 10 min |
| Timeout | 30s | 5 min |
Cooldowns use exponential backoff within these ranges.
See Also
- Anthropic Provider — Claude models, prompt caching
- OpenAI Provider — GPT and compatible APIs
- Ollama Provider — Local inference
- xAI Provider — Grok models
- Embeddings — Embeddings purpose configuration
- Memory Graph — Memory extraction purpose
- Configuration — Full config reference
- Session Management — Summarization config