LLM Providers

GoClaw supports multiple LLM providers through a unified registry system. This enables flexible model selection, automatic failover, and purpose-specific provider chains.

Supported Providers

| Provider  | Type        | Use Cases                                                   |
|-----------|-------------|-------------------------------------------------------------|
| Anthropic | Cloud       | Agent responses (Claude), extended thinking, prompt caching |
| OpenAI    | Cloud/Local | GPT models, OpenAI-compatible APIs (LM Studio, LocalAI)     |
| Ollama    | Local       | Local inference, embeddings, summarization                  |
| xAI       | Cloud       | Grok models, stateful conversations, server-side tools      |
| Hugot     | Local       | Built-in embeddings-only provider for semantic search       |

Quick Setup

Minimal Config (Single Provider)

For basic usage with Anthropic:

{
  "llm": {
    "providers": {
      "anthropic": {
        "driver": "anthropic",
        "apiKey": "sk-ant-...",
        "promptCaching": true
      }
    },
    "agent": {
      "models": ["anthropic/claude-sonnet-4-20250514"]
    }
  }
}

If you leave the embeddings chain empty, GoClaw automatically restores the built-in hugot-local provider and seeds the default local embeddings model.

Multi-Provider Setup

For advanced setups with multiple providers and purpose-specific chains:

{
  "llm": {
    "providers": {
      "claude": {
        "driver": "anthropic",
        "apiKey": "sk-ant-...",
        "promptCaching": true
      },
      "ollama-qwen": {
        "driver": "ollama",
        "url": "http://localhost:11434"
      },
      "hugot-local": {
        "driver": "hugot",
        "embeddingOnly": true
      }
    },
    "agent": {
      "models": ["claude/claude-sonnet-4-20250514"]
    },
    "summarization": {
      "models": ["ollama-qwen/qwen2.5:7b", "claude/claude-3-haiku-20240307"]
    },
    "embeddings": {
      "models": ["hugot-local/KnightsAnalytics/all-MiniLM-L6-v2"]
    }
  }
}

Purpose Chains

GoClaw routes LLM requests based on purpose:

| Purpose           | Config Key       | Used For                                            |
|-------------------|------------------|-----------------------------------------------------|
| agent             | agent            | Main conversation, tool use                         |
| summarization     | summarization    | Compaction summaries, checkpoints                   |
| embeddings        | embeddings       | Semantic search (memory, transcripts, Memory Graph) |
| heartbeat         | heartbeat        | Periodic heartbeat tasks                            |
| cron              | cron             | Scheduled cron jobs                                 |
| hass              | hass             | Home Assistant queries                              |
| memory_extraction | memoryExtraction | Memory Graph entity extraction                      |

If a purpose has no models configured, it falls back to the agent chain.

Each purpose has a model chain — the first model is primary, others are fallbacks:

{
  "llm": {
    "summarization": {
      "models": [
        "ollama-qwen/qwen2.5:7b",
        "claude/claude-3-haiku-20240307"
      ]
    }
  }
}

The chain is tried in order: ollama-qwen/qwen2.5:7b first; if it fails, claude/claude-3-haiku-20240307 is used as the fallback.
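The chain-resolution rule above (use the purpose's own models, otherwise fall back to the agent chain) can be sketched in Go. Names here are illustrative, not GoClaw's internal types:

```go
package main

import "fmt"

// resolveChain returns the model chain for a purpose, falling back to the
// agent chain when the purpose has no models configured.
func resolveChain(chains map[string][]string, purpose string) []string {
	if models, ok := chains[purpose]; ok && len(models) > 0 {
		return models
	}
	return chains["agent"]
}

func main() {
	chains := map[string][]string{
		"agent":         {"claude/claude-sonnet-4-20250514"},
		"summarization": {"ollama-qwen/qwen2.5:7b", "claude/claude-3-haiku-20240307"},
	}
	// summarization has its own chain; heartbeat does not, so it uses agent's.
	fmt.Println(resolveChain(chains, "summarization")[0])
	fmt.Println(resolveChain(chains, "heartbeat")[0])
}
```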

Automatic Failover

When a provider fails:

  1. Error is classified (rate limit, auth, timeout, server error)
  2. Provider enters cooldown with exponential backoff
  3. Next model in chain is tried
  4. After cooldown expires, original provider is tried again

Check provider status with the /llm command:

LLM Provider Status

claude: healthy
ollama-qwen: cooldown (rate_limit), retry in 2m30s
ollama-embed: healthy

Thinking Levels

Extended thinking/reasoning can be enabled for supported models. This tells the LLM to “think through” complex problems before responding.

Available Levels

| Level   | Description          | Anthropic Tokens |
|---------|----------------------|------------------|
| off     | No extended thinking | 0                |
| minimal | Quick responses      | 1,024            |
| low     | Light reasoning      | 4,096            |
| medium  | Balanced (default)   | 10,000           |
| high    | Deep reasoning       | 25,000           |
| xhigh   | Maximum effort       | 50,000           |
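The level-to-budget mapping above can be expressed as a simple lookup. The function name is illustrative, not GoClaw's internal API:

```go
package main

import "fmt"

// thinkingBudget maps a thinking level to its Anthropic token budget,
// per the table above. Unknown levels return 0 (no extended thinking).
func thinkingBudget(level string) int {
	budgets := map[string]int{
		"off":     0,
		"minimal": 1024,
		"low":     4096,
		"medium":  10000,
		"high":    25000,
		"xhigh":   50000,
	}
	return budgets[level]
}

func main() {
	fmt.Println(thinkingBudget("medium")) // the default level's budget
}
```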

Configuration

Per-user in users.json:

{
  "users": [
    {
      "name": "Alice",
      "role": "owner",
      "thinking": true,
      "thinkingLevel": "medium"
    }
  ]
}

Or dynamically via Telegram/TUI settings.

Provider Support

| Provider  | Thinking Support                |
|-----------|---------------------------------|
| Anthropic | Yes (Claude 3.5+), token budget |
| OpenAI    | Via OpenRouter reasoning        |
| Ollama    | Model-dependent                 |
| xAI       | Yes (grok-3-mini), effort levels|

Provider Configuration

Common Options

All providers support:

{
  "driver": "anthropic",       // Required: provider driver
  "apiKey": "...",             // API key (or env var)
  "maxTokens": 8192,           // Output limit override
  "contextTokens": 200000,     // Context window override
  "timeoutSeconds": 300,       // Request timeout
  "trace": true,               // Enable request tracing
  "dumpOnSuccess": false       // Keep request dumps on success
}

Provider-Specific Options

Anthropic:

{
  "driver": "anthropic",
  "promptCaching": true        // Enable prompt caching (reduces cost)
}

OpenAI:

{
  "driver": "openai",
  "baseURL": "https://api.openai.com/v1"  // Or compatible endpoint
}

Ollama:

{
  "driver": "ollama",
  "url": "http://localhost:11434",
  "embeddingOnly": true        // Skip chat availability check
}

Hugot (embeddings only):

{
  "driver": "hugot",
  "embeddingOnly": true
}

Hugot is the built-in local embeddings provider. It is intended for the embeddings purpose, not for agent or summarization.

xAI:

{
  "driver": "xai",
  "serverToolsAllowed": ["web_search"],  // Server-side tools
  "maxTurns": 5                // Max agentic turns
}

Model Reference Format

Models are referenced as provider/model:

claude/claude-sonnet-4-20250514
ollama-qwen/qwen2.5:7b
openai/gpt-4o
xai/grok-3
hugot-local/KnightsAnalytics/all-MiniLM-L6-v2

The provider name is the key from your providers config, not the provider type.
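Because model names can themselves contain slashes (as in the Hugot example above), a reference must be split on the first slash only. A hypothetical helper, not necessarily GoClaw's actual parsing:

```go
package main

import (
	"fmt"
	"strings"
)

// splitModelRef splits "provider/model" on the FIRST slash only, so model
// names containing slashes (e.g. Hugot model paths) survive intact.
func splitModelRef(ref string) (provider, model string) {
	provider, model, _ = strings.Cut(ref, "/")
	return
}

func main() {
	p, m := splitModelRef("hugot-local/KnightsAnalytics/all-MiniLM-L6-v2")
	fmt.Println(p) // the key from the providers config
	fmt.Println(m) // the model name, inner slashes intact
}
```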


Cooldown Management

View Status

/llm

Shows all providers, their status, and any cooldowns.

Reset Cooldowns

/llm reset

Clears active provider cooldowns so model chains can retry immediately.

Cooldown Behavior

| Error Type   | Initial Cooldown | Max Cooldown |
|--------------|------------------|--------------|
| Rate limit   | 30s              | 5 min        |
| Auth error   | 1 hour           | 1 hour       |
| Server error | 1 min            | 10 min       |
| Timeout      | 30s              | 5 min        |

Cooldowns use exponential backoff within these ranges.


See Also