# Ollama Provider

The Ollama provider connects GoClaw to a locally running Ollama server for inference, embeddings, and summarization.

## Configuration
```json
{
  "llm": {
    "providers": {
      "ollama": {
        "driver": "ollama",
        "url": "http://localhost:11434"
      }
    },
    "summarization": {
      "models": ["ollama/qwen2.5:7b"]
    },
    "embeddings": {
      "models": ["ollama/nomic-embed-text"]
    }
  }
}
```
## Options

| Field | Type | Default | Description |
|---|---|---|---|
| `url` | string | - | Ollama server URL |
| `maxTokens` | int | - | Output token limit |
| `contextTokens` | int | auto | Context window override (by default, queried from Ollama) |
| `timeoutSeconds` | int | 300 | Request timeout in seconds |
| `embeddingOnly` | bool | false | Use this provider only for embeddings |
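For reference, a provider entry that sets these options explicitly might look like the sketch below (the numeric values are illustrative, not tuned recommendations):

```json
{
  "llm": {
    "providers": {
      "ollama": {
        "driver": "ollama",
        "url": "http://localhost:11434",
        "maxTokens": 4096,
        "contextTokens": 32768,
        "timeoutSeconds": 300
      }
    }
  }
}
```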
## Use Cases

### Summarization
Ollama is commonly used for compaction summaries to avoid cloud API costs:
```json
{
  "llm": {
    "providers": {
      "ollama-summarize": {
        "driver": "ollama",
        "url": "http://localhost:11434"
      },
      "claude": {
        "driver": "anthropic",
        "apiKey": "YOUR_API_KEY"
      }
    },
    "agent": {
      "models": ["claude/claude-sonnet-4-20250514"]
    },
    "summarization": {
      "models": ["ollama-summarize/qwen2.5:7b", "claude/claude-3-haiku-20240307"]
    }
  }
}
```
This configuration uses Ollama for summarization (free and local), falling back to Anthropic if the local model is unavailable.
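Before wiring a model into the summarization chain, you can sanity-check it against Ollama's generate endpoint directly (this assumes the default port and that the model has already been pulled):

```bash
# Non-streaming, one-off generation request to the local Ollama server
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5:7b",
  "prompt": "Summarize in one sentence: Ollama serves local models over HTTP.",
  "stream": false
}'
```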
### Embeddings

For semantic search (`memory_search` and transcript):
```json
{
  "memory": {
    "enabled": true,
    "query": {
      "maxResults": 6,
      "minScore": 0.35
    }
  }
}
```
Or via the LLM config:
```json
{
  "llm": {
    "providers": {
      "ollama-embed": {
        "driver": "ollama",
        "url": "http://localhost:11434",
        "embeddingOnly": true
      }
    },
    "embeddings": {
      "models": ["ollama-embed/nomic-embed-text"]
    }
  }
}
```
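To confirm the embedding model responds before enabling it here, you can call Ollama's embeddings endpoint directly (default port assumed):

```bash
# A successful response contains an "embedding" array of floats
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "semantic search smoke test"
}'
```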
### Agent (Local-Only)
For fully local operation:
```json
{
  "llm": {
    "providers": {
      "ollama": {
        "driver": "ollama",
        "url": "http://localhost:11434",
        "contextTokens": 131072
      }
    },
    "agent": {
      "models": ["ollama/qwen2.5:32b"]
    },
    "summarization": {
      "models": ["ollama/qwen2.5:7b"]
    },
    "embeddings": {
      "models": ["ollama/nomic-embed-text"]
    }
  }
}
```
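All three models referenced above must be present locally before GoClaw can use them; pulling them is a one-time step:

```bash
ollama pull qwen2.5:32b       # agent
ollama pull qwen2.5:7b        # summarization
ollama pull nomic-embed-text  # embeddings
```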
## Recommended Models

| Use Case | Model | Notes |
|---|---|---|
| Summarization | `qwen2.5:7b` | Good balance of speed and quality |
| Summarization | `llama3.2:3b` | Faster, lower quality |
| Embeddings | `nomic-embed-text` | Best for semantic search |
| Embeddings | `all-minilm` | Faster, smaller vectors |
| Agent | `qwen2.5:32b` | Large context, good tool use |
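To see which models are already installed locally:

```bash
ollama list
```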
## Context Window

GoClaw queries the model's context size from Ollama automatically. Override it with `contextTokens` if needed:
```json
{
  "providers": {
    "ollama": {
      "driver": "ollama",
      "url": "http://localhost:11434",
      "contextTokens": 131072
    }
  }
}
```
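To check what context length a model itself reports, inspect it with `ollama show` (the exact output format varies between Ollama versions):

```bash
ollama show qwen2.5:7b
```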
## Troubleshooting

### “Ollama not available”

- Check Ollama is running: `curl http://localhost:11434/api/tags`
- Start Ollama: `ollama serve`
- Verify that the URL in the config matches the server address
### “context deadline exceeded”

Increase the timeout or use a smaller model:
```json
{
  "providers": {
    "ollama": {
      "driver": "ollama",
      "url": "http://localhost:11434",
      "timeoutSeconds": 600
    }
  }
}
```
### Model Not Found

Pull the model first:

```bash
ollama pull qwen2.5:7b
ollama pull nomic-embed-text
```
### Slow Performance

- Use GPU acceleration if available (see the check below)
- Try smaller models (`7b` instead of `14b`)
- Reduce `contextTokens` if not needed
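To verify whether a loaded model is actually running on the GPU or has spilled over to the CPU, list the running models (requires a reasonably recent Ollama version):

```bash
# The output indicates what hardware each loaded model is using
ollama ps
```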
## See Also
- LLM Providers — Provider overview
- Embeddings — Embedding configuration
- Session Management — Summarization config