Voice

GoClaw supports real-time voice conversations through the HTTP Voice channel. Talk to your agent naturally using your microphone and hear responses spoken aloud.

Overview

Voice uses the xAI Realtime Voice API to provide:

Real-time conversation: low-latency voice interaction
Natural interruption: interrupt the agent by speaking while it is talking
Voice activity detection: automatic detection of when you start and stop speaking
Tool support: the agent can use tools during voice conversations

Prerequisites

xAI API key with realtime voice access
Modern browser with microphone support (Chrome, Firefox, Safari, Edge)
HTTPS: microphone access requires a secure context (localhost works for development)

Setup

1. Configure VoiceLLM

In goclaw.json:

{
  "voicellm": {
    "enabled": true,
    "default": "xai",
    "serverVAD": true,
    "idleTimeout": 300,
    "providers": {
      "xai": {
        "driver": "xai",
        "apiKey": "xai-...",
        "voice": "Eve"
      }
    }
  }
}

Or use the setup wizard:

goclaw setup        # Initial setup includes voice configuration
goclaw setup edit   # Edit existing config

2. Enable HTTP Channel

Voice is accessed through the HTTP channel, which must be enabled:

{
  "channels": {
    "http": {
      "enabled": true,
      "port": 1337
    }
  }
}

3. Access Voice Interface

Open your browser to:

http://localhost:1337/voice

Click the microphone button and start talking.
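
If the page does not load, it can help to confirm that the settings from steps 1 and 2 are both present. The following is a minimal sketch and not part of GoClaw: the helper name `voice_url` is hypothetical, it checks only the keys shown in the examples above, and the fallback port of 1337 simply mirrors the example rather than a confirmed default.

```python
import json


def voice_url(config: dict, host: str = "localhost") -> str:
    """Return the voice page URL, or raise ValueError naming what is missing.

    Hypothetical helper: it inspects only the keys shown in the examples above.
    """
    voice = config.get("voicellm", {})
    http = config.get("channels", {}).get("http", {})

    if not voice.get("enabled", False):
        raise ValueError("voicellm.enabled must be true")
    default = voice.get("default")
    if default not in voice.get("providers", {}):
        raise ValueError(f"voicellm.default {default!r} has no matching provider entry")
    if not http.get("enabled", False):
        raise ValueError("channels.http.enabled must be true")

    port = http.get("port", 1337)  # 1337 mirrors the example, not a confirmed default
    return f"http://{host}:{port}/voice"


# Same shape as the two config fragments above, merged into one document.
example = json.loads("""
{
  "voicellm": {
    "enabled": true,
    "default": "xai",
    "providers": {"xai": {"driver": "xai", "apiKey": "xai-...", "voice": "Eve"}}
  },
  "channels": {"http": {"enabled": true, "port": 1337}}
}
""")
print(voice_url(example))
```

To check a real file, load it with `json.load(open("goclaw.json"))` and pass the result to the same function.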

Configuration

Global Settings

Field	Default	Description
`enabled`	`false`	Enable voice functionality
`default`	-	Default provider name
`serverVAD`	`true`	Server-side voice activity detection
`idleTimeout`	`300`	Session timeout in seconds

Provider Settings

Field	Default	Description
`driver`	-	`xai`
`apiKey`	-	xAI API key
`voice`	`Eve`	Voice to use
`sampleRate`	`48000`	Audio sample rate in Hz

Audio Effects

GoClaw can apply optional real-time effects to spoken output.

{
  "voicellm": {
    "effects": {
      "mode": "both",
      "ring": {
        "carrierFreq": 200,
        "mix": 0.7
      },
      "bitcrush": {
        "bitDepth": 8,
        "downsample": 2
      }
    }
  }
}

The setup editor and web config page expose both preset-based and custom controls for audio effects. Built-in presets include:

None
Battlestar Galactica
Dalek
Metallic
Lo-Fi Radio
Custom

Field	Default	Description
`effects.mode`	`none`	`none`, `ring`, `bitcrush`, or `both`
`effects.ring.carrierFreq`	`200`	Ring-modulation carrier frequency in Hz
`effects.ring.mix`	`0.7`	Ring-modulation wet/dry mix
`effects.bitcrush.bitDepth`	`8`	Target bit depth for crunchy output
`effects.bitcrush.downsample`	`2`	Downsample factor for lo-fi output
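
To give a feel for what these parameters control, here is a rough sketch of the two effects using their textbook definitions: ring modulation multiplies the signal by a sine carrier, and bitcrushing coarsens amplitude resolution and holds samples. This is not GoClaw's implementation. The function names, the linear wet/dry blend, the sample-and-hold reading of `downsample`, and the ring-then-bitcrush order for `both` are all assumptions made for illustration.

```python
import math


def ring_modulate(samples, sample_rate, carrier_freq=200.0, mix=0.7):
    """Blend each sample with itself multiplied by a sine carrier (assumed linear mix)."""
    out = []
    for n, x in enumerate(samples):
        carrier = math.sin(2.0 * math.pi * carrier_freq * n / sample_rate)
        out.append((1.0 - mix) * x + mix * x * carrier)
    return out


def bitcrush(samples, bit_depth=8, downsample=2):
    """Quantize to steps of 1 / 2**(bit_depth - 1) and hold each kept sample.

    Samples are assumed to be floats in [-1.0, 1.0].
    """
    levels = 2 ** (bit_depth - 1)
    out = []
    held = 0.0
    for n, x in enumerate(samples):
        if n % downsample == 0:  # keep every `downsample`-th sample, repeat it otherwise
            held = round(x * levels) / levels
        out.append(held)
    return out


def apply_effects(samples, sample_rate, mode="both"):
    """Apply effects according to a mode of none, ring, bitcrush, or both."""
    if mode in ("ring", "both"):
        samples = ring_modulate(samples, sample_rate)
    if mode in ("bitcrush", "both"):  # order for "both" is an assumption
        samples = bitcrush(samples)
    return samples
```

Lower `bitDepth` values and higher `downsample` factors both make the output sound harsher, while `mix` closer to 0 leaves more of the original voice audible.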

Available Voices (xAI)

Voice	Description
`Eve`	Female, energetic (default)
`Ara`	Female, warm and friendly
`Rex`	Male, confident and clear
`Sal`	Neutral, smooth and balanced
`Leo`	Male, authoritative

Voice Activity Detection

GoClaw supports two VAD modes:

Server VAD (serverVAD: true, default)

xAI detects when you start/stop speaking
Lower latency
Works best with clear audio

Client VAD (serverVAD: false)

Browser detects voice activity
More control over sensitivity
Useful if server VAD has issues with your microphone
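
For intuition about what "sensitivity" means in client-side detection, the sketch below shows a generic energy-threshold detector. It is not the code that runs on the voice page, which this document does not describe. The class name `EnergyVAD`, the RMS threshold, and the hangover count are all illustrative assumptions.

```python
import math


class EnergyVAD:
    """Generic energy-threshold voice activity detector (illustrative only)."""

    def __init__(self, threshold=0.02, hangover_frames=3):
        self.threshold = threshold              # RMS level treated as speech
        self.hangover_frames = hangover_frames  # quiet frames tolerated before "stop"
        self.speaking = False
        self._quiet = 0

    def process(self, frame):
        """Feed one frame of float samples; return 'start', 'stop', or None."""
        rms = math.sqrt(sum(x * x for x in frame) / len(frame)) if frame else 0.0
        if rms >= self.threshold:
            self._quiet = 0
            if not self.speaking:
                self.speaking = True
                return "start"
        elif self.speaking:
            self._quiet += 1
            if self._quiet >= self.hangover_frames:
                self.speaking = False
                return "stop"
        return None


vad = EnergyVAD()
loud, quiet = [0.5] * 160, [0.0] * 160
events = [vad.process(f) for f in [quiet, loud, loud, quiet, quiet, quiet, quiet]]
print(events)
```

Raising the threshold makes the detector ignore more background noise, and a longer hangover keeps short pauses from ending the turn early.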

Prompt Customization

Customize how the agent responds in voice conversations:

{
  "voicellm": {
    "prompt": {
      "language": "English",
      "maxSentences": 3,
      "pronunciations": {
        "GoClaw": "go-claw",
        "API": "A.P.I."
      },
      "additionalInstructions": "Speak casually and use contractions."
    }
  }
}

Field	Default	Description
`language`	-	Preferred response language
`maxSentences`	`3`	Keep responses concise for voice
`pronunciations`	-	Custom word pronunciations
`additionalInstructions`	-	Extra instructions for voice mode
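
One way to picture how these fields shape a response is to imagine them rendered into plain-text instructions for the voice model. The sketch below is purely illustrative: the function name `build_voice_instructions` and the wording of each generated line are assumptions, and GoClaw's actual prompt format is not documented here.

```python
def build_voice_instructions(prompt: dict) -> str:
    """Render the `prompt` settings as plain-text instructions (hypothetical format)."""
    lines = []
    if prompt.get("language"):
        lines.append(f"Respond in {prompt['language']}.")
    max_sentences = prompt.get("maxSentences", 3)  # 3 is the documented default
    lines.append(f"Keep each response to at most {max_sentences} sentences.")
    for word, spoken in prompt.get("pronunciations", {}).items():
        lines.append(f'Pronounce "{word}" as "{spoken}".')
    if prompt.get("additionalInstructions"):
        lines.append(prompt["additionalInstructions"])
    return "\n".join(lines)


settings = {
    "language": "English",
    "maxSentences": 3,
    "pronunciations": {"GoClaw": "go-claw", "API": "A.P.I."},
    "additionalInstructions": "Speak casually and use contractions.",
}
print(build_voice_instructions(settings))
```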

How It Works

Browser                        GoClaw                       xAI
   │                             │                            │
   │ ◄─── WebSocket ───────────► │                            │
   │     (audio chunks)          │ ◄─── WebSocket ──────────► │
   │                             │     (xAI realtime API)     │
   │                             │                            │
   │  mic audio ────────────────►│ ───────────────────────────►
   │ ◄───────────────────────────│ ◄──────────────────────────│
   │  speaker audio              │         agent audio        │

1. The browser captures microphone audio and sends it to GoClaw over a WebSocket
2. GoClaw forwards the audio to xAI’s realtime API
3. xAI processes the speech, generates a response, and returns audio
4. GoClaw streams the response audio back to the browser
5. The browser plays the audio through the speakers
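
The relay pattern in this flow can be sketched with in-memory queues standing in for the two WebSocket connections. This is a simplified model, not GoClaw's code: `pump`, `fake_provider`, and `run_session` are hypothetical names, the provider is a stub that echoes each chunk with a prefix, and a `None` sentinel marks the end of the session.

```python
import asyncio


async def pump(source: asyncio.Queue, sink: asyncio.Queue) -> None:
    """Forward chunks from source to sink until a None sentinel arrives."""
    while True:
        chunk = await source.get()
        await sink.put(chunk)
        if chunk is None:
            return


async def fake_provider(inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Stand-in for the realtime API: answers each audio chunk with a reply chunk."""
    while True:
        chunk = await inbox.get()
        if chunk is None:
            await outbox.put(None)
            return
        await outbox.put(b"reply:" + chunk)


async def run_session(mic_chunks):
    """Send mic chunks through the relay and collect what the speaker would play."""
    mic, to_provider = asyncio.Queue(), asyncio.Queue()
    from_provider, speaker = asyncio.Queue(), asyncio.Queue()

    tasks = [
        asyncio.create_task(pump(mic, to_provider)),        # browser -> relay -> provider
        asyncio.create_task(fake_provider(to_provider, from_provider)),
        asyncio.create_task(pump(from_provider, speaker)),  # provider -> relay -> browser
    ]
    for chunk in mic_chunks:
        await mic.put(chunk)
    await mic.put(None)  # end of session
    await asyncio.gather(*tasks)

    played = []
    while not speaker.empty():
        chunk = speaker.get_nowait()
        if chunk is not None:
            played.append(chunk)
    return played


print(asyncio.run(run_session([b"a", b"b"])))
```

The two `pump` tasks run concurrently, which is what lets audio flow in both directions at once.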

Interruption

Voice supports natural interruption:

1. You start speaking while the agent is talking
2. The agent stops its current response
3. Your speech is processed immediately
4. The agent responds to your interruption

This creates a natural conversational flow.
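
The interruption behaviour above can be modelled as a small playback controller that drops queued agent audio as soon as user speech is detected. This is a conceptual sketch only. The class `PlaybackController` and its method names are hypothetical and do not correspond to a documented GoClaw or xAI interface.

```python
class PlaybackController:
    """Tracks queued agent audio and drops it when the user interrupts (illustrative)."""

    def __init__(self):
        self.queue = []              # agent audio chunks waiting to be played
        self.agent_speaking = False
        self.cancelled = 0           # number of responses cut short

    def on_agent_audio(self, chunk):
        """Queue a chunk of agent audio for playback."""
        self.queue.append(chunk)
        self.agent_speaking = True

    def on_agent_done(self):
        """Mark the current response as finished."""
        self.agent_speaking = False

    def on_user_speech_start(self):
        """Return True if an in-progress response was interrupted."""
        if not self.agent_speaking:
            return False
        self.queue.clear()           # discard the rest of the response
        self.agent_speaking = False
        self.cancelled += 1
        return True
```

A speech-start event that arrives while the agent is silent returns `False`, so ordinary turn-taking is not counted as an interruption.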

Tools in Voice Mode

The agent can use tools during voice conversations. When a tool is called:

1. The agent announces what it’s doing (e.g., “Let me check that…”)
2. The tool executes
3. The agent speaks the result

Tool calls may cause brief pauses in the conversation.

Voice Page Notes

The /voice page includes:

a live microphone visualizer
browser-side audio playback for assistant responses
a small hidden BSG-themed easter egg on the connect button and visualizer

These browser-only extras do not change the VoiceLLM runtime configuration.

Browser Requirements

Browser	Support
Chrome	Full support
Firefox	Full support
Safari	Full support (macOS/iOS)
Edge	Full support

Required permissions:

Microphone access
Audio playback (usually automatic)

Security

Voice sessions use the same authentication as HTTP chat
Only authenticated users can access /voice
Audio is relayed through GoClaw to xAI and is not stored by GoClaw
Session ends after idleTimeout seconds of inactivity
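
The idle timeout can be pictured as a timer that is reset on each activity and checked periodically. The sketch below is an assumption-laden illustration: the class `IdleTimer` is hypothetical, and what GoClaw counts as activity (for example incoming audio or detected speech) is not specified in this document.

```python
import time


class IdleTimer:
    """Tracks time since the last activity (illustrative, not GoClaw's implementation)."""

    def __init__(self, idle_timeout=300.0, clock=time.monotonic):
        self.idle_timeout = idle_timeout  # seconds, as in the idleTimeout setting
        self._clock = clock               # injectable so the timer can be tested
        self._last_activity = clock()

    def touch(self):
        """Record activity, restarting the idle countdown."""
        self._last_activity = self._clock()

    def expired(self):
        """Return True once idle_timeout seconds have passed without activity."""
        return self._clock() - self._last_activity >= self.idle_timeout
```

A monotonic clock is used so that system clock adjustments cannot end or extend a session.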
