Prerequisites before you begin:
- Go 1.25+: verify with go version. The default WebSocket build has no other system dependencies; no C compiler or CGO is needed.
- A provider API key: the fastest path is a Groq API key (GROQ_API_KEY), which has a free tier and covers STT, LLM, and TTS in a single account. Get one at console.groq.com.
Clone the repository
Terminal
The repository includes config.example.json, a web/ directory with a browser client, and a Makefile with common build targets.
Configure your providers
Copy the example config and open it for editing:
Terminal
Then set at minimum these four fields. The example below uses Groq for all three pipeline stages (STT, LLM, and TTS), which requires only one API key:
config.json
Environment variable alternative: if you prefer not to write API keys into config.json, export them as environment variables instead. Voxray resolves all config values from env vars automatically:
- macOS / Linux
- Windows (PowerShell)
Terminal
You can also point Voxray at a different config file path using the VOXRAY_CONFIG environment variable instead of the -config flag.
Build and run
- Using Make (recommended)
- Using go run (no build step)
- Manual build
- Windows (PowerShell)
Terminal
config.json:
Terminal
Command-line flags: -config, -transport (websocket, smallwebrtc, both, daily, twilio, telnyx, plivo, exotel), -port, -proxy (public hostname for telephony webhooks), -dialin (Daily PSTN). Use -init to scaffold config.json and directories, then exit.
Connect and speak
With the server running, you have two ways to connect:
Option A: browser client (easiest)
Open web/index.html directly in your browser (no server required for the HTML file itself):
Terminal
The page will prompt your browser for microphone access. Click Connect, then speak. You will hear the agent respond in real time.
Option B: raw WebSocket client
Connect any WebSocket client to the server's /ws endpoint.
Voxray exchanges JSON frames on this connection. Send audio as binary frames; the server sends back TranscriptionFrame and AudioFrame messages. You can also connect with ?format=protobuf for binary frame encoding or ?rtvi=1 for RTVI protocol compatibility.
Available endpoints once running:
| Endpoint | Description |
|---|---|
| GET /ws | WebSocket transport (upgrade) |
| GET /health | Liveness check; returns 200 OK |
| GET /ready | Readiness check |
| GET /metrics | Prometheus metrics scrape endpoint |
| GET /swagger/ | Swagger UI (when built with make swagger) |
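To confirm the server is up before opening a client, the liveness and readiness endpoints can be polled from a small Go program. This is a minimal sketch, not part of Voxray: checkEndpoints is a hypothetical helper, the base URL assumes a local server on port 3042, and treating any non-200 reply from /ready as "not ready" is an assumption, since only /health is documented as returning 200 OK.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// checkEndpoints sends a GET to each path under baseURL and returns a
// short status string per path: "ok", "unreachable", or "status NNN".
func checkEndpoints(baseURL string, paths ...string) map[string]string {
	client := &http.Client{Timeout: 2 * time.Second}
	results := make(map[string]string, len(paths))
	for _, path := range paths {
		resp, err := client.Get(baseURL + path)
		if err != nil {
			results[path] = "unreachable"
			continue
		}
		resp.Body.Close()
		if resp.StatusCode == http.StatusOK {
			results[path] = "ok"
		} else {
			results[path] = fmt.Sprintf("status %d", resp.StatusCode)
		}
	}
	return results
}

func main() {
	// Assumption: the server runs locally on port 3042.
	base := "http://localhost:3042"
	for path, status := range checkEndpoints(base, "/health", "/ready") {
		fmt.Printf("%s: %s\n", path, status)
	}
}
```

An "unreachable" result usually points at the host or port settings rather than at a provider problem.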
Troubleshooting
Port already in use — address already in use: bind 0.0.0.0:3042
Another process is already bound to port 3042. Change the port in config.json:
config.json
Or override it at startup without editing the file:
Terminal
To find what is using the port:
Terminal
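If you would rather check from Go whether the port can be bound before starting the server, a short probe is enough. This is an illustrative sketch only: portFree is a hypothetical helper, and the address 0.0.0.0:3042 is taken from the error message above. It does not tell you which process holds the port.

```go
package main

import (
	"fmt"
	"net"
)

// portFree reports whether a TCP listener can currently bind addr.
// It opens the listener and closes it again immediately.
func portFree(addr string) bool {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return false
	}
	ln.Close()
	return true
}

func main() {
	addr := "0.0.0.0:3042"
	if portFree(addr) {
		fmt.Println(addr, "is free")
	} else {
		fmt.Println(addr, "is in use or cannot be bound; pick another port")
	}
}
```

The check is inherently racy: another process can take the port between the probe and the server start, so treat the result as a hint.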
Missing API key — provider returned 401 or authentication error
Voxray will start successfully even if an API key is missing or wrong, but STT or LLM calls will fail at runtime when a voice session begins.
Set the key in config.json:
config.json
Or export it as an environment variable before starting the server. For Groq:
Terminal
For OpenAI use OPENAI_API_KEY, for ElevenLabs use ELEVENLABS_API_KEY, and so on. Provider env var names follow the pattern <PROVIDER_NAME>_API_KEY in uppercase.
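Because the server starts even when a key is missing, it can help to check the environment up front. The sketch below applies the <PROVIDER_NAME>_API_KEY naming pattern described above; apiKeyEnvName and missingKeys are hypothetical helpers, and the provider list in main is only an example. Provider names containing spaces or hyphens may map differently, which this sketch does not attempt to handle.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// apiKeyEnvName builds the env var name for a provider following the
// <PROVIDER_NAME>_API_KEY pattern, e.g. "groq" becomes "GROQ_API_KEY".
func apiKeyEnvName(provider string) string {
	return strings.ToUpper(strings.TrimSpace(provider)) + "_API_KEY"
}

// missingKeys returns the env var names that are unset or empty for
// the given providers.
func missingKeys(providers []string) []string {
	var missing []string
	for _, p := range providers {
		name := apiKeyEnvName(p)
		if os.Getenv(name) == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// Example provider list; adjust to the providers in your config.
	for _, name := range missingKeys([]string{"groq", "openai", "elevenlabs"}) {
		fmt.Println("not set:", name)
	}
}
```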
Connection refused — browser or client cannot reach the server
If you are connecting from a different machine or a container, or the browser is on a different network than the server process, note that setting host to localhost in config.json will only accept loopback connections.
Change host to bind on all interfaces:
config.json
If browsers on a different origin are connecting, also add the origin to cors_allowed_origins:
config.json
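The distinction between a loopback host and an all-interfaces host can be checked with Go's standard library. This sketch is illustrative and not taken from Voxray: loopbackOnly is a hypothetical helper that classifies a host value the way a TCP listener would treat it.

```go
package main

import (
	"fmt"
	"net"
)

// loopbackOnly reports whether binding to host would accept only
// connections from the same machine.
func loopbackOnly(host string) bool {
	if host == "localhost" {
		return true
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback()
}

func main() {
	for _, host := range []string{"localhost", "127.0.0.1", "::1", "0.0.0.0"} {
		fmt.Printf("%-10s loopback only: %t\n", host, loopbackOnly(host))
	}
}
```

A value of 0.0.0.0 is the unspecified address, so a listener bound to it accepts connections on every interface.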
No response from agent — I speak but nothing happens
The two most common causes are missing turn detection config and a mic volume that falls below the VAD threshold.
First, make sure turn_detection is set to "silence" and turn_stop_secs is at least 2.0:
config.json
Then verify:
- Speak for at least 1–2 seconds: the VAD needs a sustained speech segment before it triggers STT.
- Check your microphone: the browser must have microphone permission granted. Look for a camera/mic icon in the address bar.
- Lower the VAD volume threshold: if your microphone is quiet, reduce vad_min_volume to 0.15 or 0.10 in config.json.
- Check server logs: if audio is arriving you will see log lines with vad or stt. If you see no log activity after speaking, the audio is not reaching the server.
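The config checks above can be automated with a small Go program. This is a sketch under stated assumptions: it assumes turn_detection, turn_stop_secs, and vad_min_volume are top-level keys in config.json with string and numeric types, which the text here does not confirm, and checkTurnConfig is a hypothetical helper rather than anything Voxray ships.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// turnConfig holds only the fields discussed above. Key placement and
// JSON types are assumptions.
type turnConfig struct {
	TurnDetection string  `json:"turn_detection"`
	TurnStopSecs  float64 `json:"turn_stop_secs"`
	VADMinVolume  float64 `json:"vad_min_volume"`
}

// checkTurnConfig parses raw JSON and returns one warning per setting
// that differs from the troubleshooting advice.
func checkTurnConfig(raw []byte) ([]string, error) {
	var cfg turnConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, err
	}
	var warnings []string
	if cfg.TurnDetection != "silence" {
		warnings = append(warnings, `turn_detection is not "silence"`)
	}
	if cfg.TurnStopSecs < 2.0 {
		warnings = append(warnings, "turn_stop_secs is below 2.0")
	}
	if cfg.VADMinVolume > 0.15 {
		warnings = append(warnings, "vad_min_volume is above 0.15; a quiet mic may not trigger the VAD")
	}
	return warnings, nil
}

func main() {
	raw, err := os.ReadFile("config.json")
	if err != nil {
		fmt.Println("cannot read config.json:", err)
		os.Exit(1)
	}
	warnings, err := checkTurnConfig(raw)
	if err != nil {
		fmt.Println("config.json is not valid JSON:", err)
		os.Exit(1)
	}
	for _, w := range warnings {
		fmt.Println("warning:", w)
	}
}
```

Keys that are absent decode to zero values, so a missing turn_detection or turn_stop_secs is reported the same way as a wrong one.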
WebRTC-specific error — opus encoder unavailable (build without cgo)
Next Steps
Architecture
Understand the pipeline internals: runner, transport, processors, VAD, and how frames flow between stages.
Core Concepts
Config reference, provider matrix, turn detection modes, plugin system, and recording setup.
WebRTC Quickstart
Add WebRTC transport for browser-native audio with Opus encoding and lower latency.
Telephony
Connect Twilio, Telnyx, Plivo, or Exotel for inbound and outbound phone call agents.