Prerequisites

Before installing Voxray, confirm the following tools are available on your machine.
| Requirement | Command | Notes |
| --- | --- | --- |
| Go 1.25+ | go version | Required for all builds |
| Git | git --version | Required to clone the repository |
| C compiler (gcc or clang) | gcc --version | Required only for WebRTC builds with Opus audio |
The default WebSocket-only build has no C compiler dependency. You only need a C compiler if you plan to use WebRTC transport with TTS audio output via Opus encoding.

Install Go and System Dependencies

On macOS, install Go using Homebrew:
brew install go
Verify the installation:
go version
# Expected: go version go1.25.x darwin/arm64 (or amd64)
The CGO toolchain (Clang) ships with Xcode Command Line Tools. Install or confirm it is present:
xcode-select --install
If already installed, this command exits immediately. Verify:
clang --version
# Expected: Apple clang version 15.x.x

Clone the Repository

git clone https://github.com/voxray-ai/voxray-ai.git
cd voxray-ai
Download Go module dependencies:
go mod tidy

Build Variants

Voxray ships two primary build targets. Choose based on which transport you need.
| Command | CGO | Transport Support | Binary Size |
| --- | --- | --- | --- |
| make build | Disabled | WebSocket only | ~15 MB |
| make build-voice | Enabled | WebSocket + WebRTC (Opus) | ~20 MB |

Default build — WebSocket only

No C compiler required. CGO is explicitly disabled so the binary is fully static and portable.
make build
# Equivalent to: CGO_ENABLED=0 go build -o voxray ./cmd/voxray

Voice build — WebSocket and WebRTC with Opus

Requires gcc or clang on your PATH. Enables the Opus encoder so the server can deliver TTS audio over WebRTC peer connections.
Linux / macOS:
make build-voice
# Equivalent to: CGO_ENABLED=1 go build -o voxray ./cmd/voxray
Windows (PowerShell):
.\scripts\build-voice.ps1
Manual (any OS, from a POSIX-style shell such as bash or zsh):
CGO_ENABLED=1 go build -o voxray ./cmd/voxray
If you run a binary built without CGO and a client connects via WebRTC, the server returns HTTP 503 and logs opus encoder unavailable (build without cgo). Use make build-voice if WebRTC transport is required.

Docker

Voxray ships a production-ready multi-stage Dockerfile. The default Docker build produces a WebSocket-only binary (CGO disabled, static binary on Alpine).
Build the image:
docker build -t voxray .
Run with a config file mounted:
docker run -p 8080:8080 \
  -v "$(pwd)/config.json:/app/config.json" \
  voxray
Pass a config path via environment variable:
docker run -p 8080:8080 \
  -e VOXRAY_CONFIG=/app/config.json \
  -v "$(pwd)/config.json:/app/config.json" \
  voxray
The Docker image exposes port 8080 by default. The VOXRAY_CONFIG environment variable is pre-set to /app/config.json in the image. Mount your config at that path or override VOXRAY_CONFIG to point elsewhere.

Environment Setup

Create your config file

Copy the example config and open it in your editor:
cp config.example.json config.json

Four fields you must configure before first run

Every config file requires at minimum these four top-level settings before the server can start a voice pipeline:
| Field | Type | Description | Example |
| --- | --- | --- | --- |
| stt_provider | string | Speech-to-text provider name | "openai" |
| llm_provider | string | Language model provider name | "openai" |
| tts_provider | string | Text-to-speech provider name | "openai" |
| api_keys | object | Map of provider name to API key | {"openai": "sk-..."} |
A minimal working config looks like:
{
  "transport": "websocket",
  "host": "0.0.0.0",
  "port": 8080,

  "stt_provider": "openai",
  "stt_model": "gpt-4o-mini-transcribe",

  "llm_provider": "openai",
  "model": "gpt-4.1-mini",

  "tts_provider": "openai",
  "tts_voice": "alloy",

  "api_keys": {
    "openai": "YOUR_OPENAI_API_KEY"
  }
}
Replace YOUR_OPENAI_API_KEY with your actual key. API keys can also be supplied via environment variables (for example, OPENAI_API_KEY) — they do not need to be embedded in the config file.
Set "transport": "both" and add "webrtc_ice_servers": ["stun:stun.l.google.com:19302"] if you want both WebSocket and WebRTC active at the same time. See the WebRTC quickstart for the full configuration.

Verify the Installation

Start the server:
./voxray -config config.json
# Windows: .\voxray.exe -config config.json
You should see startup log lines similar to:
INFO  voxray starting  transport=websocket host=0.0.0.0 port=8080
INFO  pipeline ready   stt=openai llm=openai tts=openai
INFO  server listening addr=0.0.0.0:8080
Confirm the health endpoint responds:
curl -i http://localhost:8080/health
# Expected: a status line of HTTP/1.1 200 OK
Confirm the readiness endpoint:
curl -i http://localhost:8080/ready
# Expected: a status line of HTTP/1.1 200 OK

Next Steps

WebSocket Quickstart

Connect a client to the WebSocket endpoint and make your first voice call.

WebRTC Quickstart

Build with CGO and connect a browser client via WebRTC for real-time audio.

Configuration Reference

Explore all config fields: transports, providers, recording, transcripts, and more.

Supported Providers

See the full matrix of STT, LLM, and TTS providers and their config keys.