How telephony works

Voxray handles phone calls through a two-phase handshake: an HTTP webhook that returns provider-specific XML or JSON, followed by a long-lived media WebSocket that carries the actual audio.

Phone call arrives
      │
      ▼
POST /               ← provider webhook; Voxray responds with XML/JSON
      │                instructing provider to open a media stream
      ▼
GET /telephony/ws    ← provider opens WebSocket; audio flows both ways
      │
      ▼
Transport → Runner → Pipeline (VAD → STT → LLM → TTS → Sink)
      │
      ▼
TTS audio → /telephony/ws → provider → caller's earpiece

The full sequence for an inbound call:
  1. Set runner_transport to the provider name (e.g. "twilio").
  2. Set proxy_host to your public domain — no protocol prefix.
  3. Configure the provider’s phone number or SIP trunk to send its voice webhook as a POST request to https://{proxy_host}/.
  4. When a call arrives, the provider sends POST / to Voxray. Voxray responds with provider-specific XML (TwiML, TeXML, PHLO, etc.) that tells the provider to open a media WebSocket.
  5. The provider connects back to Voxray on wss://{proxy_host}/telephony/ws.
  6. Audio flows bidirectionally: the provider streams encoded audio frames in; Voxray’s pipeline transcribes them with STT, generates a response with LLM, synthesises speech with TTS, and sends audio frames back out.
  7. When the caller hangs up or the provider closes the WebSocket, the session tears down cleanly.
Each call runs in an isolated goroutine — one goroutine per active connection. A single Voxray process handles many concurrent calls.
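To make step 6 concrete for one provider: Twilio's Media Streams protocol sends JSON text messages over the WebSocket, and its media events carry base64-encoded μ-law audio. The sketch below decodes one such message. The helper name extract_audio, the example streamSid, and the 20 ms frame size are illustrative assumptions; this is not Voxray's internal serializer, and other providers use different message shapes.

```python
import base64
import json
from typing import Optional


def extract_audio(message: str) -> Optional[bytes]:
    """Return raw mu-law bytes from a Twilio Media Streams message, or None.

    Hypothetical helper for illustration only.
    """
    event = json.loads(message)
    if event.get("event") != "media":
        # Events such as "connected", "start", and "stop" carry no audio.
        return None
    return base64.b64decode(event["media"]["payload"])


# Assumed frame size: 20 ms at 8 kHz mu-law is 160 bytes (8000 * 0.020).
frame = bytes([0xFF]) * 160
msg = json.dumps({
    "event": "media",
    "streamSid": "MZ-example",  # placeholder value
    "media": {"payload": base64.b64encode(frame).decode("ascii")},
})
audio = extract_audio(msg)
```

Non-audio events return None, so a caller can skip them without special-casing each event type.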

Supported transports

| Transport | Provider | runner_transport value | Audio format | Notes |
|---|---|---|---|---|
| Twilio | Twilio | "twilio" | G.711 μ-law 8 kHz | Most widely deployed; TwiML webhook response |
| Telnyx | Telnyx | "telnyx" | G.711 μ-law 8 kHz | TeXML webhook; SIP trunking supported |
| Plivo | Plivo | "plivo" | G.711 μ-law 8 kHz | XML webhook; requires PLIVO_AUTH_ID / PLIVO_AUTH_TOKEN |
| Exotel | Exotel | "exotel" | G.711 μ-law 8 kHz | Indian telecom provider; XML webhook |
| Daily.co | Daily | "daily" | Opus 48 kHz | Room-based; optional PSTN dial-in via POST /daily-dialin-webhook |
| LiveKit | LiveKit | "livekit" | Opus 48 kHz | Room-based; requires LIVEKIT_* env vars |

Twilio, Telnyx, Plivo, and Exotel all use G.711 μ-law at 8 kHz on the wire. Voxray automatically upsamples incoming audio to 16 kHz for STT, then downsamples TTS output and re-encodes it as 8 kHz μ-law before sending it back. You do not need to configure codecs or sample rates.

Base configuration

The minimum config.json to handle inbound telephony calls:
{
  "runner_transport": "twilio",
  "proxy_host": "your-domain.com",
  "port": 8080,
  "stt_provider": "groq",
  "llm_provider": "groq",
  "tts_provider": "groq",
  "model": "llama-3.1-8b-instant"
}
proxy_host must be a bare hostname — no https://, no trailing slash. Correct: "your-domain.com". Wrong: "https://your-domain.com". Voxray builds webhook URLs from this value; an embedded protocol prefix will produce malformed XML and the provider will reject it.
runner_transport and proxy_host are the two fields that activate telephony mode. Everything else follows the same provider and model configuration as non-telephony deployments.
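A pre-flight check can catch a malformed proxy_host before a provider does. The sketch below accepts only bare hostnames and derives the two URLs described on this page. The function names are hypothetical, and rejecting an explicit port is an assumption based on the "bare hostname" wording; Voxray's own validation, if any, is not documented here.

```python
import re
from typing import Tuple

# One DNS label: alphanumeric at both ends, hyphens allowed inside, max 63 chars.
_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
_HOSTNAME = re.compile(rf"{_LABEL}(?:\.{_LABEL})*")


def is_bare_hostname(value: str) -> bool:
    """True if value has no scheme, path, port, or trailing slash."""
    return len(value) <= 253 and _HOSTNAME.fullmatch(value) is not None


def telephony_urls(proxy_host: str) -> Tuple[str, str]:
    """Return (webhook URL, media WebSocket URL) for a valid proxy_host."""
    if not is_bare_hostname(proxy_host):
        raise ValueError(f"proxy_host must be a bare hostname, got {proxy_host!r}")
    return f"https://{proxy_host}/", f"wss://{proxy_host}/telephony/ws"


webhook_url, media_url = telephony_urls("your-domain.com")
```

Running a check like this in CI or at deploy time surfaces the mistake locally rather than as a rejected webhook response.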

Provider-specific setup

Twilio uses TwiML. When a call arrives, Voxray responds with a <Connect><Stream> directive that tells Twilio to open a media WebSocket to /telephony/ws.
Twilio console setup:
  1. Buy or configure a Twilio phone number.
  2. Under “Voice Configuration”, set the webhook to https://{proxy_host}/ with HTTP method POST.
  3. No additional credentials are required in Voxray. Twilio signs requests with X-Twilio-Signature, but the webhook endpoint is intentionally unauthenticated (providers do not always send auth headers on media connections).
config.json:
{
  "runner_transport": "twilio",
  "proxy_host": "bot.example.com",
  "port": 8080,
  "stt_provider": "groq",
  "llm_provider": "openai",
  "tts_provider": "openai",
  "model": "gpt-4o-mini"
}
No DAILY_API_KEY is needed for Twilio. The only external secrets required are your STT, LLM, and TTS provider API keys.
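For orientation, the <Connect><Stream> response described above has roughly the following shape. This sketch builds it with Python's standard library. The function name build_twiml is hypothetical, and the exact XML Voxray emits (extra attributes, custom parameters) is not reproduced in this document, so treat the output as an approximation.

```python
import xml.etree.ElementTree as ET


def build_twiml(proxy_host: str) -> str:
    """Build a minimal TwiML document that opens a bidirectional media stream.

    Illustrative only; not Voxray's actual response.
    """
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", url=f"wss://{proxy_host}/telephony/ws")
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>' + body


twiml = build_twiml("bot.example.com")
```

The url attribute is built directly from proxy_host, which is why that value must not carry a protocol prefix.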

Audio processing pipeline

Voxray handles codec conversion and resampling automatically. You do not need to configure audio formats. For G.711 providers (Twilio, Telnyx, Plivo, Exotel):
Provider → G.711 μ-law 8 kHz → [Voxray upsample to 16 kHz] → STT
                                                              ↓
                                                             LLM
                                                              ↓
Provider ← G.711 μ-law 8 kHz ← [Voxray downsample + encode] ← TTS
For Opus providers (Daily.co, LiveKit):
Provider → Opus 48 kHz → [Voxray decode] → STT
                                            ↓
                                           LLM
                                            ↓
Provider ← Opus 48 kHz ← [Voxray encode] ← TTS
STT and TTS operate at their native sample rates regardless of what the provider delivers on the wire. The pipeline processors (VAD, STT, LLM, TTS, Sink) are identical across all telephony transports; only the transport-layer serializer differs.
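To show what the inbound G.711 conversion involves, the sketch below decodes μ-law bytes to 16-bit linear PCM using the standard G.711 expansion, then doubles the sample rate by linear interpolation. The function names are hypothetical, and the interpolation is a deliberately simple stand-in: Voxray's actual resampler is not described here, and a production resampler would normally apply a proper low-pass filter.

```python
from typing import List


def ulaw_decode(byte: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit linear PCM sample."""
    u = ~byte & 0xFF               # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def upsample_2x(samples: List[int]) -> List[int]:
    """Double the sample rate (e.g. 8 kHz to 16 kHz) by linear interpolation."""
    out: List[int] = []
    for i, sample in enumerate(samples):
        # The last sample has no successor, so it is repeated.
        nxt = samples[i + 1] if i + 1 < len(samples) else sample
        out.append(sample)
        out.append((sample + nxt) // 2)
    return out


pcm_8k = [ulaw_decode(b) for b in bytes([0xFF, 0x80, 0x00])]
pcm_16k = upsample_2x(pcm_8k)
```

The three input bytes decode to silence, the largest positive sample, and the largest negative sample, which makes the sign and range handling easy to check by hand.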

Production deployment

TLS is required

Twilio, Telnyx, Plivo, and Exotel all require HTTPS webhooks. Your Voxray server must be reachable over TLS on a public domain before you can configure a phone number to point at it. Options:
  • Reverse proxy: Run Nginx or Caddy in front of Voxray. Set port to an internal port (e.g. 8080) and terminate TLS at the proxy.
  • Built-in TLS: Enable Voxray’s TLS listener directly:
    {
      "tls_enable": true,
      "tls_cert_file": "/etc/letsencrypt/live/bot.example.com/fullchain.pem",
      "tls_key_file": "/etc/letsencrypt/live/bot.example.com/privkey.pem"
    }
    
    Or via environment variables: VOXRAY_TLS_ENABLE=true, VOXRAY_TLS_CERT_FILE, VOXRAY_TLS_KEY_FILE.

API key security

The telephony webhook (POST /) is intentionally unauthenticated — providers do not consistently forward auth headers across both the webhook and the subsequent media WebSocket. However, you should still protect the rest of the server:
{
  "server_api_key": "your-strong-random-key"
}
When set, server_api_key enforces Authorization: Bearer <key> or X-API-Key: <key> on /start, /sessions/*, /webrtc/offer, and /ws. Set it via VOXRAY_SERVER_API_KEY to keep it out of the config file.
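The two accepted header forms can be illustrated with a small check. This is a sketch of the behaviour described above, not Voxray's implementation; the function name is_authorized and the case-insensitive header lookup are assumptions.

```python
import hmac
from typing import List, Mapping


def is_authorized(headers: Mapping[str, str], api_key: str) -> bool:
    """Accept 'Authorization: Bearer <key>' or 'X-API-Key: <key>'."""
    lowered = {name.lower(): value for name, value in headers.items()}
    candidates: List[str] = []
    auth = lowered.get("authorization", "")
    if auth.startswith("Bearer "):
        candidates.append(auth[len("Bearer "):])
    if "x-api-key" in lowered:
        candidates.append(lowered["x-api-key"])
    # compare_digest avoids leaking key contents through timing differences.
    return any(
        hmac.compare_digest(candidate.encode(), api_key.encode())
        for candidate in candidates
    )


ok_bearer = is_authorized({"Authorization": "Bearer s3cret"}, "s3cret")
ok_header = is_authorized({"X-API-Key": "s3cret"}, "s3cret")
rejected = is_authorized({"Authorization": "Bearer wrong"}, "s3cret")
```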
For additional webhook security with Twilio, validate the X-Twilio-Signature header in a reverse proxy or middleware before requests reach Voxray. Twilio’s signature uses your auth token and the full webhook URL.
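A middleware check along these lines could implement that validation. Twilio's documented scheme is: take the full webhook URL, append each POST parameter name and value in name-sorted order with no delimiters, sign the result with HMAC-SHA1 using the auth token, and base64-encode the digest. The function names and sample values below are hypothetical, and the sketch covers form-encoded POST webhooks only.

```python
import base64
import hashlib
import hmac
from typing import Mapping


def twilio_signature(auth_token: str, url: str, params: Mapping[str, str]) -> str:
    """Compute the expected X-Twilio-Signature for a form-encoded POST."""
    data = url + "".join(name + params[name] for name in sorted(params))
    digest = hmac.new(auth_token.encode(), data.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_twilio_request(auth_token: str, url: str,
                            params: Mapping[str, str], header_value: str) -> bool:
    """Constant-time comparison of the received and expected signatures."""
    expected = twilio_signature(auth_token, url, params)
    return hmac.compare_digest(expected, header_value)


# Placeholder values for illustration.
token = "example-auth-token"
url = "https://bot.example.com/"
params = {"CallSid": "CA123", "From": "+15550100", "To": "+15550199"}
signature = twilio_signature(token, url, params)
valid = is_valid_twilio_request(token, url, params, signature)
tampered = is_valid_twilio_request(token, url, {**params, "From": "+15550111"}, signature)
```

The URL passed in must match the one configured in the Twilio console exactly, including scheme and trailing slash, or the signatures will not match.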

Monitoring

Voxray exposes Prometheus metrics at /metrics. For telephony deployments, watch these signals:
| Metric | What to watch for |
|---|---|
| STT error rate | High error rates indicate codec issues or upstream STT quota exhaustion |
| LLM latency (p95) | Spikes cause long silences between caller turn and bot response |
| Active connections | Tracks concurrent call load; plan capacity accordingly |
| Pipeline queue depth | When pipeline_input_queue_cap is reached, audio frames block — increase capacity or scale out |

{
  "metrics_enabled": true,
  "pipeline_input_queue_cap": 512
}
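The queue-depth signal is easier to reason about with a toy model of a bounded queue. The sketch below uses Python's queue.Queue as a stand-in. How Voxray implements its input queue is an assumption here, and try_enqueue is a hypothetical helper that reports a full queue instead of blocking.

```python
import queue


def try_enqueue(frames: "queue.Queue[bytes]", frame: bytes) -> bool:
    """Add a frame if there is room; return False when the queue is at capacity."""
    try:
        frames.put_nowait(frame)
        return True
    except queue.Full:
        # A blocking put() would stall the producer here until a consumer
        # drains the queue, which is the backpressure the metric reveals.
        return False


frames: "queue.Queue[bytes]" = queue.Queue(maxsize=2)  # tiny cap for illustration
results = [try_enqueue(frames, b"\xff" * 160) for _ in range(3)]
```

With a cap of 2, the third frame is refused. A sustained run of refusals (or blocked producers) means the pipeline is consuming audio more slowly than the provider delivers it.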

Horizontal scaling

For multiple concurrent calls across multiple instances, use Redis as the session store so all instances share session state:
{
  "session_store": "redis",
  "redis_url": "redis://redis.internal:6379/0",
  "session_ttl_secs": 3600
}
Put a TCP load balancer (not HTTP round-robin) in front of your Voxray instances so that the provider’s WebSocket connection to /telephony/ws is sticky to the instance that handled the initial POST / webhook for that call.

Local development with ngrok

To test telephony locally before deploying:
# Start Voxray
./voxray --config config.json

# In another terminal, expose port 8080
ngrok http 8080
ngrok prints a public URL like https://abc123.ngrok.io. Set that as your proxy_host (without https://):
{
  "runner_transport": "twilio",
  "proxy_host": "abc123.ngrok.io",
  "port": 8080
}
Then point your Twilio (or other provider) phone number webhook at https://abc123.ngrok.io/. Incoming test calls will route through ngrok to your local Voxray process.
The ngrok free tier generates a new URL on each restart, so update proxy_host and your provider webhook URL each time. ngrok paid plans support reserved domains.
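Because the URL changes often, it can help to derive proxy_host from whatever ngrok prints rather than editing it by hand. A minimal sketch using the standard library follows; the function name proxy_host_from_url is hypothetical.

```python
from urllib.parse import urlsplit


def proxy_host_from_url(url: str) -> str:
    """Strip scheme, port, and path from a public URL, leaving the bare hostname."""
    # urlsplit only recognises a host when a scheme or leading '//' is present.
    parts = urlsplit(url if "://" in url else "//" + url)
    if not parts.hostname:
        raise ValueError(f"no hostname found in {url!r}")
    return parts.hostname


host = proxy_host_from_url("https://abc123.ngrok.io/")
```

The result can be written straight into the proxy_host field of config.json before restarting Voxray.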