How telephony works
Voxray handles phone calls through a two-phase handshake: an HTTP webhook that returns provider-specific XML or JSON, followed by a long-lived media WebSocket that carries the actual audio.
- Set runner_transport to the provider name (e.g. "twilio").
- Set proxy_host to your public domain, with no protocol prefix.
- Configure the provider's phone number or SIP trunk to send POST / to https://{proxy_host}/.
- When a call arrives, the provider sends POST / to Voxray. Voxray responds with provider-specific XML (TwiML, TeXML, PHLO, etc.) that tells the provider to open a media WebSocket.
- The provider dials back on wss://{proxy_host}/telephony/ws.
- Audio flows bidirectionally: the provider streams encoded audio frames in; Voxray's pipeline transcribes them with STT, generates a response with an LLM, synthesises speech with TTS, and sends audio frames back out.
- When the caller hangs up or the provider closes the WebSocket, the session tears down cleanly.
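For the Twilio case, the webhook step in the flow above can be sketched in Python. This is an illustration, not Voxray's actual handler: it uses Twilio's documented <Connect><Stream url="..."> TwiML verbs, and the function names build_media_ws_url and build_twiml_response are hypothetical. It also shows how a bare proxy_host becomes the wss:// media URL the provider dials back on.

```python
from xml.etree import ElementTree as ET


def build_media_ws_url(proxy_host: str) -> str:
    """Derive the media WebSocket URL the provider dials back on."""
    # proxy_host must be a bare domain such as "voice.example.com" (no scheme).
    if "://" in proxy_host:
        raise ValueError("proxy_host must not include a protocol prefix")
    return f"wss://{proxy_host.rstrip('/')}/telephony/ws"


def build_twiml_response(proxy_host: str) -> str:
    """Return a TwiML document asking Twilio to stream call audio back."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", url=build_media_ws_url(proxy_host))
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    print(build_twiml_response("voice.example.com"))
```

The proxy_host check is why the setting must omit https://: the handler prepends the scheme itself when it builds the callback URL.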
Supported transports
| Transport | Provider | runner_transport value | Audio format | Notes |
|---|---|---|---|---|
| Twilio | Twilio | "twilio" | G.711 μ-law 8 kHz | Most widely deployed; TwiML webhook response |
| Telnyx | Telnyx | "telnyx" | G.711 μ-law 8 kHz | TeXML webhook; SIP trunking supported |
| Plivo | Plivo | "plivo" | G.711 μ-law 8 kHz | XML webhook; requires PLIVO_AUTH_ID / PLIVO_AUTH_TOKEN |
| Exotel | Exotel | "exotel" | G.711 μ-law 8 kHz | Indian telecom provider; XML webhook |
| Daily.co | Daily | "daily" | Opus 48 kHz | Room-based; optional PSTN dial-in via POST /daily-dialin-webhook |
| LiveKit | LiveKit | "livekit" | Opus 48 kHz | Room-based; requires LIVEKIT_* env vars |
Twilio, Telnyx, Plivo, and Exotel all use G.711 μ-law at 8 kHz on the wire. Voxray automatically upsamples this audio to 16 kHz for STT, then downsamples TTS output and re-encodes it as 8 kHz μ-law before sending it back. You do not need to configure codecs or sample rates.
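The conversion described above can be sketched in plain Python. This is a simplified illustration, not Voxray's implementation: it uses the standard G.711 μ-law formulas, naive linear-interpolation upsampling, and plain decimation without an anti-aliasing filter (a production resampler would filter). All function names are hypothetical.

```python
BIAS = 0x84   # 132, the standard G.711 mu-law bias
CLIP = 32635  # largest magnitude encoded before clipping


def ulaw_decode(byte: int) -> int:
    """Decode one 8-bit mu-law byte to a 16-bit linear PCM sample."""
    byte = ~byte & 0xFF  # mu-law bytes are transmitted bit-inverted
    sign = byte & 0x80
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


def ulaw_encode(sample: int) -> int:
    """Encode one 16-bit linear PCM sample to an 8-bit mu-law byte."""
    sign = 0x80 if sample < 0 else 0
    magnitude = min(abs(sample), CLIP) + BIAS
    # The segment (exponent) is given by the highest set bit above bit 7.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def upsample_2x(samples: list[int]) -> list[int]:
    """8 kHz -> 16 kHz by linear interpolation (no filtering)."""
    out: list[int] = []
    for i, s in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else s
        out.extend((s, (s + nxt) // 2))
    return out


def downsample_2x(samples: list[int]) -> list[int]:
    """16 kHz -> 8 kHz by keeping every other sample (no anti-aliasing)."""
    return samples[::2]


# Inbound: provider mu-law bytes -> 16 kHz linear PCM for STT.
inbound = bytes([0xFF, 0x80, 0x00])
pcm_16k = upsample_2x([ulaw_decode(b) for b in inbound])

# Outbound: 16 kHz PCM (standing in for TTS output) -> 8 kHz mu-law bytes.
outbound = bytes(ulaw_encode(s) for s in downsample_2x(pcm_16k))
```

Because the upsampler keeps the original samples at even indices, decimating its output recovers them exactly, so this toy round trip returns the inbound bytes unchanged.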
Base configuration
The minimum config.json to handle inbound telephony calls needs only two telephony-specific fields.
runner_transport and proxy_host are the two fields that activate telephony mode. Everything else follows the same provider and model configuration as non-telephony deployments.
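A minimal sketch of producing the telephony part of config.json from Python. Only runner_transport and proxy_host come from this page; the provider, model, and API key settings a real deployment also needs are deliberately omitted, and the helper name telephony_config is hypothetical. The validation mirrors the rules stated above: a runner_transport value from the table and a proxy_host with no protocol prefix.

```python
import json

# runner_transport values listed in the "Supported transports" table.
SUPPORTED_TRANSPORTS = {"twilio", "telnyx", "plivo", "exotel", "daily", "livekit"}


def telephony_config(runner_transport: str, proxy_host: str) -> dict:
    """Return the two fields that switch Voxray into telephony mode."""
    if runner_transport not in SUPPORTED_TRANSPORTS:
        raise ValueError(f"unknown runner_transport: {runner_transport!r}")
    if "://" in proxy_host or proxy_host.endswith("/"):
        raise ValueError("proxy_host must be a bare domain, e.g. voice.example.com")
    return {"runner_transport": runner_transport, "proxy_host": proxy_host}


if __name__ == "__main__":
    config = telephony_config("twilio", "voice.example.com")
    # Merge these keys into your existing config.json (STT/LLM/TTS settings not shown).
    print(json.dumps(config, indent=2))
```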
Provider-specific setup
Twilio
Twilio uses TwiML. When a call arrives, Voxray responds with a <Connect><Stream> directive that tells Twilio to open a media WebSocket to /telephony/ws.
Twilio console setup:
- Buy or configure a Twilio phone number.
- Under "Voice Configuration", set the webhook to https://{proxy_host}/ with HTTP method POST.
- No additional credentials are required in Voxray. Twilio signs requests with X-Twilio-Signature, but the webhook endpoint is intentionally unauthenticated by design (providers do not always send auth headers on media connections).
No DAILY_API_KEY is needed for Twilio. The only external secrets required are your STT, LLM, and TTS provider API keys.
Audio processing pipeline
Voxray handles codec conversion and resampling automatically. You do not need to configure audio formats. For G.711 providers (Twilio, Telnyx, Plivo, Exotel):
μ-law 8 kHz from provider → upsample to 16 kHz → STT → LLM → TTS → downsample and re-encode to μ-law 8 kHz → provider
Production deployment
TLS is required
Twilio, Telnyx, Plivo, and Exotel all require HTTPS webhooks. Your Voxray server must be reachable over TLS on a public domain before you can configure a phone number to point at it. Options:
- Reverse proxy: Run Nginx or Caddy in front of Voxray. Set port to an internal port (e.g. 8080) and terminate TLS at the proxy.
- Built-in TLS: Enable Voxray's TLS listener directly in the config file, or via the environment variables VOXRAY_TLS_ENABLE=true, VOXRAY_TLS_CERT_FILE, and VOXRAY_TLS_KEY_FILE.
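The environment-variable route can be sketched as follows. The variable names come from this page; treating only the string "true" (case-insensitive) as enabled, and loading the certificate into Python's ssl.SSLContext, are assumptions for illustration rather than a description of how Voxray parses them. The helper names tls_settings and server_ssl_context are hypothetical.

```python
import os
import ssl
from typing import Mapping, Optional


def tls_settings(env: Mapping[str, str]) -> Optional[tuple[str, str]]:
    """Return (cert_file, key_file) if built-in TLS is enabled, else None."""
    if env.get("VOXRAY_TLS_ENABLE", "").strip().lower() != "true":
        return None  # TLS off: a reverse proxy is expected to terminate TLS
    cert = env.get("VOXRAY_TLS_CERT_FILE")
    key = env.get("VOXRAY_TLS_KEY_FILE")
    if not cert or not key:
        raise ValueError("VOXRAY_TLS_CERT_FILE and VOXRAY_TLS_KEY_FILE are required")
    return cert, key


def server_ssl_context(cert_file: str, key_file: str) -> ssl.SSLContext:
    """Build a server-side TLS context from PEM certificate and key files."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


if __name__ == "__main__":
    settings = tls_settings(os.environ)
    if settings:
        server_ssl_context(*settings)
        print("built-in TLS enabled")
    else:
        print("built-in TLS disabled; terminate TLS at a reverse proxy")
```

Failing fast when the flag is set but a file path is missing avoids a server that silently starts without TLS, which the providers above would reject anyway.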
API key security
The telephony webhook (POST /) is intentionally unauthenticated — providers do not consistently forward auth headers across both the webhook and the subsequent media WebSocket. However, you should still protect the rest of the server:
server_api_key enforces Authorization: Bearer <key> or X-API-Key: <key> on /start, /sessions/*, /webrtc/offer, and /ws. Set it via VOXRAY_SERVER_API_KEY to keep it out of the config file.
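To make the protected/unprotected split concrete, here is a sketch of the rule as described above: requests to /start, /sessions/*, /webrtc/offer, and /ws need the key in either header, while paths outside that list, such as the POST / webhook and /telephony/ws, stay open. It is a model for reasoning about the behaviour, not Voxray's middleware; the function names are hypothetical, and matching header names case-insensitively follows HTTP semantics.

```python
import hmac
from typing import Mapping

PROTECTED_EXACT = {"/start", "/webrtc/offer", "/ws"}
PROTECTED_PREFIXES = ("/sessions/",)


def requires_api_key(path: str) -> bool:
    """True for the paths this page lists as protected by server_api_key."""
    return path in PROTECTED_EXACT or path.startswith(PROTECTED_PREFIXES)


def is_authorized(path: str, headers: Mapping[str, str], server_api_key: str) -> bool:
    """Return True if the request may proceed under this model."""
    if not requires_api_key(path):
        return True  # e.g. POST / and /telephony/ws stay open for providers
    lowered = {name.lower(): value for name, value in headers.items()}
    supplied = lowered.get("x-api-key")
    auth = lowered.get("authorization", "")
    if supplied is None and auth.startswith("Bearer "):
        supplied = auth[len("Bearer "):]
    if supplied is None:
        return False
    # Constant-time comparison avoids leaking the key through timing.
    return hmac.compare_digest(supplied.encode(), server_api_key.encode())
```

Clients calling the protected endpoints would send either Authorization: Bearer <key> or X-API-Key: <key>, with the key taken from VOXRAY_SERVER_API_KEY.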
Monitoring
Voxray exposes Prometheus metrics at /metrics. For telephony deployments, watch these signals:
| Metric | What to watch for |
|---|---|
| STT error rate | High error rates indicate codec issues or upstream STT quota exhaustion |
| LLM latency (p95) | Spikes cause long silences between caller turn and bot response |
| Active connections | Tracks concurrent call load; plan capacity accordingly |
| Pipeline queue depth | When pipeline_input_queue_cap is reached, audio frames block — increase capacity or scale out |
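The queue-depth row describes ordinary bounded-queue backpressure. The sketch below illustrates that behaviour with Python's queue.Queue; it is not Voxray's pipeline code, and PIPELINE_INPUT_QUEUE_CAP is a hypothetical stand-in for the pipeline_input_queue_cap setting with an arbitrary value.

```python
import queue

PIPELINE_INPUT_QUEUE_CAP = 4  # hypothetical stand-in for pipeline_input_queue_cap


def offer_frame(frames: queue.Queue, frame: bytes, timeout: float = 0.02) -> bool:
    """Try to enqueue an audio frame; return False if the queue stayed full."""
    try:
        frames.put(frame, timeout=timeout)  # blocks while the queue is at capacity
        return True
    except queue.Full:
        return False


frames: queue.Queue = queue.Queue(maxsize=PIPELINE_INPUT_QUEUE_CAP)
frame = b"\xff" * 160  # 160 bytes = 20 ms of 8 kHz mu-law audio
results = [offer_frame(frames, frame) for _ in range(PIPELINE_INPUT_QUEUE_CAP + 1)]
# Nothing consumes the queue here, so the final put times out.
print(results, "depth =", frames.qsize())
```

A sustained depth near the cap means the consumer (STT, LLM, TTS) is slower than real-time audio arrives, which is why the table suggests raising capacity or scaling out.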
Horizontal scaling
To run concurrent calls across multiple instances, use Redis as the session store so all instances share session state. Make sure /telephony/ws is sticky to the instance that handled the initial POST / webhook for that call.
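One generic way to get that stickiness, shown only as a sketch, is to route both requests for a call by the same per-call key using rendezvous (highest-random-weight) hashing. This assumes your load balancer can extract such a key, hypothetically called call_id here, from both the POST / webhook and the /telephony/ws upgrade; whether and how that key is available depends on your provider and is not covered on this page. The instance names are placeholders.

```python
import hashlib
from typing import Sequence


def pick_instance(call_id: str, instances: Sequence[str]) -> str:
    """Rendezvous hashing: every router picks the same instance for a call_id."""
    if not instances:
        raise ValueError("no instances available")

    def weight(instance: str) -> int:
        digest = hashlib.sha256(f"{instance}|{call_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(instances, key=weight)


instances = ["voxray-a:8080", "voxray-b:8080", "voxray-c:8080"]
webhook_target = pick_instance("call-123", instances)  # POST / webhook
media_target = pick_instance("call-123", instances)    # /telephony/ws upgrade
assert webhook_target == media_target
```

The choice depends only on the key and the instance set, not on list order or router state, and removing one instance only remaps the calls that were assigned to it.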
Local development with ngrok
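To test telephony locally before deploying, expose your local Voxray port with ngrok (for example, ngrok http 8080 if Voxray listens on port 8080). ngrok prints a public forwarding URL such as https://abc123.ngrok.io. Set that as your proxy_host without the https:// prefix (abc123.ngrok.io), then point your provider's webhook at https://abc123.ngrok.io/. Incoming test calls will route through ngrok to your local Voxray process.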
ngrok free tier generates a new URL on each restart. Update
proxy_host and your provider webhook URL each time. ngrok paid plans support reserved domains.
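Because the free-tier URL changes on every restart, a small helper can read the current public URL from the ngrok agent's local API (by default at http://127.0.0.1:4040/api/tunnels) and print the matching proxy_host. This is a convenience sketch: the helper names are hypothetical, and it assumes the agent is running with its local API on the default address.

```python
import json
import urllib.request
from urllib.parse import urlparse

NGROK_API = "http://127.0.0.1:4040/api/tunnels"  # ngrok agent's local API (default address)


def proxy_host_from_tunnels(payload: dict) -> str:
    """Pick the first HTTPS tunnel and return its host without the scheme."""
    for tunnel in payload.get("tunnels", []):
        url = tunnel.get("public_url", "")
        if url.startswith("https://"):
            return urlparse(url).netloc
    raise LookupError("no HTTPS tunnel found; is ngrok running?")


def current_proxy_host() -> str:
    """Query the running ngrok agent and return the value for proxy_host."""
    with urllib.request.urlopen(NGROK_API, timeout=2) as resp:
        return proxy_host_from_tunnels(json.load(resp))
```

While ngrok is running, print(current_proxy_host()) gives the value for proxy_host, and https://<that host>/ is the webhook URL to paste into the provider console.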