What you’ll build
A Voxray server that accepts inbound phone calls routed through Twilio. When someone dials your Twilio number:
- Twilio receives the PSTN call and sends a POST webhook to your server’s / endpoint
- Voxray responds with TwiML XML that points Twilio to /telephony/ws
- Twilio opens a WebSocket to /telephony/ws for bidirectional audio
- Audio flows through the STT → LLM → TTS pipeline in real time
- Voxray streams synthesised speech back; Twilio plays it to the caller
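To make the TwiML step of this flow concrete, here is a minimal Python sketch of the kind of response a server could return to point Twilio at a WebSocket. The `<Connect><Stream>` verb is Twilio's documented way to start a bidirectional media stream. The function name `build_twiml` is hypothetical, and the exact XML Voxray emits is not shown in this tutorial, so treat this as an illustration rather than Voxray's actual output.

```python
from xml.sax.saxutils import quoteattr


def build_twiml(proxy_host: str, ws_path: str = "/telephony/ws") -> str:
    """Return TwiML asking Twilio to open a bidirectional media stream.

    Hypothetical helper for illustration only; not Voxray's real code.
    """
    # Twilio media streams require a secure WebSocket URL (wss://).
    stream_url = f"wss://{proxy_host}{ws_path}"
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<Response><Connect>"
        f"<Stream url={quoteattr(stream_url)}/>"  # quoteattr adds the quotes
        "</Connect></Response>"
    )


print(build_twiml("abc123.ngrok.io"))
```

The hostname passed in corresponds to the `proxy_host` setting described later, which is why that value must be reachable from Twilio.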
Prerequisites
- Voxray binary built or downloaded
- A Twilio account with an active phone number and Voice capabilities enabled
- An AI provider account with API keys (this tutorial uses Groq — fast, free tier available)
- ngrok for local tunnelling (production deployments use a real public domain)
Steps
ngrok creates a public HTTPS tunnel to your local machine so Twilio’s webhook can reach Voxray during development.
macOS (Homebrew)
Linux
Manual download
Download the binary for your OS from ngrok.com/download, unzip it, and place ngrok on your PATH.
Your authtoken is at dashboard.ngrok.com/get-started/your-authtoken.
Note your HTTPS forwarding URL (e.g. https://abc123.ngrok.io). You will use the hostname (abc123.ngrok.io) in the next step and the full URL when configuring Twilio.
ngrok free-tier URLs change every time you restart ngrok. If you restart ngrok, you must update the Twilio webhook URL to match the new hostname and also update proxy_host in your config.
Create config.json in your working directory. Replace abc123.ngrok.io with your actual ngrok hostname:
{
"host": "localhost",
"port": 8080,
"runner_transport": "twilio",
"proxy_host": "abc123.ngrok.io",
"stt_provider": "groq",
"llm_provider": "groq",
"tts_provider": "groq",
"model": "llama-3.1-8b-instant",
"api_keys": {
"groq": "gsk_..."
}
}
| Field | Value | Description |
|---|---|---|
| runner_transport | "twilio" | Registers the Twilio routes (POST / and GET /telephony/ws) and selects the Twilio serializer |
| proxy_host | "abc123.ngrok.io" | Public hostname that Twilio connects to; no https://, bare hostname only |
| stt_provider | "groq" | Speech-to-text provider |
| llm_provider | "groq" | LLM provider |
| tts_provider | "groq" | Text-to-speech provider |
| model | "llama-3.1-8b-instant" | LLM model name |

proxy_host must be the bare hostname with no protocol prefix and no trailing slash. Correct: abc123.ngrok.io. Incorrect: https://abc123.ngrok.io or abc123.ngrok.io/.
You can use any STT/LLM/TTS provider combination. Groq is shown here because it has a generous free tier and very low latency. To use OpenAI instead:
{
"runner_transport": "twilio",
"proxy_host": "abc123.ngrok.io",
"stt_provider": "openai",
"llm_provider": "openai",
"tts_provider": "openai",
"model": "gpt-4o-mini",
"stt_model": "gpt-4o-mini-transcribe",
"tts_voice": "nova",
"api_keys": { "openai": "sk-..." }
}
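The bare-hostname rule for proxy_host is easy to get wrong when copying a URL from the ngrok terminal. The following Python sketch shows one way to clean up a pasted value and derive the two public URLs used in this tutorial. Both function names are hypothetical and are not part of Voxray, which may simply reject malformed values rather than normalise them.

```python
def normalize_proxy_host(value: str) -> str:
    """Strip a scheme and trailing slashes so only the bare hostname remains.

    Hypothetical helper for preparing config.json; not a Voxray API.
    """
    host = value.strip()
    for scheme in ("https://", "http://", "wss://", "ws://"):
        if host.lower().startswith(scheme):
            host = host[len(scheme):]
            break
    host = host.rstrip("/")
    # Anything with a remaining path segment is not a bare hostname.
    if not host or "/" in host:
        raise ValueError(f"proxy_host must be a bare hostname, got {value!r}")
    return host


def public_urls(proxy_host: str) -> dict:
    """Derive the webhook and media-stream URLs from the bare hostname."""
    host = normalize_proxy_host(proxy_host)
    return {
        "webhook": f"https://{host}",
        "media_stream": f"wss://{host}/telephony/ws",
    }


print(public_urls("https://abc123.ngrok.io/"))
```

The `webhook` value is what goes into the Twilio Console, while the normalised hostname alone goes into config.json.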
Start Voxray. The startup output should look similar to:
Voxray listening on localhost:8080
runner_transport: twilio proxy_host: abc123.ngrok.io
stt: groq llm: groq tts: groq
Voxray registers two routes:
- POST /: webhook endpoint that returns TwiML XML to Twilio
- GET /telephony/ws: WebSocket endpoint for bidirectional media
In the Twilio Console, configure the voice webhook for your phone number:
- Webhook: https://abc123.ngrok.io (your full ngrok HTTPS URL, no trailing slash)
- HTTP Method: POST
Also verify Voice Geographic Permissions allow calls from the regions you expect (Console → Voice → Settings → Geographic Permissions). Without the right permissions enabled, Twilio will reject incoming calls silently.
During a call, Twilio POSTs to https://abc123.ngrok.io and then opens a WebSocket to wss://abc123.ngrok.io/telephony/ws.
Audio characteristics and resampling
Twilio’s media stream delivers audio as G.711 μ-law encoded at 8 kHz, which is the standard PSTN codec. Most STT providers expect 16 kHz PCM. Voxray’s Twilio serializer handles this automatically:
- Decodes μ-law bytes to 16-bit linear PCM
- Upsamples from 8 kHz to 16 kHz using linear interpolation
- Passes 16 kHz PCM frames to the STT processor
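The conversion steps above can be sketched in pure Python. This is an illustration of standard G.711 μ-law expansion followed by 2x linear interpolation, not Voxray's actual implementation, which is not shown in this tutorial. The function names are hypothetical, and repeating the final sample at the end of a chunk is an assumption about boundary handling.

```python
import struct


def ulaw_to_linear(byte: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit PCM sample."""
    u = ~byte & 0xFF                 # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if sign else sample


def upsample_2x(samples: list) -> list:
    """Double the sample rate by inserting the midpoint between neighbours."""
    out = []
    for i, cur in enumerate(samples):
        # Assumption: at the end of the chunk, repeat the last sample.
        nxt = samples[i + 1] if i + 1 < len(samples) else cur
        out.append(cur)
        out.append((cur + nxt) // 2)
    return out


def mulaw_8k_to_pcm_16k(payload: bytes) -> bytes:
    """Convert 8 kHz mu-law bytes to 16 kHz little-endian 16-bit PCM."""
    pcm_8k = [ulaw_to_linear(b) for b in payload]
    pcm_16k = upsample_2x(pcm_8k)
    return struct.pack(f"<{len(pcm_16k)}h", *pcm_16k)


# 160 mu-law bytes (20 ms at 8 kHz) become 320 samples, i.e. 640 bytes.
print(len(mulaw_8k_to_pcm_16k(bytes([0xFF] * 160))))
```

Linear interpolation is cheap and adds no latency, at the cost of some high-frequency imaging that a proper low-pass resampler would remove. For telephone-band speech that trade-off is usually acceptable.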
Production deployment
For production, replace ngrok with a server that has a stable public hostname and a valid TLS certificate. Minimum production checklist:
- Deploy Voxray on a VM or container with a public IP (or behind a load balancer)
- Obtain a TLS certificate (e.g. via Let’s Encrypt / Certbot, or terminate TLS at your load balancer)
- Point a domain name (e.g. voice.example.com) at your server
- Set proxy_host in config.json to your domain (e.g. voice.example.com)
- Update your Twilio phone number webhook to https://voice.example.com
- Optionally set server_api_key in config and use Twilio’s request validation to verify webhook authenticity
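For readers who want to see what Twilio's request validation involves, here is a minimal Python sketch of the signature check. Twilio signs each webhook by taking the full request URL, appending every POST parameter name and value in alphabetical order of the names, computing an HMAC-SHA1 with your account's auth token, and sending the Base64 result in the X-Twilio-Signature header. The function names are hypothetical. How Voxray's server_api_key setting relates to this check is not specified in this tutorial, so the sketch is independent of Voxray.

```python
import base64
import hashlib
import hmac


def twilio_signature(auth_token: str, url: str, params: dict) -> str:
    """Compute the signature Twilio would send for this URL and form body."""
    # Concatenate the URL with each key/value pair, sorted by key.
    data = url + "".join(key + params[key] for key in sorted(params))
    digest = hmac.new(
        auth_token.encode("utf-8"), data.encode("utf-8"), hashlib.sha1
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_twilio_request(
    auth_token: str, url: str, params: dict, header_signature: str
) -> bool:
    """Compare the expected signature with the X-Twilio-Signature header."""
    expected = twilio_signature(auth_token, url, params)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, header_signature)
```

The URL passed in must match the webhook URL exactly as configured in Twilio, including scheme and any trailing slash. This matters when TLS is terminated at a load balancer and the application sees a different URL than Twilio requested.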
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| Twilio error 11200 (HTTP retrieval failure) | Webhook returned non-XML or timed out | Check Voxray is running and proxy_host matches your ngrok URL exactly; Twilio must receive text/xml |
| 404 when Twilio POSTs to your URL | proxy_host has wrong value or includes https:// | Set proxy_host to the bare hostname only: abc123.ngrok.io |
| Caller hears nothing / one-way audio | STT or TTS provider error | Check Voxray logs for provider errors; verify the API key is correct and has credits |
| Call connects but agent never speaks | LLM not responding | Confirm the model name is valid for the chosen provider; check for timeout errors in logs |
| connection refused in ngrok terminal | Voxray not running on port 8080 | Start Voxray before placing a test call; verify port in config matches the ngrok tunnel port |
| ngrok tunnel URL changed | Restarted ngrok on free plan | Update proxy_host in config, restart Voxray, and update the Twilio webhook URL |
| Twilio error 13225 (no geographic permission) | Outbound or inbound region not enabled | Enable the caller’s region in Twilio Console → Voice → Settings → Geographic Permissions |
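Several of the webhook failures in this table come down to the response not being a successful XML document. As a debugging aid, the following Python sketch checks a captured webhook response for the most common problems. The function name is hypothetical, and the requirement that the root element be `<Response>` comes from Twilio's TwiML format rather than from anything Voxray-specific.

```python
import xml.etree.ElementTree as ET


def check_webhook_response(status: int, content_type: str, body: str) -> list:
    """Return a list of problems found in a webhook response (empty if none)."""
    problems = []
    if not 200 <= status < 300:
        problems.append(f"unexpected HTTP status {status}")
    if "xml" not in content_type.lower():
        problems.append(f"Content-Type is {content_type!r}, expected text/xml")
    try:
        root = ET.fromstring(body)
    except ET.ParseError as exc:
        problems.append(f"body is not well-formed XML: {exc}")
    else:
        # TwiML documents always have <Response> as the root element.
        if root.tag != "Response":
            problems.append(f"root element is <{root.tag}>, expected <Response>")
    return problems


good = "<Response><Connect><Stream url='wss://abc123.ngrok.io/telephony/ws'/></Connect></Response>"
print(check_webhook_response(200, "text/xml", good))
```

You can feed it the status, Content-Type header, and body captured from a manual POST to your webhook URL, for example from the ngrok request inspector.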