What you’ll build

A Voxray server that accepts inbound phone calls routed through Twilio. When someone dials your Twilio number:
  1. Twilio receives the PSTN call and sends a POST webhook to your server’s / endpoint
  2. Voxray responds with TwiML XML that points Twilio to /telephony/ws
  3. Twilio opens a WebSocket to /telephony/ws for bidirectional audio
  4. Audio flows through the STT → LLM → TTS pipeline in real time
  5. Voxray streams synthesised speech back; Twilio plays it to the caller
Audio format: Twilio sends G.711 μ-law at 8 kHz. Voxray automatically decodes and upsamples it to 16 kHz before passing audio to the STT provider, so no manual resampling is needed.
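To make step 2 concrete, here is a minimal sketch of the kind of TwiML document involved. The source does not show the exact XML Voxray returns; this assumes it uses Twilio's standard `<Connect><Stream>` verb for bidirectional media, and `build_twiml` is a hypothetical helper name, not part of Voxray.

```python
import xml.etree.ElementTree as ET


def build_twiml(proxy_host: str, ws_path: str = "/telephony/ws") -> str:
    """Build a TwiML document telling Twilio to open a media WebSocket.

    Assumption: Voxray's real response may differ in detail. <Connect><Stream>
    is the TwiML verb Twilio documents for bidirectional media streams.
    """
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    # Twilio connects to this wss:// URL after receiving the TwiML.
    ET.SubElement(connect, "Stream", url=f"wss://{proxy_host}{ws_path}")
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>' + body


print(build_twiml("abc123.ngrok.io"))
```

The `url` attribute is where a bare `proxy_host` value ends up, which is why the hostname must be reachable from Twilio over TLS.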

Prerequisites

  • Voxray binary built or downloaded
  • A Twilio account with an active phone number and Voice capabilities enabled
  • An AI provider account with API keys (this tutorial uses Groq — fast, free tier available)
  • ngrok for local tunnelling (production deployments use a real public domain)

Steps

Install ngrok

ngrok creates a public HTTPS tunnel to your local machine so Twilio’s webhook can reach Voxray during development.
macOS (Homebrew)
macOS (Homebrew)
brew install ngrok/ngrok/ngrok
Linux (Debian/Ubuntu, apt)
curl -sSL https://ngrok-agent.s3.amazonaws.com/ngrok.asc \
  | sudo tee /etc/apt/trusted.gpg.d/ngrok.asc >/dev/null
echo "deb https://ngrok-agent.s3.amazonaws.com buster main" \
  | sudo tee /etc/apt/sources.list.d/ngrok.list
sudo apt update && sudo apt install ngrok
Manual download
Download the binary for your OS from ngrok.com/download, unzip it, and place ngrok on your PATH.
Authenticate ngrok with your account token (one-time setup):
ngrok config add-authtoken <YOUR_NGROK_AUTHTOKEN>

Start your ngrok tunnel

In a separate terminal, expose port 8080:
ngrok http 8080
ngrok prints output like:
Forwarding  https://abc123.ngrok.io -> http://localhost:8080
Note your HTTPS forwarding URL (e.g. https://abc123.ngrok.io). You will use the hostname (abc123.ngrok.io) in the next step and the full URL when configuring Twilio.
ngrok free-tier URLs change every time you restart ngrok. After a restart, update proxy_host in your config, restart Voxray, and update the Twilio webhook URL to match the new hostname.
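Because the hostname changes on every restart, it can help to read it programmatically instead of copying it from the terminal. The sketch below assumes the ngrok agent's local API is on its default address (`http://127.0.0.1:4040/api/tunnels`); the function names are illustrative.

```python
import json
import urllib.request
from urllib.parse import urlparse

# Default address of the ngrok agent's local API (assumed unchanged).
NGROK_API = "http://127.0.0.1:4040/api/tunnels"


def https_hostname(tunnels_json: dict) -> str:
    """Return the bare hostname of the first HTTPS tunnel in an API response."""
    for tunnel in tunnels_json.get("tunnels", []):
        url = urlparse(tunnel.get("public_url", ""))
        if url.scheme == "https" and url.hostname:
            return url.hostname
    raise LookupError("no HTTPS tunnel found; is ngrok running?")


def current_ngrok_hostname(api_url: str = NGROK_API) -> str:
    """Query the local ngrok agent and return the value to use as proxy_host."""
    with urllib.request.urlopen(api_url, timeout=5) as resp:
        return https_hostname(json.load(resp))


# Offline demonstration with a response shaped like the agent's output.
sample = {"tunnels": [{"public_url": "https://abc123.ngrok.io", "proto": "https"}]}
print(https_hostname(sample))  # abc123.ngrok.io
```

With ngrok running, `current_ngrok_hostname()` returns the hostname to paste into proxy_host and the Twilio webhook URL.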

Configure Voxray

Create config.json in your working directory. Replace abc123.ngrok.io with your actual ngrok hostname:
{
  "host": "localhost",
  "port": 8080,
  "runner_transport": "twilio",
  "proxy_host": "abc123.ngrok.io",
  "stt_provider": "groq",
  "llm_provider": "groq",
  "tts_provider": "groq",
  "model": "llama-3.1-8b-instant",
  "api_keys": {
    "groq": "gsk_..."
  }
}
Key fields explained:

| Field | Value | Purpose |
| --- | --- | --- |
| runner_transport | "twilio" | Activates the telephony endpoints (POST / and GET /telephony/ws) and selects the Twilio serializer |
| proxy_host | "abc123.ngrok.io" | Hostname Voxray embeds in TwiML responses so Twilio can open the WebSocket. Must not include https://; bare hostname only |
| stt_provider | "groq" | Routes audio to Groq’s transcription service |
| llm_provider | "groq" | Routes transcripts to Groq’s chat completions |
| tts_provider | "groq" | Synthesises responses via Groq TTS |
| model | "llama-3.1-8b-instant" | Groq LLM model with very low latency |

proxy_host must be the bare hostname with no protocol prefix and no trailing slash. Correct: abc123.ngrok.io. Incorrect: https://abc123.ngrok.io or abc123.ngrok.io/.
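The rule above is easy to check before starting the server. This is a minimal sketch, not Voxray's own validation. It also rejects a port suffix, which is an assumption about what "bare hostname" means here, and both function names are hypothetical.

```python
import re
from urllib.parse import urlparse

# One DNS label: letters, digits, hyphens; no leading or trailing hyphen.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_bare_hostname(value: str) -> bool:
    """True if value has no scheme, slash, or port (port rule is an assumption)."""
    if not value or len(value) > 253:
        return False
    return all(_LABEL.fullmatch(label) for label in value.split("."))


def to_bare_hostname(value: str) -> str:
    """Strip a scheme, path, or trailing slash from a pasted URL."""
    parsed = urlparse(value if "://" in value else f"//{value}")
    return parsed.hostname or ""


print(is_bare_hostname("abc123.ngrok.io"))            # True
print(is_bare_hostname("https://abc123.ngrok.io"))    # False
print(to_bare_hostname("https://abc123.ngrok.io/"))   # abc123.ngrok.io
```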
You can use any STT/LLM/TTS provider combination. Groq is shown here because it has a generous free tier and very low latency. To use OpenAI instead:
{
  "runner_transport": "twilio",
  "proxy_host": "abc123.ngrok.io",
  "stt_provider": "openai",
  "llm_provider": "openai",
  "tts_provider": "openai",
  "model": "gpt-4o-mini",
  "stt_model": "gpt-4o-mini-transcribe",
  "tts_voice": "nova",
  "api_keys": { "openai": "sk-..." }
}

Start Voxray

./voxray -config config.json
Expected output:
Voxray listening on localhost:8080
runner_transport: twilio  proxy_host: abc123.ngrok.io
stt: groq  llm: groq  tts: groq
Voxray is now serving:
  • POST / — webhook endpoint that returns TwiML XML to Twilio
  • GET /telephony/ws — WebSocket endpoint for bidirectional media
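Before pointing Twilio at the server, you can POST to the webhook yourself and inspect the reply. The sketch below checks a response offline. It assumes the media WebSocket is announced through a `<Stream url="...">` element, which the source does not confirm, and `check_twiml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import List


def check_twiml(content_type: str, body: str, proxy_host: str) -> List[str]:
    """Return a list of problems found in a webhook response (empty means OK)."""
    problems = []
    if "xml" not in content_type.lower():
        problems.append(f"Content-Type is {content_type!r}, expected text/xml")
    try:
        root = ET.fromstring(body)
    except ET.ParseError as exc:
        return problems + [f"body is not well-formed XML: {exc}"]
    # Assumption: the WebSocket URL lives in a <Stream url="..."> element.
    urls = [el.get("url", "") for el in root.iter("Stream")]
    expected = f"wss://{proxy_host}/telephony/ws"
    if expected not in urls:
        problems.append(f"no <Stream> pointing at {expected}; found {urls}")
    return problems


good = '<Response><Connect><Stream url="wss://abc123.ngrok.io/telephony/ws"/></Connect></Response>'
print(check_twiml("text/xml", good, "abc123.ngrok.io"))  # []
```

To use it against a running server, fetch the reply from http://localhost:8080/ with an HTTP POST and pass in the Content-Type header and body.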

Configure your Twilio phone number

  • Log in to the Twilio Console
  • Navigate to Phone Numbers → Manage → Active Numbers
  • Click your phone number
  • Under Voice & Fax → A call comes in, set:
    • Webhook: https://abc123.ngrok.io (your full ngrok HTTPS URL, no trailing slash)
    • HTTP Method: POST
  • Click Save configuration
Also verify Voice Geographic Permissions allow calls from the regions you expect (Console → Voice → Settings → Geographic Permissions). Without the right permissions enabled, Twilio will reject incoming calls silently.

Call your number and talk to your agent

Dial your Twilio phone number from any phone. The call flow is:
  • Twilio receives your call and POSTs to https://abc123.ngrok.io
  • Voxray returns TwiML that streams media to wss://abc123.ngrok.io/telephony/ws
  • Twilio opens the WebSocket; Voxray starts the pipeline
  • Your voice → Groq STT → Groq LLM → Groq TTS → your earpiece
You should hear the agent respond within 1–2 seconds of finishing a sentence.
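Once the WebSocket is open, audio travels as JSON text frames with base64-encoded μ-law payloads. The sketch below follows the message shape of Twilio's Media Streams protocol. How Voxray's serializer implements it is not shown in the source, so the function names are illustrative only.

```python
import base64
import json
from typing import Optional


def inbound_audio(message: str) -> Optional[bytes]:
    """Return raw μ-law bytes from a Twilio "media" message, else None."""
    data = json.loads(message)
    if data.get("event") != "media":
        return None  # other events include "connected", "start", and "stop"
    return base64.b64decode(data["media"]["payload"])


def outbound_audio(stream_sid: str, ulaw_bytes: bytes) -> str:
    """Wrap synthesised μ-law audio in the message Twilio plays to the caller."""
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,  # taken from the earlier "start" message
        "media": {"payload": base64.b64encode(ulaw_bytes).decode("ascii")},
    })


frame = outbound_audio("MZ-example", b"\xff\x7f")
print(inbound_audio(frame))  # b'\xff\x7f'
```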

Audio characteristics and resampling

Twilio’s media stream delivers audio as G.711 μ-law encoded at 8 kHz, the standard PSTN codec. Most STT providers expect 16 kHz PCM. Voxray’s Twilio serializer handles this automatically:
  • Decodes μ-law bytes to 16-bit linear PCM
  • Upsamples from 8 kHz to 16 kHz using linear interpolation
  • Passes 16 kHz PCM frames to the STT processor
No configuration is required. The conversion is transparent, but expect slightly lower audio fidelity than a WebSocket client sending native 16 kHz audio. That loss is inherent to the 8 kHz PSTN codec, and upsampling cannot restore it.
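The decode and upsample steps can be sketched in a few lines. This is an illustration of standard G.711 μ-law expansion followed by 2x linear interpolation, not Voxray's actual code; the function names are hypothetical.

```python
from typing import List

BIAS = 0x84  # G.711 μ-law bias (132)


def ulaw_to_pcm16(byte: int) -> int:
    """Expand one μ-law byte to a signed 16-bit linear PCM sample."""
    byte = ~byte & 0xFF           # μ-law bytes are stored bit-inverted
    sign = byte & 0x80
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude


def decode_ulaw(data: bytes) -> List[int]:
    """Decode a buffer of μ-law bytes to PCM samples."""
    return [ulaw_to_pcm16(b) for b in data]


def upsample_2x(samples: List[int]) -> List[int]:
    """Double the sample rate by inserting the midpoint between neighbours."""
    out = []
    for i, s in enumerate(samples):
        # The last sample has no successor, so it is repeated.
        nxt = samples[i + 1] if i + 1 < len(samples) else s
        out.append(s)
        out.append((s + nxt) // 2)
    return out


print(decode_ulaw(bytes([0xFF, 0x80, 0x00])))  # [0, 32124, -32124]
print(upsample_2x([0, 100, 200]))              # [0, 50, 100, 150, 200, 200]
```

Linear interpolation is cheap and adds no latency, but it adds no new high-frequency detail: the output still carries only the bandwidth of the original 8 kHz signal.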

Production deployment

For production, replace ngrok with a server that has a stable public hostname and a valid TLS certificate. Minimum production checklist:
  1. Deploy Voxray on a VM or container with a public IP (or behind a load balancer)
  2. Obtain a TLS certificate (e.g. via Let’s Encrypt / Certbot, or terminate TLS at your load balancer)
  3. Point a domain name (e.g. voice.example.com) at your server
  4. Set proxy_host in config.json to your domain:
{
  "runner_transport": "twilio",
  "proxy_host": "voice.example.com",
  ...
}
  5. Update your Twilio phone number webhook to https://voice.example.com
  6. Optionally set server_api_key in config and use Twilio’s request validation to verify webhook authenticity
Twilio requires the webhook endpoint to respond in under 15 seconds. Voxray responds immediately with TwiML (well under 1 second), so this limit is not a concern in practice.
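For the request-validation item, the sketch below shows how Twilio's X-Twilio-Signature header is computed for a form-encoded POST: HMAC-SHA1 over the request URL followed by the sorted parameters, keyed with the account's auth token, then base64-encoded. The source does not describe how server_api_key relates to this. The function names are illustrative, and a production service would normally use Twilio's official helper library.

```python
import base64
import hashlib
import hmac
from typing import Mapping


def twilio_signature(auth_token: str, url: str, params: Mapping[str, str]) -> str:
    """Compute the signature Twilio sends for a form-encoded POST webhook."""
    # The URL must be exactly what Twilio requested, e.g. https://voice.example.com
    payload = url + "".join(key + params[key] for key in sorted(params))
    digest = hmac.new(auth_token.encode("utf-8"),
                      payload.encode("utf-8"),
                      hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_request(auth_token: str, url: str,
                     params: Mapping[str, str], header_signature: str) -> bool:
    """Compare the expected signature with the header in constant time."""
    expected = twilio_signature(auth_token, url, params)
    return hmac.compare_digest(expected, header_signature)


# Placeholder values for illustration only.
form = {"CallSid": "CA-example", "From": "+15550100"}
sig = twilio_signature("example-token", "https://voice.example.com", form)
print(is_valid_request("example-token", "https://voice.example.com", form, sig))  # True
```

Validation fails if a proxy or load balancer changes the URL Twilio signed (scheme, host, or path), so compare against the public URL rather than the internal one.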

Troubleshooting

| Symptom | Cause | Fix |
| --- | --- | --- |
| Twilio error 11200 (HTTP retrieval failure) | Webhook returned non-XML or timed out | Check Voxray is running and proxy_host matches your ngrok hostname exactly; Twilio must receive text/xml |
| 404 when Twilio POSTs to your URL | proxy_host has wrong value or includes https:// | Set proxy_host to the bare hostname only: abc123.ngrok.io |
| Caller hears nothing / one-way audio | STT or TTS provider error | Check Voxray logs for provider errors; verify the API key is correct and has credits |
| Call connects but agent never speaks | LLM not responding | Confirm the model name is valid for the chosen provider; check for timeout errors in logs |
| connection refused in ngrok terminal | Voxray not running on port 8080 | Start Voxray before placing a test call; verify port in config matches the ngrok tunnel port |
| ngrok tunnel URL changed | Restarted ngrok on free plan | Update proxy_host in config, restart Voxray, and update the Twilio webhook URL |
| Twilio error 13225 (no geographic permission) | Outbound or inbound region not enabled | Enable the caller’s region in Twilio Console → Voice → Settings → Geographic Permissions |