Single-instance vs multi-instance
Voxray follows a one-goroutine-per-active-connection model. Each Runner goroutine is fully isolated, so vertical scaling (more CPU and memory) is the first lever to pull.
- Single instance (default)
- Multiple instances (Redis)
The default session store is in-memory, so no external dependencies are needed. All session state lives in the process, and restarting the process loses in-flight sessions. This mode is appropriate for development, single-region deployments, and workloads where one instance can handle peak load.
Vertical scaling — giving the instance more CPU cores and RAM — is the simplest path to higher throughput. Go’s goroutine scheduler saturates available cores automatically.
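The one-goroutine-per-connection model can be pictured with a minimal sketch. The `session` type, `runSession`, and `serveAll` are hypothetical names used only for illustration and are not Voxray's actual Runner API.

```go
package main

import (
	"fmt"
	"sync"
)

// session is a hypothetical stand-in for one active connection.
type session struct {
	id     string
	frames int
}

// runSession handles a single session in isolation and returns a summary.
func runSession(s session) string {
	processed := 0
	for i := 0; i < s.frames; i++ {
		processed++ // placeholder for per-frame pipeline work
	}
	return fmt.Sprintf("%s: %d frames", s.id, processed)
}

// serveAll starts one goroutine per session and waits for all of them.
func serveAll(sessions []session) []string {
	out := make([]string, len(sessions))
	var wg sync.WaitGroup
	for i, s := range sessions {
		wg.Add(1)
		go func(i int, s session) {
			defer wg.Done()
			out[i] = runSession(s) // each goroutine writes only its own slot
		}(i, s)
	}
	wg.Wait()
	return out
}

func main() {
	for _, line := range serveAll([]session{{id: "a", frames: 3}, {id: "b", frames: 5}}) {
		fmt.Println(line)
	}
}
```

Because the goroutines share no state, the Go scheduler can spread them across however many cores the instance has, which is why adding cores tends to help before adding instances.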
Load balancer requirements
WebSocket upgrade passthrough
All load balancers in front of Voxray must pass the Upgrade: websocket and Connection: Upgrade headers through without stripping them. Without this, WebSocket handshakes fail at the proxy.
Set proxy_read_timeout (nginx), or the equivalent idle-connection timeout in your load balancer, to match the longest expected call duration. A 60-second default will terminate active voice sessions.
WebRTC signaling
WebRTC media is peer-to-peer: once ICE negotiation completes, audio flows directly between the client and the provider (or via TURN). The Voxray server only handles signaling.
- POST /webrtc/offer: stateless; can land on any instance.
- POST /start and PATCH /sessions/{id}/api/offer: session lookup is backed by Redis in multi-instance mode, so these are also instance-agnostic.
Telephony WebSocket backhaul
Telephony providers (Twilio, Telnyx, Plivo, Exotel) open a persistent WebSocket to /telephony/ws. This connection must remain on the same instance for the call’s lifetime. Use cookie-based or IP-based session affinity for telephony traffic, or terminate telephony on a dedicated instance pool.
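IP-based affinity amounts to mapping each client address to the same backend every time. The following is a minimal sketch of that idea using an FNV hash; `pickInstance` and the instance names are hypothetical, and a real load balancer will have its own affinity mechanism.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickInstance maps a client IP to a stable entry in the instance list.
func pickInstance(clientIP string, instances []string) (string, error) {
	if len(instances) == 0 {
		return "", fmt.Errorf("no instances available for client %s", clientIP)
	}
	h := fnv.New32a()
	h.Write([]byte(clientIP)) // hash.Hash.Write never returns an error
	return instances[h.Sum32()%uint32(len(instances))], nil
}

func main() {
	pool := []string{"voxray-0", "voxray-1", "voxray-2"} // hypothetical names
	first, _ := pickInstance("203.0.113.7", pool)
	again, _ := pickInstance("203.0.113.7", pool)
	fmt.Println(first == again) // the same IP always maps to the same instance
}
```

Plain modulo hashing remaps most clients whenever the pool size changes, which is one reason a dedicated telephony pool with a fixed size can be simpler to operate.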
Performance tuning
The following config keys control buffering and write behavior on the hot path between transport and pipeline.
| Config Key | Env Variable | Default | Description |
|---|---|---|---|
| pipeline_input_queue_cap | VOXRAY_PIPELINE_INPUT_QUEUE_CAP | 256 | Channel buffer between transport read and pipeline push. When full, the reader blocks, so back-pressure propagates to the transport instead of causing unbounded memory growth. Increase under bursty audio input. |
| ws_write_coalesce_ms | VOXRAY_WS_WRITE_COALESCE_MS | 0 (disabled) | When greater than zero, the WebSocket writer batches frames within this window (milliseconds) before flushing. Reduces syscall count; adds a small fixed latency. |
| ws_write_coalesce_max_frames | VOXRAY_WS_WRITE_COALESCE_MAX_FRAMES | - | Maximum frames per coalesce batch. Only active when ws_write_coalesce_ms > 0. |
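The effect of a bounded input queue can be seen with a plain buffered channel. This sketch uses a capacity of 2 instead of 256 and a non-blocking probe, `tryPush` (a hypothetical helper), so the "full" state is observable; a reader doing a plain `queue <- frame` would simply block at that point, which is the back-pressure described above.

```go
package main

import "fmt"

// tryPush is a non-blocking probe: it reports an error where a plain
// channel send would block because the buffer is full.
func tryPush(queue chan []byte, frame []byte) error {
	select {
	case queue <- frame:
		return nil
	default:
		return fmt.Errorf("queue full (%d/%d)", len(queue), cap(queue))
	}
}

func main() {
	queue := make(chan []byte, 2) // tiny capacity for illustration only
	for i := 0; i < 3; i++ {
		fmt.Println(tryPush(queue, []byte{byte(i)}))
	}
	<-queue // the pipeline consumes one frame, freeing a slot
	fmt.Println(tryPush(queue, []byte{9}))
}
```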
S3 recording at scale
Recording uploads run through an async worker pool so they never block the voice pipeline.
Worker pool configuration
| Config Key | Env Variable | Default | Description |
|---|---|---|---|
| recording.worker_count | VOXRAY_RECORDING_WORKER_COUNT | 2 | Number of goroutines processing the upload queue concurrently. |
| recording.queue_cap | VOXRAY_RECORDING_QUEUE_CAP | 32 | Job queue capacity. When full, new upload jobs are dropped (with a log warning). Size this to absorb bursts between worker completions. |
| recording.max_retries | VOXRAY_RECORDING_MAX_RETRIES | 3 | Retry attempts per upload on S3 error, with exponential backoff. |
Scale worker_count in proportion to your calls-per-minute volume and typical recording duration.
Storage layout
Recordings are stored under the configured S3 bucket and base path using a date-partitioned key, for example with recording.base_path = "recordings/" and recording.format = "wav".
Enable recording by setting "recording.enable": true (or VOXRAY_RECORDING_ENABLE=true) alongside the bucket and credentials. AWS credentials must be available via the standard chain: env vars, instance role, or shared credentials file.
Transcript database at scale
When transcript logging is enabled, Voxray writes each turn (speaker role, text, sequence number, timestamp) to a SQL database. The table is created automatically on startup if it does not exist.
Supported drivers
- PostgreSQL
- MySQL
Use sslmode=require in production. For connection pooling, set pool_max_conns (pgx DSN) or use PgBouncer in transaction mode in front of Postgres.
Table schema
The following schema is auto-created on startup:
| Column | Type | Description |
|---|---|---|
| session_id | TEXT / VARCHAR | Voxray session identifier |
| role | TEXT / VARCHAR | Speaker role (user or assistant) |
| text | TEXT | Transcript text for this turn |
| seq | INTEGER | Turn sequence number within the session |
| created_at | TIMESTAMP | Wall-clock time of the turn |
Connections are managed through Go's database/sql package. To tune pool size, append DSN parameters specific to your driver (e.g., max_open_conns, max_idle_conns via pgx extended DSN), or use a proxy such as PgBouncer.
Kubernetes deployment
The following example deploys three Voxray replicas backed by a Redis session store, with secrets injected via Kubernetes Secrets.
Horizontal Pod Autoscaler
Voxray exposes http_active_connections as a Prometheus gauge. Use the Prometheus Adapter to surface this as a custom metric, then configure an HPA to scale based on connections per pod:
Set minReplicas: 2 for production to maintain availability during a single pod restart or rolling update.
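To reason about how an HPA reacts to a connections-per-pod target, it helps to work through the scaling rule from the Kubernetes documentation: desired replicas = ceil(current replicas × current metric / target metric), clamped to the configured bounds. The sketch below applies that rule; the target of 100 connections per pod and the replica bounds are assumed example values, and the HPA's tolerance band and stabilization windows are left out.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies the basic HPA rule for an average-per-pod metric
// and clamps the result to [minReplicas, maxReplicas].
func desiredReplicas(current int, avgPerPod, targetPerPod float64, minReplicas, maxReplicas int) (int, error) {
	if targetPerPod <= 0 {
		return 0, fmt.Errorf("target per pod must be positive, got %v", targetPerPod)
	}
	want := int(math.Ceil(float64(current) * avgPerPod / targetPerPod))
	if want < minReplicas {
		want = minReplicas
	}
	if want > maxReplicas {
		want = maxReplicas
	}
	return want, nil
}

func main() {
	// 3 pods averaging 150 connections each against a target of 100 per pod.
	n, _ := desiredReplicas(3, 150, 100, 2, 10)
	fmt.Println(n) // 5

	// Light load: the result is held at minReplicas.
	n, _ = desiredReplicas(3, 10, 100, 2, 10)
	fmt.Println(n) // 2
}
```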