Voxray is configured through a JSON file combined with environment variable overrides. This page documents every configuration field, its type, default value, and the corresponding environment variable where applicable.

Config Loading

Config File

Pass the path to your config file with the -config flag at startup:
./voxray -config /path/to/config.json
Alternatively, set the VOXRAY_CONFIG environment variable and omit the flag:
export VOXRAY_CONFIG=/path/to/config.json
./voxray
When neither is provided, Voxray looks for config.json in the current working directory.

Loading Order and Precedence

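Environment variables win over config.json. For every field that has a VOXRAY_* (or HOST/PORT) override, the value is resolved in this order:
  1. Environment variable (highest priority; overrides everything)
  2. config.json field
  3. Built-in default (used when the field is absent from the file)
This means you can ship a base config.json and override individual values at deploy time without modifying the file, a standard 12-factor pattern. Provider API keys are the one exception: an api_keys entry in config.json is checked first, and the provider's environment variable is used only when that entry is absent (see API Keys (Provider-Specific)).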
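As a sketch of this layering, suppose a base config.json sets only a few server fields (the values are illustrative):

```json
{
  "host": "0.0.0.0",
  "port": 8080,
  "log_level": "info"
}
```

Starting Voxray with VOXRAY_PORT=9090 and VOXRAY_LOG_LEVEL=debug in the environment would then be expected to bind port 9090 at debug verbosity. host still comes from the file, and json_logs, which appears in neither place, keeps its built-in default of false.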

12-Factor Best Practice

Never commit secrets (API keys, server_api_key, database DSNs) to source control. Use environment variables for all sensitive values in production and CI environments. Reserve inline api_keys in config.json for local development only.

Environment Variable Reference

All VOXRAY_* variables are read by ApplyEnvOverrides immediately after the config file is parsed. Unset variables leave the corresponding config field unchanged.

Server

| Environment Variable | Config Field | Type | Description |
|---|---|---|---|
| VOXRAY_HOST or HOST | host | string | Server bind address. VOXRAY_HOST takes precedence over HOST. |
| VOXRAY_PORT or PORT | port | int | Server listen port. VOXRAY_PORT takes precedence over PORT. |
| VOXRAY_LOG_LEVEL | log_level | string | Log verbosity: "debug", "info", or "error". Value is lowercased and trimmed. |
| VOXRAY_JSON_LOGS | json_logs | bool | Set to "true" or "1" to enable JSON log output (one object per line). |

Security

| Environment Variable | Config Field | Type | Description |
|---|---|---|---|
| VOXRAY_SERVER_API_KEY | server_api_key | string | When set, all protected endpoints require Authorization: Bearer <key> or X-API-Key: <key>. |
| VOXRAY_MAX_BODY_BYTES | max_request_body_bytes | int64 | Maximum JSON request body size in bytes. 0 disables the limit. |
| VOXRAY_CORS_ORIGINS | cors_allowed_origins | []string | Comma-separated list of allowed CORS origins (e.g. https://app.example.com,https://staging.example.com). Empty means no CORS headers. |
| VOXRAY_DAILY_DIALIN_WEBHOOK_SECRET | daily_dialin_webhook_secret | string | Secret validated via X-Webhook-Secret header for POST /daily-dialin-webhook. |

TLS

| Environment Variable | Config Field | Type | Description |
|---|---|---|---|
| VOXRAY_TLS_ENABLE | tls_enable | bool | Set to "true" or "1" to enable HTTPS (ListenAndServeTLS). |
| VOXRAY_TLS_CERT_FILE | tls_cert_file | string | Path to the TLS certificate file (PEM). Required when TLS is enabled. |
| VOXRAY_TLS_KEY_FILE | tls_key_file | string | Path to the TLS private key file (PEM). Required when TLS is enabled. |

Transport and Pipeline

| Environment Variable | Config Field | Type | Description |
|---|---|---|---|
| VOXRAY_RTC_MAX_DURATION_SECS | rtc_max_duration_secs | float64 | Maximum lifetime for RTC/WebSocket voice sessions in seconds after first inbound audio. 0 disables the limit. |
| VOXRAY_PIPELINE_INPUT_QUEUE_CAP | pipeline_input_queue_cap | int | Buffer size between transport read and pipeline push. Default 256. |
| VOXRAY_WS_WRITE_COALESCE_MS | ws_write_coalesce_ms | int | WebSocket write coalescing window in milliseconds. 0 (default) disables coalescing. |
| VOXRAY_WS_WRITE_COALESCE_MAX_FRAMES | ws_write_coalesce_max_frames | int | Maximum frames to coalesce per WebSocket write batch when coalescing is enabled. |

VAD

| Environment Variable | Config Field | Type | Description |
|---|---|---|---|
| VOXRAY_VAD_BATCH_SIZE | vad_batch_size | int | Number of consecutive VAD chunks to batch before inference (e.g. for Silero). Default 1 (no batching). |

Recording

| Environment Variable | Config Field | Type | Description |
|---|---|---|---|
| VOXRAY_RECORDING_ENABLE | recording.enable | bool | Set to "true" or "1" to enable per-session audio recording. |
| VOXRAY_RECORDING_BUCKET | recording.bucket | string | S3 bucket name for recording uploads. |
| VOXRAY_RECORDING_BASE_PATH | recording.base_path | string | Key prefix within the S3 bucket (e.g. "recordings/"). |
| VOXRAY_RECORDING_FORMAT | recording.format | string | File format/extension (e.g. "wav"). |
| VOXRAY_RECORDING_WORKER_COUNT | recording.worker_count | int | Number of async S3 uploader worker goroutines. |
| VOXRAY_RECORDING_QUEUE_CAP | recording.queue_cap | int | Upload job queue capacity. Default 32. |
| VOXRAY_RECORDING_MAX_RETRIES | recording.max_retries | int | Number of S3 upload retry attempts on failure with exponential backoff. Default 3. |

Transcripts

| Environment Variable | Config Field | Type | Description |
|---|---|---|---|
| VOXRAY_TRANSCRIPTS_ENABLE | transcripts.enable | bool | Set to "true" or "1" to enable per-message transcript logging to a SQL database. |
| VOXRAY_TRANSCRIPTS_DRIVER | transcripts.driver | string | SQL driver name (e.g. "postgres", "mysql"). Required when transcripts are enabled. |
| VOXRAY_TRANSCRIPTS_DSN | transcripts.dsn | string | SQL data source name / connection string. Required when transcripts are enabled. |
| VOXRAY_TRANSCRIPTS_TABLE | transcripts.table_name | string | SQL table for transcript rows. Default "call_transcripts". |

API Keys (Provider-Specific)

API keys are not read through ApplyEnvOverrides — they are resolved lazily by GetAPIKey when each provider is first used. The table below lists the environment variable each provider checks when api_keys.<provider> is absent from config.json.
| Provider | api_keys key | Environment Variable |
|---|---|---|
| OpenAI | openai | OPENAI_API_KEY |
| Groq | groq | GROQ_API_KEY |
| Sarvam | sarvam | SARVAM_API_KEY |
| Anthropic | anthropic | ANTHROPIC_API_KEY |
| ElevenLabs | elevenlabs | ELEVENLABS_API_KEY |
| Google | google | GOOGLE_API_KEY |
| AWS | aws | AWS_SECRET_ACCESS_KEY |
| Mistral | mistral | MISTRAL_API_KEY |
| DeepSeek | deepseek | DEEPSEEK_API_KEY |
| xAI (Grok) | xai | XAI_API_KEY |
| Cerebras | cerebras | CEREBRAS_API_KEY |
| Ollama | ollama | OLLAMA_API_KEY |
| Qwen | qwen | DASHSCOPE_API_KEY or QWEN_API_KEY |
| Whisper | whisper | WHISPER_API_KEY (falls back to OPENAI_API_KEY) |
| AsyncAI | asyncai | ASYNC_AI_API_KEY |
| Camb | camb | CAMB_API_KEY |
| Fish | fish | FISH_API_KEY |
| Gradium | gradium | GRADIUM_API_KEY |
| Hume | hume | HUME_API_KEY |
| Inworld | inworld | INWORLD_API_KEY |
| Minimax | minimax | MINIMAX_API_KEY |
| Moondream | moondream | MOONDREAM_API_KEY |
| Neuphonic | neuphonic | NEUPHONIC_API_KEY |
| OpenPipe | openpipe | OPENPIPE_API_KEY |
| Soniox | soniox | SONIOX_API_KEY |
| XTTS | xtts | XTTS_API_KEY |
| Google Cloud Project | google_cloud_project | GOOGLE_CLOUD_PROJECT |
| Google Cloud Location | google_cloud_location | GOOGLE_CLOUD_LOCATION |
| AWS Region | aws_region | AWS_REGION (default "us-east-1") |

config.json Field Reference

Server

| Field | Type | Default | Description |
|---|---|---|---|
| host | string | "localhost" | Server bind address. Use "0.0.0.0" to listen on all interfaces. |
| port | int | 3042 | Server listen port. |
| log_level | string | "info" | Log verbosity: "debug", "info", or "error". |
| json_logs | bool | false | Enable one-JSON-object-per-line structured logging. |

Providers

| Field | Type | Default | Description |
|---|---|---|---|
| provider | string | | Default provider for all tasks when task-specific keys are absent. |
| stt_provider | string | | STT provider key (e.g. "sarvam", "openai", "groq"). Falls back to provider. |
| llm_provider | string | | LLM provider key (e.g. "openai", "anthropic", "groq"). Falls back to provider. |
| tts_provider | string | | TTS provider key (e.g. "sarvam", "elevenlabs", "openai"). Falls back to provider. |
| model | string | "gpt-3.5-turbo" (OpenAI) | LLM chat model name. Provider-specific (e.g. "llama-3.1-8b-instant" for Groq). |
| stt_model | string | Provider default | STT model identifier (e.g. "saarika:v2.5" for Sarvam, "whisper-large-v3-turbo" for Groq). |
| stt_language | string | | BCP-47 language code for STT (e.g. "hi-IN", "en-IN"). Empty means auto-detect. |
| tts_model | string | Provider default | TTS model identifier (e.g. "bulbul:v2" for Sarvam). |
| tts_voice | string | Provider default | TTS speaker/voice name or ID (e.g. "anushka" for Sarvam, voice UUID for ElevenLabs). |
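A per-task provider split might look like the fragment below. It only reuses the example identifiers quoted in the table; whether this exact combination suits a given deployment is an assumption, not a tested recommendation.

```json
{
  "stt_provider": "sarvam",
  "stt_model": "saarika:v2.5",
  "stt_language": "hi-IN",
  "llm_provider": "groq",
  "model": "llama-3.1-8b-instant",
  "tts_provider": "sarvam",
  "tts_model": "bulbul:v2",
  "tts_voice": "anushka"
}
```

Because every task has its own provider key here, the top-level provider field can be left out.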

API Keys

| Field | Type | Description |
|---|---|---|
| api_keys | map[string]string | Map of provider name to API key string. Keys are looked up before falling back to environment variables. See the API keys table above for key names per provider. |

Transport

| Field | Type | Default | Description |
|---|---|---|---|
| transport | string | "websocket" | Network transport(s) to enable. "" or "websocket": WebSocket only (/ws). "smallwebrtc": SmallWebRTC only (/webrtc/offer). "both": both on the same HTTP server. |
| webrtc_ice_servers | []string | | ICE server URLs for SmallWebRTC (STUN/TURN). When empty, a default STUN server is used. |
| rtc_max_duration_secs | float64 | 0 | Maximum voice session duration in seconds. 0 or negative disables enforcement. |
| pipeline_input_queue_cap | int | 256 | Buffer capacity between transport read and pipeline push. Increase under high concurrent load. |
| ws_write_coalesce_ms | int | 0 | WebSocket write coalescing window in milliseconds. 0 disables coalescing. When enabled, reduces syscalls at the cost of slight added latency. |
| ws_write_coalesce_max_frames | int | | Max frames per coalesced WebSocket write when coalescing is enabled. |
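For illustration, a fragment that enables both transports, caps sessions at 30 minutes, and turns on write coalescing could look like this. The numbers are placeholders chosen for the example, not tuned recommendations.

```json
{
  "transport": "both",
  "rtc_max_duration_secs": 1800,
  "pipeline_input_queue_cap": 512,
  "ws_write_coalesce_ms": 10,
  "ws_write_coalesce_max_frames": 4
}
```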

TLS

| Field | Type | Default | Description |
|---|---|---|---|
| tls_enable | bool | false | Enable HTTPS. When true, tls_cert_file and tls_key_file are required. |
| tls_cert_file | string | | Filesystem path to the TLS certificate (PEM format). |
| tls_key_file | string | | Filesystem path to the TLS private key (PEM format). |
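A minimal TLS fragment sets all three fields together. The file paths below are hypothetical; substitute wherever your certificate and key actually live.

```json
{
  "tls_enable": true,
  "tls_cert_file": "/etc/voxray/tls/cert.pem",
  "tls_key_file": "/etc/voxray/tls/key.pem"
}
```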

Security

| Field | Type | Default | Description |
|---|---|---|---|
| server_api_key | string | | When non-empty, protected endpoints require Authorization: Bearer <key> or X-API-Key: <key>. |
| max_request_body_bytes | int64 | 262144 (256 KB) | Maximum JSON request body size. 0 disables the limit. Applies to /webrtc/offer, /start, etc. |
| cors_allowed_origins | []string | [] | List of origins allowed for CORS. Empty means no CORS headers are added. |
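A sketch of the non-secret security settings, using the example origins from the environment variable table and an illustrative 512 KB body limit:

```json
{
  "cors_allowed_origins": [
    "https://app.example.com",
    "https://staging.example.com"
  ],
  "max_request_body_bytes": 524288
}
```

server_api_key is deliberately left out of the file here; in line with the guidance on secrets, it would be supplied through VOXRAY_SERVER_API_KEY at deploy time.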

Turn Detection

| Field | Type | Default | Description |
|---|---|---|---|
| turn_detection | string | "none" | Turn detection mode. "none": disabled. "silence": end turn after a silence gap. |
| turn_stop_secs | float64 | 3.0 | Silence duration (seconds) after speech that triggers an end-of-turn. Only used when turn_detection is "silence". |
| turn_pre_speech_ms | float64 | 500 | Pre-speech audio padding in milliseconds prepended to each turn. |
| turn_max_duration_secs | float64 | 8.0 | Maximum turn/segment duration in seconds before forcing an end-of-turn. |
| vad_start_secs | float64 | 0 | VAD start trigger time for turn detection in seconds. |
| turn_async | bool | false | Use async AnalyzeEndOfTurn instead of synchronous AppendAudio. |
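As an example, silence-based turn detection with a shorter gap than the 3.0 second default might be configured as follows. The 1.5 second value is illustrative only.

```json
{
  "turn_detection": "silence",
  "turn_stop_secs": 1.5,
  "turn_pre_speech_ms": 500,
  "turn_max_duration_secs": 8.0
}
```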

User Lifecycle

| Field | Type | Default | Description |
|---|---|---|---|
| user_turn_stop_timeout_secs | float64 | Falls back to turn_stop_secs or 5.0 | Timeout with no activity before forcing a user turn stop. |
| user_idle_timeout_secs | float64 | 0 | After the bot finishes speaking, emit a UserIdleFrame when the user has been idle for this many seconds. 0 disables idle detection. |

VAD (Voice Activity Detection)

| Field | Type | Default | Description |
|---|---|---|---|
| vad_type | string | "energy" | VAD backend. "energy": RMS energy detector. "silero": Silero VAD neural model. |
| vad_threshold | float64 | 0.02 | RMS threshold for the energy VAD detector. Lower values detect quieter speech. |
| vad_confidence | float64 | 0.7 | Confidence threshold for neural VAD (Silero). Range 0–1. |
| vad_start_secs_vad | float64 | 0.2 | Minimum speech duration (seconds) before VAD considers audio as a valid utterance start. |
| vad_stop_secs | float64 | 0.2 | Trailing silence (seconds) required for VAD to declare end of speech. |
| vad_min_volume | float64 | 0.6 | Minimum audio volume for VAD activation (normalized 0–1). Raise to suppress background noise; lower if VAD misses quiet speech. |
| vad_batch_size | int | 1 | Number of consecutive audio chunks to batch before VAD inference. Values greater than 1 are useful with Silero to match its expected frame size. |
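Switching to the Silero backend could be sketched like this. The confidence and stop values repeat the documented defaults, and the batch size of 2 is an assumed example; the right value depends on your audio chunk size.

```json
{
  "vad_type": "silero",
  "vad_confidence": 0.7,
  "vad_stop_secs": 0.2,
  "vad_batch_size": 2
}
```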

Interruptions

| Field | Type | Default | Description |
|---|---|---|---|
| allow_interruptions | bool | false | Allow the user to interrupt bot speech mid-utterance. |
| interruption_strategy | string | | Interruption strategy. "keyword": interrupt only when a keyword is detected (future). Default strategy fires on any user speech. |
| min_words | int | | Minimum word count in the user’s utterance before interruption is triggered. Used with "keyword" strategy. |

Runner and Telephony

| Field | Type | Default | Description |
|---|---|---|---|
| runner_transport | string | | Development/telephony runner transport. Options: "webrtc", "daily", "twilio", "telnyx", "plivo", "exotel", "livekit". Empty uses the standard WebSocket transport at /ws. |
| runner_port | int | 8080 | Port override for runner transports. Python runner uses 7860. |
| proxy_host | string | | Public hostname for telephony webhook XML (e.g. "mybot.ngrok.io"). No protocol prefix. |
| dialin | bool | false | Enable Daily PSTN dial-in webhook at POST /daily-dialin-webhook. Only valid with runner_transport=daily. |
| daily_dialin_webhook_secret | string | | When set, requires X-Webhook-Secret: <secret> on POST /daily-dialin-webhook. |
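A Daily dial-in setup, for instance, pairs the two fields that the table says must go together:

```json
{
  "runner_transport": "daily",
  "dialin": true
}
```

The webhook secret is omitted from the file in this sketch on the assumption that it is provided through VOXRAY_DAILY_DIALIN_WEBHOOK_SECRET instead.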

Session Store

| Field | Type | Default | Description |
|---|---|---|---|
| session_store | string | "memory" | Session storage backend. "memory": in-process, not shared across instances. "redis": shared via Redis for horizontal scaling. |
| redis_url | string | | Redis connection URL (e.g. "redis://localhost:6379/0"). Required when session_store is "redis". |
| session_ttl_secs | int | 3600 | Session TTL in seconds. Applies to the Redis store. |
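For a horizontally scaled deployment, the Redis-backed store could be enabled with a fragment like this one, which reuses the example URL from the table and the default TTL:

```json
{
  "session_store": "redis",
  "redis_url": "redis://localhost:6379/0",
  "session_ttl_secs": 3600
}
```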

Observability

| Field | Type | Default | Description |
|---|---|---|---|
| metrics_enabled | *bool | true | Expose Prometheus metrics at /metrics. When omitted (null), defaults to true. Set to false to disable. |

Recording

| Field | Type | Default | Description |
|---|---|---|---|
| recording.enable | bool | false | Enable per-session audio recording with async S3 upload. |
| recording.bucket | string | | S3 bucket name for recordings. |
| recording.base_path | string | | Key prefix within the S3 bucket (e.g. "recordings/"). |
| recording.format | string | | File format/extension (e.g. "wav"). |
| recording.worker_count | int | | Number of concurrent S3 upload worker goroutines. |
| recording.queue_cap | int | 32 | Upload job queue capacity. Tune based on S3 bandwidth and concurrent session count. |
| recording.max_retries | int | 3 | S3 upload retry attempts on failure. Uses exponential backoff between attempts. |
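The dotted field names above suggest a nested recording object in config.json; the sketch below assumes that nesting. The bucket name and worker count are hypothetical values.

```json
{
  "recording": {
    "enable": true,
    "bucket": "my-voxray-recordings",
    "base_path": "recordings/",
    "format": "wav",
    "worker_count": 4,
    "queue_cap": 32,
    "max_retries": 3
  }
}
```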

Transcripts

| Field | Type | Default | Description |
|---|---|---|---|
| transcripts.enable | bool | false | Enable per-message transcript logging to a SQL database. |
| transcripts.driver | string | | SQL driver name (e.g. "postgres", "mysql"). Required when enabled. |
| transcripts.dsn | string | | SQL connection string / DSN. Required when enabled. |
| transcripts.table_name | string | "call_transcripts" | SQL table to write transcript rows into. |
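Assuming the dotted names map to a nested transcripts object, a file-side fragment might look like this:

```json
{
  "transcripts": {
    "enable": true,
    "driver": "postgres",
    "table_name": "call_transcripts"
  }
}
```

The DSN is left out on purpose. Because it is required when transcripts are enabled and usually embeds credentials, this sketch expects it to arrive through VOXRAY_TRANSCRIPTS_DSN.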

Plugins

| Field | Type | Default | Description |
|---|---|---|---|
| plugins | []string | [] | List of plugin names to activate (e.g. ["echo", "frame_filter"]). |
| plugin_options | map[string]json.RawMessage | | Per-plugin JSON configuration. Key is the plugin name; value is a raw JSON object passed to the plugin on init. |
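The shape of the two fields together can be sketched as follows. The plugin names come from the example in the table, but example_option is a made-up key used only to show where per-plugin settings go; each plugin defines its own options.

```json
{
  "plugins": ["echo", "frame_filter"],
  "plugin_options": {
    "frame_filter": {
      "example_option": true
    }
  }
}
```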

MCP (Model Context Protocol)

| Field | Type | Default | Description |
|---|---|---|---|
| mcp.command | string | | MCP server executable (e.g. "npx", "go"). |
| mcp.args | []string | | Arguments to the MCP server command (e.g. ["-y", "mcp-server"]). |
| mcp.tools_filter | []string | | When non-empty, only the listed tool names are registered with the LLM. All tools are registered when this field is omitted. |
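Assuming the dotted names correspond to a nested mcp object, a fragment that launches an MCP server and exposes only two of its tools might look like this. The command and args repeat the examples from the table; the tool names are placeholders.

```json
{
  "mcp": {
    "command": "npx",
    "args": ["-y", "mcp-server"],
    "tools_filter": ["example_tool_a", "example_tool_b"]
  }
}
```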

Minimal Config Example

{
  "host": "0.0.0.0",
  "port": 8080,
  "provider": "openai",
  "model": "gpt-4.1-mini",
  "api_keys": {
    "openai": "sk-..."
  }
}

Full Example with Common Settings

{
  "host": "0.0.0.0",
  "port": 8080,

  "stt_provider": "groq",
  "llm_provider": "openai",
  "tts_provider": "openai",

  "model": "gpt-4.1-mini",
  "stt_model": "whisper-large-v3-turbo",

  "transport": "both",
  "webrtc_ice_servers": [
    "stun:stun.l.google.com:19302"
  ],

  "turn_detection": "silence",
  "turn_stop_secs": 3.0,
  "vad_type": "energy",
  "vad_threshold": 0.01,
  "vad_min_volume": 0.25,

  "allow_interruptions": true,

  "log_level": "info",
  "json_logs": true,

  "metrics_enabled": true,
  "server_api_key": "",
  "cors_allowed_origins": [],
  "max_request_body_bytes": 262144,

  "plugins": []
}
In production, omit all api_keys entries from config.json and set the corresponding environment variables instead (e.g. OPENAI_API_KEY, GROQ_API_KEY). This keeps secrets out of config files that might be checked into version control or shipped in container images.