Provider Matrix
Each row is a provider supported by Voxray. Checkmarks indicate which pipeline stages the provider covers. The API Key Env Var column shows the environment variable Voxray reads when no key is set in api_keys.
| Provider | STT | LLM | TTS | Realtime | API Key Env Var |
|---|---|---|---|---|---|
| OpenAI | ✓ | ✓ | ✓ | ✓ | OPENAI_API_KEY |
| Anthropic | | ✓ | | | ANTHROPIC_API_KEY |
| Groq | ✓ | ✓ | ✓ | | GROQ_API_KEY |
| Grok (xAI) | | ✓ | | | XAI_API_KEY |
| Cerebras | | ✓ | | | CEREBRAS_API_KEY |
| Mistral | | ✓ | | | MISTRAL_API_KEY |
| DeepSeek | | ✓ | | | DEEPSEEK_API_KEY |
| AWS | ✓ | ✓ | ✓ | | AWS SDK credential chain |
| Google (Gemini) | ✓ | ✓ | ✓ | GOOGLE_API_KEY | |
| Google Vertex AI | | ✓ | | | Application Default Credentials |
| Ollama | | ✓ | | | OLLAMA_API_KEY (optional) |
| Qwen (Dashscope) | | ✓ | | | DASHSCOPE_API_KEY |
| AsyncAI | | | ✓ | | ASYNC_AI_API_KEY |
| Fish | | | ✓ | | FISH_API_KEY |
| Inworld | ✓ | ✓ | INWORLD_API_KEY | ||
| Minimax | ✓ | ✓ | MINIMAX_API_KEY | ||
| Moondream | ✓ | MOONDREAM_API_KEY | |||
| OpenPipe | | ✓ | | | OPENPIPE_API_KEY |
| ElevenLabs | ✓ | ✓ | ELEVENLABS_API_KEY | ||
| Sarvam | ✓ | | ✓ | | SARVAM_API_KEY |
| Hume | ✓ | ✓ | HUME_API_KEY | ||
| Neuphonic | | | ✓ | | NEUPHONIC_API_KEY |
| XTTS | | | ✓ | | XTTS_API_KEY (self-hosted) |
| Whisper | ✓ | | | | WHISPER_API_KEY or OPENAI_API_KEY |
| Camb | | | ✓ | | CAMB_API_KEY |
| Gradium | ✓ | GRADIUM_API_KEY | |||
| Soniox | ✓ | | | | SONIOX_API_KEY |
Google Vertex AI uses Application Default Credentials (ADC) rather than an API key. Set GOOGLE_CLOUD_PROJECT and optionally GOOGLE_CLOUD_LOCATION (default: us-central1). AWS similarly uses the SDK credential chain; see the AWS integration guide for details.
Setting API Keys
API keys can be provided in two ways: as environment variables, or in the api_keys object of the config file. Environment variables take precedence when both are set.
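A startup check can catch a missing key before the first request fails. The sketch below is illustrative only: missingKeys, the envVarFor map, and the lowercase provider identifiers are assumptions made for this example, not part of Voxray. The environment variable names are taken from the matrix above. A provider counts as covered if a key is present in either source.

```go
package main

import (
	"fmt"
	"os"
	"sort"
)

// envVarFor maps provider identifiers to the environment variables listed in
// the provider matrix. The lowercase identifiers are assumed for illustration.
var envVarFor = map[string]string{
	"openai":     "OPENAI_API_KEY",
	"anthropic":  "ANTHROPIC_API_KEY",
	"groq":       "GROQ_API_KEY",
	"elevenlabs": "ELEVENLABS_API_KEY",
	"sarvam":     "SARVAM_API_KEY",
}

// missingKeys returns, in sorted order, the providers that have no key in
// either the api_keys map or the environment. getenv is injected so the
// check can be exercised without touching the real environment.
func missingKeys(providers []string, apiKeys map[string]string, getenv func(string) string) []string {
	var missing []string
	for _, p := range providers {
		if apiKeys[p] != "" {
			continue // key supplied in the config file
		}
		if name, ok := envVarFor[p]; ok && getenv(name) != "" {
			continue // key supplied through the environment
		}
		missing = append(missing, p)
	}
	sort.Strings(missing)
	return missing
}

func main() {
	apiKeys := map[string]string{"groq": "example-key"}
	selected := []string{"groq", "anthropic", "elevenlabs"}
	// The result depends on which variables are set in your shell.
	fmt.Println("providers without a key:", missingKeys(selected, apiKeys, os.Getenv))
}
```

Providers that authenticate through a credential chain (AWS, Google Vertex AI) are deliberately left out of the map, since they do not use a single key.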
Mix-and-Match Providers
STT, LLM, and TTS are configured independently with stt_provider, llm_provider, and tts_provider. There is no requirement to use the same vendor for all three stages. Set provider as a fallback for any stage without its own setting; this lets you use one vendor across all stages without repeating it.
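The fallback rule can be sketched in a few lines of Go. This is a minimal illustration, not Voxray's actual config code: the Config struct, the pick helper, and the Stages method are hypothetical names. The field names mirror the four config keys above, and the final fallback to OpenAI follows the note in the integration guides that OpenAI is the default when no provider is configured.

```go
package main

import "fmt"

// Config mirrors the four provider keys described above. The struct shape
// and JSON tags are assumptions for this sketch.
type Config struct {
	Provider    string `json:"provider"`
	STTProvider string `json:"stt_provider"`
	LLMProvider string `json:"llm_provider"`
	TTSProvider string `json:"tts_provider"`
}

// defaultProvider is assumed to be OpenAI when nothing is configured.
const defaultProvider = "openai"

// pick prefers the stage-specific value, then the shared fallback,
// then the default.
func pick(stage, fallback string) string {
	if stage != "" {
		return stage
	}
	if fallback != "" {
		return fallback
	}
	return defaultProvider
}

// Stages resolves the provider for each pipeline stage.
func (c Config) Stages() (stt, llm, tts string) {
	return pick(c.STTProvider, c.Provider),
		pick(c.LLMProvider, c.Provider),
		pick(c.TTSProvider, c.Provider)
}

func main() {
	// Groq is the shared fallback; LLM and TTS are overridden per stage.
	c := Config{Provider: "groq", LLMProvider: "anthropic", TTSProvider: "elevenlabs"}
	stt, llm, tts := c.Stages()
	fmt.Println(stt, llm, tts) // prints: groq anthropic elevenlabs
}
```

Resolving each stage separately keeps the three settings independent, so overriding one stage never affects the other two.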
Recommended Combinations
Budget — Groq across the board
All three stages run on Groq. One API key covers STT, LLM, and TTS. Groq’s free tier is generous for development and light production use.
Quality — Groq STT + Anthropic LLM + ElevenLabs TTS
Fast transcription from Groq, high-quality reasoning from Anthropic Claude, and expressive voice synthesis from ElevenLabs. A common production stack for voice agents where quality matters more than cost.
Local — Ollama LLM + Groq STT + Sarvam TTS
Run the LLM entirely on-premises with Ollama. Only a Groq API key is required for STT; Sarvam handles TTS. Suitable for privacy-sensitive deployments where LLM traffic cannot leave your network.
Start Ollama locally with ollama serve before running Voxray. No Sarvam key is needed for TTS if you are using a self-hosted Sarvam-compatible endpoint.
AWS-native — Transcribe + Bedrock + Polly
All three stages run inside AWS. A single IAM policy covers all required permissions. Ideal for AWS-first infrastructure where data residency and VPC network paths are a priority.
See the AWS integration guide for IAM policy details and Bedrock model access setup.
Indian languages — Sarvam STT + Groq LLM + Sarvam TTS
Sarvam provides first-class support for Hindi, Tamil, Telugu, Bengali, Kannada, and other Indian languages. Groq handles LLM inference. This stack is production-ready for Indian-language voice agents.
Integration Guides
OpenAI
GPT-4.1, Whisper, and TTS voices. The default provider when no provider is configured.
Anthropic
Claude 3 and Claude 3.5 models via the Anthropic API.
Groq
Ultra-fast LPU inference for STT, LLM, and TTS on a single API key.
Ollama
Run open-weight models locally or on-prem. No API key required.
ElevenLabs
High-quality neural TTS with voice cloning and 30+ languages.
Sarvam
STT and TTS optimized for Hindi and other Indian languages.
AWS
Amazon Transcribe, Bedrock, and Polly via the AWS SDK credential chain.
Not Listed?
If you need a provider that is not in the matrix above, Voxray’s provider system is designed to be extended. Each provider is a small Go package that implements one or more of the LLMService, STTService, or TTSService interfaces defined in pkg/services.
To add a new provider:
- Create a package under pkg/services/<provider-name>/ implementing the relevant interface(s).
- Register the provider constant and add it to the appropriate Supported*Providers slice in pkg/services/factory.go.
- Add a case to NewLLMFromConfig, NewSTTFromConfig, or NewTTSFromConfig (whichever applies).
- Add the API key resolution case to apiKeyForProvider in the same file.
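The four steps can be sketched in a single file. Everything here is an assumption made for illustration: the method set of TTSService, the signatures of NewTTSFromConfig and apiKeyForProvider, the acme provider, and the ACME_API_KEY variable are hypothetical. Check pkg/services and pkg/services/factory.go for the real definitions before copying anything.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// TTSService stands in for the interface in pkg/services.
// The real method set may differ.
type TTSService interface {
	Synthesize(ctx context.Context, text string) ([]byte, error)
}

// Step 1: a hypothetical provider implementing the interface.
type acmeTTS struct{ apiKey string }

func (a *acmeTTS) Synthesize(ctx context.Context, text string) ([]byte, error) {
	if a.apiKey == "" {
		return nil, errors.New("acme: missing API key")
	}
	// A real provider would call its HTTP or WebSocket API here.
	return []byte("audio:" + text), nil
}

// Step 2: register the provider constant and list it as supported.
const ProviderAcme = "acme"

var SupportedTTSProviders = []string{ProviderAcme}

// Step 4: resolve the key for the new provider (assumed signature).
func apiKeyForProvider(provider string, getenv func(string) string) string {
	switch provider {
	case ProviderAcme:
		return getenv("ACME_API_KEY")
	default:
		return ""
	}
}

// Step 3: add a case to the factory (assumed signature).
func NewTTSFromConfig(provider string, getenv func(string) string) (TTSService, error) {
	switch provider {
	case ProviderAcme:
		return &acmeTTS{apiKey: apiKeyForProvider(provider, getenv)}, nil
	default:
		return nil, fmt.Errorf("unsupported TTS provider %q", provider)
	}
}

func main() {
	fakeEnv := func(string) string { return "demo-key" }
	svc, err := NewTTSFromConfig(ProviderAcme, fakeEnv)
	if err != nil {
		panic(err)
	}
	audio, err := svc.Synthesize(context.Background(), "hello")
	fmt.Println(string(audio), err) // prints: audio:hello <nil>
}
```

Passing getenv as a parameter is a choice made for this sketch so the factory can be tested without real environment variables.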