
Provider Matrix

Each row is a provider supported by Voxray. Checkmarks indicate which pipeline stages the provider covers. The API Key Env Var column shows the environment variable Voxray reads for that provider's key; the Setting API Keys section explains how it interacts with api_keys.
Provider | STT | LLM | TTS | Realtime | API Key Env Var
OpenAI | | | | | OPENAI_API_KEY
Anthropic | | | | | ANTHROPIC_API_KEY
Groq | | | | | GROQ_API_KEY
Grok (xAI) | | | | | XAI_API_KEY
Cerebras | | | | | CEREBRAS_API_KEY
Mistral | | | | | MISTRAL_API_KEY
DeepSeek | | | | | DEEPSEEK_API_KEY
AWS | | | | | AWS SDK credential chain
Google (Gemini) | | | | | GOOGLE_API_KEY
Google Vertex AI | | | | | Application Default Credentials
Ollama | | | | | OLLAMA_API_KEY (optional)
Qwen (Dashscope) | | | | | DASHSCOPE_API_KEY
AsyncAI | | | | | ASYNC_AI_API_KEY
Fish | | | | | FISH_API_KEY
Inworld | | | | | INWORLD_API_KEY
Minimax | | | | | MINIMAX_API_KEY
Moondream | | | | | MOONDREAM_API_KEY
OpenPipe | | | | | OPENPIPE_API_KEY
ElevenLabs | | | | | ELEVENLABS_API_KEY
Sarvam | | | | | SARVAM_API_KEY
Hume | | | | | HUME_API_KEY
Neuphonic | | | | | NEUPHONIC_API_KEY
XTTS | | | | | XTTS_API_KEY (self-hosted)
Whisper | | | | | WHISPER_API_KEY or OPENAI_API_KEY
Camb | | | | | CAMB_API_KEY
Gradium | | | | | GRADIUM_API_KEY
Soniox | | | | | SONIOX_API_KEY
Google Vertex AI uses Application Default Credentials (ADC) rather than an API key. Set GOOGLE_CLOUD_PROJECT and optionally GOOGLE_CLOUD_LOCATION (default: us-central1). AWS similarly uses the SDK credential chain — see the AWS integration guide for details.
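The Vertex AI settings above can be sketched in Go as follows. This is a minimal illustration, not Voxray's actual code: the `vertexSettings` type and `vertexSettingsFromEnv` function are hypothetical names, and treating a missing GOOGLE_CLOUD_PROJECT as an error is an assumption based on the wording "Set GOOGLE_CLOUD_PROJECT".

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// vertexSettings is a hypothetical holder for the two values described above.
type vertexSettings struct {
	Project  string
	Location string
}

// vertexSettingsFromEnv reads the project and location through getenv.
// Assumption: an unset project is an error; an unset location falls back
// to us-central1, the default stated in the text.
func vertexSettingsFromEnv(getenv func(string) string) (vertexSettings, error) {
	project := getenv("GOOGLE_CLOUD_PROJECT")
	if project == "" {
		return vertexSettings{}, errors.New("GOOGLE_CLOUD_PROJECT is not set")
	}
	location := getenv("GOOGLE_CLOUD_LOCATION")
	if location == "" {
		location = "us-central1"
	}
	return vertexSettings{Project: project, Location: location}, nil
}

func main() {
	s, err := vertexSettingsFromEnv(os.Getenv)
	if err != nil {
		fmt.Println("vertex config error:", err)
		return
	}
	fmt.Printf("project=%s location=%s\n", s.Project, s.Location)
}
```

Passing `getenv` as a parameter keeps the lookup testable without touching the real process environment.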

Setting API Keys

API keys can be provided two ways. Environment variables take precedence when both are set.
Config file (api_keys object):
{
  "api_keys": {
    "openai": "sk-...",
    "groq": "gsk_...",
    "elevenlabs": "el_..."
  }
}
Environment variables:
export OPENAI_API_KEY=sk-...
export GROQ_API_KEY=gsk_...
export ELEVENLABS_API_KEY=el_...
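Before starting a pipeline it can help to confirm that every provider in use has a key from at least one of the two sources. The sketch below is a hypothetical preflight check, not part of Voxray: `config`, `envVarFor`, and `missingKeys` are illustrative names, and the map covers only a few rows of the matrix. It does not model precedence; it only reports providers with no key from either source.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"sort"
)

// envVarFor maps a few provider names to the env vars listed in the matrix.
var envVarFor = map[string]string{
	"openai":     "OPENAI_API_KEY",
	"anthropic":  "ANTHROPIC_API_KEY",
	"groq":       "GROQ_API_KEY",
	"elevenlabs": "ELEVENLABS_API_KEY",
}

// config models only the api_keys object from the config file.
type config struct {
	APIKeys map[string]string `json:"api_keys"`
}

// missingKeys returns the providers that have neither an api_keys entry
// nor a non-empty environment variable, sorted alphabetically.
func missingKeys(cfg config, providers []string, getenv func(string) string) []string {
	var missing []string
	for _, p := range providers {
		if cfg.APIKeys[p] != "" {
			continue // key present in the config file
		}
		if name, ok := envVarFor[p]; ok && getenv(name) != "" {
			continue // key present in the environment
		}
		missing = append(missing, p)
	}
	sort.Strings(missing)
	return missing
}

func main() {
	raw := []byte(`{"api_keys": {"openai": "sk-..."}}`)
	var cfg config
	if err := json.Unmarshal(raw, &cfg); err != nil {
		fmt.Println("invalid config:", err)
		os.Exit(1)
	}
	inUse := []string{"openai", "groq", "elevenlabs"}
	fmt.Println("providers without a key:", missingKeys(cfg, inUse, os.Getenv))
}
```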

Mix-and-Match Providers

STT, LLM, and TTS are configured independently with stt_provider, llm_provider, and tts_provider. There is no requirement to use the same vendor for all three stages. Use provider as a fallback when you want the same vendor across all stages without repeating it.
{
  "stt_provider": "groq",
  "llm_provider": "anthropic",
  "tts_provider": "elevenlabs"
}
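The fallback rule can be sketched in Go as below. This is an illustration under stated assumptions rather than Voxray's implementation: the `config` struct and `effectiveProviders` function are hypothetical, a stage-specific field is assumed to win over `provider`, and OpenAI is assumed to be the per-stage default when neither is set (the integration guides describe OpenAI as the default provider).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// config models only the provider-selection fields discussed above.
type config struct {
	Provider    string `json:"provider"`
	STTProvider string `json:"stt_provider"`
	LLMProvider string `json:"llm_provider"`
	TTSProvider string `json:"tts_provider"`
}

// defaultProvider is an assumption drawn from the OpenAI guide description.
const defaultProvider = "openai"

// pick returns the stage-specific value, then the shared fallback,
// then the default, in that order.
func pick(stage, shared string) string {
	if stage != "" {
		return stage
	}
	if shared != "" {
		return shared
	}
	return defaultProvider
}

// effectiveProviders resolves the provider used for each pipeline stage.
func effectiveProviders(cfg config) (stt, llm, tts string) {
	return pick(cfg.STTProvider, cfg.Provider),
		pick(cfg.LLMProvider, cfg.Provider),
		pick(cfg.TTSProvider, cfg.Provider)
}

func main() {
	raw := []byte(`{"provider": "groq", "llm_provider": "anthropic"}`)
	var cfg config
	if err := json.Unmarshal(raw, &cfg); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	stt, llm, tts := effectiveProviders(cfg)
	fmt.Printf("stt=%s llm=%s tts=%s\n", stt, llm, tts) // stt=groq llm=anthropic tts=groq
}
```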
All three stages run on Groq. One API key covers STT, LLM, and TTS. Groq’s free tier is generous for development and light production use.
{
  "provider": "groq",
  "model": "llama-3.1-8b-instant",
  "stt_model": "whisper-large-v3-turbo",
  "api_keys": {
    "groq": "gsk_..."
  }
}
Fast transcription from Groq, high-quality reasoning from Anthropic Claude, and expressive voice synthesis from ElevenLabs. A common production stack for voice agents where quality matters more than cost.
{
  "stt_provider": "groq",
  "llm_provider": "anthropic",
  "tts_provider": "elevenlabs",
  "model": "claude-3-5-sonnet-20241022",
  "tts_voice": "Rachel",
  "api_keys": {
    "groq": "gsk_...",
    "anthropic": "sk-ant-...",
    "elevenlabs": "el_..."
  }
}
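Run the LLM entirely on-premises with Ollama. Groq handles STT and Sarvam handles TTS; of the three, only a Groq API key appears in this config. Suitable for privacy-sensitive deployments where LLM traffic cannot leave your network. Because audio is still sent to Groq for transcription, this stack is not fully air-gapped.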
{
  "stt_provider": "groq",
  "llm_provider": "ollama",
  "tts_provider": "sarvam",
  "model": "llama3.2",
  "api_keys": {
    "groq": "gsk_..."
  }
}
Start Ollama locally with ollama serve before running Voxray. No Sarvam key is needed for TTS if you are using a self-hosted Sarvam-compatible endpoint.
All three stages run inside AWS. A single IAM policy covers all required permissions. Ideal for AWS-first infrastructure where data residency and VPC network paths are a priority.
{
  "stt_provider": "aws",
  "llm_provider": "aws",
  "tts_provider": "aws",
  "model": "anthropic.claude-3-haiku-20240307-v1:0",
  "tts_voice": "Joanna",
  "api_keys": {
    "aws_region": "us-east-1"
  }
}
See the AWS integration guide for IAM policy details and Bedrock model access setup.
Sarvam provides first-class support for Hindi, Tamil, Telugu, Bengali, Kannada, and other Indian languages. Groq handles LLM inference. This stack is production-ready for Indian-language voice agents.
{
  "stt_provider": "sarvam",
  "llm_provider": "groq",
  "tts_provider": "sarvam",
  "model": "llama-3.1-8b-instant",
  "stt_model": "saarika:v2.5",
  "stt_language": "hi-IN",
  "tts_model": "bulbul:v2",
  "tts_voice": "anushka",
  "api_keys": {
    "sarvam": "...",
    "groq": "gsk_..."
  }
}

Integration Guides

OpenAI

GPT-4.1, Whisper, and TTS voices. The default provider when no provider is configured.

Anthropic

Claude 3 and Claude 3.5 models via the Anthropic API.

Groq

Ultra-fast LPU inference for STT, LLM, and TTS on a single API key.

Ollama

Run open-weight models locally or on-prem. No API key required.

ElevenLabs

High-quality neural TTS with voice cloning and 30+ languages.

Sarvam

STT and TTS optimized for Hindi and other Indian languages.

AWS

Amazon Transcribe, Bedrock, and Polly via the AWS SDK credential chain.

Not Listed?

If you need a provider that is not in the matrix above, Voxray’s provider system is designed to be extended. Each provider is a small Go package that implements one or more of the LLMService, STTService, or TTSService interfaces defined in pkg/services. To add a new provider:
  1. Create a package under pkg/services/<provider-name>/ implementing the relevant interface(s).
  2. Register the provider constant and add it to the appropriate Supported*Providers slice in pkg/services/factory.go.
  3. Add a case to NewLLMFromConfig, NewSTTFromConfig, or NewTTSFromConfig (whichever applies).
  4. Add the API key resolution case to apiKeyForProvider in the same file.
See the contributing guide for the full walkthrough, interface signatures, and a reference implementation you can use as a starting point.
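The four steps above might look roughly like the following single-file sketch. Everything here is a stand-in: the real interfaces live in pkg/services and their signatures may differ, so `TTSService`, `newTTSFromConfig`, `apiKeyForProvider`, and the `exampletts` provider are simplified, hypothetical versions used only to show the shape of the change.

```go
package main

import (
	"errors"
	"fmt"
)

// TTSService is a simplified stand-in for the interface in pkg/services;
// the real method set and signatures are an assumption here.
type TTSService interface {
	Synthesize(text string) ([]byte, error)
}

// Step 2: register a provider constant and add it to the supported list.
const providerExample = "exampletts" // hypothetical provider name

var supportedTTSProviders = []string{"elevenlabs", "sarvam", providerExample}

// Step 1: the provider type implementing the interface.
type exampleTTS struct {
	apiKey string
}

func (e *exampleTTS) Synthesize(text string) ([]byte, error) {
	if e.apiKey == "" {
		return nil, errors.New("exampletts: missing API key")
	}
	// A real provider would call the vendor API here; this returns placeholder bytes.
	return []byte("audio:" + text), nil
}

// Step 4: key resolution for the new provider (simplified to a map lookup).
func apiKeyForProvider(provider string, keys map[string]string) string {
	switch provider {
	case providerExample:
		return keys[providerExample]
	}
	return ""
}

// Step 3: the factory case that constructs the provider.
func newTTSFromConfig(provider string, keys map[string]string) (TTSService, error) {
	switch provider {
	case providerExample:
		return &exampleTTS{apiKey: apiKeyForProvider(provider, keys)}, nil
	default:
		return nil, fmt.Errorf("unsupported TTS provider %q", provider)
	}
}

func main() {
	fmt.Println("supported TTS providers:", supportedTTSProviders)
	svc, err := newTTSFromConfig(providerExample, map[string]string{providerExample: "key"})
	if err != nil {
		fmt.Println("factory error:", err)
		return
	}
	audio, err := svc.Synthesize("hello")
	if err != nil {
		fmt.Println("synthesis error:", err)
		return
	}
	fmt.Printf("got %d bytes\n", len(audio))
}
```

In the real codebase these pieces would be split between the new package under pkg/services/<provider-name>/ and pkg/services/factory.go, as the steps describe.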