| Capability | Status |
|---|---|
| LLM | Not supported |
| STT | Supported |
| TTS | Supported |
ElevenLabs supports both STT (via "stt_provider": "elevenlabs") and TTS (via "tts_provider": "elevenlabs"). You can use one or both independently: pair ElevenLabs TTS with Groq STT, for example, if you want fast transcription with premium voice output.
API Key
Get your API key from the ElevenLabs dashboard under Profile → API Keys. Provide it to Voxray either in config.json (as api_keys.elevenlabs) or through the ELEVENLABS_API_KEY environment variable.
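As a minimal sketch of the config.json form: this assumes api_keys is a top-level object keyed by provider name, which is inferred from the api_keys.elevenlabs path rather than confirmed here. The key value is a placeholder.

```json
{
  "api_keys": {
    "elevenlabs": "YOUR_ELEVENLABS_API_KEY"
  }
}
```

If you would rather not store the key on disk, set ELEVENLABS_API_KEY in the process environment instead.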
Quick Start Config
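A starting configuration might look like the sketch below. It assumes the provider keys sit at the top level of config.json and that API keys are supplied through the environment (ELEVENLABS_API_KEY for ElevenLabs). The voice ID is the sample ID used elsewhere on this page, and openai is just one of the supported LLM providers.

```json
{
  "stt_provider": "elevenlabs",
  "tts_provider": "elevenlabs",
  "tts_voice": "21m00Tcm4TlvDq8ikWAM",
  "llm_provider": "openai"
}
```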
ElevenLabs does not provide LLM functionality. You must set llm_provider to a supported LLM provider (e.g. openai, anthropic, groq, ollama) when using ElevenLabs for STT and/or TTS.
Text-to-Speech
ElevenLabs TTS produces highly realistic, emotionally nuanced speech with a wide selection of pre-built voices. It is one of the highest-quality TTS providers available for voice agents.
Streaming
The ElevenLabs provider implements Voxray’s TTSStreamingService interface. When tts_provider is set to "elevenlabs", Voxray streams audio chunks as they are generated rather than waiting for the full response. This significantly reduces time-to-first-audio compared with non-streaming TTS providers.
Selecting a Voice
Set "tts_voice" to the voice ID string from the ElevenLabs Voice Library. Voice IDs are opaque alphanumeric strings, not display names.
Finding Voice IDs
- Log in to elevenlabs.io
- Go to Voices in the left sidebar
- Click any voice to open it
- Click Copy Voice ID. The ID is a string like 21m00Tcm4TlvDq8ikWAM
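Once you have the ID, the relevant config.json fragment could look like this. It is a sketch that assumes "tts_voice" sits alongside "tts_provider" at the top level, and it reuses the sample ID from the steps above.

```json
{
  "tts_provider": "elevenlabs",
  "tts_voice": "21m00Tcm4TlvDq8ikWAM"
}
```

A display name such as the voice's label in the dashboard would not work here, since the value must be the ID itself.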
Voice Cloning
ElevenLabs supports custom voice clones created from audio samples. Once cloned, the voice appears in your Voice Library with its own voice ID. Use it in Voxray exactly like any pre-built voice by setting "tts_voice" to that ID.
TTS Model
You can optionally specify a TTS model via "tts_model". ElevenLabs offers several models with different latency and quality trade-offs:
| Model | Latency | Quality | Notes |
|---|---|---|---|
| eleven_turbo_v2_5 | Lowest | High | Recommended for real-time voice agents |
| eleven_multilingual_v2 | Medium | Highest | 29 languages, best naturalness |
| eleven_monolingual_v1 | Low | High | English only |
If tts_model is not set, Voxray passes an empty string and ElevenLabs uses its default model.
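To pin the low-latency model from the table, a fragment along these lines should work. It assumes "tts_model" is a top-level key next to "tts_provider"; merge it into your existing config.json.

```json
{
  "tts_provider": "elevenlabs",
  "tts_model": "eleven_turbo_v2_5"
}
```

Swapping the value for eleven_multilingual_v2 trades some latency for the higher quality and wider language coverage listed in the table.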
Speech-to-Text
ElevenLabs provides transcription via Scribe, its proprietary speech recognition engine. Set "stt_provider": "elevenlabs" to use it.
You can optionally specify a transcription model via "stt_model". If omitted, Voxray passes an empty string and ElevenLabs uses its default transcription model.
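A fragment selecting ElevenLabs for transcription might look like the sketch below. The model ID scribe_v1 is ElevenLabs' published identifier for Scribe; whether Voxray forwards it unchanged is an assumption, so check it against your Voxray version or omit the key to use the default.

```json
{
  "stt_provider": "elevenlabs",
  "stt_model": "scribe_v1"
}
```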
Full Pipeline Examples
ElevenLabs STT + TTS with OpenAI LLM
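The following is an illustrative configuration, not a canonical one. It assumes top-level provider keys and an api_keys object keyed by provider name; the api_keys.openai entry is assumed by analogy with api_keys.elevenlabs, and both key values are placeholders.

```json
{
  "stt_provider": "elevenlabs",
  "llm_provider": "openai",
  "tts_provider": "elevenlabs",
  "tts_voice": "21m00Tcm4TlvDq8ikWAM",
  "tts_model": "eleven_turbo_v2_5",
  "api_keys": {
    "elevenlabs": "YOUR_ELEVENLABS_API_KEY",
    "openai": "YOUR_OPENAI_API_KEY"
  }
}
```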
Groq STT + ElevenLabs TTS (mixed providers)
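A mixed setup could be sketched as follows. The LLM provider is not fixed by this pairing, so openai is used purely as an example. The api_keys.groq and api_keys.openai entries are assumed by analogy with api_keys.elevenlabs, and all key values are placeholders.

```json
{
  "stt_provider": "groq",
  "llm_provider": "openai",
  "tts_provider": "elevenlabs",
  "tts_voice": "21m00Tcm4TlvDq8ikWAM",
  "tts_model": "eleven_turbo_v2_5",
  "api_keys": {
    "groq": "YOUR_GROQ_API_KEY",
    "openai": "YOUR_OPENAI_API_KEY",
    "elevenlabs": "YOUR_ELEVENLABS_API_KEY"
  }
}
```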
Each pipeline stage is selected independently through its own *_provider key.
Troubleshooting
| Symptom | Likely Cause | Fix |
|---|---|---|
| 401 Unauthorized | Invalid or missing API key | Verify ELEVENLABS_API_KEY or api_keys.elevenlabs |
| 404 Not Found on TTS | Invalid voice ID | Copy the voice ID from the ElevenLabs dashboard |
| High TTS latency | Using non-turbo model | Switch tts_model to eleven_turbo_v2_5 |
| Poor transcription accuracy | Background noise or low sample rate | Ensure the client sends clean audio at 16 kHz or higher |
| Voice sounds different than preview | Model mismatch | Match the model used when previewing the voice in the dashboard |