LLM: Not supported
STT: Supported
TTS: Supported

ElevenLabs is a voice AI platform known for its realistic, emotionally expressive text-to-speech and its Scribe transcription engine. Voxray supports ElevenLabs for both STT (via "stt_provider": "elevenlabs") and TTS (via "tts_provider": "elevenlabs"). You can use one or both independently — pair ElevenLabs TTS with Groq STT, for example, if you want fast transcription with premium voice output.

API Key

Get your API key from the ElevenLabs dashboard under Profile → API Keys.
{
  "api_keys": {
    "elevenlabs": "your_elevenlabs_api_key"
  }
}
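
The Troubleshooting table refers to an ELEVENLABS_API_KEY environment variable. If you prefer not to commit the key to a config file, the fragment above can be generated at deploy time. The sketch below assumes the key lives in that variable; `build_api_keys` is a hypothetical helper, not part of Voxray, and how Voxray itself reads the variable is not covered here.

```python
import json
import os


def build_api_keys(env=None):
    """Return an api_keys fragment, taking the key from the environment.

    `env` defaults to os.environ; pass a dict to test without real secrets.
    """
    env = os.environ if env is None else env
    key = env.get("ELEVENLABS_API_KEY", "").strip()
    if not key:
        raise RuntimeError("ELEVENLABS_API_KEY is not set")
    return {"api_keys": {"elevenlabs": key}}


# Example with a placeholder value; merge the result into your config.
fragment = build_api_keys({"ELEVENLABS_API_KEY": "your_elevenlabs_api_key"})
print(json.dumps(fragment, indent=2))
```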

Quick Start Config

{
  "stt_provider": "elevenlabs",
  "tts_provider": "elevenlabs",
  "tts_voice": "<voice-id-from-elevenlabs>",
  "llm_provider": "openai",
  "model": "gpt-4.1-mini",
  "api_keys": {
    "elevenlabs": "your_elevenlabs_api_key",
    "openai": "sk-..."
  }
}
ElevenLabs does not provide LLM functionality. You must set llm_provider to a supported LLM provider (e.g. openai, anthropic, groq, ollama) when using ElevenLabs for STT and/or TTS.
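
A pre-flight check can catch this before the server starts. The following is a minimal sketch, assuming the config has already been loaded into a Python dict; `check_llm_provider` is a hypothetical helper and does not reflect how Voxray validates its config internally.

```python
def check_llm_provider(config):
    """Return a list of problems with the LLM stage of a config dict."""
    problems = []
    llm = config.get("llm_provider", "")
    if not llm:
        problems.append("llm_provider is not set")
    elif llm == "elevenlabs":
        # ElevenLabs covers STT and TTS only.
        problems.append("elevenlabs cannot be used as llm_provider")
    return problems


config = {
    "stt_provider": "elevenlabs",
    "tts_provider": "elevenlabs",
    "llm_provider": "openai",
}
print(check_llm_provider(config))  # []
```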

Text-to-Speech

ElevenLabs TTS produces highly realistic, emotionally nuanced speech with a wide selection of pre-built voices. It is one of the highest-quality TTS providers available for voice agents.

Streaming

ElevenLabs implements Voxray’s TTSStreamingService interface. When tts_provider is set to "elevenlabs", Voxray streams audio chunks as they are generated rather than waiting for the full response. This significantly reduces the time-to-first-audio compared to non-streaming TTS providers.
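
To see why streaming helps, compare when playback can begin in each mode. This is only an arithmetic illustration: the chunk timings are made-up numbers, and `time_to_first_audio` is a hypothetical function, not a Voxray or ElevenLabs API.

```python
def time_to_first_audio(chunk_times_ms, streaming):
    """Milliseconds until playback can start for the given chunk timings."""
    if not chunk_times_ms:
        return 0
    # Streaming: playback starts once the first chunk has been generated.
    # Buffered: playback waits until every chunk has been generated.
    return chunk_times_ms[0] if streaming else sum(chunk_times_ms)


chunks = [120, 80, 80, 80]  # illustrative generation times, not measurements
print(time_to_first_audio(chunks, streaming=True))   # 120
print(time_to_first_audio(chunks, streaming=False))  # 360
```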

Selecting a Voice

Set "tts_voice" to the voice ID string from the ElevenLabs Voice Library. Voice IDs are opaque alphanumeric strings (for example, 21m00Tcm4TlvDq8ikWAM), not display names such as "Rachel".
{
  "tts_provider": "elevenlabs",
  "tts_voice": "21m00Tcm4TlvDq8ikWAM"
}

Finding Voice IDs

  1. Log in to elevenlabs.io
  2. Go to Voices in the left sidebar
  3. Click any voice to open it
  4. Click Copy Voice ID — the ID is a string like 21m00Tcm4TlvDq8ikWAM
Pre-built voices available across all plans include Rachel, Domi, Bella, Antoni, Elli, Josh, Arnold, Adam, and Sam. The full Voice Library (including community voices) is accessible from your dashboard.
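
Voice IDs can also be looked up programmatically. The sketch below assumes the ElevenLabs REST endpoint `GET /v1/voices` with an `xi-api-key` header, and a response containing a `voices` list whose items carry `name` and `voice_id` fields; verify these against the current ElevenLabs API reference before relying on them. The function names are illustrative.

```python
import json
import urllib.request

VOICES_URL = "https://api.elevenlabs.io/v1/voices"


def parse_voices(payload):
    """Map display name -> voice ID from a decoded voices response."""
    return {v["name"]: v["voice_id"] for v in payload.get("voices", [])}


def fetch_voices(api_key):
    """Call the voices endpoint and return {name: voice_id}."""
    request = urllib.request.Request(VOICES_URL, headers={"xi-api-key": api_key})
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_voices(json.load(response))
```

Calling `fetch_voices("your_elevenlabs_api_key")` performs a network request; the value for the voice you want goes into "tts_voice".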

Voice Cloning

ElevenLabs supports custom voice clones from audio samples. Once cloned, the voice appears in your Voice Library with its own voice ID. Use it in Voxray exactly like any pre-built voice:
{
  "tts_provider": "elevenlabs",
  "tts_voice": "your_cloned_voice_id"
}
For production deployments, consider creating a custom voice clone to keep your brand voice consistent. A cloned voice closely resembles the source speaker and is tied to your account, so it will not appear for other users.

TTS Model

You can optionally specify a TTS model via "tts_model". ElevenLabs offers several models with different latency and quality trade-offs:
{
  "tts_provider": "elevenlabs",
  "tts_voice": "21m00Tcm4TlvDq8ikWAM",
  "tts_model": "eleven_turbo_v2_5"
}
| Model | Latency | Quality | Notes |
| --- | --- | --- | --- |
| eleven_turbo_v2_5 | Lowest | High | Recommended for real-time voice agents |
| eleven_multilingual_v2 | Medium | Highest | 29 languages, best naturalness |
| eleven_monolingual_v1 | Low | High | English only |
When tts_model is not set, Voxray passes an empty string and ElevenLabs uses its default model.
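
A typo in the model name is easy to make, so it can help to build the TTS keys through a small helper. This sketch assumes the three models in the table above are the only ones you use (ElevenLabs may offer others); `tts_settings` is a hypothetical helper that leaves out "tts_model" when no model is chosen, so the provider default applies.

```python
# Model names copied from the table above; this list may not be exhaustive.
KNOWN_TTS_MODELS = {
    "eleven_turbo_v2_5",
    "eleven_multilingual_v2",
    "eleven_monolingual_v1",
}


def tts_settings(voice_id, model=""):
    """Build the TTS config keys, omitting tts_model when none is given."""
    if model and model not in KNOWN_TTS_MODELS:
        raise ValueError(f"unknown tts_model: {model!r}")
    settings = {"tts_provider": "elevenlabs", "tts_voice": voice_id}
    if model:
        settings["tts_model"] = model
    return settings


print(tts_settings("21m00Tcm4TlvDq8ikWAM", "eleven_turbo_v2_5"))
```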

Speech-to-Text

ElevenLabs provides transcription via Scribe, its proprietary speech recognition engine. Set "stt_provider": "elevenlabs" to use it.
{
  "stt_provider": "elevenlabs",
  "stt_model": "scribe_v1"
}
Specify the model with "stt_model". If omitted, Voxray passes an empty string and ElevenLabs uses its default transcription model.

Full Pipeline Examples

ElevenLabs STT + TTS with OpenAI LLM

{
  "transport": "both",
  "host": "0.0.0.0",
  "port": 8080,

  "stt_provider": "elevenlabs",
  "stt_model": "scribe_v1",

  "llm_provider": "openai",
  "model": "gpt-4.1-mini",

  "tts_provider": "elevenlabs",
  "tts_voice": "21m00Tcm4TlvDq8ikWAM",
  "tts_model": "eleven_turbo_v2_5",

  "api_keys": {
    "elevenlabs": "your_elevenlabs_api_key",
    "openai": "sk-..."
  },

  "webrtc_ice_servers": [
    "stun:stun.l.google.com:19302"
  ]
}

Groq STT + ElevenLabs TTS (mixed providers)

{
  "stt_provider": "groq",

  "llm_provider": "anthropic",
  "model": "claude-3-5-haiku-20241022",

  "tts_provider": "elevenlabs",
  "tts_voice": "21m00Tcm4TlvDq8ikWAM",
  "tts_model": "eleven_turbo_v2_5",

  "api_keys": {
    "groq": "gsk_...",
    "anthropic": "sk-ant-...",
    "elevenlabs": "your_elevenlabs_api_key"
  }
}
Mixing providers is fully supported. Each stage (STT, LLM, TTS) is resolved independently from its own *_provider key.
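
Because each stage is resolved on its own, a mixed config needs one API key per distinct provider. The sketch below checks that; the helper names are hypothetical, and treating ollama as needing no key is an assumption (it normally runs locally), not something this page states.

```python
STAGE_KEYS = {
    "stt": "stt_provider",
    "llm": "llm_provider",
    "tts": "tts_provider",
}


def resolve_stages(config):
    """Return {stage: provider}, reading each *_provider key independently."""
    return {stage: config.get(key, "") for stage, key in STAGE_KEYS.items()}


def missing_api_keys(config, keyless=("ollama",)):
    """List providers in use that have no entry under api_keys."""
    api_keys = config.get("api_keys", {})
    providers = set(resolve_stages(config).values()) - {""} - set(keyless)
    return sorted(p for p in providers if not api_keys.get(p))


mixed = {
    "stt_provider": "groq",
    "llm_provider": "anthropic",
    "tts_provider": "elevenlabs",
    "api_keys": {"groq": "gsk_...", "anthropic": "sk-ant-..."},
}
print(resolve_stages(mixed))
print(missing_api_keys(mixed))  # ['elevenlabs']
```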

Troubleshooting

| Symptom | Likely Cause | Fix |
| --- | --- | --- |
| 401 Unauthorized | Invalid or missing API key | Verify ELEVENLABS_API_KEY or api_keys.elevenlabs |
| 404 Not Found on TTS | Invalid voice ID | Copy the voice ID from the ElevenLabs dashboard |
| High TTS latency | Using non-turbo model | Switch tts_model to eleven_turbo_v2_5 |
| Poor transcription accuracy | Background noise or low sample rate | Ensure clean 16 kHz+ audio input from the client |
| Voice sounds different than preview | Model mismatch | Match the model used when previewing the voice in the dashboard |
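
When debugging transcription accuracy, it can help to confirm the sample rate of a captured recording. This is a minimal sketch using Python's standard `wave` module, assuming you can save client audio as a WAV file; the helper names and the in-memory test file are illustrative only.

```python
import io
import wave


def wav_sample_rate(source):
    """Return the sample rate in Hz of a WAV file path or file object."""
    with wave.open(source, "rb") as wav:
        return wav.getframerate()


def meets_minimum_rate(source, minimum_hz=16000):
    """True when the recording is at least minimum_hz (16 kHz by default)."""
    return wav_sample_rate(source) >= minimum_hz


def silent_wav(rate_hz, seconds=1):
    """Build an in-memory mono 16-bit WAV, used here only for demonstration."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * rate_hz * seconds)
    buffer.seek(0)
    return buffer


print(meets_minimum_rate(silent_wav(8000)))   # False
print(meets_minimum_rate(silent_wav(16000)))  # True
```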