- STT: Amazon Transcribe, real-time streaming transcription
- LLM: Amazon Bedrock (Claude, Titan, Llama, and more)
- TTS: Amazon Polly, neural voices in 30+ languages
Overview
AWS is unique among Voxray providers: it uses the AWS SDK v2 credential chain rather than a single bearer token. All three pipeline stages (speech recognition, language model inference, and speech synthesis) resolve credentials the same way and share a single region setting. Set "stt_provider", "llm_provider", and "tts_provider" to "aws" in your config to activate all three, or mix AWS with other providers as needed.
Authentication
- Environment Variables
- ~/.aws/credentials
- IAM Role (Recommended for Production)
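Environment variables are the simplest method for local development and CI pipelines. Before starting Voxray, set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (plus AWS_SESSION_TOKEN if you use temporary credentials). The SDK picks these up automatically; no config change is needed.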
Region Configuration
All AWS services require a region. Voxray resolves the region in this order:
1. The aws_region key inside api_keys in your config file
2. The AWS_REGION environment variable
3. Default: us-east-1
Quick Start Config
The minimal configuration to run a fully AWS-native voice pipeline:
No aws key is needed inside api_keys; credentials come from the SDK credential chain described above. The aws key in api_keys is accepted for legacy compatibility but is not used for authentication.
Amazon Bedrock (LLM)
Amazon Bedrock provides access to foundation models from Anthropic, Amazon, Meta, and others via a unified API. You must enable model access in the Bedrock console for each model ID you use; models are not accessible by default.
Supported Models
| Model ID | Family | Notes |
|---|---|---|
anthropic.claude-3-haiku-20240307-v1:0 | Anthropic Claude 3 | Fastest Claude 3; lowest latency |
anthropic.claude-3-sonnet-20240229-v1:0 | Anthropic Claude 3 | Balanced quality and speed |
anthropic.claude-3-5-sonnet-20240620-v1:0 | Anthropic Claude 3.5 | Highest-quality Claude model in this table |
amazon.titan-text-express-v1 | Amazon Titan | Amazon’s own foundation model |
meta.llama3-8b-instruct-v1:0 | Meta Llama 3 | Open-weight 8B instruction model |
meta.llama3-70b-instruct-v1:0 | Meta Llama 3 | Open-weight 70B instruction model |
Model availability varies by region. Check the Bedrock console in your target region to confirm a model is available and request access before deploying.
Amazon Transcribe (STT)
Amazon Transcribe streams audio in real time and returns partial and final transcription results. Voxray opens a bidirectional stream with the StartStreamTranscription API, so no intermediate file uploads are required.
Key characteristics:
- Real-time streaming transcription with incremental results
- Supports 30+ languages including en-US, en-GB, es-US, fr-FR, de-DE, ja-JP, hi-IN, pt-BR, and more
- Automatic punctuation and number normalization
- No per-request cold start once the stream is open
Set the transcription language with the stt_language config key (e.g. "en-US").
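A streaming client does not upload a file; it sends audio as a sequence of small events while the stream is open. Transcribe's streaming input accepts signed 16-bit little-endian PCM, among other encodings. The sketch below splits such PCM into fixed-duration chunks. The mono channel, 16 kHz sample rate, and 100 ms chunk length are assumptions for illustration, not Voxray's documented settings, and chunkPCM is a hypothetical helper.

```go
package main

import "fmt"

// chunkPCM splits 16-bit mono PCM into fixed-duration chunks, the kind of
// slices a streaming client sends one audio event at a time. Hypothetical helper.
func chunkPCM(audio []byte, sampleRate, chunkMs int) [][]byte {
	const bytesPerSample = 2 // signed 16-bit little-endian, mono (assumed)
	size := sampleRate * bytesPerSample * chunkMs / 1000
	size -= size % bytesPerSample // never split a sample across two chunks
	if size <= 0 {
		return nil
	}
	var chunks [][]byte
	for start := 0; start < len(audio); start += size {
		end := start + size
		if end > len(audio) {
			end = len(audio) // the final chunk may be shorter
		}
		chunks = append(chunks, audio[start:end])
	}
	return chunks
}

func main() {
	audio := make([]byte, 10000) // about 312 ms of 16 kHz mono silence
	for i, c := range chunkPCM(audio, 16000, 100) {
		fmt.Printf("chunk %d: %d bytes\n", i, len(c))
	}
}
```

At 16 kHz, each 100 ms chunk is 16000 × 2 × 0.1 = 3200 bytes; smaller chunks lower latency at the cost of more events per second.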
Amazon Polly (TTS)
Amazon Polly synthesizes text to speech using neural voices. Voxray streams the audio output directly into the pipeline without buffering the full response.
Key characteristics:
- Neural TTS engine with natural-sounding voices
- 30+ languages and 60+ voices
- Low-latency synthesis suitable for real-time voice agents
Voice Selection
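Set the voice ID via tts_voice in your config. If no voice is set, Voxray defaults to Joanna. Not every Polly voice supports the neural engine, so check the Polly voice list to confirm that the voice you choose has a neural version.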
Common Voice IDs
| Voice ID | Language | Gender |
|---|---|---|
Joanna | en-US | Female |
Matthew | en-US | Male |
Amy | en-GB | Female |
Brian | en-GB | Male |
Celine | fr-FR | Female |
Vicki | de-DE | Female |
Mizuki | ja-JP | Female |
Lucia | es-ES | Female |
Camila | pt-BR | Female |
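If you generate configs programmatically, you might choose tts_voice to match stt_language. The mapping below is hypothetical: it takes the first listed voice per language from the table above and falls back to Joanna, Voxray's documented default. Voxray itself does not perform this matching, and pickVoice is not a Voxray API.

```go
package main

import "fmt"

// defaultVoiceByLanguage is a hypothetical mapping built from the table above
// (first listed voice per language).
var defaultVoiceByLanguage = map[string]string{
	"en-US": "Joanna",
	"en-GB": "Amy",
	"fr-FR": "Celine",
	"de-DE": "Vicki",
	"ja-JP": "Mizuki",
	"es-ES": "Lucia",
	"pt-BR": "Camila",
}

// pickVoice returns the configured tts_voice if set, else a voice matching the
// STT language, else Voxray's documented default (Joanna).
func pickVoice(ttsVoice, sttLanguage string) string {
	if ttsVoice != "" {
		return ttsVoice
	}
	if v, ok := defaultVoiceByLanguage[sttLanguage]; ok {
		return v
	}
	return "Joanna"
}

func main() {
	fmt.Println(pickVoice("", "en-GB"))
	fmt.Println(pickVoice("Matthew", "en-US"))
	fmt.Println(pickVoice("", "es-US"))
}
```

Language tags must match exactly, so es-US (a Transcribe language) falls back to Joanna here because the table lists only es-ES.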
Required IAM Permissions
Attach these permissions to the IAM role or user that Voxray runs as. Following the principle of least privilege, only the specific actions each service requires are listed.
| Permission | Service | Purpose |
|---|---|---|
transcribe:StartStreamTranscription | Amazon Transcribe | Open a real-time transcription stream |
bedrock:InvokeModelWithResponseStream | Amazon Bedrock | Streaming LLM inference |
polly:SynthesizeSpeech | Amazon Polly | Neural TTS synthesis |
If Voxray writes recordings to an S3 bucket, also add s3:PutObject on your recording bucket to the same policy.