Observability - Voxray

Health endpoints

Voxray exposes two HTTP endpoints for health checking. They are unauthenticated and designed to be called directly by load balancers and orchestrators.

Endpoint	Purpose	Success	Failure condition
`GET /health`	Liveness — is the process alive and responsive?	`200 OK`	Process not running or HTTP server unresponsive
`GET /ready`	Readiness — can the instance accept traffic?	`200 OK`	`503 Service Unavailable` if Redis is unreachable (when `session_store=redis`)

Use /health as the liveness probe and /ready as the readiness probe. Do not swap them: a failing Redis should remove the pod from the load balancer’s rotation (/ready), but it should not cause Kubernetes to restart the pod (/health).

livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 2
  periodSeconds: 5
  failureThreshold: 2

In single-instance mode with the default in-memory session store, /ready always returns 200 OK as long as the process is up. The 503 behavior is only active when session_store=redis.
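The split between the two probes can be sketched as a small decision function. This is an illustrative Python sketch, not part of Voxray: the name `orchestrator_action` and its return strings are hypothetical, `None` stands for a probe that received no HTTP response, and the sketch ignores `failureThreshold` (a real orchestrator acts only after several consecutive failures).

```python
from typing import Optional


def orchestrator_action(health_status: Optional[int], ready_status: Optional[int]) -> str:
    """Map one round of probe results to the action an orchestrator would take.

    A status of None models a probe that timed out or could not connect.
    """
    # Liveness failed: the process is gone or hung, so a restart is warranted.
    if health_status != 200:
        return "restart"
    # Alive but not ready (for example 503 while Redis is unreachable):
    # stop routing traffic to the instance, but leave the process running.
    if ready_status != 200:
        return "remove_from_rotation"
    return "serve_traffic"
```

With a Redis outage, `orchestrator_action(200, 503)` yields `"remove_from_rotation"` rather than `"restart"`, which is the behavior the probe assignment above is meant to guarantee.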

Prometheus metrics

Voxray exposes metrics at GET /metrics in Prometheus text exposition format. Point a Prometheus scrape job at this endpoint.
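For ad hoc inspection, the text exposition format is simple enough to read with a few lines of Python. The sketch below is a minimal parser under stated assumptions: `parse_samples` is a hypothetical helper, the `EXAMPLE` text is illustrative rather than captured Voxray output, and the parser skips comment lines and does not handle escaped quotes inside label values or trailing timestamps.

```python
import re
from typing import Dict, List, Tuple

Sample = Tuple[str, Dict[str, str], float]

# metric_name, optional {label="value",...} block, then the sample value.
_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text: str) -> List[Sample]:
    """Return (name, labels, value) for each sample line in the exposition text."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _LINE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Illustrative input only; real output will contain many more series.
EXAMPLE = '''\
# HELP http_requests_total Total HTTP requests received
# TYPE http_requests_total counter
http_requests_total{method="GET",route="/health",status_code="200"} 42
recording_queue_depth 3
'''
```

Calling `parse_samples(EXAMPLE)` returns two samples, one with three labels and one with none. For anything beyond quick checks, a maintained Prometheus client library is the safer choice.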

Enabling and disabling

Metrics are enabled by default. To disable:

{
  "metrics_enabled": false
}

When disabled, the /metrics endpoint returns 404. Re-enable by removing the key or setting it to true.

Prometheus scrape config

scrape_configs:
  - job_name: voxray
    static_configs:
      - targets:
          - "voxray-host:8080"
    metrics_path: /metrics
    scrape_interval: 15s

For Kubernetes, use the Prometheus Operator ServiceMonitor or annotate the pod with prometheus.io/scrape: "true" and prometheus.io/port: "8080".

/metrics is unauthenticated by default. It exposes internal performance data that could reveal information about your AI providers, session volumes, and error rates. Restrict access using a firewall rule, nginx allow/deny directives, a Kubernetes NetworkPolicy, or a VPN. Do not expose /metrics directly to the public internet.

location /metrics {
    allow 10.0.0.0/8;
    deny  all;
    proxy_pass http://voxray_upstream;
}

Metric reference

HTTP metrics

These metrics cover all HTTP traffic into Voxray, including WebSocket upgrade requests and REST endpoints.

Metric	Type	Labels	Description
`http_requests_total`	Counter	`method`, `route`, `status_code`	Total HTTP requests received
`http_request_duration_seconds`	Histogram	`method`, `route`, `status_code`	Request duration from receipt to response
`http_active_connections`	Gauge	`route`	Current number of active HTTP connections (including open WebSocket sessions)
`http_errors_total`	Counter	`method`, `route`, `error_type`	Total HTTP errors, broken down by error type
`http_timeout_total`	Counter	`method`, `route`	Total requests that timed out before completion

AI pipeline metrics

Voxray instruments every stage of the STT → LLM → TTS pipeline with latency histograms and error counters. All pipeline metrics carry a model label so you can compare provider performance without separate dashboards.

Speech-to-text (STT)

Metric	Type	Description
`stt_time_to_first_token_seconds`	Histogram	Time from audio stream start to first transcription token
`stt_transcription_latency_seconds`	Histogram	End-to-end transcription latency per utterance
`stt_streaming_lag_seconds`	Histogram	Lag between audio frame arrival and transcript emission
`stt_errors_total`	Counter	STT errors by `error_type`
`stt_fallback_total`	Counter	Fallback invocations (e.g., primary STT failed, secondary used)

Large language model (LLM)

Metric	Type	Description
`llm_time_to_first_token_seconds`	Histogram	Time from prompt submission to first streamed token
`llm_generation_latency_seconds`	Histogram	Full LLM generation latency (first to last token)
`llm_inter_token_latency_seconds`	Histogram	Latency between consecutive streamed tokens
`llm_errors_total`	Counter	LLM errors by `error_type`
`llm_retries_total`	Counter	Automatic retry attempts
`llm_fallback_total`	Counter	Fallback invocations

Text-to-speech (TTS)

Metric	Type	Description
`tts_time_to_first_audio_chunk_seconds`	Histogram	Time from text-in to first audio chunk delivered to the client
`tts_synthesis_latency_seconds`	Histogram	Full TTS synthesis latency (text-in to final audio frame)
`tts_streaming_lag_seconds`	Histogram	Lag between text input and audio output for streaming TTS
`tts_errors_total`	Counter	TTS errors by `error_type`
`tts_fallback_total`	Counter	Fallback invocations

WebRTC metrics

Metric	Type	Description
`webrtc_peer_connections_total`	Counter	Total peer connections by `state` (connected, failed, closed)
`webrtc_peer_connections_active`	Gauge	Current active peer connections
`webrtc_bytes_sent_total`	Counter	Total bytes sent over WebRTC
`webrtc_bytes_received_total`	Counter	Total bytes received over WebRTC
`webrtc_connection_failures_total`	Counter	Connection failures by `reason`
`webrtc_reconnection_attempts_total`	Counter	Reconnection attempts per session

Recording metrics

Metric	Type	Description
`recording_jobs_enqueued_total`	Counter	Total recording upload jobs placed into the worker queue
`recording_jobs_success_total`	Counter	Total jobs that completed successfully
`recording_jobs_failed_total`	Counter	Total jobs that exhausted all retries and failed
`recording_queue_depth`	Gauge	Current number of pending jobs in the upload queue

Label cardinality

The session_id label is passed through SampledSessionID() before being applied to any metric. This function either SHA-256 hashes the raw ID to a fixed-length hex string, or returns the constant "sampled_out" when the configured sample rate causes the session to be excluded. This prevents high-cardinality session IDs from creating unbounded time-series in Prometheus. You do not need to configure this separately — it is applied automatically inside the metrics package.
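The behavior described above can be sketched in Python. This is not Voxray's implementation: the real `SampledSessionID()` lives in the metrics package, and its signature and sampling mechanism are not documented here. The sketch assumes a hypothetical `sample_rate` parameter in the range 0 to 1 and assumes the sampling decision is derived from the hash itself, so a given session always maps to the same label value.

```python
import hashlib

SAMPLED_OUT = "sampled_out"


def sampled_session_id(raw_id: str, sample_rate: float) -> str:
    """Return a fixed-length hashed label value, or "sampled_out".

    Assumption: sample_rate is the fraction of sessions to keep (0.0 to 1.0).
    """
    digest = hashlib.sha256(raw_id.encode("utf-8")).hexdigest()
    # Assumption: treat the first 8 hex digits as a uniform value in [0, 1)
    # so the keep/drop decision is deterministic per session.
    bucket = int(digest[:8], 16) / 0x1_0000_0000
    if bucket >= sample_rate:
        return SAMPLED_OUT
    return digest
```

Either way the label value has bounded shape: a 64-character hex string or a single shared constant, which is what keeps the number of time series under control.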

The label set across all metrics is: session_id (hashed/sampled), stage, direction, status, model. For HTTP metrics, method, route, and status_code replace the pipeline-specific labels.

Alerting

The following alerts cover the most operationally significant failure modes. Add them to your Prometheus alerting rules or import them into Grafana.

Alert	Condition	Severity	Interpretation
LLM provider degraded	`histogram_quantile(0.95, rate(llm_time_to_first_token_seconds_bucket[5m])) > 2`	Warning	p95 time-to-first-token exceeds 2 seconds — LLM provider is slow or overloaded
Instance near connection limit	`sum by (instance) (http_active_connections) > 80`	Warning	Approaching capacity on one instance (connections summed across routes); scale out or investigate connection leaks
S3 upload failures	`rate(recording_jobs_failed_total[5m]) > 0`	Critical	Recording uploads are failing after all retries; check S3 credentials and connectivity
High 5xx error rate	`sum(rate(http_errors_total{error_type=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.01`	Warning	More than 1% of requests are returning server errors; investigate pipeline logs
Recording queue saturation	`recording_queue_depth > 28`	Warning	Queue approaching capacity (default 32); workers cannot keep up with upload volume
STT provider errors	`rate(stt_errors_total[5m]) > 0.1`	Warning	STT error rate rising; may degrade transcription quality

# prometheus/alerts/voxray.yml
groups:
  - name: voxray
    rules:
      - alert: VoxrayLLMHighTTFT
        expr: histogram_quantile(0.95, rate(llm_time_to_first_token_seconds_bucket[5m])) > 2
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "LLM p95 time-to-first-token above 2s"
          description: "LLM provider may be degraded. Current p95: {{ $value }}s"

      - alert: VoxrayRecordingFailures
        expr: rate(recording_jobs_failed_total[5m]) > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Recording upload jobs failing"
          description: "S3 upload failures detected. Check credentials and bucket policy."

      - alert: VoxrayHighErrorRate
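        # Aggregate both sides: the two metrics carry different labels
        # (error_type vs. status_code), so an unaggregated division matches nothing.
        expr: sum(rate(http_errors_total[5m])) / sum(rate(http_requests_total[5m])) > 0.01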
        for: 3m
        labels:
          severity: warning
        annotations:
          summary: "HTTP error rate above 1%"
          description: "Error rate: {{ $value | humanizePercentage }}"
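The LLM latency alert depends on `histogram_quantile`, which estimates a quantile from cumulative bucket counts by linear interpolation inside the bucket that contains the target rank. The Python sketch below mirrors that calculation for a single series. It is a simplification: the bucket bounds and counts in the usage note are invented, Prometheus applies the function to per-second bucket rates rather than raw counts, and edge cases such as out-of-range quantiles are reduced to a `ValueError`.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    The highest bucket must have upper bound +Inf, as in Prometheus histograms.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1] for this sketch")
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if idx == len(buckets) - 1:
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[idx]
    # The lowest bucket is assumed to start at 0.
    lower, prev = (0.0, 0.0) if idx == 0 else buckets[idx - 1]
    # Linear interpolation inside the selected bucket.
    return lower + (upper - lower) * (rank - prev) / (count - prev)
```

For example, with buckets `[(0.5, 60), (1.0, 90), (2.0, 98), (math.inf, 100)]` the p95 estimate lands between 1 and 2 seconds, at roughly 1.625, so the `> 2` condition would not fire. Because the result is interpolated, its accuracy depends on how closely the bucket boundaries bracket the alert threshold.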

Structured logging

Log format

Development (text)

The default log format is human-readable text, suitable for local development and tailing with docker logs or kubectl logs.

2026/05/15 12:34:56 INFO session started session_id=sess_abc123 transport=websocket
2026/05/15 12:34:57 INFO stt transcript received session_id=sess_abc123 text="hello"

Production (JSON)

Enable JSON logging for structured log shipping:

{
  "json_logs": true,
  "log_level": "info"
}

Or via environment variables:

VOXRAY_JSON_LOGS=true
VOXRAY_LOG_LEVEL=info

Each log line becomes a single JSON object:

{"time":"2026-05-15T12:34:56Z","level":"INFO","msg":"session started","session_id":"sess_abc123","transport":"websocket"}
{"time":"2026-05-15T12:34:57Z","level":"INFO","msg":"stt transcript received","session_id":"sess_abc123","text":"hello"}
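Because each line is a self-contained JSON object, logs can be filtered with a few lines of code before they reach a log backend. The following Python sketch uses a hypothetical helper, `filter_logs`, and reuses the two sample lines above as input. It assumes only what the samples show: one JSON object per line with string-valued fields.

```python
import json
from typing import Dict, Iterable, Iterator


def filter_logs(lines: Iterable[str], **fields: str) -> Iterator[Dict[str, object]]:
    """Yield parsed log records whose fields equal all given key/value pairs."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Skip anything that is not JSON, such as text-format log lines.
            continue
        if not isinstance(record, dict):
            continue
        if all(record.get(key) == value for key, value in fields.items()):
            yield record


LINES = [
    '{"time":"2026-05-15T12:34:56Z","level":"INFO","msg":"session started","session_id":"sess_abc123","transport":"websocket"}',
    '{"time":"2026-05-15T12:34:57Z","level":"INFO","msg":"stt transcript received","session_id":"sess_abc123","text":"hello"}',
]
```

`list(filter_logs(LINES, session_id="sess_abc123"))` returns both records, while passing `msg="session started"` narrows the result to the first one.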

Log levels

Level	Config value	Use case
Debug	`"log_level": "debug"`	Verbose frame-by-frame logging; not for production
Info	`"log_level": "info"`	Session lifecycle, provider calls, errors — recommended for production
Warn	`"log_level": "warn"`	Only warnings and errors
Error	`"log_level": "error"`	Errors only; minimal output

Override at runtime without redeploying by setting VOXRAY_LOG_LEVEL in the environment. The environment variable takes precedence over the config file value.
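The precedence rule can be sketched as a small resolver. This is an illustration in Python, not Voxray's code: `resolve_log_level` is a hypothetical name, and the fallback to `info` when neither source sets a level, as well as treating an empty environment variable as unset, are assumptions made for the example.

```python
from typing import Mapping

VALID_LEVELS = ("debug", "info", "warn", "error")


def resolve_log_level(config: Mapping[str, object], env: Mapping[str, str]) -> str:
    """Pick the effective log level: environment first, then config file."""
    # Assumption: an empty VOXRAY_LOG_LEVEL counts as unset, and "info" is
    # used when neither the environment nor the config file sets a level.
    level = env.get("VOXRAY_LOG_LEVEL") or config.get("log_level") or "info"
    if level not in VALID_LEVELS:
        raise ValueError(f"unsupported log level: {level!r}")
    return str(level)
```

With `{"log_level": "info"}` in the config file and `VOXRAY_LOG_LEVEL=debug` in the environment, the resolver returns `"debug"`.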

Log shipping

Voxray writes logs to stdout. Ship them from stdout to your preferred backend using any standard log collector:

Grafana Loki
Fluentd / Fluent Bit
AWS CloudWatch
Datadog

Grafana Loki

Use the Promtail agent or the Loki Docker driver to tail container stdout and push to Loki. With json_logs: true, Promtail can parse fields automatically using json pipeline stages:

# promtail pipeline stage
pipeline_stages:
  - json:
      expressions:
        level: level
        session_id: session_id
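  - labels:
      # Promote only low-cardinality fields to Loki labels. session_id is
      # high-cardinality; filter on it at query time with LogQL's json parser.
      level: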

Fluentd / Fluent Bit

Point Fluent Bit’s tail input at the container log file (e.g., /var/log/containers/voxray-*.log) and use the json parser. Forward to Elasticsearch, Splunk, or any Fluentd output plugin.

[INPUT]
    Name   tail
    Path   /var/log/containers/voxray-*.log
    Parser json

[OUTPUT]
    Name  es
    Match *
    Host  elasticsearch.internal
    Port  9200
    Index voxray-logs

AWS CloudWatch

Use the CloudWatch Logs agent or the awslogs Docker log driver:

{
  "log-driver": "awslogs",
  "log-opts": {
    "awslogs-group": "/voxray/production",
    "awslogs-region": "us-east-1",
    "awslogs-stream-prefix": "voxray"
  }
}

CloudWatch Insights can then query JSON fields directly, e.g.:

fields @timestamp, session_id, msg
| filter level = "ERROR"
| sort @timestamp desc
| limit 50

Datadog

Enable the Datadog Agent log collection and set json_logs: true in Voxray config. Datadog automatically parses JSON log attributes into facets for filtering and alerting.

# datadog agent log config
logs:
  - type: docker
    service: voxray
    source: go
    log_processing_rules:
      - type: multi_line
        name: new_log_start
        pattern: '^\{'

Always run with json_logs: true and log_level: info in production. Text logs are harder to parse programmatically, and debug level generates very high log volume (one entry per audio frame in some paths).
