LLMTextFrame, which the downstream TTS stage converts to audio and streams to your client. You hear what you typed.
Use this as a smoke test after installation, before you add real providers.
How the echo processor works
Voxray pipelines are chains of processors. Each processor receives frames from upstream and emits frames downstream. The echo processor does exactly one thing: for every TextFrame it receives, it emits an LLMTextFrame with the same content.
Because the downstream TTS stage consumes LLMTextFrames, the echoed text is spoken back over the same connection. With a text-only WebSocket client you receive the echoed text frame directly; with a voice client you hear it as synthesized speech.
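The mapping described above can be sketched in a few lines of Go. This is an illustrative model, not Voxray's actual API: the `Frame` interface, the fields of `TextFrame` and `LLMTextFrame`, and the `echo` function are all assumptions made for this example.

```go
package main

import "fmt"

// Frame is a hypothetical marker interface for anything that flows
// through the pipeline. Voxray's real frame types may look different.
type Frame interface{}

// TextFrame and LLMTextFrame reuse the frame names from this tutorial;
// their fields are assumed for illustration.
type TextFrame struct{ Text string }
type LLMTextFrame struct{ Text string }

// echo models the echo processor: a TextFrame becomes an LLMTextFrame
// with the same content. It keeps no state between calls. Any other
// frame type produces no echo (ok is false).
func echo(in Frame) (out Frame, ok bool) {
	if tf, isText := in.(TextFrame); isText {
		return LLMTextFrame{Text: tf.Text}, true
	}
	return nil, false
}

func main() {
	for _, f := range []Frame{TextFrame{Text: "hello"}, 42} {
		if out, ok := echo(f); ok {
			fmt.Printf("%T %+v -> %T %+v\n", f, f, out, out)
		} else {
			fmt.Printf("%T ignored\n", f)
		}
	}
}
```

Because `echo` is a pure function of its input, two identical TextFrames always produce two identical LLMTextFrames, which is what makes the plugin useful as a smoke test.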
The echo plugin is intentionally stateless. It does not maintain conversation history, call any API, or buffer audio. Each TextFrame is echoed independently.
What this tutorial validates
Running the echo bot confirms all of the following before you spend a single API credit:
- The Voxray binary builds and starts correctly on your machine
- The WebSocket transport accepts connections on the configured port
- The pipeline frame-passing machinery is functioning (frames flow from source to processor to sink)
- Your client can send frames and receive frames back over the same connection
- Audio encoding and decoding is working if you use a voice client
Prerequisites
Step-by-step
Create the echo bot config
Create a file named echo-config.json in the project root with the following content. This is the minimum valid config for the echo bot: no API keys, no provider settings.
| Field | Value | Reason |
|---|---|---|
| host | 0.0.0.0 | Listen on all interfaces so you can connect from localhost |
| port | 3042 | Any free port works; 3042 is the Voxray default |
| transport | websocket | WebSocket only; no CGO or Opus required |
| plugins | ["echo"] | Activates the echo processor in the pipeline |
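The four fields in the table map onto a small JSON object. The sketch below reconstructs that object from the table and parses it in Go to confirm it is well formed. It assumes the keys sit at the top level of the file and that `port` is a JSON number; the `EchoConfig` struct and `parseEchoConfig` helper are hypothetical names for this example, not Voxray's own config types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// echoConfigJSON is a reconstruction of echo-config.json from the field
// table. Top-level placement of the keys and the numeric port are
// assumptions.
const echoConfigJSON = `{
  "host": "0.0.0.0",
  "port": 3042,
  "transport": "websocket",
  "plugins": ["echo"]
}`

// EchoConfig is a hypothetical struct for this example only.
type EchoConfig struct {
	Host      string   `json:"host"`
	Port      int      `json:"port"`
	Transport string   `json:"transport"`
	Plugins   []string `json:"plugins"`
}

// parseEchoConfig decodes the raw JSON into the example struct.
func parseEchoConfig(raw string) (EchoConfig, error) {
	var cfg EchoConfig
	err := json.Unmarshal([]byte(raw), &cfg)
	return cfg, err
}

func main() {
	cfg, err := parseEchoConfig(echoConfigJSON)
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Printf("listen on %s:%d via %s with plugins %v\n",
		cfg.Host, cfg.Port, cfg.Transport, cfg.Plugins)
}
```

The string constant is the part that would go into echo-config.json; the rest only checks that it parses.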
Unknown keys in the config JSON are silently ignored. You do not need stt_provider, llm_provider, tts_provider, or api_keys for the echo bot. The server starts cleanly without them.
Start the server
Run the binary with the echo config:
You should see startup logs similar to:
The /health and /ready endpoints are always available:
Connect with a WebSocket client
Any WebSocket client works. The simplest option is websocat, a command-line WebSocket tool:
Alternatively, use the example Go client from the repository. It connects, sends a StartFrame to initialize the pipeline, then sends a TTSSpeakFrame:
Or connect from a browser console:
Send a text frame
With websocat connected, type any message and press Enter. The frame is sent as a TextFrame to the server:
The echo processor receives the TextFrame, wraps the same text in an LLMTextFrame, and passes it downstream to the TTS sink, which emits the frame back to your client.
In a text-only WebSocket session, you will receive the echoed text as a JSON frame. In a voice client session with a configured TTS provider, you will hear the text spoken back. The echo bot itself has no TTS provider; it only generates the LLMTextFrame, and a real TTS provider must be configured separately if you want audio output.
Connecting Prometheus metrics
The /metrics endpoint is available by default even on the echo bot. Scrape it to verify the pipeline is counting frames:
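One way to scrape the endpoint without a Prometheus server is a short Go program. This is a minimal sketch that assumes the echo bot is listening on localhost:3042 as configured earlier; it prints whatever samples the server exposes and makes no assumption about metric names. The `sampleLines` helper is a name chosen for this example.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// sampleLines returns the non-blank, non-comment lines of a Prometheus
// text exposition body. Lines starting with '#' carry HELP/TYPE metadata.
func sampleLines(body string) []string {
	var out []string
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		out = append(out, line)
	}
	return out
}

func main() {
	// Assumed address: the echo bot from this tutorial on port 3042.
	client := &http.Client{Timeout: 3 * time.Second}
	resp, err := client.Get("http://localhost:3042/metrics")
	if err != nil {
		fmt.Println("scrape failed:", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	for _, line := range sampleLines(string(body)) {
		fmt.Println(line)
	}
}
```

Running it once before and once after sending a few text frames should show which counters moved.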
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| bind: address already in use | Port 3042 is taken | Change port in the config or stop the conflicting process |
| Connection refused | Server didn’t start cleanly | Check the startup logs for a panic or config error |
| No echo response | Frame type mismatch | Ensure you are sending a TextFrame; the echo processor ignores other frame types |
| plugin not found: echo | Binary built without plugin registry | Rebuild with go build ./cmd/voxray from the repo root |
| Server exits immediately | Malformed config JSON | Validate JSON with python3 -m json.tool echo-config.json |
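Two of the rows above, the port conflict and the malformed config, can be checked before the server is started. The sketch below is a hypothetical preflight helper, not part of Voxray: it uses Go's standard `json.Valid` and `net.Listen`, and it assumes the config file is named echo-config.json and the port is 3042 as in this tutorial.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"os"
)

// configIsValidJSON reports whether raw is syntactically valid JSON.
// It does not check that the keys are ones Voxray understands.
func configIsValidJSON(raw []byte) bool {
	return json.Valid(raw)
}

// portIsFree tries to bind addr briefly. A failure usually means another
// process already holds the port.
func portIsFree(addr string) bool {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return false
	}
	ln.Close()
	return true
}

func main() {
	raw, err := os.ReadFile("echo-config.json")
	if err != nil {
		fmt.Println("cannot read config:", err)
		return
	}
	fmt.Println("config is valid JSON:", configIsValidJSON(raw))
	fmt.Println("port 3042 is free:", portIsFree("0.0.0.0:3042"))
}
```

Run it from the project root with the server stopped; if the server is already running, the port check reports false by design.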
What to do next
The echo bot confirms your Voxray installation is sound. The next step is adding real STT, LLM, and TTS providers to build a voice agent that understands and responds to natural speech.
Local LLM with Ollama
Run a full voice pipeline with no cloud APIs. Use Ollama for LLM inference entirely on your own hardware.
Configuration reference
Complete list of every config key, default value, and environment variable override.