Extensions are optional pipeline components that address common outbound call challenges: navigating automated phone menus (IVR) and detecting whether a human or voicemail answered the call. Both are designed to slot into a standard Voxray pipeline with minimal wiring.

IVR Extension

Use case

When placing outbound calls to businesses or government services, your agent will often encounter an Interactive Voice Response (IVR) system before reaching a human — “Press 1 for billing, press 2 for support.” The IVR extension automates this navigation: it listens to the IVR audio, uses an LLM to decide what to do, sends DTMF keypad tones, and hands off cleanly to normal conversation mode once a human is reached.

How it works

The extension operates in two LLM modes managed by a single IVRProcessor:
  1. Classifier mode: active at call start. The LLM reads the transcribed call audio and outputs one of two mode tags: <mode>conversation</mode> (human detected, hand off) or <mode>ivr</mode> (IVR system detected, enter navigation mode).
  2. IVR navigation mode: activated when the classifier outputs <mode>ivr</mode>. The processor injects the IVR system prompt into the LLM context and adjusts VAD silence thresholds (shorter stop time so the agent doesn’t wait long between IVR prompts). The LLM then outputs DTMF tags (<dtmf>1</dtmf>) and status tags (<status>completed</status>) as it navigates the menu.
The processor sits downstream of the LLM and pushes control frames upstream to effect mode changes: LLMMessagesUpdateFrame to swap the system prompt, and VADParamsUpdateFrame to tighten the silence window during IVR navigation.
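The two modes and their transitions can be pictured as a small state machine. The sketch below is only an illustration of that idea, not Voxray's implementation: the `mode` and `navState` types and the `applyModeTag` function are hypothetical names, and the prompt and VAD fields stand in for the effect of LLMMessagesUpdateFrame and VADParamsUpdateFrame.

```go
package main

// mode is a hypothetical stand-in for the processor's internal LLM mode.
type mode int

const (
	modeClassifier   mode = iota // active at call start
	modeIVR                      // navigating an automated menu
	modeConversation             // a human is on the line
)

// navState models what a mode switch changes: the active system prompt
// and the VAD stop time. Field names are illustrative.
type navState struct {
	Mode         mode
	SystemPrompt string
	VADStopSecs  float64
}

// applyModeTag returns the state after a <mode> tag value is seen.
// "ivr" only takes effect from classifier mode; unknown values are ignored.
// On "conversation" the prompt and VAD values are left alone, on the
// assumption that the hand-off callback decides what to do with them.
func applyModeTag(s navState, tag, ivrPrompt string, ivrStopSecs float64) navState {
	switch tag {
	case "ivr":
		if s.Mode == modeClassifier {
			s.Mode = modeIVR
			s.SystemPrompt = ivrPrompt  // stands in for LLMMessagesUpdateFrame
			s.VADStopSecs = ivrStopSecs // stands in for VADParamsUpdateFrame
		}
	case "conversation":
		s.Mode = modeConversation
	}
	return s
}
```

Keeping the transition a pure function of the old state and the tag makes the behaviour easy to test without a live call.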

Key types

IVRProcessor — the core frame processor:
| Field | Type | Purpose |
| --- | --- | --- |
| ClassifierPrompt | string | System prompt for initial classification (conversation vs IVR). |
| IVRPrompt | string | System prompt injected when IVR is detected; instructs navigation. |
| IVRVADStopSecs | float64 | Silence duration (seconds) that triggers end-of-turn during IVR navigation. Default: 2.0. |
IVRStatus — reported via OnIVRStatusChanged:
| Value | Meaning |
| --- | --- |
| IVRStatusDetected | IVR system identified; navigation mode activated. |
| IVRStatusCompleted | Agent successfully navigated to the destination. |
| IVRStatusStuck | Agent cannot proceed (unexpected menu, unrecognised options). |
| IVRStatusWait | IVR is playing hold music or an extended prompt; agent is waiting. |
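To illustrate how the text inside a status tag could map onto these four values, here is a minimal sketch. It uses a local `status` type as a stand-in because the underlying type of ivr.IVRStatus is not shown on this page; `parseStatus` is a hypothetical helper, and the case-insensitive, whitespace-tolerant matching is an assumption.

```go
package main

import "strings"

// status is a local stand-in for ivr.IVRStatus.
type status int

const (
	statusUnknown status = iota
	statusDetected
	statusCompleted
	statusStuck
	statusWait
)

// parseStatus maps the text inside a <status> tag to a status value.
// The second result is false when the text is not a known status.
func parseStatus(text string) (status, bool) {
	switch strings.ToLower(strings.TrimSpace(text)) {
	case "detected":
		return statusDetected, true
	case "completed":
		return statusCompleted, true
	case "stuck":
		return statusStuck, true
	case "wait":
		return statusWait, true
	}
	return statusUnknown, false
}
```

Returning an explicit `false` for unknown text lets the caller ignore or log a malformed tag rather than treat it as a real status change.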

LLM command tags

The IVR system prompt must instruct the LLM to wrap commands in XML tags. The processor pattern-matches these tags in the streaming LLM output:
| Tag | Example | Effect |
| --- | --- | --- |
| <dtmf> | <dtmf>1</dtmf> | Emits an OutputDTMFUrgentFrame with the keypad entry. Valid: 0–9, *, #. |
| <status> | <status>completed</status> | Fires OnIVRStatusChanged with the corresponding IVRStatus. |
| <mode> | <mode>conversation</mode> | Fires OnConversationDetected with conversation history and exits IVR mode. |
Text outside the XML tags is forwarded downstream as normal AggregatedTextFrame content. The LLM can still speak to the IVR system between DTMF presses — useful for spoken menu options that require voice input rather than keypad input.
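The split between command tags and speakable text can be sketched with a regular expression. This is a simplified illustration that works on a complete string rather than on streaming output, and it is not the processor's actual matcher; `command`, `tagPattern`, and `splitCommands` are hypothetical names.

```go
package main

import (
	"regexp"
	"strings"
)

// command is one tag extracted from LLM output, e.g. {Tag: "dtmf", Value: "1"}.
type command struct {
	Tag   string
	Value string
}

// tagPattern matches the three command tags. Go's RE2 engine has no
// backreferences, so each tag gets its own alternative and capture group.
var tagPattern = regexp.MustCompile(`<dtmf>(.*?)</dtmf>|<status>(.*?)</status>|<mode>(.*?)</mode>`)

// splitCommands separates command tags from the text around them.
// Tags with an empty body are skipped; the remaining text has its
// whitespace collapsed to single spaces.
func splitCommands(output string) (cmds []command, text string) {
	names := []string{"dtmf", "status", "mode"}
	for _, m := range tagPattern.FindAllStringSubmatch(output, -1) {
		for i, name := range names {
			if m[i+1] != "" {
				cmds = append(cmds, command{Tag: name, Value: strings.TrimSpace(m[i+1])})
			}
		}
	}
	text = strings.Join(strings.Fields(tagPattern.ReplaceAllString(output, " ")), " ")
	return cmds, text
}
```

A streaming matcher would additionally need to hold back partial tags that are split across tokens, which this sketch does not attempt.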

Pipeline position

Transport → Turn → STT → IVRNavigator → TTS → Sink
IVRNavigator wraps an [LLM, IVRProcessor] chain into a single pipeline node. The internal LLM context is automatically wired to IVRProcessor.SetSavedMessages so that when conversation mode is triggered, the full conversation history is available in the callback.

Setup

1. Import the extension package

import (
    "log" // used by the callbacks in step 3

    "voxray-go/pkg/extensions/ivr"
    "voxray-go/pkg/pipeline"
)
2. Create the navigator

llm := services.NewLLMService(cfg) // any LLMService

navigator := ivr.NewIVRNavigator(llm, "Reach the billing department", 2.0)
The second argument is the navigation goal injected into the IVR system prompt. The third is IVRVADStopSecs.
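As an illustration of what injecting a goal into an IVR system prompt might look like, here is a minimal sketch. The template wording, `ivrPromptTemplate`, and `buildIVRPrompt` are all assumptions made for this example; the prompt text Voxray actually uses is not shown on this page.

```go
package main

import "fmt"

// ivrPromptTemplate is a hypothetical prompt. It names the goal and tells
// the LLM to wrap commands in the <dtmf>, <status> and <mode> tags.
const ivrPromptTemplate = `You are navigating an automated phone menu.
Goal: %s
To press a key, output <dtmf>KEY</dtmf> where KEY is one of 0-9, * or #.
Report progress with <status>completed</status>, <status>stuck</status> or <status>wait</status>.
If a human answers, output <mode>conversation</mode>.`

// buildIVRPrompt fills the navigation goal into the template.
func buildIVRPrompt(goal string) string {
	return fmt.Sprintf(ivrPromptTemplate, goal)
}
```

Keeping the goal as a single substitution point makes it straightforward to reuse one prompt across campaigns that target different departments.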
3. Register callbacks

navigator.OnConversationDetected(func(history []map[string]any) {
    // A human answered. Hand off: swap to your main agent pipeline
    // or push LLMMessagesUpdateFrame with the main system prompt.
    log.Println("IVR: human detected, conversation history:", len(history), "turns")
})

navigator.OnIVRStatusChanged(func(status ivr.IVRStatus) {
    switch status {
    case ivr.IVRStatusCompleted:
        log.Println("IVR: navigation complete")
    case ivr.IVRStatusStuck:
        log.Println("IVR: agent is stuck, may need fallback")
    case ivr.IVRStatusWait:
        log.Println("IVR: waiting on hold")
    }
})
4. Add to pipeline

pl := pipeline.New()
pl.Add(turnProc)
pl.Add(sttProc)
pl.Add(navigator.Processor) // single node: [LLM → IVRProcessor]
pl.Add(ttsProc)
pl.Add(sink)

Frames emitted

| Frame | Direction | Purpose |
| --- | --- | --- |
| OutputDTMFUrgentFrame | Downstream | Carries a single DTMF keypad entry to the transport layer. |
| VADParamsUpdateFrame | Upstream | Updates stop_secs on the Turn processor when IVR mode activates. |
| LLMMessagesUpdateFrame | Upstream | Replaces the LLM system prompt for classifier → IVR or IVR → conversation transitions. |
| AggregatedTextFrame | Downstream | Non-tag text from the LLM forwarded to TTS. |
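Because a DTMF frame carries a single keypad entry limited to 0–9, * and #, it can be useful to validate whatever the LLM put inside a dtmf tag before acting on it. The helper below is a hypothetical sketch of such a check, not part of the Voxray API.

```go
package main

import "strings"

// isValidDTMF reports whether key is exactly one keypad symbol:
// a digit 0-9, "*" or "#".
func isValidDTMF(key string) bool {
	return len(key) == 1 && strings.Contains("0123456789*#", key)
}
```

With this check, `isValidDTMF("1")` and `isValidDTMF("#")` are true, while an empty string, `"12"`, or `"a"` are rejected.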

Voicemail Detection Extension

Use case

When placing outbound calls, you need to know whether a human or an answering machine picked up before your agent starts speaking. Delivering a full sales pitch to voicemail wastes the recording slot and sounds wrong. The voicemail detection extension holds the TTS output in a gate, classifies the audio using a fast LLM, and then either releases the buffered speech (human) or triggers a tailored voicemail response.

How it works

The detector inserts a parallel pipeline branch after STT. Both branches receive the transcribed audio simultaneously:
  • Conversation gate branch: passes frames through normally, but closes if voicemail is detected, preventing the main LLM from wasting compute on a recording.
  • Classifier branch: runs a dedicated fast LLM with a classification prompt. Its output is monitored by ClassificationProcessor, which fires one of two notifiers when the complete LLM response is CONVERSATION or VOICEMAIL.
Downstream, a TTS gate buffers all TTS audio frames. When the classification notifier fires:
  • CONVERSATION → the gate opens and releases all buffered TTS frames. The caller hears the agent’s greeting with minimal delay.
  • VOICEMAIL → the gate discards buffered frames. The OnVoicemailDetected callback fires so your code can push a custom TTS message (e.g. “Please call us back at 555-0100”).
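The buffering behaviour of the TTS gate can be sketched as follows. This is an illustrative model, not Voxray's TTSGate: the `ttsGate` type and its methods are hypothetical, frames are reduced to byte slices, and the sketch assumes that frames arriving after the decision (such as a custom voicemail message) pass straight through.

```go
package main

// ttsGate is an illustrative buffer. It is not safe for concurrent use.
type ttsGate struct {
	decided bool
	buf     [][]byte
	out     func(frame []byte) // receives frames released downstream
}

// newTTSGate returns a gate that forwards released frames to out.
func newTTSGate(out func(frame []byte)) *ttsGate {
	return &ttsGate{out: out}
}

// Push buffers a frame until a decision is made; afterwards frames
// pass straight through to out.
func (g *ttsGate) Push(frame []byte) {
	if !g.decided {
		g.buf = append(g.buf, frame)
		return
	}
	g.out(frame)
}

// Decide resolves the gate once. conversation=true releases the buffer
// in arrival order; false discards it. Later calls are ignored.
func (g *ttsGate) Decide(conversation bool) {
	if g.decided {
		return
	}
	g.decided = true
	if conversation {
		for _, f := range g.buf {
			g.out(f)
		}
	}
	g.buf = nil
}
```

Making the decision one-shot means a late or duplicate classification cannot replay a greeting that was already released or discarded.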

Components

| Component | Package | Role |
| --- | --- | --- |
| VoicemailDetector | pkg/extensions/voicemail | Builds the parallel pipeline and TTS gate; exposes Detector() and Gate(). |
| ClassificationProcessor | pkg/extensions/voicemail | Aggregates LLM output tokens, fires notifiers on CONVERSATION/VOICEMAIL. |
| NotifierGate | pkg/extensions/voicemail | One-shot gate that opens or closes based on a notifier signal. |
| TTSGate | pkg/extensions/voicemail | Buffers TTS frames until a decision is made; releases or discards on signal. |
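A one-shot notifier of the kind these components rely on can be sketched in Go with sync.Once and a channel that is closed exactly once. The `notifier` type below is a hypothetical illustration of that pattern and is not taken from the Voxray source.

```go
package main

import "sync"

// notifier is a one-shot signal: it can fire once, and every waiter
// observes the same closed channel afterwards.
type notifier struct {
	once sync.Once
	done chan struct{}
}

// newNotifier returns a notifier that has not fired yet.
func newNotifier() *notifier {
	return &notifier{done: make(chan struct{})}
}

// Notify fires the signal. Repeated calls are safe and have no effect.
func (n *notifier) Notify() {
	n.once.Do(func() { close(n.done) })
}

// Done returns a channel that is closed once Notify has been called.
func (n *notifier) Done() <-chan struct{} {
	return n.done
}

// Fired reports, without blocking, whether Notify has been called.
func (n *notifier) Fired() bool {
	select {
	case <-n.done:
		return true
	default:
		return false
	}
}
```

Closing a channel rather than sending on it lets any number of gates wait on the same signal, and sync.Once guards against the panic a second close would cause.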

Pipeline position

Transport → Turn → STT → [Detector] → ContextAggregator → LLM → TTS → [Gate] → Transport
det.Detector() is the parallel pipeline node inserted after STT. det.Gate() is the TTS buffer inserted after TTS and before the transport sink.

Setup

1. Import the extension package

import (
    "log" // used by the callbacks in step 3

    "voxray-go/pkg/extensions/voicemail"
    "voxray-go/pkg/pipeline"
)
2. Create the detector

// A fast, cheap LLM is ideal for classification — latency matters here.
classifierLLM := services.NewLLMService(classifierCfg)

det := voicemail.NewVoicemailDetector(classifierLLM, 2.0)
// 2.0 = seconds of audio silence before the classifier runs
To customise the classification prompt:
customPrompt := "Listen to the following audio transcript. " +
    "Respond with exactly one word: CONVERSATION if a human answered, " +
    "VOICEMAIL if an answering machine answered. " +
    voicemail.ClassifierResponseInstruction

det := voicemail.NewVoicemailDetectorWithPrompt(classifierLLM, 2.0, customPrompt)
3. Register callbacks

det.OnConversationDetected(func() {
    log.Println("voicemail: human answered, conversation starting")
    // No action required — the TTS gate opens automatically.
})

det.OnVoicemailDetected(func() {
    log.Println("voicemail: answering machine detected")
    // Push a voicemail message or hang up.
    pipeline.Push(frames.NewTTSSpeakFrame("Hi, this is Alex from Acme. Please call us back at 555-0100. Goodbye."))
})
4. Add to pipeline

pl := pipeline.New()
pl.Add(transportSource)
pl.Add(turnProc)
pl.Add(sttProc)
pl.Add(det.Detector())    // parallel: conversation gate + classifier LLM
pl.Add(contextAggregator)
pl.Add(mainLLM)
pl.Add(ttsProc)
pl.Add(det.Gate())        // TTS gate: buffers until classification fires
pl.Add(transportSink)

Classifier prompt requirements

The classifier LLM must respond with exactly CONVERSATION or VOICEMAIL as a complete response. Any other output is ignored and classification stalls until a valid token appears.
Do not instruct the classifier to explain its reasoning or hedge its answer. A response like “Based on the audio, I believe this is a CONVERSATION” will not match. The prompt must produce a bare single-word response.
Append voicemail.ClassifierResponseInstruction to any custom prompt — it contains the exact constraint text Voxray expects.
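The effect of the exact-match rule can be shown with a small sketch. The `verdict` type and `classify` function are hypothetical, and trimming surrounding whitespace before comparing is an assumption; the point is only that a full sentence containing the keyword does not count as a match.

```go
package main

import "strings"

// verdict is a hypothetical result type for the classifier response.
type verdict int

const (
	verdictNone         verdict = iota // no valid response yet
	verdictConversation                // a human answered
	verdictVoicemail                   // an answering machine answered
)

// classify accepts only a bare CONVERSATION or VOICEMAIL response.
// Anything else, including a sentence that contains one of those words,
// yields verdictNone.
func classify(response string) verdict {
	switch strings.TrimSpace(response) {
	case "CONVERSATION":
		return verdictConversation
	case "VOICEMAIL":
		return verdictVoicemail
	default:
		return verdictNone
	}
}
```

Under these assumptions `classify("VOICEMAIL")` returns `verdictVoicemail`, while `classify("Based on the audio, I believe this is a CONVERSATION")` returns `verdictNone`, which is why the prompt has to forbid explanations.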

Integration with IVR

For outbound campaigns that encounter both voicemail screening and IVR systems, chain the two extensions: voicemail detection runs first, and its OnConversationDetected callback starts the IVR navigator:
det.OnConversationDetected(func() {
    // Human (or live transfer) answered. Begin IVR navigation.
    navigator := ivr.NewIVRNavigator(llm, "Reach the billing department", 2.0)
    navigator.OnConversationDetected(func(history []map[string]any) {
        // IVR cleared; real human on the line. Hand off to main agent.
    })
    navigator.OnIVRStatusChanged(func(status ivr.IVRStatus) {
        // Track IVR navigation progress.
    })
    // Insert navigator into the live pipeline or start a new sub-pipeline.
})

Frames reference

| Frame | Extension | Direction | Purpose |
| --- | --- | --- | --- |
| OutputDTMFUrgentFrame | IVR | Downstream | Emits a DTMF tone (0–9, *, #) to the telephony transport. |
| VADParamsUpdateFrame | IVR | Upstream | Adjusts Turn processor silence thresholds for IVR pacing. |
| LLMMessagesUpdateFrame | IVR | Upstream | Replaces LLM context and optionally triggers an immediate LLM run. |
| AggregatedTextFrame | IVR | Downstream | Non-tag LLM text forwarded to TTS after pattern aggregation. |
| LLMFullResponseStartFrame | Both | Downstream | Marks start of a full LLM response for aggregation by classifier. |
| LLMFullResponseEndFrame | Both | Downstream | Marks end of full response; flushes remaining aggregated text. |
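The role of the start and end frames in aggregation can be sketched as a loop that collects text tokens between each pair. The `frame` and `frameKind` types here are simplified stand-ins for the real frame types, and `aggregate` is a hypothetical helper operating on a slice instead of a live stream.

```go
package main

import "strings"

// frameKind distinguishes the three frame roles used in this sketch.
type frameKind int

const (
	responseStart frameKind = iota // stands in for LLMFullResponseStartFrame
	textToken                      // a streamed chunk of LLM text
	responseEnd                    // stands in for LLMFullResponseEndFrame
)

// frame is a simplified frame carrying a kind and optional text.
type frame struct {
	Kind frameKind
	Text string
}

// aggregate joins the text tokens between each start/end pair and returns
// one string per completed response. Tokens outside a pair, and a trailing
// response with no end frame, are dropped.
func aggregate(frames []frame) []string {
	var (
		out      []string
		sb       strings.Builder
		inFlight bool
	)
	for _, f := range frames {
		switch f.Kind {
		case responseStart:
			sb.Reset()
			inFlight = true
		case textToken:
			if inFlight {
				sb.WriteString(f.Text)
			}
		case responseEnd:
			if inFlight {
				out = append(out, sb.String())
				inFlight = false
			}
		}
	}
	return out
}
```

Waiting for the end frame before emitting is what allows a keyword split across tokens, such as "VOICE" followed by "MAIL", to be seen as one complete response.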

Event hooks reference

| Extension | Hook | Signature | When it fires |
| --- | --- | --- | --- |
| IVR | OnConversationDetected | func(history []map[string]any) | Classifier output contains conversation; IVR mode exits. |
| IVR | OnIVRStatusChanged | func(status IVRStatus) | Any IVR status tag seen: detected, completed, stuck, wait. |
| Voicemail | OnConversationDetected | func() | Classifier output is CONVERSATION; TTS gate opens. |
| Voicemail | OnVoicemailDetected | func() | Classifier output is VOICEMAIL; TTS gate discards buffer. |