IVR Domain Model

We offer an IVR speech model for applications that require capturing user data via voice. The IVR speech model recognizes and formats letters, digits, numbers, names, email addresses, phone numbers and zip codes.

The IVR speech model is based on the precision model to achieve the highest level of accuracy.

Endpoint Detection

The IVR speech model can also detect the end of an utterance in real-time, low-latency settings. This lets you build interactive voice assistants that respond quickly to your users. To enable endpoint detection, set enable_endpoint_detection to true. When an endpoint is detected, the response contains a word whose Word.text is <end>.
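Because the endpoint arrives as an ordinary word, an application can split the word stream into utterances by watching for that marker. The sketch below works on plain strings (the Word.text values you have already collected). The helper name split_utterances is hypothetical and not part of the Soniox SDK.

```python
from typing import Iterable, Iterator, List

END_TOKEN = "<end>"  # endpoint marker carried in Word.text


def split_utterances(texts: Iterable[str]) -> Iterator[List[str]]:
    """Group a stream of final word texts into utterances.

    Each time the endpoint marker appears, the words collected so far
    (without the marker itself) are yielded as one utterance.
    """
    current: List[str] = []
    for text in texts:
        if text == END_TOKEN:
            yield current
            current = []
        else:
            current.append(text)
    # Words after the last marker are not yielded: no endpoint has
    # closed that utterance yet.


stream = ["What", "is", "your", "name", "?", "<end>",
          "I'm", "a", "Mario", "Chavez", ".", "<end>"]
for utterance in split_utterances(stream):
    print(" ".join(utterance))
# prints:
# What is your name ?
# I'm a Mario Chavez .
```

Words that follow the last <end> are held back rather than emitted, which matches the idea that only an endpoint closes an utterance.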

In streaming mode, each response consists of final words followed by non-final words (see Final vs Non-Final Words for more info). When an endpoint is detected, all non-final words become final. This simplifies building voice applications: you can apply your application logic to final words only and ignore non-final words.
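A minimal sketch of this pattern follows, assuming each response carries the words that became final since the previous response, followed by the current non-final words. The Word dataclass with an is_final field and the UtteranceTracker class are hypothetical stand-ins, not the SDK's own types. Final words are accumulated; non-final words are only kept as a replaceable preview (for example, for live captions).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    # Hypothetical stand-in for a streamed word; the real response
    # object may expose these fields under different names.
    text: str
    is_final: bool


class UtteranceTracker:
    """Accumulate final words; treat non-final words as a disposable preview."""

    def __init__(self) -> None:
        self.final: List[str] = []      # confirmed words of the current utterance
        self.preview: List[str] = []    # latest non-final words (may still change)
        self.completed: List[str] = []  # finished utterances

    def on_response(self, words: List[Word]) -> None:
        # Non-final words are replaced on every response, never accumulated.
        self.preview = [w.text for w in words if not w.is_final]
        for w in words:
            if not w.is_final:
                continue
            if w.text == "<end>":
                # Endpoint: everything collected so far is one utterance.
                self.completed.append(" ".join(self.final))
                self.final = []
            else:
                self.final.append(w.text)


tracker = UtteranceTracker()
tracker.on_response([Word("What", True), Word("is", False)])
tracker.on_response([Word("is", True), Word("your", False), Word("name", False)])
# At the endpoint, the remaining non-final words arrive as final words.
tracker.on_response([Word("your", True), Word("name", True),
                     Word("?", True), Word("<end>", True)])
print(tracker.completed)  # ['What is your name ?']
```

Application logic (here, appending to completed) runs only on final words, so a non-final guess that is later revised never reaches it.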

Example

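In the example below, we use the precision_ivr model with enable_endpoint_detection in streaming mode. Because include_nonfinal is set to false, only final words are returned. We wait for the model to detect an endpoint and then print all final words received up to that point, including the <end> marker. We simulate the stream by reading a file in small chunks.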

configure_model_ivr.py

from typing import Iterable
from soniox.transcribe_live import transcribe_stream
from soniox.speech_service import SpeechClient, set_api_key

set_api_key("<YOUR-API-KEY>")


def iter_audio() -> Iterable[bytes]:
    # Simulate a live stream by yielding the file in 1024-byte chunks.
    with open("../test_data/test_audio_ivr.flac", "rb") as fh:
        while True:
            audio = fh.read(1024)
            if len(audio) == 0:
                break
            yield audio


def main():
    with SpeechClient() as client:
        # Final words collected since the last endpoint.
        words = []
        for result in transcribe_stream(
            iter_audio(),
            client,
            model="precision_ivr",
            enable_endpoint_detection=True,
            include_nonfinal=False,  # return final words only
        ):
            for word in result.words:
                words.append(word)
                # "<end>" marks a detected endpoint: print the utterance and reset.
                if word.text == "<end>":
                    print(" ".join(w.text for w in words))
                    words = []


if __name__ == "__main__":
    main()

Run

python3 configure_model_ivr.py

Output

Audio: what is your name
Words: What is your name ? <end> 

Audio: i'm a mario chavez
Words: I'm a Mario Chavez . <end> 

Audio: what is your zip code
Words: What is your zip code ? <end> 

Audio: nine four four o four
Words: 94404 . <end> 

Audio: what is your email address
Words: What is your email address ? <end> 

Audio: my email is m a r i o dot c h a v e z at gmail dot com
Words: My email is mario.chavez@gmail.com . <end>
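The output above shows why the formatting matters for IVR flows: the spoken zip code arrives as the single word 94404 and the spelled-out email as the single word mario.chavez@gmail.com. A minimal sketch of pulling those values out of a finished utterance follows, assuming (as in the output) that each formatted value is one word. The helper names and regular expressions are illustrative; the email pattern is deliberately loose and is not a full RFC 5322 validator.

```python
import re
from typing import List, Optional

ZIP_RE = re.compile(r"\d{5}")
# Loose illustrative pattern; not a complete email validator.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")


def _first_match(pattern: "re.Pattern[str]", words: List[str]) -> Optional[str]:
    # Return the first word that matches the whole pattern, if any.
    for word in words:
        if pattern.fullmatch(word):
            return word
    return None


def extract_zip(words: List[str]) -> Optional[str]:
    return _first_match(ZIP_RE, words)


def extract_email(words: List[str]) -> Optional[str]:
    return _first_match(EMAIL_RE, words)


print(extract_zip(["94404", "."]))  # 94404
print(extract_email(["My", "email", "is", "mario.chavez@gmail.com", "."]))  # mario.chavez@gmail.com
print(extract_zip(["I'm", "a", "Mario", "Chavez", "."]))  # None
```

Returning None when nothing matches lets the voice application re-prompt the caller instead of storing a malformed value.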

configure_model_ivr.js

const fs = require("fs");
const { SpeechClient } = require("@soniox/soniox-node");

// Do not forget to set your Soniox API key.
const speechClient = new SpeechClient();

(async function () {
    const words = [];

    // Collect final words; print and reset each time an endpoint is detected.
    const onDataHandler = async (result) => {
        for (const word of result.words) {
            words.push(word);
            if (word.text === "<end>") {
                console.log(words.map((w) => w.text).join(" "));
                words.length = 0;
            }
        }
    };

    const onEndHandler = (error) => {
        if (error) {
            console.log("Error!", error);
        }
    };

    const stream = speechClient.transcribeStream(
        {
            model: "precision_ivr",
            enable_endpoint_detection: true,
            include_nonfinal: false,
        },
        onDataHandler,
        onEndHandler
    );

    const CHUNK_SIZE = 1024;
    const readable = fs.createReadStream("../test_data/test_audio_ivr.flac", {
        highWaterMark: CHUNK_SIZE,
    });

    for await (const chunk of readable) {
        await stream.writeAsync(chunk);
    }

    stream.end();
})();

Run

node configure_model_ivr.js

Output

Audio: what is your name
Words: What is your name ? <end> 

Audio: i'm a mario chavez 
Words: I'm a Mario Chavez . <end> 

Audio: what is your zip code
Words: What is your zip code ? <end> 

Audio: nine four four o four
Words: 94404 . <end> 

Audio: what is your email address
Words: What is your email address ? <end> 

Audio: my email is m a r i o dot c h a v e z at gmail dot com
Words: My email is mario.chavez@gmail.com . <end>
