IVR Domain Model

We offer an IVR speech model for applications that require capturing user data via voice. The IVR speech model recognizes and formats letters, digits, numbers, names, email addresses, phone numbers and zip codes.

Endpoint Detection

The IVR speech model can also detect the end of an utterance in real-time, low-latency settings. This lets you build interactive voice assistants that respond quickly to your users. To enable endpoint detection, set the enable_endpoint_detection field of TranscriptionConfig to true. When an endpoint is detected, the response contains a word whose Word.text equals <end>.

In streaming mode, each response consists of final words followed by non-final words (see Final vs Non-Final Words). When an endpoint is detected, all words up to and including <end> are returned as final. This simplifies building voice applications, since you can apply your application logic to final words only and ignore non-final words.
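The final-words-only logic described above can be sketched without the client library. This is a minimal illustration, not the library's API: the Word dataclass and its is_final flag are hypothetical stand-ins for the word objects in a response, and iter_utterances is a helper name introduced here. It skips non-final words and emits one utterance each time an <end> word arrives.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List

END_TOKEN = "<end>"


@dataclass
class Word:
    # Hypothetical stand-in for a word in a streaming response.
    text: str
    is_final: bool


def iter_utterances(responses: Iterable[List[Word]]) -> Iterator[str]:
    """Yield one string per utterance, built from final words only."""
    pending: List[str] = []
    for words in responses:
        for word in words:
            if not word.is_final:
                # Non-final words may still change, so ignore them.
                continue
            if word.text == END_TOKEN:
                # Endpoint detected: emit the utterance and start a new one.
                yield " ".join(pending)
                pending = []
            else:
                pending.append(word.text)


# Simulated responses: final words first, then non-final words.
responses = [
    [Word("What", True), Word("is", False)],
    [Word("is", True), Word("your", True), Word("name", False)],
    [Word("name", True), Word("?", True), Word("<end>", True)],
    [Word("94404", True), Word(".", True), Word("<end>", True)],
]
utterances = list(iter_utterances(responses))
print(utterances)
```

Because non-final words are repeated in later responses until they become final, dropping them avoids counting the same word twice.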

Example

In this example, we use the en_precision_ivr model and enable_endpoint_detection. We wait for the model to detect an endpoint and then print out all final words. We simulate the stream by reading a file in small chunks.

This example builds on the one in Transcribe Streams, so we recommend getting familiar with that one first.

configure_model_ivr.py

from typing import Iterable
from soniox.transcribe_live import transcribe_stream
from soniox.speech_service import SpeechClient


def iter_audio() -> Iterable[bytes]:
    # Simulate a live stream by yielding the file in 1024-byte chunks.
    with open("../test_data/test_audio_ivr.flac", "rb") as fh:
        while True:
            audio = fh.read(1024)
            if len(audio) == 0:
                break
            yield audio


# Do not forget to set your API key in the SONIOX_API_KEY environment variable.
def main():
    with SpeechClient() as client:
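        # Final words of the utterance currently being spoken.
        words = []
        for result in transcribe_stream(
            iter_audio(),
            client,
            model="en_precision_ivr",
            enable_endpoint_detection=True,
            include_nonfinal=False,
        ):
            for word in result.words:
                words.append(word)
                # "<end>" marks a detected endpoint: print the utterance
                # and start collecting the next one.
                if word.text == "<end>":
                    print(" ".join(w.text for w in words))
                    words = []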


if __name__ == "__main__":
    main()

Run

python3 configure_model_ivr.py

Output

What is your name ? <end>
I'm a Mario Chavez . <end>
What is your zip code ? <end>
94404 . <end>
What is your email address ? <end>
My email is mario.chavez@gmail.com . <end>
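In the output above, punctuation appears as separate tokens and each utterance ends with the <end> marker. If your application needs display-ready text, a small post-processing step can remove both. The following is a sketch only: clean_utterance is a helper name introduced here, not part of the Soniox client, and the punctuation set it handles is an assumption.

```python
import re

END_TOKEN = "<end>"


def clean_utterance(line: str) -> str:
    """Drop the <end> marker and the space before punctuation tokens."""
    text = line.replace(END_TOKEN, "").strip()
    # Assumed punctuation set; extend it if your transcripts contain more.
    return re.sub(r"\s+([.,?!])", r"\1", text)


lines = [
    "What is your zip code ? <end>",
    "94404 . <end>",
    "My email is mario.chavez@gmail.com . <end>",
]
cleaned = [clean_utterance(line) for line in lines]
for line in cleaned:
    print(line)
```

The dot inside the email address is left untouched because the pattern only matches punctuation that follows whitespace.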