IVR Domain Model
We offer an IVR speech model for applications that require capturing user data via voice. The IVR speech model recognizes and formats letters, digits, numbers, names, email addresses, phone numbers and zip codes.
Endpoint Detection
The IVR speech model can also detect the end of an utterance in real time and with low latency.
This enables you to build interactive voice assistants that can quickly respond to your users.
You can enable endpoint detection by setting the enable_endpoint_detection TranscriptionConfig field to true. When an endpoint is detected, the response will contain a word with Word.text equal to <end>.
In streaming mode, each response consists of final words followed by non-final words (see Final vs Non-Final Words). When an endpoint is detected, all words up to and including <end> are returned as final. This simplifies building voice applications, since you can apply your application logic to final words only and ignore non-final words.
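The rule above (act only on final words, and treat <end> as the utterance boundary) can be sketched without calling the service at all. This is a minimal illustration, not the library's API: the Word dataclass, its is_final field, and the split_utterances helper are hypothetical stand-ins introduced for this example, and the responses are simulated by hand.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List

END_TOKEN = "<end>"


@dataclass
class Word:
    """Hypothetical stand-in for a transcribed word (assumed fields only)."""
    text: str
    is_final: bool = True


def split_utterances(responses: Iterable[List[Word]]) -> Iterator[str]:
    """Yield one string per completed utterance, using final words only."""
    pending: List[str] = []
    for words in responses:
        for word in words:
            if not word.is_final:
                # Non-final words may still change, so they are ignored.
                continue
            if word.text == END_TOKEN:
                # Endpoint reached: emit the utterance and start a new one.
                yield " ".join(pending)
                pending = []
            else:
                pending.append(word.text)


# Simulated responses: final words first, then non-final words.
responses = [
    [Word("What"), Word("is"), Word("your", is_final=False)],
    [Word("your"), Word("name"), Word("?"), Word(END_TOKEN)],
    [Word("94404"), Word("."), Word(END_TOKEN), Word("What", is_final=False)],
]
utterances = list(split_utterances(responses))
print(utterances)
```

Because non-final words are skipped, the tentative "your" in the first response is not counted twice when it arrives as final in the second one.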
Example
In this example, we use the en_precision_ivr model and enable_endpoint_detection.
We wait for the model to detect an endpoint and then print out all final words.
We simulate the stream by reading a file in small chunks.
This example builds on the one in Transcribe Streams, so it is recommended to first get familiar with that one.
from typing import Iterable

from soniox.transcribe_live import transcribe_stream
from soniox.speech_service import SpeechClient


def iter_audio() -> Iterable[bytes]:
    # Simulate a stream by reading the file in small chunks.
    with open("../test_data/test_audio_ivr.flac", "rb") as fh:
        while True:
            audio = fh.read(1024)
            if len(audio) == 0:
                break
            yield audio


# Do not forget to set your API key in the SONIOX_API_KEY environment variable.
def main():
    with SpeechClient() as client:
        words = []
        for result in transcribe_stream(
            iter_audio(),
            client,
            model="en_precision_ivr",
            enable_endpoint_detection=True,
            include_nonfinal=False,
        ):
            for word in result.words:
                words.append(word)
                if word.text == "<end>":
                    # Endpoint detected: print the utterance and start a new one.
                    print(" ".join(w.text for w in words))
                    words = []


if __name__ == "__main__":
    main()
Run
python3 configure_model_ivr.py
Output
What is your name ? <end>
I'm a Mario Chavez . <end>
What is your zip code ? <end>
94404 . <end>
What is your email address ? <end>
My email is mario.chavez@gmail.com . <end>
const fs = require("fs");
const { SpeechClient } = require("@soniox/soniox-node");

// Do not forget to set your API key in the SONIOX_API_KEY environment variable.
const speechClient = new SpeechClient();

(async function () {
  const words = [];

  const onDataHandler = async (result) => {
    for (const word of result.words) {
      words.push(word);
      if (word.text == "<end>") {
        // Endpoint detected: print the utterance and start a new one.
        console.log(words.map((word) => word.text).join(" "));
        words.length = 0;
      }
    }
  };

  const onEndHandler = (error) => {
    if (error) {
      console.log(`Transcription error: ${error}`);
    }
  };

  const stream = speechClient.transcribeStream(
    {
      model: "en_precision_ivr",
      enable_endpoint_detection: true,
      include_nonfinal: false,
    },
    onDataHandler,
    onEndHandler
  );

  // Simulate a stream by reading the file in small chunks.
  const CHUNK_SIZE = 1024;
  const readable = fs.createReadStream("../test_data/test_audio_ivr.flac", {
    highWaterMark: CHUNK_SIZE,
  });

  for await (const chunk of readable) {
    await stream.writeAsync(chunk);
  }

  stream.end();
})();
Run
node configure_model_ivr.js
Output
What is your name ? <end>
I'm a Mario Chavez . <end>
What is your zip code ? <end>
94404 . <end>
What is your email address ? <end>
My email is mario.chavez@gmail.com . <end>