IVR Domain Model
We offer an IVR speech model for applications that require capturing user data via voice. The IVR speech model recognizes and formats letters, digits, numbers, names, email addresses, phone numbers and zip codes.
Endpoint Detection
The IVR speech model can also detect the end of an utterance in real time, with low latency.
This lets you build interactive voice assistants that respond quickly to your users.
You can enable endpoint detection by setting enable_endpoint_detection to true.
When an endpoint is detected, the response contains a word whose Word.text equals <end>.
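Because the endpoint arrives as an ordinary word whose text is <end>, grouping words into utterances is a plain scan over the word stream. The following is a minimal sketch that works on plain strings rather than on the client's Word objects. split_utterances and END_TOKEN are hypothetical names introduced here, and unlike the example further down, this sketch drops the <end> marker from each utterance.

```python
from typing import Iterable, List

END_TOKEN = "<end>"  # text of the endpoint word, as described above


def split_utterances(texts: Iterable[str]) -> List[List[str]]:
    """Group word texts into utterances, closing one at each <end> word.

    Words after the last <end> are returned as a trailing, still-open
    utterance so that nothing is silently dropped.
    """
    utterances: List[List[str]] = []
    current: List[str] = []
    for text in texts:
        if text == END_TOKEN:
            # Endpoint detected: close the current utterance.
            utterances.append(current)
            current = []
        else:
            current.append(text)
    if current:
        # Keep any words that arrived after the last endpoint.
        utterances.append(current)
    return utterances


print(split_utterances(["What", "is", "your", "name", "?", "<end>", "94404", ".", "<end>"]))
# [['What', 'is', 'your', 'name', '?'], ['94404', '.']]
```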
In streaming mode, the responses consist of final words followed by non-final words (see Final vs Non-Final Words for more info). When an endpoint is detected, all non-final words become final. This simplifies building voice applications, since you can apply your application logic to final words only and ignore non-final words.
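To illustrate this final/non-final handling, here is a minimal sketch that keeps a committed list of final words and a replaceable tail of non-final words. It assumes that each response carries newly finalized words (each delivered only once) followed by the current non-final words. The Word dataclass with an is_final field and the TranscriptTracker class are hypothetical stand-ins, not the Soniox client's own types.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    # Hypothetical stand-in for the service's Word; is_final is an assumption.
    text: str
    is_final: bool


class TranscriptTracker:
    """Commit final words; keep only the latest non-final tail for display."""

    def __init__(self) -> None:
        self.final: List[str] = []
        self.nonfinal: List[str] = []

    def update(self, words: List[Word]) -> None:
        # Final words are assumed never to be revised, so append them once.
        self.final.extend(w.text for w in words if w.is_final)
        # Non-final words may change in the next response: replace, don't append.
        self.nonfinal = [w.text for w in words if not w.is_final]

    def display(self) -> str:
        # What a live caption would show: committed words plus the current guess.
        return " ".join(self.final + self.nonfinal)


tracker = TranscriptTracker()
tracker.update([Word("what", True), Word("is", False)])
tracker.update([Word("is", True), Word("your", False), Word("nam", False)])
print(tracker.display())  # what is your nam
tracker.update([Word("your", True), Word("name", True), Word("<end>", True)])
print(tracker.final)  # ['what', 'is', 'your', 'name', '<end>']
```

Once the <end> word arrives, every word is final, so the non-final tail is empty and the final list alone holds the complete utterance. Application logic can therefore read only from final.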
Example
In the example below, we use the precision_ivr model with enable_endpoint_detection in streaming mode.
Each time the model detects an endpoint, we print all final words received since the previous endpoint, including the <end> word itself.
We simulate the stream by reading a file in small chunks.
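The chunked read that simulates the stream can be written as a small generator that accepts any binary file object, which makes it easy to test with in-memory data. This is a sketch: iter_chunks is a hypothetical helper, and the 1024-byte default simply mirrors the chunk size used in the example.

```python
import io
from typing import BinaryIO, Iterator


def iter_chunks(fh: BinaryIO, chunk_size: int = 1024) -> Iterator[bytes]:
    """Yield successive chunks of at most chunk_size bytes until EOF."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = fh.read(chunk_size)
        if not chunk:  # read() returns b"" at end of file
            break
        yield chunk


# Exercise it on in-memory bytes instead of a real audio file.
chunks = list(iter_chunks(io.BytesIO(b"x" * 2500), chunk_size=1024))
print([len(c) for c in chunks])  # [1024, 1024, 452]
```

An equivalent one-liner is iter(functools.partial(fh.read, 1024), b""), which keeps calling fh.read until it returns the empty-bytes sentinel.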
from typing import Iterable

from soniox.transcribe_live import transcribe_stream
from soniox.speech_service import SpeechClient, set_api_key

set_api_key("<YOUR-API-KEY>")


def iter_audio() -> Iterable[bytes]:
    # Simulate a live stream by reading the file in 1024-byte chunks.
    with open("../test_data/test_audio_ivr.flac", "rb") as fh:
        while True:
            audio = fh.read(1024)
            if len(audio) == 0:
                break
            yield audio


def main():
    with SpeechClient() as client:
        words = []
        for result in transcribe_stream(
            iter_audio(),
            client,
            model="precision_ivr",
            enable_endpoint_detection=True,
            include_nonfinal=False,  # receive final words only
        ):
            for word in result.words:
                words.append(word)
                # The <end> word marks a detected endpoint: print the
                # utterance collected so far and start a new one.
                if word.text == "<end>":
                    print(" ".join(w.text for w in words))
                    words = []


if __name__ == "__main__":
    main()
Run
python3 configure_model_ivr.py
Output (each Audio line describes the spoken input; the Words line after it is what the program prints)
Audio: what is your name
Words: What is your name ? <end>
Audio: i'm a mario chavez
Words: I'm a Mario Chavez . <end>
Audio: what is your zip code
Words: What is your zip code ? <end>
Audio: nine four four o four
Words: 94404 . <end>
Audio: what is your email address
Words: What is your email address ? <end>
Audio: my email is m a r i o dot c h a v e z at gmail dot com
Words: My email is mario.chavez@gmail.com . <end>
The same example in Node.js:
const fs = require("fs");
const { SpeechClient } = require("@soniox/soniox-node");

// Do not forget to set your Soniox API key.
const speechClient = new SpeechClient();

(async function () {
  const words = [];

  // Called for every response; with include_nonfinal: false, only final words arrive.
  const onDataHandler = async (result) => {
    for (const word of result.words) {
      words.push(word);
      // The <end> word marks a detected endpoint.
      if (word.text === "<end>") {
        console.log(words.map((w) => w.text).join(" "));
        words.length = 0; // clear the array in place for the next utterance
      }
    }
  };

  const onEndHandler = (error) => {
    if (error) {
      console.log("Error!", error);
    }
  };

  const stream = speechClient.transcribeStream(
    {
      model: "precision_ivr",
      enable_endpoint_detection: true,
      include_nonfinal: false,
    },
    onDataHandler,
    onEndHandler
  );

  // Simulate a live stream by reading the file in 1024-byte chunks.
  const CHUNK_SIZE = 1024;
  const readable = fs.createReadStream("../test_data/test_audio_ivr.flac", {
    highWaterMark: CHUNK_SIZE,
  });
  for await (const chunk of readable) {
    await stream.writeAsync(chunk);
  }
  stream.end();
})();
Run
node configure_model_ivr.js
Output
Audio: what is your name
Words: What is your name ? <end>
Audio: i'm a mario chavez
Words: I'm a Mario Chavez . <end>
Audio: what is your zip code
Words: What is your zip code ? <end>
Audio: nine four four o four
Words: 94404 . <end>
Audio: what is your email address
Words: What is your email address ? <end>
Audio: my email is m a r i o dot c h a v e z at gmail dot com
Words: My email is mario.chavez@gmail.com . <end>
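Both programs join word texts with spaces, which is why the output shows a space before punctuation and ends with <end>. If you need clean text, for example to read a captured value back to the caller, a small post-processing step can drop the marker and attach punctuation to the preceding word. This sketch assumes punctuation arrives as separate words, as in the output above; render and the ATTACH_LEFT set are hypothetical and may need adjusting for your data.

```python
from typing import Iterable

END_TOKEN = "<end>"
# Punctuation assumed to attach to the preceding word (adjust as needed).
ATTACH_LEFT = {".", ",", "?", "!", ":", ";"}


def render(texts: Iterable[str]) -> str:
    """Join word texts into a sentence, dropping <end> and tightening punctuation."""
    out = ""
    for text in texts:
        if text == END_TOKEN:
            continue  # the endpoint marker is not part of the spoken text
        if not out or text in ATTACH_LEFT:
            out += text  # first word, or punctuation glued to the previous word
        else:
            out += " " + text
    return out


print(render(["What", "is", "your", "name", "?", "<end>"]))  # What is your name?
print(render(["94404", ".", "<end>"]))  # 94404.
```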