5. Speech Adaptation

Overview

Speech adaptation helps the system recognize specific words and phrases by biasing recognition toward them. Suppose that your audio data often includes the word "weather" and you want the system to recognize "weather" more often than "whether." In this case, you can use speech adaptation to increase the likelihood of recognizing "weather."

Speech adaptation is particularly useful for the following use cases:

  • Improving the recognition of words and phrases that occur frequently in your audio data.
  • Expanding the vocabulary of words recognized by the Soniox speech recognition system.
  • Improving recognition of noisy audio and unclear speech.

Introduction

Speech adaptation is specified with a SpeechContext object passed as part of a transcription request. The SpeechContext contains a list of SpeechContextEntry objects; each entry specifies a list of phrases (a phrase may be a single word) and a boost value. Specifying a SpeechContext has two effects:

  1. The system ensures that every word in the SpeechContext is in its vocabulary. This helps with recognizing out-of-vocabulary words.
  2. The system applies the bias specified by the boost parameter to the associated phrases.
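To get a feel for the first effect, the sketch below collects the distinct words that appear across all phrases. It uses plain (phrases, boost) tuples as a stand-in for SpeechContextEntry objects, and the helper name context_words is hypothetical, not part of the Soniox library. Splitting phrases on whitespace is also an assumption about how words are delimited.

```python
def context_words(entries):
    """Collect the unique words across all phrases in all entries.

    entries is a list of (phrases, boost) tuples, a plain-Python stand-in
    for SpeechContextEntry objects (illustrative only).
    """
    words = set()
    for phrases, _boost in entries:
        for phrase in phrases:
            # Assumption: words within a phrase are separated by whitespace.
            words.update(phrase.split())
    return words


entries = [
    (["weather is hot", "weather is really hot"], 10),
    (["speechology"], 10),
]
print(sorted(context_words(entries)))
# ['hot', 'is', 'really', 'speechology', 'weather']
```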

Boost Parameter

The boost parameter specifies the amount of bias the speech system assigns to a particular word or phrase. A positive boost makes the speech model more likely to recognize the word or phrase; a negative boost makes it less likely. Allowed values range from -30 to +30. If the boost parameter is not specified, no bias is assigned.

If you assign a boost value to a multi-word phrase, the boost is applied to the entire phrase and only to the entire phrase. For example, assigning a boost value to the multi-word phrase "this speech recognition works" makes the system more likely to recognize the phrase in its entirety. The maximum length of a phrase is 5 words.
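The limits above (boost between -30 and +30, at most 5 words per phrase) can be checked on the client side before sending a request. The following is a minimal sketch of such a check; validate_entry and the constant names are hypothetical helpers, not part of the Soniox library, and counting words by splitting on whitespace is an assumption.

```python
MIN_BOOST = -30
MAX_BOOST = 30
MAX_PHRASE_WORDS = 5


def validate_entry(phrases, boost=None):
    """Return a list of problems found in one entry (empty if none)."""
    problems = []
    # An unspecified boost (None) means no bias, so there is nothing to check.
    if boost is not None and not MIN_BOOST <= boost <= MAX_BOOST:
        problems.append(f"boost {boost} is outside [{MIN_BOOST}, {MAX_BOOST}]")
    for phrase in phrases:
        # Assumption: words are counted by splitting on whitespace.
        n_words = len(phrase.split())
        if n_words > MAX_PHRASE_WORDS:
            problems.append(
                f"phrase {phrase!r} has {n_words} words (max {MAX_PHRASE_WORDS})")
    return problems


print(validate_entry(["this speech recognition works"], boost=10))  # []
print(validate_entry(["one two three four five six"], boost=45))
```

The second call reports two problems: the boost is out of range and the phrase is six words long.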

We recommend testing different boost values on example speech data and choosing the value that gives the best recognition results.
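One way to run such a test is to transcribe a few reference recordings at several boost values and compare word error rates. The sketch below assumes you supply a function transcribe_with_boost(audio, boost) that returns the recognized text; that function, pick_boost, and word_error_rate are hypothetical names for illustration and are not part of the Soniox library. The demo at the end uses a stand-in transcriber rather than a real service call.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1] / max(len(ref), 1)


def pick_boost(transcribe_with_boost, samples, candidates):
    """Return (best_boost, mean_wer) over the candidate boost values.

    transcribe_with_boost(audio, boost) -> str is supplied by the caller.
    samples is a list of (audio, reference_text) pairs.
    """
    scores = {}
    for boost in candidates:
        wers = [word_error_rate(ref, transcribe_with_boost(audio, boost))
                for audio, ref in samples]
        scores[boost] = sum(wers) / len(wers)
    best = min(scores, key=scores.get)
    return best, scores[best]


# Stand-in transcriber for demonstration only: pretends that a boost of
# at least 10 is needed before "weather" wins over "whether".
def fake_transcribe(audio, boost):
    return "the weather is hot" if boost >= 10 else "the whether is hot"


samples = [("sample.flac", "the weather is hot")]
print(pick_boost(fake_transcribe, samples, [0, 5, 10, 15]))  # (10, 0.0)
```

Ties go to the first candidate with the lowest error, so listing candidates in increasing order selects the smallest boost that achieves the best result.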

Single Word

from soniox.speech_service import SpeechContext, SpeechContextEntry

# Create the SpeechContext.
speech_context = SpeechContext(entries=[
    SpeechContextEntry(
        phrases=["weather"],
        boost=10,
    ),
    SpeechContextEntry(
        phrases=["whether"],
        boost=-2,
    )
])

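# Provide the SpeechContext with the transcribe request.
# All transcribe functions support SpeechContext.
# Note: transcribe_file_short, client, and TEST_AUDIO_FLAC are assumed to be
# imported or created earlier; they are not defined in this snippet.
result = transcribe_file_short(
    TEST_AUDIO_FLAC, client, speech_context=speech_context)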

In this example, we create a SpeechContext object containing two SpeechContextEntry objects that bias the recognition of "weather" with a boost of 10 and "whether" with a boost of -2.

Multi-Word Phrases

speech_context = SpeechContext(entries=[
    SpeechContextEntry(
        phrases=["weather is hot", "weather is really hot"],
        boost=10,
    )
])

Expanding Vocabulary

speech_context = SpeechContext(entries=[
    SpeechContextEntry(
        phrases=["speechology"],
        boost=10,
    )
])

In this example, "speechology" is a made-up word. However, we can enable the system to recognize it by adding it to the SpeechContext with a sufficiently high boost value.