Custom Vocabulary

Custom vocabulary improves speech recognition accuracy by biasing the speech engine toward words and phrases that you specify.

Use custom vocabulary to achieve higher accuracy in recognizing:

  • Proper nouns: product names, people's names (e.g. "Soniox", "Klemen")
  • Domain-specific words: medical terms, legal jargon (e.g. "zestoretic", "ombudsman")
  • Ambiguous words: same-sounding words (e.g. "Carrie" vs "Kerry")

Introduction

Custom vocabulary is specified with the SpeechContext object as part of a transcription request. The object contains a list of SpeechContextEntry objects, and each entry specifies a list of phrases (single words or multi-word phrases) and a boost value. Specifying a SpeechContext has two effects:

  1. All specified words and phrases are added to the system's vocabulary. This helps with recognizing out-of-vocabulary words.
  2. The bias specified by the boost parameter is applied to the associated words and phrases.
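
To make the structure concrete, here is a minimal sketch that models the two objects with plain Python dataclasses. The classes ContextEntry and Context are hypothetical stand-ins written for illustration, not the SDK's own types. Only the field names entries, phrases, and boost come from the examples on this page.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ContextEntry:
    """Hypothetical stand-in for SpeechContextEntry: phrases sharing one boost."""
    phrases: List[str]
    boost: Optional[float] = None  # None means no bias is assigned


@dataclass
class Context:
    """Hypothetical stand-in for SpeechContext: a list of entries."""
    entries: List[ContextEntry] = field(default_factory=list)


context = Context(
    entries=[
        # Both phrases in this entry receive the same boost.
        ContextEntry(phrases=["Soniox", "Klemen"], boost=10),
        # No boost: the word is only added to the vocabulary.
        ContextEntry(phrases=["ombudsman"]),
    ]
)

# asdict() converts the dataclasses into nested dicts and lists.
context_dict = asdict(context)
print(context_dict)
```

The resulting nested dictionary has the same shape as the object literal used in the JavaScript examples: a top-level entries list whose items each carry phrases and boost.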

Boost Parameter

The boost parameter specifies the amount of bias the speech system assigns to a particular word or phrase. The value can be positive or negative, making the speech model more or less likely to recognize the words or phrases. The allowed range is -30 to +30. If the boost parameter is not specified, no bias is assigned.

If you assign a boost value to a multi-word phrase, the boost is applied to the phrase in its entirety. For example, assigning a boost value to the multi-word phrase "this speech recognition works" makes recognition of the entire phrase more likely, not of the individual words within it. A phrase can contain at most 5 words.
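
The limits above can be checked on the client side before sending a request. The following is a minimal sketch, not part of the SDK: the helper check_entry is hypothetical, it assumes the -30 to +30 range is inclusive, and it assumes words are counted by splitting on whitespace.

```python
from typing import Iterable, List, Optional

MIN_BOOST = -30
MAX_BOOST = 30
MAX_WORDS_PER_PHRASE = 5


def check_entry(phrases: Iterable[str], boost: Optional[float] = None) -> List[str]:
    """Return a list of problems found; an empty list means the entry looks valid."""
    problems = []
    # An unspecified boost is allowed: no bias is assigned in that case.
    if boost is not None and not (MIN_BOOST <= boost <= MAX_BOOST):
        problems.append(f"boost {boost} is outside [{MIN_BOOST}, {MAX_BOOST}]")
    for phrase in phrases:
        # Assumption: a "word" is any whitespace-separated token.
        word_count = len(phrase.split())
        if word_count == 0:
            problems.append("empty phrase")
        elif word_count > MAX_WORDS_PER_PHRASE:
            problems.append(
                f"phrase {phrase!r} has {word_count} words "
                f"(max {MAX_WORDS_PER_PHRASE})"
            )
    return problems


print(check_entry(["this speech recognition works"], boost=10))
print(check_entry(["one two three four five six"], boost=45))
```

The first call reports no problems. The second reports both an out-of-range boost and a phrase that is too long.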

We recommend starting with a boost of 10 and adjusting the value experimentally for optimal results.
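
One way to run such an experiment is to try increasing boost values and stop at the first one that produces the expected phrase. The sketch below is illustrative only: find_lowest_working_boost is a hypothetical helper, and fake_transcribe is a stand-in that merely simulates a transcriber so the example can run without audio or an API key. In practice you would replace it with a function that builds a SpeechContext with the given boost and returns the transcript text.

```python
from typing import Callable, Iterable, Optional


def find_lowest_working_boost(
    transcribe: Callable[[float], str],
    phrase: str,
    boosts: Iterable[float] = (10, 15, 20, 25, 30),
) -> Optional[float]:
    """Return the first boost whose transcript contains the phrase, else None."""
    target = phrase.lower()
    for boost in boosts:
        transcript = transcribe(boost)
        if target in transcript.lower():
            return boost
    return None


# Stand-in transcriber for demonstration only: it pretends the word is
# recognized once the boost reaches 15. This is NOT real engine behavior.
def fake_transcribe(boost: float) -> str:
    if boost >= 15:
        return "Zestoretic is a medication ."
    return "Zest or etic is a medication ."


print(find_lowest_working_boost(fake_transcribe, "zestoretic"))  # prints 15
```

The search goes from low to high on the assumption that the smallest boost that works is the least likely to affect recognition of other words.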

Single Word

In this example, we specify and boost the words acetylcarnitine and zestoretic, which should result in significantly more accurate recognition of both words.

customization_single_word.py

# Create SpeechContext.
speech_context = SpeechContext(
    entries=[
        SpeechContextEntry(
            phrases=["acetylcarnitine"],
            boost=20,
        ),
        SpeechContextEntry(
            phrases=["zestoretic"],
            boost=20,
        )
    ]
)

# Pass SpeechContext with transcribe request.
result = transcribe_file_short(
    "../test_data/acetylcarnitine_zestoretic.flac", 
    client, 
    speech_context=speech_context
)

Run

python3 customization_single_word.py

Output

Acetylcarnitine is a molecule and zestoretic is a medication .

customization_single_word.js

// Create SpeechContext.
const speech_context = {
    entries: [
        {
            phrases: ["acetylcarnitine"],
            boost: 20,
        },
        {
            phrases: ["zestoretic"],
            boost: 20,
        }
    ]
};

// Pass SpeechContext with transcribe request.
const result = await speechClient.transcribeFileShort(
    "../test_data/acetylcarnitine_zestoretic.flac",
    { speech_context: speech_context }
);

Run

node customization_single_word.js

Output

Acetylcarnitine is a molecule and zestoretic is a medication .

Multi-Word Phrases

When specifying phrases, you can use custom vocabulary to resolve ambiguity between acoustically similar words.

customization_multi_word.py

# Create SpeechContext.
speech_context = SpeechContext(
    entries=[
        SpeechContextEntry(
            phrases=["carrie underwood", "kerry washington"],
            boost=10,
        )
    ]
)

# Pass SpeechContext with transcribe request.
result = transcribe_file_short(
    "../test_data/carrie_underwood_kerry_washington.flac", 
    client, 
    speech_context=speech_context
)

Run

python3 customization_multi_word.py

Output

Carrie Underwood and Kerry Washington .

customization_multi_word.js

// Create SpeechContext.
const speech_context = {
    entries: [
        {
            phrases: ["carrie underwood", "kerry washington"],
            boost: 10,
        }
    ]
};

// Pass SpeechContext with transcribe request.
const result = await speechClient.transcribeFileShort(
    "../test_data/carrie_underwood_kerry_washington.flac",
    { speech_context: speech_context }
);

Run

node customization_multi_word.js

Output

Carrie Underwood and Kerry Washington .

Expanding Vocabulary

Custom vocabulary automatically supports recognizing new or out-of-vocabulary words. In this example, speechology is a made-up word, but we can enable the system to recognize it by adding it to the SpeechContext with a sufficiently high boost value.

customization_new_word.py

# Create SpeechContext.
speech_context = SpeechContext(
    entries=[
        SpeechContextEntry(
            phrases=["speechology"],
            boost=10,
        )
    ]
)

# Pass SpeechContext with transcribe request.
result = transcribe_file_short(
    "../test_data/speechology.flac", client, speech_context=speech_context
)

Run

python3 customization_new_word.py

Output

Speechology is a made up word

customization_new_word.js

// Create SpeechContext.
const speech_context = {
    entries: [
        {
            phrases: ["speechology"],
            boost: 10,
        }
    ]
};

// Pass SpeechContext with transcribe request.
const result = await speechClient.transcribeFileShort(
    "../test_data/speechology.flac",
    { speech_context: speech_context }
);

Run

node customization_new_word.js

Output

Speechology is a made up word

Manage Vocabularies

To learn how to reuse an existing vocabulary and create multiple vocabularies, see Manage Vocabularies.
