Customization#
Customization improves speech recognition accuracy by biasing the speech engine toward words and phrases that you specify.
Use customization to achieve higher accuracy in recognizing:
Proper nouns: product names, people’s names (e.g. “Soniox”, “Klemen”)
Domain-specific words: medical terms, legal jargon (e.g. “zestoretic”, “ombudsman”)
Ambiguous words: same-sounding words (e.g. “Carrie” vs “Kerry”)
Introduction#
To use customization, set the speech_context field of TranscriptionConfig to a SpeechContext object.
A SpeechContext contains a list of SpeechContextEntry objects. Each entry specifies a list of phrases (single words or multi-word phrases) and a boost value.
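The structure described above can be pictured as plain data. The following is a minimal sketch, not part of the client library: build_speech_context is a hypothetical helper, and the dictionary it returns only mirrors the entries, phrases, and boost field names. It groups phrases that share a boost value into a single entry.

```python
from collections import defaultdict


def build_speech_context(boosts):
    """Group phrases by boost value into a speech-context-shaped dict.

    `boosts` maps each phrase to its boost value. This helper is
    hypothetical and only illustrates the entries/phrases/boost layout.
    """
    grouped = defaultdict(list)
    for phrase, boost in boosts.items():
        grouped[boost].append(phrase)
    return {
        "entries": [
            {"phrases": phrases, "boost": boost}
            for boost, phrases in grouped.items()
        ]
    }


context = build_speech_context({
    "Carrie Underwood": 20,
    "Kerry Washington": 20,
    "Zestoretic": 15,
})
print(context)
```

Phrases with the same boost end up in one entry, so the example yields two entries: one with both names at boost 20 and one with Zestoretic at boost 15.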
Boost Parameter#
The boost parameter specifies the amount of bias the speech system assigns to a particular word or phrase. The boost value can be positive or negative, to make the speech model more or less likely to recognize the word or phrase. The supported range of boost values is -50 to 50; values outside this range will be clipped automatically.
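The clipping behavior can be illustrated locally. This is only a sketch of the documented range: the service applies the clipping itself, and clip_boost is a hypothetical helper name, not a library function.

```python
MIN_BOOST = -50
MAX_BOOST = 50


def clip_boost(boost):
    """Return `boost` limited to the supported range of -50 to 50."""
    return max(MIN_BOOST, min(MAX_BOOST, boost))


for requested in (15, 75, -120):
    # Values inside the range pass through; others snap to the nearest limit.
    print(requested, "->", clip_boost(requested))
```

Clamping values on the client side like this is optional, but it makes the effective boost visible before a request is sent.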
To better recognize a word or phrase, we recommend starting with a boost value of 15 and increasing it if needed. The boost value might need to be different for the default and low-latency models.
If boost is 0 (which is the default if not specified), the speech context entry will have no effect.
On the other hand, an excessively high or low (negative) boost value may adversely affect speech recognition accuracy.
If you assign a boost value to a multi-word phrase, the boost applies to the phrase in its entirety. For example, assigning a boost value to the phrase “this speech recognition works” will result in more likely recognition of this entire phrase and not the individual words within the phrase.
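The advice to start at 15 and increase only if needed can be sketched as a small search loop. Everything here is an assumption made for illustration: find_lowest_working_boost is a hypothetical helper, the transcribe argument stands for any caller-supplied function that runs a transcription with the given boost and returns the text, and fake_transcribe simulates such a function instead of calling the service.

```python
from typing import Callable, Optional


def find_lowest_working_boost(
    transcribe: Callable[[float], str],
    target: str,
    start: float = 15,
    step: float = 5,
    max_boost: float = 50,
) -> Optional[float]:
    """Raise the boost from `start` until `target` appears in the transcript.

    Returns the first boost that works, or None if none does up to `max_boost`.
    """
    boost = start
    while boost <= max_boost:
        if target.lower() in transcribe(boost).lower():
            return boost
        boost += step
    return None


def fake_transcribe(boost: float) -> str:
    """Simulated transcription: the word is recognized only at boost >= 25."""
    if boost >= 25:
        return "Speechology is a made-up word."
    return "Speech ology is a made-up word."


print(find_lowest_working_boost(fake_transcribe, "speechology"))
```

Stopping at the lowest value that works keeps the boost from becoming excessively high, which the section above warns may reduce accuracy.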
Single Word#
In this example, we boost the words acetylcarnitine and Zestoretic, which should result in significantly more accurate recognition of both.
# Create SpeechContext.
speech_context = SpeechContext(
    entries=[
        SpeechContextEntry(
            phrases=["acetylcarnitine"],
            boost=15,
        ),
        SpeechContextEntry(
            phrases=["Zestoretic"],
            boost=15,
        ),
    ]
)
# Pass SpeechContext with transcribe request.
result = transcribe_file_short(
    "../test_data/acetylcarnitine_zestoretic.flac",
    client,
    model="en_v2",
    speech_context=speech_context,
)
Run
python3 customization_single_word.py
Output
Acetylcarnitine is a molecule and Zestoretic is a medication.
// Create SpeechContext.
const speech_context = {
  entries: [
    {
      phrases: ["acetylcarnitine"],
      boost: 15,
    },
    {
      phrases: ["Zestoretic"],
      boost: 15,
    },
  ],
};
// Pass SpeechContext with transcribe request.
const result = await speechClient.transcribeFileShort(
  "../test_data/acetylcarnitine_zestoretic.flac",
  {
    model: "en_v2",
    speech_context: speech_context,
  }
);
Run
node customization_single_word.js
Output
Acetylcarnitine is a molecule and Zestoretic is a medication.
Multi-Word Phrases#
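By specifying multi-word phrases, you can use customization to resolve ambiguity between acoustically similar words. In this example, the phrases Carrie Underwood and Kerry Washington help the system choose the correct spelling of each first name.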
# Create SpeechContext.
speech_context = SpeechContext(
    entries=[
        SpeechContextEntry(
            phrases=["Carrie Underwood", "Kerry Washington"],
            boost=20,
        )
    ]
)
# Pass SpeechContext with transcribe request.
result = transcribe_file_short(
    "../test_data/carrie_underwood_kerry_washington.flac",
    client,
    model="en_v2",
    speech_context=speech_context,
)
Run
python3 customization_multi_word.py
Output
Carrie Underwood and Kerry Washington.
// Create SpeechContext.
const speech_context = {
  entries: [
    {
      phrases: ["Carrie Underwood", "Kerry Washington"],
      boost: 20,
    },
  ],
};
// Pass SpeechContext with transcribe request.
const result = await speechClient.transcribeFileShort(
  "../test_data/carrie_underwood_kerry_washington.flac",
  {
    model: "en_v2",
    speech_context: speech_context,
  }
);
Run
node customization_multi_word.js
Output
Carrie Underwood and Kerry Washington.
Expanding Vocabulary#
New or out-of-vocabulary words are supported automatically: add the word to the SpeechContext with a sufficiently high boost value, and the system can recognize it. In this example, speechology is a made-up word.
# Create SpeechContext.
speech_context = SpeechContext(
    entries=[
        SpeechContextEntry(
            phrases=["speechology"],
            boost=15,
        )
    ]
)
# Pass SpeechContext with transcribe request.
result = transcribe_file_short(
    "../test_data/speechology.flac",
    client,
    model="en_v2",
    speech_context=speech_context,
)
Run
python3 customization_new_word.py
Output
Speechology is a made-up word.
// Create SpeechContext.
const speech_context = {
  entries: [
    {
      phrases: ["speechology"],
      boost: 15,
    },
  ],
};
// Pass SpeechContext with transcribe request.
const result = await speechClient.transcribeFileShort(
  "../test_data/speechology.flac",
  {
    model: "en_v2",
    speech_context: speech_context,
  }
);
Run
node customization_new_word.js
Output
Speechology is a made-up word.