Custom vocabulary and context biasing

Improving recognition of expected words and languages

Updated June 29, 2026

A recognizer has heard "nice" a million times and your company name zero times, so when someone says "Soniox," it writes "so nice" or "Sonic's": it chooses common real words over a name it has never encountered. The same happens to a cardiologist's "amiodarone," a logistics team's depot codes, and the surname of every third person on a customer call. The model is doing what it was trained to do: prefer the words it has already seen.

Context biasing changes what the recognizer expects without changing the model itself.

Preference for common words

A speech recognizer bets on the most probable transcription given the sound. That probability blends two things: how well the words match the audio, and how likely those words are as language. The second term is the language prior, and it is what lets recognition hold up against noise and accents. It also buries rare words. A made-up product name and a common phrase can be acoustically close, and when they are, the prior tips the scale toward the common phrase.
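One common textbook way to write that blend is shown below. It is a general formulation, not necessarily the exact objective any particular recognizer uses; the symbols and the weight \lambda are assumptions introduced here for illustration.

```latex
\hat{W} \;=\; \arg\max_{W} \Bigl[ \log P(X \mid W) \;+\; \lambda \, \log P(W) \Bigr]
```

Here X is the audio, W is a candidate word sequence, P(X | W) is the acoustic match, and P(W) is the language prior. A rare name has a very small P(W), so it needs a much better acoustic match than a common phrase to come out on top.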

The words that suffer are the ones that matter most in a deployment: personal and place names, brand and product names, medical, legal, and technical jargon, and strings that are not really words at all, such as part numbers and reference codes (a separate problem, covered in alphanumerics in speech recognition). All are underrepresented in any general training set, so they sit low in the prior and rarely win against a common-word alternative.

How context biasing changes decoding

Biasing nudges that language prior at runtime. Instead of retraining, you hand the recognizer a small amount of context, and it temporarily raises the probability of the words and phrases you named. A close acoustic match to one of those terms can then win against the everyday word it used to lose to.

The mechanism goes by several names, including shallow fusion, on-the-fly rescoring, and keyword boosting. The implementations differ, but the effect is the same: the terms you supply get a thumb on the scale during decoding. This is a bias, not a filter. A boosted word is made more likely, not mandatory, so audio that clearly is not your term still transcribes normally.
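A toy sketch can make "bias, not filter" concrete. It rescores a handful of finished hypotheses, which is a simplification: production systems typically apply the bonus token by token inside the beam search. The `Hypothesis` class, the scores, and the boost value of 6.0 are all made up for illustration, and the substring match is deliberately crude.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    acoustic_logp: float  # how well the words match the audio
    prior_logp: float     # how likely the words are as language


def biased_score(hyp, boosts, prior_weight=1.0):
    """Score one hypothesis, adding a bonus for each boosted phrase it contains."""
    text = hyp.text.lower()
    bonus = sum(weight for phrase, weight in boosts.items() if phrase in text)
    return hyp.acoustic_logp + prior_weight * hyp.prior_logp + bonus


def decode(hyps, boosts):
    """Return the text of the highest-scoring hypothesis."""
    return max(hyps, key=lambda h: biased_score(h, boosts)).text


# Close acoustic match: the brand name loses on the prior alone.
close = [
    Hypothesis("so nice", acoustic_logp=-4.0, prior_logp=-3.0),
    Hypothesis("soniox", acoustic_logp=-3.5, prior_logp=-9.0),
]
# Audio that clearly is not the brand name.
clear = [
    Hypothesis("so nice", acoustic_logp=-2.0, prior_logp=-3.0),
    Hypothesis("soniox", acoustic_logp=-12.0, prior_logp=-9.0),
]
boosts = {"soniox": 6.0}

print(decode(close, {}))      # no biasing: "so nice"
print(decode(close, boosts))  # biasing flips the close call: "soniox"
print(decode(clear, boosts))  # clear audio still wins: "so nice"
```

The third call is the point: the boost is large enough to win a close call, but not large enough to override audio that plainly says something else.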

Types of context biasing

Context biasing comes as three related controls, from weakest to strongest.

The first is a context or vocabulary list: the terms you expect, optionally with weights. This is the everyday tool. A meeting bot loads the attendees' names. A pharmacy line loads a drug formulary. A support line loads product names and error codes. Supplying the term, and sometimes a hint of how it is pronounced or used, flips it from a reliable miss to a reliable hit.

The second is language hints: a soft steer toward the languages you expect, without forbidding others. On multilingual audio this sharpens language identification and keeps the recognizer from drifting into the wrong language on ambiguous sounds, while it still copes if someone says something unexpected.

The third is language restrictions: a hard constraint that forbids any language outside a named set. This is the strongest and most dangerous lever, because anything you leave off the list becomes impossible to transcribe correctly. Reach for it only when you are certain of the possible languages and a stray one would cause real harm, as with a regulated form that accepts exactly two languages.

Lever | What it does | When to use | Risk if misused
Vocabulary / context list | Raises odds of named terms | Names, jargon, codes | Over-triggers if weighted too high
Language hints | Softly favors expected languages | Known but not guaranteed languages | Low; it only biases
Language restrictions | Forbids unlisted languages | Certain, closed language set | Off-list language becomes untranscribable
Three levers, from softest to hardest. Each constrains the recognizer more, and risks more if you are wrong about what will be said.
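The difference between a hint and a restriction is easiest to see on a language-identification score. The sketch below is hypothetical: the function names, the multiplicative boost of 3.0, and the probabilities are assumptions for illustration, not any vendor's API.

```python
def apply_hints(lang_probs, hinted, boost=3.0):
    """Soft steer: scale up hinted languages, then renormalize."""
    scaled = {
        lang: p * (boost if lang in hinted else 1.0)
        for lang, p in lang_probs.items()
    }
    total = sum(scaled.values())
    return {lang: p / total for lang, p in scaled.items()}


def apply_restriction(lang_probs, allowed):
    """Hard constraint: drop every language outside the allowed set."""
    kept = {lang: p for lang, p in lang_probs.items() if lang in allowed}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no allowed language has any probability")
    return {lang: p / total for lang, p in kept.items()}


def pick(lang_probs):
    """Return the most probable language."""
    return max(lang_probs, key=lang_probs.get)


expected = {"en", "es"}

# Ambiguous sounds: without help, the recognizer drifts to Portuguese.
ambiguous = {"en": 0.40, "pt": 0.45, "es": 0.15}
print(pick(ambiguous))                         # pt
print(pick(apply_hints(ambiguous, expected)))  # en

# Someone unexpectedly speaks German, and the audio is clear.
surprise = {"en": 0.10, "es": 0.05, "de": 0.85}
print(pick(apply_hints(surprise, expected)))        # de: the hint yields
print(pick(apply_restriction(surprise, expected)))  # en: German is impossible
```

Under the hint, strong evidence for German still wins. Under the restriction, German has no path to the output, so the audio is forced into the nearest allowed language and transcribed wrongly.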

Avoiding excessive bias

The discipline of biasing is restraint. A short, accurate context list of the terms that actually appear beats a giant dump of every word in your industry, because every term you boost is a term the recognizer is now more willing to hear. Load the fifty product names that come up on your calls, not the ten thousand SKUs in the catalog. The catalog dump feels thorough but does real damage: the recognizer is now primed to mishear ordinary speech as part numbers nobody said.
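One way to get from the catalog to the short list is to let real calls decide. This sketch assumes you have human-corrected reference transcripts, since a raw machine transcript will not contain the terms the recognizer is currently missing. `build_context_list` and its thresholds are hypothetical names and defaults, not part of any product.

```python
import re
from collections import Counter


def build_context_list(catalog_terms, transcripts, max_terms=50, min_count=2):
    """Keep only the catalog terms that actually occur in real transcripts."""
    counts = Counter()
    for term in catalog_terms:
        # Match the term as a whole token, ignoring case.
        pattern = re.compile(
            r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE
        )
        counts[term] = sum(len(pattern.findall(t)) for t in transcripts)
    # Most frequent first; drop anything seen fewer than min_count times.
    frequent = [term for term, n in counts.most_common() if n >= min_count]
    return frequent[:max_terms]


catalog = ["Soniox", "amiodarone", "Widget Pro", "ZX-9000"]
transcripts = [
    "I called Soniox about amiodarone dosing",
    "Soniox support asked about amiodarone",
    "the widget pro arrived yesterday",
]
print(build_context_list(catalog, transcripts, max_terms=2))
# ['Soniox', 'amiodarone']
```

Terms that never come up, like the unused SKU here, never reach the recognizer, so they cannot be misheard into ordinary speech.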

Weights deserve the same caution. If a term keeps losing close calls, raise it a little. If a boosted term starts appearing where it should not, lower it. Tune against real transcripts of real audio, not the demo sentence; the same lesson runs through how to benchmark speech-to-text yourself. Context biasing is one of the most powerful knobs in a deployment and one of the easiest to overuse, because the failures it fixes are so predictable that you are tempted to overcorrect.
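That raise-a-little, lower-a-little rule can be written as a small loop over evaluation data. The sketch below assumes pairs of reference and recognizer transcripts for the same audio; the step size, the weight range, and both function names are illustrative assumptions.

```python
def count_errors(term, pairs):
    """Count misses and false triggers for one term.

    pairs: (reference, hypothesis) transcript strings for the same audio.
    """
    needle = term.lower()
    misses = false_triggers = 0
    for reference, hypothesis in pairs:
        in_ref = needle in reference.lower()
        in_hyp = needle in hypothesis.lower()
        if in_ref and not in_hyp:
            misses += 1          # said, but not transcribed
        elif in_hyp and not in_ref:
            false_triggers += 1  # transcribed, but never said
    return misses, false_triggers


def adjust_weight(weight, misses, false_triggers, step=0.5, lo=0.0, hi=10.0):
    """Nudge one term's weight by a small step, staying within [lo, hi]."""
    if false_triggers > 0:
        weight -= step  # appearing where it should not: lower it
    elif misses > 0:
        weight += step  # losing close calls: raise it a little
    return max(lo, min(hi, weight))


under_boosted = [("call soniox today", "call so nice today")]
over_boosted = [("i am so anxious", "i am soniox")]

print(adjust_weight(2.0, *count_errors("soniox", under_boosted)))  # 2.5
print(adjust_weight(2.0, *count_errors("soniox", over_boosted)))   # 1.5
```

False triggers are checked first on purpose: a term that shows up where nobody said it is the costlier failure, so the sketch backs off before it ever pushes harder.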

Common questions

What is the difference between custom vocabulary and context biasing?

Custom vocabulary is the most familiar form of context biasing: a list of names, terms, or codes the model should be ready to hear. Context biasing is the broader practice, with three levers of increasing strength: vocabulary lists, language hints, and language restrictions, all of which nudge the language prior at runtime instead of retraining the model.

Will adding a word to my vocabulary list guarantee it is transcribed?

No. That is by design, and you would not want a guarantee. Biasing is a bias, not a filter: a boosted word is made more likely, not mandatory, so it still has to match the audio to win. Push the weight high enough to guarantee a term and the recognizer starts hearing it where it is not, so that "so anxious" comes out as your brand name.

When should I use language restrictions instead of hints?

Hints by default, restrictions almost never. A restriction is the strongest and most dangerous lever, because anything you leave off the list becomes impossible to transcribe correctly. Reach for it only when the language set is certain and closed and a stray language would cause real harm, such as a regulated form that accepts exactly two.

Does context biasing help with phone numbers and IDs?

Barely. Biasing works on words and named phrases, but the difficulty in phone numbers, account IDs, and emails is formatting digits and symbols, not recognizing rare words, so a vocabulary list has little to grip. That is a distinct problem, covered in alphanumerics in speech recognition.
