Context
Learn how to use custom context to enhance transcription accuracy.
Overview
Soniox Speech-to-Text AI lets you improve both transcription and translation accuracy by providing context with each session.
Context helps the model understand your domain, recognize important terms, and apply custom vocabulary and translation preferences.
Think of it as giving the model your world — what the conversation is about, which words are important, and how certain terms should be translated.
Context sections
You provide context through the `context` object, which can include up to four sections,
each improving accuracy in a different way:
| Section | Type | Description |
|---|---|---|
| `general` | array of JSON objects | Structured key-value information (domain, topic, intent, etc.) |
| `text` | string | Longer free-form background text or related documents |
| `terms` | array of strings | Domain-specific or uncommon words |
| `translation_terms` | array of JSON objects | Custom translations for ambiguous terms |
All sections are optional — include only what's relevant for your use case.
General
General information provides baseline context which guides the AI model. It helps the model adapt its vocabulary to the correct domain, improving transcription and translation quality and clarifying ambiguous words.
It consists of structured key-value pairs describing the conversation domain, topic, intent, and other relevant metadata such as participants' names, organization, setting, location, etc.
Example
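A minimal sketch of a `general` section, built as a Python dictionary and serialized to JSON. The `key` and `value` field names and all sample values are assumptions made for illustration; check the API reference for the exact schema.

```python
import json

# Assumption: each `general` entry is an object with "key" and "value" fields.
# All values below are invented sample data.
context = {
    "general": [
        {"key": "domain", "value": "Healthcare"},
        {"key": "topic", "value": "Diabetes management consultation"},
        {"key": "doctor", "value": "Dr. Martha Smith"},
        {"key": "patient", "value": "Mr. David Miller"},
        {"key": "organization", "value": "St John's Hospital"},
    ]
}

# Serialize the context so it can be attached to a session request.
print(json.dumps(context, indent=2))
```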
Text
Provide longer unstructured text that expands on general information — examples include:
- History of prior interactions with a customer.
- Reference documents.
- Background summaries.
- Meeting notes.
Example
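A minimal sketch of a `text` section. The background paragraph is invented sample data; in practice it would come from your own notes, documents, or interaction history.

```python
import json

# Invented sample background; replace with your own supporting material.
background = (
    "The customer previously reported intermittent connection drops on a "
    "fiber plan. A technician visit was scheduled, and the router was "
    "replaced. This call is a follow-up to confirm the issue is resolved."
)

# `text` is a single free-form string.
context = {"text": background}

print(json.dumps(context, indent=2))
```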
Transcription terms
Improve transcription accuracy of important or uncommon words and phrases that you expect in the audio — such as:
- Domain or industry-specific terminology.
- Brand or product names.
- Rare, uncommon, or invented words.
Example
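A minimal sketch of a `terms` section as a plain list of strings. The specific terms are sample data chosen for illustration.

```python
import json

# `terms` is an array of strings: words or phrases expected in the audio.
context = {
    "terms": [
        "Celebrex",
        "Zyrtec",
        "Xanax",
        "Prilosec",
        "Amoxicillin Clavulanate Potassium",
    ]
}

print(json.dumps(context, indent=2))
```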
Translation terms
Control how specific words or phrases are translated — useful for:
- Technical terminology.
- Entity names.
- Words with ambiguous domain-specific translations.
- Idioms and figurative speech with non-literal meaning.
Example for English → Spanish translation
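A minimal sketch of a `translation_terms` section for English to Spanish. The `source` and `target` field names are assumptions, as are the sample translations; verify the schema against the API reference.

```python
import json

# Assumption: each entry maps a "source" phrase to its preferred "target" translation.
context = {
    "translation_terms": [
        {"source": "Mr. Smith", "target": "Sr. Smith"},
        # Identical source and target keeps a name unchanged in translation.
        {"source": "St John's", "target": "St John's"},
        # Domain-specific choice for an otherwise ambiguous word.
        {"source": "stroke", "target": "ictus"},
    ]
}

print(json.dumps(context, indent=2, ensure_ascii=False))
```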
Tips
- Start with `general` context to provide broad information about the audio, such as domain, topic, or setting. This helps the model understand what the audio is about, without needing prior knowledge of exact words or phrases.
- Key-value pairs in `general` can be arbitrary, but keep them relatively short, ideally 10 or fewer.
- If specific words or names are important, add them to `terms`. This ensures consistent spelling and casing for difficult entities.
- Use `text` context only for large supporting documents. It is less influential than `general` or `terms`.
- `translation_terms` are only valuable for translation; otherwise use `terms`.
- If you want translated names or brands unchanged, specify them like `"St John's"` → `"St John's"`.
Example: Restaurant takeaway order
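A sketch of how the four sections might be combined for a takeaway order taken by phone. The restaurant name, menu items, and the `key`/`value` and `source`/`target` field names are all assumptions made for illustration.

```python
import json

# Invented sample data for a phone order at a hypothetical restaurant.
context = {
    # Broad information first: domain, topic, setting.
    "general": [
        {"key": "domain", "value": "Restaurant"},
        {"key": "topic", "value": "Takeaway order by phone"},
        {"key": "restaurant", "value": "Luigi's Trattoria"},
        {"key": "setting", "value": "Customer calling to place an order"},
    ],
    # Longer supporting material; less influential than general or terms.
    "text": "Menu highlights: bruschetta, gnocchi al pesto, tiramisu.",
    # Names and dishes whose spelling and casing should be consistent.
    "terms": ["Luigi's Trattoria", "bruschetta", "gnocchi", "tiramisu"],
    # Only relevant when translation is enabled; keeps the name unchanged.
    "translation_terms": [
        {"source": "Luigi's Trattoria", "target": "Luigi's Trattoria"},
    ],
}

print(json.dumps(context, indent=2))
```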
Experimental features
- Improving language detection: The primary method for ensuring consistent single-language output is language restrictions. In rare cases, when the model still transcribes in the wrong language, `general` context can help by including `language` and `instructions` keys or adding `terms` in the correct language.
Example for English language
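A minimal sketch of a `general` context that nudges the model toward English output. The `language` and `instructions` keys come from the description above; the `key`/`value` field names and the wording of the values are assumptions.

```python
import json

# Assumption: entries use "key"/"value" fields; value wording is illustrative.
context = {
    "general": [
        {"key": "language", "value": "English"},
        {"key": "instructions", "value": "Transcribe only in English."},
    ],
    # Optional: terms written in the target language can also help.
    "terms": ["appointment", "prescription"],
}

print(json.dumps(context, indent=2))
```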
- Improving speaker diarization: Providing speaker information in `general` context can help the model more reliably separate voices.
Example
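A minimal sketch of speaker information passed through `general` context. The key names `number_of_speakers` and `speakers` are hypothetical, chosen only to illustrate the idea, and the values are invented; since keys are arbitrary, other descriptive names may work equally well.

```python
import json

# Hypothetical key names; values are passed as strings in this sketch.
context = {
    "general": [
        {"key": "number_of_speakers", "value": "2"},
        {"key": "speakers", "value": "Support agent Anna, customer Mark"},
    ]
}

print(json.dumps(context, indent=2))
```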
Context size limit
- Maximum 8,000 tokens (~10,000 characters).
- Supports large blocks of text: glossaries, scripts, domain summaries.
- If you exceed the limit, the API returns an error, so trim or summarize the context first.
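A rough pre-flight check can catch oversized context before a request is sent. The sketch below uses the approximate 10,000-character figure as a proxy for the 8,000-token limit; measuring the serialized JSON length is an assumption, since the API counts tokens and its exact accounting is not described here. The helper names are hypothetical.

```python
import json

# Approximate character budget corresponding to the 8,000-token limit.
MAX_CONTEXT_CHARS = 10_000


def approx_context_chars(context: dict) -> int:
    """Return the character length of the context serialized as JSON."""
    return len(json.dumps(context, ensure_ascii=False))


def check_context_size(context: dict) -> None:
    """Raise ValueError if the context likely exceeds the size limit."""
    size = approx_context_chars(context)
    if size > MAX_CONTEXT_CHARS:
        raise ValueError(
            f"Context is about {size} characters; trim or summarize it "
            f"to stay under roughly {MAX_CONTEXT_CHARS}."
        )


small = {"terms": ["Celebrex", "Zyrtec"]}
check_context_size(small)  # passes silently for a small context
```

Because this is a heuristic, a context close to the budget may still be rejected by the API, so leave some headroom.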