
Context

Learn how to use custom context to enhance transcription accuracy.

Overview

Soniox Speech-to-Text AI lets you improve both transcription and translation accuracy by providing context with each session.

Context helps the model understand your domain, recognize important terms, and apply custom vocabulary and translation preferences.

Think of it as briefing the model on your world: what the conversation is about, which words are important, and how certain terms should be translated.


Context sections

You provide context through the context object, which can include up to four sections, each improving accuracy in a different way:

Section            Type                    Description
general            array of JSON objects   Structured key-value information (domain, topic, intent, etc.)
text               string                  Longer free-form background text or related documents
terms              array of strings        Domain-specific or uncommon words
translation_terms  array of JSON objects   Custom translations for ambiguous terms

All sections are optional — include only what's relevant for your use case.
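The section names and types in the table map directly onto a JSON object. As a minimal sketch, assuming you build requests in Python, a hypothetical helper build_context can assemble that object and drop any section you leave empty. How the object is attached to an actual request depends on the API you call and is not shown here.

```python
import json
from typing import Optional


def build_context(
    general: Optional[list[dict[str, str]]] = None,
    text: Optional[str] = None,
    terms: Optional[list[str]] = None,
    translation_terms: Optional[list[dict[str, str]]] = None,
) -> dict:
    """Assemble a context object (hypothetical helper), omitting empty sections."""
    sections = {
        "general": general,
        "text": text,
        "terms": terms,
        "translation_terms": translation_terms,
    }
    # All four sections are optional, so keep only those that were provided.
    return {"context": {name: value for name, value in sections.items() if value}}


if __name__ == "__main__":
    payload = build_context(
        general=[{"key": "domain", "value": "Healthcare"}],
        terms=["Celebrex", "Zyrtec"],
    )
    print(json.dumps(payload, indent=2))
```

Leaving out empty sections keeps the payload limited to what is relevant, in line with the advice that every section is optional.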

General

General information provides baseline context that guides the AI model. It helps the model adapt its vocabulary to the correct domain, which improves transcription and translation quality and clarifies ambiguous words.

It consists of structured key-value pairs describing the conversation domain, topic, intent, and other relevant metadata such as participants' names, organization, setting, location, etc.

Example

{
  "context": {
    "general": [
      { "key": "domain",       "value": "Healthcare" },
      { "key": "topic",        "value": "Diabetes management consultation" },
      { "key": "doctor",       "value": "Dr. Martha Smith" },
      { "key": "patient",      "value": "Mr. David Miller" },
      { "key": "organization", "value": "St John's Hospital" }
    ]
  }
}
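If your application already holds this metadata as a plain mapping, converting it into the list of key-value objects that general expects is a one-liner. The sketch below assumes Python; general_from_dict is a hypothetical name, and skipping blank values is a design choice of the sketch, not a documented requirement.

```python
def general_from_dict(metadata: dict[str, str]) -> list[dict[str, str]]:
    """Convert plain metadata into the {"key": ..., "value": ...} list used by general."""
    # Skip blank values; dicts keep insertion order, so the output follows the input order.
    return [
        {"key": key, "value": value}
        for key, value in metadata.items()
        if value.strip()
    ]


general = general_from_dict({
    "domain": "Healthcare",
    "topic": "Diabetes management consultation",
    "doctor": "Dr. Martha Smith",
    "patient": "Mr. David Miller",
    "organization": "St John's Hospital",
})
```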

Text

Provide longer unstructured text that expands on general information — examples include:

  • History of prior interactions with a customer.
  • Reference documents.
  • Background summaries.
  • Meeting notes.

Example

{
  "context": {
    "text": "The customer, Maria Lopez, contacted BrightWay Insurance to update her auto policy after purchasing a new vehicle. Agent Daniel Kim reviewed the changes, explained the premium adjustment, and offered a bundling discount. Maria agreed to update the policy and scheduled a follow-up to consider additional options."
  }
}
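JSON strings cannot contain literal line breaks, so a text value assembled from several notes must carry its paragraph breaks as escaped \n sequences. A minimal Python sketch, where build_text is a hypothetical helper: it joins notes with blank lines and leaves the escaping to json.dumps.

```python
import json


def build_text(*notes: str) -> str:
    """Join separate background notes into one text value, one note per paragraph."""
    # Collapse internal whitespace in each note, drop empty notes, separate with a blank line.
    cleaned = [" ".join(note.split()) for note in notes if note.strip()]
    return "\n\n".join(cleaned)


text = build_text(
    "The customer, Maria Lopez, contacted BrightWay Insurance to update her auto policy.",
    "Agent Daniel Kim explained the premium adjustment and offered a bundling discount.",
)
# json.dumps escapes each line break as \n, so the serialized payload is valid JSON.
payload = json.dumps({"context": {"text": text}})
```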

Transcription terms

Improve transcription accuracy of important or uncommon words and phrases that you expect in the audio — such as:

  • Domain or industry-specific terminology.
  • Brand or product names.
  • Rare, uncommon, or invented words.

Example

{
  "context": {
    "terms": [
      "Celebrex", 
      "Zyrtec", 
      "Xanax",
      "Prilosec", 
      "Amoxicillin Clavulanate Potassium"
    ]
  }
}
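Term lists are often gathered from several sources and can pick up stray whitespace and repeats. A minimal Python sketch of a hypothetical normalize_terms helper that cleans a list before it is sent. It deliberately keeps each term's casing as written, because terms exist to pin down spelling and casing.

```python
def normalize_terms(terms: list[str]) -> list[str]:
    """Trim whitespace, drop blanks, and remove exact duplicates while keeping order."""
    stripped = (term.strip() for term in terms)
    # dict.fromkeys keeps the first occurrence of each term and preserves insertion order.
    return list(dict.fromkeys(term for term in stripped if term))


terms = normalize_terms([" Celebrex", "Zyrtec", "Celebrex", "", "Xanax "])
```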

Translation terms

Control how specific words or phrases are translated — useful for:

  • Technical terminology.
  • Entity names.
  • Words with ambiguous domain-specific translations.
  • Idioms and figurative speech with non-literal meaning.

Example for English → Spanish translation

{
  "context": {
    "translation_terms": [
      { "source": "Mr. Smith", "target": "Sr. Smith" },
      { "source": "MRI",       "target": "RM" },
      { "source": "St John's", "target": "St John's" },
      { "source": "stroke",    "target": "ictus" }
    ]
  }
}
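If a glossary already lives in code as a source-to-target mapping, converting it to translation_terms is mechanical. A minimal Python sketch, where build_translation_terms and keep_unchanged are hypothetical names: names that should stay untranslated are mapped to themselves, the same way the St John's entry is.

```python
def build_translation_terms(
    glossary: dict[str, str], keep_unchanged: tuple[str, ...] = ()
) -> list[dict[str, str]]:
    """Turn a source -> target glossary into translation_terms entries.

    Names in keep_unchanged are mapped to themselves so they are not translated,
    unless the glossary already defines a target for them.
    """
    entries = [{"source": src, "target": tgt} for src, tgt in glossary.items()]
    entries += [
        {"source": name, "target": name}
        for name in keep_unchanged
        if name not in glossary
    ]
    return entries


translation_terms = build_translation_terms(
    {"Mr. Smith": "Sr. Smith", "MRI": "RM", "stroke": "ictus"},
    keep_unchanged=("St John's",),
)
```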

Tips

  • Start with general context to provide broad information about the audio, such as domain, topic, or setting. This helps the model understand what the audio is about, without needing prior knowledge of exact words or phrases.
  • Keys and values in general can be arbitrary, but keep the list short, ideally 10 or fewer pairs.
  • If specific words or names are important, add them to terms. This ensures consistent spelling and casing for difficult entities.
  • Use text context only for large supporting documents. It is less influential than general or terms.
  • translation_terms only matter when translating; for transcription alone, use terms.
  • If you want names or brands to stay unchanged in the translation, map them to themselves, for example "St John's" → "St John's".
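Some of these tips can be checked mechanically before a request is sent. The sketch below is a hypothetical Python lint helper, not part of any Soniox SDK; the translating flag is an assumption standing in for however your request enables translation, and the 10-pair threshold comes from the tip on general.

```python
def lint_context(context: dict, translating: bool) -> list[str]:
    """Return human-readable warnings for a context object (hypothetical helper)."""
    warnings = []
    general = context.get("general", [])
    if not general:
        warnings.append("no general section; start with domain, topic, or setting")
    elif len(general) > 10:
        warnings.append(f"general has {len(general)} pairs; ideally keep it to 10 or fewer")
    if context.get("translation_terms") and not translating:
        warnings.append("translation_terms only matter for translation; use terms instead")
    return warnings


if __name__ == "__main__":
    for warning in lint_context({"terms": ["naan"]}, translating=False):
        print("warning:", warning)
```

The checks only produce warnings rather than rejecting the context, since the tips are guidelines, not hard limits.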

Example: Restaurant takeaway order

{
  "context": {
    "general": [
      { "key": "restaurant", "value": "Spice India" },
      { "key": "location",   "value": "London, UK" },
      { "key": "setting",    "value": "Phone ordering" },
      { "key": "topic",      "value": "Customer placing a takeaway order" }
    ],
    "terms": [
      "butter chicken",
      "paneer tikka",
      "naan",
      "biryani",
      "tandoori chicken",
      "masala dosa",
      "samosa",
      "mango lassi"
    ],
    "text": "Spice India is a casual Indian restaurant serving a variety of traditional and popular dishes from across India. Customers can order flavorful curries, grilled specialties, rice dishes, and vegetarian options. The restaurant offers takeaway and delivery services. Customers typically call to ask about menu options, portion sizes, dietary preferences, or popular dishes. Conversations often focus on ordering food efficiently and clarifying customer choices."
  }
}

Experimental features

  • Improving language detection: Language restrictions are the primary method for ensuring consistent single-language output. In rare cases, when the model still transcribes in the wrong language, general context can help: include language and instructions keys, or add terms in the correct language.

Example for English

{
  "context": {
    "general": [
      { 
        "key": "language", 
        "value": "English" 
      },
      { 
        "key": "instructions", 
        "value": "Conversation is in English. Output transcription only in English language." 
      }
    ],
    "terms": [
      "concurrency", 
      "polymorphism", 
      "serialization",
      "idempotency"
    ]
  }
}
  • Improving speaker diarization: Providing speaker information in general context can help the model more reliably separate voices.

Example

{
  "context": {
    "general": [
      { "key": "setting",  "value": "Talk show interview" },
      { "key": "topic",    "value": "How AI is transforming the modern business landscape" },
      { "key": "speakers", "value": "2 speakers (1 male host, 1 female guest)" }
    ]
  }
}

Context size limit

  • Maximum 8,000 tokens (~10,000 characters).
  • Supports large blocks of text: glossaries, scripts, domain summaries.
  • If you exceed the limit, the API returns an error; trim or summarize the context first.
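Exact token counts depend on the model's tokenizer, so a client can only approximate the limit. The following Python sketch assumes the ~10,000-character figure is an acceptable proxy and measures the serialized context object; MAX_CONTEXT_CHARS, context_size, and fit_context are hypothetical names, and the API's own error remains the authoritative check. It shortens text first, since the tips describe it as less influential than general or terms.

```python
import json

# Rough client-side proxy for the documented limit (~10,000 characters).
# The API counts tokens, so this is an approximation, not an exact check.
MAX_CONTEXT_CHARS = 10_000


def context_size(context: dict) -> int:
    """Approximate the size as the length of the serialized context object."""
    return len(json.dumps(context, ensure_ascii=False))


def fit_context(context: dict, limit: int = MAX_CONTEXT_CHARS) -> dict:
    """Return a copy whose text section is shortened until the context fits."""
    fitted = dict(context)  # shallow copy; the caller's object is not modified
    overflow = context_size(fitted) - limit
    text = fitted.get("text", "")
    if overflow > 0 and text:
        # Each removed character shrinks the JSON by at least one character.
        cut = text[: max(0, len(text) - overflow)]
        # Back off to the last complete word so text does not end mid-word.
        fitted["text"] = cut.rsplit(" ", 1)[0] if " " in cut else cut
        if not fitted["text"]:
            del fitted["text"]
    if context_size(fitted) > limit:
        raise ValueError("context still exceeds the limit; trim or summarize general and terms too")
    return fitted
```

Truncation is the bluntest option; summarizing the text, as the limit note suggests, usually preserves more useful context.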