Customization

Overview

Soniox Speech-to-Text AI allows you to enhance transcription accuracy by providing custom context for each transcription session. This feature is especially useful when working with:

Industry-specific terminology
Brand names or product names
Uncommon names or made-up words
Domain-specific documents or phrases

By providing context, you help the AI model anticipate the words spoken in your audio, even when some terms are not pronounced clearly or completely.

The context parameter accepts any text that may be relevant to the transcription session. This text is not required to appear in the audio — it simply acts as guidance for the model to improve recognition accuracy when necessary.

The model uses the provided context only when helpful, and it does not override normal speech recognition behavior.

Supported context types

You can supply many types of text to the context parameter, such as:

List of terms or keywords

Useful for proper nouns, technical vocabulary, or product names:

{
  "context": "Celebrex, Zyrtec, Xanax, Prilosec, Amoxicillin Clavulanate Potassium"
}

Full text or summary

Provide a paragraph, summary, or reference document related to the audio content:

{
  "context": "The customer, Maria Lopez, contacted BrightWay Insurance to update her auto policy after purchasing a new vehicle. Agent Daniel Kim reviewed the changes, explained the premium adjustment, and offered a bundling discount. Maria agreed to update the policy and scheduled a follow-up to consider additional options."
}

Context size limit

The context can contain up to 8,000 tokens (roughly 10,000 characters or more).
This allows you to include substantial information, such as summaries, scripts, or glossary-style entries.

If the context exceeds the limit, the API returns an error, so trim or summarize the text as needed.
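
Because the limit is counted in tokens and this page gives only an approximate character equivalent, one cautious option is to cap the context by character count before sending it. The sketch below is a minimal illustration and not part of the Soniox API: `trim_context`, its `separator` parameter, and the 10,000-character budget are assumptions chosen for this example. Staying under the character budget does not guarantee that the text fits within 8,000 tokens.

```python
def trim_context(text: str, max_chars: int = 10_000, separator: str = ",") -> str:
    """Return text cut to at most max_chars without splitting the last entry.

    Hypothetical helper: max_chars is a rough stand-in for the token limit,
    not an exact token count.
    """
    text = text.strip()
    if len(text) <= max_chars:
        return text

    head = text[:max_chars]
    # Prefer to cut at the last separator so a glossary entry is not split.
    cut = head.rfind(separator)
    if cut != -1:
        return head[:cut].rstrip()

    # No separator found: drop the trailing partial word instead.
    # rsplit(None, 1) splits once, from the right, on any whitespace run.
    parts = head.rsplit(None, 1)
    return parts[0] if len(parts) == 2 else head.rstrip()
```

For a comma-separated term list the default separator keeps multi-word terms whole. For a prose summary, passing `separator=" "` cuts at the last word boundary instead.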

Best practices

Use commas or spacing to separate terms in short lists
Keep context relevant to the session; don't overload it with unrelated data
Preprocess content from related documents (e.g., transcripts, emails, product info) into a clean context block
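
The practices above can be sketched as a small preprocessing step. This is a minimal sketch under stated assumptions: `build_context` and `raw_terms` are hypothetical names introduced here, and the cleanup rules (collapsing whitespace, dropping empty entries, removing case-insensitive duplicates) are one reasonable choice and not a documented requirement of the API.

```python
import re


def build_context(terms):
    """Join terms into one comma-separated context string.

    Hypothetical helper: collapses internal whitespace, drops empty entries,
    and removes case-insensitive duplicates while keeping first-seen order.
    """
    seen = set()
    cleaned = []
    for term in terms:
        # Collapse runs of whitespace (including newlines) into single spaces,
        # then strip stray spaces and commas from both ends.
        normalized = re.sub(r"\s+", " ", term).strip(" ,")
        key = normalized.casefold()
        if normalized and key not in seen:
            seen.add(key)
            cleaned.append(normalized)
    return ", ".join(cleaned)


raw_terms = ["Celebrex", " Zyrtec ", "celebrex", "", "Amoxicillin\nClavulanate  Potassium"]
payload = {"context": build_context(raw_terms)}
print(payload["context"])  # Celebrex, Zyrtec, Amoxicillin Clavulanate Potassium
```

The resulting `payload` has the same shape as the `context` field shown in the JSON examples earlier on this page.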

Use cases

Use case	Example context
Medical transcription	Medication names, procedure terms, doctor/patient names.
Call center recordings	Customer name, agent info, company-specific lingo.
Industry-specific jargon	Terms from legal, finance, biotech, or tech domains.
Podcasts / interviews	Guest names, brand mentions, episode summaries.
Custom words and neologisms	Fictional terms, product names, made-up branding.

Example: Custom word recognition

The following example transcribes audio containing the words Celebrex, Zyrtec, Xanax, Prilosec, and Amoxicillin Clavulanate Potassium by including them in the context:

import os
import time
 
import requests
 
# Retrieve the API key from environment variable (ensure SONIOX_API_KEY is set)
api_key = os.environ["SONIOX_API_KEY"]
api_base = "https://api.soniox.com"
audio_url = "https://soniox.com/media/examples/context_demo.mp3"
 
session = requests.Session()
session.headers["Authorization"] = f"Bearer {api_key}"
 
 
def poll_until_complete(transcription_id):
    while True:
        res = session.get(f"{api_base}/v1/transcriptions/{transcription_id}")
        res.raise_for_status()
        data = res.json()
        if data["status"] == "completed":
            return
        elif data["status"] == "error":
            raise Exception(
                f"Transcription failed: {data.get('error_message', 'Unknown error')}"
            )
        time.sleep(1)
 
 
def main():
    print("Starting transcription...")
 
    res = session.post(
        f"{api_base}/v1/transcriptions",
        json={
            "audio_url": audio_url,
            "model": "stt-async-preview",
            "language_hints": ["en", "es"],
            "context": (
                "Celebrex, Zyrtec, Xanax, Prilosec, "
                + "Amoxicillin Clavulanate Potassium"
            ),
        },
    )
    res.raise_for_status()
    transcription_id = res.json()["id"]
    print(f"Transcription ID: {transcription_id}")
 
    # Poll until transcription is done
    poll_until_complete(transcription_id)
 
    # Get the transcript text
    res = session.get(f"{api_base}/v1/transcriptions/{transcription_id}/transcript")
    res.raise_for_status()
    print("Transcript:")
    print(res.json()["text"])
 
    # Delete the transcription
    res = session.delete(f"{api_base}/v1/transcriptions/{transcription_id}")
    res.raise_for_status()
 
 
if __name__ == "__main__":
    main()
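
The polling loop in the example runs until the API reports `completed` or `error`, with no upper bound on waiting time. A bounded variant might look like the sketch below. The names `wait_for_completion` and `fetch_status` and the default timeout and interval are assumptions made for this illustration; only the `status` and `error_message` fields are taken from the example above.

```python
import time


def wait_for_completion(fetch_status, timeout_s=300.0, interval_s=1.0):
    """Call fetch_status() until it reports "completed", or raise.

    fetch_status is a caller-supplied function (hypothetical here) returning
    a dict with a "status" key, like the JSON body polled in the example.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        data = fetch_status()
        if data["status"] == "completed":
            return data
        if data["status"] == "error":
            raise RuntimeError(
                f"Transcription failed: {data.get('error_message', 'Unknown error')}"
            )
        # Give up once the deadline has passed instead of looping forever.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Not completed after {timeout_s} seconds")
        time.sleep(interval_s)
```

With the `session` from the example, `fetch_status` could be a function that performs the same `GET` on `/v1/transcriptions/{transcription_id}`, calls `raise_for_status()`, and returns `res.json()`.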

Output