Context

Soniox Speech-to-Text AI lets you add custom context to each transcription session, improving recognition accuracy for specific terms, industry jargon, and names.

Context can be any text, such as a summary, a related document, or simply a list of relevant words. It may also include words or phrases that do not appear in the audio. The AI model uses the context only when it helps improve recognition accuracy.

The context can be up to 10,000 characters long, which is enough to give the AI model a substantial amount of contextual information.
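
As a minimal sketch, a context string could be assembled from a word list and checked against the 10,000-character limit before it is sent. The helper name `build_context` and the comma separator are illustrative choices, not part of the Soniox API, and the sketch assumes the limit is counted in Python string characters:

```python
MAX_CONTEXT_CHARS = 10_000  # limit stated in this documentation


def build_context(terms, separator=", "):
    """Join terms into one context string (hypothetical helper).

    Blank entries and exact duplicates are dropped; order is preserved.
    """
    seen = set()
    unique_terms = []
    for term in terms:
        cleaned = term.strip()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            unique_terms.append(cleaned)

    context = separator.join(unique_terms)
    # Assumption: the limit applies to the character count of the string.
    if len(context) > MAX_CONTEXT_CHARS:
        raise ValueError(
            f"Context is {len(context)} characters; "
            f"the maximum is {MAX_CONTEXT_CHARS}."
        )
    return context


context = build_context(["Talk-a-Script", "Soniox", " Talk-a-Script "])
print(context)  # Talk-a-Script, Soniox
```

Checking the length locally avoids sending a request that is already known to exceed the documented limit.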

Example

The context parameter is a string. Pass it in the request payload when calling the Create Transcription API endpoint or when starting real-time transcription.

Example payload that includes a fictional product name, "Talk-a-Script," as context to improve recognition accuracy:

{
  "context": "Talk-a-Script"
}
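
A payload like this could also be built programmatically. The sketch below uses only the field names that appear on this page (`audio_url`, `model`, `context`). The function name `build_transcription_payload` is hypothetical, and leaving out `context` when none is supplied assumes the field is optional:

```python
import json


def build_transcription_payload(audio_url, model, context=None):
    """Return a request body dict (hypothetical helper, not a Soniox API)."""
    payload = {"audio_url": audio_url, "model": model}
    # Assumption: context is optional, so it is omitted when empty.
    if context:
        payload["context"] = context
    return payload


payload = build_transcription_payload(
    "https://soniox.com/media/examples/talk_a_script.mp3",
    "stt-async-preview",
    context="Talk-a-Script",
)
print(json.dumps(payload, indent=2))
```

The resulting dict can be passed as the `json=` argument of a `requests` call.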

Code example using context:

import os
import time
import requests
 
API_BASE = "https://api.soniox.com"
AUDIO_URL = "https://soniox.com/media/examples/talk_a_script.mp3"
 
# Retrieve the API key from environment variable (ensure SONIOX_API_KEY is set)
API_KEY = os.environ["SONIOX_API_KEY"]
 
# Create a requests session and set the Authorization header
session = requests.Session()
session.headers["Authorization"] = f"Bearer {API_KEY}"
 
# 1. Start a new transcription session by sending the audio URL to the API
print("Starting transcription...")
response = session.post(
    f"{API_BASE}/v1/transcriptions",
    json={
        "audio_url": AUDIO_URL,
        "model": "stt-async-preview",
        "context": "Talk-a-Script", # include fictional product name as context
    },
)
response.raise_for_status()
transcription = response.json()
 
transcription_id = transcription["id"]
print(f"Transcription started with ID: {transcription_id}")
 
# 2. Poll the transcription endpoint until the status is 'completed'
while True:
    response = session.get(f"{API_BASE}/v1/transcriptions/{transcription_id}")
    response.raise_for_status()
    transcription = response.json()
 
    status = transcription.get("status")
    if status == "error":
        # Surface the API-reported error message, if any
        raise RuntimeError(
            f"Transcription error: {transcription.get('error_message', 'Unknown error')}"
        )
    elif status == "completed":
        # Stop polling when the transcription is complete
        break
 
    # Wait for 1 second before polling again
    time.sleep(1)
 
# 3. Retrieve the final transcript once transcription is completed
response = session.get(f"{API_BASE}/v1/transcriptions/{transcription_id}/transcript")
response.raise_for_status()
transcript = response.json()
 
# Print the transcript text
print("Transcript:")
print(transcript["text"])
 

Output:

We developed a new coding language called Talk-a-Script.
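
To check whether the terms supplied as context actually appear in a transcript, a simple case-insensitive substring test may be enough. This is only an illustrative sketch: `missing_terms` is a hypothetical helper, and a substring match does not account for alternate spellings or punctuation differences:

```python
def missing_terms(transcript_text, terms):
    """Return the terms that do not occur in the transcript text.

    Matching is a case-insensitive substring search (an assumption made
    for illustration; it is not how the model itself uses context).
    """
    lowered = transcript_text.casefold()
    return [term for term in terms if term.casefold() not in lowered]


text = "We developed a new coding language called Talk-a-Script."
print(missing_terms(text, ["Talk-a-Script"]))            # []
print(missing_terms(text, ["Talk-a-Script", "Soniox"]))  # ['Soniox']
```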
