Real-time transcription with Python SDK

The Soniox Python SDK transcribes audio in real time with low latency and high accuracy, which makes it well suited to voice assistants, live captions, and conversational AI.

Connect to a real-time session

The example below streams audio from a live radio station to the Soniox real-time API. If you want to stream from a file instead, see Create your first real-time session.

from typing import Iterator
from soniox import SonioxClient
from soniox.types import (
    RealtimeSTTConfig,
    Token,
    StructuredContext,
    StructuredContextGeneralItem,
)
from soniox.utils import render_tokens, start_audio_thread
import httpx

AUDIO_URL = "https://npr-ice.streamguys1.com/live.mp3?ck=1742897559135"

# Fetch audio from a live radio stream and yield it in chunks.
def stream_audio_from_url(audio_url: str) -> Iterator[bytes]:
    with httpx.Client() as client:
        with client.stream("GET", audio_url) as response:
            response.raise_for_status()
            for chunk in response.iter_bytes(4096):
                if chunk:
                    yield chunk


client = SonioxClient()

# Create config, see below for all parameters
config = RealtimeSTTConfig(
    model="stt-rt-v4",
    audio_format="mp3",
    enable_endpoint_detection=True,
    enable_speaker_diarization=True,
    language_hints=["en"],
    context=StructuredContext(
        general=[StructuredContextGeneralItem(key="domain", value="live radio / news broadcast")],
        text="Live NPR news and talk radio stream, including interviews, music, and commentary.",
        terms=["NPR", "news", "interview", "music", "commentary", "report", "broadcast", "anchor"],
    ),
)
final_tokens: list[Token] = []
non_final_tokens: list[Token] = []

def realtime():
    # Create new real-time websocket session
    with client.realtime.stt.connect(config=config) as session:
        # Stream audio from live radio to websocket
        start_audio_thread(session, stream_audio_from_url(AUDIO_URL))

        # Receive events from Soniox Real-time STT
        for event in session.receive_events():
            for token in event.tokens:
                if token.is_final:
                    final_tokens.append(token)
                else:
                    non_final_tokens.append(token)
            print(render_tokens(final_tokens, non_final_tokens))
            non_final_tokens.clear()

realtime()

For config options see: WebSocket API or RealtimeSTTConfig reference.
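
The receive loop in the example above treats the two kinds of tokens differently: final tokens are kept for good, while non-final tokens are provisional and are replaced by the next event. A minimal sketch of that bookkeeping, using a stand-in `FakeToken` class rather than the SDK's `Token` (the `text` field and the `TranscriptBuffer` helper are assumptions made for illustration):

```python
from dataclasses import dataclass


@dataclass
class FakeToken:
    """Stand-in for the SDK's Token; only the two fields used here."""
    text: str  # assumed field name
    is_final: bool


@dataclass
class TranscriptBuffer:
    """Hypothetical helper that mirrors the final/non-final handling above."""
    final_text: str = ""
    pending_text: str = ""

    def update(self, tokens: list[FakeToken]) -> str:
        # Non-final text is provisional, so rebuild it from scratch on every event.
        self.pending_text = ""
        for token in tokens:
            if token.is_final:
                # Final text never changes once received, so append it permanently.
                self.final_text += token.text
            else:
                self.pending_text += token.text
        return self.final_text + self.pending_text


buffer = TranscriptBuffer()
buffer.update([FakeToken("Hel", False)])
buffer.update([FakeToken("Hello", True), FakeToken(" wor", False)])
print(buffer.update([FakeToken(" world", True)]))
```

Keeping the provisional text separate means a user interface can render it in a lighter style and overwrite it freely without touching the confirmed transcript.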

Endpoint detection

Endpoint detection lets you know when a speaker has finished speaking. This is critical for real-time voice AI assistants, command-and-response systems, and conversational apps where you want to respond immediately without waiting for long silences.
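
As a rough sketch of how an application might act on endpoints, the helper below splits a sequence of token texts into completed utterances. It assumes the endpoint is signalled by a special marker token whose text is `<end>`; that marker text and the `split_utterances` helper are assumptions for illustration, so check the WebSocket API reference for the exact signal.

```python
END_MARKER = "<end>"  # assumed marker text; confirm against the API reference


def split_utterances(
    token_texts: list[str], end_marker: str = END_MARKER
) -> tuple[list[str], str]:
    """Return (completed utterances, text of the utterance still in progress)."""
    completed: list[str] = []
    current: list[str] = []
    for text in token_texts:
        if text == end_marker:
            # The speaker finished: close the current utterance.
            utterance = "".join(current).strip()
            if utterance:
                completed.append(utterance)
            current = []
        else:
            current.append(text)
    return completed, "".join(current).strip()


done, in_progress = split_utterances(
    ["Turn", " on", " the", " light", "<end>", "And", " then"]
)
print(done, in_progress)
```

A voice assistant would typically respond as soon as a new entry appears in the completed list, rather than waiting for a silence timeout of its own.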

Manual finalization

Manual finalization gives you precise control over when audio should be finalized. When you know the user stopped talking (push-to-talk or client-side VAD), call finalize to mark all outstanding tokens as final.
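
The client-side VAD case can be sketched as follows. This is a simplified illustration, not a real VAD: it assumes 16-bit little-endian PCM frames and a plain energy threshold. The `finalize` callback stands for whatever triggers finalization on your session (for example a `session.finalize` method, which is an assumption here), and `SilenceFinalizer`, `threshold`, and `max_silent_frames` are hypothetical names and values.

```python
import math
import struct
from typing import Callable


def rms_int16(frame: bytes) -> float:
    """Root-mean-square level of a 16-bit little-endian PCM frame."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


class SilenceFinalizer:
    """Calls finalize() once after enough quiet frames follow some speech."""

    def __init__(
        self,
        finalize: Callable[[], None],
        threshold: float = 500.0,
        max_silent_frames: int = 25,
    ) -> None:
        self.finalize = finalize  # e.g. session.finalize (assumed method name)
        self.threshold = threshold
        self.max_silent_frames = max_silent_frames
        self.silent_frames = 0
        self.heard_speech = False

    def feed(self, frame: bytes) -> bool:
        """Inspect one frame; return True if finalize() was triggered."""
        if rms_int16(frame) >= self.threshold:
            self.heard_speech = True
            self.silent_frames = 0
            return False
        self.silent_frames += 1
        if self.heard_speech and self.silent_frames >= self.max_silent_frames:
            self.finalize()
            # Reset so the next utterance triggers its own finalization.
            self.heard_speech = False
            self.silent_frames = 0
            return True
        return False
```

Requiring speech before counting silence avoids finalizing repeatedly while the user has not said anything yet.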

Pause and resume

session.pause()   # keeps the connection alive, drops audio while paused
session.resume()  # resumes sending audio

You are billed for the full stream duration, even while the session is paused.

Keepalive

Soniox terminates your session if no audio arrives for ~20 seconds. To keep the connection alive, send a keepalive control message or run a background keepalive loop.

The Python SDK automatically sends keepalive messages while the session is paused via session.pause().

# Send a keepalive message manually
session.keep_alive()
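
A background keepalive loop could look like the sketch below. It is only needed when you stop sending audio without calling session.pause(). The `start_keepalive_loop` helper and the 5 second default interval are assumptions for illustration; the interval is simply chosen to stay well under the ~20 second timeout.

```python
import threading
from typing import Callable


def start_keepalive_loop(
    send_keepalive: Callable[[], None], interval_seconds: float = 5.0
) -> tuple[threading.Event, threading.Thread]:
    """Call send_keepalive() every interval_seconds until the returned event is set."""
    stop = threading.Event()

    def loop() -> None:
        # Event.wait() returns False on timeout and True once stop is set,
        # so the loop exits promptly when the caller asks it to.
        while not stop.wait(interval_seconds):
            send_keepalive()

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return stop, thread
```

Inside a session you would pass `session.keep_alive` as the callback, then call `stop.set()` on the returned event before the session closes.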

Streaming audio from a file

Use stream_audio() with start_audio_thread() to stream from a file while receiving events. If you are streaming live audio (microphone, client stream, etc.), you can feed raw chunks without throttling. If you are streaming a prerecorded file, throttle chunks to simulate real-time delivery.

from soniox.utils import stream_audio, start_audio_thread, throttle_audio

...

with client.realtime.stt.connect(config=config) as session:
    # Start streaming audio on a background thread.
    start_audio_thread(session, stream_audio("audio.wav"))
    # Or throttle a local audio file to simulate streaming (sends a chunk every 100 ms).
    # Use this instead of the call above, not in addition to it:
    # start_audio_thread(session, throttle_audio("audio.wav", delay_seconds=0.1))

    ...

Use send_bytes if you need more control.
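
As an illustration of what that control might look like, the sketch below reads a binary stream in fixed-size chunks and hands each one to a `send` callable. In a real session the callable would be the session's send_bytes; its exact signature is not shown in this guide, so the sketch assumes it accepts a single bytes argument. The `send_in_chunks` helper is hypothetical.

```python
import io
from typing import BinaryIO, Callable


def send_in_chunks(
    stream: BinaryIO, send: Callable[[bytes], object], chunk_size: int = 4096
) -> int:
    """Read the stream chunk by chunk, pass each chunk to send(), return bytes sent."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of stream
        send(chunk)
        total += len(chunk)
    return total


# Demo with an in-memory stream and a list standing in for the session.
sent: list[bytes] = []
total = send_in_chunks(io.BytesIO(b"x" * 10000), sent.append, chunk_size=4096)
print(total, [len(chunk) for chunk in sent])
```

Owning the loop lets you choose the chunk size, add pacing between chunks, or stop early, which the helper functions above decide for you.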

Direct stream and proxy stream

Read more about Direct stream and Proxy stream.

For direct streaming from a client, issue a temporary API key and pass it to the browser or device that will open the WebSocket connection:

from soniox import SonioxClient

client = SonioxClient()

key = client.auth.create_temporary_api_key(
    expires_in_seconds=3600,
    client_reference_id="support-call-123",
)

print(key.api_key, key.expires_at)

For proxy streaming, keep the WebSocket connection on your server and stream audio through your backend.