Real-time transcription with Python SDK

Create and connect to Soniox real-time speech-to-text sessions with the Python SDK

The Soniox Python SDK transcribes audio in real time with low latency and high accuracy, which makes it a good fit for voice assistants, live captions, and conversational AI.

Connect to a real-time session

The example below streams audio from a live radio station to the Soniox real-time API. If you want to stream from a file instead, see: Create your first real-time session.

from typing import Iterator
from soniox import SonioxClient
from soniox.types import (
    RealtimeSTTConfig,
    Token,
    StructuredContext,
    StructuredContextGeneralItem,
)
from soniox.utils import render_tokens, start_audio_thread
import httpx

AUDIO_URL = "https://npr-ice.streamguys1.com/live.mp3?ck=1742897559135"

# Fetch audio from a live radio stream and yield it in chunks.
def stream_audio_from_url(audio_url) -> Iterator[bytes]:
    with httpx.Client() as client:
        with client.stream("GET", audio_url) as response:
            response.raise_for_status()
            for chunk in response.iter_bytes(4096):
                if chunk:
                    yield chunk


client = SonioxClient()

# Create config, see below for all parameters
config = RealtimeSTTConfig(
    model="stt-rt-v4",
    audio_format="mp3",
    enable_endpoint_detection=True,
    enable_speaker_diarization=True,
    language_hints=["en"],
    context=StructuredContext(
        general=[StructuredContextGeneralItem(key="domain", value="live radio / news broadcast")],
        text="Live NPR news and talk radio stream, including interviews, music, and commentary.",
        terms=["NPR", "news", "interview", "music", "commentary", "report", "broadcast", "anchor"],
    ),
)
final_tokens: list[Token] = []
non_final_tokens: list[Token] = []

def realtime():
    # Create new real-time websocket session
    with client.realtime.stt.connect(config=config) as session:
        # Stream audio from live radio to websocket
        start_audio_thread(session, stream_audio_from_url(AUDIO_URL))

        # Receive events from Soniox Real-time STT
        for event in session.receive_events():
            for token in event.tokens:
                if token.is_final:
                    final_tokens.append(token)
                else:
                    non_final_tokens.append(token)
            print(render_tokens(final_tokens, non_final_tokens))
            non_final_tokens.clear()

realtime()

For config options see: WebSocket API or RealtimeSTTConfig reference.

Endpoint detection

Endpoint detection lets you know when a speaker has finished speaking. This is critical for real-time voice AI assistants, command-and-response systems, and conversational apps where you want to respond immediately without waiting for long silences.

Read more about Endpoint detection

Enable endpoint detection by setting enable_endpoint_detection=True in the session config. You will receive the special token <end> when speech ends.

# Enable endpoint detection
config = RealtimeSTTConfig(
    enable_endpoint_detection=True,
    ...
)

# When receiving events, check for special token <end>
for event in session.receive_events():
    for token in event.tokens:
        if token.text == "<end>":
            print("Endpoint detected")
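
As a sketch of how the <end> token can drive turn-taking, the helper below collects final token text into utterances and closes one each time <end> arrives. `FakeToken` and `split_utterances` are hypothetical names used only for illustration; the helper relies solely on the `text` and `is_final` attributes shown in the examples above.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

END_TOKEN = "<end>"


@dataclass
class FakeToken:
    """Hypothetical stand-in for soniox.types.Token (illustration only)."""
    text: str
    is_final: bool = True


def split_utterances(tokens: Iterable) -> Iterator[str]:
    """Yield one string per utterance, closing it when <end> arrives."""
    parts: list[str] = []
    for token in tokens:
        if token.text == END_TOKEN:
            if parts:
                yield "".join(parts).strip()
                parts = []
        elif token.is_final:
            # Non-final tokens may still change, so only final text is kept.
            parts.append(token.text)
    # Text after the last <end> belongs to an unfinished turn and is not yielded.


demo = [
    FakeToken("Hello"), FakeToken(" world"), FakeToken("<end>"),
    FakeToken("Next", is_final=False),  # provisional, ignored
    FakeToken("Next"), FakeToken(" turn"), FakeToken("<end>"),
]
utterances = list(split_utterances(demo))
```

With a live session, the helper would need to see the tokens of every event in order, for example through a generator that flattens session.receive_events() into a single token stream.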

Manual finalization

Manual finalization gives you precise control over when audio should be finalized. When you know the user stopped talking (push-to-talk or client-side VAD), call finalize to mark all outstanding tokens as final.

Read more about Manual finalization

# Finalize current buffered audio without closing the session.
session.finalize()
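
To illustrate the client-side VAD case, here is a deliberately naive energy-threshold detector. It is not a production VAD and not part of the SDK. It assumes frames of 16-bit little-endian mono PCM; `frame_rms`, `finalize_points`, the threshold, and the frame count are all hypothetical choices. The function returns the frame indices at which an application might call session.finalize().

```python
import math
import struct
from typing import Iterable


def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a 16-bit little-endian PCM frame."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def finalize_points(
    frames: Iterable[bytes],
    threshold: float = 500.0,      # assumed loudness cutoff, tune per microphone
    min_silent_frames: int = 3,    # assumed silence length that ends a turn
) -> list[int]:
    """Return indices of frames at which speech is judged to have ended."""
    points: list[int] = []
    in_speech = False
    silent_run = 0
    for index, frame in enumerate(frames):
        if frame_rms(frame) >= threshold:
            in_speech = True
            silent_run = 0
        elif in_speech:
            silent_run += 1
            if silent_run >= min_silent_frames:
                # A real application would call session.finalize() here.
                points.append(index)
                in_speech = False
                silent_run = 0
    return points


loud = struct.pack("<4h", 4000, -4000, 4000, -4000)
quiet = struct.pack("<4h", 0, 0, 0, 0)
points = finalize_points(
    [quiet, loud, loud, quiet, quiet, quiet, quiet, loud, quiet, quiet, quiet]
)
```

Requiring several silent frames in a row avoids finalizing on short pauses between words; a push-to-talk application can skip detection entirely and finalize when the button is released.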

Pause and resume

session.pause()   # keeps the connection alive, drops audio while paused
session.resume()  # resumes sending audio

You are billed for the full stream duration, even while the session is paused.

Keepalive

Soniox terminates your session if no audio arrives for ~20 seconds. To keep the connection alive, send a keepalive control message or run a background keepalive loop.

The Python SDK automatically sends keepalive messages while the session is paused via session.pause().

# Send a keepalive message manually
session.keep_alive()

Read more about Connection keepalive
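
A background keepalive loop could be sketched as follows. `start_keepalive_loop` and the 5-second default interval are hypothetical; the sketch uses only session.keep_alive() from the SDK and assumes that method is safe to call from a separate thread, which this page does not state.

```python
import threading


def start_keepalive_loop(session, interval_seconds: float = 5.0) -> threading.Event:
    """Call session.keep_alive() every interval_seconds until the returned event is set."""
    stop = threading.Event()

    def loop() -> None:
        # Event.wait() returns False on timeout and True once stop is set,
        # so the loop sends a keepalive after each full interval and then exits.
        while not stop.wait(interval_seconds):
            session.keep_alive()

    # Daemon thread: it will not keep the process alive on its own.
    threading.Thread(target=loop, daemon=True).start()
    return stop
```

Call `stop.set()` on the returned event once audio resumes or the session closes. Any interval comfortably below the ~20 second timeout should work.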

Streaming audio from a file

Use stream_audio() with start_audio_thread() to stream from a file while receiving events. If you are streaming live audio (microphone, client stream, etc.), you can feed raw chunks without throttling. If you are streaming a prerecorded file, throttle chunks to simulate real-time delivery.

from soniox.utils import stream_audio, start_audio_thread, throttle_audio

...

with client.realtime.stt.connect(config=config) as session:
    # Start streaming audio on a background thread.
    start_audio_thread(session, stream_audio("audio.wav"))
    # Or, instead of the line above, throttle a local audio file to simulate
    # streaming (sends a chunk every 100 ms):
    # start_audio_thread(session, throttle_audio("audio.wav", delay_seconds=0.1))

    ...

Use send_bytes if you need more control.
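
To make the throttling arithmetic concrete, the sketch below computes how many bytes of uncompressed PCM correspond to one delay interval and yields chunks at that pace. `chunk_size_bytes` and `paced_chunks` are hypothetical helpers, not SDK functions, and the 16 kHz, 16-bit mono figures are assumed example values. The byte-to-time mapping holds only for raw PCM; compressed formats such as MP3 do not map bytes to time this simply.

```python
import time
from typing import Iterator


def chunk_size_bytes(
    sample_rate: int, bytes_per_sample: int, channels: int, delay_seconds: float
) -> int:
    """Bytes of raw PCM that correspond to delay_seconds of audio."""
    return round(sample_rate * bytes_per_sample * channels * delay_seconds)


def paced_chunks(data: bytes, chunk_size: int, delay_seconds: float) -> Iterator[bytes]:
    """Yield fixed-size chunks, sleeping after each to mimic real-time delivery."""
    for start in range(0, len(data), chunk_size):
        yield data[start : start + chunk_size]
        time.sleep(delay_seconds)


# 16 kHz * 2 bytes * 1 channel = 32000 bytes per second, so 100 ms is 3200 bytes.
size = chunk_size_bytes(16000, 2, 1, 0.1)
# delay_seconds=0.0 here only so the demonstration finishes instantly.
chunk_lengths = [len(c) for c in paced_chunks(bytes(8000), size, delay_seconds=0.0)]
```

Since start_audio_thread() accepts an iterator of byte chunks in the radio example above, an iterator like `paced_chunks(...)` could presumably be passed to it in place of throttle_audio(...).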

Direct stream and proxy stream

Read more about Direct stream and Proxy stream.

For direct streaming from a client, issue a temporary API key and pass it to the browser or device that will open the WebSocket connection:

from soniox import SonioxClient

client = SonioxClient()

key = client.auth.create_temporary_api_key(
    expires_in_seconds=3600,
    client_reference_id="support-call-123",
)

print(key.api_key, key.expires_at)

For proxy streaming, keep the WebSocket connection on your server and stream audio through your backend.