Real-time transcription with Python SDK
Create and connect to Soniox real-time speech-to-text sessions with the Python SDK
The Soniox Python SDK transcribes audio in real time with low latency and high accuracy. This makes it well suited to voice assistants, live captions, and conversational AI.
Connect to a real-time session
The example below streams audio from a live radio station to the Soniox real-time API. To stream from a file instead, see Create your first real-time session.
For configuration options, see the WebSocket API or RealtimeSTTConfig reference.
Endpoint detection
Endpoint detection lets you know when a speaker has finished speaking. This is critical for real-time voice AI assistants, command-and-response systems, and conversational apps where you want to respond immediately without waiting for long silences.
Read more about Endpoint detection
Enable endpoint detection by setting enable_endpoint_detection=True in the session config.
You will receive the special token <end> when speech ends.
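As an illustration of how the <end> token can drive turn-taking, here is a minimal sketch that groups final tokens into utterances and cuts at each <end>. It assumes tokens arrive as dictionaries with "text" and "is_final" keys, which mirrors the WebSocket API response shape but is not confirmed here for the SDK's own token objects; split_utterances is a hypothetical helper, not part of the SDK.

```python
from typing import Iterable, Iterator

END_TOKEN = "<end>"  # special token emitted when endpoint detection fires


def split_utterances(tokens: Iterable[dict]) -> Iterator[str]:
    """Yield one string per completed utterance, cutting at each <end> token.

    Assumes each token is a dict with "text" and "is_final" keys.
    """
    parts: list[str] = []
    for token in tokens:
        if token["text"] == END_TOKEN:
            if parts:
                yield "".join(parts).strip()
            parts = []
        elif token.get("is_final"):
            # Only final tokens are kept; non-final tokens may still change.
            parts.append(token["text"])
    # Tokens after the last <end> are not yielded: the speaker has not finished.


sample_tokens = [
    {"text": "Hello", "is_final": True},
    {"text": " world", "is_final": True},
    {"text": "<end>", "is_final": True},
    {"text": "How", "is_final": True},
    {"text": " are", "is_final": False},
]
utterances = list(split_utterances(sample_tokens))
print(utterances)
```

In a voice assistant, each yielded utterance would be the point at which to hand the text to the response logic.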
Manual finalization
Manual finalization gives you precise control over when audio is finalized. When you know the user has stopped talking (for example, with push-to-talk or client-side VAD), call finalize to mark all outstanding tokens as final.
Read more about Manual finalization
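The push-to-talk case can be sketched as follows. This is an illustration under assumptions, not the SDK's API: PushToTalk and finalize_message are hypothetical names, send_text stands in for whatever sends a text frame on your connection, and the {"type": "finalize"} message shape should be checked against the WebSocket API reference. With the SDK itself you would call its finalize method instead of building the message by hand.

```python
import json
from typing import Callable


def finalize_message() -> str:
    # Assumed control-message shape; verify against the WebSocket API reference.
    return json.dumps({"type": "finalize"})


class PushToTalk:
    """Hypothetical helper that requests finalization when the button is released."""

    def __init__(self, send_text: Callable[[str], None]) -> None:
        self._send_text = send_text
        self.pressed = False

    def press(self) -> None:
        self.pressed = True

    def release(self) -> None:
        # Only finalize if the button was actually held down.
        if self.pressed:
            self.pressed = False
            self._send_text(finalize_message())


sent: list[str] = []
ptt = PushToTalk(sent.append)  # a list stands in for the real connection
ptt.release()                  # ignored: the button was never pressed
ptt.press()
ptt.release()                  # sends one finalize request
print(sent)
```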
Pause and resume
You are billed for the full stream duration, even while the session is paused.
Keepalive
Soniox terminates your session if no audio arrives for ~20 seconds. To keep the connection alive, send a keepalive control message or run a background keepalive loop.
The Python SDK automatically sends keepalive messages while the session is paused via session.pause().
Read more about Connection keepalive
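If you manage the connection yourself, a background keepalive loop might look like the sketch below. KeepaliveLoop is a hypothetical helper, send_text stands in for your connection's text-frame sender, and the {"type": "keepalive"} shape and the 5-second default interval are assumptions to verify against the keepalive documentation; the interval only needs to stay comfortably under the ~20 second timeout.

```python
import json
import threading
import time
from typing import Callable


class KeepaliveLoop:
    """Hypothetical helper that sends a keepalive message at a fixed interval."""

    def __init__(self, send_text: Callable[[str], None], interval_s: float = 5.0) -> None:
        self._send_text = send_text
        self._interval_s = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        # Event.wait returns False on timeout and True once stop() sets the event.
        while not self._stop.wait(self._interval_s):
            # Assumed control-message shape; verify against the API reference.
            self._send_text(json.dumps({"type": "keepalive"}))

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()


keepalives: list[str] = []
loop = KeepaliveLoop(keepalives.append, interval_s=0.01)  # short interval for the demo
loop.start()
time.sleep(0.2)
loop.stop()
print(len(keepalives), "keepalive messages sent")
```

Using threading.Event for the wait means stop() takes effect immediately instead of after a full sleep interval.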
Streaming audio from a file
Use stream_audio() with start_audio_thread() to stream from a file while receiving events.
If you are streaming live audio (microphone, client stream, etc.), you can feed raw chunks without throttling.
If you are streaming a prerecorded file, throttle chunks to simulate real-time delivery.
Use send_bytes if you need more control.
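Throttling a prerecorded file can be sketched as a generator that yields fixed-size chunks and sleeps for the playback duration of each one. The helper name throttled_chunks, the 3840-byte chunk size, and the 32000 bytes-per-second rate (16 kHz, 16-bit, mono PCM) are illustrative assumptions, not SDK defaults. Each yielded chunk could then be handed to whatever send call you use; the exact signature of send_bytes is not shown here.

```python
import io
import time
from typing import BinaryIO, Callable, Iterator


def throttled_chunks(
    stream: BinaryIO,
    chunk_size: int = 3840,          # 120 ms of 16 kHz, 16-bit mono PCM (assumed format)
    bytes_per_second: int = 32000,   # 16000 samples/s * 2 bytes * 1 channel
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[bytes]:
    """Yield chunks from a file-like object at roughly real-time pace."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk
        # Wait for as long as this chunk would take to play back.
        sleep(len(chunk) / bytes_per_second)


# Demo with in-memory silence and a recording sleep, so it runs instantly.
fake_audio = io.BytesIO(b"\x00" * 10000)
delays: list[float] = []
chunks = list(throttled_chunks(fake_audio, sleep=delays.append))
print([len(c) for c in chunks])
```

Passing sleep as a parameter keeps the pacing logic testable without real delays; in production the default time.sleep applies.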
Direct stream and proxy stream
Read more about Direct stream and Proxy stream.
For direct streaming from a client, issue a temporary API key and pass it to the browser or device that will open the WebSocket connection:
For proxy streaming, keep the WebSocket connection on your server and stream audio through your backend.