Real-time speech generation with the Python SDK
Convert text to speech with the Python SDK
The Soniox Python SDK supports real-time text-to-speech (TTS) generation with low-latency streaming output. You send text in chunks over a WebSocket connection and receive audio chunks as they are generated.
Connect to a real-time session
For configuration options, see the TTS WebSocket API and the RealtimeTTSConfig reference.
Async real-time session
Send text incrementally
Use send_text_chunk() when text arrives incrementally (for example, from an LLM token stream).
To finalize the input, either set text_end=True on the final chunk or call finish() after sending the last chunk; the two approaches are equivalent.
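The two finalization styles can be sketched as follows. This assumes that, on the async session, send_text_chunk(text, text_end=...) and finish() are coroutines; check the SDK reference for the exact signatures. The helpers forward_with_text_end and forward_then_finish, and the stand-ins RecordingSession and fake_llm_tokens, are hypothetical names used only so the sketch runs without a network connection.

```python
import asyncio
from typing import AsyncIterator, Optional


async def forward_with_text_end(session, tokens: AsyncIterator[str]) -> None:
    """Send each token as a chunk and mark only the final one with text_end=True.

    One token is held back so we know which chunk is last when the iterator ends.
    """
    pending: Optional[str] = None
    async for token in tokens:
        if pending is not None:
            await session.send_text_chunk(pending)
        pending = token
    if pending is None:
        await session.finish()  # no text at all: finalize explicitly (assumed coroutine)
    else:
        await session.send_text_chunk(pending, text_end=True)


async def forward_then_finish(session, tokens: AsyncIterator[str]) -> None:
    """Send each token immediately, then finalize with finish()."""
    async for token in tokens:
        await session.send_text_chunk(token)
    await session.finish()


# --- Hypothetical stand-ins for the demo only; not part of the SDK. ---
class RecordingSession:
    """Records calls instead of talking to the TTS service."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, bool]] = []

    async def send_text_chunk(self, text: str, text_end: bool = False) -> None:
        self.calls.append((text, text_end))

    async def finish(self) -> None:
        self.calls.append(("<finish>", True))


async def fake_llm_tokens() -> AsyncIterator[str]:
    for token in ["Hello", ", ", "world", "!"]:
        yield token


if __name__ == "__main__":
    session = RecordingSession()
    asyncio.run(forward_with_text_end(session, fake_llm_tokens()))
    print(session.calls)
    # [('Hello', False), (', ', False), ('world', False), ('!', True)]
```

The look-ahead in forward_with_text_end delays each chunk until the next token arrives; forward_then_finish sends every token as soon as it is produced, which can matter when the LLM emits tokens slowly.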
Receive events vs audio chunks
receive_audio_chunks() yields decoded audio bytes directly and stops once the stream is finalized.
Use receive_events() instead when you need the raw event metadata, such as audio_end, terminated, and error details.
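Because audio starts arriving before all text has been sent, a common pattern is to send and receive concurrently on the same stream. The sketch below assumes receive_audio_chunks() is an async iterator of bytes on the async session; synthesize and FakeSession are hypothetical names, and FakeSession simply echoes text back as bytes so the example runs offline. If you need audio_end, terminated, or error metadata, the receive side would iterate receive_events() instead.

```python
import asyncio


async def synthesize(session, chunks: list[str]) -> bytes:
    """Send text and collect audio concurrently on one stream."""
    if not chunks:
        raise ValueError("need at least one text chunk")
    audio = bytearray()

    async def send() -> None:
        for i, text in enumerate(chunks):
            # Mark the last chunk so the stream is finalized.
            await session.send_text_chunk(text, text_end=(i == len(chunks) - 1))

    async def receive() -> None:
        # Assumed to yield audio bytes and stop after finalization.
        async for chunk in session.receive_audio_chunks():
            audio.extend(chunk)

    await asyncio.gather(send(), receive())
    return bytes(audio)


# --- Hypothetical stand-in for the demo only; not part of the SDK. ---
class FakeSession:
    """Echoes each text chunk back as 'audio' bytes."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()

    async def send_text_chunk(self, text: str, text_end: bool = False) -> None:
        await self._queue.put(text.encode())
        if text_end:
            await self._queue.put(None)  # sentinel: stream finalized

    async def receive_audio_chunks(self):
        while (chunk := await self._queue.get()) is not None:
            yield chunk


async def demo() -> bytes:
    # The fake is created inside the running event loop.
    return await synthesize(FakeSession(), ["Hello, ", "world!"])


if __name__ == "__main__":
    print(asyncio.run(demo()))  # b'Hello, world!'
```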
Multiple streams on one connection
A single WebSocket connection can carry up to 5 concurrent streams.
Use connect_multi_stream() to open a multiplexed connection, then call open_stream() for each stream.
Each stream has its own stream_id and operates independently, so you can send text and receive audio on all streams in parallel.
Async multi-stream
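A minimal sketch of running several streams in parallel on one multiplexed connection, assuming the connection returned by connect_multi_stream() exposes an awaitable open_stream() and each stream has stream_id, send_text_chunk(), and receive_audio_chunks(). The helper synthesize_many and the FakeConnection/FakeStream stand-ins are hypothetical; the limit of 5 comes from the text above.

```python
import asyncio

MAX_STREAMS = 5  # per-connection limit stated in this guide


async def synthesize_many(conn, texts: list[str]) -> dict[str, bytes]:
    """Open one stream per text on a shared connection and run them in parallel."""
    if len(texts) > MAX_STREAMS:
        raise ValueError(f"at most {MAX_STREAMS} concurrent streams per connection")

    async def run_one(text: str) -> tuple[str, bytes]:
        stream = await conn.open_stream()  # assumed coroutine
        audio = bytearray()

        async def send() -> None:
            await stream.send_text_chunk(text, text_end=True)

        async def receive() -> None:
            async for chunk in stream.receive_audio_chunks():
                audio.extend(chunk)

        await asyncio.gather(send(), receive())
        return stream.stream_id, bytes(audio)

    return dict(await asyncio.gather(*(run_one(t) for t in texts)))


# --- Hypothetical stand-ins for the demo only; not part of the SDK. ---
class FakeStream:
    def __init__(self, stream_id: str) -> None:
        self.stream_id = stream_id
        self._queue: asyncio.Queue = asyncio.Queue()

    async def send_text_chunk(self, text: str, text_end: bool = False) -> None:
        await self._queue.put(text.encode())
        if text_end:
            await self._queue.put(None)  # sentinel: stream finalized

    async def receive_audio_chunks(self):
        while (chunk := await self._queue.get()) is not None:
            yield chunk


class FakeConnection:
    def __init__(self) -> None:
        self._opened = 0

    async def open_stream(self) -> FakeStream:
        self._opened += 1
        return FakeStream(f"stream-{self._opened}")


if __name__ == "__main__":
    results = asyncio.run(synthesize_many(FakeConnection(), ["one", "two", "three"]))
    for stream_id, audio in results.items():
        print(stream_id, len(audio), "bytes")
```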
Sync multi-stream
In synchronous code, use threads to send text and receive audio from each stream concurrently.
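A sketch of the threaded pattern, assuming each synchronous stream object offers a blocking send_text_chunk() and a blocking receive_audio_chunks() iterator, and that sending and receiving from two different threads on the same stream is safe. run_stream, run_all, and FakeSyncStream are hypothetical names; each stream runs on a pool worker that sends text while a helper thread collects its audio.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Any


def run_stream(stream: Any, chunks: list[str]) -> bytes:
    """Send on the calling thread while a helper thread collects audio."""
    if not chunks:
        raise ValueError("need at least one text chunk")
    audio = bytearray()

    def receive() -> None:
        # Assumed to block until audio arrives and stop after finalization.
        for chunk in stream.receive_audio_chunks():
            audio.extend(chunk)

    receiver = threading.Thread(target=receive)
    receiver.start()
    for i, text in enumerate(chunks):
        stream.send_text_chunk(text, text_end=(i == len(chunks) - 1))
    receiver.join()
    return bytes(audio)


def run_all(jobs: list[tuple[Any, list[str]]]) -> dict[str, bytes]:
    """Run every stream on its own worker thread and gather audio by stream_id."""
    with ThreadPoolExecutor(max_workers=max(1, len(jobs))) as pool:
        futures = {s.stream_id: pool.submit(run_stream, s, c) for s, c in jobs}
        return {sid: f.result() for sid, f in futures.items()}


# --- Hypothetical stand-in for the demo only; not part of the SDK. ---
class FakeSyncStream:
    def __init__(self, stream_id: str) -> None:
        self.stream_id = stream_id
        self._queue: queue.Queue = queue.Queue()

    def send_text_chunk(self, text: str, text_end: bool = False) -> None:
        self._queue.put(text.encode())
        if text_end:
            self._queue.put(None)  # sentinel: stream finalized

    def receive_audio_chunks(self):
        while (chunk := self._queue.get()) is not None:
            yield chunk


if __name__ == "__main__":
    streams = [FakeSyncStream(f"stream-{i}") for i in range(1, 4)]
    results = run_all([(s, ["Hello from ", s.stream_id]) for s in streams])
    print(results["stream-2"])  # b'Hello from stream-2'
```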
Error handling
By default, a failed stream does not close the whole WebSocket connection.
A stream-level error finalizes only that stream (terminated=True is reported for its stream_id), while other streams on the same connection can continue.
A connection-level failure ends the whole connection and all of its streams.
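To keep the surviving streams' results when one stream fails, you can collect per-stream outcomes instead of letting the first error propagate. This sketch assumes a stream-level error surfaces as an exception raised from that stream's coroutine; the SDK may instead report it through receive_events() with terminated=True, so check the reference. run_isolated is a hypothetical helper, and the RuntimeError is a simulated placeholder, not the SDK's exception type.

```python
import asyncio
from typing import Awaitable, Callable


async def run_isolated(
    jobs: dict[str, Callable[[], Awaitable[bytes]]],
) -> tuple[dict[str, bytes], dict[str, BaseException]]:
    """Run per-stream jobs together and split successes from failures."""
    names = list(jobs)
    # return_exceptions=True returns each failure as a value instead of
    # raising the first one, so the other streams' results are kept.
    outcomes = await asyncio.gather(*(jobs[n]() for n in names), return_exceptions=True)
    ok: dict[str, bytes] = {}
    failed: dict[str, BaseException] = {}
    for name, outcome in zip(names, outcomes):
        if isinstance(outcome, BaseException):
            failed[name] = outcome
        else:
            ok[name] = outcome
    return ok, failed


# --- Simulated jobs for the demo only. ---
async def healthy_stream() -> bytes:
    await asyncio.sleep(0.01)
    return b"audio"


async def failing_stream() -> bytes:
    await asyncio.sleep(0)
    raise RuntimeError("simulated stream-level error")


if __name__ == "__main__":
    ok, failed = asyncio.run(
        run_isolated({"a": healthy_stream, "b": failing_stream, "c": healthy_stream})
    )
    print(sorted(ok))   # ['a', 'c']
    print(failed["b"])  # simulated stream-level error
```

This isolation only helps with stream-level errors. After a connection-level failure every job would land in failed, and a new connection is needed.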