Text-to-Speech
Use Soniox Text-to-Speech in LiveKit Agents.
Overview
soniox.TTS provides real-time text-to-speech synthesis using the Soniox WebSocket TTS API. It delivers:
- Ultra-low latency — synthesis starts from the first few words, before the full sentence is available, keeping voice agents responsive.
- 60+ languages with native-speaker-quality voices. See supported languages and voices.
- Accurate alphanumerics — phone numbers, email addresses, IDs, and other mixed letter-and-digit strings are pronounced correctly.
- Hallucination-free output — the model speaks the text you send, nothing more.
Basic usage
Use Soniox TTS in an AgentSession:
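A minimal sketch of that wiring, assuming the plugin is importable as `livekit.plugins.soniox` (the usual LiveKit plugin layout, not confirmed by this page) and that `SONIOX_API_KEY` is set in the environment. The helper name `build_session` is hypothetical.

```python
def build_session():
    """Return an AgentSession that speaks through Soniox TTS."""
    # Imported lazily so this module still loads where the plugin is absent.
    # Assumption: the plugin is exposed as livekit.plugins.soniox.
    from livekit.agents import AgentSession
    from livekit.plugins import soniox

    # With no arguments, soniox.TTS reads the key from SONIOX_API_KEY and
    # uses the defaults listed under "Constructor arguments".
    return AgentSession(tts=soniox.TTS())
```

A real agent would normally also pass its STT and LLM components to `AgentSession`; they are left out here to keep the focus on TTS.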
Configuration
soniox.TTS accepts top-level constructor arguments for the model, voice, language, and audio settings.
Constructor arguments
| Argument | Type | Default | Description |
|---|---|---|---|
| api_key | str \| None | SONIOX_API_KEY env var | Soniox API key. |
| model | str | tts-rt-v1-preview | TTS model identifier. |
| language | str | en | Language code (e.g., "en", "es", "fr"). See supported languages. |
| voice | str | Maya | Voice identifier. See voices. |
| audio_format | str | pcm_s16le | Output audio format. See audio formats. |
| sample_rate | int | 24000 | Output sample rate in Hz. |
| bitrate | int \| None | None | Codec bitrate in bps for lossy compressed formats. |
| websocket_url | str | wss://tts-rt.soniox.com/tts-websocket | Soniox WebSocket endpoint. See regional endpoints. |
| http_session | aiohttp.ClientSession \| None | None | Optional aiohttp session to reuse for the WebSocket connection. |
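The table can be mirrored in code to make the defaults explicit. The sketch below is illustrative only: `SONIOX_TTS_DEFAULTS`, `tts_kwargs`, and `make_tts` are hypothetical helpers, the import path `livekit.plugins.soniox` is an assumption, and the values are copied from the table rather than read from the plugin.

```python
# Defaults copied from the table above. api_key and http_session are left
# out so the plugin applies its own fallback (env var / internal session).
SONIOX_TTS_DEFAULTS = {
    "model": "tts-rt-v1-preview",
    "language": "en",
    "voice": "Maya",
    "audio_format": "pcm_s16le",
    "sample_rate": 24000,
    "bitrate": None,
    "websocket_url": "wss://tts-rt.soniox.com/tts-websocket",
}


def tts_kwargs(**overrides):
    """Merge overrides onto the documented defaults, rejecting unknown names."""
    allowed = set(SONIOX_TTS_DEFAULTS) | {"api_key", "http_session"}
    unknown = sorted(set(overrides) - allowed)
    if unknown:
        raise TypeError(f"unknown soniox.TTS argument(s): {', '.join(unknown)}")
    return {**SONIOX_TTS_DEFAULTS, **overrides}


def make_tts(**overrides):
    """Construct soniox.TTS with explicit keyword arguments."""
    from livekit.plugins import soniox  # assumed import path

    return soniox.TTS(**tts_kwargs(**overrides))
```

For example, `make_tts(sample_rate=16000)` would request 16 kHz output while keeping every other documented default.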
Advanced usage
Regional endpoints
To use a region other than the default (US), pass a regional websocket_url to the TTS constructor:
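One way to keep the region configurable is to read the endpoint from the environment. This is a sketch under assumptions: the variable name `SONIOX_TTS_WEBSOCKET_URL` and both helper functions are hypothetical, and no regional URL is hard-coded because this page does not list one.

```python
import os

# Default (US) endpoint, as listed in the constructor-arguments table.
DEFAULT_WS_URL = "wss://tts-rt.soniox.com/tts-websocket"


def resolve_websocket_url(env=os.environ):
    """Return the regional endpoint from the environment, else the default."""
    # SONIOX_TTS_WEBSOCKET_URL is a hypothetical variable name for this sketch.
    return env.get("SONIOX_TTS_WEBSOCKET_URL") or DEFAULT_WS_URL


def make_regional_tts():
    """Construct soniox.TTS against the resolved endpoint."""
    from livekit.plugins import soniox  # assumed import path

    return soniox.TTS(websocket_url=resolve_websocket_url())
```

Leaving the variable unset keeps the default endpoint, so the same code can run in every deployment.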
See regional endpoints for the available URLs.
Voice and language
Soniox TTS supports 60+ languages and a range of voices. Specify them via the constructor:
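A short sketch using values that appear on this page: the "es" language code from the examples and the default voice "Maya". Whether that voice is offered for Spanish is not stated here, so treat the pairing as a placeholder. The names `SPANISH_VOICE` and `make_spanish_tts` are hypothetical, and the import path is an assumption.

```python
# Placeholder pairing; check the supported languages and voices lists
# before relying on it.
SPANISH_VOICE = {"language": "es", "voice": "Maya"}


def make_spanish_tts():
    """Construct soniox.TTS with an explicit language and voice."""
    from livekit.plugins import soniox  # assumed import path

    return soniox.TTS(**SPANISH_VOICE)
```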
See the list of supported languages and available voices.
Updating options at runtime
Use update_options() to change the model, language, or voice mid-session without recreating the TTS instance:
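The sketch below assumes `update_options()` accepts `model`, `language`, and `voice` as keyword arguments. That matches the description above, but the exact signature is not shown on this page. The wrapper `switch_voice` is a hypothetical helper that forwards only the options you actually set.

```python
def switch_voice(tts, *, model=None, language=None, voice=None):
    """Forward only the options that were set to tts.update_options()."""
    requested = {"model": model, "language": language, "voice": voice}
    changes = {name: value for name, value in requested.items() if value is not None}
    if changes:
        # Assumption: update_options() takes these as keyword arguments.
        tts.update_options(**changes)
    return changes
```

Calling `switch_voice(tts, language="fr")` in the middle of a session would then change only the language and leave the model and voice as they were.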