Text-to-Speech
Use Soniox Text-to-Speech in a Pipecat pipeline.
Overview
SonioxTTSService provides real-time text-to-speech synthesis using the Soniox WebSocket TTS API. Text is streamed incrementally to the Soniox endpoint, and audio is returned as base64-encoded chunks. Up to five concurrent streams are multiplexed over a single WebSocket connection, making the service efficient for interactive voice applications.
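Inside a Pipecat pipeline you never handle these chunks yourself, because the service emits audio frames downstream. To make the wire format concrete, here is a minimal standard-library sketch that decodes one base64-encoded chunk of pcm_s16le audio into signed 16-bit samples. It assumes each chunk carries whole samples of raw little-endian PCM. It does not show the JSON message that wraps the chunk on the WebSocket. decode_pcm_chunk is a hypothetical helper, not part of Pipecat or the Soniox API.

```python
import base64
import struct


def decode_pcm_chunk(b64_audio: str) -> list[int]:
    """Decode one base64-encoded pcm_s16le chunk into signed 16-bit samples."""
    raw = base64.b64decode(b64_audio)
    # Each pcm_s16le sample occupies exactly two bytes.
    if len(raw) % 2 != 0:
        raise ValueError("pcm_s16le audio must contain an even number of bytes")
    sample_count = len(raw) // 2
    # "<" selects little-endian byte order, "h" a signed 16-bit integer.
    return list(struct.unpack(f"<{sample_count}h", raw))


# Round-trip check: encode three samples, then decode them again.
chunk = base64.b64encode(struct.pack("<3h", 0, 1000, -1000)).decode("ascii")
print(decode_pcm_chunk(chunk))  # [0, 1000, -1000]
```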
Basic usage
Add SonioxTTSService to your Pipecat pipeline:
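The sketch below creates the service with its default settings and places it between the LLM and the transport output. It makes a few assumptions. The import path pipecat.services.soniox.tts mirrors the existing Soniox STT module. The API key is read from a SONIOX_API_KEY environment variable, a name chosen for illustration. The transport, stt, and llm objects are whatever services your application already uses. A real voice agent usually also includes context aggregators, which are omitted here. Imports sit inside the functions so the snippet loads even without Pipecat installed; in an application you would normally import at module level.

```python
import os


def create_soniox_tts():
    """Create a SonioxTTSService with the default voice, model, and language."""
    # Assumed import path, mirroring pipecat.services.soniox.stt.
    from pipecat.services.soniox.tts import SonioxTTSService

    # SONIOX_API_KEY is an assumed environment variable name.
    return SonioxTTSService(api_key=os.environ["SONIOX_API_KEY"])


def build_pipeline(transport, stt, llm, tts):
    """Place the TTS service between the LLM and the transport output."""
    from pipecat.pipeline.pipeline import Pipeline

    return Pipeline(
        [
            transport.input(),   # audio from the user
            stt,                 # speech-to-text
            llm,                 # generates the text response
            tts,                 # Soniox TTS turns text into audio frames
            transport.output(),  # audio back to the user
        ]
    )
```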
Configuration
SonioxTTSService accepts both top-level constructor arguments (for connection and audio formatting) and a Settings object (for voice, model, and language).
Constructor arguments
| Argument | Type | Default | Description |
|---|---|---|---|
| api_key | str | required | Soniox API key. |
| url | str | wss://tts-rt.soniox.com/tts-websocket | WebSocket endpoint. See regional endpoints. |
| sample_rate | int \| None | None | Output sample rate in Hz. Must be one of {8000, 16000, 24000, 44100, 48000} for raw PCM. If None, the pipeline's output sample rate is used. |
| audio_format | str | pcm_s16le | Output audio format. The default (16-bit signed little-endian PCM) matches the raw audio that Pipecat's pipeline expects downstream. |
| text_aggregation_mode | TextAggregationMode | TextAggregationMode.SENTENCE | Controls how incoming text is aggregated before synthesis: SENTENCE for more natural speech, TOKEN for lower latency. |
| settings | SonioxTTSService.Settings | None | Runtime-configurable settings (see below). |
Settings
Pass these via settings=SonioxTTSService.Settings(...). They can be updated mid-conversation by pushing a TTSUpdateSettingsFrame.
| Setting | Type | Default | Description |
|---|---|---|---|
| model | str | tts-rt-v1-preview | TTS model identifier. |
| voice | str | Adrian | Voice identifier. See voices. |
| language | Language \| str | Language.EN | Language for synthesis. |
Advanced usage
Regional endpoints
To use a region other than the default (US), pass a regional url to the SonioxTTSService constructor:
See the list of regional endpoints for available endpoints.
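A minimal sketch that takes the regional URL as a parameter, so nothing here guesses at a specific regional hostname. Copy the value from the regional endpoints list. The import path and the SONIOX_API_KEY variable name are assumptions, as in the basic example.

```python
import os


def create_regional_tts(regional_url: str):
    """Create a SonioxTTSService that connects to a non-default region."""
    from pipecat.services.soniox.tts import SonioxTTSService  # assumed import path

    # regional_url should come from the regional endpoints list; when url is
    # omitted, the service uses wss://tts-rt.soniox.com/tts-websocket (US).
    return SonioxTTSService(
        api_key=os.environ["SONIOX_API_KEY"],  # assumed variable name
        url=regional_url,
    )
```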
Voice and language
Soniox TTS supports 60+ languages and a range of voices. Specify them via Settings:
See the list of supported languages and available voices.
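The sketch below sets all three runtime settings explicitly. It passes them through SonioxTTSService.Settings, using the model, voice, and language names from the settings table. Language comes from Pipecat's pipecat.transcriptions.language module. The service import path and environment variable name are assumptions. Adrian is simply the documented default voice. Whether a particular voice suits a particular language is not covered here, so pick one from the voices list.

```python
import os


def create_spanish_tts():
    """Create a SonioxTTSService with explicit model, voice, and language."""
    from pipecat.services.soniox.tts import SonioxTTSService  # assumed import path
    from pipecat.transcriptions.language import Language

    return SonioxTTSService(
        api_key=os.environ["SONIOX_API_KEY"],  # assumed variable name
        settings=SonioxTTSService.Settings(
            model="tts-rt-v1-preview",
            voice="Adrian",  # replace with a voice from the voices list
            language=Language.ES,  # a plain string is also accepted per the table
        ),
    )
```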
Sample rate
When using a raw PCM audio format, the sample rate must be one of {8000, 16000, 24000, 44100, 48000}:
If sample_rate is None, the service uses the output sample rate configured for the pipeline.
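A minimal sketch of both options. validate_pcm_sample_rate is a hypothetical helper that checks a rate against the supported set before it reaches the service. create_tts_inheriting_rate leaves sample_rate unset and sets the rate once on the pipeline through PipelineParams(audio_out_sample_rate=...). The SonioxTTSService import path is an assumption.

```python
from typing import Optional

SUPPORTED_PCM_SAMPLE_RATES = frozenset({8000, 16000, 24000, 44100, 48000})


def validate_pcm_sample_rate(sample_rate: Optional[int]) -> Optional[int]:
    """Return sample_rate unchanged if raw PCM output supports it.

    None is allowed and means "use the pipeline's output sample rate".
    """
    if sample_rate is not None and sample_rate not in SUPPORTED_PCM_SAMPLE_RATES:
        supported = ", ".join(str(rate) for rate in sorted(SUPPORTED_PCM_SAMPLE_RATES))
        raise ValueError(f"unsupported PCM sample rate {sample_rate}; use one of {supported}")
    return sample_rate


def create_tts_inheriting_rate(api_key: str):
    """Leave sample_rate unset so the service follows the pipeline's output rate."""
    from pipecat.pipeline.task import PipelineParams
    from pipecat.services.soniox.tts import SonioxTTSService  # assumed import path

    tts = SonioxTTSService(api_key=api_key, sample_rate=None)
    # 24000 Hz is in the supported set, so raw PCM output stays valid.
    params = PipelineParams(audio_out_sample_rate=validate_pcm_sample_rate(24000))
    return tts, params
```

Passing an unsupported rate such as 22050 fails fast with a ValueError in your own code rather than surfacing later as a connection error.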
Text aggregation
By default, the service buffers text until it reaches a sentence boundary (TextAggregationMode.SENTENCE). This produces more natural speech, but synthesis of each sentence cannot start until the whole sentence has arrived. For lower latency, switch to token-level streaming:
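The sketch below has two parts. create_low_latency_tts switches the service to TOKEN mode. The import location of TextAggregationMode is an assumption and may differ between Pipecat versions, so check yours. aggregate_sentences is a hypothetical toy that shows what sentence-level buffering means: it emits text only once a sentence ends with ".", "!", or "?". It is not Pipecat's actual aggregator.

```python
SENTENCE_ENDINGS = (".", "!", "?")


def create_low_latency_tts(api_key: str):
    """Create a SonioxTTSService that streams text token by token."""
    from pipecat.services.soniox.tts import SonioxTTSService  # assumed import path
    # Assumed import location for TextAggregationMode; verify for your version.
    from pipecat.services.tts_service import TextAggregationMode

    return SonioxTTSService(
        api_key=api_key,
        text_aggregation_mode=TextAggregationMode.TOKEN,
    )


def aggregate_sentences(tokens):
    """Toy sentence aggregator: yield text only once a sentence is complete."""
    buffer = ""
    for token in tokens:
        buffer += token
        if buffer.rstrip().endswith(SENTENCE_ENDINGS):
            yield buffer.strip()
            buffer = ""
    # Flush any trailing partial sentence when the stream ends.
    if buffer.strip():
        yield buffer.strip()


tokens = ["Hello", " there", ".", " How", " are", " you", "?", " Bye"]
print(list(aggregate_sentences(tokens)))  # ['Hello there.', 'How are you?', 'Bye']
```

In SENTENCE mode, nothing reaches the synthesizer until a sentence boundary such as "Hello there." appears. TOKEN mode forwards each token as it arrives, trading some prosody for a faster first audio chunk.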
Complete examples
The following example demonstrates how to use SonioxTTSService in a Pipecat project:
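As a minimal end-to-end sketch, the coroutine below builds a two-stage pipeline (Soniox TTS followed by the transport output). It queues one TTSSpeakFrame with text to speak and an EndFrame to stop the pipeline once that text has been processed, then runs everything with PipelineRunner. It assumes an already-configured Pipecat transport object (hypothetical here), the pipecat.services.soniox.tts import path, and a SONIOX_API_KEY environment variable.

```python
import os


async def say_hello(transport) -> None:
    """Speak one sentence through Soniox TTS, then shut the pipeline down."""
    from pipecat.frames.frames import EndFrame, TTSSpeakFrame
    from pipecat.pipeline.pipeline import Pipeline
    from pipecat.pipeline.runner import PipelineRunner
    from pipecat.pipeline.task import PipelineTask
    from pipecat.services.soniox.tts import SonioxTTSService  # assumed import path

    tts = SonioxTTSService(api_key=os.environ["SONIOX_API_KEY"])  # assumed variable name

    # TTS output audio frames flow straight into the transport's output.
    pipeline = Pipeline([tts, transport.output()])
    task = PipelineTask(pipeline)

    # TTSSpeakFrame asks the TTS service to speak the text; EndFrame stops the
    # pipeline after the frames queued before it have been processed.
    await task.queue_frames([TTSSpeakFrame("Hello from Soniox!"), EndFrame()])
    await PipelineRunner().run(task)


# Usage, given a hypothetical, already-configured transport:
# asyncio.run(say_hello(transport))
```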