Text-to-Speech

Overview

SonioxTTSService provides real-time text-to-speech synthesis using the Soniox WebSocket TTS API. Text is streamed incrementally to the Soniox endpoint, and audio is returned as base64-encoded chunks. Up to five concurrent streams are multiplexed over a single WebSocket connection, making the service efficient for interactive voice applications.
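As a rough illustration of the base64 audio framing mentioned above, the sketch below decodes one chunk into raw PCM bytes. The JSON message shape and the `audio` field name are assumptions made for this example, not the documented Soniox wire format; SonioxTTSService performs the real decoding internally, so application code normally never does this.

```python
import base64
import json


def decode_audio_chunk(message: str) -> bytes:
    """Decode a JSON message carrying base64 audio into raw bytes.

    The "audio" field name is hypothetical, chosen for illustration only.
    """
    payload = json.loads(message)
    encoded = payload.get("audio", "")
    return base64.b64decode(encoded)


# Build a fake message containing two 16-bit little-endian samples.
pcm = (1000).to_bytes(2, "little", signed=True) + (-1000).to_bytes(
    2, "little", signed=True
)
fake_message = json.dumps({"audio": base64.b64encode(pcm).decode("ascii")})

decoded = decode_audio_chunk(fake_message)
print(len(decoded))  # 4 bytes: two pcm_s16le samples
```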

Basic usage

Add SonioxTTSService to your Pipecat pipeline:

import os
from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.soniox.tts import SonioxTTSService

tts = SonioxTTSService(
    api_key=os.getenv("SONIOX_API_KEY"),
    settings=SonioxTTSService.Settings(
        voice="Adrian",
    ),
)

pipeline = Pipeline([
    # ...
    llm,
    tts,
    transport.output(),
])

Configuration

SonioxTTSService accepts both top-level constructor arguments (for connection and audio formatting) and a Settings object (for voice, model, and language).

Constructor arguments

| Argument | Type | Default | Description |
| --- | --- | --- | --- |
| `api_key` | `str` | required | Soniox API key. |
| `url` | `str` | `wss://tts-rt.soniox.com/tts-websocket` | WebSocket endpoint. See regional endpoints. |
| `sample_rate` | `int \| None` | `None` | Output sample rate in Hz. Must be one of `{8000, 16000, 24000, 44100, 48000}` for raw PCM. |
| `audio_format` | `str` | `pcm_s16le` | Output audio format. The default matches Pipecat's downstream audio pipeline. |
| `text_aggregation_mode` | `TextAggregationMode` | `TextAggregationMode.SENTENCE` | Controls how incoming text is aggregated before synthesis. `SENTENCE` for natural speech, `TOKEN` for lower latency. |
| `settings` | `SonioxTTSService.Settings` | `None` | Runtime-configurable settings (see below). |

Settings

Pass these via settings=SonioxTTSService.Settings(...). They can be updated mid-conversation with TTSUpdateSettingsFrame.

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| `model` | `str` | `tts-rt-v1` | TTS model identifier. |
| `voice` | `str` | `Adrian` | Voice identifier. See voices. |
| `language` | `Language \| str` | `Language.EN` | Language for synthesis. |

Advanced usage

Regional endpoints

To use a region other than the default (US), pass a regional url to the SonioxTTSService constructor:

import os
from pipecat.services.soniox.tts import SonioxTTSService

tts = SonioxTTSService(
    api_key=os.getenv("SONIOX_API_KEY"),
    url="wss://tts-rt.eu.soniox.com/tts-websocket",
)

See the list of regional endpoints for available endpoints.
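One way to keep the region configurable per deployment is to resolve the URL from the environment. This is a minimal sketch: both URLs are the ones shown in this document, while the `SONIOX_TTS_URL` variable name and the `resolve_tts_url` helper are hypothetical and not part of Pipecat or Soniox.

```python
import os
from typing import Mapping, Optional

# Both URLs appear earlier in this document.
DEFAULT_URL = "wss://tts-rt.soniox.com/tts-websocket"
EU_URL = "wss://tts-rt.eu.soniox.com/tts-websocket"


def resolve_tts_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the endpoint from SONIOX_TTS_URL (hypothetical), else the default."""
    env = os.environ if env is None else env
    # An unset or empty variable falls back to the default (US) endpoint.
    return env.get("SONIOX_TTS_URL") or DEFAULT_URL


print(resolve_tts_url({"SONIOX_TTS_URL": EU_URL}))
```

The result can then be passed as url=resolve_tts_url() to the SonioxTTSService constructor.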

Voice and language

Soniox TTS supports 60+ languages and a range of voices. Specify them via Settings:

import os
from pipecat.services.soniox.tts import SonioxTTSService

tts = SonioxTTSService(
    api_key=os.getenv("SONIOX_API_KEY"),
    settings=SonioxTTSService.Settings(
        model="tts-rt-v1",
        voice="Adrian",
        language="en",
    ),
)

See the list of supported languages and available voices.

Sample rate

When using a raw PCM audio format, the sample rate must be one of {8000, 16000, 24000, 44100, 48000}:

import os
from pipecat.services.soniox.tts import SonioxTTSService

tts = SonioxTTSService(
    api_key=os.getenv("SONIOX_API_KEY"),
    sample_rate=16000,
    settings=SonioxTTSService.Settings(
        voice="Adrian",
    ),
)

If sample_rate is None, the service uses the pipeline's configured sample rate.
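The rule above can be checked before constructing the service. The following is a sketch only: `resolve_sample_rate` and `pcm_s16le_bytes_per_second` are hypothetical helpers, not Pipecat APIs, and the byte-rate calculation assumes mono audio, which this document does not state.

```python
from typing import Optional

# Rates listed in this document for raw PCM output.
SUPPORTED_PCM_SAMPLE_RATES = {8000, 16000, 24000, 44100, 48000}


def resolve_sample_rate(requested: Optional[int], pipeline_rate: int) -> int:
    """Use the requested rate, or the pipeline rate when requested is None."""
    rate = pipeline_rate if requested is None else requested
    if rate not in SUPPORTED_PCM_SAMPLE_RATES:
        allowed = sorted(SUPPORTED_PCM_SAMPLE_RATES)
        raise ValueError(f"Unsupported sample rate {rate}; expected one of {allowed}")
    return rate


def pcm_s16le_bytes_per_second(sample_rate: int, channels: int = 1) -> int:
    """pcm_s16le uses 2 bytes per sample; mono is an assumption here."""
    return sample_rate * 2 * channels


rate = resolve_sample_rate(None, pipeline_rate=24000)
print(rate, pcm_s16le_bytes_per_second(rate))  # 24000 48000
```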

Text aggregation

By default, the service buffers text until it reaches a sentence boundary (TextAggregationMode.SENTENCE), which produces more natural speech. For lower latency, switch to token-level streaming:

import os
from pipecat.services.soniox.tts import SonioxTTSService
from pipecat.services.tts_service import TextAggregationMode

tts = SonioxTTSService(
    api_key=os.getenv("SONIOX_API_KEY"),
    text_aggregation_mode=TextAggregationMode.TOKEN,
    settings=SonioxTTSService.Settings(
        voice="Adrian",
    ),
)
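To make the trade-off concrete, the sketch below contrasts the two modes on a stream of LLM tokens. It is a simplified illustration, not Pipecat's actual aggregator: `aggregate_sentences` is a hypothetical helper, and its boundary rule (a `.`, `!`, or `?` followed by whitespace or the end of the buffer) is an assumption that will split on abbreviations and decimals.

```python
import re
from typing import Iterable, Iterator

# Naive boundary: terminal punctuation followed by whitespace or end of buffer.
_SENTENCE_END = re.compile(r"[.!?](?:\s|$)")


def aggregate_sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Buffer tokens and yield text one sentence at a time."""
    buffer = ""
    for token in tokens:
        buffer += token
        while True:
            match = _SENTENCE_END.search(buffer)
            if match is None:
                break
            sentence = buffer[: match.end()].strip()
            buffer = buffer[match.end():]
            if sentence:
                yield sentence
    # Flush whatever is left when the token stream ends.
    if buffer.strip():
        yield buffer.strip()


tokens = ["Hello", " there", ".", " How", " are", " you", "?"]

# Sentence-style aggregation: fewer, larger synthesis requests.
print(list(aggregate_sentences(tokens)))  # ['Hello there.', 'How are you?']

# Token-style streaming: every token is forwarded as soon as it arrives.
print(list(tokens))
```

Sentence aggregation waits for the full sentence before synthesis can start, while token streaming sends text immediately and gives the synthesizer less context per request.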

Complete examples

The following example demonstrates how to use SonioxTTSService in a Pipecat project:

Voice bot with Soniox STT and TTS

End-to-end voice bot using SonioxSTTService and SonioxTTSService.