Soniox

Text-to-Speech

Use Soniox Text-to-Speech in a Pipecat pipeline.

Overview

SonioxTTSService provides real-time text-to-speech synthesis using the Soniox WebSocket TTS API. Text is streamed incrementally to the Soniox endpoint, and audio is returned as base64-encoded chunks. Up to five concurrent streams are multiplexed over a single WebSocket connection, making the service efficient for interactive voice applications.
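
To make the multiplexing described above concrete, here is a minimal sketch of how interleaved messages from several streams could be decoded and regrouped. The message shape (the `stream_id` and `audio` fields) and the helper names are hypothetical stand-ins for illustration, not the Soniox wire format; SonioxTTSService handles this internally.

```python
import base64
import json
from collections import defaultdict

MAX_CONCURRENT_STREAMS = 5  # limit stated in the overview


def demultiplex(messages):
    """Group decoded audio bytes by stream.

    Each message is a JSON string with two hypothetical fields:
    "stream_id" (str) and "audio" (base64-encoded audio bytes).
    """
    audio_by_stream = defaultdict(bytearray)
    for raw in messages:
        msg = json.loads(raw)
        stream_id = msg["stream_id"]
        is_new = stream_id not in audio_by_stream
        if is_new and len(audio_by_stream) >= MAX_CONCURRENT_STREAMS:
            raise ValueError("more than 5 concurrent streams on one connection")
        # Decode the base64 payload and append it to that stream's buffer.
        audio_by_stream[stream_id].extend(base64.b64decode(msg["audio"]))
    return {sid: bytes(buf) for sid, buf in audio_by_stream.items()}


def make_message(stream_id, pcm):
    """Build a message in the hypothetical format, for demonstration only."""
    payload = base64.b64encode(pcm).decode("ascii")
    return json.dumps({"stream_id": stream_id, "audio": payload})
```

Chunks from different streams may arrive interleaved; keying the buffers by stream identifier keeps each utterance's audio contiguous and in arrival order.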

Basic usage

Add SonioxTTSService to your Pipecat pipeline:

import os
from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.soniox.tts import SonioxTTSService

tts = SonioxTTSService(
    api_key=os.getenv("SONIOX_API_KEY"),
    settings=SonioxTTSService.Settings(
        voice="Adrian",
    ),
)

# llm and transport are assumed to be created elsewhere in your application.
pipeline = Pipeline([
    # ... upstream processors
    llm,
    tts,
    transport.output(),
])

Configuration

SonioxTTSService accepts both top-level constructor arguments (for connection and audio formatting) and a Settings object (for voice, model, and language).

Constructor arguments

| Argument | Type | Default | Description |
| --- | --- | --- | --- |
| api_key | str | required | Soniox API key. |
| url | str | wss://tts-rt.soniox.com/tts-websocket | WebSocket endpoint. See regional endpoints. |
| sample_rate | int or None | None | Output sample rate in Hz. Must be one of {8000, 16000, 24000, 44100, 48000} for raw PCM. |
| audio_format | str | pcm_s16le | Output audio format. The default matches Pipecat's downstream audio pipeline. |
| text_aggregation_mode | TextAggregationMode | TextAggregationMode.SENTENCE | Controls how incoming text is aggregated before synthesis. SENTENCE for natural speech, TOKEN for lower latency. |
| settings | SonioxTTSService.Settings | None | Runtime-configurable settings (see below). |

Settings

Pass these via settings=SonioxTTSService.Settings(...). They can be updated mid-conversation with a TTSUpdateSettingsFrame.

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | tts-rt-v1-preview | TTS model identifier. |
| voice | str | Adrian | Voice identifier. See voices. |
| language | Language or str | Language.EN | Language for synthesis. |

Advanced usage

Regional endpoints

To use a region other than the default (US), pass that region's url to the SonioxTTSService constructor:

import os
from pipecat.services.soniox.tts import SonioxTTSService

tts = SonioxTTSService(
    api_key=os.getenv("SONIOX_API_KEY"),
    url="wss://tts-rt.eu.soniox.com/tts-websocket",
)

See the list of regional endpoints for the available URLs.
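
If the region varies between deployments, one option is to pick the URL from configuration rather than hard-coding it. The sketch below assumes a hypothetical SONIOX_REGION environment variable and a hypothetical helper named soniox_tts_url; neither is part of Pipecat or Soniox. The mapping contains only the two endpoints shown on this page.

```python
import os

# Only the endpoints that appear on this page; extend the mapping from the
# regional endpoints list as needed.
SONIOX_TTS_URLS = {
    "us": "wss://tts-rt.soniox.com/tts-websocket",
    "eu": "wss://tts-rt.eu.soniox.com/tts-websocket",
}


def soniox_tts_url(region=None):
    """Return the endpoint for a region.

    Falls back to the hypothetical SONIOX_REGION environment variable,
    then to "us" (the default region).
    """
    region = (region or os.getenv("SONIOX_REGION", "us")).lower()
    try:
        return SONIOX_TTS_URLS[region]
    except KeyError:
        raise ValueError(
            f"unknown region {region!r}; add its URL to SONIOX_TTS_URLS"
        ) from None
```

The result can then be passed as url=soniox_tts_url() when constructing SonioxTTSService.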

Voice and language

Soniox TTS supports 60+ languages and a range of voices. Specify them via Settings:

import os
from pipecat.services.soniox.tts import SonioxTTSService

tts = SonioxTTSService(
    api_key=os.getenv("SONIOX_API_KEY"),
    settings=SonioxTTSService.Settings(
        model="tts-rt-v1",
        voice="Adrian",
        language="en",
    ),
)

See the list of supported languages and available voices.

Sample rate

When using a raw PCM audio format, the sample rate must be one of {8000, 16000, 24000, 44100, 48000}:

import os
from pipecat.services.soniox.tts import SonioxTTSService

tts = SonioxTTSService(
    api_key=os.getenv("SONIOX_API_KEY"),
    sample_rate=16000,
    settings=SonioxTTSService.Settings(
        voice="Adrian",
    ),
)

If sample_rate is None, the service inherits the pipeline's configured sample rate.
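
The fallback and validation rules above can be sketched as a small standalone check. The function names are hypothetical and this is not the service's actual implementation; the byte-rate helper assumes mono audio and relies only on pcm_s16le using two bytes per sample.

```python
SUPPORTED_PCM_SAMPLE_RATES = (8000, 16000, 24000, 44100, 48000)
BYTES_PER_SAMPLE = 2  # pcm_s16le: signed 16-bit little-endian samples


def resolve_sample_rate(sample_rate, pipeline_sample_rate):
    """Pick the effective rate and check it against the supported PCM rates.

    A sample_rate of None means "inherit the pipeline's rate".
    """
    effective = pipeline_sample_rate if sample_rate is None else sample_rate
    if effective not in SUPPORTED_PCM_SAMPLE_RATES:
        raise ValueError(
            f"unsupported sample rate {effective}; "
            f"expected one of {SUPPORTED_PCM_SAMPLE_RATES}"
        )
    return effective


def pcm_bytes_per_second(sample_rate, channels=1):
    """Raw PCM data rate; channels=1 (mono) is an assumption."""
    return sample_rate * BYTES_PER_SAMPLE * channels
```

For example, 16000 Hz mono pcm_s16le audio amounts to 32000 bytes per second, which is useful when sizing buffers downstream.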

Text aggregation

By default, the service buffers text until it reaches a sentence boundary (TextAggregationMode.SENTENCE), which produces more natural speech. For lower latency, switch to token-level streaming:

import os
from pipecat.services.soniox.tts import SonioxTTSService
from pipecat.services.tts_service import TextAggregationMode

tts = SonioxTTSService(
    api_key=os.getenv("SONIOX_API_KEY"),
    text_aggregation_mode=TextAggregationMode.TOKEN,
    settings=SonioxTTSService.Settings(
        voice="Adrian",
    ),
)
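
To illustrate the trade-off between the two modes, the sketch below contrasts sentence-level buffering with token-level pass-through on a stream of LLM tokens. It is a simplified illustration with hypothetical function names, not Pipecat's aggregation logic; in particular, it treats any ".", "!" or "?" at the end of the buffer as a sentence boundary.

```python
SENTENCE_ENDINGS = (".", "!", "?")


def aggregate_sentences(tokens):
    """Yield text only when the buffer ends at a sentence boundary."""
    buffer = ""
    for token in tokens:
        buffer += token
        if buffer.rstrip().endswith(SENTENCE_ENDINGS):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():
        # Flush trailing text that never reached a terminator.
        yield buffer.strip()


def aggregate_tokens(tokens):
    """Yield every token as soon as it arrives."""
    yield from tokens
```

Sentence mode delays synthesis until a full sentence is available, which gives the synthesizer more context for prosody; token mode forwards text immediately, trading some naturalness for a faster first audio chunk.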

Complete examples

The following example demonstrates how to use SonioxTTSService in a Pipecat project:

Reference