Switch your Pipecat STT or TTS provider to Soniox

Overview

Each provider section below shows how to swap an STT or TTS service for Soniox and how that provider's settings map to their Soniox equivalents. The rest of your Pipecat pipeline (transport, LLM, aggregators) stays the same.

For configuration depth, see the STT and TTS reference pages. For an end-to-end walkthrough of a Soniox-only voice agent, see Build a voice agent.

Test Soniox before you migrate

The fastest way to evaluate Soniox is on our live comparison pages: record a sample on the STT comparison page or enter text on the TTS comparison page, then compare the result side by side with your current provider's output.

Cases to evaluate when building a reliable voice agent across multiple languages:

Mid-utterance language switching. Start a sentence in English and finish in Spanish, or drop a French name into an English question. Soniox detects and transcribes both.
Non-English, end-to-end. Run a full conversation in your target language. Both STT and TTS cover 60+ languages.
Tricky inputs. Order numbers, postal codes, emails, phone numbers, brand names, and product codes with non-English spellings. Soniox preserves them on the STT side and pronounces them correctly on the TTS side.
Low latency. Soniox leads the Pipecat STT benchmark on time-to-final-transcript.
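
For the tricky-inputs case, it can help to score transcripts mechanically rather than by eye. The following is a minimal sketch, not part of Soniox or Pipecat: `missing_entities` is a hypothetical helper that reports which expected strings (order numbers, emails, and so on) did not survive into a transcript you collected during testing. It assumes a case-insensitive, whitespace-insensitive substring match is good enough for a first pass.

```python
import re


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so casing and spacing do not matter."""
    return re.sub(r"\s+", " ", text.strip().lower())


def missing_entities(transcript: str, expected: list[str]) -> list[str]:
    """Return the expected strings that do not appear in the transcript."""
    haystack = _normalize(transcript)
    return [item for item in expected if _normalize(item) not in haystack]


# Hypothetical test data: a transcript from a test call and the items it should contain.
transcript = "Your order AB-1234 ships to jane@example.com next week."
print(missing_entities(transcript, ["AB-1234", "jane@example.com", "90210"]))
# prints ['90210']
```

Running the same expected list against transcripts from your current provider and from Soniox gives a like-for-like comparison on the inputs that matter to your agent.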

Before you migrate

Install the Soniox extras for Pipecat:

pip install "pipecat-ai[soniox]"

Create an API key in the Soniox Console. The same key works for STT and TTS.

export SONIOX_API_KEY=...
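
The snippets on this page read the key with `os.environ["SONIOX_API_KEY"]`, which raises a bare `KeyError` when the variable is missing. If you prefer a clearer startup failure, a small helper along these lines is one option. This is a sketch; `require_env` is a hypothetical name, not a Pipecat or Soniox function.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a readable message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before starting the pipeline")
    return value
```

You could then pass `api_key=require_env("SONIOX_API_KEY")` wherever the examples below use `os.environ[...]`.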

From Deepgram

STT

Before:

import os

from pipecat.services.deepgram.stt import DeepgramSTTService

stt = DeepgramSTTService(
    api_key=os.environ["DEEPGRAM_API_KEY"],
    settings=DeepgramSTTService.Settings(
        model="nova-3-general",
        language="en-US",
        smart_format=True,
        keyterm=["Bright Smile Dental", "checkup", "cavity"],
        endpointing=300,
    ),
)

After:

import os

from pipecat.services.soniox.stt import SonioxSTTService, SonioxContextObject
from pipecat.transcriptions.language import Language

stt = SonioxSTTService(
    api_key=os.environ["SONIOX_API_KEY"],
    # Use Soniox endpointing to detect the end of a turn
    vad_force_turn_endpoint=False,
    settings=SonioxSTTService.Settings(
        language_hints=[Language.EN],
        context=SonioxContextObject(terms=["Bright Smile Dental", "checkup", "cavity"]),
    ),
)

Deepgram setting	Soniox equivalent	Notes
`language`	`language_hints` (list)	a hint, not a filter. Soniox still transcribes other languages
`interim_results`	always on	Soniox streams non-final and final tokens by default
`punctuate`	automatic in Soniox
`smart_format`, `numerals`	automatic in Soniox
`keywords` / `keyterm`	`context.terms`	wrap with `SonioxContextObject(terms=[...])`
`diarize`	`enable_speaker_diarization`
`endpointing` (ms), `utterance_end_ms`	`vad_force_turn_endpoint=False`	uses Soniox endpointing
`redact`, `profanity_filter`	no equivalent	handle in your application if needed
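
Since `redact` and `profanity_filter` have no Soniox equivalent, one way to handle them in your application is to post-process final transcripts before they reach the LLM or your logs. The sketch below is a deliberately crude illustration and is not equivalent to Deepgram's features: `mask_terms` and `mask_digit_runs` are hypothetical helpers, the blocklist is yours to supply, and the digit rule (six or more digits, optionally separated by single spaces or dashes) is an assumption you should tune.

```python
import re


def mask_terms(text: str, blocked: set[str], mask: str = "***") -> str:
    """Replace whole-word, case-insensitive matches of blocked terms with a mask."""
    if not blocked:
        return text
    # Longest terms first so multi-word entries win over their prefixes.
    alternatives = "|".join(
        re.escape(term) for term in sorted(blocked, key=len, reverse=True)
    )
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(mask, text)


def mask_digit_runs(text: str, min_digits: int = 6, mask: str = "[REDACTED]") -> str:
    """Mask runs of at least min_digits digits, allowing single spaces or dashes between them."""
    pattern = re.compile(r"\d(?:[ -]?\d){%d,}" % (min_digits - 1))
    return pattern.sub(mask, text)


print(mask_terms("That darn order", {"darn"}))
# prints That *** order
print(mask_digit_runs("card 4111 1111 1111 1111 ok, room 42"))
# prints card [REDACTED] ok, room 42
```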

From ElevenLabs

TTS

Before:

import os

from pipecat.services.elevenlabs.tts import ElevenLabsTTSService

tts = ElevenLabsTTSService(
    api_key=os.environ["ELEVENLABS_API_KEY"],
    settings=ElevenLabsTTSService.Settings(
        voice="<YOUR_VOICE_ID>",
        model="eleven_turbo_v2_5",
        stability=0.5,
        similarity_boost=0.75,
    ),
)

After:

import os

from pipecat.services.soniox.tts import SonioxTTSService

tts = SonioxTTSService(
    api_key=os.environ["SONIOX_API_KEY"],
    settings=SonioxTTSService.Settings(voice="Adrian"),
)

ElevenLabs TTS setting	Soniox equivalent	Notes
`voice`	`voice` (Settings)	see Soniox voices
`stability`, `similarity_boost`, `style`, `use_speaker_boost`	no equivalent
`speed`	no equivalent
`apply_text_normalization`	no equivalent

STT

Before:

import os

from pipecat.services.elevenlabs.stt import ElevenLabsRealtimeSTTService, CommitStrategy

stt = ElevenLabsRealtimeSTTService(
    api_key=os.environ["ELEVENLABS_API_KEY"],
    commit_strategy=CommitStrategy.VAD,
    include_language_detection=True,
    settings=ElevenLabsRealtimeSTTService.Settings(
        vad_silence_threshold_secs=0.6,
        vad_threshold=0.5,
    ),
)

After:

import os

from pipecat.services.soniox.stt import SonioxSTTService
from pipecat.transcriptions.language import Language

stt = SonioxSTTService(
    api_key=os.environ["SONIOX_API_KEY"],
    vad_force_turn_endpoint=False,
    settings=SonioxSTTService.Settings(
        enable_language_identification=True,
        language_hints=[Language.EN],
    ),
)

ElevenLabs Realtime STT setting	Soniox equivalent	Notes
`commit_strategy`, `vad_silence_threshold_secs`, `vad_threshold`, `min_speech_duration_ms`, `min_silence_duration_ms`	`vad_force_turn_endpoint=False`	uses Soniox endpointing
`include_language_detection`	`enable_language_identification`

From AssemblyAI

STT

Before:

import os

from pipecat.services.assemblyai.stt import AssemblyAISTTService

stt = AssemblyAISTTService(
    api_key=os.environ["ASSEMBLYAI_API_KEY"],
    settings=AssemblyAISTTService.Settings(
        language_detection=True,
        speaker_labels=True,
        keyterms_prompt=["Bright Smile Dental", "checkup", "cavity"],
    ),
)

After:

import os

from pipecat.services.soniox.stt import SonioxSTTService, SonioxContextObject

stt = SonioxSTTService(
    api_key=os.environ["SONIOX_API_KEY"],
    vad_force_turn_endpoint=False,
    settings=SonioxSTTService.Settings(
        enable_language_identification=True,
        enable_speaker_diarization=True,
        context=SonioxContextObject(terms=["Bright Smile Dental", "checkup", "cavity"]),
    ),
)

AssemblyAI setting	Soniox equivalent	Notes
`language_detection`	`enable_language_identification`
`keyterms_prompt`	`context.terms`	wrap with `SonioxContextObject(terms=[...])`
`prompt`	`context.text`	wrap with `SonioxContextObject(text="...")`
`speaker_labels`	`enable_speaker_diarization`
`format_turns`, `formatted_finals`	automatic in Soniox
`min_turn_silence`, `max_turn_silence`, `end_of_turn_confidence_threshold`, `vad_threshold`	`vad_force_turn_endpoint=False`	uses Soniox endpointing
`domain`	`context.general`	wrap with `SonioxContextObject(general=[SonioxContextGeneralItem(...)])`

From Cartesia

TTS

Before:

import os

from pipecat.services.cartesia.tts import CartesiaTTSService

tts = CartesiaTTSService(
    api_key=os.environ["CARTESIA_API_KEY"],
    settings=CartesiaTTSService.Settings(
        voice="71a7ad14-091c-4e8e-a314-022ece01c121",
        model="sonic-2",
    ),
)

After:

import os

from pipecat.services.soniox.tts import SonioxTTSService

tts = SonioxTTSService(
    api_key=os.environ["SONIOX_API_KEY"],
    settings=SonioxTTSService.Settings(voice="Adrian"),
)

Cartesia TTS setting	Soniox equivalent	Notes
`voice`	`voice` (Settings)	see Soniox voices
`generation_config.speed`	no equivalent
`generation_config.volume`	no equivalent
`generation_config.emotion`	no equivalent
`pronunciation_dict_id`	no equivalent

Cartesia supports inline `<spell>` tags in input text. Strip them before passing text to Soniox; otherwise the bot reads the tags aloud literally.
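
One way to do the stripping is a small text filter applied to LLM output before it reaches the TTS service. This is a minimal sketch: `strip_spell_tags` is a hypothetical helper, and it assumes the tags appear as plain `<spell>` and `</spell>` with no attributes. It removes the tags and keeps the text between them.

```python
import re

# Matches an opening or closing spell tag; assumes no attributes inside the tag.
_SPELL_TAG = re.compile(r"</?spell>", re.IGNORECASE)


def strip_spell_tags(text: str) -> str:
    """Remove <spell> and </spell> markers while keeping the enclosed text."""
    return _SPELL_TAG.sub("", text)


print(strip_spell_tags("Your code is <spell>AB12</spell>."))
# prints Your code is AB12.
```

If your prompts instruct the LLM to emit these tags, removing that instruction is the cleaner long-term fix; the filter covers any tags that still slip through.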

STT

Before:

import os

from pipecat.services.cartesia.stt import CartesiaSTTService

stt = CartesiaSTTService(
    api_key=os.environ["CARTESIA_API_KEY"],
    settings=CartesiaSTTService.Settings(language="en"),
)

After:

import os

from pipecat.services.soniox.stt import SonioxSTTService
from pipecat.transcriptions.language import Language

stt = SonioxSTTService(
    api_key=os.environ["SONIOX_API_KEY"],
    settings=SonioxSTTService.Settings(language_hints=[Language.EN]),
)

Cartesia STT setting	Soniox equivalent	Notes
`language`	`language_hints` (list)	a hint, not a filter. Soniox still transcribes other languages

From OpenAI

OpenAI provides both STT and TTS services in Pipecat.

STT

Before:

import os

from pipecat.services.openai.stt import OpenAIRealtimeSTTService
from pipecat.transcriptions.language import Language

stt = OpenAIRealtimeSTTService(
    api_key=os.environ["OPENAI_API_KEY"],
    settings=OpenAIRealtimeSTTService.Settings(
        model="gpt-4o-transcribe",
        language=Language.EN,
        prompt="Expect medical terminology and patient names.",
        noise_reduction="near_field",
    ),
)

After:

import os

from pipecat.services.soniox.stt import SonioxSTTService, SonioxContextObject
from pipecat.transcriptions.language import Language

stt = SonioxSTTService(
    api_key=os.environ["SONIOX_API_KEY"],
    vad_force_turn_endpoint=False,
    settings=SonioxSTTService.Settings(
        language_hints=[Language.EN],
        context=SonioxContextObject(text="Expect medical terminology and patient names."),
    ),
)

OpenAI Realtime STT setting	Soniox equivalent	Notes
`language`	`language_hints` (list)	a hint, not a filter. Soniox still transcribes other languages
`prompt`	`context.text`	wrap with `SonioxContextObject(text="...")`
`noise_reduction`	no equivalent

TTS

Before:

import os

from pipecat.services.openai.tts import OpenAITTSService

tts = OpenAITTSService(
    api_key=os.environ["OPENAI_API_KEY"],
    settings=OpenAITTSService.Settings(
        voice="alloy",
        model="gpt-4o-mini-tts",
        instructions="Speak in a friendly, professional tone.",
    ),
)

After:

import os

from pipecat.services.soniox.tts import SonioxTTSService

tts = SonioxTTSService(
    api_key=os.environ["SONIOX_API_KEY"],
    settings=SonioxTTSService.Settings(voice="Adrian"),
)

OpenAI TTS setting	Soniox equivalent	Notes
`voice`	`voice` (Settings)	see Soniox voices
`instructions`	no equivalent
`speed`	no equivalent

OpenAI TTS is locked to 24 kHz output, so transports or recording stages downstream of it may be configured for that rate. Pass `sample_rate=24000` on `SonioxTTSService` to keep them aligned. See the TTS reference for supported rates.

What to read next

Soniox STT in Pipecat

Constructor arguments, settings, language hints, context customization, endpoint detection.

Soniox TTS in Pipecat

Constructor arguments, settings, voices, sample rates, text aggregation.

Build a voice agent

End-to-end walkthrough of a Soniox-only voice agent.
