Switch your LiveKit STT or TTS provider to Soniox

Overview

Each provider section below shows how to swap an STT or TTS service for Soniox in an AgentSession, and how the source provider's settings map to their Soniox equivalents. The rest of your LiveKit setup (Agent, room, VAD, LLM, tools) stays the same.

For configuration depth, see the STT and TTS reference pages. For an end-to-end walkthrough, see Build a voice agent.

Test Soniox before you migrate

The fastest way to evaluate Soniox is with our live comparison pages. Record a sample on the STT comparison page, or enter text on the TTS comparison page, and see the result side by side with your current provider.

Cases to evaluate when building a reliable voice agent across multiple languages:

Mid-utterance language switching. Start a sentence in English and finish in Spanish, or drop a French name into an English question. Soniox detects and transcribes both.
Non-English, end-to-end. Run a full conversation in your target language. Both STT and TTS cover 60+ languages.
Tricky inputs. Order numbers, postal codes, emails, phone numbers, brand names, and product codes with non-English spellings. Soniox preserves them on the STT side and pronounces them correctly on the TTS side.
Low latency. Soniox leads on time to final transcript, so the LLM can pick up the moment the user stops talking.
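To make the tricky-inputs check repeatable across providers, you can score each transcript with a short script. The sketch below is a hypothetical helper (`missing_entities` is not part of Soniox or LiveKit). You list the exact strings you spoke, pass in a provider's transcript, and it reports which strings did not come through in written form. Matching ignores case and whitespace, a deliberate simplification that can occasionally match across word boundaries.

```python
import re


def _normalize(text: str) -> str:
    # Lowercase and drop all whitespace so "ORD 4471" and "ord4471" compare equal.
    return re.sub(r"\s+", "", text).lower()


def missing_entities(expected: list[str], transcript: str) -> list[str]:
    """Return the expected strings that do not appear in the transcript.

    Hypothetical evaluation helper: matching ignores case and whitespace,
    but punctuation must survive, so "jane dot doe at example dot com"
    does NOT count as the email "jane.doe@example.com".
    """
    haystack = _normalize(transcript)
    return [item for item in expected if _normalize(item) not in haystack]


if __name__ == "__main__":
    spoken = ["ORD-4471", "jane.doe@example.com", "94107"]
    transcript = "My order is ORD-4471, email jane dot doe at example dot com, zip 94107."
    print(missing_entities(spoken, transcript))  # ['jane.doe@example.com']
```

Run the same list against each provider's output for the same recording, so differences come from the transcription rather than the test data.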

Before you migrate

Install the Soniox extra for LiveKit Agents:

pip install "livekit-agents[soniox]~=1.5"

Create an API key in the Soniox Console. The same key works for STT and TTS.

export SONIOX_API_KEY=...
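A small preflight check at process start makes a missing key fail fast, with a clear message, rather than surfacing later inside the first STT or TTS request. This is a hypothetical helper, not part of the plugin. It assumes, as the step above implies, that the plugin reads the key from the `SONIOX_API_KEY` environment variable.

```python
import os
from collections.abc import Mapping


def require_soniox_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return SONIOX_API_KEY, or raise a clear error if it is unset or blank.

    Hypothetical preflight helper; call it before starting the agent worker.
    """
    key = env.get("SONIOX_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SONIOX_API_KEY is not set. Create a key in the Soniox Console "
            "and export it before starting the agent."
        )
    return key
```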

LiveKit configures turn-taking on AgentSession, not on the STT service. To use Soniox's built-in endpoint detection, pass turn_handling=TurnHandlingOptions(turn_detection="stt") to AgentSession. Keep a local VAD (e.g. Silero) in the session as well. AgentSession also uses it for interruption detection (noticing when the caller starts speaking while the agent is talking), and interruption.mode accepts "adaptive" or "vad", not "stt". See endpoint detection.

From Deepgram

STT

Before:

from livekit.plugins import deepgram

stt = deepgram.STT(
    model="nova-3",
    language="en",
    keyterm=["Bright Smile Dental", "checkup", "cavity"],
    punctuate=True,
    interim_results=True,
    endpointing_ms=25,
    enable_diarization=True,
)

After:

from livekit.plugins import soniox
from livekit.plugins.soniox import ContextObject, STTOptions

stt = soniox.STT(
    params=STTOptions(
        language_hints=["en"],
        context=ContextObject(terms=["Bright Smile Dental", "checkup", "cavity"]),
        # Equivalent of Deepgram's enable_diarization=True
        enable_speaker_diarization=True,
    ),
)

| Deepgram setting | Soniox equivalent | Notes |
|---|---|---|
| `language` | `language_hints` (list) | A hint, not a filter. Soniox still transcribes other languages |
| `interim_results` | Always on | Soniox streams non-final and final tokens by default |
| `punctuate` | Automatic in Soniox | |
| `smart_format`, `numerals` | Automatic in Soniox | |
| `keyterm` / `keywords` | `context.terms` | Wrap with `ContextObject(terms=[...])` |
| `enable_diarization` | `enable_speaker_diarization` | |
| `endpointing_ms` | Session-level `turn_handling` | Set `turn_detection="stt"` on `AgentSession`; tune via `max_endpoint_delay_ms` |
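If your Deepgram settings live in configuration rather than code, the mapping above can be applied mechanically. The sketch below is a hypothetical helper (`translate_deepgram_stt` is not part of either plugin). It splits a dict of Deepgram STT keyword arguments into three parts: keyword arguments for Soniox `STTOptions`, a list of context terms, and notes for settings that are automatic or move to the session. It assumes Deepgram `keywords` entries may be `(term, boost)` pairs and keeps only the term text, because `context.terms` takes plain strings.

```python
from typing import Any

# Deepgram settings that Soniox handles automatically (per the mapping table).
AUTOMATIC = {"interim_results", "punctuate", "smart_format", "numerals"}


def translate_deepgram_stt(
    kwargs: dict[str, Any],
) -> tuple[dict[str, Any], list[str], list[str]]:
    """Split Deepgram STT kwargs into (STTOptions kwargs, context terms, notes)."""
    options: dict[str, Any] = {}
    terms: list[str] = []
    notes: list[str] = []
    for name, value in kwargs.items():
        if name == "language":
            # A single Deepgram language becomes a one-element hint list.
            options["language_hints"] = [value]
        elif name in ("keyterm", "keywords"):
            # Keep only the term text; a (term, boost) pair loses its boost.
            terms.extend(t if isinstance(t, str) else t[0] for t in value)
        elif name == "enable_diarization":
            options["enable_speaker_diarization"] = bool(value)
        elif name in AUTOMATIC:
            notes.append(f"{name}: automatic in Soniox, dropped")
        elif name == "endpointing_ms":
            notes.append(f"{name}: use turn_detection='stt' on AgentSession instead")
        else:
            notes.append(f"{name}: not in the mapping table, review manually")
    return options, terms, notes
```

With the arguments from the Deepgram Before example, `options` comes out as `{"language_hints": ["en"], "enable_speaker_diarization": True}`, and `model` lands in `notes`. You would then build `STTOptions(**options, context=ContextObject(terms=terms))`.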

From ElevenLabs

TTS

Before:

from livekit.plugins import elevenlabs

tts = elevenlabs.TTS(
    voice_id="<YOUR_VOICE_ID>",
    model="eleven_multilingual_v2",
    language="en",
)

After:

from livekit.plugins import soniox

tts = soniox.TTS(voice="Adrian", language="en")

| ElevenLabs TTS setting | Soniox equivalent | Notes |
|---|---|---|
| `voice_id` | `voice` | See Soniox voices |
| `language` | `language` | |
| `model` | `model` | Soniox default `tts-rt-v1-preview` |

STT

Before:

from livekit.plugins import elevenlabs

stt = elevenlabs.STT(model_id="scribe_v2_realtime")

After:

from livekit.plugins import soniox
from livekit.plugins.soniox import STTOptions

stt = soniox.STT(
    params=STTOptions(
        language_hints=["en"],
        enable_language_identification=True,
    ),
)

| ElevenLabs Realtime STT setting | Soniox equivalent | Notes |
|---|---|---|
| `model_id` | `model` (default `stt-rt-v4`) | |
| VAD / endpointing extra kwargs | Session-level `turn_handling` | Set `turn_detection="stt"` on `AgentSession`; tune via `max_endpoint_delay_ms` |
| `language_code` | `language_hints` (list) | |

From AssemblyAI

STT

Before:

from livekit.plugins import assemblyai

stt = assemblyai.STT(
    model="u3-rt-pro",
    language_detection=True,
    speaker_labels=True,
    keyterms_prompt=["Bright Smile Dental", "checkup", "cavity"],
    format_turns=False,
    end_of_turn_confidence_threshold=0.4,
    min_turn_silence=400,
    max_turn_silence=1280,
)

After:

from livekit.plugins import soniox
from livekit.plugins.soniox import ContextObject, STTOptions

stt = soniox.STT(
    params=STTOptions(
        enable_language_identification=True,
        enable_speaker_diarization=True,
        context=ContextObject(terms=["Bright Smile Dental", "checkup", "cavity"]),
    ),
)

| AssemblyAI setting | Soniox equivalent | Notes |
|---|---|---|
| `language_detection` | `enable_language_identification` | On by default in Soniox |
| `keyterms_prompt` | `context.terms` | Wrap with `ContextObject(terms=[...])` |
| `speaker_labels` | `enable_speaker_diarization` | |
| `format_turns` | Automatic in Soniox | |
| `end_of_turn_confidence_threshold`, `min_turn_silence`, `max_turn_silence` | Session-level `turn_handling` | Set `turn_detection="stt"` on `AgentSession`; tune via `max_endpoint_delay_ms` |

From Cartesia

TTS

Before:

from livekit.plugins import cartesia

tts = cartesia.TTS(
    model="sonic-3",
    voice="f786b574-daa5-4673-aa0c-cbe3e8534c02",
    language="en",
)

After:

from livekit.plugins import soniox

tts = soniox.TTS(voice="Adrian", language="en")

| Cartesia TTS setting | Soniox equivalent | Notes |
|---|---|---|
| `model` | `model` | Soniox default `tts-rt-v1-preview` |
| `voice` | `voice` | See Soniox voices |
| `language` | `language` | |

Cartesia supports inline SSML-like tags (e.g. `<spell>`) in input text. If your prompts or text processing emit them, strip them before the text reaches Soniox; otherwise the bot will read the tags aloud literally.
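Here is a minimal sketch of that cleanup. It assumes the tags are simple XML-style elements such as `<spell>...</spell>` or `<break time="1s"/>`. `strip_tags` removes the tags and keeps the text between them. `strip_tags_stream` does the same for streamed LLM output and holds back a tag that is split across chunks. Both names are hypothetical. In LiveKit Agents, one place to apply the stream version is an override of the agent's `tts_node` that forwards the filtered text to the default implementation; check the pipeline-nodes documentation for the exact signature in your version.

```python
import re
from collections.abc import AsyncIterable, AsyncIterator

# Opening, closing, or self-closing tags: <spell>, </spell>, <break time="1s"/>.
# Assumption: tags are simple XML-style elements whose name starts with a letter.
_TAG_RE = re.compile(r"</?[A-Za-z][\w-]*(?:\s[^<>]*)?/?>")


def strip_tags(text: str) -> str:
    """Remove markup tags but keep the text between them."""
    return _TAG_RE.sub("", text)


async def strip_tags_stream(chunks: AsyncIterable[str]) -> AsyncIterator[str]:
    """Strip tags from streamed text, holding back a tag split across chunks."""
    pending = ""
    async for chunk in chunks:
        pending += chunk
        # If the last "<" has no ">" after it yet, it may be the start of a tag:
        # emit everything before it and keep the rest until more text arrives.
        cut = pending.rfind("<")
        if cut != -1 and ">" not in pending[cut:]:
            ready, pending = pending[:cut], pending[cut:]
        else:
            ready, pending = pending, ""
        if ready:
            yield strip_tags(ready)
    # Flush whatever is left when the stream ends.
    if pending:
        yield strip_tags(pending)


if __name__ == "__main__":
    print(strip_tags("Your code is <spell>X9</spell>."))  # Your code is X9.
```

One trade-off of this simple approach: a literal `<` with no closing `>` (as in "a < b") is held back until the stream ends, which delays the rest of that response.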

STT

Before:

from livekit.plugins import cartesia

stt = cartesia.STT(model="ink-whisper", language="en")

After:

from livekit.plugins import soniox
from livekit.plugins.soniox import STTOptions

stt = soniox.STT(params=STTOptions(language_hints=["en"]))

| Cartesia STT setting | Soniox equivalent | Notes |
|---|---|---|
| `model` | `model` (default `stt-rt-v4`) | |
| `language` | `language_hints` (list) | A hint, not a filter. Soniox still transcribes other languages |

From OpenAI

OpenAI provides both STT and TTS services in LiveKit.

STT

Before:

from livekit.plugins import openai

stt = openai.STT(
    model="gpt-4o-transcribe",
    language="en",
    prompt="Expect medical terminology and patient names.",
    detect_language=True,
)

After:

from livekit.plugins import soniox
from livekit.plugins.soniox import ContextObject, STTOptions

stt = soniox.STT(
    params=STTOptions(
        language_hints=["en"],
        context=ContextObject(text="Expect medical terminology and patient names."),
    ),
)

| OpenAI STT setting | Soniox equivalent | Notes |
|---|---|---|
| `model` | `model` (default `stt-rt-v4`) | |
| `language` | `language_hints` (list) | A hint, not a filter. Soniox still transcribes other languages |
| `detect_language` | `enable_language_identification` | On by default in Soniox |
| `prompt` | `context.text` | Wrap with `ContextObject(text="...")` |
| `use_realtime` | Soniox is always realtime | |

TTS

Before:

from livekit.plugins import openai

tts = openai.TTS(
    model="gpt-4o-mini-tts",
    voice="ash",
)

After:

from livekit.plugins import soniox

tts = soniox.TTS(voice="Adrian")

| OpenAI TTS setting | Soniox equivalent | Notes |
|---|---|---|
| `voice` | `voice` | See Soniox voices |
| `model` | `model` | Soniox default `tts-rt-v1-preview` |

What to read next

Soniox STT in LiveKit

Constructor arguments, STTOptions, language hints, context customization, and endpoint detection.

Soniox TTS in LiveKit

Constructor arguments, voices, sample rates, and runtime option updates.

Build a voice agent

End-to-end walkthrough of a Soniox-only LiveKit voice agent.
