Switch your LiveKit STT or TTS provider to Soniox
Step-by-step guide to replacing Deepgram, ElevenLabs, AssemblyAI, Cartesia, or OpenAI STT/TTS with Soniox in your LiveKit bot, with config mappings and code diffs.
Overview
Each provider section below shows how to swap an STT or TTS service for Soniox in an AgentSession, plus how the source settings map across. The rest of your LiveKit setup (Agent, room, VAD, LLM, tools) stays the same.
For configuration depth, see the STT and TTS reference pages. For an end-to-end walkthrough, see Build a voice agent.
Test Soniox before you migrate
The fastest way to evaluate is on our live comparison pages: record a sample on the STT comparison page or enter text on the TTS comparison page, and see the result side-by-side with your current provider.
Cases to evaluate when building a reliable voice agent across multiple languages:
- Mid-utterance language switching. Start a sentence in English and finish in Spanish, or drop a French name into an English question. Soniox detects and transcribes both.
- Non-English, end-to-end. Run a full conversation in your target language. Both STT and TTS cover 60+ languages.
- Tricky inputs. Order numbers, postal codes, emails, phone numbers, brand names, and product codes with non-English spellings. Soniox preserves them on the STT side and pronounces them correctly on the TTS side.
- Low latency. Soniox leads on time-to-final-transcript so the LLM picks up the moment the user stops talking.
Before you migrate
Install the Soniox extra for LiveKit Agents:
Create an API key in the Soniox Console. The same key works for STT and TTS.
LiveKit configures turn-taking on AgentSession rather than on the STT service. To use Soniox's built-in endpoint detection, pass turn_handling=TurnHandlingOptions(turn_detection="stt") to AgentSession. Keep a local VAD (e.g. Silero) in the session: AgentSession also uses it for interruption detection (catching when the caller starts speaking while the agent is talking), and interruption.mode accepts "adaptive" or "vad", not "stt". See endpoint detection.
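The session wiring described above might look roughly like the sketch below. It assumes TurnHandlingOptions is importable from livekit.agents and that the Soniox and Silero plugins are installed. The import path, the environment-variable behavior, and the build_session helper name are assumptions for illustration, not something this guide confirms.

```python
def build_session():
    """Hypothetical helper: an AgentSession that uses Soniox endpoint detection."""
    # Imports are deferred so this module loads even without LiveKit installed.
    # Assumption: TurnHandlingOptions is exported from livekit.agents.
    from livekit.agents import AgentSession, TurnHandlingOptions
    from livekit.plugins import silero, soniox

    return AgentSession(
        # Assumption: the plugin picks up the API key from the environment.
        stt=soniox.STT(),
        # The local VAD stays in place for interruption detection.
        vad=silero.VAD.load(),
        # Let Soniox endpoint detection decide when the user's turn ends.
        turn_handling=TurnHandlingOptions(turn_detection="stt"),
    )
```

Pass your existing LLM and TTS to AgentSession as before. Only the stt and turn_handling arguments change.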
From Deepgram
STT
Before:
After:
| Deepgram setting | Soniox equivalent | Notes |
|---|---|---|
| language | language_hints (list) | A hint, not a filter. Soniox still transcribes other languages |
| interim_results | Always on | Soniox streams non-final and final tokens by default |
| punctuate | Automatic in Soniox | |
| smart_format, numerals | Automatic in Soniox | |
| keyterm / keywords | context.terms | Wrap with ContextObject(terms=[...]) |
| enable_diarization | enable_speaker_diarization | |
| endpointing (ms) | Session-level turn_handling | Set turn_detection="stt" on AgentSession; tune via max_endpoint_delay_ms |
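If your Deepgram settings live in a config dict, the table above can be applied mechanically. The following is a minimal sketch: deepgram_to_soniox is a hypothetical helper, not part of either plugin, and it returns plain dicts keyed by the names in the "Soniox equivalent" column rather than real plugin objects. Handling keywords as (term, boost) pairs is also an assumption.

```python
from typing import Any

# Deepgram settings that Soniox handles automatically, per the table above.
AUTOMATIC = {"interim_results", "punctuate", "smart_format", "numerals"}


def deepgram_to_soniox(cfg: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Hypothetical helper: translate Deepgram STT settings to Soniox names.

    Returns (soniox_settings, notes); notes list anything that needs manual work.
    """
    out: dict[str, Any] = {}
    notes: list[str] = []
    terms: list[str] = []
    for key, value in cfg.items():
        if key == "language":
            # Soniox takes a list of hints rather than a single language.
            out["language_hints"] = [value]
        elif key in ("keyterm", "keywords"):
            items = [value] if isinstance(value, str) else value
            for item in items:
                # Assumption: keywords may arrive as (term, boost) pairs; keep the term.
                terms.append(item[0] if isinstance(item, tuple) else item)
        elif key == "enable_diarization":
            out["enable_speaker_diarization"] = bool(value)
        elif key == "endpointing":
            notes.append('endpointing: set turn_detection="stt" on AgentSession')
        elif key in AUTOMATIC:
            continue  # nothing to carry over
        else:
            notes.append(f"{key}: not covered by the table, review manually")
    if terms:
        out["context"] = {"terms": terms}
    return out, notes
```

The context entry corresponds to ContextObject(terms=[...]) when you build the real Soniox options. The notes list is there so that session-level settings such as endpointing are not silently dropped.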
From ElevenLabs
TTS
Before:
After:
| ElevenLabs TTS setting | Soniox equivalent | Notes |
|---|---|---|
| voice_id | voice | See Soniox voices |
| language | language | |
| model | model | Soniox default tts-rt-v1-preview |
STT
Before:
After:
| ElevenLabs Realtime STT setting | Soniox equivalent | Notes |
|---|---|---|
| model_id | model (default stt-rt-v4) | |
| VAD / endpointing extra kwargs | Session-level turn_handling | Set turn_detection="stt" on AgentSession; tune via max_endpoint_delay_ms |
| language_code | language_hints (list) | |
From AssemblyAI
STT
Before:
After:
| AssemblyAI setting | Soniox equivalent | Notes |
|---|---|---|
| language_detection | enable_language_identification | On by default in Soniox |
| keyterms_prompt | context.terms | Wrap with ContextObject(terms=[...]) |
| speaker_labels | enable_speaker_diarization | |
| format_turns | Automatic in Soniox | |
| end_of_turn_confidence_threshold, min_end_of_turn_silence_when_confident, max_turn_silence | Session-level turn_handling | Set turn_detection="stt" on AgentSession; tune via max_endpoint_delay_ms |
From Cartesia
TTS
Before:
After:
| Cartesia TTS setting | Soniox equivalent | Notes |
|---|---|---|
| model | model | Soniox default tts-rt-v1-preview |
| voice | voice | See Soniox voices |
| language | language | |
Cartesia supports inline SSML-like tags (e.g. <spell>) in input text. Strip them before passing text to Soniox; otherwise the bot reads the tags aloud.
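One way to strip such tags is a small regex pass over the text before it reaches the TTS. This is a minimal sketch: strip_inline_tags is a hypothetical helper, and the pattern removes anything shaped like an opening, closing, or self-closing tag while keeping the text the tags wrap. It is not a full SSML parser.

```python
import re

# Matches tag-shaped spans such as <spell>, </spell>, or a self-closing <tag attr="x"/>.
# A "<" followed by a space or digit (as in "2 < 3") is left alone.
_TAG = re.compile(r"</?[A-Za-z][^<>]*>")


def strip_inline_tags(text: str) -> str:
    """Hypothetical helper: remove SSML-like tags but keep the wrapped text."""
    without_tags = _TAG.sub("", text)
    # Collapse runs of spaces left behind by removed standalone tags.
    return re.sub(r"[ \t]{2,}", " ", without_tags).strip()
```

For example, strip_inline_tags("Your code is <spell>A7X</spell>.") returns "Your code is A7X.". Apply it to the LLM output before it is handed to the TTS so prompts written for Cartesia keep working during the migration.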
STT
Before:
After:
| Cartesia STT setting | Soniox equivalent | Notes |
|---|---|---|
| model | model (default stt-rt-v4) | |
| language | language_hints (list) | A hint, not a filter. Soniox still transcribes other languages |
From OpenAI
OpenAI provides both STT and TTS services in LiveKit.
STT
Before:
After:
| OpenAI STT setting | Soniox equivalent | Notes |
|---|---|---|
| model | model (default stt-rt-v4) | |
| language | language_hints (list) | A hint, not a filter. Soniox still transcribes other languages |
| detect_language | enable_language_identification | On by default in Soniox |
| prompt | context.text | Wrap with ContextObject(text="...") |
| use_realtime | Soniox is always realtime | |
TTS
Before:
After:
| OpenAI TTS setting | Soniox equivalent | Notes |
|---|---|---|
| voice | voice | See Soniox voices |
| model | model | Soniox default tts-rt-v1-preview |