Until recently, shipping a working voice agent meant gluing together half a dozen services and hoping the seams did not show. Today, with frameworks like LiveKit, the orchestration is mostly solved. What is left is reasoning and speech, and speech is where most voice agents still break. Names and other entities get mistranscribed, numbers are misspoken, and latency drags. The agent works in English and falls apart in every other language.
That is the part Soniox is built for. And starting now, the full Soniox speech stack, both Speech-to-Text and Text-to-Speech, is available natively inside LiveKit.
What is LiveKit?
LiveKit is an open-source platform for real-time voice and video, and LiveKit Agents is the framework on top of it for building real-time AI agents.
It handles the hard parts of running a live voice application:
- WebRTC and telephony transport
- Voice activity detection
- Turn detection and interruption handling
- Streaming audio pipelines
- Job dispatch and scaling across workers
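To make two items on that list concrete, here is a toy version of what voice activity detection and end-of-turn detection decide on every audio frame. It is a minimal sketch, assuming mono audio already split into frames of float samples in [-1.0, 1.0]. The energy threshold and hangover length are illustrative values, not LiveKit defaults, and LiveKit itself relies on model-based components for this (for example its Silero VAD plugin), which cope with background noise far better than a fixed energy threshold.

```python
import math


def rms(frame: list[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1.0, 1.0])."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def detect_turn_end(frames, threshold=0.02, hangover_frames=25):
    """Return the index of the frame where the user's turn is judged to have ended,
    or None if the speaker is still talking (or never started).

    A turn ends once speech has been seen and then `hangover_frames`
    consecutive frames stay below `threshold` (a toy heuristic).
    """
    in_speech = False
    silent_run = 0
    for i, frame in enumerate(frames):
        if rms(frame) >= threshold:
            # Loud enough to count as speech: reset the silence counter.
            in_speech = True
            silent_run = 0
        elif in_speech:
            # Quiet frame after speech: count towards the hangover.
            silent_run += 1
            if silent_run >= hangover_frames:
                return i
    return None
```

With 20 ms frames, `hangover_frames=25` waits for roughly half a second of silence before ending the turn. Shorter hangovers feel snappier but cut people off mid-thought, which is the trade-off a dedicated turn detector is meant to improve.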
You compose an agent by dropping services into an AgentSession. LiveKit takes care of the orchestration so you can focus on what your agent should actually do.
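The composition idea can be illustrated with plain Python: each stage only has to satisfy a small interface, so one provider can be swapped for another without touching the rest of the loop. Everything here (`ToySession`, the protocols, and the fake components) is hypothetical and synchronous for brevity; the real AgentSession is asynchronous and streams audio and text between stages.

```python
from dataclasses import dataclass
from typing import Protocol


class STT(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LLM(Protocol):
    def reply(self, text: str) -> str: ...


class TTS(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class ToySession:
    """Hypothetical stand-in for a session that wires STT -> LLM -> TTS."""
    stt: STT
    llm: LLM
    tts: TTS

    def handle_turn(self, audio: bytes) -> bytes:
        text = self.stt.transcribe(audio)   # ears
        answer = self.llm.reply(text)       # reasoning
        return self.tts.synthesize(answer)  # voice


# Fake components that satisfy the protocols structurally (no inheritance needed).
class FakeSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class EchoLLM:
    def reply(self, text: str) -> str:
        return f"You said: {text}"


class FakeTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


session = ToySession(stt=FakeSTT(), llm=EchoLLM(), tts=FakeTTS())
```

Swapping the speech provider means passing a different object for `stt` or `tts`; `handle_turn` never changes.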
Teams use LiveKit to build phone agents, browser assistants, customer support bots, AI companions, healthcare workflows, and internal voice tools. It runs the same way from a localhost demo to a production deployment serving real users.
What does Soniox bring to LiveKit?
LiveKit gives you the framework. Soniox gives you the speech layer.
One speech platform for both directions
Most voice agents split STT and TTS across two vendors. That means two integrations, two dashboards, two pricing models, and two different lists of supported languages that rarely match up.
With Soniox, both sides of the conversation run through one platform. The same API key powers real-time transcription and real-time speech generation across the same 60+ languages on both sides.
Native-speaker accuracy in 60+ languages
Soniox was built for the world outside English. Soniox STT delivers native-speaker accuracy across 60+ languages, with automatic language identification and full support for mixed-language speech.
A user can start in English, switch to Spanish mid-sentence, and mention a French name. The model handles it without forcing every call through an English-first pipeline.
Soniox TTS speaks the same 60+ languages with natural voices, so the agent can reply in the same language it heard.
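Mixed-language transcripts are easier to reason about with a concrete data shape. The sketch below assumes each transcribed token arrives as a dict with `text` and `language` fields, roughly what Soniox returns when language identification is enabled; check the Soniox API reference for the exact token format. It groups tokens into same-language runs and picks the language with the most transcribed characters, which an application could then pass as the TTS `language`. Both helper names are hypothetical.

```python
from collections import Counter
from itertools import groupby


def language_runs(tokens):
    """Collapse consecutive tokens that share a language tag into (language, text) runs."""
    runs = []
    for lang, group in groupby(tokens, key=lambda t: t.get("language")):
        runs.append((lang, "".join(t["text"] for t in group)))
    return runs


def reply_language(tokens, default="en"):
    """Pick the language with the most transcribed characters in the user's turn."""
    counts = Counter()
    for t in tokens:
        if t.get("language"):
            counts[t["language"]] += len(t["text"].strip())
    return counts.most_common(1)[0][0] if counts else default


# Hypothetical tokens for a user who switches from English to Spanish mid-sentence.
tokens = [
    {"text": "I", "language": "en"},
    {"text": " need", "language": "en"},
    {"text": " una", "language": "es"},
    {"text": " cita", "language": "es"},
    {"text": " para", "language": "es"},
    {"text": " mañana", "language": "es"},
]
```

Here `language_runs(tokens)` yields an English run followed by a Spanish run, and `reply_language(tokens)` picks `"es"`. Whether to switch the reply language per turn or keep one language per call is a product decision, not something the sketch settles.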
Accurate on the details that break voice agents
The things that break voice agents in production are rarely the easy sentences. They are the names, phone numbers, email addresses, confirmation codes, product SKUs, and foreign words that appear in real conversations.
Soniox STT is designed to recognize these correctly, and Soniox TTS is designed to speak them back correctly. That alphanumeric and multilingual precision is the difference between a demo and something you can ship.
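Downstream code usually has to act on these values, for example looking up a booking by its confirmation code. The sketch below uses a hypothetical two-letters-plus-four-digits code format and a deliberately simple email pattern to show why exact transcription matters: a single misheard or split character means the pattern no longer matches, and the agent has to ask again.

```python
import re

# Hypothetical booking-code format for this sketch: two letters followed by four digits.
CONFIRMATION_CODE = re.compile(r"\b([A-Z]{2}\d{4})\b")
# Deliberately simple email pattern; real validation is more involved.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def extract_entities(transcript: str) -> dict:
    """Pull the first confirmation code and email address out of a final transcript."""
    # Codes are matched case-insensitively by upper-casing; emails keep original case.
    code = CONFIRMATION_CODE.search(transcript.upper())
    email = EMAIL.search(transcript)
    return {
        "code": code.group(1) if code else None,
        "email": email.group(0) if email else None,
    }
```

A transcript like "My code is AB1234" yields a usable code, while "my code is a b one two three four" yields `None` and forces a follow-up question.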
Low latency for natural conversations
A voice agent feels alive or dead based on how fast it responds. Soniox real-time STT is built for low-latency transcription and fast finalization. Soniox TTS streams generated speech, so the agent can start speaking almost immediately instead of waiting for the full response.
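Streaming matters because an LLM reply arrives as a trickle of text deltas. Instead of waiting for the full reply, a pipeline can hand each finished sentence to TTS as soon as it closes. LiveKit takes care of this plumbing for you; the sketch below only illustrates the idea, with a naive sentence rule (for instance, it would split "Dr. Smith") and a fake LLM stream standing in for a real one.

```python
import asyncio
import re
from typing import AsyncIterator

# Naive sentence end: ".", "!" or "?" followed by whitespace.
_SENTENCE_END = re.compile(r"([.!?])\s+")


async def sentences(deltas: AsyncIterator[str]) -> AsyncIterator[str]:
    """Re-chunk streamed LLM text deltas into whole sentences as soon as each one closes."""
    buffer = ""
    async for delta in deltas:
        buffer += delta
        # Emit every complete sentence currently in the buffer.
        while True:
            match = _SENTENCE_END.search(buffer)
            if not match:
                break
            yield buffer[: match.end(1)].strip()
            buffer = buffer[match.end():]
    if buffer.strip():
        yield buffer.strip()  # flush whatever is left when the stream ends


async def fake_llm() -> AsyncIterator[str]:
    """Stand-in for a streaming LLM: yields text in arbitrary fragments."""
    for delta in ["Your order ", "ships today. ", "It should arrive ", "on Friday! Anything else?"]:
        yield delta


async def main() -> list[str]:
    return [s async for s in sentences(fake_llm())]


result = asyncio.run(main())
```

The first sentence is ready after the second delta, so speech for it can start while the rest of the reply is still being generated.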
Together with LiveKit's real-time pipeline, that gives you the foundation for fast, natural turn-taking.
From prototype to global production
The same plugin works for a local prototype, a browser assistant, a phone bot, or a global rollout. Soniox supports high-concurrency real-time workloads and regional endpoints, so latency and data residency stay under control as your agent scales.
A unified speech layer
The LiveKit Soniox plugin gives you both STT and TTS in one place. Drop them into your AgentSession and you have the full speech loop.
Install the plugin:
pip install "livekit-agents[soniox]"
Set your API key in .env:
SONIOX_API_KEY=<your_soniox_api_key>
You can grab a key in the Soniox Console.
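If the key is missing, the agent tends to fail only when the first call arrives. A small preflight check at startup catches that early. This is a sketch with hypothetical helper names; in practice many LiveKit projects load the file with python-dotenv's `load_dotenv()` instead of hand-rolling a parser like the one here.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Minimal .env loader: KEY=VALUE lines and '#' comments, surrounding quotes stripped."""
    env_path = Path(path)
    if not env_path.exists():
        return
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # setdefault: a variable already exported in the shell wins over .env.
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def require_soniox_key() -> str:
    """Fail fast at startup if SONIOX_API_KEY is unset or still the template placeholder."""
    load_env_file()
    key = os.environ.get("SONIOX_API_KEY", "")
    if not key or key.startswith("<"):
        raise RuntimeError("SONIOX_API_KEY is not set; add it to .env or export it.")
    return key
```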
A minimal voice agent using Soniox for both STT and TTS:
from livekit.agents import AgentSession
from livekit.plugins import soniox
session = AgentSession(
    stt=soniox.STT(),
    tts=soniox.TTS(),
    # llm, vad, and other settings
)
That is the whole speech layer.
When you need more control, the same plugin exposes language hints, regional endpoints, and context customization on STT, and full control over voice, language, model, and audio format on TTS. For example, a healthcare agent with domain context, custom vocabulary, and a tuned TTS voice:
from livekit.agents import AgentSession
from livekit.plugins import soniox
from livekit.plugins.soniox import ContextGeneralItem, ContextObject
session = AgentSession(
    stt=soniox.STT(
        params=soniox.STTOptions(
            # Hint that callers are expected to speak French.
            language_hints=["fr"],
            # Domain context plus drug names the model should expect to hear.
            context=ContextObject(
                general=[ContextGeneralItem(key="domain", value="Healthcare")],
                terms=["Celebrex", "Zyrtec", "Xanax", "Prilosec"],
            ),
        ),
    ),
    tts=soniox.TTS(
        model="tts-rt-v1",
        voice="Mina",
        language="fr",  # reply in the same language the STT is tuned for
    ),
    # llm, vad, and other settings
)
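In a real deployment the vocabulary often comes from a database or a per-customer config rather than a hard-coded list. The hypothetical helper below assembles the same `general` and `terms` content as plain data, trimming and de-duplicating entries case-insensitively, so it can be built and tested without the plugin installed. The `max_terms` cap is an illustrative budget, not a documented Soniox limit.

```python
def build_context_spec(domain: str, vocabulary: list[str], max_terms: int = 100) -> dict:
    """Assemble general/terms context as plain data (hypothetical helper).

    Entries are trimmed, empty ones dropped, and duplicates removed
    case-insensitively while keeping the first spelling seen.
    """
    seen: set[str] = set()
    terms: list[str] = []
    for word in vocabulary:
        word = word.strip()
        if word and word.lower() not in seen:
            seen.add(word.lower())
            terms.append(word)
    return {
        "general": [{"key": "domain", "value": domain}],
        "terms": terms[:max_terms],
    }


spec = build_context_spec("Healthcare", ["Celebrex", "Zyrtec", "zyrtec ", "Xanax", "", "Prilosec"])
# The spec could then be mapped onto the plugin objects used above, e.g.:
# ContextObject(
#     general=[ContextGeneralItem(**item) for item in spec["general"]],
#     terms=spec["terms"],
# )
```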
For the full integration reference, see the Soniox LiveKit docs, the Soniox STT plugin for LiveKit, and the Soniox TTS plugin for LiveKit.
Ship robust voice agents faster
LiveKit gives you everything you need to run a real-time voice agent in production. Soniox gives that agent ears and a voice that actually work across the languages your users speak.
Whether you are building a support bot for a global customer base, a phone agent that has to handle every accent in a country, a browser assistant, or an internal voice tool for your team, the LiveKit and Soniox combination is now a complete loop.
Happy building!