How to build a voice bot with Pipecat and Soniox

In English, voice AI is mostly a solved problem. The models are great, the vendors are interchangeable, and you can ship a working agent in a weekend.

But as soon as you step outside English, the wheels come off fast.

Your STT vendor "supports" French at half the accuracy and twice the latency. Your TTS vendor offers one robotic voice per language. The two services bill separately, scale separately, and disagree about which languages they actually support. And in the gaps between them, you're still wiring up interruption handling, WebSocket lifecycle, and turn detection. By the time the multilingual bot ships, your French users get a noticeably worse experience than your English users, and you're managing two account dashboards to make it possible.

Soniox is now a first-class integration in Pipecat, with Soniox STT and Soniox TTS available as native services. Real-time multilingual speech in both directions, behind a single API key.

What is Pipecat?

Pipecat is an open-source Python framework that has become the de facto standard for building real-time voice agents. Teams use it to ship phone bots, browser-based assistants, and multimodal experiences in production.

Pipecat lets you compose voice agents by dropping services (STT, LLM, TTS, transports) into a pipeline. You pick the components you want, chain them together, and the framework handles the parts that take weeks to get right on your own: turn detection, interruption handling, VAD orchestration, and transport management.
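To make the "chain services together" idea concrete, here is a toy sketch in plain Python. It does not use Pipecat's API: the stage functions (fake_stt, fake_llm, fake_tts) and run_pipeline are hypothetical stand-ins, and real Pipecat processors exchange frames and run concurrently rather than passing one string along. It only illustrates the composition pattern.

```python
import asyncio
from typing import Awaitable, Callable, List

# In this toy model a "stage" is an async function from one string to the next.
Stage = Callable[[str], Awaitable[str]]


async def fake_stt(audio: str) -> str:
    # Stand-in for speech-to-text: pretend the audio decodes to its payload.
    return audio.replace("audio:", "")


async def fake_llm(text: str) -> str:
    # Stand-in for the LLM: echo the user's words back.
    return f"You said: {text}"


async def fake_tts(text: str) -> str:
    # Stand-in for text-to-speech: tag the reply as synthesized speech.
    return f"speech:{text}"


async def run_pipeline(stages: List[Stage], item: str) -> str:
    # Feed the output of each stage into the next one, in order.
    for stage in stages:
        item = await stage(item)
    return item


result = asyncio.run(run_pipeline([fake_stt, fake_llm, fake_tts], "audio:hello"))
print(result)  # speech:You said: hello
```

Swapping a vendor in this model means replacing one entry in the list, which is the property the real framework gives you at production scale.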

Why Soniox is the best speech stack for Pipecat

Voice agents are bottlenecked by three things: latency, language coverage, and how cleanly the speech-in and speech-out halves fit together. Soniox is the only Pipecat-supported provider that solves all three from a single API:

Top of the Pipecat STT benchmark: Soniox leads Pipecat's open STT benchmark on time-to-final-transcript latency (the metric that defines how alive your bot feels).
One vendor, one key, both directions: Most Pipecat speech setups split STT and TTS across two providers, with separate accounts, separate billing, and inconsistent language support. Soniox ships STT and TTS under one API key, with the same set of supported languages on both ends.
Multilingual by design: Soniox STT delivers native-speaker accuracy across 60+ languages simultaneously: no per-call language flag, no model swap between calls. A single deployed bot can answer French, Hindi, Portuguese, and Japanese callers without configuration changes.
Built for the hard parts of speech generation: Soniox TTS correctly pronounces entity names, alphanumerics, and foreign words, with ultra-low-latency streaming.
Scalable: Run many agents concurrently at a cost-effective price.
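If you want to track a latency figure like this in your own bot, one rough approach is sketched below. It assumes "time to final transcript" means the gap between the moment the user stops speaking and the moment the final transcript arrives; that definition, the function name, and the sample timestamps are all illustrative assumptions, not the benchmark's published methodology or measured results.

```python
from statistics import median
from typing import List, Tuple

# Each turn is (speech_end_ms, final_transcript_ms), both measured on the same clock.
Turn = Tuple[int, int]


def time_to_final_ms(turns: List[Turn]) -> List[int]:
    # Latency per turn: how long after speech ended the final transcript arrived.
    return [final_at - speech_end for speech_end, final_at in turns]


# Made-up timestamps purely for illustration.
sample_turns = [(1200, 1450), (3800, 4100), (7000, 7200)]
latencies = time_to_final_ms(sample_turns)
print(f"median time to final transcript: {median(latencies)} ms")
```

The median is used here because a single slow turn would otherwise dominate a small sample; percentiles such as p95 are a common complement.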

What we built together

The integration ships two services:

SonioxSTTService: Powered by the Soniox Speech-to-Text API, it gives you state-of-the-art accuracy across 60+ languages, with automatic language identification and semantic endpoint detection.
SonioxTTSService: Powered by the Soniox Text-to-Speech API, it provides streaming synthesis with low first-byte latency and supports the same 60+ languages as Soniox STT, so multilingual bots stay multilingual all the way through the pipeline.

Both services drop straight into a standard Pipecat pipeline alongside Daily for transport, OpenAI or Anthropic for the LLM, and Silero for VAD.

How it works

1. Install

Minimal install for STT + TTS:

pip install "pipecat-ai[soniox]"

Full stack for a complete voice agent (Daily transport, OpenAI LLM, Silero VAD, runner CLI):

pip install "pipecat-ai[soniox,daily,openai,silero,runner]"

2. Set up environment variables

SONIOX_API_KEY=...
OPENAI_API_KEY=...
DAILY_API_KEY=...

Note: You can obtain a SONIOX_API_KEY from the Soniox Console.
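A missing key usually surfaces as an authentication error deep inside a running bot, so it can help to check for the variables at startup. The helper below is a minimal sketch using only the standard library; missing_keys and REQUIRED_KEYS are hypothetical names, and DAILY_API_KEY is presumably only needed if you use the Daily transport.

```python
import os
from typing import Iterable, List, Mapping

# The variables listed in this step; trim the tuple to match the services you use.
REQUIRED_KEYS = ("SONIOX_API_KEY", "OPENAI_API_KEY", "DAILY_API_KEY")


def missing_keys(env: Mapping[str, str], required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    # A key counts as missing if it is absent or set to an empty string.
    return [name for name in required if not env.get(name)]


missing = missing_keys(os.environ)
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```

In a real entry point you would likely raise an error instead of printing, so the bot fails fast before opening any connections.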

3a. Wire up STT

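import os

from pipecat.services.soniox.stt import SonioxSTTService
from pipecat.transcriptions.language import Language

# Read the key from the environment rather than hard-coding it.
stt = SonioxSTTService(
    api_key=os.getenv("SONIOX_API_KEY"),
    settings=SonioxSTTService.Settings(
        # Hints bias recognition toward the languages you expect to hear.
        language_hints=[Language.EN],
    ),
)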

3b. Wire up TTS

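import os

from pipecat.services.soniox.tts import SonioxTTSService
from pipecat.transcriptions.language import Language

# The same SONIOX_API_KEY is used for both STT and TTS.
tts = SonioxTTSService(
    api_key=os.getenv("SONIOX_API_KEY"),
    settings=SonioxTTSService.Settings(
        voice="Nina",          # voice used for synthesized speech
        language=Language.EN,  # language of the synthesized speech
    ),
)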

That's the full setup. Drop the services into a Pipecat pipeline alongside your LLM and let the framework take care of the rest.
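For orientation, the sketch below shows the order in which processors are typically arranged in a Pipecat voice pipeline, with STT ahead of the LLM and TTS after it. It deliberately uses placeholder strings instead of real services so it runs without Pipecat installed; voice_processor_order is a hypothetical helper, and the exact context-aggregator wiring is an assumption you should confirm against the published example.

```python
from typing import Any, List


def voice_processor_order(
    transport_input: Any,
    stt: Any,
    user_context: Any,
    llm: Any,
    tts: Any,
    transport_output: Any,
    assistant_context: Any,
) -> List[Any]:
    # Audio comes in, is transcribed, added to the conversation, answered by
    # the LLM, spoken by TTS, sent out, and the reply is recorded in context.
    return [
        transport_input,
        stt,
        user_context,
        llm,
        tts,
        transport_output,
        assistant_context,
    ]


# Placeholder labels stand in for the real objects (e.g. the stt and tts
# services created in steps 3a and 3b).
order = voice_processor_order(
    "transport input", "stt", "user context",
    "llm", "tts", "transport output", "assistant context",
)
print(" -> ".join(order))
```

In an actual bot, a list in this shape is what you would hand to Pipecat's Pipeline class (imported from pipecat.pipeline.pipeline), with real service objects in place of the labels.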

A complete voice agent, end to end

We've published a runnable example using the full stack:

Soniox STT: multilingual real-time transcription
OpenAI: the LLM driving the conversation
Soniox TTS: multilingual real-time speech synthesis
Silero: voice activity detection
Daily: WebRTC transport (browser or phone)

Clone the Pipecat repo and run it:

git clone https://github.com/pipecat-ai/pipecat.git
cd pipecat/examples/voice
python voice-soniox.py -t webrtc

Open http://localhost:7860, click Connect, and start talking. To test against a Daily room instead, use -t daily.

Full source on GitHub: examples/voice/voice-soniox.py

Happy building!