Pipecat

Overview

Pipecat is a framework for building voice-enabled, real-time, multimodal AI applications. A typical Pipecat pipeline for voice applications looks like this:

Send Audio - Capture streamed audio from the user and transmit it to the pipeline.
Transcribe Speech - Convert speech to text as the user is talking.
Process with LLM - Generate responses using a large language model.
Convert to Speech - Transform text responses into natural speech.
Play Audio - Stream the audio response back to the user.

Soniox plugs into two stages of this pipeline:

SonioxSTTService handles the Transcribe Speech step using the Soniox real-time STT API.
SonioxTTSService handles the Convert to Speech step using the Soniox real-time TTS API.
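
To make the flow of the five stages concrete, here is a minimal sketch that chains them as async generators using only the Python standard library. It does not use Pipecat or Soniox: every function name below (capture_audio, transcribe, respond, synthesize, run_pipeline, run_demo) is a hypothetical stand-in, and the "audio" is just UTF-8 bytes. In a real bot, the transcribe stage is where SonioxSTTService would sit and the synthesize stage is where SonioxTTSService would sit.

```python
import asyncio
from typing import AsyncIterator, List


async def capture_audio(chunks: List[bytes]) -> AsyncIterator[bytes]:
    # Stage 1 (Send Audio): yield audio chunks as they "arrive" from the user.
    for chunk in chunks:
        yield chunk


async def transcribe(audio: AsyncIterator[bytes]) -> AsyncIterator[str]:
    # Stage 2 (Transcribe Speech): placeholder for an STT service.
    # Here the fake audio is UTF-8 text, so decoding stands in for recognition.
    async for chunk in audio:
        yield chunk.decode("utf-8")


async def respond(texts: AsyncIterator[str]) -> AsyncIterator[str]:
    # Stage 3 (Process with LLM): placeholder that echoes the transcript.
    async for text in texts:
        yield f"You said: {text}"


async def synthesize(replies: AsyncIterator[str]) -> AsyncIterator[bytes]:
    # Stage 4 (Convert to Speech): placeholder for a TTS service.
    async for reply in replies:
        yield reply.encode("utf-8")


async def run_pipeline(chunks: List[bytes]) -> List[bytes]:
    # Stage 5 (Play Audio): collect the output instead of playing it.
    played: List[bytes] = []
    stream = synthesize(respond(transcribe(capture_audio(chunks))))
    async for audio in stream:
        played.append(audio)
    return played


def run_demo(chunks: List[bytes]) -> List[bytes]:
    # Synchronous entry point for scripts and quick experiments.
    return asyncio.run(run_pipeline(chunks))


if __name__ == "__main__":
    print(run_demo([b"hello", b"how are you"]))
```

Because each stage consumes the previous one lazily, a chunk can move through all five stages before the next chunk is captured, which is the property that keeps a streaming voice pipeline responsive.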

For more details on how Pipecat works, check the Pipecat documentation.

Installation

Install the Soniox extras for Pipecat:

pip install "pipecat-ai[soniox]"

Then set your Soniox API key as an environment variable:

export SONIOX_API_KEY=<your_soniox_api_key>

You can obtain a Soniox API key by signing up at the Soniox Console.
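
Inside your bot, it can help to read the key once at startup and fail early if it is missing. The helper below is a small sketch using only the standard library; load_soniox_api_key is a hypothetical name, not part of Pipecat or Soniox, and the only thing taken from the steps above is the SONIOX_API_KEY variable name.

```python
import os
from typing import Mapping, Optional


def load_soniox_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Soniox API key from the environment, or raise if it is unset.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    source = os.environ if env is None else env
    key = source.get("SONIOX_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SONIOX_API_KEY is not set; export it before starting the bot."
        )
    return key
```

Calling load_soniox_api_key() before building the pipeline surfaces a missing key as a clear startup error rather than a failed connection later on.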

Services

Speech-to-Text

Use SonioxSTTService to transcribe user audio in real time, with language hints, context, and speaker diarization.

Text-to-Speech

Use SonioxTTSService to synthesize natural speech in 60+ languages over a streaming WebSocket connection.

Guides

Build a voice agent

Compose Soniox STT and TTS into a complete voice agent.

Migrate to Soniox

Replace your current STT or TTS provider with Soniox in an existing Pipecat bot.
