Pipecat
Integrate Soniox Speech-to-Text into a Pipecat pipeline.
Pipecat overview
Pipecat is a framework for building voice-enabled, real-time, multimodal AI applications. Pipecat's pipeline for real-time voice applications looks like this:
- Send Audio - Transmit and capture streamed audio from the user
- Transcribe Speech - Convert speech to text as the user is talking
- Process with LLM - Generate responses using a large language model
- Convert to Speech - Transform text responses into natural speech
- Play Audio - Stream the audio response back to the user
At each step, there are multiple options for services to use. Soniox provides `SonioxSTTService`, which handles the Transcribe Speech step. For more details on how Pipecat works, see the Pipecat documentation.
Installation
To use `SonioxSTTService` in Pipecat projects, install the Soniox dependencies:
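A typical installation, assuming Pipecat ships the Soniox integration as an optional extra of the `pipecat-ai` package:

```bash
pip install "pipecat-ai[soniox]"
```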
You'll also need to set your Soniox API key in the `SONIOX_API_KEY` environment variable. You can obtain an API key by signing up at the Soniox Console.
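For example, in a POSIX shell (replace the placeholder with your own key):

```bash
export SONIOX_API_KEY=<your_api_key>
```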
Usage example
To integrate SonioxSTTService
into a Pipecat pipeline for real-time speech-to-text transcription,
you can simply create an instance of the service and add it to your pipeline:
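A minimal sketch, assuming the import path `pipecat.services.soniox.stt` used by recent Pipecat releases; `transport`, `llm`, and `tts` stand in for services defined elsewhere in your app:

```python
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.soniox.stt import SonioxSTTService

# Create the Soniox STT service; the API key is read from the environment.
stt = SonioxSTTService(api_key=os.getenv("SONIOX_API_KEY"))

# Place the service between the transport input and the LLM stage,
# mirroring the pipeline steps described above.
pipeline = Pipeline(
    [
        transport.input(),   # Send Audio: capture streamed audio from the user
        stt,                 # Transcribe Speech: Soniox speech-to-text
        llm,                 # Process with LLM: generate a response
        tts,                 # Convert to Speech: synthesize the response
        transport.output(),  # Play Audio: stream the response back to the user
    ]
)
```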
Complete examples
The following examples demonstrate how to use `SonioxSTTService` in Pipecat projects:
- Server-side bot that listens to the user's voice and responds with a spoken response.
- Transcribe an audio stream within the Pipecat architecture.
- Chatbot agent using Soniox STT for Pipecat Cloud.
Advanced usage
Language hints
There is no need to pre-select a language — the model automatically detects and transcribes any supported language. It also handles multilingual audio seamlessly, even when multiple languages are mixed within a single sentence or conversation.
However, when you have prior knowledge of the languages likely to be spoken in your audio, you can use language hints to guide the model toward those languages for even greater recognition accuracy.
Language variants are ignored; for example, `Language.EN_GB` is treated the same as `Language.EN`. See supported languages for the full list.
You can learn more about language hints here.
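A sketch of passing language hints, assuming the `SonioxInputParams` class and Pipecat's `Language` enum (verify the exact names against your Pipecat version):

```python
import os

from pipecat.services.soniox.stt import SonioxInputParams, SonioxSTTService
from pipecat.transcriptions.language import Language

# Hint that the audio is most likely English or Spanish. Hints only bias
# the model; other supported languages are still recognized.
stt = SonioxSTTService(
    api_key=os.getenv("SONIOX_API_KEY"),
    params=SonioxInputParams(
        language_hints=[Language.EN, Language.ES],
    ),
)
```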
Customization with context
By providing context, you help the AI model better understand and anticipate the language in your audio, even if some terms do not appear clearly or completely.
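A sketch of supplying context as a free-form string, assuming a `context` field on `SonioxInputParams`:

```python
import os

from pipecat.services.soniox.stt import SonioxInputParams, SonioxSTTService

# Domain-specific context helps the model resolve rare or ambiguous terms.
stt = SonioxSTTService(
    api_key=os.getenv("SONIOX_API_KEY"),
    params=SonioxInputParams(
        context="Cardiology visit: the patient discusses atorvastatin dosage.",
    ),
)
```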
Endpoint Detection and VAD
The `SonioxSTTService` processes your speech and has two ways of knowing when to finalize the text.
Automatic Pause Detection
By default, the service listens for natural pauses in your speech. When it detects that you've likely finished a sentence, it finalizes the transcription. You can learn more about Endpoint Detection here.
Using Voice Activity Detection (VAD)
For more explicit control, you can use a dedicated Voice Activity Detection (VAD) component within your Pipecat pipeline. The VAD's job is to detect when a user has completely stopped talking.
To enable this behavior, set `vad_force_turn_endpoint` to `True`. This disables the automatic endpoint detection and forces the service to return transcription results as soon as the user stops talking.
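A sketch assuming `vad_force_turn_endpoint` is accepted by the service constructor and that a VAD analyzer (for example, Pipecat's `SileroVADAnalyzer`) is attached to your transport; parameter placement may differ between Pipecat versions:

```python
import os

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.services.soniox.stt import SonioxSTTService

# Attach the VAD analyzer to your transport's params (transport setup
# not shown) so the pipeline knows when the user stops talking.
vad_analyzer = SileroVADAnalyzer()

# With vad_force_turn_endpoint=True, the service finalizes transcription
# when the VAD signals end of speech instead of relying on Soniox's
# automatic endpoint detection.
stt = SonioxSTTService(
    api_key=os.getenv("SONIOX_API_KEY"),
    vad_force_turn_endpoint=True,
)
```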