Speech-to-Text
Use Soniox Speech-to-Text in a Pipecat pipeline.
Overview
SonioxSTTService provides real-time speech-to-text transcription using the Soniox WebSocket API, with support for 60+ languages, context customization, multilingual audio, and speaker diarization.
By default, the service uses the `stt-rt-v4` model with `vad_force_turn_endpoint=True`, which disables Soniox's native turn detection and relies on Pipecat's local VAD to finalize transcripts. This significantly reduces the time to a final transcript segment. To use Soniox's native endpoint detection instead, set `vad_force_turn_endpoint=False`.
Basic usage
Add SonioxSTTService to your Pipecat pipeline:
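A minimal construction sketch; the import path and the commented-out pipeline wiring are assumptions based on Pipecat's usual vendor-module conventions:

```python
import os

# Assumed import path, following Pipecat's vendor-module layout.
from pipecat.services.soniox.stt import SonioxSTTService

stt = SonioxSTTService(api_key=os.getenv("SONIOX_API_KEY"))

# Place the service between transport input and the rest of the pipeline, e.g.:
# pipeline = Pipeline([transport.input(), stt, context_aggregator.user(), llm, tts, transport.output()])
```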
Configuration
SonioxSTTService accepts both top-level constructor arguments (for connection and behavior) and a Settings object (for runtime-configurable transcription parameters).
Constructor arguments
| Argument | Type | Default | Description |
|---|---|---|---|
| `api_key` | `str` | required | Soniox API key. |
| `url` | `str` | `wss://stt-rt.soniox.com/transcribe-websocket` | Soniox WebSocket endpoint. See regional endpoints. |
| `sample_rate` | `int \| None` | `None` | Audio sample rate in Hz. When `None`, uses the pipeline's configured sample rate. |
| `vad_force_turn_endpoint` | `bool` | `True` | When enabled, Pipecat's local VAD triggers transcript finalization. |
| `settings` | `SonioxSTTService.Settings` | `None` | Runtime-configurable settings (see below). |
Settings
Pass via `settings=SonioxSTTService.Settings(...)`. These can be updated mid-conversation with `STTUpdateSettingsFrame`.
| Setting | Type | Default | Description |
|---|---|---|---|
| `model` | `str` | `stt-rt-v4` | Model to use for transcription. |
| `language_hints` | `list[Language]` | `None` | Language hints to bias recognition toward expected languages. |
| `language_hints_strict` | `bool` | `None` | If `True`, restrict recognition to the provided languages. |
| `context` | `SonioxContextObject` | `None` | Context customization to improve transcription accuracy. |
| `enable_speaker_diarization` | `bool` | `False` | Annotate tokens with speaker IDs. |
| `enable_language_identification` | `bool` | `False` | Annotate tokens with language IDs. |
| `client_reference_id` | `str` | `None` | Client reference ID for tracking. |
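As a hedged sketch, settings can be supplied at construction and later changed by queueing an `STTUpdateSettingsFrame`; the import paths and the frame's `settings` mapping shape are assumptions:

```python
import os

from pipecat.frames.frames import STTUpdateSettingsFrame
from pipecat.services.soniox.stt import SonioxSTTService  # assumed import path
from pipecat.transcriptions.language import Language

# Initial transcription settings, passed at construction time.
stt = SonioxSTTService(
    api_key=os.getenv("SONIOX_API_KEY"),
    settings=SonioxSTTService.Settings(
        model="stt-rt-v4",
        language_hints=[Language.EN, Language.ES],
        enable_speaker_diarization=True,
    ),
)

# Mid-conversation, queue an update frame to change runtime settings, e.g.:
# await task.queue_frame(STTUpdateSettingsFrame(settings={"language_hints": [Language.FR]}))
```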
Advanced usage
Regional endpoints
If you want to use a different region than the default (US), pass a regional url to the SonioxSTTService constructor:
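For example (the `<region>` hostname below is a placeholder, not a verified endpoint; substitute a value from Soniox's regional endpoint list):

```python
import os

from pipecat.services.soniox.stt import SonioxSTTService  # assumed import path

stt = SonioxSTTService(
    api_key=os.getenv("SONIOX_API_KEY"),
    # Placeholder regional hostname; use the actual endpoint for your region.
    url="wss://stt-rt.<region>.soniox.com/transcribe-websocket",
)
```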
See the list of regional endpoints for available endpoints.
Language hints
There is no need to pre-select a language. The model automatically detects and transcribes any supported language and handles multilingual audio seamlessly, even when multiple languages are mixed within a single sentence or conversation.
When you have prior knowledge of the languages likely to appear in your audio, language hints help the model prioritize them for greater accuracy:
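A sketch of passing hints (import paths assumed as above):

```python
import os

from pipecat.services.soniox.stt import SonioxSTTService  # assumed import path
from pipecat.transcriptions.language import Language

stt = SonioxSTTService(
    api_key=os.getenv("SONIOX_API_KEY"),
    settings=SonioxSTTService.Settings(
        # Bias recognition toward English and Spanish without excluding other languages.
        language_hints=[Language.EN, Language.ES],
        # Set language_hints_strict=True to restrict recognition to these languages only.
    ),
)
```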
Language variants are ignored; for example, `Language.EN_GB` is treated the same as `Language.EN`. See the list of supported languages and learn more about language hints.
Customization with context
By providing context, you help the model better understand and anticipate the language in your audio, even when some terms do not appear clearly or completely.
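A sketch of supplying context; the `text` field shown is an assumption about the `SonioxContextObject` schema, so check the Soniox context-customization docs for the actual fields:

```python
import os

# Assumed import paths.
from pipecat.services.soniox.stt import SonioxContextObject, SonioxSTTService

# NOTE: the `text` field is a guess at the SonioxContextObject schema.
context = SonioxContextObject(
    text="Agenda: discuss Pipecat pipelines, Soniox STT, and WebRTC transports.",
)

stt = SonioxSTTService(
    api_key=os.getenv("SONIOX_API_KEY"),
    settings=SonioxSTTService.Settings(context=context),
)
```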
Learn more about customizing with context.
Endpoint detection and VAD
SonioxSTTService has two ways of knowing when to finalize a transcript.
Pipecat VAD (default)
By default (`vad_force_turn_endpoint=True`), Pipecat's local VAD detects when the user has stopped speaking and sends a finalize message to Soniox, returning the final transcript immediately. This is the recommended setup for low-latency voice agents.
Soniox native endpoint detection
To use Soniox's built-in endpoint detection instead, disable the VAD-driven finalize by setting `vad_force_turn_endpoint=False`:
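For example (import path assumed as above):

```python
import os

from pipecat.services.soniox.stt import SonioxSTTService  # assumed import path

stt = SonioxSTTService(
    api_key=os.getenv("SONIOX_API_KEY"),
    # Let Soniox finalize transcripts on natural pauses instead of Pipecat's VAD.
    vad_force_turn_endpoint=False,
)
```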
Soniox will then listen for natural pauses in speech and finalize transcripts on its own. Learn more about endpoint detection.
Complete examples
The following examples demonstrate how to use SonioxSTTService in Pipecat projects:
- End-to-end voice bot using SonioxSTTService and SonioxTTSService.
- Chatbot agent using Soniox STT for Pipecat Cloud.