LangChain (Python)

Overview

LangChain is a popular framework for building applications powered by large language models (LLMs). The langchain-soniox package provides a document loader that transcribes audio files using Soniox's speech-to-text API, making it easy to incorporate audio transcription into your LangChain pipelines.

Setup

Install the package:

pip install langchain-soniox

Credentials

Get your Soniox API key from the Soniox Console and set it as an environment variable:

export SONIOX_API_KEY=your_api_key
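
If you would rather fail fast when the key is missing, a small pre-check can run before any loader is created. This is a minimal sketch, not part of langchain-soniox: `get_soniox_api_key` is a hypothetical helper name, and the dictionary passed in the demo call stands in for the real environment.

```python
import os


def get_soniox_api_key(env=os.environ):
    """Return the Soniox API key from a mapping, or raise if it is missing.

    Hypothetical helper for illustration; not provided by langchain-soniox.
    """
    key = env.get("SONIOX_API_KEY", "").strip()
    if not key:
        raise RuntimeError("SONIOX_API_KEY is not set")
    return key


# Demo with an explicit mapping so the example does not depend on your shell
print(get_soniox_api_key({"SONIOX_API_KEY": "your_api_key"}))
```

Calling `get_soniox_api_key()` with no argument reads the real environment instead.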

Usage

Basic transcription

Transcribe audio files using the SonioxDocumentLoader:

from langchain_soniox import SonioxDocumentLoader

# Using a URL
loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3"
)

docs = list(loader.lazy_load())
print(docs[0].page_content)  # Transcribed text

You can also load audio from a local file or from bytes:

# Using a local file path
loader = SonioxDocumentLoader(file_path="/path/to/audio.mp3")

# Using binary data
with open("/path/to/audio.mp3", "rb") as f:
    audio_bytes = f.read()
loader = SonioxDocumentLoader(file_data=audio_bytes)

Async loading

For async operations, use alazy_load():

import asyncio
from langchain_soniox import SonioxDocumentLoader

async def transcribe_async():
    loader = SonioxDocumentLoader(
        file_url="https://soniox.com/media/examples/coffee_shop.mp3"
    )

    docs = [doc async for doc in loader.alazy_load()]
    print(docs[0].page_content)

asyncio.run(transcribe_async())
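
Because alazy_load() is an async generator, several files can be transcribed concurrently with asyncio.gather. The following is a minimal sketch under stated assumptions: `collect_docs` and `load_all` are hypothetical helper names, and `StubLoader` is a stand-in that yields plain strings so the example runs without network access. In practice you would pass SonioxDocumentLoader instances, which yield Document objects.

```python
import asyncio


class StubLoader:
    """Stand-in for SonioxDocumentLoader (hypothetical, for illustration only)."""

    def __init__(self, text):
        self.text = text

    async def alazy_load(self):
        await asyncio.sleep(0)  # pretend to wait on the API
        yield self.text  # a real loader yields a Document here


async def collect_docs(loader):
    """Drain one loader's async generator into a list."""
    return [doc async for doc in loader.alazy_load()]


async def load_all(loaders):
    """Run all loaders concurrently; results keep the input order."""
    return await asyncio.gather(*(collect_docs(loader) for loader in loaders))


results = asyncio.run(load_all([StubLoader("first"), StubLoader("second")]))
print(results)
```

asyncio.gather returns results in the order the loaders were passed, so each inner list can be matched back to its input file.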

Advanced usage

Language hints

Soniox automatically detects and transcribes speech in 60+ languages. When you know which languages are likely to appear in your audio, provide language_hints to improve accuracy by biasing recognition toward those languages.

By default, language hints do not restrict recognition: they only bias the model toward the specified languages, and other languages can still be detected if present.

from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
)

loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        language_hints=["en", "es"],
    ),
)

docs = list(loader.lazy_load())

For more details, see the Soniox language hints documentation.

Speaker diarization

Enable speaker identification to distinguish between different speakers:

from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
)

loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        enable_speaker_diarization=True,
    ),
)

docs = list(loader.lazy_load())

# Access speaker information in the metadata
current_speaker = None
output = ""
for token in docs[0].metadata["tokens"]:
    if current_speaker != token["speaker"]:
        current_speaker = token["speaker"]
        output += f"\nSpeaker {current_speaker}: {token['text'].lstrip()}"
    else:
        output += token["text"]
print(output)

Language identification

Enable language identification to tag each token with its detected language:

from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
)

loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        enable_language_identification=True,
    ),
)

docs = list(loader.lazy_load())

# Access language information in the metadata
current_language = None
output = ""
for token in docs[0].metadata["tokens"]:
    if current_language != token["language"]:
        current_language = token["language"]
        output += f"\n[{current_language}] {token['text'].lstrip()}"
    else:
        output += token["text"]
print(output)
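
The same token metadata can be aggregated, for example to estimate how much speech time each language takes. This is a rough sketch, assuming each token carries the start_ms, end_ms, and language fields described on this page. The helper name `speech_ms_by_language` and the sample tokens are made up for illustration.

```python
from collections import defaultdict


def speech_ms_by_language(tokens):
    """Sum token durations (end_ms - start_ms) per detected language."""
    totals = defaultdict(int)
    for token in tokens:
        language = token.get("language")
        if language is None:
            continue  # skip tokens without a language tag
        totals[language] += token["end_ms"] - token["start_ms"]
    return dict(totals)


# Made-up tokens in the shape described on this page
sample_tokens = [
    {"text": "Hello", "start_ms": 0, "end_ms": 400, "language": "en"},
    {"text": " there", "start_ms": 400, "end_ms": 700, "language": "en"},
    {"text": " Hola", "start_ms": 900, "end_ms": 1300, "language": "es"},
]

print(speech_ms_by_language(sample_tokens))
```

With real output you would pass docs[0].metadata["tokens"] instead of the sample list.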

Context for improved accuracy

Provide domain-specific context to improve transcription accuracy. Context helps the model understand your domain, recognize important terms, and apply custom vocabulary.

The context object supports four optional sections (general, text, terms, and translation_terms):

from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
    StructuredContext,
    StructuredContextGeneralItem,
    StructuredContextTranslationTerm,
)

loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        context=StructuredContext(
            # Structured key-value information (domain, topic, intent, etc.)
            general=[
                StructuredContextGeneralItem(key="domain", value="Healthcare"),
                StructuredContextGeneralItem(
                    key="topic", value="Diabetes management consultation"
                ),
                StructuredContextGeneralItem(key="doctor", value="Dr. Martha Smith"),
            ],
            # Longer free-form background text or related documents
            text="The patient has a history of...",
            # Domain-specific or uncommon words
            terms=["Celebrex", "Zyrtec", "Xanax"],
            # Custom translations for ambiguous terms
            translation_terms=[
                StructuredContextTranslationTerm(
                    source="Mr. Smith", target="Sr. Smith"
                ),
                StructuredContextTranslationTerm(source="MRI", target="RM"),
            ],
        ),
    ),
)

docs = list(loader.lazy_load())

For more details, see the Soniox context documentation.

Translation

Translate from any detected language to a target language:

from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
    TranslationConfig,
)

loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        translation=TranslationConfig(
            type="one_way",
            target_language="fr",
        ),
        language_hints=["en"],
    ),
)

docs = list(loader.lazy_load())

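# Split tokens into the original transcript and its translation
original_text = ""
translated_text = ""
for token in docs[0].metadata["tokens"]:
    if token["translation_status"] == "translation":
        translated_text += token["text"]
    else:
        original_text += token["text"]

print(original_text)
print(translated_text)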

You can also transcribe and translate between two languages simultaneously using the two_way translation type. For more details, see the Soniox translation documentation.

API reference

Constructor parameters

Parameter	Type	Required	Default	Description
`file_path`	`str`	No*	`None`	Path to local audio file to transcribe
`file_data`	`bytes`	No*	`None`	Binary data of audio file to transcribe
`file_url`	`str`	No*	`None`	URL of audio file to transcribe
`api_key`	`str`	No	`SONIOX_API_KEY` env var	Soniox API key
`base_url`	`str`	No	`https://api.soniox.com/v1`	API base URL (see regional endpoints)
`options`	`SonioxTranscriptionOptions`	No	`SonioxTranscriptionOptions()`	Transcription options
`polling_interval_seconds`	`float`	No	`1.0`	Time between status polls (seconds)
`timeout_seconds`	`float`	No	`300.0` (5 minutes)	Maximum time to wait for transcription
`http_request_timeout_seconds`	`float`	No	`60.0`	Timeout for individual HTTP requests

* You must specify exactly one of: file_path, file_data, or file_url.
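
When the audio source comes from user input, it can help to check the exactly-one rule before constructing a loader. The following is a minimal sketch of such a pre-check. `check_single_source` is a hypothetical helper, not part of langchain-soniox, and it makes no claim about how the loader validates its own arguments.

```python
def check_single_source(file_path=None, file_data=None, file_url=None):
    """Return the name of the one source that was given, or raise ValueError.

    Hypothetical pre-check; mirrors the "exactly one of" rule stated above.
    """
    candidates = {
        "file_path": file_path,
        "file_data": file_data,
        "file_url": file_url,
    }
    provided = [name for name, value in candidates.items() if value is not None]
    if len(provided) != 1:
        raise ValueError(
            "Specify exactly one of file_path, file_data, or file_url; "
            f"got {len(provided)}: {provided}"
        )
    return provided[0]


print(check_single_source(file_url="https://soniox.com/media/examples/coffee_shop.mp3"))
```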

Transcription options

The SonioxTranscriptionOptions class supports these parameters:

Parameter	Type	Description
`model`	`str`	Async model to use (see available models)
`language_hints`	`list[str]`	Language hints for transcription (ISO language codes)
`language_hints_strict`	`bool`	Enforce strict language hints
`enable_speaker_diarization`	`bool`	Enable speaker identification
`enable_language_identification`	`bool`	Enable language detection
`translation`	`TranslationConfig`	Translation configuration
`context`	`StructuredContext`	Context for improved accuracy
`client_reference_id`	`str`	Custom reference ID for your records
`webhook_url`	`str`	Webhook URL for completion notifications
`webhook_auth_header_name`	`str`	Custom auth header name for webhook
`webhook_auth_header_value`	`str`	Custom auth header value for webhook

Browse the API documentation for a full list of supported options.

Return value

The lazy_load() and alazy_load() methods yield a single Document object:

Document(
    page_content=str,  # The transcribed text
    metadata={
        "source": str,  # File URL, path, or "file_upload"
        "transcription_id": str,  # Unique transcription ID
        "audio_duration_ms": int,  # Audio duration in milliseconds
        "model": str,  # Model used for transcription
        "created_at": str,  # ISO 8601 timestamp
        "tokens": list[dict],  # Detailed token-level information
    }
)

The tokens list in metadata holds one dict per transcribed token, with these fields:

text: The transcribed text
start_ms: Start time in milliseconds
end_ms: End time in milliseconds
speaker: Speaker ID (if diarization enabled), for example "1", "2", etc.
language: Detected language (if identification enabled), for example "en", "fr", etc.
translation_status: Translation status ("original", "translation" or "none")
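
The fields above can be combined to build timestamped speaker segments, for example as a first step toward subtitles. This is a minimal sketch, assuming tokens carry their own leading whitespace (as the diarization example suggests) and that diarization was enabled. `speaker_segments` and the sample tokens are hypothetical.

```python
def speaker_segments(tokens):
    """Merge consecutive tokens from the same speaker into timed segments."""
    segments = []
    for token in tokens:
        speaker = token.get("speaker")
        if segments and segments[-1]["speaker"] == speaker:
            # Same speaker: extend the current segment
            segments[-1]["text"] += token["text"]
            segments[-1]["end_ms"] = token["end_ms"]
        else:
            # Speaker changed (or first token): start a new segment
            segments.append(
                {
                    "speaker": speaker,
                    "start_ms": token["start_ms"],
                    "end_ms": token["end_ms"],
                    "text": token["text"].lstrip(),
                }
            )
    return segments


# Made-up tokens in the shape described above
sample_tokens = [
    {"text": "Hi", "start_ms": 0, "end_ms": 200, "speaker": "1"},
    {"text": " there.", "start_ms": 200, "end_ms": 600, "speaker": "1"},
    {"text": " Hello", "start_ms": 800, "end_ms": 1100, "speaker": "2"},
    {"text": "!", "start_ms": 1100, "end_ms": 1150, "speaker": "2"},
]

for segment in speaker_segments(sample_tokens):
    print(segment)
```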

For more details, see the Soniox API reference.
