LangChain
Soniox document loader for LangChain

Overview
LangChain is a popular framework for building applications powered by large language models (LLMs).
The @soniox/langchain package provides a document loader that transcribes audio files using Soniox's speech-to-text API, making it easy to incorporate audio transcription into your LangChain pipelines.
Setup
Install the package from npm, for example with `npm install @soniox/langchain`.
Credentials
Get your Soniox API key from the Soniox Console and set it as the `SONIOX_API_KEY` environment variable.
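A minimal sketch of checking for the key before constructing the loader. The helper name `resolveApiKey` is hypothetical and not part of `@soniox/langchain`; it only mirrors the documented behavior that `apiKey` falls back to the `SONIOX_API_KEY` environment variable.

```typescript
// Hypothetical helper (not part of @soniox/langchain): prefer an explicit key,
// otherwise fall back to SONIOX_API_KEY from the supplied environment map.
type Env = Record<string, string | undefined>;

function resolveApiKey(env: Env, explicitKey?: string): string {
  const key = explicitKey ?? env["SONIOX_API_KEY"];
  if (key === undefined || key.trim() === "") {
    // Fail early with a clear message instead of a later authentication error.
    throw new Error("Soniox API key missing: set SONIOX_API_KEY or pass apiKey");
  }
  return key;
}
```

In Node.js this could be called as `resolveApiKey(process.env)`; taking the environment as an argument keeps the helper easy to test.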
Usage
Basic transcription
Pass the audio (a buffer or a URL) to `SonioxAudioTranscriptLoader` and call `load()` to transcribe it.
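The flow can be sketched as follows. To stay self-contained, the sketch uses local stand-in interfaces (`TranscriptDocument`, `TranscriptLoader`) rather than importing the package; an instance of `SonioxAudioTranscriptLoader` is assumed to satisfy this shape, and `pageContent` is assumed to hold the transcript text.

```typescript
// Structural stand-ins so the sketch compiles without the package installed.
interface TranscriptDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

interface TranscriptLoader {
  load(): Promise<TranscriptDocument[]>;
}

// load() is documented to return an array holding a single Document,
// so the transcript is read from the first element.
async function transcribe(loader: TranscriptLoader): Promise<string> {
  const docs = await loader.load();
  const first = docs[0];
  if (first === undefined) {
    throw new Error("loader returned no documents");
  }
  return first.pageContent;
}
```

With the package installed, the loader would presumably be created along the lines of `new SonioxAudioTranscriptLoader({ audio: "https://example.com/audio.mp3" })` (the URL is a placeholder) and passed to `transcribe`.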
Two-way translation
Set the `translation` option to transcribe and translate between two languages simultaneously.
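A sketch of what such an options object might look like. The field names `type`, `language_a`, and `language_b` are assumptions based on the Soniox API's translation configuration and should be confirmed against the API reference; `twoWayTranslation` is a hypothetical helper.

```typescript
// Assumed shape of a two-way translation config; verify against the API reference.
interface TwoWayTranslation {
  type: "two_way";
  language_a: string;
  language_b: string;
}

// Hypothetical helper that builds the `translation` entry of the loader options.
function twoWayTranslation(
  languageA: string,
  languageB: string
): { translation: TwoWayTranslation } {
  if (languageA === languageB) {
    throw new Error("two-way translation needs two different languages");
  }
  return {
    translation: { type: "two_way", language_a: languageA, language_b: languageB },
  };
}

// English <-> Spanish, intended to be passed as the loader's second argument.
const twoWayOptions = twoWayTranslation("en", "es");
```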
One-way translation
Set the `translation` option to translate from any detected language into a single target language.
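A sketch of a one-way configuration. The field names `type` and `target_language` are assumptions based on the Soniox API's translation configuration; `oneWayTranslation` is a hypothetical helper.

```typescript
// Assumed shape of a one-way translation config; verify against the API reference.
interface OneWayTranslation {
  type: "one_way";
  target_language: string;
}

// Hypothetical helper that builds the `translation` entry of the loader options.
function oneWayTranslation(targetLanguage: string): { translation: OneWayTranslation } {
  const target = targetLanguage.trim();
  if (target === "") {
    throw new Error("a target language code is required");
  }
  return { translation: { type: "one_way", target_language: target } };
}

// Translate whatever language is detected into English.
const oneWayOptions = oneWayTranslation("en");
```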
Advanced usage
Language hints
Provide language hints through the `language_hints` option to improve transcription accuracy.
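A sketch of building these options. The option names `language_hints` and `language_hints_strict` come from the options table in the API reference section; the helper `languageHintOptions`, the two-letter codes, and the normalization step are assumptions made for illustration.

```typescript
// Mirrors two fields of SonioxLoaderOptions; a local type keeps the sketch self-contained.
interface LanguageHintOptions {
  language_hints: string[];
  language_hints_strict: boolean;
}

// Hypothetical helper: trims, lowercases, and de-duplicates hints, preserving order.
function languageHintOptions(hints: string[], strict = false): LanguageHintOptions {
  const normalized = hints
    .map((hint) => hint.trim().toLowerCase())
    .filter((hint, index, all) => hint.length > 0 && all.indexOf(hint) === index);
  if (normalized.length === 0) {
    throw new Error("at least one language hint is required");
  }
  return { language_hints: normalized, language_hints_strict: strict };
}

// Audio expected to contain English and Spanish speech.
const hintOptions = languageHintOptions(["en", "ES", "en "]);
```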
Context for improved accuracy
Provide domain-specific context through the `context` option to improve transcription accuracy.
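A sketch of assembling a context object. The source only states that `context` is an object; the `terms` and `text` fields used here are assumptions based on the Soniox API and should be checked against the API reference. `buildContext` is a hypothetical helper.

```typescript
// Assumed context shape: `terms` (domain vocabulary) and `text` (free-form
// background). Verify the field names against the Soniox API reference.
interface ContextSketch {
  terms?: string[];
  text?: string;
}

// Hypothetical helper that drops empty entries before building the option.
function buildContext(terms: string[], text?: string): { context: ContextSketch } {
  const context: ContextSketch = {};
  const cleaned = terms.map((term) => term.trim()).filter((term) => term.length > 0);
  if (cleaned.length > 0) {
    context.terms = cleaned;
  }
  if (text !== undefined && text.trim() !== "") {
    context.text = text.trim();
  }
  return { context };
}

// Illustrative values for a medical consultation recording.
const medicalContext = buildContext(
  ["ibuprofen", "hypertension", " "],
  "Doctor and patient consultation."
);
```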
API reference
Constructor parameters
SonioxLoaderParams (required)
| Parameter | Type | Required | Description |
|---|---|---|---|
| `audio` | `Uint8Array \| string` | Yes | Audio file as buffer or URL |
| `audioFormat` | `SonioxAudioFormat` | No | Audio file format |
| `apiKey` | `string` | No | Soniox API key (defaults to `SONIOX_API_KEY` env var) |
| `apiBaseUrl` | `string` | No | API base URL (defaults to https://api.soniox.com/v1). See regional endpoints. |
| `pollingIntervalMs` | `number` | No | Polling interval in ms (min: 1000, default: 1000) |
| `pollingTimeoutMs` | `number` | No | Polling timeout in ms (default: 180000) |
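The documented defaults can be made concrete with a small sketch. The type `LoaderParamsSketch` and the helpers `withDefaults` and `maxPollAttempts` are hypothetical and only mirror the table; clamping a too-small polling interval to the minimum is an assumption, since the package may reject such values instead.

```typescript
// Local mirror of the polling-related constructor parameters (not the package's own type).
interface LoaderParamsSketch {
  audio: Uint8Array | string;
  apiBaseUrl?: string;
  pollingIntervalMs?: number;
  pollingTimeoutMs?: number;
}

const DEFAULT_API_BASE_URL = "https://api.soniox.com/v1";
const MIN_POLLING_INTERVAL_MS = 1000; // documented minimum and default
const DEFAULT_POLLING_TIMEOUT_MS = 180000; // documented default (3 minutes)

// Fill in documented defaults; intervals below the minimum are raised to it (assumption).
function withDefaults(params: LoaderParamsSketch): Required<LoaderParamsSketch> {
  const interval = params.pollingIntervalMs ?? MIN_POLLING_INTERVAL_MS;
  return {
    audio: params.audio,
    apiBaseUrl: params.apiBaseUrl ?? DEFAULT_API_BASE_URL,
    pollingIntervalMs: Math.max(interval, MIN_POLLING_INTERVAL_MS),
    pollingTimeoutMs: params.pollingTimeoutMs ?? DEFAULT_POLLING_TIMEOUT_MS,
  };
}

// Rough upper bound on status checks before the timeout elapses.
function maxPollAttempts(params: Required<LoaderParamsSketch>): number {
  return Math.ceil(params.pollingTimeoutMs / params.pollingIntervalMs);
}
```

Under these assumptions the defaults allow roughly 180 status checks before the loader gives up, which is worth keeping in mind when raising `pollingTimeoutMs` for long recordings.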
SonioxLoaderOptions (optional)
| Parameter | Type | Description |
|---|---|---|
| `model` | `SonioxTranscriptionModelId` | Model to use (default: `"stt-async-v3"`) |
| `translation` | `object` | Translation configuration |
| `language_hints` | `string[]` | Language hints for transcription |
| `language_hints_strict` | `boolean` | Enforce strict language hints |
| `enable_speaker_diarization` | `boolean` | Enable speaker identification |
| `enable_language_identification` | `boolean` | Enable language detection |
| `context` | `object` | Context for improved accuracy |
Browse the API reference for a full list of supported options.
Supported audio formats
- `aac`: Advanced Audio Coding
- `aiff`: Audio Interchange File Format
- `amr`: Adaptive Multi-Rate
- `asf`: Advanced Systems Format
- `flac`: Free Lossless Audio Codec
- `mp3`: MPEG Audio Layer III
- `ogg`: Ogg Vorbis
- `wav`: Waveform Audio File Format
- `webm`: WebM Audio
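When passing a buffer, it can help to derive `audioFormat` from the file name. The sketch below assumes the `SonioxAudioFormat` values match the identifiers listed above; `inferAudioFormat` is a hypothetical helper, not part of the package.

```typescript
// Format identifiers taken from the supported-formats list above.
const SUPPORTED_FORMATS = [
  "aac", "aiff", "amr", "asf", "flac", "mp3", "ogg", "wav", "webm",
] as const;

type SupportedFormat = (typeof SUPPORTED_FORMATS)[number];

// Hypothetical helper: map a plain file name to a supported format by its
// extension, or return undefined when the extension is missing or unsupported.
function inferAudioFormat(fileName: string): SupportedFormat | undefined {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) {
    return undefined;
  }
  const extension = fileName.slice(dot + 1).toLowerCase();
  for (const format of SUPPORTED_FORMATS) {
    if (format === extension) {
      return format;
    }
  }
  return undefined;
}
```

Returning `undefined` rather than throwing lets the caller omit the optional `audioFormat` parameter when the extension is unknown.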
Return value
The `load()` method returns an array containing a single `Document` object.
Its metadata includes the transcribed text, speaker information (if diarization is enabled), language information (if language identification is enabled), translation data (if translation is enabled), and timing information.