LangChain

Overview

LangChain is a popular framework for building applications powered by large language models (LLMs). The @soniox/langchain package provides a document loader that transcribes audio files using Soniox's speech-to-text API, making it easy to incorporate audio transcription into your LangChain pipelines.

Setup

Install the package:

npm install @soniox/langchain

Credentials

Get your Soniox API key from the Soniox Console and set it as an environment variable:

export SONIOX_API_KEY=your_api_key
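Before constructing a loader, it can help to fail fast when no key is available. The following is a minimal sketch, not part of the package: `resolveSonioxApiKey` is a hypothetical helper, and it takes the environment as an argument so it has no dependency on Node.js globals. It assumes an explicitly passed key should win over the environment variable.

```typescript
// Hypothetical helper (not exported by @soniox/langchain).
// Prefers an explicitly passed key, then falls back to SONIOX_API_KEY.
function resolveSonioxApiKey(
  env: Record<string, string | undefined>,
  explicitKey?: string,
): string {
  const key = (explicitKey ?? env["SONIOX_API_KEY"] ?? "").trim();
  if (key === "") {
    throw new Error("Soniox API key missing: set SONIOX_API_KEY or pass apiKey");
  }
  return key;
}

// Example with an in-memory environment object.
const keyFromEnv = resolveSonioxApiKey({ SONIOX_API_KEY: "your_api_key" });
const keyFromArg = resolveSonioxApiKey({ SONIOX_API_KEY: "your_api_key" }, "other_key");
```

In Node.js the first argument would typically be `process.env`, and the result could be passed as the `apiKey` parameter described in the API reference.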

Usage

Basic transcription

Transcribe audio files using the SonioxAudioTranscriptLoader:

import { SonioxAudioTranscriptLoader } from "@soniox/langchain";

// Fetch the file
const response = await fetch(
  "https://github.com/soniox/soniox_examples/raw/refs/heads/master/speech_to_text/assets/coffee_shop.mp3",
);
const audioBuffer = new Uint8Array(await response.arrayBuffer()); // Uint8Array; also works on runtimes without Response.bytes()

const loader = new SonioxAudioTranscriptLoader({
  audio: audioBuffer, // Or you can pass in a URL string
});

const docs = await loader.load();
console.log(docs[0].pageContent); // Transcribed text

Two-way translation

Transcribe and translate between two languages simultaneously:

const loader = new SonioxAudioTranscriptLoader(
  {
    audio: audioBuffer,
  },
  {
    translation: {
      type: "two_way",
      language_a: "en",
      language_b: "es",
    },
    language_hints: ["en", "es"],
  },
);

const docs = await loader.load();

One-way translation

Translate from any detected language to a target language:

const loader = new SonioxAudioTranscriptLoader(
  {
    audio: audioBuffer,
  },
  {
    translation: {
      type: "one_way",
      target_language: "fr",
    },
    language_hints: ["en"],
  },
);

const docs = await loader.load();
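The two translation modes above differ only in the `type` field and the language fields that accompany it. One way to keep them apart at compile time is a discriminated union. This is a sketch under assumptions: the type names and the `translationOptions` helper are local and illustrative, and the package may export its own types under different names.

```typescript
// Local illustrative types; the package may export its own, differently named.
type TwoWayTranslation = {
  type: "two_way";
  language_a: string;
  language_b: string;
};

type OneWayTranslation = {
  type: "one_way";
  target_language: string;
};

type TranslationConfig = TwoWayTranslation | OneWayTranslation;

// Build the options object for either mode. Two-way hints both languages,
// mirroring the two-way example above; one-way uses the hints the caller supplies.
function translationOptions(
  config: TranslationConfig,
  sourceHints: string[] = [],
): { translation: TranslationConfig; language_hints: string[] } {
  const hints =
    config.type === "two_way"
      ? [config.language_a, config.language_b]
      : sourceHints;
  return { translation: config, language_hints: hints };
}

const twoWay = translationOptions({ type: "two_way", language_a: "en", language_b: "es" });
const oneWay = translationOptions({ type: "one_way", target_language: "fr" }, ["en"]);
```

Either result has the same shape as the second constructor argument shown in the examples above.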

Advanced usage

Language hints

Provide language hints to improve transcription accuracy:

const loader = new SonioxAudioTranscriptLoader(
  {
    audio: audioBuffer,
  },
  {
    language_hints: ["en", "es"],
  },
);

Context for improved accuracy

Provide domain-specific context to improve transcription accuracy:

const loader = new SonioxAudioTranscriptLoader(
  {
    audio: audioBuffer,
  },
  {
    context: {
      general: [
        { key: "industry", value: "healthcare" },
        { key: "meeting_type", value: "consultation" },
      ],
      terms: ["hypertension", "cardiology", "metformin"],
      translation_terms: [
        { source: "blood pressure", target: "presión arterial" },
        { source: "medication", target: "medicamento" },
      ],
    },
  },
);
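When the domain vocabulary lives in plain maps and lists, the context object can be assembled programmatically. The sketch below mirrors only the three fields used in the example above (`general`, `terms`, `translation_terms`); `ContextSketch` and `buildContext` are hypothetical names, and the real context type may accept more fields.

```typescript
// Illustrative shape covering only the fields shown in the example above.
type ContextSketch = {
  general: { key: string; value: string }[];
  terms: string[];
  translation_terms: { source: string; target: string }[];
};

// Convert plain maps and lists into the context shape.
// Terms are trimmed, emptied entries dropped, and duplicates removed.
function buildContext(
  general: Record<string, string>,
  terms: string[],
  glossary: Record<string, string>,
): ContextSketch {
  const cleaned = terms.map((t) => t.trim()).filter((t) => t.length > 0);
  return {
    general: Object.entries(general).map(([key, value]) => ({ key, value })),
    terms: Array.from(new Set(cleaned)),
    translation_terms: Object.entries(glossary).map(([source, target]) => ({
      source,
      target,
    })),
  };
}

const medicalContext = buildContext(
  { industry: "healthcare", meeting_type: "consultation" },
  ["hypertension", "cardiology", "hypertension ", ""],
  { "blood pressure": "presión arterial", medication: "medicamento" },
);
```

The result can be passed as the `context` option in place of the hand-written literal.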

API reference

Constructor parameters

SonioxLoaderParams (required)

Parameter	Type	Required	Description
`audio`	`Uint8Array | string`	Yes	Audio file as buffer or URL
`audioFormat`	`SonioxAudioFormat`	No	Audio file format
`apiKey`	`string`	No	Soniox API key (defaults to `SONIOX_API_KEY` env var)
`apiBaseUrl`	`string`	No	API base URL (defaults to `https://api.soniox.com/v1`). See regional endpoints.
`pollingIntervalMs`	`number`	No	Polling interval in ms (min: 1000, default: 1000)
`pollingTimeoutMs`	`number`	No	Polling timeout in ms (default: 180000)
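The polling defaults and the minimum interval in the table can be expressed as a small validation step. This is a sketch of the documented constraints, not the loader's internal logic: `resolvePolling` and `maxStatusChecks` are hypothetical helpers, and throwing on a too-small interval is this sketch's choice, since the source does not say how the loader reacts.

```typescript
// Values copied from the parameter table above.
const MIN_POLLING_INTERVAL_MS = 1000;
const DEFAULT_POLLING_INTERVAL_MS = 1000;
const DEFAULT_POLLING_TIMEOUT_MS = 180000;

type PollingSettings = {
  pollingIntervalMs: number;
  pollingTimeoutMs: number;
};

// Fill in defaults and reject intervals below the documented minimum.
// Throwing is an assumption of this sketch; the loader may behave differently.
function resolvePolling(params: Partial<PollingSettings> = {}): PollingSettings {
  const pollingIntervalMs = params.pollingIntervalMs ?? DEFAULT_POLLING_INTERVAL_MS;
  const pollingTimeoutMs = params.pollingTimeoutMs ?? DEFAULT_POLLING_TIMEOUT_MS;
  if (pollingIntervalMs < MIN_POLLING_INTERVAL_MS) {
    throw new RangeError(`pollingIntervalMs must be at least ${MIN_POLLING_INTERVAL_MS}`);
  }
  return { pollingIntervalMs, pollingTimeoutMs };
}

// Rough upper bound on status checks that fit inside the timeout.
function maxStatusChecks(settings: PollingSettings): number {
  return Math.floor(settings.pollingTimeoutMs / settings.pollingIntervalMs);
}

const checksWithDefaults = maxStatusChecks(resolvePolling()); // 180
const checksForLongFile = maxStatusChecks(
  resolvePolling({ pollingIntervalMs: 5000, pollingTimeoutMs: 600000 }),
); // 120
```

For long recordings, raising `pollingTimeoutMs` together with `pollingIntervalMs` keeps the number of status checks modest while allowing more time for the transcription to finish.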

SonioxLoaderOptions (optional)

Parameter	Type	Description
`model`	`SonioxTranscriptionModelId`	Model to use (default: `"stt-async-v3"`)
`translation`	`object`	Translation configuration
`language_hints`	`string[]`	Language hints for transcription
`language_hints_strict`	`boolean`	Enforce strict language hints
`enable_speaker_diarization`	`boolean`	Enable speaker identification
`enable_language_identification`	`boolean`	Enable language detection
`context`	`object`	Context for improved accuracy

Browse the API reference for a full list of supported options.

Supported audio formats

aac - Advanced Audio Coding
aiff - Audio Interchange File Format
amr - Adaptive Multi-Rate
asf - Advanced Systems Format
flac - Free Lossless Audio Codec
mp3 - MPEG Audio Layer III
ogg - Ogg Vorbis
wav - Waveform Audio File Format
webm - WebM Audio
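A quick pre-flight check against this list can catch unsupported files before they are uploaded. The sketch below only inspects the file extension of a path or URL; `supportedExtensionOf` is a hypothetical helper. Whether the `audioFormat` parameter accepts these exact strings is not confirmed by this page, and alternative extensions (for example `.aif`) are not handled.

```typescript
// Extensions taken from the supported formats list above.
const SUPPORTED_EXTENSIONS = [
  "aac", "aiff", "amr", "asf", "flac", "mp3", "ogg", "wav", "webm",
] as const;

type SupportedExtension = (typeof SUPPORTED_EXTENSIONS)[number];

// Return the lowercase extension if it is in the list, otherwise undefined.
// Query strings and fragments are ignored so URLs work as well as file paths.
function supportedExtensionOf(pathOrUrl: string): SupportedExtension | undefined {
  const withoutQuery = pathOrUrl.split(/[?#]/)[0] ?? "";
  const lastSegment = withoutQuery.split("/").pop() ?? "";
  const dot = lastSegment.lastIndexOf(".");
  if (dot < 0) {
    return undefined;
  }
  const ext = lastSegment.slice(dot + 1).toLowerCase();
  return (SUPPORTED_EXTENSIONS as readonly string[]).includes(ext)
    ? (ext as SupportedExtension)
    : undefined;
}

const fromUrl = supportedExtensionOf(
  "https://github.com/soniox/soniox_examples/raw/refs/heads/master/speech_to_text/assets/coffee_shop.mp3",
); // "mp3"
const fromText = supportedExtensionOf("notes.txt"); // undefined
```

An extension is only a hint about the container, so a file that passes this check can still be rejected by the API if its contents do not match.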

Return value

The load() method returns an array containing a single Document object:

Document {
  pageContent: string, // The transcribed text
  metadata: SonioxTranscriptResponse // Full transcript with metadata
}

The metadata includes the transcribed text, speaker information (if diarization is enabled), language information (if language identification is enabled), translation data (if translation is enabled), and timing information.
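When diarization is enabled, the per-speaker information can be folded into readable turns. The sketch below assumes the metadata exposes a `tokens` array whose items carry `text` and an optional `speaker` string; that shape is an assumption, not something this page specifies, so `TokenSketch` is declared locally and the sample data is made up.

```typescript
// Assumed token shape; check the actual SonioxTranscriptResponse type before relying on it.
type TokenSketch = { text: string; speaker?: string };
type Turn = { speaker: string; text: string };

// Merge consecutive tokens from the same speaker into one turn.
function toSpeakerTurns(tokens: TokenSketch[]): Turn[] {
  const turns: Turn[] = [];
  for (const token of tokens) {
    const speaker = token.speaker ?? "unknown";
    const last = turns[turns.length - 1];
    if (last !== undefined && last.speaker === speaker) {
      last.text += token.text;
    } else {
      turns.push({ speaker, text: token.text });
    }
  }
  // Trim the whitespace that leading-space tokens leave at turn boundaries.
  return turns.map((t) => ({ speaker: t.speaker, text: t.text.trim() }));
}

// Made-up sample tokens for illustration.
const sampleTokens: TokenSketch[] = [
  { text: "Hi", speaker: "1" },
  { text: " there.", speaker: "1" },
  { text: " Hello", speaker: "2" },
  { text: "!", speaker: "2" },
];
const speakerTurns = toSpeakerTurns(sampleTokens);
// speakerTurns: [{ speaker: "1", text: "Hi there." }, { speaker: "2", text: "Hello!" }]
```

With a real result, the input would come from the document's metadata (for example `docs[0].metadata.tokens`, if the response exposes tokens under that name).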
