LangChain

Soniox document loader for LangChain

Overview

LangChain is a popular framework for building applications powered by large language models (LLMs). The @soniox/langchain package provides a document loader that transcribes audio files using Soniox's speech-to-text API, making it easy to incorporate audio transcription into your LangChain pipelines.

Setup

Install the package:

npm install @soniox/langchain

Credentials

Get your Soniox API key from the Soniox Console and set it as an environment variable:

export SONIOX_API_KEY=your_api_key
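A missing key is easier to diagnose if the application checks for it at startup. The helper below is a minimal sketch of such a check. The name requireSonioxApiKey is hypothetical and not part of @soniox/langchain; the only assumption is that the key is stored under SONIOX_API_KEY as described above.

```typescript
// Hypothetical helper, not part of @soniox/langchain.
type Env = Record<string, string | undefined>;

// Returns the trimmed API key, or throws a descriptive error if it is absent or blank.
function requireSonioxApiKey(env: Env): string {
  const key = env["SONIOX_API_KEY"]?.trim();
  if (!key) {
    throw new Error("SONIOX_API_KEY is not set; export it before running.");
  }
  return key;
}

// In Node.js you would typically call: requireSonioxApiKey(process.env)
```

The returned value can then be passed as the apiKey constructor parameter, or the check can serve purely as an early failure before any audio is uploaded.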

Usage

Basic transcription

Transcribe audio files using the SonioxAudioTranscriptLoader:

import { SonioxAudioTranscriptLoader } from "@soniox/langchain";

// Fetch the file
const response = await fetch(
  "https://github.com/soniox/soniox_examples/raw/refs/heads/master/speech_to_text/assets/coffee_shop.mp3",
);
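// Response.bytes() needs a recent runtime; on older ones use
// new Uint8Array(await response.arrayBuffer()) instead.
const audioBuffer = await response.bytes(); // Uint8Array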

const loader = new SonioxAudioTranscriptLoader({
  audio: audioBuffer, // Or you can pass in a URL string
});

const docs = await loader.load();
console.log(docs[0].pageContent); // Transcribed text

Two-way translation

Transcribe and translate between two languages simultaneously:

const loader = new SonioxAudioTranscriptLoader(
  {
    audio: audioBuffer,
  },
  {
    translation: {
      type: "two_way",
      language_a: "en",
      language_b: "es",
    },
    language_hints: ["en", "es"],
  },
);

const docs = await loader.load();

One-way translation

Translate from any detected language to a target language:

const loader = new SonioxAudioTranscriptLoader(
  {
    audio: audioBuffer,
  },
  {
    translation: {
      type: "one_way",
      target_language: "fr",
    },
    language_hints: ["en"],
  },
);

const docs = await loader.load();
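The two translation shapes shown above differ only in the fields that accompany type, so they can be modelled as a discriminated union. The sketch below uses hypothetical type names (OneWayTranslation, TwoWayTranslation, TranslationConfig) that mirror the field names from the examples; the package may export its own types, which should be preferred if available.

```typescript
// Hypothetical types modelled on the option shapes used in the examples above.
type OneWayTranslation = { type: "one_way"; target_language: string };
type TwoWayTranslation = { type: "two_way"; language_a: string; language_b: string };
type TranslationConfig = OneWayTranslation | TwoWayTranslation;

// Produces a short human-readable summary of a translation configuration.
function describeTranslation(config: TranslationConfig): string {
  switch (config.type) {
    case "one_way":
      return `any detected language -> ${config.target_language}`;
    case "two_way":
      return `${config.language_a} <-> ${config.language_b}`;
  }
}
```

With the union in place, the compiler rejects mixed configurations such as a "one_way" object that carries language_a.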

Advanced usage

Language hints

Provide language hints to improve transcription accuracy:

const loader = new SonioxAudioTranscriptLoader(
  {
    audio: audioBuffer,
  },
  {
    language_hints: ["en", "es"],
  },
);

Context for improved accuracy

Provide domain-specific context to improve transcription accuracy:

const loader = new SonioxAudioTranscriptLoader(
  {
    audio: audioBuffer,
  },
  {
    context: {
      general: [
        { key: "industry", value: "healthcare" },
        { key: "meeting_type", value: "consultation" },
      ],
      terms: ["hypertension", "cardiology", "metformin"],
      translation_terms: [
        { source: "blood pressure", target: "presión arterial" },
        { source: "medication", target: "medicamento" },
      ],
    },
  },
);

API reference

Constructor parameters

SonioxLoaderParams (required)

Parameter          Type                 Required  Description
audio              Uint8Array | string  Yes       Audio file as buffer or URL
audioFormat        SonioxAudioFormat    No        Audio file format
apiKey             string               No        Soniox API key (defaults to SONIOX_API_KEY env var)
apiBaseUrl         string               No        API base URL (defaults to https://api.soniox.com/v1). See regional endpoints.
pollingIntervalMs  number               No        Polling interval in ms (min: 1000, default: 1000)
pollingTimeoutMs   number               No        Polling timeout in ms (default: 180000)
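To make the two polling parameters concrete: with the defaults, a transcription is checked about once per second for up to three minutes, which is roughly 180 status checks. The sketch below works through that arithmetic. pollingBudget is a hypothetical helper, not the package's implementation, and it assumes that an interval below the documented minimum is raised to 1000 ms; the package may reject such values instead.

```typescript
// Hypothetical helper; the constants come from the parameter table above.
const MIN_POLLING_INTERVAL_MS = 1000;
const DEFAULT_POLLING_INTERVAL_MS = 1000;
const DEFAULT_POLLING_TIMEOUT_MS = 180000;

interface PollingParams {
  pollingIntervalMs?: number;
  pollingTimeoutMs?: number;
}

// Applies the documented defaults and estimates how many polls fit in the timeout.
function pollingBudget(params: PollingParams = {}) {
  // Assumption: values below the minimum are clamped rather than rejected.
  const intervalMs = Math.max(
    params.pollingIntervalMs ?? DEFAULT_POLLING_INTERVAL_MS,
    MIN_POLLING_INTERVAL_MS,
  );
  const timeoutMs = params.pollingTimeoutMs ?? DEFAULT_POLLING_TIMEOUT_MS;
  return { intervalMs, timeoutMs, maxPolls: Math.floor(timeoutMs / intervalMs) };
}
```

For long recordings, raising pollingTimeoutMs is the relevant knob; raising pollingIntervalMs only reduces how often the status is checked.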

SonioxLoaderOptions (optional)

Parameter                       Type                        Description
model                           SonioxTranscriptionModelId  Model to use (default: "stt-async-v3")
translation                     object                      Translation configuration
language_hints                  string[]                    Language hints for transcription
language_hints_strict           boolean                     Enforce strict language hints
enable_speaker_diarization      boolean                     Enable speaker identification
enable_language_identification  boolean                     Enable language detection
context                         object                      Context for improved accuracy

Browse the API reference for a full list of supported options.

Supported audio formats

  • aac - Advanced Audio Coding
  • aiff - Audio Interchange File Format
  • amr - Adaptive Multi-Rate
  • asf - Advanced Systems Format
  • flac - Free Lossless Audio Codec
  • mp3 - MPEG Audio Layer III
  • ogg - Ogg Vorbis
  • wav - Waveform Audio File Format
  • webm - WebM Audio
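When the audio comes from a file name or URL, the list above can be used to pick a format from the extension. The sketch below is one way to do that. guessAudioFormat and SupportedFormat are hypothetical names, and it is an assumption that the values accepted by audioFormat (SonioxAudioFormat) match these lowercase strings; check the package's type definitions before relying on it.

```typescript
// The format identifiers listed above; assumed (not confirmed) to match SonioxAudioFormat.
const SUPPORTED_FORMATS = [
  "aac", "aiff", "amr", "asf", "flac", "mp3", "ogg", "wav", "webm",
] as const;
type SupportedFormat = (typeof SUPPORTED_FORMATS)[number];

// Hypothetical helper: returns the supported format implied by the extension, if any.
function guessAudioFormat(fileNameOrUrl: string): SupportedFormat | undefined {
  // Drop any query string or fragment, then take the text after the last dot.
  const path = fileNameOrUrl.split(/[?#]/)[0] ?? "";
  const dot = path.lastIndexOf(".");
  if (dot === -1) return undefined;
  const ext = path.slice(dot + 1).toLowerCase();
  return SUPPORTED_FORMATS.find((format) => format === ext);
}
```

An undefined result means the extension is not in the list, in which case the audioFormat parameter can simply be omitted, since it is optional.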

Return value

The load() method returns an array containing a single Document object:

Document {
  pageContent: string, // The transcribed text
  metadata: SonioxTranscriptResponse // Full transcript with metadata
}

The metadata includes transcribed text, speaker information (if diarization enabled), language information (if identification enabled), translation data (if translation enabled), and timing information.
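Because load() returns an array with a single Document, downstream code usually wants just that one transcript. The sketch below shows a defensive way to extract it. DocumentLike and singleTranscript are hypothetical names; the interface only mirrors the two fields shown above, and the metadata is left untyped because the fields of SonioxTranscriptResponse are not listed here.

```typescript
// Minimal stand-in for the Document shape shown above; metadata is left opaque.
interface DocumentLike<M = Record<string, unknown>> {
  pageContent: string;
  metadata: M;
}

// Hypothetical helper: returns the trimmed transcript, failing loudly on an unexpected count.
function singleTranscript(docs: DocumentLike[]): string {
  const [doc] = docs;
  if (!doc || docs.length !== 1) {
    throw new Error(`Expected exactly one document, got ${docs.length}`);
  }
  return doc.pageContent.trim();
}
```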