Web library

How to use the Soniox Speech-to-Text Web Library to transcribe microphone audio in your web application.

Transcribe audio directly in your web application

Transcribing audio in a web application is a common use case — whether you're building live captioning, searchable audio interfaces, or voice-powered tools. To make this easy, Soniox provides a lightweight Web SDK that allows you to stream audio from the browser and receive real-time transcriptions with minimal setup.

The Soniox Web Library handles:

  • Capturing audio from the user's microphone
  • Streaming it to the Soniox WebSocket API
  • Receiving and displaying transcription results in real time
  • Optional features such as speaker diarization

The library is framework-agnostic and works with plain JavaScript, as well as modern frontend frameworks like React or Vue.


Installation

Install via your preferred package manager:

npm install @soniox/speech-to-text-web

Or use the module directly from a CDN:

<script type="module">
  import { RecordTranscribe } from 'https://unpkg.com/@soniox/speech-to-text-web?module';
 
  const recordTranscribe = new RecordTranscribe({ ... });
  ...
</script>

Starting the transcription

To transcribe microphone audio, create an instance of the RecordTranscribe class and call the start() method.

import { RecordTranscribe } from '@soniox/speech-to-text-web';
 
const recordTranscribe = new RecordTranscribe({
  apiKey: '<SONIOX_API_KEY|TEMPORARY_API_KEY>',
});
 
recordTranscribe.start({
  model: 'stt-rt-preview',
  onPartialResult: (result) => {
    console.log(result.words);
  },
  onError: (status, message) => {
    console.error(status, message);
  },
});
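
If you want to show the words on the page instead of logging them, you can update a DOM element from onPartialResult. The following is a minimal sketch: the <div id="transcript"> element is illustrative, and it assumes each entry in result.words has a text field; check the GitHub README for the exact shape of the result object.

// Illustrative sketch: render partial results into a <div id="transcript"> element.
// Assumes each item in result.words has a `text` field (verify against the README).
const transcriptEl = document.getElementById('transcript');
 
recordTranscribe.start({
  model: 'stt-rt-preview',
  onPartialResult: (result) => {
    transcriptEl.textContent = result.words.map((word) => word.text).join(' ');
  },
  onError: (status, message) => {
    console.error(status, message);
  },
});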

Parameters

apiKey (required, string | function)

A static SONIOX_API_KEY string, or an async function that returns a temporary API key.

model (required, string)

The transcription model to use. Example: "stt-rt-preview".
Use the GET /models endpoint to retrieve a list of available models.

languageHints (optional, Array<string>)

Hints to guide transcription toward specific languages.
See supported languages for a list of available ISO language codes.

context (optional, string)

Domain-specific terms or phrases to improve recognition accuracy.
Max length: 10,000 characters.

enableSpeakerDiarization (optional, boolean)

Enables automatic speaker separation.

onStarted (optional, function)

Called when transcription starts.

onFinished (optional, function)

Called when transcription finishes.

onPartialResult (optional, function)

Called when partial results are received.

onFinalResult (optional, function)

Called when a final result is received.

onError (optional, function)

Called when an error occurs.

stream (optional, MediaStream)

Provide a custom audio stream source.
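
For example, a start() call that combines several of the optional parameters might look like the sketch below. The language hint values and context string are illustrative, not required values.

recordTranscribe.start({
  model: 'stt-rt-preview',
  languageHints: ['en', 'es'],          // illustrative ISO language codes
  context: 'Soniox, diarization, WebSocket',
  enableSpeakerDiarization: true,
  onPartialResult: (result) => {
    console.log(result.words);
  },
  onError: (status, message) => {
    console.error(status, message);
  },
});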


Stopping the transcription

Call stop() to end the session gracefully and wait for the final results, or cancel() to abort immediately, e.g. when a component unmounts.

recordTranscribe.stop();   // Waits for final results
recordTranscribe.cancel(); // Stops immediately
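
In a component-based app, cancel() typically belongs in the cleanup path. Below is a minimal React sketch, assuming the instance is created inside the effect; it is one way to wire this up, not the only one.

import { useEffect } from 'react';
import { RecordTranscribe } from '@soniox/speech-to-text-web';
 
function useTranscription(apiKey) {
  useEffect(() => {
    const recordTranscribe = new RecordTranscribe({ apiKey });
 
    recordTranscribe.start({
      model: 'stt-rt-preview',
      onPartialResult: (result) => console.log(result.words),
    });
 
    // Abort immediately when the component unmounts.
    return () => recordTranscribe.cancel();
  }, [apiKey]);
}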

Using temporary API keys

You can defer API key generation until after the user initiates transcription:

const recordTranscribe = new RecordTranscribe({
  apiKey: async () => {
    const res = await fetch('/api/get-temporary-api-key', { method: 'POST' });
    const { apiKey } = await res.json();
    return apiKey;
  },
});

Audio captured while the API key is fetched and the WebSocket connection is established is buffered, so no audio is lost during setup.
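
The /api/get-temporary-api-key endpoint above is one you implement on your own server, so the long-lived key never reaches the browser. Below is a hypothetical Express sketch of that endpoint; createTemporaryApiKey() stands in for the Soniox API call that issues a short-lived key (see the Soniox API reference for the exact endpoint and payload).

// Hypothetical Express handler for the endpoint used in the snippet above.
// createTemporaryApiKey() is a placeholder for your own helper that calls the
// Soniox API with the long-lived SONIOX_API_KEY and returns a temporary key.
import express from 'express';
import { createTemporaryApiKey } from './soniox';
 
const app = express();
 
app.post('/api/get-temporary-api-key', async (req, res) => {
  const apiKey = await createTemporaryApiKey(process.env.SONIOX_API_KEY);
  res.json({ apiKey });
});
 
app.listen(3000);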


Event callbacks

Callbacks can be passed to either the constructor or the start method:

// Constructor-level
new RecordTranscribe({
  onPartialResult: (result) => console.log(result.words),
});
 
// Method-level
recordTranscribe.start({
  onPartialResult: (result) => console.log(result.words),
});

See the full list of supported callbacks in the GitHub README.
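
Callbacks are also a convenient place to drive UI state. The sketch below uses only the documented onStarted, onFinished, and onError callbacks to toggle a simple status indicator; the status element id is illustrative.

const statusEl = document.getElementById('status'); // illustrative element
 
recordTranscribe.start({
  model: 'stt-rt-preview',
  onStarted: () => { statusEl.textContent = 'Listening...'; },
  onFinished: () => { statusEl.textContent = 'Done'; },
  onError: (status, message) => { statusEl.textContent = `Error: ${message}`; },
});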


Transcribing custom audio streams

To transcribe audio from sources like an <audio> or <video> element:

const audioElement = new Audio('https://example.com/audio.mp3');
audioElement.crossOrigin = 'anonymous';
 
// Route the element's audio into a MediaStream that can be passed to start().
const audioCtx = new AudioContext();
const source = audioCtx.createMediaElementSource(audioElement);
const destination = audioCtx.createMediaStreamDestination();
 
source.connect(destination);           // feed the transcriber
source.connect(audioCtx.destination);  // keep playing through the speakers
 
recordTranscribe.start({
  model: 'stt-rt-preview',
  stream: destination.stream,
  onFinished: () => audioElement.pause(),
});
 
audioElement.play();

You are responsible for managing the audio stream lifecycle.
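
For example, once transcription finishes you will usually want to stop playback and release the AudioContext yourself. The sketch below extends the onFinished callback from the example above to do both; closing the context is a standard Web Audio cleanup step, not something the library does for you.

recordTranscribe.start({
  model: 'stt-rt-preview',
  stream: destination.stream,
  onFinished: () => {
    audioElement.pause();
    audioCtx.close(); // release audio processing resources when done
  },
});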

