Web library

Use the Soniox Speech-to-Text Web Library to transcribe microphone audio in your web application.

Transcribing audio in the browser is a common use case, so Soniox provides a web library that handles microphone recording and transcription for you. See the GitHub repository for the source code, examples, and more.

Quick start

Install the library by running:

npm install @soniox/speech-to-text-web

If you prefer not to install the package, you can use unpkg to include it directly in your HTML file:

<script type="module">
  import { RecordTranscribe } from 'https://unpkg.com/@soniox/speech-to-text-web?module';
 
  const recordTranscribe = new RecordTranscribe({ ... });
  ...
</script>

Initializing RecordTranscribe

To start transcribing audio from a microphone, create an instance of RecordTranscribe with an API key:

import { RecordTranscribe } from '@soniox/speech-to-text-web';
 
const recordTranscribe = new RecordTranscribe({
  apiKey: '<SONIOX_API_KEY>',
});

Do not expose your API key in client-side code in production. Instead, generate a temporary API key on your backend and use it to authenticate the WebSocket connection.

Starting the transcription

To start transcribing microphone audio:

recordTranscribe.start({
  model: 'stt-rt-preview',
 
  onError: (status, message) => {
    console.error(status, message);
  },
  onPartialResult: (result) => {
    console.log('Partial result:', result.words);
  },
});
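
As a small illustration of handling partial results, the sketch below collects the text of each result into a running transcript. It assumes each result exposes a text string, as the complete example on this page does. createTranscriptCollector is a hypothetical helper, not part of the library.

```javascript
// Hypothetical helper: accumulates result.text from successive partial results.
function createTranscriptCollector() {
  let transcript = '';
  return {
    // Pass this function as the onPartialResult callback.
    onPartialResult(result) {
      // Assumption: result.text holds the newly transcribed text, if any.
      transcript += result.text ?? '';
    },
    // Returns everything collected so far.
    getTranscript() {
      return transcript;
    },
    // Clears the transcript, e.g. before starting a new session.
    reset() {
      transcript = '';
    },
  };
}
```

The collector's onPartialResult can be passed directly in the options given to start(), and getTranscript() read whenever the UI needs to refresh.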

Stopping the transcription

To stop the transcription, call stop() or cancel().

recordTranscribe.stop();  // Waits for final results
// or
recordTranscribe.cancel();  // Stops immediately

Calling stop() ensures all final results are received before stopping, whereas cancel() stops transcription immediately without waiting for final results. We recommend using stop() when a user manually stops transcription and cancel() when an immediate stop is needed (e.g., when a component unmounts).
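
The recommendation above can be captured in one place so that calling code does not have to remember which method to use. This is a minimal sketch. endTranscription and the reason values 'user' and 'teardown' are hypothetical names chosen for illustration, and recorder is any object with stop() and cancel() methods, such as a RecordTranscribe instance.

```javascript
// Hypothetical helper: picks stop() or cancel() based on why transcription ends.
function endTranscription(recorder, reason) {
  if (reason === 'user') {
    // The user asked to stop: wait for the remaining final results.
    recorder.stop();
  } else {
    // Teardown (e.g. component unmount): end immediately.
    recorder.cancel();
  }
}
```

A stop button handler would call endTranscription(recordTranscribe, 'user'), while an unmount hook would call endTranscription(recordTranscribe, 'teardown').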

Buffering and temporary API keys

If you generate a temporary API key when the user clicks "Start Transcription," you may not receive the key immediately. The Soniox Speech-to-Text Web Library buffers recorded audio in memory until the WebSocket connection is established. This allows recording to start as soon as the user clicks the button without needing to generate a temporary API key in advance.
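
To picture the buffering idea, the sketch below queues audio chunks until a send function becomes available and then flushes them in order. It is an illustration of the concept only, not the library's actual implementation. ChunkBuffer, push, and attach are hypothetical names.

```javascript
// Illustrative only: queues audio chunks until a sender is attached.
class ChunkBuffer {
  constructor() {
    this.pending = []; // chunks recorded before the connection is ready
    this.send = null;  // set once the connection is open
  }

  // Called for every recorded chunk.
  push(chunk) {
    if (this.send) {
      this.send(chunk);
    } else {
      this.pending.push(chunk);
    }
  }

  // Called once the connection is open; flushes queued chunks in order.
  attach(send) {
    this.send = send;
    for (const chunk of this.pending) {
      send(chunk);
    }
    this.pending = [];
  }
}
```

The library does this internally, so no such code is needed in your application; the point is that recording can begin before the temporary API key has arrived.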

To achieve this, pass an ApiKeyGetter function to the RecordTranscribe constructor instead of a static API key string:

const recordTranscribe = new RecordTranscribe({
  apiKey: async () => {
    const response = await fetch('/api/get-temporary-api-key', { method: 'POST' });
    const { apiKey } = await response.json();
    return apiKey;
  },
});
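
The getter above assumes the request always succeeds. A slightly more defensive variant is sketched below: it checks the HTTP status and the shape of the response before returning the key. createApiKeyGetter and its fetchImpl parameter are hypothetical, and the { apiKey } response shape is whatever your own backend returns, as in the example above.

```javascript
// Hypothetical factory: builds an ApiKeyGetter that fails loudly on bad responses.
// fetchImpl defaults to the global fetch; a stub can be passed for testing.
function createApiKeyGetter(url, fetchImpl = globalThis.fetch) {
  return async () => {
    const response = await fetchImpl(url, { method: 'POST' });
    if (!response.ok) {
      throw new Error(`Temporary API key request failed: ${response.status}`);
    }
    // Assumption: the backend responds with JSON of the form { apiKey: "..." }.
    const { apiKey } = await response.json();
    if (typeof apiKey !== 'string' || apiKey === '') {
      throw new Error('Temporary API key response did not include apiKey');
    }
    return apiKey;
  };
}
```

The result can be passed as the apiKey option, for example apiKey: createApiKeyGetter('/api/get-temporary-api-key').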

Callbacks

You can provide callbacks to handle different events during transcription.

Callbacks can be passed to either the RecordTranscribe constructor or the start() method.

// Callbacks passed to the constructor
const recordTranscribe = new RecordTranscribe({
  ...
  onPartialResult: (result) => {
    console.log('Partial result:', result.words);
  },
  ...
});
 
// Or callbacks passed to the start() method
recordTranscribe.start({
  ...
  onPartialResult: (result) => {
    console.log('Partial result:', result.words);
  },
  ...
});

For a list of all available callbacks and their descriptions, see the documentation in the GitHub repository.
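
Because the same callbacks are accepted in both places, they can be defined once and reused. The sketch below builds a callbacks object that logs each event. createLoggingCallbacks and its log parameter are hypothetical; the callback names are the ones used elsewhere on this page.

```javascript
// Hypothetical helper: builds a callbacks object that logs each event.
// log defaults to console.log; any function with the same shape works.
function createLoggingCallbacks(log = console.log) {
  return {
    onStarted: () => log('Transcribe started'),
    onPartialResult: (result) => log('Partial result:', result.words),
    onFinished: () => log('Transcribe finished'),
    onError: (status, message) => log('Error occurred', status, message),
  };
}
```

The returned object can be spread into the constructor options or into the options passed to start(), for example recordTranscribe.start({ model: 'stt-rt-preview', ...createLoggingCallbacks() }).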

Custom audio streams

To transcribe audio from a custom source, you can pass a custom MediaStream to the stream option.

If you provide a custom MediaStream to the stream option, you are responsible for managing its lifecycle, including starting and stopping the stream. For instance, when using an HTML5 <audio> element (as shown below), you may want to pause playback when transcription is complete or an error occurs.

Example of transcribing audio from an HTML5 <audio> element:

// Create a new audio element
const audioElement = new Audio();
audioElement.volume = 1;
audioElement.crossOrigin = 'anonymous';
audioElement.src = 'https://soniox.com/media/examples/coffee_shop.mp3';
 
// Create a media stream from the audio element
const audioContext = new AudioContext();
const source = audioContext.createMediaElementSource(audioElement);
const destination = audioContext.createMediaStreamDestination();
source.connect(destination); // Connect to media stream
source.connect(audioContext.destination); // Connect to playback
 
// Start transcription
recordTranscribe.start({
  model: 'stt-rt-preview',
  stream: destination.stream,
 
  onFinished: () => {
    audioElement.pause();
  },
  onError: (status, message) => {
    audioElement.pause();
  },
});
 
// Play the audio element to activate the stream
audioElement.play();
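
Since the lifecycle of a custom stream is your responsibility, it can help to keep the cleanup in one function and reuse it for both callbacks. The sketch below pauses playback and also stops the stream's tracks. createStreamCleanup is a hypothetical helper; getTracks() and stop() are the standard MediaStream and MediaStreamTrack methods.

```javascript
// Hypothetical helper: returns callbacks that release a custom audio source.
function createStreamCleanup(audioElement, stream) {
  const cleanup = () => {
    // Stop playback of the source element.
    audioElement.pause();
    // Stop every track so the stream no longer produces audio.
    stream.getTracks().forEach((track) => track.stop());
  };
  // The same cleanup runs whether transcription finishes or fails.
  return { onFinished: cleanup, onError: cleanup };
}
```

With the example above, the two handlers could be replaced by spreading createStreamCleanup(audioElement, destination.stream) into the options passed to start().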

Complete example

Here's a complete example of transcribing microphone audio in a single HTML file.

<!doctype html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Soniox Speech-to-Text Example</title>
  </head>
  <body>
    <div>
      <button id="start-button">Start</button>
      <button id="stop-button">Stop</button>
      <button id="cancel-button">Cancel</button>
      <br />
      <span id="transcript"></span>
      <br />
    </div>
 
    <script type="module">
      import { RecordTranscribe } from 'https://unpkg.com/@soniox/speech-to-text-web?module';
 
      const recordTranscribe = new RecordTranscribe({
        // Enter your API key here
        apiKey: '<SONIOX_API_KEY>',
      });
 
      const transcript = document.getElementById('transcript');
 
      document.getElementById('start-button').onclick = () => {
        recordTranscribe?.cancel();
        transcript.textContent = '';
 
        recordTranscribe.start({
          model: 'stt-rt-preview',
 
          onStarted: () => {
            console.log('Transcribe started');
          },
          onPartialResult: (result) => {
            transcript.textContent += result.text;
          },
          onFinished: () => {
            console.log('Transcribe finished');
          },
          onError: (status, message) => {
            console.log('Error occurred', status, message);
          },
        });
      };
 
      document.getElementById('stop-button').onclick = function () {
        recordTranscribe?.stop();
      };
      document.getElementById('cancel-button').onclick = function () {
        recordTranscribe?.cancel();
      };
    </script>
  </body>
</html>

You can view all examples in the GitHub repository.
