Web SDK
Soniox speech-to-text-web is the official JavaScript/TypeScript SDK for using the Soniox Real-Time API directly in the browser.
Overview
The SDK lets you:
- Capture audio from the user's microphone
- Stream audio to Soniox in real time
- Receive transcription and translation results instantly
- Enable advanced features such as language identification, speaker diarization, context, endpoint detection, and more
👉 Use cases: live captions, multilingual meetings, dictation tools, accessibility overlays, customer support dashboards, education apps.
Installation
Install via your preferred package manager:
Or use the module directly from a CDN:
Quickstart
Use SonioxClient to start a session:
The SonioxClient object processes audio from the user's microphone or a custom audio stream. It returns results by invoking the onPartialResult callback with transcription and translation data, depending on the configuration.
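As a minimal sketch of this flow, the helper below takes an already constructed client and starts a session. Several things here are assumptions rather than confirmed SDK details: the helper name startLiveTranscript is hypothetical, the model name is supplied by the caller, and the token fields text and is_final are taken from the Speech-to-Text WebSocket API token format, so check the API reference before relying on them.

```javascript
// Sketch only: `client` is expected to behave like a SonioxClient instance.
// In a browser app it would be created from the SDK, for example
// `new SonioxClient({ apiKey: '<SONIOX_API_KEY>' })`.
function startLiveTranscript(client, model, onText) {
  let finalText = '';

  client.start({
    model, // real-time model name, supplied by the caller
    onPartialResult(result) {
      let pendingText = '';
      // Assumed token shape: { text: string, is_final: boolean }.
      for (const token of result.tokens) {
        if (token.is_final) {
          finalText += token.text; // assumed: final tokens arrive once
        } else {
          pendingText += token.text; // non-final tokens may still change
        }
      }
      // Report the stable text and the still-changing tail separately.
      onText(finalText, pendingText);
    },
  });
}
```

A UI would typically render the final text normally and the pending text in a lighter style, replacing the pending part on every callback.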
Stop or cancel transcription:
Translation
To enable real-time translation, pass a TranslationConfig object as the translation parameter of the start method.
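The two translation modes can be sketched as plain objects. The field names (type, target_language, language_a, language_b) come from the API Reference section of this page. The helper names, the placeholder model string, and the language codes es, en and de are illustrative assumptions.

```javascript
// One-way: translate the whole transcript into a single target language.
function oneWayTranslation(targetLanguage) {
  return { type: 'one_way', target_language: targetLanguage };
}

// Two-way: translate between two languages spoken in the same session.
function twoWayTranslation(languageA, languageB) {
  return { type: 'two_way', language_a: languageA, language_b: languageB };
}

// Merges a translation config into existing start() options.
function withTranslation(startOptions, translation) {
  return { ...startOptions, translation };
}

// Example: transcribe and translate everything into Spanish.
const spanishOptions = withTranslation(
  { model: '<real-time-model>' }, // placeholder, not a real model name
  oneWayTranslation('es'),
);
```

The resulting object would be passed to start() together with any callbacks.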
stop() vs cancel()
The key difference is that stop() gracefully waits for the server to process all buffered audio and send back final results. In contrast, cancel() terminates the session immediately without waiting.
For example, when a user clicks a "Stop Recording" button, you should call stop(). If you need to discard the session immediately (e.g., when a component unmounts in a web framework), call cancel().
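The distinction can be sketched as two handlers, one for the button and one for teardown. The function and handler names are hypothetical. Only stop() and cancel() come from the SDK as described here.

```javascript
// `client` is expected to expose stop() and cancel() as described above.
function createSessionControls(client) {
  return {
    // "Stop Recording" button: let the server return the final results.
    onStopClick() {
      client.stop();
    },
    // Component teardown: nobody will read the results, so drop the session.
    onUnmount() {
      client.cancel();
    },
  };
}
```

In a component framework, onUnmount would be called from the cleanup hook and onStopClick from the button's click handler.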
Buffering and temporary API keys
To avoid exposing your API key to the client, use temporary API keys. You can generate one with the temporary API key endpoint in the Soniox API.
If you want to fetch a temporary API key only when recording starts, you can pass a function to the apiKey option. The function will be called when the recording starts and should return the API key.
Until this function resolves and returns an API key, audio data is buffered in memory. When the temporary API key is fetched, the buffered audio data will be sent to the server and the processing will start.
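A function for the apiKey option might look like the sketch below. The backend route /api/temporary-api-key and the apiKey field in its JSON response are hypothetical. Your own server would call the Soniox temporary API key endpoint and decide the response shape.

```javascript
// Hypothetical backend route and response shape; adapt to your own server.
const TEMPORARY_KEY_URL = '/api/temporary-api-key';

// Returns an async function suitable for the apiKey option.
// `fetchImpl` defaults to the global fetch and is injectable for testing.
function createApiKeyProvider(fetchImpl = globalThis.fetch) {
  return async function getApiKey() {
    const response = await fetchImpl(TEMPORARY_KEY_URL, { method: 'POST' });
    if (!response.ok) {
      // A thrown error surfaces through onError as api_key_fetch_failed.
      throw new Error(`Temporary key request failed: ${response.status}`);
    }
    const body = await response.json();
    return body.apiKey; // assumed field name in the hypothetical response
  };
}
```

The returned function would be passed as the apiKey option of the SonioxClient constructor. Audio captured while it is pending stays in the in-memory buffer.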
For a full example with temporary API key generation, check the NextJS Example.
Custom audio streams
To transcribe audio from a custom source, you can pass a custom MediaStream to the stream option.
If you provide a custom MediaStream to the stream option, you are responsible for managing its lifecycle, including starting and stopping the stream. For instance, when using an HTML5 <audio> element (as shown below), you may want to pause playback when transcription is complete or an error occurs.
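One way to obtain a MediaStream from an audio element is to route it through the Web Audio API, sketched here under stated assumptions. The helper name transcribeAudioElement is hypothetical, and the code assumes that onFinished and onError can be passed to start() as described in the API Reference. The Web Audio calls (createMediaElementSource, createMediaStreamDestination) are standard browser APIs.

```javascript
// Routes an <audio> element through Web Audio to obtain a MediaStream, then
// hands that stream to the client. `audioContext` is a standard AudioContext.
function transcribeAudioElement(client, audioElement, audioContext, model) {
  const source = audioContext.createMediaElementSource(audioElement);
  const destination = audioContext.createMediaStreamDestination();
  source.connect(destination); // feed the stream used for transcription
  source.connect(audioContext.destination); // keep playback audible

  // When the file ends, ask for a graceful stop so final results arrive.
  audioElement.addEventListener('ended', () => client.stop());

  // The caller owns the stream lifecycle, so pause playback on either outcome.
  const pausePlayback = () => audioElement.pause();

  client.start({
    model,
    stream: destination.stream,
    onFinished: pausePlayback,
    onError: pausePlayback,
  });

  // play() returns a promise in modern browsers; the caller may await it.
  return audioElement.play();
}
```

Browsers usually require a user gesture before an AudioContext can produce sound, so this would normally run inside a click handler.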
Examples
Simple transcription example in vanilla JavaScript.
Transcription and translation example with temporary API key generation.
A complete example rendering speaker tags, detected languages, and translations.
API Reference
SonioxClient
constructor(options)
Creates a new SonioxClient instance.
Parameters:
- apiKey (string | function, required): Soniox API key, or an async function that returns the API key (see Buffering and temporary API keys).
- bufferQueueSize (number): Maximum number of audio chunks to buffer in memory before the WebSocket connection is established. If this limit is exceeded, an error is thrown.
- onStarted (function): Called when transcription starts, after the API key is fetched and the WebSocket connection is established.
- onFinished (function): Called when transcription finishes successfully. After calling stop(), wait for this callback to ensure all final results have been received.
- onPartialResult (function): Called when transcription returns partial results. The result contains a list of recognized tokens. To learn more about the token structure, see the Speech-to-Text WebSocket API reference.
- onStateChange (function): Called when the state of the transcription changes. Useful for re-rendering the UI based on the state.
- onError (function): Called when transcription encounters an error. Possible error statuses are:
  - get_user_media_failed: The user denied permission to use the microphone, or the browser does not support audio recording.
  - api_key_fetch_failed: You passed a function to the apiKey option and the function threw an error.
  - queue_limit_exceeded: The local queue filled up while waiting for the temporary API key to be fetched. You can increase the queue size with the bufferQueueSize option.
  - media_recorder_error: An error occurred while recording the audio.
  - api_error: The Soniox API returned an error. In this case, the errorCode property contains the HTTP status code equivalent to the error. For a list of possible error codes, see the Speech-to-Text WebSocket API reference.
  - websocket_error: A WebSocket error occurred.
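The error statuses above can be mapped to user-facing text, as in this sketch. The status strings are the ones listed for onError. The messages and the helper name describeError are illustrative. How the SDK passes the status and errorCode to the callback is not spelled out here, so the helper takes them as plain arguments.

```javascript
// Status strings come from the onError list above; the messages are examples.
const ERROR_MESSAGES = new Map([
  ['get_user_media_failed', 'Microphone access was denied or is not supported.'],
  ['api_key_fetch_failed', 'Could not obtain an API key.'],
  ['queue_limit_exceeded', 'Audio buffer overflowed before the connection opened.'],
  ['media_recorder_error', 'Recording failed.'],
  ['api_error', 'The Soniox API returned an error.'],
  ['websocket_error', 'The connection to the server failed.'],
]);

function describeError(status, errorCode) {
  const base = ERROR_MESSAGES.get(status) ?? `Unknown error status: ${status}`;
  // errorCode is documented for api_error only (an HTTP-style status code).
  return status === 'api_error' && errorCode !== undefined
    ? `${base} (code ${errorCode})`
    : base;
}
```

An onError callback could call describeError and show the returned string in the UI.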
start(audioOptions)
Starts transcription or translation.
All callbacks that can be passed to the SonioxClient constructor are also available in the start method. In addition, the following parameters are available:
- model (string, required): Real-time model to use. See models.
- audioFormat (string): Audio format to use. Using auto should be sufficient for microphone streams in all modern browsers. If using custom audio streams, see audio formats.
- numChannels (number): Required for raw audio formats. See audio formats.
- sampleRate (number): Required for raw audio formats. See audio formats.
- languageHints (array<string>): See language hints.
- context (string): See context.
- enableSpeakerDiarization (boolean): See speaker diarization.
- enableLanguageIdentification (boolean)
- enableEndpointDetection (boolean): See endpoint detection.
- clientReferenceId (string): Optional client-defined identifier to track this request.
- translation (object): Translation configuration. See real-time translation.
  One-way translation:
  - type (string, required): Must be set to one_way.
  - target_language (string, required): Language to translate the transcript into.
  Two-way translation:
  - type (string, required): Must be set to two_way.
  - language_a (string, required): First language for two-way translation.
  - language_b (string, required): Second language for two-way translation.
- stream (MediaStream): If you don't want to transcribe audio from the microphone, pass a MediaStream to the stream option. This is useful for transcribing audio from a file or a custom source.
- audioConstraints (object): Sets properties of the MediaTrackConstraints object, such as echoCancellation and noiseSuppression. See MDN docs for MediaTrackConstraints.
- mediaRecorderOptions (object): MediaRecorder options. See MDN docs for MediaRecorder.
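To show how these parameters fit together, here is a sketch of a helper that assembles start() options. The helper name buildStartOptions and its rawAudio argument are hypothetical. The check only mirrors the rule above that numChannels and sampleRate are required for raw audio formats. Which format names count as raw is left to the audio formats documentation.

```javascript
// Builds start() options from the parameters documented above.
// `rawAudio`, if given, describes a raw format and must be complete.
function buildStartOptions({ model, languageHints = [], rawAudio } = {}) {
  if (!model) {
    throw new Error('model is required');
  }
  const options = {
    model,
    languageHints,
    enableSpeakerDiarization: true,
    enableLanguageIdentification: true,
    enableEndpointDetection: true,
  };
  if (rawAudio) {
    const { audioFormat, numChannels, sampleRate } = rawAudio;
    if (!audioFormat || !numChannels || !sampleRate) {
      throw new Error('raw audio needs audioFormat, numChannels and sampleRate');
    }
    Object.assign(options, { audioFormat, numChannels, sampleRate });
  } else {
    options.audioFormat = 'auto'; // suitable for microphone streams
  }
  return options;
}
```

Callbacks and a translation object can be added to the returned options before they are passed to start().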
stop()
Gracefully stops transcription, waiting for the server to process all audio and return final results. For a detailed comparison, see the stop() vs cancel() section.
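Because final results may still arrive after stop() is called, it can help to wrap the session in a promise that settles on onFinished. The sketch below assumes callbacks can be passed to start(), as noted in the start section, and that the first onError argument is the error status. The helper names startSession and stopAndWait are hypothetical.

```javascript
// Starts a session and exposes a promise that settles when it ends.
function startSession(client, startOptions) {
  const finished = new Promise((resolve, reject) => {
    client.start({
      ...startOptions,
      onFinished: () => resolve(),
      // Assumption: the first onError argument is the error status.
      onError: (status) => reject(new Error(`Transcription failed: ${status}`)),
    });
  });
  // Avoid an unhandled rejection if nobody awaits the session;
  // stopAndWait() below still observes the failure.
  finished.catch(() => {});

  return {
    // Requests a graceful stop, then waits until final results are in.
    async stopAndWait() {
      client.stop();
      await finished;
    },
  };
}
```

A "Stop Recording" handler could await stopAndWait() before saving or displaying the complete transcript.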
cancel()
Immediately terminates the transcription and closes all resources without waiting for final results. For a detailed comparison, see the stop() vs cancel() section.
finalize()
Triggers manual finalization. See manual finalization.