Web SDK
Soniox speech-to-text-web is the official JavaScript/TypeScript SDK for using the Soniox Real-Time API directly in the browser.
Overview
The SDK lets you:
- Capture audio from the user's microphone
- Stream audio to Soniox in real time
- Receive transcription and translation results instantly
- Enable advanced features such as language identification, speaker diarization, context, endpoint detection, and more
👉 Use cases: live captions, multilingual meetings, dictation tools, accessibility overlays, customer support dashboards, education apps.
Installation
Install the package with your preferred package manager, or load the module directly from a CDN.
Quickstart
Use SonioxClient to start a session.
The SonioxClient object processes audio from the user's microphone or a custom audio stream. It returns results by invoking the onPartialResult callback with transcription and translation data, depending on the configuration.
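The flow described above can be sketched roughly as follows. This is a minimal sketch, not the SDK's own sample: StartableClient is a hypothetical structural stand-in for the parts of SonioxClient used here, and the token fields text and is_final are assumed from the Speech-to-Text Websocket API reference. It also assumes that each result carries newly finalized tokens once, plus the current non-final tokens.

```typescript
// Token shape assumed from the WebSocket API reference; only the
// two fields used in this sketch are declared.
interface Token {
  text: string;
  is_final: boolean;
}

interface PartialResult {
  tokens: Token[];
}

// Hypothetical stand-in for the parts of SonioxClient used below.
interface StartableClient {
  start(options: {
    model: string;
    onPartialResult?: (result: PartialResult) => void;
  }): void;
}

// Starts a session and reports the running transcript on every update.
// Final tokens are appended permanently; non-final tokens are rebuilt
// from scratch on each result because they may still change.
function startLiveTranscript(
  client: StartableClient,
  model: string,
  render: (finalText: string, pendingText: string) => void,
): void {
  let finalText = "";
  client.start({
    model,
    onPartialResult(result) {
      let pendingText = "";
      for (const token of result.tokens) {
        if (token.is_final) {
          finalText += token.text;
        } else {
          pendingText += token.text;
        }
      }
      render(finalText, pendingText);
    },
  });
}
```

In a browser, the client argument would be a SonioxClient instance and render would update the page, for example by showing the pending text in a lighter color than the final text.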
To end a session, call stop() or cancel(). The difference between the two is described in the stop() vs cancel() section.
Translation
To enable real-time translation, pass a TranslationConfig object as the translation parameter of the start method.
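As an illustration, the two configuration shapes listed in the API Reference can be modeled as a discriminated union. This is a sketch: the TranslationConfig type below is written out by hand from the documented field names rather than imported from the SDK, the helper names oneWay and twoWay are hypothetical, and the language codes in the usage note are example values.

```typescript
// Field names follow the translation parameters in the API Reference.
type TranslationConfig =
  | { type: "one_way"; target_language: string }
  | { type: "two_way"; language_a: string; language_b: string };

// Builds a one-way configuration with a single target language.
function oneWay(targetLanguage: string): TranslationConfig {
  return { type: "one_way", target_language: targetLanguage };
}

// Builds a two-way configuration for a pair of languages.
function twoWay(languageA: string, languageB: string): TranslationConfig {
  return { type: "two_way", language_a: languageA, language_b: languageB };
}

// Produces a short label, e.g. for showing the active mode in a UI.
function describeTranslation(config: TranslationConfig): string {
  if (config.type === "one_way") {
    return `one-way to ${config.target_language}`;
  }
  return `two-way between ${config.language_a} and ${config.language_b}`;
}
```

The resulting object would be passed as the translation parameter of start, for example translation: oneWay("es").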
stop() vs cancel()
The key difference is that stop() gracefully waits for the server to process all buffered audio and send back final results. In contrast, cancel() terminates the session immediately without waiting.
For example, when a user clicks a "Stop Recording" button, you should call stop(). If you need to discard the session immediately (e.g., when a component unmounts in a web framework), call cancel().
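One way to make the difference concrete is to wrap a session so that a graceful stop returns a promise that settles when onFinished fires, while cancel returns immediately. This is a sketch under assumptions: StoppableClient is a hypothetical structural stand-in for SonioxClient, openSession is a made-up helper name, and the (status, message) argument order of onError is assumed rather than taken from this page.

```typescript
// Hypothetical stand-in for the SonioxClient members used here.
interface StoppableClient {
  start(options: {
    model: string;
    onFinished?: () => void;
    onError?: (status: string, message: string) => void; // signature assumed
  }): void;
  stop(): void;
  cancel(): void;
}

interface Session {
  // Graceful: asks the server to finish, resolves after onFinished.
  stop(): Promise<void>;
  // Immediate: discards the session without waiting for final results.
  cancel(): void;
}

function openSession(client: StoppableClient, model: string): Session {
  // The promise executor runs synchronously, so start() is called right away.
  const finished = new Promise<void>((resolve, reject) => {
    client.start({
      model,
      onFinished: () => resolve(),
      onError: (status, message) => reject(new Error(`${status}: ${message}`)),
    });
  });
  return {
    stop() {
      client.stop();
      return finished;
    },
    cancel() {
      client.cancel();
    },
  };
}
```

A "Stop Recording" handler could then await session.stop() before reading the final transcript, while a component teardown hook would call session.cancel().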
Buffering and temporary API keys
If you want to avoid exposing your API key to the client, you can use temporary API keys. To generate a temporary API key, use the temporary API key endpoint in the Soniox API.
To fetch a temporary API key only when recording starts, pass a function to the apiKey option. The function is called when recording starts and should return the API key.
Until this function resolves and returns an API key, audio data is buffered in memory. When the temporary API key is fetched, the buffered audio data will be sent to the server and the processing will start.
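A function suitable for the apiKey option might look like the sketch below. It is illustrative only: the backend route and the apiKey field in its JSON response are hypothetical and depend on how your own server wraps the Soniox temporary key endpoint, and createApiKeyProvider is a made-up helper name. The fetch implementation is injected so the function can be exercised without a network.

```typescript
// JSON shape returned by the hypothetical backend route.
interface TemporaryKeyResponse {
  apiKey: string;
}

// Minimal subset of the Fetch API used here, so a test double can be injected.
type FetchLike = (
  url: string,
  init?: { method?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Returns an async function that requests a fresh temporary key each time
// it is called. Throwing here is expected to surface through onError with
// the api_key_fetch_failed status described in the API Reference.
function createApiKeyProvider(
  url: string,
  fetchImpl: FetchLike,
): () => Promise<string> {
  return async () => {
    const response = await fetchImpl(url, { method: "POST" });
    if (!response.ok) {
      throw new Error(`Temporary key request failed with status ${response.status}`);
    }
    const body = (await response.json()) as TemporaryKeyResponse;
    return body.apiKey;
  };
}
```

In a browser, the provider could be created with the global fetch and a route of your choosing, for example createApiKeyProvider("/api/temporary-key", fetch), and the returned function passed as the apiKey option.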
For a full example with temporary API key generation, check the NextJS Example.
Custom audio streams
To transcribe audio from a custom source, pass a custom MediaStream to the stream option.
If you provide a custom MediaStream to the stream option, you are responsible for managing its lifecycle, including starting and stopping the stream. For instance, when using an HTML5 <audio> element as the source, you may want to pause playback when transcription is complete or an error occurs.
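The lifecycle responsibility described above can be sketched as a small helper that pauses the source when the session ends. The interfaces are hypothetical structural stand-ins: in a browser, client would be a SonioxClient, player an HTMLAudioElement, and stream a MediaStream derived from that element. How the stream is obtained is left out of this sketch, and the onError argument order is an assumption.

```typescript
// Hypothetical stand-in for SonioxClient; S would be MediaStream in a browser.
interface StreamClient<S> {
  start(options: {
    model: string;
    stream: S;
    onFinished?: () => void;
    onError?: (status: string, message: string) => void; // signature assumed
  }): void;
}

// Anything that can be paused, such as an HTMLAudioElement.
interface Pausable {
  pause(): void;
}

// Starts transcription on a caller-owned stream and pauses the source
// once the session finishes or fails, since the caller owns the stream.
function transcribePlayback<S>(
  client: StreamClient<S>,
  model: string,
  player: Pausable,
  stream: S,
): void {
  const release = () => player.pause();
  client.start({ model, stream, onFinished: release, onError: release });
}
```

Keeping the cleanup in one release function means the success and error paths cannot drift apart as the helper grows.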
Examples
Simple transcription example in vanilla JavaScript.
Transcription and translation example with temporary API key generation.
A complete example rendering speaker tags, detected languages, and translations.
API Reference
SonioxClient
constructor(options)
Creates a new SonioxClient instance.
Parameters:
- apiKey (required, string | function): Soniox API key or an async function that returns the API key (see Buffering and temporary API keys).
- bufferQueueSize (number): Maximum number of audio chunks to buffer in memory before the WebSocket connection is established. If this limit is exceeded, an error will be thrown.
- onStarted (function): Called when the transcription starts. This happens after the API key is fetched and the WebSocket connection is established.
- onFinished (function): Called when the transcription finishes successfully. After calling stop(), you should wait for this callback to ensure all final results have been received.
- onPartialResult (function): Called when the transcription returns partial results. The result contains a list of recognized tokens. To learn more about the tokens structure, see Speech-to-Text Websocket API reference.
- onStateChange (function): Called when the state of the transcription changes. Useful for rerendering the UI based on the state.
- onError (function): Called when the transcription encounters an error. Possible error statuses are:
  - get_user_media_failed: The user denied permission to use the microphone, or the browser does not support audio recording.
  - api_key_fetch_failed: A function was passed to the apiKey option and that function threw an error.
  - queue_limit_exceeded: The local queue filled up while waiting for the temporary API key to be fetched. You can increase the queue size by setting the bufferQueueSize option.
  - media_recorder_error: An error occurred while recording the audio.
  - api_error: Error returned by the Soniox API. In this case, the errorCode property contains the HTTP status code equivalent to the error. For a list of possible error codes, see Speech-to-Text Websocket API reference.
  - websocket_error: WebSocket error.
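The statuses listed above lend themselves to a lookup table that turns them into user-facing messages. The sketch below is illustrative: the message wording and the describeError helper are made up, and the (status, message, errorCode) parameter list is an assumption about how an onError handler might receive these values.

```typescript
// Statuses as listed in the onError documentation.
type ErrorStatus =
  | "get_user_media_failed"
  | "api_key_fetch_failed"
  | "queue_limit_exceeded"
  | "media_recorder_error"
  | "api_error"
  | "websocket_error";

// Illustrative user-facing text for each status.
const ERROR_HINTS: Record<ErrorStatus, string> = {
  get_user_media_failed: "Microphone access was denied or is not supported.",
  api_key_fetch_failed: "Could not obtain an API key.",
  queue_limit_exceeded: "Audio buffer overflowed while waiting for the API key.",
  media_recorder_error: "Recording failed.",
  api_error: "The Soniox API returned an error.",
  websocket_error: "The connection to the server failed.",
};

// Combines the hint with the raw message and, for api_error, the code.
function describeError(
  status: ErrorStatus,
  message: string,
  errorCode?: number,
): string {
  const code = status === "api_error" && errorCode !== undefined ? ` (code ${errorCode})` : "";
  return `${ERROR_HINTS[status]}${code} ${message}`.trim();
}
```

Using a Record keyed by the status union means the compiler flags the table if a status is added to the type without a matching message.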
start(audioOptions)
Starts transcription or translation.
All callbacks that can be passed to the SonioxClient constructor are also available in the start method. In addition, the following parameters are available:
- model (required, string): Real-time model to use. See models.
- audioFormat (string): Audio format to use. Using auto should be sufficient for microphone streams in all modern browsers. If using custom audio streams, see audio formats.
- numChannels (number): Required for raw audio formats. See audio formats.
- sampleRate (number): Required for raw audio formats. See audio formats.
- languageHints (array<string>): See language hints.
- context (string): See context.
- enableSpeakerDiarization (boolean): See speaker diarization.
- enableLanguageIdentification (boolean)
- enableEndpointDetection (boolean): See endpoint detection.
- clientReferenceId (string): Optional client-defined identifier to track this request.
- translation (object): Translation configuration. See real-time translation.
  - One-way translation
    - type (required, string): Must be set to one_way.
    - target_language (required, string): Language to translate the transcript into.
  - Two-way translation
    - type (required, string): Must be set to two_way.
    - language_a (required, string): First language for two-way translation.
    - language_b (required, string): Second language for two-way translation.
- stream (MediaStream): If you don't want to transcribe audio from the microphone, you can pass a MediaStream to the stream option. This can be useful if you want to transcribe audio from a file or a custom source.
- audioConstraints (object): Sets properties of the MediaTrackConstraints object, such as echoCancellation and noiseSuppression. See MDN docs for MediaTrackConstraints.
- mediaRecorderOptions (object): MediaRecorder options. See MDN docs for MediaRecorder.
stop()
Gracefully stops transcription, waiting for the server to process all audio and return final results. For a detailed comparison, see the stop() vs cancel() section.
cancel()
Immediately terminates the transcription and closes all resources without waiting for final results. For a detailed comparison, see the stop() vs cancel() section.
finalize()
Triggers manual finalization. See manual finalization.