WebSocket API
Learn how to use and integrate the Soniox Speech-to-Text WebSocket API.
Overview
The Soniox WebSocket API provides real-time transcription and translation of live audio with ultra-low latency. It supports advanced features like speaker diarization, context customization, and manual finalization, all over a persistent WebSocket connection. It is ideal for live scenarios such as meetings, broadcasts, multilingual communication, and voice interfaces.
WebSocket endpoint
Connect to the API using:
Configuration
Before streaming audio, configure the transcription session by sending a JSON message such as:
Parameters
- api_key (string, required): Your Soniox API key. Create API keys in the Soniox Console. For client apps, generate a temporary API key from your server to keep secrets secure.
- audio_format (string, required): Audio format of the stream. See audio formats.
- num_channels (number): Required for raw audio formats. See audio formats.
- sample_rate (number): Required for raw audio formats. See audio formats.
- language_hints (array<string>): See language hints.
- language_hints_strict (boolean)
- context (object): See context.
- enable_speaker_diarization (boolean): See speaker diarization.
- enable_language_identification (boolean)
- enable_endpoint_detection (boolean): See endpoint detection.
- max_endpoint_delay_ms (number): Must be between 500 and 3000. Default value is 2000. See endpoint detection.
- client_reference_id (string): Optional identifier to track this request (client-defined).
- translation (object)
One-way translation
- type (string, required): Must be set to one_way.
- target_language (string, required): Language to translate the transcript into.
Two-way translation
- type (string, required): Must be set to two_way.
- language_a (string, required): First language for two-way translation.
- language_b (string, required): Second language for two-way translation.
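As a rough illustration of the parameter rules above, the following Python sketch builds the configuration message and checks the constraints this page states. The helper name build_config and the RAW_FORMATS set are hypothetical; the two raw format names are taken from the examples in the error message list, and treating the 500 to 3000 range as inclusive is an assumption.

```python
import json

# Assumption: "s16le" and "f32le" are the raw formats named as examples in the
# error message list; the authoritative set is on the audio formats page.
RAW_FORMATS = {"s16le", "f32le"}


def build_config(api_key, audio_format, *, num_channels=None, sample_rate=None,
                 max_endpoint_delay_ms=None, translation=None, **extra):
    """Return the JSON text of the initial configuration message."""
    config = {"api_key": api_key, "audio_format": audio_format}

    # Raw audio formats require both num_channels and sample_rate.
    if audio_format in RAW_FORMATS and (num_channels is None or sample_rate is None):
        raise ValueError("raw audio formats require num_channels and sample_rate")
    if num_channels is not None:
        config["num_channels"] = num_channels
    if sample_rate is not None:
        config["sample_rate"] = sample_rate

    # Assumption: the documented range 500..3000 is inclusive.
    if max_endpoint_delay_ms is not None:
        if not 500 <= max_endpoint_delay_ms <= 3000:
            raise ValueError("max_endpoint_delay_ms must be between 500 and 3000")
        config["max_endpoint_delay_ms"] = max_endpoint_delay_ms

    if translation is not None:
        required = {"one_way": ["target_language"],
                    "two_way": ["language_a", "language_b"]}.get(translation.get("type"))
        if required is None:
            raise ValueError("translation.type must be one_way or two_way")
        missing = [key for key in required if key not in translation]
        if missing:
            raise ValueError(f"translation is missing {missing}")
        config["translation"] = translation

    # Remaining options (language_hints, enable_speaker_diarization, ...) pass through unchanged.
    config.update(extra)
    return json.dumps(config)
```

The returned string would be sent as the first text message on the connection, before any audio.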
Audio streaming
After configuration, start streaming audio:
- Send audio as binary WebSocket frames.
- Each stream supports up to 300 minutes of audio.
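A minimal sketch of preparing audio for streaming, assuming raw 16-bit PCM input: it splits a byte buffer into chunks, one per binary frame, and estimates duration so a client can check the 300-minute limit before sending. The function names and the 3840-byte chunk size (120 ms at 16 kHz mono, 16-bit) are illustrative choices, not values prescribed by the API.

```python
MAX_STREAM_MINUTES = 300  # per-stream limit stated above


def pcm_duration_seconds(num_bytes, sample_rate, num_channels, bytes_per_sample=2):
    """Duration of raw PCM audio; bytes_per_sample=2 assumes 16-bit samples."""
    return num_bytes / (sample_rate * num_channels * bytes_per_sample)


def iter_frames(audio, frame_bytes=3840):
    """Yield successive slices of `audio`, each meant for one binary WebSocket frame."""
    for start in range(0, len(audio), frame_bytes):
        yield audio[start:start + frame_bytes]


def within_stream_limit(num_bytes, sample_rate, num_channels):
    """True if the audio fits within the documented per-stream limit."""
    seconds = pcm_duration_seconds(num_bytes, sample_rate, num_channels)
    return seconds <= MAX_STREAM_MINUTES * 60
```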
Ending the stream
To gracefully close a streaming session:
- Send an empty WebSocket frame (binary or text).
- The server will return one or more responses, including a finished response, and then close the connection.
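The send side of the steps above can be sketched as follows. Here ws is a hypothetical connection object: any client whose send method transmits a bytes value as a binary frame would fit. Empty chunks are skipped during streaming because an empty frame is the end-of-stream signal.

```python
def stream_and_finish(ws, frames):
    """Send each chunk as a binary frame, then signal the end of the stream.

    Returns the number of audio bytes sent.
    """
    sent = 0
    for frame in frames:
        if frame:  # an empty frame would end the stream early, so skip it here
            ws.send(frame)
            sent += len(frame)
    ws.send(b"")  # empty binary frame: asks the server to finish the session
    return sent
```

After this call the client should keep reading responses until the server closes the connection.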
Response
Soniox returns responses in JSON format. A typical successful response looks like:
Field descriptions
- tokens (array<object>): List of processed tokens (words or subwords).
Each token may include:
  - text (string): Token text.
  - start_ms (number, optional): Start timestamp of the token (in milliseconds). Not included if translation_status is translation.
  - end_ms (number, optional): End timestamp of the token (in milliseconds). Not included if translation_status is translation.
  - confidence (number): Confidence score (0.0–1.0).
  - is_final (boolean): Whether the token is finalized.
  - speaker (string, optional): Speaker label (if diarization enabled).
  - translation_status (string, optional)
  - language (string, optional): Language of the token.text.
  - source_language (string, optional)
- final_audio_proc_ms (number): Audio processed into final tokens.
- total_audio_proc_ms (number): Audio processed into final + non-final tokens.
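One way a client might turn these responses into a running transcript is sketched below. It relies on two assumptions that this page does not state: each final token is delivered once, while non-final tokens are replaced by the next response, and token text already carries its own spacing. The Transcript class is a hypothetical helper.

```python
import json


class Transcript:
    """Accumulates final tokens and keeps the latest non-final tokens separately."""

    def __init__(self):
        self.final_text = ""
        self.pending_text = ""

    def update(self, message):
        """Apply one JSON response and return the current best transcript."""
        response = json.loads(message)
        # Assumption: non-final tokens from earlier responses are superseded.
        self.pending_text = ""
        for token in response.get("tokens", []):
            if token.get("is_final"):
                self.final_text += token["text"]
            else:
                self.pending_text += token["text"]
        return self.final_text + self.pending_text
```

Keeping the two strings apart lets a user interface render finalized text normally and provisional text in a lighter style.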
Finished response
At the end of a stream, Soniox sends a final message to indicate the session is complete:
After this, the server closes the WebSocket connection.
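A small sketch of a receive loop that stops at the finished response. The check on a boolean field named finished is an assumption, since the example message is not reproduced on this page; adjust the condition to match the actual payload. The messages argument is any iterable of JSON text messages, such as those read from the connection.

```python
import json


def read_until_finished(messages):
    """Collect parsed responses until the finished response arrives.

    Returns (responses, finished), where finished is False if the input
    ended without a finished response.
    """
    responses = []
    for raw in messages:
        response = json.loads(raw)
        responses.append(response)
        if response.get("finished"):  # assumption: field name not shown on this page
            return responses, True
    return responses, False
```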
Error response
If an error occurs, the server returns an error message and immediately closes the connection:
- error_code (number): Standard HTTP status code.
- error_message (string): A description of the error encountered.
Full list of possible error codes and messages:
The request is malformed or contains invalid parameters.
- Audio data channels must be specified for PCM formats
- Audio data sample rate must be specified for PCM formats
- Audio decode error
- Audio is too long.
- Client reference ID is too long (max length 256)
- Context is too long (max length 10000).
- Control request invalid type.
- Control request is malformed.
- Invalid audio data format: avi
- Invalid base64.
- Invalid language hint.
- Invalid model specified.
- Invalid translation target language.
- Language hints must be unique.
- Missing audio format. Specify a valid audio format (e.g. s16le, f32le, wav, ogg, flac...) or "auto" for auto format detection.
- Model does not support translations.
- No audio received.
- Prompt too long for model
- Received too much audio data in total.
- Start request is malformed.
- Start request must be a text message.
Authentication is missing or incorrect. Ensure a valid API key is provided before retrying.
- Invalid API key.
- Invalid/expired temporary API key.
- Missing API key.
The organization's balance or monthly usage limit has been reached. Additional credits are required before making further requests.
- Organization balance exhausted. Please either add funds manually or enable autopay.
- Organization monthly budget exhausted. Please increase it.
- Project monthly budget exhausted. Please increase it.
The client did not send a start message or sufficient audio data within the required timeframe. The connection was closed due to inactivity.
- Audio data decode timeout
- Input too slow
- Request timeout.
- Start request timeout
- Timed out while waiting for the first audio chunk
A usage or rate limit has been exceeded. You may retry after a delay or request an increase in limits via the Soniox Console.
- Rate limit for your organization has been exceeded.
- Rate limit for your project has been exceeded.
- Your organization has exceeded max number of concurrent requests.
- Your project has exceeded max number of concurrent requests.
An unexpected server-side error occurred. The request may be retried.
The server had an error processing your request. Sorry about that! You can retry your request, or contact us through our support email support@soniox.com if you keep seeing this error.
Cannot continue request or accept new requests.
Cannot continue request (code N). Please restart the request. Refer to: https://soniox.com/url/cannot-continue-request
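Error handling on the client could look roughly like the sketch below, which detects an error response by the presence of error_code and marks some codes as retryable. The SonioxError class is hypothetical, and the retryable set is an assumption based on standard HTTP semantics (429, 500, 503) matched to the categories above that mention retrying or restarting; it is not a mapping confirmed by this page.

```python
import json

# Assumption: codes chosen from standard HTTP semantics (429 Too Many Requests,
# 500 Internal Server Error, 503 Service Unavailable), not from a documented mapping.
RETRYABLE_CODES = {429, 500, 503}


class SonioxError(Exception):
    """Hypothetical exception wrapping an error response."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.retryable = code in RETRYABLE_CODES


def raise_for_error(raw):
    """Parse one JSON response; raise SonioxError if it is an error, else return it."""
    response = json.loads(raw)
    if "error_code" in response:
        raise SonioxError(response["error_code"], response.get("error_message", ""))
    return response
```

Because the server closes the connection after an error, a retry means opening a new connection and sending the configuration message again.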