WebSocket API
Learn how to use and integrate the Soniox Speech-to-Text WebSocket API.
Overview
The Soniox WebSocket API provides real-time transcription and translation of live audio with ultra-low latency. It supports advanced features such as speaker diarization, context customization, and manual finalization, all over a persistent WebSocket connection. This makes it well suited to live scenarios such as meetings, broadcasts, multilingual communication, and voice interfaces.
WebSocket endpoint
Connect to the API using:
Configuration
Before streaming audio, configure the transcription session by sending a JSON message such as:
Parameters
- api_key (string, required): Your Soniox API key. Create API keys in the Soniox Console. For client apps, generate a temporary API key on your server so your permanent key never reaches the client.
- audio_format (string, required): Audio format of the stream. See audio formats.
- num_channels (number): Number of audio channels. Required for raw audio formats. See audio formats.
- sample_rate (number): Sample rate of the audio in Hz. Required for raw audio formats. See audio formats.
- language_hints (array<string>): See language hints.
- language_hints_strict (boolean)
- context (object): See context.
- enable_speaker_diarization (boolean): See speaker diarization.
- enable_language_identification (boolean)
- enable_endpoint_detection (boolean): See endpoint detection.
- max_endpoint_delay_ms (number): Must be between 500 and 3000 ms. Defaults to 2000 ms. See endpoint detection.
- client_reference_id (string): Optional client-defined identifier recorded with this request in usage logs. Does not need to be unique; the maximum length is 256 characters. Ignored if the request authenticates with a temporary API key.
- translation (object):
  One-way translation:
  - type (string, required): Must be set to one_way.
  - target_language (string, required): Language to translate the transcript into.
  Two-way translation:
  - type (string, required): Must be set to two_way.
  - language_a (string, required): First language for two-way translation.
  - language_b (string, required): Second language for two-way translation.
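The following Python sketch assembles the configuration message from the parameters above and applies the constraints this page states: raw formats need num_channels and sample_rate, max_endpoint_delay_ms must be 500 to 3000, client_reference_id is at most 256 characters, language hints must be unique, and two-way languages must differ. The function names and the RAW_PCM_FORMATS set are assumptions (only s16le and f32le are named on this page). Fields not modeled here, such as context, language_hints_strict, or the model selection that the error list refers to, can be passed through the hypothetical extra argument.

```python
import json
from typing import Any, Dict, List, Optional

# Assumption: the two raw PCM formats named on this page; the audio formats
# page is the authoritative list.
RAW_PCM_FORMATS = {"s16le", "f32le"}


def validate_translation(translation: Dict[str, Any]) -> None:
    """Check a translation object against the rules stated on this page."""
    kind = translation.get("type")
    if kind == "one_way":
        if not translation.get("target_language"):
            raise ValueError("one_way translation requires target_language")
    elif kind == "two_way":
        a, b = translation.get("language_a"), translation.get("language_b")
        if not a or not b:
            raise ValueError("two_way translation requires language_a and language_b")
        if a == b:
            raise ValueError(f"language_a and language_b must be different (both are {a})")
    else:
        raise ValueError("translation.type must be one_way or two_way")


def build_start_request(
    api_key: str,
    audio_format: str,
    *,
    num_channels: Optional[int] = None,
    sample_rate: Optional[int] = None,
    language_hints: Optional[List[str]] = None,
    enable_speaker_diarization: Optional[bool] = None,
    enable_endpoint_detection: Optional[bool] = None,
    max_endpoint_delay_ms: Optional[int] = None,
    client_reference_id: Optional[str] = None,
    translation: Optional[Dict[str, Any]] = None,
    extra: Optional[Dict[str, Any]] = None,
) -> str:
    """Return the JSON text to send as the first (text) frame of a session."""
    if not api_key or not audio_format:
        raise ValueError("api_key and audio_format are required")
    if audio_format in RAW_PCM_FORMATS and (num_channels is None or sample_rate is None):
        raise ValueError("num_channels and sample_rate are required for raw audio formats")
    if language_hints is not None and len(set(language_hints)) != len(language_hints):
        raise ValueError("language hints must be unique")
    if max_endpoint_delay_ms is not None and not 500 <= max_endpoint_delay_ms <= 3000:
        raise ValueError("max_endpoint_delay_ms must be between 500 and 3000")
    if client_reference_id is not None and len(client_reference_id) > 256:
        raise ValueError("client_reference_id must be at most 256 characters")
    if translation is not None:
        validate_translation(translation)

    config = {
        "api_key": api_key,
        "audio_format": audio_format,
        "num_channels": num_channels,
        "sample_rate": sample_rate,
        "language_hints": language_hints,
        "enable_speaker_diarization": enable_speaker_diarization,
        "enable_endpoint_detection": enable_endpoint_detection,
        "max_endpoint_delay_ms": max_endpoint_delay_ms,
        "client_reference_id": client_reference_id,
        "translation": translation,
    }
    # Omit unset fields so the server applies its own defaults.
    config = {key: value for key, value in config.items() if value is not None}
    config.update(extra or {})
    return json.dumps(config)
```

Validating locally catches, before any connection is opened, several mistakes that the server would otherwise reject with the invalid_request errors listed further down this page.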
Audio streaming
After configuration, start streaming audio:
- Send audio as binary WebSocket frames.
- Each stream supports up to 300 minutes of audio.
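A minimal Python sketch of the streaming side, assuming the audio is already in memory as bytes: it splits the audio into chunks for binary frames, shows the base64 text-frame alternative that the error catalog mentions, and checks raw PCM audio against the 300-minute limit. The function names and the 4096-byte chunk size are arbitrary choices for illustration.

```python
import base64
from typing import Iterator

MAX_STREAM_MINUTES = 300  # per-stream limit stated above


def iter_audio_frames(audio: bytes, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield successive chunks of audio, one per binary WebSocket frame.

    Empty input yields nothing, so this never produces an empty frame
    (an empty frame would end the stream).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]


def to_base64_text_frame(chunk: bytes) -> str:
    """Alternative framing: standard base64 text in a text frame."""
    return base64.b64encode(chunk).decode("ascii")


def raw_pcm_minutes(num_bytes: int, sample_rate: int, num_channels: int,
                    bytes_per_sample: int) -> float:
    """Duration of headerless PCM audio, in minutes."""
    bytes_per_second = sample_rate * num_channels * bytes_per_sample
    return num_bytes / bytes_per_second / 60


def fits_in_one_stream(num_bytes: int, sample_rate: int, num_channels: int,
                       bytes_per_sample: int = 2) -> bool:
    """True if the audio stays within the limit (default: 2-byte s16le samples)."""
    minutes = raw_pcm_minutes(num_bytes, sample_rate, num_channels, bytes_per_sample)
    return minutes <= MAX_STREAM_MINUTES
```

For 16 kHz mono s16le audio, one second is 16,000 × 1 × 2 = 32,000 bytes, so the 300-minute (18,000-second) limit corresponds to 576,000,000 bytes of audio.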
Ending the stream
To gracefully close a streaming session:
- Send an empty WebSocket frame (binary or text).
- The server will return one or more responses, including the finished response, and then close the connection.
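To tie configuration, streaming, and the graceful close together, this asyncio sketch sends the configuration as a text frame, the audio as binary frames, and finally an empty binary frame, while concurrently collecting every JSON response until the server closes the connection. It assumes a connection object that behaves like one from the third-party websockets package (sending a str emits a text frame, sending bytes emits a binary frame, and async iteration yields incoming messages until a normal close). run_session is a hypothetical name, and opening the connection is left out because the endpoint is not reproduced here.

```python
import asyncio
import json
from typing import Any, Iterable, List


async def run_session(ws: Any, start_request: str, frames: Iterable[bytes]) -> List[dict]:
    """Send one configured stream and collect every JSON response.

    Assumption: `ws` behaves like a connection from the third-party
    `websockets` package.
    """

    async def sender() -> None:
        await ws.send(start_request)  # configuration: a single text frame
        for frame in frames:
            if frame:                 # skip empty chunks: an empty frame ends the stream
                await ws.send(frame)  # audio: binary frames
        await ws.send(b"")            # empty frame: graceful end of stream

    # Send and receive concurrently so responses are read while audio goes out.
    send_task = asyncio.create_task(sender())
    responses: List[dict] = []
    try:
        async for message in ws:      # ends when the server closes the connection
            responses.append(json.loads(message))
    finally:
        # If the server closed early (for example after an error response), the
        # sender may still be running or may have failed; cancel it and collect
        # its outcome without raising.
        send_task.cancel()
        await asyncio.gather(send_task, return_exceptions=True)
    return responses
```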
Response
Soniox returns responses in JSON format. A typical successful response looks like:
Field descriptions
- tokens (array<object>): List of processed tokens (words or subwords). Each token may include:
  - text (string): Token text.
  - start_ms (number, optional): Start timestamp of the token in milliseconds. Not included if translation_status is translation.
  - end_ms (number, optional): End timestamp of the token in milliseconds. Not included if translation_status is translation.
  - confidence (number): Confidence score (0.0–1.0).
  - is_final (boolean): Whether the token is finalized.
  - speaker (string, optional): Speaker label; present when speaker diarization is enabled.
  - translation_status (string, optional)
  - language (string, optional): Language of the token's text.
  - source_language (string, optional)
- final_audio_proc_ms (number): Audio processed into final tokens, in milliseconds.
- total_audio_proc_ms (number): Audio processed into final and non-final tokens, in milliseconds.
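A common way to consume these fields is to keep finalized tokens permanently and treat non-final tokens as provisional. The sketch below assumes that each response carries the complete current set of non-final tokens, so the previous set is replaced rather than appended, and that token text already includes any needed spacing, so tokens are concatenated as-is. Tokens whose translation_status is translation are kept apart from the transcript. TranscriptBuffer is a hypothetical helper, not part of any Soniox SDK.

```python
from typing import Any, Dict, List


class TranscriptBuffer:
    """Hypothetical helper: accumulate final tokens, replace non-final ones."""

    def __init__(self) -> None:
        self.final: List[Dict[str, Any]] = []
        self.non_final: List[Dict[str, Any]] = []

    def update(self, response: Dict[str, Any]) -> None:
        tokens = response.get("tokens", [])
        # Final tokens are appended once and never revisited.
        self.final.extend(t for t in tokens if t.get("is_final"))
        # Assumption: each response carries the full current set of non-final
        # tokens, so the previous provisional tokens are discarded.
        self.non_final = [t for t in tokens if not t.get("is_final")]

    def text(self, translated: bool = False) -> str:
        """Join tokens as-is (assumes token text carries its own spacing).

        translated=False returns every token not marked as a translation;
        translated=True returns only tokens whose translation_status is
        "translation".
        """
        parts = []
        for token in self.final + self.non_final:
            is_translation = token.get("translation_status") == "translation"
            if is_translation == translated:
                parts.append(token["text"])
        return "".join(parts)
```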
Finished response
At the end of a stream, Soniox sends a final message to indicate the session is complete:
After this, the server closes the WebSocket connection.
Error response
If an error occurs, the server returns an error message and immediately closes the connection:
- error_code (number): Standard HTTP status code of the error.
- error_type (string): Stable, machine-readable identifier of the error. Branch on this, not on error_message. See the Errors reference for the full catalog and recovery steps.
- error_message (string): Human-readable description of the error.
- more_info (string): Link to the section on the Errors page describing this error_type.
- request_id (string): Unique identifier of this request. Include it when contacting support@soniox.com; server logs are keyed on it.
For the full catalog of error_type values across all Soniox APIs, see the Errors reference.
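Because error_type is the stable field to branch on, a client can turn error responses into an exception as soon as they arrive. In this sketch, parse_response and SonioxError are hypothetical names, and a message is treated as an error when it carries error_type or error_code, which is an assumption about how to tell error responses from token responses.

```python
import json
from typing import Any, Dict, Optional


class SonioxError(Exception):
    """Hypothetical exception carrying the error response fields."""

    def __init__(self, payload: Dict[str, Any]) -> None:
        self.error_code: Optional[int] = payload.get("error_code")
        self.error_type: Optional[str] = payload.get("error_type")
        self.error_message: str = payload.get("error_message", "")
        self.more_info: Optional[str] = payload.get("more_info")
        self.request_id: Optional[str] = payload.get("request_id")
        # Keep request_id in the text so it ends up in logs and support tickets.
        super().__init__(
            f"{self.error_code} {self.error_type}: {self.error_message} "
            f"(request_id={self.request_id})"
        )


def parse_response(raw: str) -> Dict[str, Any]:
    """Decode one server message, raising SonioxError for error responses."""
    message = json.loads(raw)
    # Assumption: only error responses carry error_type or error_code.
    if "error_type" in message or "error_code" in message:
        raise SonioxError(message)
    return message
```

A caller can then catch SonioxError, branch on its error_type attribute, and pass request_id along when contacting support.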
Full list of possible error codes and messages:
The request is malformed or contains invalid parameters.
error_type is one of invalid_request or model_not_available.
Possible messages:
- Audio data channels must be specified for PCM formats
- Audio data sample rate must be specified for PCM formats
- Audio decode error
- Audio is too long.
- Audio frame is not valid base64. Send audio as either a binary WebSocket frame, or a text frame containing standard base64-encoded bytes.
- `client_reference_id` is N characters, which exceeds the maximum allowed length of 256.
- Context is too long (max length 10000).
- Control request body is not valid JSON.
- Control request type is invalid. Valid values: "finalize", "keepalive".
- Field max_non_final_tokens_duration_ms cannot be less than N.
- Field max_non_final_tokens_duration_ms cannot be more than N.
- Field translation.exclude_source_languages is not supported by model X.
- Field translation.source_languages cannot be empty.
- Field translation.source_languages for model X should be empty or it can be one element list having string '*'.
- Invalid audio data format: avi
- Invalid language hint.
- Invalid language in translation.exclude_source_languages: X.
- Invalid language in translation.language_a.
- Invalid language in translation.language_b.
- Invalid language in translation.source_languages: X.
- Invalid language in translation.target_language.
- Language hints must be unique.
- Languages in translation.exclude_source_languages must be unique.
- Languages in translation.source_languages must be unique.
- Missing audio format. Specify a valid audio format (e.g. s16le, f32le, wav, ogg, flac...) or "auto" for auto format detection.
- Model does not support language_hints_strict.
- Model does not support max_endpoint_delay_ms.
- Model does not support one way translation.
- Model does not support two way translation.
- No audio received.
- Prompt too long for model
- Received too much audio data in total.
- Specified model X does not support real-time transcription. If you wish to use real-time transcription, specify model Y.
- Start request is malformed.
- Start request must be a text message.
- The requested model is not available. See https://soniox.com/docs/stt/models for the list of supported models.
- translation.language_a and translation.language_b must be different (both are X).
- translation.language_a must be present if translation.language_b is present.
- translation.language_b must be present if translation.language_a is present.
- Two way translation between translation.language_a=X and translation.language_b=Y is not supported.
Authentication is missing or incorrect. Ensure a valid API key is provided before retrying.
error_type: unauthenticated.
Possible messages:
- Incorrect API key provided. You can get an API key at https://console.soniox.com
- Invalid or expired temporary API key. Create a new temporary API key and retry. See https://soniox.com/docs/guides/temporary-api-keys for details.
- Missing API key. Provide API key as a header (i.e. Authorization: Bearer <SONIOX_API_KEY>). You can get an API key at https://console.soniox.com
- The temporary API key cannot be used for this action. Each temporary API key is scoped to a specific `usage_type`; create a new key with the correct usage type.
The organization's balance or monthly usage limit has been reached.
Additional credits are required before making further requests.
error_type is one of organization_balance_exhausted, organization_monthly_budget_exhausted, or project_monthly_budget_exhausted.
Possible messages:
- Organization balance exhausted. Please either add funds manually or enable autopay.
- Organization monthly budget exhausted. Please increase it.
- Project monthly budget exhausted. Please increase it.
The temporary API key in use was created with a max_session_duration_seconds cap, and that duration has elapsed for the current session.
error_type: temp_api_key_session_expired.
Possible messages:
- Temporary API key session duration limit exceeded. Create a new temporary API key to start a new session.
A backend call exceeded its deadline before completing. Retry the request.
error_type: request_timeout.
Possible messages:
- Audio data decode timeout
- Input too slow
- Request timeout.
- Start request timeout
- Timed out while waiting for the first audio chunk
A usage or rate limit has been exceeded. You may retry after a delay or request an increase in limits via the Soniox Console.
error_type: limit_exceeded.
Possible messages:
- Concurrent requests limit for real-time transcription has been exceeded for your organization.
- Concurrent requests limit for real-time transcription has been exceeded for your project.
- Requests per minute limit for real-time transcription has been exceeded for your organization.
- Requests per minute limit for real-time transcription has been exceeded for your project.
An unexpected server-side error occurred. The request may be retried.
error_type: internal_error.
Possible messages:
- The server had an error processing your request. Sorry about that! You can retry your request, or contact us through our support email support@soniox.com if you keep seeing this error.
The service cannot accept the request right now (upstream overload, cache exhausted, shutdown). Retry with backoff. The number N in "(code N)" in the message identifies the sub-cause for support triage.
error_type: service_unavailable.
Possible messages:
- Cannot continue request (code N). Please restart the request. Refer to: https://soniox.com/url/cannot-continue-request
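The catalog above implies a simple recovery policy: request_timeout, limit_exceeded, internal_error, and service_unavailable may be retried, while the other error types need a client-side change first (fixing the request or credentials, adding funds, raising a budget, or creating a new temporary API key). The sketch below encodes that mapping. The action labels are hypothetical, and exponential backoff with full jitter capped at 30 seconds is an assumed policy, since the catalog only says to retry after a delay or with backoff.

```python
import random
from typing import Optional

# Recovery action per error_type, following the catalog above.
# The action labels themselves are hypothetical.
RECOVERY = {
    "invalid_request": "fix_request",
    "model_not_available": "fix_request",
    "unauthenticated": "fix_credentials",
    "organization_balance_exhausted": "add_credits",
    "organization_monthly_budget_exhausted": "raise_budget",
    "project_monthly_budget_exhausted": "raise_budget",
    "temp_api_key_session_expired": "new_temp_key",
    "request_timeout": "retry",
    "limit_exceeded": "retry",
    "internal_error": "retry",
    "service_unavailable": "retry",
}


def recovery_action(error_type: str) -> str:
    """Map an error_type to an action; unknown types are treated as final."""
    return RECOVERY.get(error_type, "give_up")


def retry_delay_s(error_type: str, attempt: int, *, base_s: float = 1.0,
                  cap_s: float = 30.0, jitter: bool = True) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (0-based), or None if not retryable.

    Assumed policy: exponential backoff capped at cap_s, with optional full
    jitter (a uniform draw between 0 and the capped delay).
    """
    if recovery_action(error_type) != "retry":
        return None
    delay = min(cap_s, base_s * 2 ** attempt)
    return random.uniform(0.0, delay) if jitter else delay
```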