WebSocket API

Learn how to use and integrate the Soniox Speech-to-Text WebSocket API.

Overview

The Soniox WebSocket API provides real-time transcription and translation of live audio with ultra-low latency. It supports speaker diarization, context customization, and manual finalization, all over a persistent WebSocket connection. It is suited to live scenarios such as meetings, broadcasts, multilingual communication, and voice interfaces.


WebSocket endpoint

Connect to the API using:

wss://stt-rt.soniox.com/transcribe-websocket

Configuration

Before streaming audio, configure the transcription session by sending a JSON message such as:

{
  "api_key": "<SONIOX_API_KEY|SONIOX_TEMPORARY_API_KEY>",
  "model": "stt-rt-preview",
  "audio_format": "auto",
  "language_hints": ["en", "es"],
  "context": {
    "general": [
      { "key": "domain", "value": "Healthcare" },
      { "key": "topic", "value": "Diabetes management consultation" },
      { "key": "doctor", "value": "Dr. Martha Smith" },
      { "key": "patient", "value": "Mr. David Miller" },
      { "key": "organization", "value": "St John's Hospital" }
    ],
    "text": "Mr. David Miller visited his healthcare provider last month for a routine follow-up related to diabetes care. The clinician reviewed his recent test results, noted improved glucose levels, and adjusted his medication schedule accordingly. They also discussed meal planning strategies and scheduled the next check-up for early spring.",
    "terms": [
      "Celebrex",
      "Zyrtec",
      "Xanax",
      "Prilosec",
      "Amoxicillin Clavulanate Potassium"
    ],
    "translation_terms": [
      { "source": "Mr. Smith", "target": "Sr. Smith" },
      { "source": "St John's", "target": "St John's" },
      { "source": "stroke", "target": "ictus" }
    ]
  },
  "enable_speaker_diarization": true,
  "enable_language_identification": true,
  "translation": {
    "type": "two_way",
    "language_a": "en",
    "language_b": "es"
  }
}
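As a minimal sketch, the configuration message can be assembled in Python and serialized to JSON before it is sent as the first text frame. The helper name `build_start_message` and its defaults are illustrative assumptions, not part of the Soniox API; only the field names come from the example above.

```python
import json


def build_start_message(api_key, model="stt-rt-preview", audio_format="auto", **options):
    """Return the JSON text of the first message sent on the WebSocket."""
    config = {"api_key": api_key, "model": model, "audio_format": audio_format}
    # Drop options left as None so unset optional fields are not sent at all.
    config.update({k: v for k, v in options.items() if v is not None})
    return json.dumps(config)


message = build_start_message(
    "<SONIOX_API_KEY>",
    language_hints=["en", "es"],
    enable_speaker_diarization=True,
    translation={"type": "two_way", "language_a": "en", "language_b": "es"},
)
```

The resulting string should go out as a text frame rather than a binary one, since the error list on this page includes "Start request must be a text message."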

Parameters

api_key (required, string)

Your Soniox API key. Create API keys in the Soniox Console. For client apps, generate a temporary API key from your server to keep secrets secure.

model (required, string)

Real-time model to use. See models.

Example: "stt-rt-preview"

audio_format (required, string)

Audio format of the stream. See audio formats.

num_channels (number)

Required for raw audio formats. See audio formats.

sample_rate (number)

Required for raw audio formats. See audio formats.

language_hints (array<string>)

See language hints.

language_hints_strict (boolean)

See language restrictions.

context (object)

See context.

enable_speaker_diarization (boolean)

See speaker diarization.

enable_language_identification (boolean)

See language identification.

enable_endpoint_detection (boolean)

See endpoint detection.

max_endpoint_delay_ms (number)

Must be between 500 and 3000. Default value is 2000. See endpoint detection.

client_reference_id (string)

Optional client-defined identifier recorded with this request in usage logs. Does not need to be unique. Ignored if the request authenticates with a temporary API key.

translation (object)

See real-time translation.

One-way translation

type (required, string)

Must be set to one_way.

target_language (required, string)

Language to translate the transcript into.

Two-way translation

type (required, string)

Must be set to two_way.

language_a (required, string)

First language for two-way translation.

language_b (required, string)

Second language for two-way translation.
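The constraints listed above can be checked on the client before connecting, which avoids a round trip that would end in an invalid_request error. The sketch below is an assumption-laden illustration: `validate_start_config` is a hypothetical helper, the set of raw format names (`s16le`, `f32le`) is borrowed from an error message on this page and should be confirmed against the audio formats documentation, and the 500 to 3000 range is treated as inclusive.

```python
def validate_start_config(config, raw_formats=frozenset({"s16le", "f32le"})):
    """Return a list of problems found in a start-request dict (empty if none)."""
    problems = []
    for key in ("api_key", "model", "audio_format"):
        if not config.get(key):
            problems.append(f"missing required field: {key}")
    # Raw formats carry no header, so channels and sample rate must be given.
    if config.get("audio_format") in raw_formats:
        for key in ("num_channels", "sample_rate"):
            if key not in config:
                problems.append(f"{key} is required for raw audio formats")
    delay = config.get("max_endpoint_delay_ms")
    if delay is not None and not 500 <= delay <= 3000:
        problems.append("max_endpoint_delay_ms must be between 500 and 3000")
    translation = config.get("translation")
    if translation is not None:
        kind = translation.get("type")
        if kind == "one_way":
            required = ("target_language",)
        elif kind == "two_way":
            required = ("language_a", "language_b")
        else:
            required = ()
            problems.append("translation.type must be one_way or two_way")
        for key in required:
            if not translation.get(key):
                problems.append(f"translation.{key} is required for {kind}")
        a, b = translation.get("language_a"), translation.get("language_b")
        if kind == "two_way" and a and a == b:
            problems.append("translation.language_a and translation.language_b must be different")
    return problems
```

The server remains the authority on validity (for example, which languages a model supports), so a check like this only catches the structural mistakes described in this section.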


Audio streaming

After configuration, start streaming audio:

  • Send audio as binary WebSocket frames.
  • Each stream supports up to 300 minutes of audio.
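One way to prepare audio for streaming is to slice it into fixed-size chunks, each sent as one binary frame. The sketch below assumes audio already held in memory; the chunk size of 3840 bytes is an arbitrary choice (120 ms of 16 kHz, mono, 16-bit PCM), and the helper names are hypothetical. The duration helper applies only to raw PCM, where length follows directly from the byte count.

```python
def iter_audio_chunks(audio: bytes, chunk_size: int = 3840):
    """Yield successive byte slices suitable for sending as binary frames."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(audio), chunk_size):
        yield audio[offset:offset + chunk_size]


MAX_STREAM_MINUTES = 300  # per-stream limit stated above


def raw_pcm_duration_minutes(num_bytes, sample_rate, num_channels, bytes_per_sample=2):
    """Duration of raw PCM audio in minutes, to compare with the stream limit."""
    return num_bytes / (sample_rate * num_channels * bytes_per_sample) / 60
```

Comparing `raw_pcm_duration_minutes(...)` with `MAX_STREAM_MINUTES` before or during streaming indicates when a new connection will be needed.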

Ending the stream

To gracefully close a streaming session:

  • Send an empty WebSocket frame (binary or text).
  • The server returns one or more responses, including a finished response, and then closes the connection.
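The shutdown sequence above can be sketched as a small helper. It assumes `ws` is any client object exposing blocking `send(data)` and `recv()` methods that return text frames as strings; the function name `finish_stream` is hypothetical and not part of any Soniox SDK.

```python
import json


def finish_stream(ws):
    """Send the empty end-of-stream frame and collect responses until finished.

    `ws` is any object with blocking send(data) and recv() -> str methods.
    """
    ws.send(b"")  # an empty frame tells the server no more audio is coming
    responses = []
    while True:
        response = json.loads(ws.recv())
        responses.append(response)
        # Stop on the finished marker or on an error; the server closes after either.
        if response.get("finished") or "error_code" in response:
            return responses
```

Reading until the finished marker matters because the remaining responses may still carry tokens for audio that was buffered when the empty frame was sent.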

Response

Soniox returns responses in JSON format. A typical successful response looks like:

{
  "tokens": [
    {
      "text": "Hello",
      "start_ms": 600,
      "end_ms": 760,
      "confidence": 0.97,
      "is_final": true,
      "speaker": "1"
    }
  ],
  "final_audio_proc_ms": 760,
  "total_audio_proc_ms": 880
}

Field descriptions

tokens (array<object>)

List of processed tokens (words or subwords).

Each token may include:

text (string)

Token text.

start_ms (optional, number)

Start timestamp of the token (in milliseconds). Not included if translation_status is translation.

end_ms (optional, number)

End timestamp of the token (in milliseconds). Not included if translation_status is translation.

confidence (number)

Confidence score (0.0 to 1.0).

is_final (boolean)

Whether the token is finalized.

speaker (optional, string)

Speaker label (if diarization is enabled).

translation_status (optional, string)

See real-time translation.

language (optional, string)

Language of the token's text.

source_language (optional, string)

See real-time translation.

final_audio_proc_ms (number)

Duration of audio (in milliseconds) processed into final tokens.

total_audio_proc_ms (number)

Duration of audio (in milliseconds) processed into final and non-final tokens.
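To turn a sequence of responses into a running transcript, a client can keep final tokens permanently and treat non-final tokens as provisional. The sketch below rests on two assumptions not stated on this page: each final token is delivered once, and token texts carry their own spacing so they can be joined directly. `TranscriptBuffer` is a hypothetical name.

```python
class TranscriptBuffer:
    """Accumulates final tokens and keeps only the latest non-final tokens."""

    def __init__(self):
        self.final_tokens = []
        self.non_final_tokens = []

    def update(self, response):
        # Non-final tokens are provisional, so replace them on every response.
        self.non_final_tokens = []
        for token in response.get("tokens", []):
            if token.get("is_final"):
                self.final_tokens.append(token)
            else:
                self.non_final_tokens.append(token)

    def text(self):
        """Current best transcript: committed text followed by provisional text."""
        tokens = self.final_tokens + self.non_final_tokens
        return "".join(t["text"] for t in tokens)
```

Calling `update` with each parsed response and then `text()` gives a display string that stabilizes as tokens become final.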


Finished response

At the end of a stream, Soniox sends a final message to indicate the session is complete:

{
  "tokens": [],
  "final_audio_proc_ms": 1560,
  "total_audio_proc_ms": 1680,
  "finished": true
}

After this, the server closes the WebSocket connection.


Error response

If an error occurs, the server returns an error message and immediately closes the connection:

{
  "tokens": [],
  "error_code": 503,
  "error_type": "service_unavailable",
  "error_message": "Cannot continue request (code 11). Please restart the request. Refer to: https://soniox.com/url/cannot-continue-request (request ID 3d37a3bd-5078-47ee-a369-b204e3bbedda)",
  "more_info": "https://soniox.com/docs/api-reference/errors#service-unavailable",
  "request_id": "3d37a3bd-5078-47ee-a369-b204e3bbedda"
}

error_code (number)

Standard HTTP status code of the error.

error_type (string)

Stable, machine-readable identifier of the error. Branch on this, not on error_message. See the Errors reference for the full catalog and recovery steps.

error_message (string)

Human-readable description of the error.

more_info (string)

Link to the section on the Errors page describing this error_type.

request_id (string)

Unique identifier of this request. Include it when contacting support@soniox.com; server logs are keyed on it.

For the full catalog of error_type values across all Soniox APIs, see the Errors reference.
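A client can separate the three kinds of server message (tokens, finished, error) in one place. The following is a minimal sketch; `SonioxError` and `parse_response` are hypothetical names chosen for illustration, and only the JSON field names come from this page.

```python
import json


class SonioxError(Exception):
    """Raised for an error response; carries the documented error fields."""

    def __init__(self, response):
        super().__init__(response.get("error_message", "unknown error"))
        self.code = response.get("error_code")
        self.type = response.get("error_type")  # branch on this, not the message
        self.request_id = response.get("request_id")  # include in support requests


def parse_response(raw):
    """Decode one server message; raise on errors, flag the finished marker."""
    response = json.loads(raw)
    if "error_code" in response or "error_type" in response:
        raise SonioxError(response)
    return response, bool(response.get("finished"))
```

Keeping `request_id` on the exception makes it easy to log the identifier that support asks for.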

Full list of possible error codes and messages:

The request is malformed or contains invalid parameters. error_type is one of invalid_request or model_not_available.

  • Audio data channels must be specified for PCM formats
  • Audio data sample rate must be specified for PCM formats
  • Audio decode error
  • Audio is too long.
  • Audio frame is not valid base64. Send audio as either a binary WebSocket frame, or a text frame containing standard base64-encoded bytes.
  • `client_reference_id` is N characters, which exceeds the maximum allowed length of 256.
  • Context is too long (max length 10000).
  • Control request body is not valid JSON.
  • Control request type is invalid. Valid values: "finalize", "keepalive".
  • Field max_non_final_tokens_duration_ms cannot be less than N.
  • Field max_non_final_tokens_duration_ms cannot be more than N.
  • Field translation.exclude_source_languages is not supported by model X.
  • Field translation.source_languages cannot be empty.
  • Field translation.source_languages for model X should be empty or it can be one element list having string '*'.
  • Invalid audio data format: avi
  • Invalid language hint.
  • Invalid language in translation.exclude_source_languages: X.
  • Invalid language in translation.language_a.
  • Invalid language in translation.language_b.
  • Invalid language in translation.source_languages: X.
  • Invalid language in translation.target_language.
  • Language hints must be unique.
  • Languages in translation.exclude_source_languages must be unique.
  • Languages in translation.source_languages must be unique.
  • Missing audio format. Specify a valid audio format (e.g. s16le, f32le, wav, ogg, flac...) or "auto" for auto format detection.
  • Model does not support language_hints_strict.
  • Model does not support max_endpoint_delay_ms.
  • Model does not support one way translation.
  • Model does not support two way translation.
  • No audio received.
  • Prompt too long for model
  • Received too much audio data in total.
  • Specified model X does not support real-time transcription. If you wish to use real-time transcription, specify model Y.
  • Start request is malformed.
  • Start request must be a text message.
  • The requested model is not available. See https://soniox.com/docs/stt/models for the list of supported models.
  • translation.language_a and translation.language_b must be different (both are X).
  • translation.language_a must be present if translation.language_b is present.
  • translation.language_b must be present if translation.language_a is present.
  • Two way translation between translation.language_a=X and translation.language_b=Y is not supported.

Authentication is missing or incorrect. Ensure a valid API key is provided before retrying. error_type: unauthenticated.

  • Incorrect API key provided. You can get an API key at https://console.soniox.com
  • Invalid or expired temporary API key. Create a new temporary API key and retry. See https://soniox.com/docs/guides/temporary-api-keys for details.
  • Missing API key. Provide API key as a header (i.e. Authorization: Bearer <SONIOX_API_KEY>). You can get an API key at https://console.soniox.com
  • The temporary API key cannot be used for this action. Each temporary API key is scoped to a specific `usage_type`; create a new key with the correct usage type.

The organization's balance or monthly usage limit has been reached. Additional credits are required before making further requests. error_type is one of organization_balance_exhausted, organization_monthly_budget_exhausted, or project_monthly_budget_exhausted.

  • Organization balance exhausted. Please either add funds manually or enable autopay.
  • Organization monthly budget exhausted. Please increase it.
  • Project monthly budget exhausted. Please increase it.

The temporary API key in use was created with a max_session_duration_seconds cap, and that duration has elapsed for the current session. error_type: temp_api_key_session_expired.

  • Temporary API key session duration limit exceeded. Create a new temporary API key to start a new session.

A backend call exceeded its deadline before completing. Retry the request. error_type: request_timeout.

  • Audio data decode timeout
  • Input too slow
  • Request timeout.
  • Start request timeout
  • Timed out while waiting for the first audio chunk

A usage or rate limit has been exceeded. You may retry after a delay or request an increase in limits via the Soniox Console. error_type: limit_exceeded.

  • Concurrent requests limit for real-time transcription has been exceeded for your organization.
  • Concurrent requests limit for real-time transcription has been exceeded for your project.
  • Requests per minute limit for real-time transcription has been exceeded for your organization.
  • Requests per minute limit for real-time transcription has been exceeded for your project.

An unexpected server-side error occurred. The request may be retried. error_type: internal_error.

  • The server had an error processing your request. Sorry about that! You can retry your request, or contact us through our support email support@soniox.com if you keep seeing this error.

The service cannot accept the request right now (upstream overload, cache exhausted, shutdown). Retry with backoff. The numeric (code N) in the message identifies the sub-cause for support triage. error_type: service_unavailable.

  • Cannot continue request (code N). Please restart the request. Refer to: https://soniox.com/url/cannot-continue-request
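The category descriptions above suggest which errors are worth retrying. The sketch below encodes that reading as a lookup on error_type plus an exponential backoff with jitter. The grouping is an interpretation of this page rather than an official policy, and the base and cap values are arbitrary assumptions.

```python
import random

# Error types whose descriptions above say the request may be retried.
RETRYABLE_ERROR_TYPES = {
    "request_timeout",
    "limit_exceeded",
    "internal_error",
    "service_unavailable",
}


def is_retryable(error_type):
    """True if a new request with the same parameters may succeed later."""
    return error_type in RETRYABLE_ERROR_TYPES


def backoff_delay(attempt, base=0.5, cap=30.0, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based), with full jitter."""
    return rng() * min(cap, base * (2 ** attempt))
```

Errors such as unauthenticated, invalid_request, or temp_api_key_session_expired need a changed request or a new key, so retrying them unchanged would not help.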