
WebSocket API

Learn how to use and integrate the Soniox Speech-to-Text WebSocket API.

Overview

The Soniox WebSocket API provides real-time transcription and translation of live audio with ultra-low latency. It supports advanced features such as speaker diarization, context customization, and manual finalization, all over a persistent WebSocket connection. It is well suited to live scenarios including meetings, broadcasts, multilingual communication, and voice interfaces.


WebSocket endpoint

Connect to the API using:

wss://stt-rt.soniox.com/transcribe-websocket

Configuration

Before streaming audio, configure the transcription session by sending a JSON configuration message as a text WebSocket frame, for example:

{
  "api_key": "<SONIOX_API_KEY|SONIOX_TEMPORARY_API_KEY>",
  "model": "stt-rt-preview",
  "audio_format": "auto",
  "language_hints": ["en", "es"],
  "context": {
    "general": [
      { "key": "domain", "value": "Healthcare" },
      { "key": "topic", "value": "Diabetes management consultation" },
      { "key": "doctor", "value": "Dr. Martha Smith" },
      { "key": "patient", "value": "Mr. David Miller" },
      { "key": "organization", "value": "St John's Hospital" }
    ],
    "text": "Mr. David Miller visited his healthcare provider last month for a routine follow-up related to diabetes care. The clinician reviewed his recent test results, noted improved glucose levels, and adjusted his medication schedule accordingly. They also discussed meal planning strategies and scheduled the next check-up for early spring.",
    "terms": [
      "Celebrex",
      "Zyrtec",
      "Xanax",
      "Prilosec",
      "Amoxicillin Clavulanate Potassium"
    ],
    "translation_terms": [
      { "source": "Mr. Smith", "target": "Sr. Smith" },
      { "source": "St John's", "target": "St John's" },
      { "source": "stroke", "target": "ictus" }
    ]
  },
  "enable_speaker_diarization": true,
  "enable_language_identification": true,
  "translation": {
    "type": "two_way",
    "language_a": "en",
    "language_b": "es"
  }
}
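As a minimal sketch using only the Python standard library, the configuration can be assembled as a dictionary and serialized with `json.dumps`. Because `json.dumps` returns a string, a WebSocket client sends the result as a text frame. The function `build_config_message` and its keyword arguments are hypothetical conveniences and are not part of any Soniox SDK. Only the JSON field names come from this page.

```python
import json
from typing import Any, Dict, List, Optional


def build_config_message(
    api_key: str,
    model: str = "stt-rt-preview",
    audio_format: str = "auto",
    language_hints: Optional[List[str]] = None,
    enable_speaker_diarization: bool = False,
    translation: Optional[Dict[str, str]] = None,
) -> str:
    """Return the JSON text of the configuration (start) message."""
    config: Dict[str, Any] = {
        # Required fields.
        "api_key": api_key,
        "model": model,
        "audio_format": audio_format,
    }
    # Optional fields are included only when set.
    if language_hints:
        config["language_hints"] = language_hints
    if enable_speaker_diarization:
        config["enable_speaker_diarization"] = True
    if translation is not None:
        config["translation"] = translation
    # json.dumps returns a str, so WebSocket clients send it as a text frame.
    return json.dumps(config)


# Usage: a two-way English/Spanish session similar to the example above.
message = build_config_message(
    "<SONIOX_TEMPORARY_API_KEY>",
    language_hints=["en", "es"],
    enable_speaker_diarization=True,
    translation={"type": "two_way", "language_a": "en", "language_b": "es"},
)
```

Omitting unset optional fields keeps the message small and leaves the server defaults in effect.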

Parameters

api_key (string, required)

Your Soniox API key. Create API keys in the Soniox Console. For client-side apps, generate a temporary API key on your server so that your permanent key is never exposed to the client.

model (string, required)

Real-time model to use. See models.

Example: "stt-rt-preview"

audio_format (string, required)

Audio format of the stream. See audio formats.

num_channels (number)

Required for raw audio formats. See audio formats.

sample_rate (number)

Required for raw audio formats. See audio formats.

language_hints (array<string>)

See language hints.

language_hints_strict (boolean)

See language restrictions.

context (object)

See context.

enable_speaker_diarization (boolean)

See speaker diarization.

enable_language_identification (boolean)

See language identification.

enable_endpoint_detection (boolean)

See endpoint detection.

max_endpoint_delay_ms (number)

Must be between 500 and 3000. Default value is 2000. See endpoint detection.

client_reference_id (string)

Optional client-defined identifier for tracking this request (max length 256).

translation (object)

See real-time translation.

One-way translation

type (string, required)

Must be set to one_way.

target_language (string, required)

Language to translate the transcript into.

Two-way translation

type (string, required)

Must be set to two_way.

language_a (string, required)

First language for two-way translation.

language_b (string, required)

Second language for two-way translation.
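Checking the constraints above on the client before connecting avoids a round trip that ends in an error response. The following is a sketch of a hypothetical helper, `validate_config`, built on these assumptions:

  • The bounds for `max_endpoint_delay_ms` are inclusive.
  • `s16le` and `f32le`, the raw formats named in the error messages, are the only raw formats. The audio formats page is the authoritative list.

The uniqueness and length checks mirror messages from the error list.

```python
from typing import Any, Dict, List, Tuple

# Assumption: raw formats named in the error messages; not an exhaustive list.
RAW_AUDIO_FORMATS = {"s16le", "f32le"}


def validate_config(config: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a configuration dict."""
    problems: List[str] = []
    for key in ("api_key", "model", "audio_format"):
        if not config.get(key):
            problems.append(f"missing required field: {key}")
    # Raw audio carries no header, so channels and sample rate must be given.
    if config.get("audio_format") in RAW_AUDIO_FORMATS:
        for key in ("num_channels", "sample_rate"):
            if key not in config:
                problems.append(f"{key} is required for raw audio formats")
    delay = config.get("max_endpoint_delay_ms")
    if delay is not None and not 500 <= delay <= 3000:  # bounds assumed inclusive
        problems.append("max_endpoint_delay_ms must be between 500 and 3000")
    hints = config.get("language_hints", [])
    if len(hints) != len(set(hints)):
        problems.append("language_hints must be unique")
    if len(config.get("client_reference_id", "")) > 256:
        problems.append("client_reference_id is too long (max length 256)")
    translation = config.get("translation")
    if translation is not None:
        kind = translation.get("type")
        required: Tuple[str, ...] = ()
        if kind == "one_way":
            required = ("target_language",)
        elif kind == "two_way":
            required = ("language_a", "language_b")
        else:
            problems.append("translation.type must be one_way or two_way")
        for key in required:
            if not translation.get(key):
                problems.append(f"translation.{key} is required for {kind}")
    return problems
```

The function returns an empty list when no problems are found. The server remains the final authority on validity.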


Audio streaming

After configuration, start streaming audio:

  • Send audio as binary WebSocket frames.
  • Each stream supports up to 300 minutes of audio.
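For raw PCM input, chunk size and stream duration follow from `sample_rate`, `num_channels`, and the sample width. The sketch below assumes 16-bit samples (2 bytes, as in `s16le`) and 100 ms chunks. Both are illustrative choices, not documented requirements. The hypothetical `pcm_chunks` generator checks the 300-minute limit up front and splits audio on sample-frame boundaries.

```python
from typing import Iterator

MAX_STREAM_SECONDS = 300 * 60  # documented limit: 300 minutes per stream


def pcm_chunks(
    audio: bytes,
    sample_rate: int,
    num_channels: int,
    bytes_per_sample: int = 2,  # assumption: 16-bit samples such as s16le
    chunk_ms: int = 100,        # assumption: illustrative chunk duration
) -> Iterator[bytes]:
    """Yield raw PCM audio in chunks of roughly chunk_ms milliseconds."""
    frame_bytes = num_channels * bytes_per_sample  # one sample for every channel
    bytes_per_second = sample_rate * frame_bytes
    duration_s = len(audio) / bytes_per_second
    if duration_s > MAX_STREAM_SECONDS:
        raise ValueError(f"{duration_s:.0f} s of audio exceeds the 300-minute limit")
    # Round down to whole frames so a sample is never split across chunks,
    # but never go below one frame.
    chunk_size = max(frame_bytes, bytes_per_second * chunk_ms // 1000 // frame_bytes * frame_bytes)
    for offset in range(0, len(audio), chunk_size):
        yield audio[offset:offset + chunk_size]
```

Because `pcm_chunks` is a generator, the duration check runs when iteration starts. For live capture, the simplest approach is to send each chunk as soon as it is recorded. The "Input too slow" timeout in the error list suggests the server expects audio to keep arriving steadily.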

Ending the stream

To gracefully close a streaming session:

  • Send an empty WebSocket frame (binary or text).
  • The server returns one or more responses, including the finished response, and then closes the connection.
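Configuration, streaming, and shutdown can be combined in one session loop. The sketch below does the following while concurrently reading responses until `finished` or an error arrives:

  • Sends the configuration as a text frame.
  • Sends the audio as binary frames.
  • Sends an empty binary frame to end the stream.

`run_session` is a hypothetical helper. It assumes a connection object that supports `await ws.send(...)` and `async for message in ws`, which is the interface of connections from the third-party `websockets` package.

```python
import asyncio
import json
from typing import Any, Dict, Iterable, List


async def run_session(ws: Any, config: Dict[str, Any], chunks: Iterable[bytes]) -> List[Dict[str, Any]]:
    """Run one session on an open connection and return all parsed responses."""

    async def send_all() -> None:
        await ws.send(json.dumps(config))  # configuration: text frame
        for chunk in chunks:
            await ws.send(chunk)           # audio: binary frames
        await ws.send(b"")                 # empty frame: graceful end of stream

    async def receive_all() -> List[Dict[str, Any]]:
        responses: List[Dict[str, Any]] = []
        async for message in ws:
            response = json.loads(message)
            responses.append(response)
            if response.get("error_code") is not None:
                # The server closes the connection after an error.
                raise RuntimeError(f"{response['error_code']}: {response.get('error_message')}")
            if response.get("finished"):
                break  # the server closes the connection after this message
        return responses

    # Send and receive concurrently so responses are read while audio uploads.
    _, responses = await asyncio.gather(send_all(), receive_all())
    return responses
```

With `websockets` installed, a call might look like `async with websockets.connect("wss://stt-rt.soniox.com/transcribe-websocket") as ws: responses = await run_session(ws, config, chunks)`. Running the sender and receiver concurrently keeps a long upload from delaying transcript updates.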

Response

Soniox returns responses in JSON format. A typical successful response looks like:

{
  "tokens": [
    {
      "text": "Hello",
      "start_ms": 600,
      "end_ms": 760,
      "confidence": 0.97,
      "is_final": true,
      "speaker": "1"
    }
  ],
  "final_audio_proc_ms": 760,
  "total_audio_proc_ms": 880
}
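A typed view of this response can make client code easier to follow. The Python dataclasses below are a sketch with the following choices:

  • The names `Token`, `Response`, and `parse_response` are hypothetical.
  • Unknown keys are dropped so that new server fields do not break parsing.
  • `confidence` is treated as optional to be lenient with tokens that might omit it.
  • The `pending_ms` property is the difference between the two processing counters, that is, the audio covered only by non-final tokens so far.

```python
import json
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class Token:
    text: str
    is_final: bool
    confidence: Optional[float] = None  # documented, but kept optional to be lenient
    start_ms: Optional[int] = None      # absent on translated tokens
    end_ms: Optional[int] = None
    speaker: Optional[str] = None       # present with speaker diarization
    language: Optional[str] = None
    translation_status: Optional[str] = None
    source_language: Optional[str] = None


@dataclass
class Response:
    tokens: List[Token] = field(default_factory=list)
    final_audio_proc_ms: int = 0
    total_audio_proc_ms: int = 0
    finished: bool = False

    @property
    def pending_ms(self) -> int:
        """Milliseconds of audio covered only by non-final tokens."""
        return self.total_audio_proc_ms - self.final_audio_proc_ms


_TOKEN_FIELDS = {f.name for f in fields(Token)}


def parse_response(raw: str) -> Response:
    data = json.loads(raw)
    # Drop keys this sketch does not model so new server fields do not break parsing.
    tokens = [
        Token(**{k: v for k, v in t.items() if k in _TOKEN_FIELDS})
        for t in data.get("tokens", [])
    ]
    return Response(
        tokens=tokens,
        final_audio_proc_ms=data.get("final_audio_proc_ms", 0),
        total_audio_proc_ms=data.get("total_audio_proc_ms", 0),
        finished=data.get("finished", False),
    )
```

For the example response above, `pending_ms` is 880 - 760 = 120. This parser is meant for regular responses. Error responses should be checked for `error_code` first.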

Field descriptions

tokens (array<object>)

List of processed tokens (words or subwords).

Each token may include:

text (string)

Token text.

start_ms (number, optional)

Start timestamp of the token, in milliseconds. Omitted when translation_status is "translation".

end_ms (number, optional)

End timestamp of the token, in milliseconds. Omitted when translation_status is "translation".

confidence (number)

Confidence score, from 0.0 to 1.0.

is_final (boolean)

Whether the token is finalized.

speaker (string, optional)

Speaker label. Present when speaker diarization is enabled.

translation_status (string, optional)

See real-time translation.

language (string, optional)

Language of the token's text.

source_language (string, optional)

See real-time translation.

final_audio_proc_ms (number)

Amount of audio, in milliseconds, processed into final tokens.

total_audio_proc_ms (number)

Amount of audio, in milliseconds, processed into final and non-final tokens.
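A common way to consume these fields is to keep final tokens permanently and show non-final tokens as provisional text. The sketch below relies on two assumptions that this page does not spell out, so they should be verified against real responses:

  • Each response carries newly finalized tokens exactly once, plus the current set of non-final tokens, which replaces the previous set.
  • Each token's `text` includes its own leading whitespace, so tokens are joined without separators.

`TranscriptBuilder` is a hypothetical name.

```python
from typing import Any, Dict, List


class TranscriptBuilder:
    """Accumulate streamed tokens into final and provisional text."""

    def __init__(self) -> None:
        self.final_tokens: List[Dict[str, Any]] = []
        self.non_final_tokens: List[Dict[str, Any]] = []

    def update(self, response: Dict[str, Any]) -> None:
        tokens = response.get("tokens", [])
        # Assumption: final tokens arrive once, so they are appended...
        self.final_tokens.extend(t for t in tokens if t.get("is_final"))
        # ...while non-final tokens are re-sent and replace the previous set.
        self.non_final_tokens = [t for t in tokens if not t.get("is_final")]

    def text(self, include_non_final: bool = True, translated: bool = False) -> str:
        tokens = self.final_tokens + (self.non_final_tokens if include_non_final else [])
        # Tokens whose translation_status is "translation" carry translated text.
        return "".join(
            t["text"] for t in tokens
            if (t.get("translation_status") == "translation") == translated
        )
```

Calling `text(include_non_final=False)` yields only the stable part of the transcript, which is useful when writing results to storage.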


Finished response

At the end of a stream, Soniox sends a final message to indicate the session is complete:

{
  "tokens": [],
  "final_audio_proc_ms": 1560,
  "total_audio_proc_ms": 1680,
  "finished": true
}

After this, the server closes the WebSocket connection.


Error response

If an error occurs, the server returns an error message and immediately closes the connection:

{
  "tokens": [],
  "error_code": 503,
  "error_message": "Cannot continue request (code N). Please restart the request. ..."
}

error_code (number)

Standard HTTP status code.

error_message (string)

A description of the error encountered.

Full list of possible error codes and messages:

400 Bad request

The request is malformed or contains invalid parameters.

  • Audio data channels must be specified for PCM formats
  • Audio data sample rate must be specified for PCM formats
  • Audio decode error
  • Audio is too long.
  • Client reference ID is too long (max length 256)
  • Context is too long (max length 10000).
  • Control request invalid type.
  • Control request is malformed.
  • Invalid audio data format: avi
  • Invalid base64.
  • Invalid language hint.
  • Invalid model specified.
  • Invalid translation target language.
  • Language hints must be unique.
  • Missing audio format. Specify a valid audio format (e.g. s16le, f32le, wav, ogg, flac...) or "auto" for auto format detection.
  • Model does not support translations.
  • No audio received.
  • Prompt too long for model
  • Received too much audio data in total.
  • Start request is malformed.
  • Start request must be a text message.

401 Unauthorized

Authentication is missing or incorrect. Ensure a valid API key is provided before retrying.

  • Invalid API key.
  • Invalid/expired temporary API key.
  • Missing API key.

402 Payment required

The organization's balance or monthly usage limit has been reached. Additional credits are required before making further requests.

  • Organization balance exhausted. Please either add funds manually or enable autopay.
  • Organization monthly budget exhausted. Please increase it.
  • Project monthly budget exhausted. Please increase it.

408 Request timeout

The client did not send a start message or sufficient audio data within the required timeframe. The connection was closed due to inactivity.

  • Audio data decode timeout
  • Input too slow
  • Request timeout.
  • Start request timeout
  • Timed out while waiting for the first audio chunk

429 Too many requests

A usage or rate limit has been exceeded. You may retry after a delay or request an increase in limits via the Soniox Console.

  • Rate limit for your organization has been exceeded.
  • Rate limit for your project has been exceeded.
  • Your organization has exceeded max number of concurrent requests.
  • Your project has exceeded max number of concurrent requests.

500 Internal server error

An unexpected server-side error occurred. The request may be retried.

  • The server had an error processing your request. Sorry about that! You can retry your request, or contact us through our support email support@soniox.com if you keep seeing this error.

503 Service unavailable

The server cannot continue the request or accept new requests.

  • Cannot continue request (code N). Please restart the request. Refer to: https://soniox.com/url/cannot-continue-request
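A client typically decides whether to retry based on `error_code`. The policy table below is hypothetical. Code 503 comes from the example error response above. The other codes are inferred from the standard HTTP meaning of each category: 400 bad request, 401 unauthorized, 402 payment required, 408 request timeout, 429 too many requests, and 500 internal server error. The exponential backoff parameters are likewise illustrative.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical policy: (retryable, advice). Only 503 is confirmed by the example.
ERROR_POLICY: Dict[int, Tuple[bool, str]] = {
    400: (False, "fix the configuration or audio"),
    401: (False, "supply a valid API key"),
    402: (False, "add funds or raise the budget"),
    408: (False, "send the start message and audio promptly"),
    429: (True, "wait, then retry"),
    500: (True, "retry"),
    503: (True, "restart the request"),
}


def classify_error(response: Dict[str, Any]) -> Optional[Tuple[bool, str]]:
    """Return (retryable, advice) for an error response, or None otherwise."""
    code = response.get("error_code")
    if code is None:
        return None
    fallback = (False, f"unrecognized error: {response.get('error_message')}")
    return ERROR_POLICY.get(code, fallback)


def backoff_seconds(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff delay before retry number `attempt` (0-based)."""
    return min(cap, base * 2 ** attempt)
```

A retry always starts a new connection and resends the configuration message, because the server closes the connection after an error. Adding random jitter to the delay is a common refinement when many clients retry at once.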