WebSocket API

Learn how to use and integrate the Soniox Speech-to-Text WebSocket API.

Overview

The Soniox WebSocket API provides real-time transcription and translation of live audio with ultra-low latency. It supports speaker diarization, context customization, and manual finalization, all over a single persistent WebSocket connection. It is suited to live scenarios such as meetings, broadcasts, multilingual communication, and voice interfaces.


WebSocket endpoint

Connect to the API using:

wss://stt-rt.soniox.com/transcribe-websocket

Configuration

Before streaming audio, configure the transcription session by sending a JSON message such as:

{
  "api_key": "<SONIOX_API_KEY|SONIOX_TEMPORARY_API_KEY>",
  "model": "stt-rt-preview",
  "audio_format": "auto",
  "language_hints": ["en", "es"],
  "context": "Celebrex, Zyrtec, Xanax, Prilosec, Amoxicillin Clavulanate Potassium. The customer, Maria Lopez, contacted BrightWay Insurance to update her auto policy after purchasing a new vehicle.",
  "enable_speaker_diarization": true,
  "enable_language_identification": true,
  "translation": {
    "type": "two_way",
    "language_a": "en",
    "language_b": "es"
  }
}
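
A minimal Python sketch of building this configuration message in code rather than by hand. JSON strings cannot contain raw line breaks, so serializing with json.dumps keeps a multi-line context value valid. The function name build_config_message and the SONIOX_API_KEY environment variable are assumptions made for illustration; they are not part of the Soniox API.

```python
import json
import os


def build_config_message(api_key: str, context: str) -> str:
    """Serialize a session configuration as one JSON text message."""
    config = {
        "api_key": api_key,
        "model": "stt-rt-preview",
        "audio_format": "auto",
        "language_hints": ["en", "es"],
        "context": context,
        "enable_speaker_diarization": True,
        "enable_language_identification": True,
        "translation": {
            "type": "two_way",
            "language_a": "en",
            "language_b": "es",
        },
    }
    # json.dumps escapes line breaks inside strings, so the result is valid JSON.
    return json.dumps(config)


# SONIOX_API_KEY is an assumed environment variable name (hypothetical).
message = build_config_message(
    api_key=os.environ.get("SONIOX_API_KEY", "<SONIOX_API_KEY>"),
    context="Celebrex, Zyrtec, Xanax\nThe customer, Maria Lopez, contacted BrightWay Insurance.",
)
```

The resulting string would be sent as the first message on the connection, before any audio.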

Parameters

api_key (required, string)

Your Soniox API key. Create keys in the Soniox Console. For client apps, use a short-lived key generated on your server to keep secrets safe.

model (required, string)

Real-time model to use. See models.

Example: "stt-rt-preview"

audio_format (required, string)

Audio format of the stream. See audio formats.

num_channels (number)

Required for raw audio formats. See audio formats.

sample_rate (number)

Required for raw audio formats. See audio formats.

language_hints (array<string>)

See language hints.

context (string)

See context.

enable_speaker_diarization (boolean)

See speaker diarization.

enable_language_identification (boolean)

See language identification.

enable_non_final_tokens (boolean)

See final vs non-final tokens.

enable_endpoint_detection (boolean)

See endpoint detection.

client_reference_id (string)

Optional identifier to track this request (client-defined).

translation (object)

See real-time translation.

One-way translation

type (required, string)

Must be set to one_way.

target_language (required, string)

Language to translate the transcript into.

Two-way translation

type (required, string)

Must be set to two_way.

language_a (required, string)

First language for two-way translation.

language_b (required, string)

Second language for two-way translation.
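
The requirement rules above can be checked on the client before connecting. The sketch below is one possible way to do that, not an official validator. The function name validate_config is hypothetical, and the RAW_AUDIO_FORMATS set holds an assumed placeholder value; the real list of raw formats is on the audio formats page.

```python
from typing import List

# Assumed placeholder; consult the audio formats page for the actual raw format names.
RAW_AUDIO_FORMATS = {"pcm_s16le"}


def validate_config(config: dict) -> List[str]:
    """Return a list of problems found in a configuration dict (empty if none)."""
    problems = []
    for key in ("api_key", "model", "audio_format"):
        if not config.get(key):
            problems.append(f"missing required parameter: {key}")

    # Raw audio carries no header, so channel count and sample rate must be given.
    if config.get("audio_format") in RAW_AUDIO_FORMATS:
        for key in ("num_channels", "sample_rate"):
            if key not in config:
                problems.append(f"{key} is required for raw audio formats")

    translation = config.get("translation")
    if translation is not None:
        kind = translation.get("type")
        if kind == "one_way":
            required = ("target_language",)
        elif kind == "two_way":
            required = ("language_a", "language_b")
        else:
            required = ()
            problems.append("translation.type must be one_way or two_way")
        for key in required:
            if not translation.get(key):
                problems.append(f"translation.{key} is required for {kind}")
    return problems
```

An empty result means the configuration satisfies the rules listed in this section; the server may still reject it for other reasons.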


Audio streaming

After configuration, start streaming audio:

  • Send audio as binary WebSocket frames.
  • Each stream supports up to 60 minutes of audio.
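
As an illustration of the two points above, the following sketch splits an audio source into binary frames and estimates the duration of 16-bit PCM audio so it can be compared with the 60-minute limit. The helper names and the default chunk size are arbitrary assumptions, not values prescribed by Soniox.

```python
import io
from typing import BinaryIO, Iterator

MAX_STREAM_SECONDS = 60 * 60  # the 60-minute per-stream limit stated above


def iter_audio_frames(source: BinaryIO, chunk_size: int = 3840) -> Iterator[bytes]:
    """Yield non-empty chunks of audio, each meant to be sent as one binary frame."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            return  # end of input; no empty chunk is ever yielded
        yield chunk


def pcm16_seconds(num_bytes: int, sample_rate: int, num_channels: int) -> float:
    """Duration of 16-bit PCM audio: 2 bytes per sample per channel."""
    return num_bytes / (sample_rate * num_channels * 2)


frames = list(iter_audio_frames(io.BytesIO(b"\x00" * 10), chunk_size=4))
# frames now holds three chunks of 4, 4, and 2 bytes
```

For compressed or container formats the byte count does not map directly to duration, so pcm16_seconds applies only to raw 16-bit PCM.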

Ending the stream

To gracefully close a streaming session:

  • Send an empty WebSocket frame (binary or text).
  • The server will return one or more responses, including a finished response, and then close the connection.
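
The whole session lifecycle (configure, stream, end, drain) can be sketched independently of any particular WebSocket library. In the sketch below, send and recv are hypothetical callables standing in for whatever client library you use; run_session is likewise an illustrative name. For simplicity it reads responses only after all audio has been sent, whereas a real-time client would normally read responses concurrently while streaming.

```python
import json
from typing import Callable, Iterable, List, Union

Frame = Union[str, bytes]


def run_session(
    send: Callable[[Frame], None],   # hypothetical: sends one text or binary frame
    recv: Callable[[], str],         # hypothetical: returns the next JSON text message
    config: dict,
    audio_frames: Iterable[bytes],
) -> List[dict]:
    """Drive one transcription session and return every response received."""
    send(json.dumps(config))         # 1. configuration as a JSON message
    for frame in audio_frames:       # 2. audio as binary frames
        if frame:                    # skip empty chunks: an empty frame ends the stream
            send(frame)
    send(b"")                        # 3. empty frame requests a graceful close

    responses = []
    while True:                      # 4. drain until the finished or error response
        message = json.loads(recv())
        responses.append(message)
        if message.get("finished") or "error_code" in message:
            return responses
```

Passing the transport in as two callables keeps the sketch testable without a network connection.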

Response

Soniox returns responses in JSON format. A typical successful response looks like:

{
  "tokens": [
    {
      "text": "Hello",
      "start_ms": 600,
      "end_ms": 760,
      "confidence": 0.97,
      "is_final": true,
      "speaker": "1"
    }
  ],
  "final_audio_proc_ms": 760,
  "total_audio_proc_ms": 880
}

Field descriptions

tokens (array<object>)

List of processed tokens (words or subwords).

Each token may include:

text (string)

Token text.

start_ms (optional, number)

Start timestamp of the token (in milliseconds). Not included if translation_status is translation.

end_ms (optional, number)

End timestamp of the token (in milliseconds). Not included if translation_status is translation.

confidence (number)

Confidence score (0.0 to 1.0).

is_final (boolean)

Whether the token is finalized.

speaker (optional, string)

Speaker label (present only if speaker diarization is enabled).

translation_status (optional, string)

See real-time translation.

language (optional, string)

Language of the token text.

source_language (optional, string)

See real-time translation.

final_audio_proc_ms (number)

Duration of audio, in milliseconds, that has been processed into final tokens.

total_audio_proc_ms (number)

Duration of audio, in milliseconds, that has been processed into final and non-final tokens combined.
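
To make the fields concrete, here is a small sketch that summarizes one response. It assumes token texts can be concatenated directly (that is, tokens carry their own spacing), which this page does not state, and it interprets the difference between the two audio counters as the span of audio covered only by non-final tokens. The function name summarize_response is hypothetical.

```python
from typing import Dict


def summarize_response(response: dict) -> Dict[str, object]:
    """Split a response into final text, non-final text, and pending audio time."""
    tokens = response.get("tokens", [])
    # Assumption: token texts join without a separator.
    final_text = "".join(t["text"] for t in tokens if t.get("is_final"))
    pending_text = "".join(t["text"] for t in tokens if not t.get("is_final"))
    pending_ms = response.get("total_audio_proc_ms", 0) - response.get("final_audio_proc_ms", 0)
    return {"final_text": final_text, "pending_text": pending_text, "pending_ms": pending_ms}


example = {
    "tokens": [
        {"text": "Hello", "start_ms": 600, "end_ms": 760,
         "confidence": 0.97, "is_final": True, "speaker": "1"}
    ],
    "final_audio_proc_ms": 760,
    "total_audio_proc_ms": 880,
}
summary = summarize_response(example)
# summary == {"final_text": "Hello", "pending_text": "", "pending_ms": 120}
```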


Finished response

At the end of a stream, Soniox sends a final message to indicate the session is complete:

{
  "tokens": [],
  "final_audio_proc_ms": 1560,
  "total_audio_proc_ms": 1680,
  "finished": true
}

After this, the server closes the WebSocket connection.
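
A receive loop can use the finished flag as its stop condition. The sketch below collects final tokens from a sequence of raw JSON messages and stops at the finished response. It assumes each final token is delivered only once across responses, which is not stated on this page; collect_final_tokens is an illustrative name.

```python
import json
from typing import Iterable, List


def collect_final_tokens(messages: Iterable[str]) -> List[dict]:
    """Gather final tokens from JSON messages until the finished response arrives."""
    final_tokens = []
    for raw in messages:
        response = json.loads(raw)
        # Assumption: a final token appears in exactly one response.
        final_tokens.extend(t for t in response.get("tokens", []) if t.get("is_final"))
        if response.get("finished"):
            break  # the server closes the connection after this message
    return final_tokens
```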


Error response

If an error occurs, the server returns an error message and immediately closes the connection:

{
  "tokens": [],
  "error_code": 503,
  "error_message": "Cannot continue request (code N). Please restart the request. ..."
}

error_code (number)

Standard HTTP status code.

error_message (string)

A description of the error encountered.
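
One way to surface these errors in client code is to turn an error response into an exception as soon as it is parsed. The exception class SonioxStreamError and the function parse_response below are hypothetical names chosen for this sketch.

```python
import json


class SonioxStreamError(Exception):
    """Hypothetical exception carrying the error_code and error_message fields."""

    def __init__(self, code: int, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def parse_response(raw: str) -> dict:
    """Decode one JSON message, raising if it is an error response."""
    response = json.loads(raw)
    if "error_code" in response:
        raise SonioxStreamError(response["error_code"], response.get("error_message", ""))
    return response
```

Because the server closes the connection after an error, a caller that catches this exception would need to open a new connection and send the configuration again before retrying.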

Full list of possible error codes and messages: