
Endpoint detection

Learn how speech endpoint detection works.

Overview

Soniox Speech-to-Text AI supports endpoint detection — the ability to detect when a speaker has finished speaking. This is especially useful for voice AI assistants, command-and-response systems, or any application where you want to reduce latency and act as soon as the user stops talking.


What it does

When endpoint detection is enabled:

  • The model listens for natural pauses and identifies when the utterance has ended
  • When this happens, it emits a special <end> token
  • All preceding tokens are finalized immediately
  • The <end> token itself is always final

This allows you to:

  • Know exactly when the speaker has finished
  • Immediately use all final tokens for downstream processing (e.g., sending to an LLM)
  • Reduce delay in conversational systems

How to enable

Set the following flag in your real-time transcription request:

{
  "enable_endpoint_detection": true
}

You can use this with WebSocket and streaming SDK integrations.
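As a sketch, the first message sent over the WebSocket connection might be assembled like this in Python. The field values mirror the full examples below; the API key here is a placeholder.

```python
import json

# Start request for the real-time WebSocket API. Field values mirror
# the full examples below; replace the placeholder API key with yours.
start_request = {
    "api_key": "<YOUR_SONIOX_API_KEY>",  # placeholder
    "audio_format": "pcm_s16le",
    "sample_rate": 16000,
    "num_channels": 1,
    "model": "stt-rt-preview-v2",
    "enable_endpoint_detection": True,  # emit <end> tokens
}

# The start request is serialized to JSON and sent as the first message.
message = json.dumps(start_request)
```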


Output format

When the model detects that the speaker has stopped speaking, it returns a special token:

{
  "text": "<end>",
  "is_final": true
}

Important notes

  • <end> is treated like a regular token in the stream.
  • It will never appear as non-final.
  • You can use it as a reliable signal that the speaker has stopped or paused talking for an extended period.
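Putting these notes together, here is a minimal sketch of grouping a token stream into utterances at each <end> marker, assuming tokens arrive as dicts with a "text" field as in the examples below:

```python
def split_utterances(tokens):
    """Group token texts into utterances, using "<end>" as the boundary."""
    utterances = []
    current = ""
    for token in tokens:
        text = token.get("text", "")
        if text == "<end>":
            if current:
                utterances.append(current.strip())
            current = ""
        else:
            current += text
    if current:
        # Trailing text that has not been endpointed yet.
        utterances.append(current.strip())
    return utterances

tokens = [
    {"text": "Turn", "is_final": True},
    {"text": " on", "is_final": True},
    {"text": " the", "is_final": True},
    {"text": " lights.", "is_final": True},
    {"text": "<end>", "is_final": True},
]
print(split_utterances(tokens))  # ['Turn on the lights.']
```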

Example use case

  1. User speaks:

    What's the weather in San Francisco tomorrow?

  2. Soniox returns all tokens as final:

    {"text": "Wh", "is_final": true}
    {"text": "at", "is_final": true}
    {"text": "'s", "is_final": true}
    {"text": " the", "is_final": true}
    {"text": " we", "is_final": true}
    {"text": "ather", "is_final": true}
    {"text": " in", "is_final": true}
    {"text": " San", "is_final": true}
    {"text": " Franc", "is_final": true}
    {"text": "isc", "is_final": true}
    {"text": "o,", "is_final": true}
    {"text": " tom", "is_final": true}
    {"text": "or", "is_final": true}
    {"text": "row", "is_final": true}
    {"text": "?", "is_final": true}
    {"text": "<end>", "is_final": true}
  3. Your system can now:

    • Send the full final transcript to a text-based LLM
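As a sketch, the full transcript for this turn can be rebuilt by concatenating token texts up to the <end> marker:

```python
# The token stream from the example above.
tokens = [
    {"text": "Wh"}, {"text": "at"}, {"text": "'s"}, {"text": " the"},
    {"text": " we"}, {"text": "ather"}, {"text": " in"}, {"text": " San"},
    {"text": " Franc"}, {"text": "isc"}, {"text": "o,"}, {"text": " tom"},
    {"text": "or"}, {"text": "row"}, {"text": "?"}, {"text": "<end>"},
]

transcript = ""
for token in tokens:
    if token["text"] == "<end>":
        break  # endpoint reached: the utterance is complete
    transcript += token["text"]

print(transcript)  # What's the weather in San Francisco, tomorrow?
```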

Example

This example demonstrates how to use endpoint detection.

import json
import os
import threading
import time

from websockets import ConnectionClosedOK
from websockets.sync.client import connect

# Retrieve the API key from environment variable (ensure SONIOX_API_KEY is set)
api_key = os.environ.get("SONIOX_API_KEY")
websocket_url = "wss://stt-rt.soniox.com/transcribe-websocket"
file_to_transcribe = "coffee_shop.pcm_s16le"


def stream_audio(ws):
    with open(file_to_transcribe, "rb") as fh:
        while True:
            data = fh.read(3840)
            if len(data) == 0:
                break
            ws.send(data)
            time.sleep(0.12)  # sleep for 120 ms
    ws.send("")  # signal end of stream


def main():
    print("Opening WebSocket connection...")

    with connect(websocket_url) as ws:
        # Send start request
        ws.send(
            json.dumps(
                {
                    "api_key": api_key,
                    "audio_format": "pcm_s16le",
                    "sample_rate": 16000,
                    "num_channels": 1,
                    "model": "stt-rt-preview-v2",
                    "language_hints": ["en", "es"],
                    "enable_non_final_tokens": False,
                    "enable_endpoint_detection": True,
                }
            )
        )

        # Start streaming audio in background
        threading.Thread(target=stream_audio, args=(ws,), daemon=True).start()

        print("Transcription started")

        current_text = ""

        try:
            while True:
                message = ws.recv()
                res = json.loads(message)

                if res.get("error_code"):
                    print(f"Error: {res['error_code']} - {res['error_message']}")
                    break

                for token in res.get("tokens", []):
                    if token.get("text"):
                        if token["text"] == "<end>":
                            print(current_text)
                            current_text = ""
                        elif not current_text:
                            current_text = token["text"].lstrip()
                        else:
                            current_text += token["text"]

                if res.get("finished"):
                    if current_text:
                        print(current_text)

                    print("\nTranscription complete.")
                    break
        except ConnectionClosedOK:
            pass
        except Exception as e:
            print(f"Error: {e}")


if __name__ == "__main__":
    main()
The same example in Node.js:
import { createReadStream } from "fs";

// Retrieve the API key from environment variable (ensure SONIOX_API_KEY is set)
const apiKey = process.env.SONIOX_API_KEY;
const websocketUrl = "wss://stt-rt.soniox.com/transcribe-websocket";
const fileToTranscribe = "coffee_shop.pcm_s16le";

// Connect to WebSocket API
const ws = new WebSocket(websocketUrl);

console.log("Opening WebSocket connection...");
ws.addEventListener("open", async () => {
  // Send start request
  ws.send(
    JSON.stringify({
      api_key: apiKey,
      audio_format: "pcm_s16le",
      sample_rate: 16000,
      num_channels: 1,
      model: "stt-rt-preview-v2",
      language_hints: ["en", "es"],
      enable_non_final_tokens: false,
      enable_endpoint_detection: true,
    })
  );

  // Read and send audio data from file over WebSocket connection
  const audioStream = createReadStream(fileToTranscribe, {
    highWaterMark: 3840,
  });

  console.log("Transcription started.");

  for await (const chunk of audioStream) {
    ws.send(chunk);
    await new Promise((resolve) => setTimeout(resolve, 120));
  }

  // Signal end of file
  ws.send("");
});

let currentText = "";

ws.addEventListener("message", (event) => {
  // Receive and process messages
  const res = JSON.parse(event.data);

  if (res.error_code) {
    console.log(`\nError: ${res.error_code} ${res.error_message}`);
    process.exit(1);
  }

  for (const token of res.tokens || []) {
    if (token.text) {
      if (token.text == "<end>") {
        console.log(currentText);
        currentText = "";
      } else if (!currentText) {
        currentText += token.text.trimStart();
      } else {
        currentText += token.text;
      }
    }
  }

  if (res.finished) {
    if (currentText) {
      console.log(currentText);
    }
    console.log("\nTranscription done.");
  }
});

ws.addEventListener("close", () => {});

ws.addEventListener("error", (error) => {
  console.error("Connection error occurred:", error);
});

Output