Endpoint detection

Learn how speech endpoint detection works.

Overview

Soniox Speech-to-Text AI supports endpoint detection — the ability to detect when a speaker has finished speaking. This is especially useful for voice AI assistants, command-and-response systems, or any application where you want to reduce latency and act as soon as the user stops talking.


What it does

When endpoint detection is enabled:

  • The model listens for natural pauses and identifies when the utterance has ended
  • When this happens, it emits a special <end> token
  • All preceding tokens are finalized immediately
  • The <end> token itself is always final

This allows you to:

  • Know exactly when the speaker has finished
  • Immediately use all final tokens for downstream processing (e.g., sending to an LLM)
  • Reduce delay in conversational systems

How to enable

Set the following flag in your real-time transcription request:

{
  "enable_endpoint_detection": true
}

You can use this with WebSocket and streaming SDK integrations.
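
For reference, the flag is sent as part of the start request, alongside your audio and model settings. A minimal configuration might look like the following (the audio format, sample rate, and model values are illustrative and must match your stream):

{
  "api_key": "<SONIOX_API_KEY>",
  "model": "stt-rt-preview",
  "audio_format": "pcm_s16le",
  "sample_rate": 16000,
  "num_channels": 1,
  "enable_endpoint_detection": true
}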


Output format

When the model detects that the speaker has stopped speaking, it returns a special token:

{
  "text": "<end>",
  "is_final": true
}

Important notes

  • <end> is treated like a regular token in the stream
  • It will never appear as non-final
  • You can use it as a reliable signal that the speaker has finished the utterance or paused for an extended period
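
In client code, this means the boundary can be detected with a simple check on the token. A minimal sketch, assuming tokens arrive in the response shape shown above:

def is_endpoint(token: dict) -> bool:
    # The <end> token is always emitted as final, so checking the text is
    # enough; verifying is_final just makes the intent explicit.
    return token.get("text") == "<end>" and token.get("is_final", False)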

Example use case

  1. User speaks:

    What's the weather in San Francisco tomorrow?

  2. Soniox returns all tokens as final:

    {"text": "Wh", "is_final": true}
    {"text": "at", "is_final": true}
    {"text": "'s", "is_final": true}
    {"text": " the", "is_final": true}
    {"text": " we", "is_final": true}
    {"text": "ather", "is_final": true}
    {"text": " in", "is_final": true}
    {"text": " San", "is_final": true}
    {"text": " Franc", "is_final": true}
    {"text": "isc", "is_final": true}
    {"text": "o,", "is_final": true}
    {"text": " tom", "is_final": true}
    {"text": "or", "is_final": true}
    {"text": "row", "is_final": true}
    {"text": "?", "is_final": true}
    {"text": "<end>", "is_final": true}
  3. Your system can now:

    • Send the full final transcript to a text-based LLM
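
Below is a condensed sketch of that flow for a single parsed WebSocket message. Here send_to_llm is a hypothetical placeholder for whatever downstream call your application makes; the full runnable example follows in the next section.

def handle_message(response: dict, utterance: list[str]) -> None:
    # `response` is one parsed message; `utterance` is a shared buffer of
    # final token text for the utterance currently being spoken.
    for token in response.get("tokens", []):
        text = token.get("text", "")
        if text == "<end>":
            # End of utterance: hand the buffered final transcript downstream.
            full_text = "".join(utterance).strip()
            if full_text:
                send_to_llm(full_text)  # hypothetical LLM call
            utterance.clear()
        else:
            # Tokens before <end> are already final, so buffer them as-is.
            utterance.append(text)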

Example

This example demonstrates how to use endpoint detection. It streams a local audio file to the real-time WebSocket API with enable_endpoint_detection set, and prints each complete utterance as soon as the <end> token is received.

import json
import os
import threading
import time
 
from websockets import ConnectionClosedOK
from websockets.sync.client import connect
 
# Retrieve the API key from environment variable (ensure SONIOX_API_KEY is set)
api_key = os.environ.get("SONIOX_API_KEY")
websocket_url = "wss://stt-rt.soniox.com/transcribe-websocket"
file_to_transcribe = "coffee_shop.pcm_s16le"
 
 
def stream_audio(ws):
    with open(file_to_transcribe, "rb") as fh:
        while True:
            data = fh.read(3840)
            if len(data) == 0:
                break
            ws.send(data)
            time.sleep(0.12)  # sleep for 120 ms
    ws.send("")  # signal end of stream
 
 
def main():
    print("Opening WebSocket connection...")
 
    with connect(websocket_url) as ws:
        # Send start request
        ws.send(
            json.dumps(
                {
                    "api_key": api_key,
                    "audio_format": "pcm_s16le",
                    "sample_rate": 16000,
                    "num_channels": 1,
                    "model": "stt-rt-preview",
                    "language_hints": ["en", "es"],
                    "enable_non_final_tokens": False,
                    "enable_endpoint_detection": True,
                }
            )
        )
 
        # Start streaming audio in background
        threading.Thread(target=stream_audio, args=(ws,), daemon=True).start()
 
        print("Transcription started")
 
        current_text = ""
 
        try:
            while True:
                message = ws.recv()
                res = json.loads(message)
 
                if res.get("error_code"):
                    print(f"Error: {res['error_code']} - {res['error_message']}")
                    break
 
                for token in res.get("tokens", []):
                    if token.get("text"):
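                        # An <end> token marks the end of an utterance:
                        # print the buffered text and start a new utterance.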
                        if token["text"] == "<end>":
                            print(current_text)
                            current_text = ""
                        elif not current_text:
                            current_text = token["text"].lstrip()
                        else:
                            current_text += token["text"]
 
                if res.get("finished"):
                    if current_text:
                        print(current_text)
 
                    print("\nTranscription complete.")
        except ConnectionClosedOK:
            pass
        except Exception as e:
            print(f"Error: {e}")
 
 
if __name__ == "__main__":
    main()