Manual finalization

Learn how manual finalization works.

Overview

Soniox supports manual finalization in addition to automatic mechanisms such as endpoint detection. Manual finalization gives you precise control over when audio is finalized, which is useful for:

  • Push-to-talk systems.
  • Client-side voice activity detection (VAD).
  • Segment-based transcription pipelines.
  • Applications where automatic endpoint detection is not ideal.

How to finalize

Send a control message over the WebSocket connection:

{"type": "finalize"}

When Soniox receives this message:

  • Soniox finalizes all audio up to that point.
  • All tokens from that audio are returned with "is_final": true.
  • The model emits a special marker token:
{"text": "<fin>", "is_final": true}

The <fin> token signals that finalization is complete.
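On the client side, the flow above can be sketched as a small function that gathers final tokens until the <fin> marker arrives. This is a minimal sketch, not the official client: the function name collect_until_fin is hypothetical, and it assumes each response is a JSON object carrying its tokens in a "tokens" list. Only the token fields "text" and "is_final" come from the description above.

```python
import json

FIN_MARKER = "<fin>"


def collect_until_fin(messages):
    """Return (text, fin_seen) from an iterable of raw JSON response strings.

    Assumption: every response wraps its tokens in a "tokens" list.
    """
    parts = []
    for raw in messages:
        response = json.loads(raw)
        for token in response.get("tokens", []):
            if not token.get("is_final"):
                continue  # non-final tokens may still change, so skip them
            if token["text"] == FIN_MARKER:
                # Finalization is complete; hand the text downstream.
                return "".join(parts), True
            parts.append(token["text"])
    # The stream ended (or paused) before the marker was seen.
    return "".join(parts), False
```

In a real client, messages would be the stream of text frames read from the WebSocket after sending {"type": "finalize"}.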


Key points

  • You can call finalize multiple times per session.
  • You may continue streaming audio after each finalize call.
  • The <fin> token is always returned as final and can be used to trigger downstream processing.
  • Do not send finalize too frequently. Every few seconds is fine; sending it more often may cause disconnections.
  • You are charged for the full stream duration, not just the audio processed.
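One way to respect the frequency guidance is to guard finalize calls with a simple client-side throttle. The sketch below is illustrative only: the class name FinalizeThrottle is hypothetical, and the 2-second default is an assumed reading of "every few seconds", not a documented limit.

```python
import time


class FinalizeThrottle:
    """Allow a finalize message at most once per min_interval_s seconds."""

    def __init__(self, min_interval_s=2.0, clock=time.monotonic):
        # min_interval_s is an assumed value, not a documented limit.
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._last_sent = None

    def try_acquire(self):
        """Return True if a finalize may be sent now, and record the time."""
        now = self._clock()
        if self._last_sent is not None and now - self._last_sent < self.min_interval_s:
            return False  # too soon after the previous finalize
        self._last_sent = now
        return True
```

A caller would check try_acquire() before sending {"type": "finalize"} and skip or delay the message when it returns False.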

Trailing silence

If you have already added silence to the audio before sending finalize, you can reduce extra padding and improve latency by specifying its duration in milliseconds:

{
  "type": "finalize",
  "trailing_silence_ms": 300
}

This tells Soniox how much silence you already included.
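As a rough sketch, the message can be built by a helper that includes trailing_silence_ms only when silence was actually added. Both function names are hypothetical. The duration helper additionally assumes the silence was sent as raw PCM (16 kHz, 16-bit, mono by default); adjust the parameters to match your audio format.

```python
import json


def silence_duration_ms(num_bytes, sample_rate=16000, sample_width=2, channels=1):
    """Duration of num_bytes of raw PCM audio, in whole milliseconds.

    Assumption: uncompressed PCM with the given sample rate, width, and channels.
    """
    bytes_per_second = sample_rate * sample_width * channels
    return num_bytes * 1000 // bytes_per_second


def build_finalize_message(trailing_silence_ms=None):
    """Return the finalize control message as a JSON string."""
    message = {"type": "finalize"}
    if trailing_silence_ms is not None:
        if trailing_silence_ms < 0:
            raise ValueError("trailing_silence_ms must be non-negative")
        message["trailing_silence_ms"] = int(trailing_silence_ms)
    return json.dumps(message)
```

For example, after appending 9600 bytes of 16 kHz 16-bit mono silence, build_finalize_message(silence_duration_ms(9600)) yields a message with "trailing_silence_ms": 300.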


Connection keepalive

Manual finalization works well together with connection keepalive: send keepalive messages to prevent timeouts when no audio is being sent (for example, during long pauses).
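A client can track idle time and send a keepalive once nothing has been sent for a while. The sketch below is an illustration under stated assumptions: the payload {"type": "keepalive"} and the 5-second interval are not given on this page and should be checked against the connection keepalive documentation, and the class name KeepaliveTimer is hypothetical.

```python
import json
import time

# Assumed payload; verify it against the connection keepalive documentation.
KEEPALIVE_MESSAGE = json.dumps({"type": "keepalive"})


class KeepaliveTimer:
    """Track outgoing activity and report when a keepalive is due."""

    def __init__(self, interval_s=5.0, clock=time.monotonic):
        # interval_s is an assumed value, not a documented requirement.
        self.interval_s = interval_s
        self._clock = clock
        self._last_activity = clock()

    def mark_sent(self):
        """Call after sending audio, a finalize, or a keepalive."""
        self._last_activity = self._clock()

    def due(self):
        """Return True when nothing has been sent for interval_s seconds."""
        return self._clock() - self._last_activity >= self.interval_s
```

In a send loop, the client would check due() during pauses, send KEEPALIVE_MESSAGE when it returns True, and then call mark_sent().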