Connection keepalive

Learn how connection keepalive works.

Overview

In real-time transcription sessions, the WebSocket connection must stay alive even when no audio is being streamed, for example during silence, pauses, or when client-side voice activity detection (VAD) withholds non-speech audio.

To support this, Soniox provides a special control message:

{"type": "keepalive"}

Sending this message ensures that your WebSocket session remains active and avoids timing out when no audio is being sent.


When to use keepalive

Use the {"type": "keepalive"} message when:

  • You're using client-side VAD and only stream audio during speech
  • There's silence or inactivity between utterances
  • You want to pause streaming without losing session context

This ensures that:

  • The connection remains open
  • Session-level context (e.g., speaker labels, language tracking, prompt) is preserved
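The client-side VAD case can be sketched as follows. This is a minimal illustration, not part of the Soniox SDK: `stream_with_vad`, `is_speech`, and the injectable `clock` are hypothetical names, `ws` is any object with a `send()` method, and `chunks` is assumed to yield audio in real time (for example from a microphone).

```python
import json
import time

KEEPALIVE = json.dumps({"type": "keepalive"})


def stream_with_vad(ws, chunks, is_speech, keepalive_interval=10.0,
                    clock=time.monotonic):
    """Send speech chunks as audio; during silence, send keepalive instead.

    `is_speech` is a caller-supplied VAD predicate (hypothetical).
    """
    last_sent = clock()
    for chunk in chunks:
        now = clock()
        if is_speech(chunk):
            # Speech detected: stream the audio and reset the idle timer.
            ws.send(chunk)
            last_sent = now
        elif now - last_sent >= keepalive_interval:
            # Nothing sent for a while: keep the session open.
            ws.send(KEEPALIVE)
            last_sent = now
```

The default interval of 10 seconds is an arbitrary choice that stays well under the 20-second limit.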

Keepalive requirements

  • You must send a {"type": "keepalive"} message at least once every 20 seconds during periods when you're not sending audio.
  • Failure to do so may result in the connection being closed due to inactivity.
  • You may send it more frequently (e.g., every 5–10 seconds) if desired.
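One way to meet the 20-second requirement without scattering timers through the streaming code is a small background helper that sends a keepalive whenever the connection has been idle. The sketch below is an assumption-laden illustration: `IdleKeepalive` is a hypothetical class, `ws` is any object with a `send()` method, and a lock serializes sends so the sketch does not rely on the WebSocket client being thread-safe.

```python
import json
import threading
import time

KEEPALIVE = json.dumps({"type": "keepalive"})


class IdleKeepalive:
    """Send a keepalive when nothing has been sent for `interval` seconds."""

    def __init__(self, ws, interval=10.0):
        self._ws = ws
        self._interval = interval
        self._lock = threading.Lock()
        self._last_send = time.monotonic()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def send(self, message):
        # Route every send through here so the idle timer resets and
        # sends from both threads are serialized.
        with self._lock:
            self._ws.send(message)
            self._last_send = time.monotonic()

    def _run(self):
        # Poll at a quarter of the interval, so the longest silent gap
        # is roughly interval * 1.25 (12.5 s for the default of 10 s).
        while not self._stop.wait(self._interval / 4):
            with self._lock:
                idle = time.monotonic() - self._last_send
                if idle >= self._interval:
                    self._ws.send(KEEPALIVE)
                    self._last_send = time.monotonic()
```

In a streaming loop, audio would then be sent with `ka.send(data)` rather than `ws.send(data)`, and `ka.stop()` called before the end-of-stream message.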

Example message

{"type": "keepalive"}

Send this during idle periods in your stream to avoid timeouts while maintaining context.


Best practices

  • Send a keepalive message at least every 20 seconds during silence
  • Use alongside {"type": "finalize"} if you're segmenting audio manually
  • Maintain your connection across short pauses to preserve context

Example

The following example demonstrates how to send keepalive messages to keep the connection alive:

import json
import os
import threading
import time
 
from websockets import ConnectionClosedOK
from websockets.sync.client import connect
 
# Retrieve the API key from environment variable (ensure SONIOX_API_KEY is set)
api_key = os.environ.get("SONIOX_API_KEY")
websocket_url = "wss://stt-rt.soniox.com/transcribe-websocket"
file_to_transcribe = "coffee_shop.pcm_s16le"
 
 
def stream_audio(ws):
    # Wait for 20 seconds before sending audio but send keepalive every 5
    # seconds.
    for _ in range(4):
        time.sleep(5)
        print("Keepalive")
        ws.send('{"type": "keepalive"}')
 
    with open(file_to_transcribe, "rb") as fh:
        start = time.monotonic()
        finalized = False
        while True:
            if time.monotonic() - start > 6 and not finalized:
                # Finalize current audio.
                ws.send('{"type": "finalize"}')
                finalized = True
                # Wait for 30 seconds but send keepalive every 5 seconds.
                for _ in range(6):
                    time.sleep(5)
                    print("Keepalive")
                    ws.send('{"type": "keepalive"}')
            data = fh.read(3840)
            if len(data) == 0:
                break
            ws.send(data)
            time.sleep(0.12)  # sleep for 120 ms
    ws.send("")  # signal end of stream
 
 
def main():
    print("Opening WebSocket connection...")
 
    with connect(websocket_url) as ws:
        # Send start request
        ws.send(
            json.dumps(
                {
                    "api_key": api_key,
                    "audio_format": "pcm_s16le",
                    "sample_rate": 16000,
                    "num_channels": 1,
                    "model": "stt-rt-preview",
                    "language_hints": ["en", "es"],
                }
            )
        )
 
        # Start streaming audio in background
        threading.Thread(target=stream_audio, args=(ws,), daemon=True).start()
 
        print("Transcription started")
 
        final_text = ""
 
        try:
            while True:
                message = ws.recv()
                res = json.loads(message)
 
                if res.get("error_code"):
                    print(f"Error: {res['error_code']} - {res['error_message']}")
                    break
 
                non_final_text = ""
 
                for token in res.get("tokens", []):
                    if token.get("text"):
                        if token.get("is_final"):
                            final_text += token["text"]
                        else:
                            non_final_text += token["text"]
 
                print(
                    "\033[2J\033[H"  # clear the screen, move to top-left corner
                    + final_text  # write final text
                    + "\033[34m"  # change text color to blue
                    + non_final_text  # write non-final text
                    + "\033[39m"  # reset text color
                )
 
                if res.get("finished"):
                    print("\nTranscription complete.")
        except ConnectionClosedOK:
            pass
        except Exception as e:
            print(f"Error: {e}")
 
 
if __name__ == "__main__":
    main()
