Direct stream
Stream audio directly from the microphone to the Soniox Speech-to-Text WebSocket API to minimize latency.
Overview
This guide walks you through capturing and transcribing microphone audio in real time using the Soniox WebSocket API, optimized for the lowest possible latency.
With the direct stream approach, the browser opens a WebSocket connection to the Soniox WebSocket API and sends audio over it, with no intermediary server. Removing that extra hop lowers transcription latency and simplifies the architecture.
Soniox's Web Library handles everything on the client side: capturing microphone input, managing the WebSocket connection, and authenticating with temporary API keys.
Use this setup when you want real-time speech-to-text performance directly in the browser with minimal delay.
Temporary API keys
Temporary API keys (obtained from the REST API) are required only to establish the WebSocket connection. Once the connection is established, it stays open for as long as it remains active, so the key does not need to outlive the connection handshake. For this reason, set the expires_in_seconds configuration parameter to a short duration.
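A minimal server-side sketch of requesting a temporary key with a short expiry. It assumes the key is created by POSTing to https://api.soniox.com/v1/auth/temporary-api-key with the permanent key as a Bearer token, a JSON body containing usage_type and expires_in_seconds, and a response that carries the key in an api_key field. None of these details come from this guide, so confirm them in the Soniox REST API reference. The fetch function is injected so the logic can run without network access.

```typescript
// Minimal subset of the fetch API used here, so a fake can be injected in tests.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// ASSUMPTION: endpoint URL, request fields, and response field names are not
// taken from this guide; check them against the Soniox REST API reference.
const TEMPORARY_KEY_URL = "https://api.soniox.com/v1/auth/temporary-api-key";

async function createTemporaryApiKey(
  permanentApiKey: string, // stays on the server, never sent to the browser
  fetchImpl: FetchLike,
  expiresInSeconds = 60, // keep short: the key is only needed to open the WebSocket
): Promise<string> {
  const response = await fetchImpl(TEMPORARY_KEY_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${permanentApiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      usage_type: "transcribe_websocket", // ASSUMED value
      expires_in_seconds: expiresInSeconds,
    }),
  });
  if (!response.ok) {
    // Request limits apply to key creation, so surface the HTTP status.
    throw new Error(`Temporary key request failed with HTTP ${response.status}`);
  }
  const data = (await response.json()) as { api_key?: unknown };
  if (typeof data.api_key !== "string") {
    throw new Error("Response did not contain an api_key string");
  }
  return data.api_key;
}
```

In production you would pass the runtime's global fetch as fetchImpl; the injected parameter only exists to keep the sketch testable.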
The following parameters are required to create a temporary API key:
API request limits apply when creating temporary API keys. See the Limits section in the Soniox Console.
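Because key creation counts against these request limits, you may also want to throttle your own key endpoint so that a single client cannot exhaust them. This guide does not prescribe that; the following is a hypothetical fixed-window limiter (the class name, limits, and client identifier are all illustrative), with an injectable clock so it can be tested deterministically.

```typescript
// Hypothetical fixed-window limiter for your own /temporary-api-key endpoint.
// It is not part of the Soniox API. Entries are never evicted in this sketch.
class FixedWindowLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly maxRequests: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(maxRequests: number, windowMs: number, now: () => number = Date.now) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true if this client may create another key in the current window.
  allowRequest(clientId: string): boolean {
    const t = this.now();
    const w = this.windows.get(clientId);
    if (w === undefined || t - w.start >= this.windowMs) {
      // First request, or the previous window has elapsed: open a new window.
      this.windows.set(clientId, { start: t, count: 1 });
      return true;
    }
    if (w.count < this.maxRequests) {
      w.count += 1;
      return true;
    }
    return false; // limit reached until the window elapses
  }
}
```

For example, the server could call limiter.allowRequest(clientAddress) before creating a key and answer with HTTP 429 (Too Many Requests) when it returns false. A fixed window is the simplest scheme; a sliding window or token bucket smooths bursts at window boundaries at the cost of a little more state.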
Example
This is an example of browser-based transcription, but the same principle applies to any other type of client: you minimize latency by connecting the client directly to the WebSocket API using a temporary API key.
First, we create a simple HTTP server that, on request:
- Renders the index.html template.
- Exposes an endpoint (/temporary-api-key) that serves the temporary API key.
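A minimal Node.js sketch of such a server, in TypeScript, using only the built-in node:http and node:fs/promises modules. It serves index.html as a static file rather than rendering a template, takes the key-creation step as a hypothetical getTemporaryKey callback, and answers GET requests only; the api_key field name in the endpoint's JSON response is our own choice, not something this guide specifies.

```typescript
import * as http from "node:http";
import { readFile } from "node:fs/promises";

// Which handler a request maps to; kept pure so it can be tested without a socket.
type Route = "index" | "temporary-api-key" | "not-found";

function resolveRoute(method: string | undefined, url: string | undefined): Route {
  const path = (url ?? "/").split("?")[0]; // ignore any query string
  if (method === "GET" && (path === "/" || path === "/index.html")) return "index";
  if (method === "GET" && path === "/temporary-api-key") return "temporary-api-key";
  return "not-found";
}

// Hypothetical callback that asks the Soniox REST API for a temporary key
// using the permanent key, which never leaves the server.
type KeyProvider = () => Promise<string>;

function createAppServer(getTemporaryKey: KeyProvider, indexPath = "index.html"): http.Server {
  return http.createServer(async (req, res) => {
    try {
      switch (resolveRoute(req.method, req.url)) {
        case "index": {
          const html = await readFile(indexPath, "utf8");
          res.writeHead(200, { "Content-Type": "text/html; charset=utf-8" });
          res.end(html);
          return;
        }
        case "temporary-api-key": {
          // Only the short-lived key is sent to the browser.
          const apiKey = await getTemporaryKey();
          res.writeHead(200, { "Content-Type": "application/json" });
          res.end(JSON.stringify({ api_key: apiKey }));
          return;
        }
        default:
          res.writeHead(404, { "Content-Type": "text/plain" });
          res.end("Not found");
      }
    } catch {
      // Missing index.html or a failed key request.
      res.writeHead(500, { "Content-Type": "text/plain" });
      res.end("Internal server error");
    }
  });
}
```

To run it, call createAppServer(getTemporaryKey).listen(3000), where getTemporaryKey is your own function that requests a temporary key from the Soniox REST API.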
Our HTML client template contains a single "Start" button that, when clicked:
- Requests microphone permissions.
- Calls the /temporary-api-key endpoint to obtain a temporary API key.
- Creates a new RecordTranscribe class instance, passing the temporary API key as the apiKey parameter.
- Connects to the WebSocket API.
- Starts transcribing microphone input and renders the transcribed text into a div in real time.
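The button handler can be sketched as a single async function. Browser-specific pieces are injected so the order of the steps is explicit and testable. The Transcriber interface is an assumed stand-in for the Web Library's RecordTranscribe class, and every other name here is hypothetical; take RecordTranscribe's real constructor options and methods from the library's reference rather than from this sketch.

```typescript
// ASSUMED shape of the transcriber: a stand-in for the Web Library's
// RecordTranscribe class. Check the library reference for the real API.
interface Transcriber {
  // Connects to the WebSocket API, streams microphone audio, and calls
  // onText with the transcript so far each time it changes.
  start(onText: (transcript: string) => void): Promise<void>;
}

// Browser-specific steps, injected so the flow can run outside a browser.
interface ClientDeps {
  requestMicrophone: () => Promise<void>; // e.g. wraps navigator.mediaDevices.getUserMedia
  fetchTemporaryKey: () => Promise<string>; // e.g. fetches /temporary-api-key
  createTranscriber: (apiKey: string) => Transcriber; // e.g. builds RecordTranscribe with apiKey
  render: (text: string) => void; // e.g. sets the output div's textContent
}

// Handler for the "Start" button, following the steps in the list in order.
async function onStartClicked(deps: ClientDeps): Promise<void> {
  // Ask for microphone permission first; this rejects if the user declines.
  await deps.requestMicrophone();
  // Get a short-lived key from our own server.
  const apiKey = await deps.fetchTemporaryKey();
  // Pass the temporary key to the transcriber as apiKey.
  const transcriber = deps.createTranscriber(apiKey);
  // Connect, stream, and re-render the transcript on every update.
  await transcriber.start((transcript) => deps.render(transcript));
}
```

In the page, the dependencies would wrap navigator.mediaDevices.getUserMedia({ audio: true }), a fetch of /temporary-api-key, the library's RecordTranscribe class, and assignment to the output div's textContent. Writing textContent rather than innerHTML keeps transcribed text from being interpreted as markup.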