Proxy stream
How to stream audio from a client app to the Soniox Speech-to-Text WebSocket API through a proxy server.
Overview
This guide explains how to stream microphone audio from a client to the Soniox WebSocket API through a proxy server.
In this architecture, the client captures audio and sends it over WebSocket to a proxy server. The proxy server establishes a connection to the Soniox WebSocket API, authenticates the session, streams the audio for transcription, and relays the transcribed results back to the client in real time.
This setup is useful when you want to inspect, transform, or store audio and transcription data on the server side before passing it to the client. If your goal is simply to transcribe audio and return results with the lowest possible latency, consider using the direct stream approach instead.
Example
In the following example, we create a proxy HTTP server that:
- Listens for incoming WebSocket connections from the client.
- Forwards audio data from the client to the WebSocket API.
- Relays transcription results back to the client.
Authentication with the WebSocket API is handled by the proxy server using the SONIOX_API_KEY.
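The proxy's three responsibilities can be sketched in a transport-agnostic way. The following is a minimal Python sketch, not the actual server implementation: `relay`, `build_start_message`, and `load_api_key` are hypothetical names, and the fields of the start message (`api_key`, `audio_format`) are assumptions that should be checked against the WebSocket API reference. The two connection arguments can be any objects that support `async for` over incoming messages and an awaitable `send`, which is the interface exposed by connection objects in common WebSocket libraries.

```python
import asyncio
import json
import os


def load_api_key() -> str:
    # The key lives only on the proxy server, never in the client.
    return os.environ["SONIOX_API_KEY"]


def build_start_message(api_key: str) -> str:
    # Assumed shape of the first upstream message; field names are
    # illustrative and must be verified against the API reference.
    return json.dumps({"api_key": api_key, "audio_format": "auto"})


async def relay(client, upstream, api_key: str) -> None:
    """Forward audio client -> upstream and results upstream -> client."""
    # Authenticate the upstream session before any audio is sent.
    await upstream.send(build_start_message(api_key))

    async def forward_audio() -> None:
        # Each message from the client is a chunk of encoded audio.
        async for chunk in client:
            # Server-side inspection, transformation, or storage of
            # audio would go here.
            await upstream.send(chunk)

    async def relay_results() -> None:
        # Each message from upstream is a transcription result.
        async for result in upstream:
            await client.send(result)

    # Run both directions concurrently until both streams end.
    await asyncio.gather(forward_audio(), relay_results())
```

A production proxy would also need end-of-stream signalling, error handling, and cleanup when either side disconnects; those are left out to keep the data flow visible.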
Next, we create a basic HTML page as the client (same concept works for any other app framework).
The HTML client:
- Connects to the proxy server via WebSocket.
- Captures audio stream from the microphone through the MediaRecorder.
- Streams audio data to the proxy server.
- Receives messages from the proxy server and renders transcribed text into a div.
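The receive-and-render step of the client can be illustrated independently of the browser. The sketch below uses Python to show the logic only; in the HTML client the same steps would run in the WebSocket message handler and write the returned string into the div. It assumes each message is a JSON object with a `tokens` list whose entries carry `text` and an `is_final` flag, and that non-final tokens are superseded by later messages. That message shape is an assumption to verify against the API reference, and `TranscriptView` is a hypothetical name.

```python
import json


class TranscriptView:
    """Accumulates final text and tracks the latest provisional text."""

    def __init__(self) -> None:
        self.final_text = ""
        self.pending_text = ""

    def handle_message(self, raw: str) -> str:
        # Assumed message shape: {"tokens": [{"text": ..., "is_final": ...}]}
        message = json.loads(raw)
        pending = []
        for token in message.get("tokens", []):
            text = token.get("text", "")
            if token.get("is_final"):
                # Final tokens are appended once and never revised.
                self.final_text += text
            else:
                # Non-final tokens replace the previous provisional text.
                pending.append(text)
        self.pending_text = "".join(pending)
        # This string is what the client would render.
        return self.final_text + self.pending_text
```

Keeping final and provisional text separate lets the display update in real time without duplicating words once a provisional token is finalized.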