Speech-to-speech translation

Demo app showing how to add real-time speech-to-speech translation with Soniox.

Overview

The Soniox speech-to-speech translation demo app shows how to combine the Soniox real-time speech-to-text (with translation) and real-time text-to-speech WebSocket APIs into a complete speech-to-speech translation pipeline: speak in one language and hear the translation in another, in real time.

This is a reference implementation for developers who want to learn how to wire the two real-time APIs together at the protocol level, without an SDK. The backend is a small FastAPI service; the frontend is a vanilla HTML/JS page.

Features

  • Stream audio from an audio file or your mic to Soniox in real time
  • Live transcription in 60+ languages, with automatic source-language detection
  • Mid-sentence speech translation into 60+ languages: translation tokens stream as you talk
  • Live spoken translation through one of the Soniox voices, played back to you
  • Optional speaker diarization and language identification
  • Toggle for text-only translation mode that skips TTS entirely
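The optional features above map naturally onto the configuration message a client sends when it opens the speech-to-text WebSocket. The following is a minimal sketch of how a backend might build that message. The helper name build_stt_config is hypothetical, and the field names and model name are assumptions for illustration, so check them against the Soniox API reference before relying on them.

```python
import json


def build_stt_config(api_key, target_language, diarization=False, language_id=False):
    """Build the first JSON message sent on the speech-to-text WebSocket.

    All field names and the model name below are assumptions for
    illustration; verify them against the Soniox API reference.
    """
    config = {
        "api_key": api_key,
        "model": "stt-rt-preview",  # assumed model name
        "audio_format": "auto",     # assumed: let the server detect the format
        # One-way translation: all speech is translated into one target language.
        "translation": {"type": "one_way", "target_language": target_language},
    }
    # Optional features are only included when they are switched on.
    if diarization:
        config["enable_speaker_diarization"] = True
    if language_id:
        config["enable_language_identification"] = True
    return json.dumps(config)
```

Keeping the optional flags out of the message unless they are enabled keeps the default request small and makes each sidebar toggle correspond to exactly one field.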

Usage flow

  1. Pick a target language and a voice in the sidebar
  2. Tap Start talking to begin streaming from your mic
  3. The original transcript appears on the left; the translation appears on the right, word by word
  4. The translated speech plays through your speakers in the chosen voice
  5. Tap Stop to end the session
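Step 3 depends on sorting each incoming token into the correct column and on handling provisional text that may still change. The demo does this in browser JavaScript; the sketch below expresses the same idea in Python for consistency with the backend. It assumes each token is a dict with text, is_final, and translation_status fields, and that non-final tokens are re-sent in full with every message. These are assumptions about the token format, not a confirmed schema, and the class name TranscriptColumns is hypothetical.

```python
class TranscriptColumns:
    """Accumulate original and translated text from streamed tokens."""

    def __init__(self):
        # Final text never changes once received.
        self.final = {"original": "", "translation": ""}
        # Pending text is provisional and replaced on every update.
        self.pending = {"original": "", "translation": ""}

    def update(self, tokens):
        # Assumption: non-final tokens are repeated in each message,
        # so the previous provisional text is discarded first.
        self.pending = {"original": "", "translation": ""}
        for token in tokens:
            is_translation = token.get("translation_status") == "translation"
            column = "translation" if is_translation else "original"
            target = self.final if token.get("is_final") else self.pending
            target[column] += token["text"]

    def render(self, column):
        # What the UI would show: confirmed text followed by provisional text.
        return self.final[column] + self.pending[column]
```

A typical use would call update() once per message from the server and then redraw both columns from render("original") and render("translation").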

Architecture

  • Server (Python / FastAPI): Holds your Soniox API key, accepts a WebSocket from the browser, and proxies audio and tokens between the browser and Soniox. Manages the per-utterance TTS stream lifecycle, pre-warming, and connection keepalive.
  • Frontend (vanilla HTML / JS): Captures audio from a file or the microphone with MediaRecorder, streams the bytes to the backend over WebSocket, renders incoming token JSON into the transcript columns, and plays incoming PCM audio through the Web Audio API.
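The proxy role of the server comes down to relaying messages in both directions at the same time. Here is a minimal sketch of that shape using only asyncio, with queues standing in for the browser WebSocket and the Soniox connection. The names pump, proxy_session, and demo are hypothetical, and the real backend would replace the queues with actual WebSocket objects and add the TTS stream handling, pre-warming, and keepalive described above.

```python
import asyncio

END = None  # sentinel meaning "this side has closed"


async def pump(source, sink):
    """Forward messages from source to sink until the END sentinel arrives."""
    while True:
        message = await source.get()
        await sink.put(message)
        if message is END:
            return


async def proxy_session(browser_in, soniox_out, soniox_in, browser_out):
    """Relay both directions concurrently so neither side blocks the other."""
    await asyncio.gather(
        pump(browser_in, soniox_out),  # audio bytes: browser -> Soniox
        pump(soniox_in, browser_out),  # token JSON and audio: Soniox -> browser
    )


async def demo():
    """Run one simulated session and return what each side received."""
    browser_in, soniox_out, soniox_in, browser_out = (
        asyncio.Queue() for _ in range(4)
    )
    for message in (b"\x00\x01", END):
        browser_in.put_nowait(message)
    for message in ('{"tokens": []}', END):
        soniox_in.put_nowait(message)
    await proxy_session(browser_in, soniox_out, soniox_in, browser_out)
    return soniox_out.get_nowait(), browser_out.get_nowait()
```

Because the browser only ever talks to the proxy, the API key can be attached on the server side and never has to reach the page.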

The source code is available in our GitHub examples repo.