Real-time translation
Learn how real-time translation works.
Overview
Soniox Speech-to-Text AI supports real-time speech translation with high accuracy and ultra-low latency. As you stream audio, Soniox transcribes speech and can translate it into another language in real time. Both transcription and translation are returned together in a single unified stream.
How it works
- Always transcribes speech: Every spoken word is transcribed, regardless of translation settings.
- Optional translation: Choose between:
  - One-way translation → Translate all speech into a single target language.
  - Two-way translation → Translate back and forth between two languages (ideal for conversations).
- Low latency: Translations are streamed in chunks, balancing speed and accuracy.
- Unified token stream: Transcriptions and translations arrive together, labeled for easy handling.
Translation modes
One-way translation
Convert all speech into one target language.
Two-way translation
Translate both ways between two specified languages. Other spoken languages are only transcribed (not translated).
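As a rough sketch, the two modes could be expressed with a small config builder. The field names used here (`type`, `target_language`, `language_a`, `language_b`) and the mode strings are assumptions for illustration and are not confirmed by this page; check the API reference for the exact request schema.

```python
from typing import Dict, Optional


def build_translation_config(
    mode: str,
    target_language: Optional[str] = None,
    language_a: Optional[str] = None,
    language_b: Optional[str] = None,
) -> Dict[str, str]:
    """Build a translation config dict (field names are assumed, not confirmed)."""
    if mode == "one_way":
        # One-way: everything is translated into a single target language.
        if not target_language:
            raise ValueError("one-way translation needs a target language")
        return {"type": "one_way", "target_language": target_language}
    if mode == "two_way":
        # Two-way: translate back and forth between exactly two languages.
        if not language_a or not language_b:
            raise ValueError("two-way translation needs two languages")
        if language_a == language_b:
            raise ValueError("two-way translation needs two different languages")
        return {"type": "two_way", "language_a": language_a, "language_b": language_b}
    raise ValueError(f"unknown translation mode: {mode!r}")


one_way = build_translation_config("one_way", target_language="en")
two_way = build_translation_config("two_way", language_a="en", language_b="de")
```

Validating the mode on the client side keeps malformed configs from reaching the stream at all, which tends to be easier to debug than a rejected request.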
Supported languages
- All pairs supported: translate between any two supported languages.
- To get the full list of supported languages, call the Get Models API endpoint.
Token format
Each result (transcription or translation) is returned as a token with clear metadata.
Field | Description |
---|---|
text | Token text |
translation_status | "none" (not translated), "original" (spoken text), "translation" (translated text) |
language | Language of the token |
source_language | Original language (only for translated tokens) |
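A minimal sketch of how a client might use these fields to split the unified stream into spoken and translated tokens. It assumes tokens have already been parsed into plain dictionaries keyed by the field names in the table; the sample tokens are made up for illustration.

```python
from typing import Dict, List, Tuple

Token = Dict[str, str]


def split_stream(tokens: List[Token]) -> Tuple[List[Token], List[Token]]:
    """Split a unified token list into (spoken, translated) tokens."""
    spoken: List[Token] = []
    translated: List[Token] = []
    for token in tokens:
        # "none" and "original" both mark spoken text;
        # only "translation" marks translated text.
        if token.get("translation_status") == "translation":
            translated.append(token)
        else:
            spoken.append(token)
    return spoken, translated


# Made-up sample tokens, for illustration only.
sample = [
    {"text": "Bonjour", "translation_status": "original", "language": "fr"},
    {
        "text": "Hello",
        "translation_status": "translation",
        "language": "en",
        "source_language": "fr",
    },
]
spoken, translated = split_stream(sample)
```

Here `spoken` holds the French token and `translated` holds the English one, whose `source_language` points back to French.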
Real-time translation examples
Soniox supports two translation modes:
- One-way translation → Convert all spoken languages into a single target language.
- Two-way translation → Translate between two specific languages only; other languages are just transcribed.
Below are examples of both modes.
One-way translation
Config
Behavior
All spoken languages are translated into English.
Text flow
JSON flow
🔑 Key takeaways
- In one-way mode, every spoken language (FR, DE, ES, …) is translated to the chosen target (EN here).
- The `translation_status` field makes it easy to separate originals from translations. `source_language` on translated tokens always tells you what the spoken language was.
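To illustrate the second takeaway, the sketch below groups translated text by the language that was originally spoken. It assumes tokens are plain dictionaries and that each token's `text` carries its own leading whitespace, so fragments can be joined directly; both are assumptions, and the sample tokens are invented.

```python
from collections import defaultdict
from typing import Dict, List


def translations_by_source(tokens: List[Dict[str, str]]) -> Dict[str, str]:
    """Join translated token text, grouped by the language that was spoken."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for token in tokens:
        # Only translated tokens carry source_language.
        if token.get("translation_status") == "translation":
            grouped[token["source_language"]].append(token["text"])
    return {lang: "".join(parts) for lang, parts in grouped.items()}


# Invented one-way (target: English) tokens, for illustration only.
tokens = [
    {"text": "Bonjour", "translation_status": "original", "language": "fr"},
    {"text": "Hello", "translation_status": "translation",
     "language": "en", "source_language": "fr"},
    {"text": " there", "translation_status": "translation",
     "language": "en", "source_language": "fr"},
    {"text": "Guten Tag", "translation_status": "original", "language": "de"},
    {"text": "Good day", "translation_status": "translation",
     "language": "en", "source_language": "de"},
]
by_source = translations_by_source(tokens)
```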
Two-way translation (EN ⟷ DE)
Config
Behavior
- English ↔ German are translated both ways.
- French (and any other language) is transcribed only.
Text flow
JSON flow
🔑 Key takeaways
- EN and DE speech gets translated both ways.
- FR speech is only transcribed (`translation_status: "none"`).
- The `source_language` field on translations shows the original language.
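The two-way rule can be mirrored on the client, for example to decide which pane of a conversation UI a translation should appear in. This is a hypothetical helper that only restates the behavior described above; it is not part of the Soniox API.

```python
from typing import Optional, Tuple


def expected_target(spoken_language: str, pair: Tuple[str, str]) -> Optional[str]:
    """Return the language a two-way setup would translate into, or None."""
    language_a, language_b = pair
    if spoken_language == language_a:
        return language_b
    if spoken_language == language_b:
        return language_a
    # Languages outside the pair are transcribed only.
    return None


pair = ("en", "de")
targets = {lang: expected_target(lang, pair) for lang in ("en", "de", "fr")}
```

With the EN ⟷ DE pair, English maps to German, German maps to English, and French maps to `None`, matching the transcribe-only case.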
🔥 With this setup, you can build live multilingual apps that transcribe, translate, and stream results in real time — all from one API.