Real-time translation

Learn how real-time translation works.

Overview

Soniox Speech-to-Text AI supports real-time speech translation with high accuracy and ultra-low latency. As you stream audio, Soniox transcribes speech and can translate it into another language in real time. Both transcription and translation are returned together in a single unified stream.


How it works

  • Always transcribes speech: Every spoken word is transcribed, regardless of translation settings.
  • Optional translation: Choose between:
    • One-way translation → Translate all speech into a single target language.
    • Two-way translation → Translate back and forth between two languages (ideal for conversations).
  • Low latency: Translations are streamed in chunks, balancing speed and accuracy.
  • Unified token stream: Transcriptions and translations arrive together, labeled for easy handling.
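
Because both kinds of text arrive in one stream, a client usually has to split them apart first. The sketch below shows one way to do that, assuming tokens have already been decoded into Python dicts carrying the translation_status field described under Token format. The function name split_stream is hypothetical and not part of any Soniox SDK.

```python
from typing import Iterable


def split_stream(tokens: Iterable[dict]) -> tuple[list[str], list[str]]:
    """Separate spoken-text tokens from translated tokens (illustrative helper)."""
    spoken: list[str] = []
    translated: list[str] = []
    for token in tokens:
        # "translation" marks translated text; "original" and "none" are spoken text.
        if token.get("translation_status") == "translation":
            translated.append(token["text"])
        else:
            spoken.append(token["text"])
    return spoken, translated


example_tokens = [
    {"text": "Bonjour", "translation_status": "original", "language": "fr"},
    {"text": "Hello", "translation_status": "translation",
     "language": "en", "source_language": "fr"},
]
spoken, translated = split_stream(example_tokens)
print(spoken)      # ['Bonjour']
print(translated)  # ['Hello']
```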

Translation modes

One-way translation

Convert all speech into one target language.

Two-way translation

Translate both ways between two specified languages. Other spoken languages are only transcribed (not translated).
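
As a rough sketch, the two modes can be expressed as a small helper that builds the translation config. The field names (type, target_language, language_a, language_b) are the ones used in the config examples on this page; the helper build_translation_config and its validation rules are assumptions made for illustration, not documented API behavior.

```python
def build_translation_config(mode: str, *languages: str) -> dict:
    """Build a translation config dict for one of the two modes (hypothetical helper)."""
    if mode == "one_way":
        if len(languages) != 1:
            raise ValueError("one_way needs exactly one target language")
        return {"translation": {"type": "one_way",
                                "target_language": languages[0]}}
    if mode == "two_way":
        # Requiring two distinct languages is this sketch's own check.
        if len(languages) != 2 or languages[0] == languages[1]:
            raise ValueError("two_way needs two distinct languages")
        return {"translation": {"type": "two_way",
                                "language_a": languages[0],
                                "language_b": languages[1]}}
    raise ValueError(f"unknown translation mode: {mode!r}")


print(build_translation_config("one_way", "en"))
print(build_translation_config("two_way", "en", "de"))
```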


Supported languages

  • All pairs are supported: translate between any two supported languages.
  • To get the full list, call the Get Models API endpoint.

Token format

Each result (transcription or translation) is returned as a token with clear metadata.

Field                Description
text                 Token text
translation_status   "none" (spoken text that is not translated), "original" (spoken text that is translated), "translation" (translated text)
language             Language of the token
source_language      Original language (only for translated tokens)
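
One way to work with these fields in client code is to parse each token into a small typed record. The sketch below is a minimal example under stated assumptions: the Token class is hypothetical, treating a missing translation_status as "none" is a guess about sessions without translation, and any extra fields on the token are simply ignored.

```python
from dataclasses import dataclass
from typing import Optional

VALID_STATUSES = {"none", "original", "translation"}


@dataclass(frozen=True)
class Token:
    """Typed view of one token (illustrative, not an official SDK type)."""
    text: str
    translation_status: str
    language: Optional[str] = None
    source_language: Optional[str] = None

    @classmethod
    def from_dict(cls, raw: dict) -> "Token":
        # Assumption: a missing status means the token was not translated.
        status = raw.get("translation_status", "none")
        if status not in VALID_STATUSES:
            raise ValueError(f"unexpected translation_status: {status!r}")
        return cls(
            text=raw["text"],
            translation_status=status,
            language=raw.get("language"),
            # Only translated tokens carry source_language.
            source_language=raw.get("source_language"),
        )

    @property
    def is_translation(self) -> bool:
        return self.translation_status == "translation"


hello = Token.from_dict({"text": "Hello", "translation_status": "translation",
                         "language": "en", "source_language": "fr"})
print(hello.is_translation, hello.source_language)  # True fr
```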

Real-time translation examples

Soniox supports two translation modes:

  • One-way translation → Convert all spoken languages into a single target language.
  • Two-way translation → Translate between two specific languages only; other languages are just transcribed.

Below are examples of both modes.


One-way translation

Config

{
  "translation": {
    "type": "one_way",
    "target_language": "en"
  }
}

Behavior

All spoken languages are translated into English.

Text flow

[fr] Bonjour
[en] Hello

[de] Wie geht’s?
[en] How are you?

[fr] Très bien, merci.
[en] Very well, thank you.

JSON flow

{
  "text": "Bonjour",
  "translation_status": "original",
  "language": "fr"
}
{
  "text": "Hello",
  "translation_status": "translation",
  "language": "en",
  "source_language": "fr"
}
{
  "text": "Wie geht’s?",
  "translation_status": "original",
  "language": "de"
}
{
  "text": "How are you?",
  "translation_status": "translation",
  "language": "en",
  "source_language": "de"
}
{
  "text": "Très bien, merci.",
  "translation_status": "original",
  "language": "fr"
}
{
  "text": "Very well, thank you.",
  "translation_status": "translation",
  "language": "en",
  "source_language": "fr"
}

🔑 Key takeaways

  • In one-way mode, every spoken language (FR, DE, ES, …) is translated to the chosen target (EN here).
  • The translation_status field makes it easy to separate originals vs. translations.
  • source_language on translated tokens always tells you what the spoken language was.
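
To illustrate the last two points, the sketch below groups translated text by the language that was originally spoken. It assumes tokens are plain dicts shaped like the JSON flow above, with each token carrying a whole phrase as in that example; the function name translations_by_source is hypothetical.

```python
from collections import defaultdict


def translations_by_source(tokens: list[dict]) -> dict[str, list[str]]:
    """Map each spoken language to the translated text it produced."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for token in tokens:
        # Originals are skipped; only translated tokens carry source_language.
        if token["translation_status"] == "translation":
            grouped[token["source_language"]].append(token["text"])
    return dict(grouped)


one_way_tokens = [
    {"text": "Bonjour", "translation_status": "original", "language": "fr"},
    {"text": "Hello", "translation_status": "translation",
     "language": "en", "source_language": "fr"},
    {"text": "Guten Tag", "translation_status": "original", "language": "de"},
    {"text": "Good day", "translation_status": "translation",
     "language": "en", "source_language": "de"},
]
print(translations_by_source(one_way_tokens))
# {'fr': ['Hello'], 'de': ['Good day']}
```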

Two-way translation (EN ⟷ DE)

Config

{
  "translation": {
    "type": "two_way",
    "language_a": "en",
    "language_b": "de"
  }
}

Behavior

  • English and German are translated in both directions.
  • French (and any other language) is transcribed only.

Text flow

[en] Good morning
[de] Guten Morgen

[de] Wie geht’s?
[en] How are you?

[fr] Bonjour à tous
(French is only transcribed, not translated)

[en] I’m fine, thanks.
[de] Mir geht’s gut, danke.

JSON flow

{
  "text": "Good morning",
  "start_ms": 1000,
  "end_ms": 1600,
  "translation_status": "original",
  "language": "en"
}
{
  "text": "Guten Morgen",
  "translation_status": "translation",
  "language": "de",
  "source_language": "en"
}
{
  "text": "Wie geht’s?",
  "start_ms": 1700,
  "end_ms": 2400,
  "translation_status": "original",
  "language": "de"
}
{
  "text": "How are you?",
  "translation_status": "translation",
  "language": "en",
  "source_language": "de"
}
{
  "text": "Bonjour à tous",
  "start_ms": 2500,
  "end_ms": 3100,
  "translation_status": "none",
  "language": "fr"
}
{
  "text": "I’m fine, thanks.",
  "start_ms": 3200,
  "end_ms": 3800,
  "translation_status": "original",
  "language": "en"
}
{
  "text": "Mir geht’s gut, danke.",
  "translation_status": "translation",
  "language": "de",
  "source_language": "en"
}

🔑 Key takeaways

  • EN and DE speech is translated in both directions.
  • FR speech is only transcribed (translation_status: "none").
  • The source_language field on translations shows the original language.
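
In a two-person conversation, each participant typically wants to read everything in their own language. The sketch below builds that view from a two-way token stream: it keeps tokens whose language matches the reader (spoken in, or translated into, that language) and tags tokens from outside the configured pair as untranslated. The function name and the tag format are assumptions for illustration only.

```python
def transcript_for_reader(tokens: list[dict], reader_language: str) -> list[str]:
    """Lines that a reader of `reader_language` can follow in a two-way session."""
    lines: list[str] = []
    for token in tokens:
        if token["language"] == reader_language:
            # Spoken in the reader's language, or translated into it.
            lines.append(token["text"])
        elif token["translation_status"] == "none":
            # Outside the configured pair: transcribed only, so show it tagged.
            lines.append(f"[{token['language']}, untranslated] {token['text']}")
    return lines


two_way_tokens = [
    {"text": "Good morning", "translation_status": "original", "language": "en"},
    {"text": "Guten Morgen", "translation_status": "translation",
     "language": "de", "source_language": "en"},
    {"text": "Bonjour à tous", "translation_status": "none", "language": "fr"},
]
print(transcript_for_reader(two_way_tokens, "de"))
# ['Guten Morgen', '[fr, untranslated] Bonjour à tous']
```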

🔥 With this setup, you can build live multilingual apps that transcribe, translate, and stream results in real time — all from one API.