Real-time translation

Overview

Soniox Speech-to-Text AI supports real-time speech translation with high accuracy and ultra-low latency. As you stream audio, Soniox transcribes speech and can translate it into another language in real time. Both transcription and translation are returned together in a single unified stream.

How it works

- Always transcribes speech: Every spoken word is transcribed, regardless of translation settings.
- Optional translation: Choose between:
  - One-way translation → Translate all speech into a single target language.
  - Two-way translation → Translate back and forth between two languages (ideal for conversations).
- Low latency: Translations are streamed in chunks, balancing speed and accuracy.
- Unified token stream: Transcriptions and translations arrive together, labeled for easy handling.

Translation modes

One-way translation

Convert all speech into one target language.

Two-way translation

Translate both ways between two specified languages. Other spoken languages are only transcribed (not translated).
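
As a minimal sketch, the two modes can be expressed as small Python helpers that build the `translation` config object. The field names (`type`, `target_language`, `language_a`, `language_b`) are taken from the config examples on this page; the helper names and the check for identical languages are assumptions added for illustration, not part of the Soniox API.

```python
import json


def one_way_config(target_language: str) -> dict:
    """Build a config that translates all speech into one target language."""
    return {"translation": {"type": "one_way", "target_language": target_language}}


def two_way_config(language_a: str, language_b: str) -> dict:
    """Build a config that translates back and forth between two languages."""
    # Assumption: a two-way pair with identical languages is treated as a
    # caller mistake here; the API's own validation may differ.
    if language_a == language_b:
        raise ValueError("two-way translation needs two different languages")
    return {
        "translation": {
            "type": "two_way",
            "language_a": language_a,
            "language_b": language_b,
        }
    }


if __name__ == "__main__":
    # Language codes are illustrative; check the supported list before use.
    print(json.dumps(one_way_config("es"), indent=2))
    print(json.dumps(two_way_config("ja", "ko"), indent=2))
```

The resulting dictionary can be merged into whatever request configuration your client sends when it opens the stream.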

Supported languages

All language pairs are supported: you can translate between any two supported languages.
To get the full list of supported languages, call the Get Models API endpoint.

Token format

Each result (transcription or translation) is returned as a token with clear metadata.

| Field | Description |
| --- | --- |
| `text` | Token text |
| `translation_status` | `"none"` (not translated), `"original"` (spoken text), `"translation"` (translated text) |
| `language` | Language of the token |
| `source_language` | Original language (only for translated tokens) |
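
Because every token carries `translation_status`, separating spoken text from translated text takes only a few lines. The sketch below assumes the tokens have already been parsed from JSON into Python dictionaries; `split_tokens` is a hypothetical helper name, and treating a missing `translation_status` as spoken text is an assumption made for robustness.

```python
from typing import Iterable


def split_tokens(tokens: Iterable[dict]) -> tuple[list[dict], list[dict]]:
    """Return (spoken, translated) token lists, preserving stream order."""
    spoken: list[dict] = []
    translated: list[dict] = []
    for token in tokens:
        if token.get("translation_status") == "translation":
            translated.append(token)
        else:
            # "original" and "none" are both transcriptions of spoken audio.
            # Assumption: a token without the field is also treated as spoken.
            spoken.append(token)
    return spoken, translated
```

Keeping the two lists separate lets an application show the transcript and its translation in different views while still consuming a single stream.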

Real-time translation examples

Soniox supports two translation modes:

One-way translation → Convert all spoken languages into a single target language.
Two-way translation → Translate between two specific languages only; other languages are just transcribed.

Below are examples of both modes.

One-way translation

Config

{
  "translation": {
    "type": "one_way",
    "target_language": "en"
  }
}

Behavior

All spoken languages are translated into English.

Text flow

[fr] Bonjour
[en] Hello

[de] Wie geht’s?
[en] How are you?

[fr] Très bien, merci.
[en] Very well, thank you.

JSON flow

{
  "text": "Bonjour",
  "translation_status": "original",
  "language": "fr"
}
{
  "text": "Hello",
  "translation_status": "translation",
  "language": "en",
  "source_language": "fr"
}
{
  "text": "Wie geht’s?",
  "translation_status": "original",
  "language": "de"
}
{
  "text": "How are you?",
  "translation_status": "translation",
  "language": "en",
  "source_language": "de"
}
{
  "text": "Très bien, merci.",
  "translation_status": "original",
  "language": "fr"
}
{
  "text": "Very well, thank you.",
  "translation_status": "translation",
  "language": "en",
  "source_language": "fr"
}

🔑 Key takeaways

- In one-way mode, every spoken language (FR, DE, ES, …) is translated into the chosen target language (EN here).
- The `translation_status` field makes it easy to separate originals from translations.
- On translated tokens, `source_language` always tells you which language was spoken.
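
To illustrate the role of `source_language`, here is a small sketch that groups translated text by the language it was spoken in. It assumes tokens shaped like the JSON flow above, already parsed into dictionaries; `translations_by_source` is a hypothetical helper, not a Soniox function.

```python
from collections import defaultdict
from typing import Iterable


def translations_by_source(tokens: Iterable[dict]) -> dict[str, list[str]]:
    """Map each spoken language to the translated texts produced from it."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for token in tokens:
        # Only translated tokens carry source_language; skip everything else.
        if token.get("translation_status") != "translation":
            continue
        grouped[token["source_language"]].append(token["text"])
    return dict(grouped)
```

Applied to the six tokens in the JSON flow above, this would group "Hello" and "Very well, thank you." under `fr` and "How are you?" under `de`.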

Two-way translation (EN ⟷ DE)

Config

{
  "translation": {
    "type": "two_way",
    "language_a": "en",
    "language_b": "de"
  }
}

Behavior

English and German are translated in both directions.
French (and any other language) is transcribed only.

Text flow

[en] Good morning
[de] Guten Morgen

[de] Wie geht’s?
[en] How are you?

[fr] Bonjour à tous
(fr is only transcribed, not translated)

[en] I’m fine, thanks.
[de] Mir geht’s gut, danke.

JSON flow

{
  "text": "Good morning",
  "start_ms": 1000,
  "end_ms": 1600,
  "translation_status": "original",
  "language": "en"
}
{
  "text": "Guten Morgen",
  "translation_status": "translation",
  "language": "de",
  "source_language": "en"
}
{
  "text": "Wie geht’s?",
  "start_ms": 1700,
  "end_ms": 2400,
  "translation_status": "original",
  "language": "de"
}
{
  "text": "How are you?",
  "translation_status": "translation",
  "language": "en",
  "source_language": "de"
}
{
  "text": "Bonjour à tous",
  "start_ms": 2500,
  "end_ms": 3100,
  "translation_status": "none",
  "language": "fr"
}
{
  "text": "I’m fine, thanks.",
  "start_ms": 3200,
  "end_ms": 3800,
  "translation_status": "original",
  "language": "en"
}
{
  "text": "Mir geht’s gut, danke.",
  "translation_status": "translation",
  "language": "de",
  "source_language": "en"
}

🔑 Key takeaways

- EN and DE speech is translated in both directions.
- FR speech is only transcribed (`translation_status: "none"`).
- The `source_language` field on translations shows the original language.
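
One way to use a two-way stream in a conversation app is to keep one running transcript per language, so each participant reads everything in their own language. The sketch below is an illustration under assumptions: tokens are already parsed into dictionaries shaped like the JSON flow above, and `build_panes` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Iterable


def build_panes(tokens: Iterable[dict]) -> tuple[dict[str, list[str]], list[dict]]:
    """Return (per-language text panes, tokens that were not translated)."""
    panes: dict[str, list[str]] = defaultdict(list)
    untranslated: list[dict] = []
    for token in tokens:
        if token.get("translation_status") == "none":
            # Speech outside the configured language pair: transcribed only.
            untranslated.append(token)
        else:
            # "original" and "translation" tokens both belong in the pane
            # for the language their text is written in.
            panes[token["language"]].append(token["text"])
    return dict(panes), untranslated
```

With the EN ⟷ DE example above, the `en` pane would hold the English originals plus the English translation of the German speech, the `de` pane the reverse, and the French token would land in the untranslated list.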

🔥 With this setup, you can build live multilingual apps that transcribe, translate, and stream results in real time — all from one API.
