Live speech translation API

Translate live speech across 60+ languages and 3,600 language pairs with ultra-low latency, high quality, and true real-time streaming.

Built for the hardest parts of speech translation

Speech translation breaks when systems wait too long, mistranscribe speech, or only work well for a few major languages.

Real conversations are messy. People speak with accents, switch languages, say names, addresses, emails, phone numbers, IDs, and domain-specific terms. In live conversation, every second of delay makes the experience feel broken.

Soniox speech translation is built for this reality.

It combines native-speaker speech recognition, real-time streaming translation, and high-fidelity text-to-speech into one platform for production speech translation across 60+ languages and 3,600 language pairs.

A breakthrough in real-time speech translation

Translate before the sentence ends

Soniox streams translation while speech is still happening, so users see or hear meaning immediately.

3,600 language pairs

Translate between any two of the 60+ supported languages, not just to and from English.

High quality across languages

High-quality translation across 60+ languages, including historically underserved languages.

Built on native-speaker STT accuracy

Accurate translation starts with accurate recognition across accents, multilingual speech, and language switching.

Handles names, numbers, and domain terms

Soniox preserves critical details, including names, phone numbers, emails, IDs, and domain-specific terminology.

config.json
{
  "model": "stt-rt-v4",
  "translation": {
    "type": "one_way",
    "target_language": "en"
  }
}

Transcription + translation through a single stream

Soniox speech-to-text translation is built into the Soniox Speech-to-Text API. Soniox transcribes every spoken word and translates mid-sentence. Both arrive together in a single labeled token stream.
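To make the single labeled stream concrete, here is a minimal sketch of how a client might separate transcript tokens from translation tokens. The field names `text` and `translation_status`, and the label values `original` and `translation`, are assumptions for illustration and should be checked against the API reference.

```python
from typing import Iterable, Tuple


def split_stream(tokens: Iterable[dict]) -> Tuple[str, str]:
    """Split one mixed token stream into (transcript, translation).

    Assumption: each token is a dict with a "text" field and a
    "translation_status" label. Both names are hypothetical here.
    """
    transcript, translation = [], []
    for token in tokens:
        if token.get("translation_status") == "translation":
            translation.append(token["text"])
        else:
            # Anything not labeled as translation is treated as spoken text.
            transcript.append(token["text"])
    return "".join(transcript), "".join(translation)


# Hypothetical tokens, interleaved the way a live stream might deliver them.
sample = [
    {"text": "Hola", "translation_status": "original"},
    {"text": "Hello", "translation_status": "translation"},
    {"text": " mundo", "translation_status": "original"},
    {"text": " world", "translation_status": "translation"},
]
spoken, translated = split_stream(sample)
```

Keeping both outputs keyed off one label means captions and the source transcript stay in step without a second connection.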

Turn it on by adding a translation config to your speech-to-text API request and translation will run on the same WebSocket.
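As a rough sketch, the config shown above could be built and serialized in Python before being sent as the opening message on the WebSocket. Only `model`, `translation.type`, and `translation.target_language` come from the sample config; the helper name `build_start_message` is hypothetical, and a real request will likely need additional fields (authentication, audio format) that are omitted here.

```python
import json


def build_start_message(target_language: str, model: str = "stt-rt-v4") -> str:
    """Serialize a one-way translation config as a JSON string.

    Assumption: the config is sent as the first text message on the
    WebSocket. Fields beyond the three shown are intentionally left out.
    """
    config = {
        "model": model,
        "translation": {
            "type": "one_way",
            "target_language": target_language,
        },
    }
    return json.dumps(config)


start_message = build_start_message("en")
```

Changing the target language is then a one-argument change, for example `build_start_message("de")`.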

Translate live speech to text or to spoken output

Use Soniox STT alone to stream translated text alongside the transcript, or combine STT and TTS to speak the translation in the target language. Both run on the same real-time pipeline.

Speech-to-text translation

Live speech → Transcript + translation

Translate live speech into written text using the Soniox STT API. Enable real-time translation with a simple configuration change — Soniox streams transcripts and translated text as speech happens.

Use it for captions, subtitles, meeting translation, agent assist, accessibility tools, and multilingual transcription.

Speech-to-speech translation

Live speech → Translated speech

Build full spoken translation by combining Soniox STT and Soniox TTS. Soniox recognizes speech, translates it in real time, and speaks the output in the target language with low latency.
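One way to connect the two halves is to group streamed translation text into short, speakable phrases before handing each phrase to TTS. The sketch below shows only that buffering step. The function name, the punctuation set, and the 80-character cap are all assumptions for illustration, not part of the Soniox API.

```python
from typing import Iterable, Iterator

# Clause and sentence endings that make a natural point to start speaking.
BREAKS = (".", "!", "?", ",", ";", ":", "。", "！", "？", "、")


def speakable_chunks(pieces: Iterable[str], max_chars: int = 80) -> Iterator[str]:
    """Yield phrases from streamed text pieces as soon as they can be spoken.

    A phrase is flushed at a punctuation break, or once the buffer reaches
    max_chars so long unpunctuated speech does not stall the audio.
    """
    buffer = ""
    for piece in pieces:
        buffer += piece
        stripped = buffer.strip()
        if stripped.endswith(BREAKS) or len(buffer) >= max_chars:
            if stripped:
                yield stripped
            buffer = ""
    # Flush whatever is left when the stream ends.
    if buffer.strip():
        yield buffer.strip()


phrases = list(speakable_chunks(["Hello", ",", " how", " are", " you", "?", " Fine"]))
```

Smaller chunks start audio sooner but can sound choppier, so the cap is worth tuning per language and voice.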

Use it for live interpreters, bilingual voice agents, travel assistants, customer support, and real-time multilingual communication.

One-way or two-way translation

One-way translates every speaker into a single target language. Two-way runs a live bilingual conversation between two languages, so each side speaks naturally and hears the other in their own language.

Speakers talk in Spanish, Japanese, Arabic, Hindi, Polish, and Korean.
The entire conversation is translated into English in real time.

One-way translation

Translate speech from any supported language into a single target language.

Ideal for live captions, multilingual meetings, broadcasts, lectures, events, customer calls, and products where many speakers need to be understood in one language.

Japanese speaker talks in Japanese.
English speaker hears English.
English speaker replies in English.
Japanese speaker hears Japanese.

Two-way translation

Translate between two languages for live bilingual conversation.

Soniox supports real-time two-way translation between any two supported languages, so both sides can speak naturally and understand each other instantly.
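The two-way flow can be sketched as a config plus a small direction rule: whatever one side speaks is delivered in the other side's language. The `two_way` type and the `language_a` / `language_b` field names are assumptions modeled on the one-way sample and should be checked against the API reference. Both helper functions are hypothetical.

```python
def two_way_translation(language_a: str, language_b: str) -> dict:
    """Build a two-way translation section (field names are assumptions)."""
    if language_a == language_b:
        raise ValueError("two-way translation needs two different languages")
    return {"type": "two_way", "language_a": language_a, "language_b": language_b}


def listener_language(spoken: str, config: dict) -> str:
    """Return the language the other side should receive."""
    a, b = config["language_a"], config["language_b"]
    if spoken == a:
        return b
    if spoken == b:
        return a
    raise ValueError(f"{spoken!r} is not part of this conversation")


config = two_way_translation("ja", "en")
heard_by_english_speaker = listener_language("ja", config)
heard_by_japanese_speaker = listener_language("en", config)
```

This mirrors the Japanese and English exchange described above: Japanese speech is delivered in English, and the English reply is delivered in Japanese.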

Speech translation for global products

Voice agents

Build multilingual voice agents that understand users in one language and respond in another.

Use for support, sales, scheduling, healthcare, and global voice automation.

Live interpreters

Create real-time interpreter experiences for conversations, meetings, events, and business communication.

Bilingual conversation feels immediate instead of delayed.

Multilingual meetings

Translate live meetings across languages with captions, transcripts, summaries, and action items.

Support global teams without forcing everyone into English.

Customer support and contact centers

Translate live customer calls while preserving names, numbers, addresses, and verification codes.

Give agents and customers a smoother multilingual experience.

Captions and subtitles

Generate real-time translated captions for broadcasts, webinars, classrooms, and live streams.

Translate speech as it happens, without long caption delays.

Accessibility and communication tools

Build assistive products that help people understand speech across languages in real time.

Live captions, translated transcripts, and spoken translation.

Built on the Soniox speech AI platform

Soniox speech translation is powered by the same infrastructure behind Soniox STT and TTS.

Speech-to-Text

Native-speaker accuracy across 60+ languages, with support for multilingual speech, alphanumerics, speaker diarization, and context.

Translation

Real-time streaming translation across 3,600 language pairs, built for high quality and low delay across all supported languages.

Text-to-speech

High-fidelity speech generation in 60+ languages, built for names, alphanumerics, language switching, and ultra-low-latency streaming.

Together, they create a complete real-time, low-latency speech AI platform.

Simple, usage-based pricing

Start translating live audio streams from ~$0.18/hour.

Translation is already built into the Soniox Speech-to-Text API. When turned on, it adds about $0.06/hour in output token costs.
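For a rough budget, the two quoted figures can be turned into a quick estimate. This sketch assumes the ~$0.18/hour rate already includes the ~$0.06/hour translation add-on, which the figures suggest but do not state outright. Both rates are approximate, so treat the output as a ballpark rather than a quote.

```python
from decimal import Decimal

# Approximate rates quoted above, in USD per hour of audio.
TRANSLATED_RATE = Decimal("0.18")    # assumed to include translation
TRANSLATION_ADDON = Decimal("0.06")  # output token cost added by translation


def estimate_cost(hours) -> dict:
    """Estimate total cost and the share attributable to translation."""
    hours = Decimal(str(hours))
    cent = Decimal("0.01")
    total = (hours * TRANSLATED_RATE).quantize(cent)
    translation = (hours * TRANSLATION_ADDON).quantize(cent)
    return {
        "total": total,
        "translation_share": translation,
        "speech_to_text_share": total - translation,
    }


monthly = estimate_cost(100)  # e.g. 100 hours of translated audio
```

Under that assumption, 100 hours of translated audio comes to about $18, of which about $6 is the translation add-on.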

Frequently asked questions

What is Soniox speech translation?
Soniox speech translation is a real-time platform that translates live speech across 60+ languages and 3,600 language pairs with ultra-low latency. It combines native-speaker speech recognition, real-time streaming translation, and high-fidelity text-to-speech into one platform.
How many languages are supported?
Soniox supports 60+ languages and 3,600 language pairs, so you can translate between any two supported languages.
What is the difference between speech-to-text and speech-to-speech translation?
Speech-to-text translation takes live speech and outputs a transcript plus real-time translated text, using the Soniox STT API. Speech-to-speech translation combines Soniox STT and Soniox TTS to take live speech and output translated speech in the target language with low latency.
What is the difference between one-way and two-way translation?
One-way translation translates speech from any supported language into one target language — ideal for live captions, multilingual meetings, broadcasts, lectures, and customer calls. Two-way translation translates between two languages for live bilingual conversation, so both sides can speak naturally and understand each other instantly.
Does Soniox handle accents and multilingual speech?
Yes. Soniox STT is built for native-speaker accuracy and handles accents, multilingual speech, and language switching across 60+ languages.
Can Soniox handle names, numbers, and domain-specific terms?
Yes. Soniox preserves the details that matter, including names, phone numbers, emails, IDs, addresses, verification codes, and domain-specific terminology.
What can I build with Soniox speech translation?
Common use cases include multilingual voice agents, live interpreters, multilingual meetings, customer support and contact centers, real-time translated captions and subtitles, and accessibility and communication tools.
How fast is the translation?
Soniox streams translation while speech is still happening, so users see or hear meaning immediately. Translation arrives before the sentence ends rather than after a long delay.

Ready to get started?

Create an account instantly, or contact us to design a custom package for your business.

Build with API

Documentation

Get up and running in minutes and spend your time building the product, not wrestling with the API.

Explore docs

See what you’ll pay

Pay only for what you use with our flexible pricing. Built to scale with you.

Pricing details