Soniox

One speech AI for the world

60+ languages, real-time — all in one model

April 25, 2025 by Soniox Team

Today, we’re excited to introduce something we believe is foundational not just for voice technology, but for how billions of people around the world will interact with AI:

Soniox Speech-to-Text AI — a single, universal model that understands speech in over 60 languages, in real time, with high accuracy. And more importantly, it works the way people actually speak.


Why this matters

Despite all the advances in generative AI, the foundational challenge of understanding human speech — in all its languages, dialects, and blended forms — has still not been solved for most of the world.

Voice recognition has been limited to a small number of major languages, often with inconsistent accuracy and limited availability. The result? Billions of people and many languages have been excluded from meaningful use of voice technology.

That changes today.

At Soniox, we’ve trained a new kind of AI model: a universal speech recognition system that works for 60+ languages with high accuracy. More importantly, it works in real time — and it can handle conversations where people switch languages mid-sentence.

This unlocks the ability to build real-time, voice-powered applications for the entire world.


Understands what’s said — and who said it

Soniox Speech-to-Text AI doesn’t just transcribe words — it understands who’s speaking. Real-time speaker separation is built in and works across all 60+ languages.

That means instead of a block of unstructured text, you get a clear, labeled conversation — one that’s useful, searchable, and accurate.
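As a rough illustration of what a labeled conversation can look like in code, here is a minimal Python sketch. The token format (dicts with `speaker` and `text` keys) and the helper name `group_by_speaker` are hypothetical and are not taken from the Soniox API; see soniox.com/docs for the actual response schema.

```python
from itertools import groupby

def group_by_speaker(tokens):
    """Merge consecutive tokens from the same speaker into (speaker, text) turns."""
    turns = []
    # groupby only merges adjacent items, which preserves the turn order.
    for speaker, run in groupby(tokens, key=lambda t: t["speaker"]):
        turns.append((speaker, " ".join(t["text"] for t in run)))
    return turns

# Hypothetical speaker-tagged tokens (illustrative only, not real API output).
tokens = [
    {"speaker": "1", "text": "How"},
    {"speaker": "1", "text": "are"},
    {"speaker": "1", "text": "you?"},
    {"speaker": "2", "text": "Much"},
    {"speaker": "2", "text": "better."},
    {"speaker": "1", "text": "Great."},
]

for speaker, text in group_by_speaker(tokens):
    print(f"Speaker {speaker}: {text}")
```

Grouping adjacent tokens rather than all tokens per speaker keeps the back-and-forth of the conversation intact, which is what makes the transcript readable and searchable.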

Think about a doctor and a patient. A teacher and a student. A business team or a podcast panel. Knowing who said what isn’t optional — it’s essential to understanding the conversation.

And now, it’s available in real time, with the kind of accuracy you can actually trust in production.


One model. Problem solved.

  • One model for 60+ languages with high accuracy
  • Real-time, streaming speech recognition
  • Seamless recognition of mixed-language speech
  • High accuracy across accents and environments
  • Real-time, streaming speaker separation

This means a developer in Japan, a startup in India, or a healthcare team in Finland can finally build voice applications in their own language — with accuracy they can trust.


Supporting file transcription — even higher accuracy

In addition to real-time recognition, we’re also launching support for asynchronous (file-based) transcription.

Just upload an audio or video file and get back a high-quality, speaker-separated transcript. The async model offers even higher recognition accuracy and speaker labeling performance, making it perfect for processing:

  • Meetings and interviews
  • Podcasts and webinars
  • Call recordings and support logs
  • Video archives and media content

It’s the same universal model — just optimized for batch processing with maximum precision.


Simple, all-inclusive pricing — truly for everyone

With Soniox Speech-to-Text AI, you get full access to every feature at one price — no hidden fees, no paywalls, and no premium tiers for things that should just be standard.

Because of our efficient architecture — a single model for all languages and highly optimized inference — we’re able to keep pricing affordable for everyone:

  • Real-time: $0.12/hour
  • Async (file): $0.10/hour

Whether you’re building for a global user base, processing massive archives, or developing a consumer voice app, this pricing makes use cases viable that were too expensive with traditional APIs.
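To put the rates in perspective, here is a small Python sketch of the arithmetic. It assumes billing scales linearly with audio duration, with no minimums or rounding; that is an assumption for illustration, not a statement of the actual billing rules. The function names are hypothetical.

```python
# Rates quoted in this post, in USD per hour of audio.
REALTIME_USD_PER_HOUR = 0.12
ASYNC_USD_PER_HOUR = 0.10

def transcription_cost(audio_hours, usd_per_hour):
    """Estimated cost, assuming simple linear per-hour billing (an assumption)."""
    return audio_hours * usd_per_hour

def hours_covered(budget_usd, usd_per_hour):
    """Hours of audio a given budget would cover under the same assumption."""
    return budget_usd / usd_per_hour

# Example workloads (the hour counts are made up for illustration).
archive = transcription_cost(10_000, ASYNC_USD_PER_HOUR)
live = transcription_cost(500, REALTIME_USD_PER_HOUR)
print(f"10,000-hour archive, async: ${archive:.2f}")
print(f"500 hours of live audio: ${live:.2f}")
print(f"Real-time hours covered by $200: {hours_covered(200, REALTIME_USD_PER_HOUR):.0f}")
```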

This is speech AI that’s built — and priced — for everyone.


Try it today

You can experience Soniox in two ways:

👉 Use our API: Visit soniox.com/docs to start integrating our speech AI into your own applications. You’ll receive $200 in free credits to get started.

👉 Try the Soniox App: Our mobile app lets you transcribe conversations live, separate speakers, and generate summaries, key points, and more, right from your phone.


The future of speech starts here

This is foundational AI for the audio world. Soniox Speech-to-Text AI makes spoken language truly accessible and understandable — across all languages, for everyone.

And this is just the beginning. We’re excited to see what you create with it.


Soniox Team