With accuracy as much as 300% higher than legacy speech systems, Soniox lays the crucial foundation for any voice application.

Many believe that speech recognition is a solved problem. This may be true in perfect, lab-like environments, where the speaker talks in flawless English and complete sentences, where there is no crosstalk or background noise, and where the audio is recorded with $1000 microphones. In the real world, however, we work in home offices with kids inside and construction outside; we speak to customer service on low-quality phone lines; we hold meetings where we are constantly interrupted; and we speak in jargon that only our colleagues understand. It is precisely in these moments that legacy speech systems fail us, often yielding transcriptions with error rates as high as 70%.

Go Beyond Speech-to-Text

Soniox is the first and only artificial intelligence that understands real-world spoken conversations. Regardless of background noise, poor audio quality, heavy accents or domain-specific language, Soniox consistently delivers transcriptions with unrivaled accuracy.

Rethinking the Problem

Understanding spoken language is an insanely difficult problem. To solve it, we had to completely rethink the essence of the problem. Contrary to popular belief, speech recognition is not merely translating acoustic data into text. So what is speech recognition, really? Simply put, it is language understanding driven by an audio signal. Solving this problem requires an enormous amount of knowledge about:

  • The World - people, places, food, sports, science, technology, health, etc.
  • Language - grammar, punctuation, capitalization, etc.
  • Audio - relations between frequencies, amplitudes and so on

Currently, legacy speech systems attempt to address these knowledge gaps by acquiring human-transcribed audio, which not only costs millions of dollars and takes years to collect, but also throttles accuracy because labeled data covers only a limited range of domains and languages. The speech industry was completely bogged down by endless logistical efforts to label data until Soniox decisively stepped up to the challenge.

Shattering the Paradigm

With our groundbreaking self-learning AI, Soniox completely shattered the legacy paradigm of labeling data, training on it, then rinsing and repeating. Trained with publicly available unlabeled audio and text data from the internet, Soniox is now the most accurate speech recognition AI in history. Learning from boundless amounts of data from all domains, all corners of the world and all walks of life, Soniox is condition-agnostic, delivering unrivaled accuracy in any environment, any medium, any accent and any domain, thereby enabling traditionally challenging applications such as medical transcription.

In addition to achieving stunning accuracy in speech recognition, Soniox reinvented neural network architectures, propelling real-time transcription accuracy and latency to a whole new level and making truly automatic live captioning a reality. Finally, it’s time for traditional TV broadcasting to part ways with human-labeled transcriptions, which have latencies on the order of 8-10 seconds. It’s time for webinar, remote collaboration and contact center platforms to embrace truly accurate, real-time, low-latency Speech AI.

So shed legacy speech-to-text that simply doesn’t work and build real-world voice applications with Soniox today.

