June 21, 2022 by Cathy Xi, Klemen Simonic

Soniox Speaker – Diarization 5x More Accurate than Leading Providers

Building Conversational AI or Voice Applications?

Conversations = People Talking

Knowing who said what in a conversation is a crucial and fundamental element of building any voice application or conversational AI. Yet this seemingly basic, even obvious, piece of information has eluded the greatest AI researchers for decades. Leading providers of speaker diarization, such as Google, Amazon and Microsoft, incur an average error of 25%, meaning one quarter of speech is assigned to the wrong speaker, which makes the recognized speaker information unusable for downstream applications.

Actual Conversation:
Speaker A: Have you seen our Europe vacation deals?
Speaker B: No, I cannot enter Europe.

Wrong Transcription:
Speaker C: Have you seen
Speaker B: our Europe vacation deals?
Speaker A: No, I cannot
Speaker C: enter Europe.

Attempting to develop conversational AI without knowing who asked a question or who gave an answer is like a blind man putting drakes and hens in a pen, then eagerly waiting for fertile eggs. In fact, without reliable speaker recognition, we have effectively been living in the dark ages of conversational AI.

Soniox Speaker Diarization

Today, we are thrilled to share that Soniox has made a breakthrough in speaker diarization, speaker identification and speaker recognition. Soniox Speaker Diarization AI has achieved a stunning accuracy of 96%, meaning 96% of speech is assigned to the correct speaker. With an error rate of just 4%, Soniox Speaker Diarization is 5x more accurate than other providers, smashing the status quo. This groundbreaking technology not only transforms a single audio file into a multi-speaker transcription, but also serves real-time, low-latency applications. With this breakthrough, Soniox has unlocked the essential component for building conversational AI, opening up a completely new dimension for downstream applications that so many have hungered for.
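To make a figure like "96% of speech is assigned to the correct speaker" concrete, the Python sketch below scores the mis-attributed example from earlier in this post at the word level. It is an illustration under assumptions, not Soniox's published evaluation method: the function name `speaker_accuracy` and the choice to count words are hypothetical, and diarization error is often measured over speech time rather than words. Because diarization labels are arbitrary, the sketch tries every one-to-one mapping from hypothesized labels to reference labels and keeps the best score.

```python
from itertools import permutations


def speaker_accuracy(reference, hypothesis):
    """Fraction of words attributed to the correct speaker.

    `reference` and `hypothesis` hold one speaker label per word.
    Hypothesis labels are arbitrary, so each one-to-one mapping onto the
    reference labels is tried and the best result is returned.
    (Hypothetical helper for illustration only.)
    """
    if len(reference) != len(hypothesis):
        raise ValueError("expected one speaker label per word in both lists")
    if not reference:
        raise ValueError("cannot score an empty transcript")

    ref_labels = sorted(set(reference))
    hyp_labels = sorted(set(hypothesis))
    # Pad with None so a hypothesis label may map to "no reference speaker".
    targets = ref_labels + [None] * len(hyp_labels)

    best = 0
    for perm in permutations(targets, len(hyp_labels)):
        mapping = dict(zip(hyp_labels, perm))
        correct = sum(mapping[h] == r for r, h in zip(reference, hypothesis))
        best = max(best, correct)
    return best / len(reference)


# Actual conversation: Speaker A says 7 words, Speaker B says 5 words.
reference = ["A"] * 7 + ["B"] * 5
# Wrong transcription: C "Have you seen", B "our Europe vacation deals?",
# A "No, I cannot", C "enter Europe."
hypothesis = ["C"] * 3 + ["B"] * 4 + ["A"] * 3 + ["C"] * 2

print(f"{speaker_accuracy(reference, hypothesis):.2f}")  # prints 0.58
```

Brute-force search over mappings is only practical for a handful of speakers; it is used here because it keeps the idea visible in a few lines.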

With Soniox Speaker Diarization, you can now build the voice and conversational applications you’ve always envisioned. You can develop applications that allow contact center managers to listen to the voice of their customers, systems that enable recruiters to analyze interviewer and candidate performance, or medical documentation software that auto-identifies doctors’ prescriptions. With Soniox Speaker Diarization AI, the possibilities are endless.

On top of unlocking a new dimension in conversational intelligence, Soniox is taking it one step further and introducing yet another revolutionary product: Soniox Speaker Identification AI.

Soniox Speaker Identification

No two snowflakes are the same. No two fingerprints are identical, not even those of identical twins. Our voices, too, have unique signatures. When it comes to identifying people by their voices, however, the greatest minds in AI are once again at a loss.

Soniox Speaker Identification AI, yet another breakthrough technology, determines speaker identities with stunning accuracy. With just a 10-second voice sample, Soniox Speaker Identification is able to identify a speaker based on their unique voice profile, making Soniox the first in the world to support applications with these requirements.

When would you use Speaker Identification?

If you want speaker names to be associated with sentences on your transcripts, you will need Speaker Identification AI.

For instance:

Dr. Spiegel: I prescribed an antibiotic for your infection.

Patient Baldwin: Thank you doctor!

How does Speaker Identification associate speaker names with sentences?

Speech AI, Speaker Diarization and Speaker Identification work together to associate speaker names with sentences.

| Product | Determines | Example Output | Transcript |
| --- | --- | --- | --- |
| Speech AI | What words and sentences were said in the audio | Sentence 1: I prescribed an antibiotic for your infection. Sentence 2: Thank you doctor! | Speech AI: I prescribed an antibiotic for your infection. Thank you doctor! |
| Speaker Diarization | Which speaker said which sentence | Speaker A said sentence #1. Speaker B said sentence #2. | Speech AI + Speaker Diarization: Speaker A: I prescribed an antibiotic for your infection. Speaker B: Thank you doctor! |
| Speaker Identification | The identity* of each speaker | Speaker A is Dr. Spiegel. Speaker B is Patient Baldwin. | Speech AI + Speaker Diarization + Speaker Identification: Dr. Spiegel: I prescribed an antibiotic for your infection. Patient Baldwin: Thank you doctor! |

* Identity could be any unique identifier of your choice. E.g. patient_name, provider_name, patient_id, provider_id, patient_email, provider_email
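
The way the three outputs combine can be sketched in a few lines of Python. This is a minimal illustration of the data flow described above, not the Soniox API: the function `build_transcript` and its argument shapes are hypothetical stand-ins for whatever the real products return.

```python
def build_transcript(sentences, speaker_labels, identities=None):
    """Combine the three outputs into labeled transcript lines.

    sentences      -- text of each sentence (Speech AI)
    speaker_labels -- anonymous label per sentence, e.g. "A" (Speaker Diarization)
    identities     -- optional map from label to a unique identifier of
                      your choice, e.g. a name or ID (Speaker Identification)
    All names and shapes here are assumptions made for illustration.
    """
    if len(sentences) != len(speaker_labels):
        raise ValueError("expected one speaker label per sentence")
    identities = identities or {}

    lines = []
    for text, label in zip(sentences, speaker_labels):
        # Fall back to the anonymous label when no identity is known.
        name = identities.get(label, f"Speaker {label}")
        lines.append(f"{name}: {text}")
    return lines


sentences = [
    "I prescribed an antibiotic for your infection.",
    "Thank you doctor!",
]
speaker_labels = ["A", "B"]
identities = {"A": "Dr. Spiegel", "B": "Patient Baldwin"}

# Speech AI + Speaker Diarization
print("\n".join(build_transcript(sentences, speaker_labels)))
# Speech AI + Speaker Diarization + Speaker Identification
print("\n".join(build_transcript(sentences, speaker_labels, identities)))
```

Keeping the identity map separate from the diarization labels means the same transcript can be relabeled with a different identifier, such as an ID or email address, without re-running the earlier stages.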

How is Speaker Identification useful for you?

Being able to associate an exact person with a question, answer or comment is essential for any conversational intelligence. Below are just a few examples:

| Industry | Possible Unique Identifiers | Sample Transcript |
| --- | --- | --- |
| Contact Center | agent_id, customer_id | agent-437138: How may I help you today? customer-1150: My internet is slow. |
| Sales | agent_email, prospect_email | Jira has an advanced roadmap feature. That sounds interesting. How does it work? |
| Recruiting | interviewer_email, candidate_email | What programming language do you code with? I code with C++. |
| Medical | patient_id, provider_id | provider-446770: I prescribed an antibiotic for your infection. patient-880356: Thank you doctor! |

Whether you are building revenue intelligence systems, sales coaching applications, medical transcription software or recruiting platforms, speaker information is crucial. The launch of Soniox Speaker AI today marks the end of the dark ages in conversational AI. Today is the beginning of a new era: the era where true conversational intelligence surges to revolutionize every facet of life. So embark on this thrilling journey with us and start building real-world conversational AI today.