Speaker Recognition AI
Most robust speaker recognition AI
We rethought how AI models for speaker recognition are built, so that recognition stays robust for conversations in real-world environments. Soniox is the leader in speaker recognition and speaker identification AI technology.
Improve conversational AI with speaker tags
Knowing who said what is fundamental when building conversational AI applications. For example, knowing "who asked the question" and "who answered the question" is essential to properly understanding a conversation.
Soniox Speaker Recognition AI supports recognition of up to 20 speakers in a given conversation from audio alone. It supports both speaker diarization (or separation) and speaker identification.
Speaker diarization
Speaker diarization recognizes different speakers, but it does not identify the speakers. It recognizes there are two different speakers in the audio ("Speaker-1" and "Speaker-2"), but it does not know who these speakers are.
Speaker diarization does not require any additional input to recognize different speakers. The recognition is performed based on the audio input alone.
Speaker-1: Hi, how are you?
Speaker-2: I am great. What about you?
Speaker-1: Fantastic, thank you.
Speaker identification
Speaker identification associates each recognized speaker with a unique speaker identity. With speaker identification, the AI model knows that Speaker-1 is Mike and Speaker-2 is John, and provides the speakers' names (identities).
Speaker identification requires speakers to register their voice ahead of time by providing a short audio sample of their speech.
Mike: Hi, how are you?
John: I am great. What about you?
Mike: Fantastic, thank you.
Recognize speakers in live streams, not just files
Streaming speaker recognition
Soniox supports recognizing speakers in live streams, enabling you to instantly recognize speakers when they start speaking.
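To make the streaming case concrete, here is a minimal client-side sketch of tracking who is speaking as updates arrive. It is not the Soniox SDK: the LiveTranscript class and the update format are hypothetical. The sketch assumes each update is a list of word objects carrying an is_final flag and a speaker tag, that every update contains the newly finalized words followed by the current non-final words, and that non-final words may still be revised.

```python
class LiveTranscript:
    """Hypothetical client-side state for a live stream (not part of any SDK)."""

    def __init__(self):
        self.final_words = []    # words that will no longer change
        self.pending_words = []  # provisional words from the latest update

    def update(self, words):
        """Apply one streaming update (assumed format: list of word dicts)."""
        # Final words are appended once and kept.
        self.final_words.extend(w for w in words if w["is_final"])
        # Non-final words replace the previous provisional tail.
        self.pending_words = [w for w in words if not w["is_final"]]

    def current_speaker(self):
        """Return the speaker tag of the most recent word, or None."""
        all_words = self.final_words + self.pending_words
        return all_words[-1]["speaker"] if all_words else None


transcript = LiveTranscript()

# First update: one provisional word from speaker 1.
transcript.update([{"text": "Hi", "is_final": False, "speaker": 1}])
print(transcript.current_speaker())

# Second update: the first word is finalized and a new provisional word arrives.
transcript.update([
    {"text": "Hi,", "is_final": True, "speaker": 1},
    {"text": "how", "is_final": False, "speaker": 1},
])
print([w["text"] for w in transcript.final_words])
```

Keeping final and provisional words separate lets an application show the active speaker immediately while only storing words that are marked final.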
Global speaker recognition
Soniox also supports recognizing speakers from files, optimized for the highest accuracy, as the AI model can leverage context from the entire audio file.
Integrated into speech recognition API
Speaker recognition is seamlessly integrated into the speech recognition API: with one API call you get a transcript that contains the recognized words, and each recognized word comes with a speaker tag.
{
  "text": "YouTube",
  "start_ms": 1450,
  "duration_ms": 350,
  "is_final": true,
  "speaker": 1,
  "confidence": 0.98
}
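Per-word speaker tags can be turned into a readable "who said what" transcript by merging consecutive words that share the same tag. The following is a minimal sketch, assuming words arrive in time order as dictionaries with the text and speaker fields shown above. The helper names (make_words, group_by_speaker, format_turns), the sample data, and joining words with single spaces are illustrative assumptions, not part of the Soniox API.

```python
from itertools import groupby


def make_words(speaker, sentence):
    """Build hypothetical word objects with only the fields used here."""
    return [{"text": token, "speaker": speaker} for token in sentence.split()]


def group_by_speaker(words):
    """Merge consecutive words with the same speaker tag into turns."""
    turns = []
    for speaker, run in groupby(words, key=lambda w: w["speaker"]):
        turns.append((speaker, " ".join(w["text"] for w in run)))
    return turns


def format_turns(turns):
    """Render turns as 'Speaker-N: text' lines."""
    return [f"Speaker-{speaker}: {text}" for speaker, text in turns]


# Sample data standing in for the words of one transcript.
words = (
    make_words(1, "Hi, how are you?")
    + make_words(2, "I am great. What about you?")
    + make_words(1, "Fantastic, thank you.")
)

lines = format_turns(group_by_speaker(words))
for line in lines:
    print(line)
```

Grouping only consecutive words keeps the turn order intact, so a speaker who talks twice appears as two separate turns rather than one merged block.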
Ready to get started?
Explore Soniox Docs or create an account and start building your audio AI application. You can also contact us to design a custom package for your business.
Always know what you pay
Pay only for what you use. Integrated per-usage pricing with no hidden fees.