Speaker Recognition AI

Most robust speaker recognition AI

We rethought how AI models are built so that speaker recognition stays robust for conversations in real-world environments. Soniox is the leader in speaker recognition and speaker identification AI technology.

Improve conversational AI with speaker tags

Knowing who said what is fundamental to building conversational AI applications. For example, knowing who asked a question and who answered it is essential to understanding a conversation properly.

Soniox Speaker Recognition AI supports recognition of up to 20 speakers in a given conversation from audio alone. It supports both speaker diarization (or separation) and speaker identification.

Speaker diarization

Speaker diarization distinguishes between speakers but does not identify them. For example, it recognizes that there are two different speakers in the audio ("Speaker-1" and "Speaker-2"), but it does not know who these speakers are.

Speaker diarization does not require any additional input to recognize different speakers. The recognition is performed based on the audio input alone.

Speaker-1: Hi, how are you?

Speaker-2: I am great. What about you?

Speaker-1: Fantastic, thank you.

Speaker identification

Speaker identification associates each recognized speaker with a unique speaker identity. With speaker identification, the AI model knows that Speaker-1 is Mike and Speaker-2 is John, and it provides the speakers' names (identities).

Speaker identification requires speakers to register their voice ahead of time by providing a short audio sample of their speech.

Mike: Hi, how are you?

John: I am great. What about you?

Mike: Fantastic, thank you.

Recognize speakers in live streams, not just files

Streaming speaker recognition

Soniox supports recognizing speakers in live streams, enabling you to instantly recognize speakers when they start speaking.

Global speaker recognition

Soniox also supports recognizing speakers in audio files. This mode is optimized for the highest accuracy, because the AI model can leverage context from the entire audio file.

Integrated into speech recognition API

Speaker recognition is seamlessly integrated into the speech recognition API: a single API call returns a transcript of the recognized words, and each recognized word comes with a speaker tag.

text: "YouTube";
start_ms: 1450;
duration_ms: 350;
is_final: true;
speaker: 1;
confidence: 0.98;
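
As a rough illustration of how per-word speaker tags might be consumed, the sketch below merges consecutive words from the same speaker into turns. The field names (`text`, `start_ms`, `duration_ms`, `speaker`) follow the record above; the plain-dict representation, the sample data, and the `group_by_speaker` helper are assumptions made for illustration, not the Soniox client library.

```python
from typing import Dict, List


def group_by_speaker(words: List[Dict]) -> List[Dict]:
    """Merge consecutive words that share a speaker tag into a single turn."""
    turns: List[Dict] = []
    for word in words:
        end_ms = word["start_ms"] + word["duration_ms"]
        if turns and turns[-1]["speaker"] == word["speaker"]:
            # Same speaker as the previous word: extend the current turn.
            turns[-1]["text"] += " " + word["text"]
            turns[-1]["end_ms"] = end_ms
        else:
            # Speaker changed (or first word): start a new turn.
            turns.append({
                "speaker": word["speaker"],
                "text": word["text"],
                "start_ms": word["start_ms"],
                "end_ms": end_ms,
            })
    return turns


# Hypothetical word records shaped like the example above.
words = [
    {"text": "Hi", "start_ms": 0, "duration_ms": 200, "speaker": 1},
    {"text": "there", "start_ms": 250, "duration_ms": 250, "speaker": 1},
    {"text": "Hello", "start_ms": 900, "duration_ms": 300, "speaker": 2},
    {"text": "Thanks", "start_ms": 1500, "duration_ms": 350, "speaker": 1},
]

for turn in group_by_speaker(words):
    print(f"Speaker-{turn['speaker']}: {turn['text']}")
```

Grouping on speaker changes rather than on speaker number alone keeps the turns in conversational order, so a speaker who talks twice produces two separate turns.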

Ready to get started?

Explore Soniox Docs or create an account and start building your audio AI application. You can also contact us to design a custom package for your business.

Always know what you pay

Pay only for what you use. Integrated per-usage pricing with no hidden fees.

Pricing details

Start your integration

Get up and running with Soniox in as little as 5 minutes.

API reference