Identify speakers
In this example, we will identify different speakers using Soniox Speaker Identification.
Speaker identification associates recognized speakers with a unique speaker identity (e.g. Speaker-1 is Mike, Speaker-2 is John). Speaker Identification works by using pre-registered voice profiles.
Note that speaker diarization does NOT require voice profiles.
Speaker diarization
Speaker diarization + speaker identification
Register voice profiles
Before using speaker identification, speakers must be registered using the Speaker Management API.
For testing, a command-line tool manage_speakers for speaker management is included in the Soniox Python package. If you have not installed the Soniox Python package yet, refer to Quickstart (Python Lib).
Below is an example of adding speakers and their voice samples. To see all capabilities of this tool, run it with the --help flag.
Note that in a real application, speaker names can be arbitrary identifiers.
Use speaker identification
To use speaker identification when transcribing audio, the following must be set in TranscriptionConfig:
- Speaker diarization must be enabled (set either enable_streaming_speaker_diarization or enable_global_speaker_diarization to true).
- Speaker identification must be enabled by setting enable_speaker_identification to true.
- The names of registered speakers that might occur in the audio must be specified using candidate_speaker_names.
The maximum number of specified candidate speakers is 50. Speaker identification only considers the specified candidate speakers, not all the registered speakers.
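The requirements above can be checked before a request is sent. The following is a minimal sketch, assuming the settings are held in a plain Python dict keyed by the TranscriptionConfig field names; check_speaker_id_config is a hypothetical helper and is not part of the Soniox Python package.

```python
MAX_CANDIDATE_SPEAKERS = 50  # limit stated in the text above


def check_speaker_id_config(config):
    """Return a list of problems; an empty list means the settings are consistent.

    `config` is a plain dict used for illustration only (an assumption),
    not the real TranscriptionConfig object.
    """
    problems = []

    # Requirement 1: one of the two diarization modes must be enabled.
    diarization_on = (
        config.get("enable_streaming_speaker_diarization", False)
        or config.get("enable_global_speaker_diarization", False)
    )
    if not diarization_on:
        problems.append("speaker diarization is not enabled")

    # Requirement 2: speaker identification must be switched on.
    if not config.get("enable_speaker_identification", False):
        problems.append("enable_speaker_identification is not true")

    # Requirement 3: candidate names must be given, at most 50 of them.
    names = config.get("candidate_speaker_names", [])
    if not names:
        problems.append("candidate_speaker_names is empty")
    elif len(names) > MAX_CANDIDATE_SPEAKERS:
        problems.append(
            f"{len(names)} candidate speakers given, "
            f"maximum is {MAX_CANDIDATE_SPEAKERS}"
        )

    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every misconfigured field at once.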
When speaker identification is enabled, the Result.speakers field gives the associations between speaker numbers and speaker names.
Note that this field does not contain entries for recognized speakers that were not associated with any of the specified candidate speaker voice profiles.
See Transcription Results for more info.
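Because unidentified speakers have no entry in Result.speakers, code that labels speakers needs a fallback. The sketch below assumes each entry carries a speaker number and a name; SpeakerEntry is a hypothetical stand-in for illustration, so check the Transcription Results reference for the actual field names.

```python
from dataclasses import dataclass


@dataclass
class SpeakerEntry:
    # Hypothetical stand-in for one entry of Result.speakers (assumed fields).
    speaker: int
    name: str


def speaker_labels(entries, speaker_numbers):
    """Map each speaker number to a display label.

    Numbers without an entry were recognized but not identified,
    so they get a generic fallback label.
    """
    names = {entry.speaker: entry.name for entry in entries}
    return {
        number: names.get(number, f"Speaker-{number} (unidentified)")
        for number in speaker_numbers
    }


# Three recognized speakers, two of them matched to candidate profiles.
entries = [SpeakerEntry(1, "John"), SpeakerEntry(2, "Judy")]
labels = speaker_labels(entries, [1, 2, 3])
```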
Global speaker identification
This example demonstrates how to transcribe a file with speaker identification. Make sure you have successfully completed the voice profile registration step for speakers John and Judy. The input audio has three speakers: two of them are identified (John and Judy), and the third is recognized but not identified, which is the correct output.
Streaming speaker identification
Our API also supports streaming speaker diarization and identification. This example demonstrates how to recognize speech and to diarize and identify speakers from a live stream in real time with low latency. We simulate the live stream by reading a file in small chunks.
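Reading a file in small chunks to simulate a live stream can be sketched as follows. This is only an illustration: iter_chunks and the 1024-byte chunk size are assumptions and are not taken from the Soniox example script.

```python
import io


def iter_chunks(stream, chunk_size=1024):
    """Yield successive byte chunks from a binary stream until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes object signals end of stream
            break
        yield chunk


# An in-memory buffer stands in for an audio file opened with open(path, "rb").
fake_audio = io.BytesIO(b"\x00" * 2500)
chunk_sizes = [len(chunk) for chunk in iter_chunks(fake_audio)]
```

Each yielded chunk would then be passed to the streaming request in order, as if it had just arrived from a microphone.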
streaming_speaker_diarization_speaker_id.py
Run
Output
The script prints recognized tokens from a live audio stream together with their assigned speaker numbers and names. Speaker number 0 means that no speaker has been assigned to that token yet.
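The handling of speaker number 0 can be sketched as below. The helper format_token and its output layout are hypothetical and do not reproduce the script's actual output format.

```python
def format_token(text, speaker, names):
    """Return a printable line for one recognized token.

    `speaker` is the speaker number attached to the token and `names`
    maps speaker numbers to identified names (both assumed inputs).
    """
    if speaker == 0:
        # 0 means no speaker has been assigned to this token yet.
        label = "unassigned"
    else:
        # Fall back to the bare number when the speaker was not identified.
        label = names.get(speaker, f"Speaker-{speaker}")
    return f"[{label}] {text}"


names = {1: "John", 2: "Judy"}
lines = [
    format_token("Hello", 0, names),
    format_token("there", 1, names),
    format_token("thanks", 3, names),
]
```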