Identify Speakers
Speaker identification associates the speakers recognized by speaker diarization with known identities (e.g., Speaker-1 is Mike, Speaker-2 is John). It works by matching speakers against pre-registered voice profiles. Speaker diarization on its own does NOT require voice profiles.
| Product | Transcript |
|---|---|
| Speaker Diarization | Speaker-1: I prescribed an antibiotic for your infection. |
| | Speaker-2: Thank you doctor! |
| Speaker Diarization + Speaker Identification | Dr. Spiegel: I prescribed an antibiotic for your infection. |
| | Patient Bal: Thank you doctor! |
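Conceptually, identification adds a lookup from diarization speaker numbers to identities. The sketch below reproduces the table above in plain Python; the relabel helper and the hard-coded identities map are hypothetical and only mimic the kind of number-to-name association the service returns.

```python
from typing import Dict, List, Tuple

# Diarized transcript: (speaker number, text). Diarization alone yields only numbers.
diarized: List[Tuple[int, str]] = [
    (1, "I prescribed an antibiotic for your infection."),
    (2, "Thank you doctor!"),
]

# Hypothetical speaker-number -> identity map, standing in for identification output.
identities: Dict[int, str] = {1: "Dr. Spiegel", 2: "Patient Bal"}


def relabel(segments: List[Tuple[int, str]], names: Dict[int, str]) -> List[str]:
    """Prefix each segment with its identified name, or Speaker-N if unidentified."""
    lines = []
    for speaker, text in segments:
        label = names.get(speaker, f"Speaker-{speaker}")
        lines.append(f"{label}: {text}")
    return lines


for line in relabel(diarized, identities):
    print(line)
```

Passing an empty map reproduces the diarization-only row of the table.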
Step 1: Register Voice Profiles
Before using speaker identification, speakers must be registered using the Speaker Management API.
For testing, the Soniox Python package includes manage_speakers, a command-line tool for speaker management. If you have not installed the Soniox Python package yet, refer to Quickstart (Python Lib).
Below is an example of adding speakers and their voice samples. To see all capabilities of this tool, run it with the --help flag.
# Clone soniox_examples GitHub repository if not already.
git clone https://github.com/soniox/soniox_examples.git
# Enter the test_data directory in soniox_examples with test audio files.
cd soniox_examples/test_data
# Make sure to set your API key.
export SONIOX_API_KEY="<YOUR-API-KEY>"
# Add speakers with their voice samples.
python3 -m soniox.manage_speakers --add_speaker --speaker_name John
python3 -m soniox.manage_speakers --add_audio --speaker_name John --audio_name test --audio_fn test_audio_sd_spk1.flac
python3 -m soniox.manage_speakers --add_speaker --speaker_name Judy
python3 -m soniox.manage_speakers --add_audio --speaker_name Judy --audio_name test --audio_fn test_audio_sd_spk2.flac
python3 -m soniox.manage_speakers --list
Note that in a real application, speaker names can be arbitrary identifiers.
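When registering many speakers, the same CLI calls can be scripted. This is a minimal sketch, assuming the soniox package is installed in the current interpreter (sys.executable is used in place of python3 so the same interpreter runs the tool); it uses only the manage_speakers flags shown above, and the registration_commands and register helpers are hypothetical.

```python
import subprocess
import sys
from typing import Dict, List

# Same invocation as "python3 -m soniox.manage_speakers", but with the current interpreter.
BASE = [sys.executable, "-m", "soniox.manage_speakers"]


def registration_commands(samples: Dict[str, str], audio_name: str = "test") -> List[List[str]]:
    """Build the add-speaker / add-audio commands for each speaker, then a final --list."""
    commands: List[List[str]] = []
    for speaker, audio_fn in samples.items():
        commands.append(BASE + ["--add_speaker", "--speaker_name", speaker])
        commands.append(
            BASE
            + ["--add_audio", "--speaker_name", speaker,
               "--audio_name", audio_name, "--audio_fn", audio_fn]
        )
    commands.append(BASE + ["--list"])
    return commands


def register(samples: Dict[str, str]) -> None:
    """Run the commands in order; stops with CalledProcessError if any call fails."""
    for cmd in registration_commands(samples):
        subprocess.run(cmd, check=True)
```

For the test data above, call register({"John": "test_audio_sd_spk1.flac", "Judy": "test_audio_sd_spk2.flac"}) from the test_data directory with SONIOX_API_KEY set.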
Step 2: Use Speaker Identification
To use speaker identification when transcribing audio, set the following in TranscriptionConfig:
1. Enable speaker diarization by setting either enable_streaming_speaker_diarization or enable_global_speaker_diarization to true.
2. Enable speaker identification by setting enable_speaker_identification to true.
3. Specify the names of the registered speakers that might occur in the audio using cand_speaker_names.
The maximum number of specified candidate speakers is 50. Speaker identification only considers the specified candidate speakers, not all the registered speakers.
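The requirements above can be checked locally before sending a request. This is a hypothetical helper, not part of the Soniox library (the service performs its own validation); the dictionary keys follow the keyword arguments used in the examples on this page, such as cand_speaker_names.

```python
from typing import Any, Dict, List

MAX_CANDIDATE_SPEAKERS = 50  # limit stated in this guide


def speaker_id_config_problems(config: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the requirements are met."""
    problems: List[str] = []
    # Requirement 1: one of the two diarization modes must be on.
    if not (config.get("enable_streaming_speaker_diarization")
            or config.get("enable_global_speaker_diarization")):
        problems.append("speaker diarization is not enabled")
    # Requirement 2: identification itself must be on.
    if not config.get("enable_speaker_identification"):
        problems.append("enable_speaker_identification is not true")
    # Requirement 3: candidate names must be given, at most 50 of them.
    names = config.get("cand_speaker_names") or []
    if not names:
        problems.append("no candidate speaker names given")
    elif len(names) > MAX_CANDIDATE_SPEAKERS:
        problems.append(f"too many candidate speakers ({len(names)} > {MAX_CANDIDATE_SPEAKERS})")
    return problems


config = dict(
    enable_global_speaker_diarization=True,
    enable_speaker_identification=True,
    cand_speaker_names=["John", "Judy"],
)
print(speaker_id_config_problems(config))  # []
```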
When speaker identification is enabled, the Result.speakers field maps speaker numbers to speaker names.
This field has no entries for recognized speakers that were not matched to any of the candidate speakers' voice profiles.
See Transcription Results for more info.
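Because unmatched speakers are simply absent from Result.speakers, a lookup needs a fallback. A minimal sketch, assuming each entry exposes .speaker and .name attributes as in the examples below; SimpleNamespace objects stand in for real result entries, and both helper names are hypothetical.

```python
from types import SimpleNamespace
from typing import Dict, Iterable


def speaker_names(speakers: Iterable) -> Dict[int, str]:
    """Map speaker number -> name from Result.speakers-like entries."""
    return {entry.speaker: entry.name for entry in speakers}


def display_name(speaker: int, num_to_name: Dict[int, str]) -> str:
    """Fall back to "unknown" for speakers without an identity."""
    return num_to_name.get(speaker, "unknown")


# Stand-in for Result.speakers: speaker 3 was recognized but not identified.
speakers = [SimpleNamespace(speaker=1, name="John"), SimpleNamespace(speaker=2, name="Judy")]
num_to_name = speaker_names(speakers)
print([display_name(n, num_to_name) for n in (1, 2, 3)])  # ['John', 'Judy', 'unknown']
```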
Global Speaker Identification
This example demonstrates how to transcribe a file with speaker identification. Make sure you have completed Step 1 and registered voice profiles for speakers John and Judy. The input audio contains three speakers: two of them are identified (John and Judy), and the third is recognized but not identified, which is the correct output.
global_speaker_diarization_speaker_id.py
from soniox.transcribe_file import transcribe_file_short
from soniox.speech_service import SpeechClient


# Do not forget to set your API key in the SONIOX_API_KEY environment variable.
def main():
    with SpeechClient() as client:
        result = transcribe_file_short(
            "../test_data/test_audio_sd.flac",
            client,
            model="en_v2",
            enable_global_speaker_diarization=True,
            min_num_speakers=1,
            max_num_speakers=6,
            enable_speaker_identification=True,
            cand_speaker_names=["John", "Judy"],
        )

    # Build map from speaker number to name.
    speaker_num_to_name = {entry.speaker: entry.name for entry in result.speakers}

    # Print results with each speaker segment on its own line.
    speaker = None
    line = ""
    for word in result.words:
        if word.speaker != speaker:
            if len(line) > 0:
                print(line)
            speaker = word.speaker
            if speaker in speaker_num_to_name:
                speaker_name = speaker_num_to_name[speaker]
            else:
                speaker_name = "unknown"
            line = f"Speaker {speaker} ({speaker_name}): "
            # Avoid printing a leading space at a speaker change.
            if word.text == " ":
                continue
        line += word.text
    print(line)


if __name__ == "__main__":
    main()
Run
python3 global_speaker_diarization_speaker_id.py
Output
Speaker 1 (John): First forward, a nationwide program started ...
Speaker 2 (Judy): I would love to see all 115 community colleges ...
Speaker 3 (unknown): All students should have access to these kinds ...
Speaker 1 (John): These students say college offers a chance to change ...
global_speaker_diarization_speaker_id.js
const fs = require("fs");
const { SpeechClient } = require("@soniox/soniox-node");

// Do not forget to set your API key in the SONIOX_API_KEY environment variable.
const speechClient = new SpeechClient();

(async function () {
  const result = await speechClient.transcribeFileShort(
    "../test_data/test_audio_sd.flac",
    {
      model: "en_v2",
      enable_global_speaker_diarization: true,
      min_num_speakers: 1,
      max_num_speakers: 6,
      enable_speaker_identification: true,
      cand_speaker_names: ["John", "Judy"],
    }
  );

  // Build map from speaker number to name.
  const speaker_num_to_name = {};
  for (const entry of result.speakers) {
    speaker_num_to_name[entry.speaker] = entry.name;
  }

  // Print results with each speaker segment on its own line.
  let speaker = 0;
  let line = "";
  for (const word of result.words) {
    if (word.speaker !== speaker) {
      if (line.length > 0) {
        console.log(line);
      }
      speaker = word.speaker;
      const speaker_name =
        speaker in speaker_num_to_name ? speaker_num_to_name[speaker] : "unknown";
      line = `Speaker ${speaker} (${speaker_name}): `;
      if (word.text === " ") {
        // Avoid printing leading space at speaker change.
        continue;
      }
    }
    line += word.text;
  }
  console.log(line);
})();
Run
node global_speaker_diarization_speaker_id.js
Output
Speaker 1 (John): First forward, a nationwide program started ...
Speaker 2 (Judy): I would love to see all 115 community colleges ...
Speaker 3 (unknown): All students should have access to these kinds ...
Speaker 1 (John): These students say college offers a chance to change ...
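The printing loop in these examples can be factored into a function that returns speaker segments instead of printing them, which makes the output easier to store or test. A minimal sketch, assuming each word exposes only the .speaker and .text attributes used above; speaker_segments is a hypothetical helper and the sample words are made up.

```python
from types import SimpleNamespace
from typing import Dict, Iterable, List, Optional, Tuple


def speaker_segments(words: Iterable, num_to_name: Dict[int, str]) -> List[Tuple[int, str, str]]:
    """Group consecutive words by speaker into (speaker, name, text) tuples."""
    segments: List[Tuple[int, str, str]] = []
    current: Optional[int] = None
    parts: List[str] = []
    for word in words:
        if word.speaker != current:
            # Close the previous segment before starting a new one.
            if parts:
                segments.append((current, num_to_name.get(current, "unknown"), "".join(parts)))
            current = word.speaker
            parts = []
            if word.text == " ":
                continue  # skip the leading space at a speaker change
        parts.append(word.text)
    if parts:
        segments.append((current, num_to_name.get(current, "unknown"), "".join(parts)))
    return segments


words = [
    SimpleNamespace(speaker=1, text="Hi"),
    SimpleNamespace(speaker=1, text=" "),
    SimpleNamespace(speaker=1, text="there"),
    SimpleNamespace(speaker=2, text=" "),
    SimpleNamespace(speaker=2, text="Hello"),
    SimpleNamespace(speaker=3, text="Hey"),
]
for speaker, name, text in speaker_segments(words, {1: "John", 2: "Judy"}):
    print(f"Speaker {speaker} ({name}): {text}")
```

With a real result, pass result.words and the map built from result.speakers.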
Streaming Speaker Identification
Our API also supports streaming speaker diarization and identification. This example demonstrates how to recognize speech and to diarize and identify speakers in a live stream, in real time and with low latency. We simulate the live stream by reading a file in small chunks.
streaming_speaker_diarization_speaker_id.py
from typing import Iterable
from soniox.transcribe_live import transcribe_stream
from soniox.speech_service import SpeechClient


def iter_audio() -> Iterable[bytes]:
    # This function should yield audio bytes from your stream.
    # Here we simulate the stream by reading a file in small chunks.
    with open("../test_data/test_audio_sd.flac", "rb") as fh:
        while True:
            audio = fh.read(1024)
            if len(audio) == 0:
                break
            yield audio


# Do not forget to set your API key in the SONIOX_API_KEY environment variable.
def main():
    with SpeechClient() as client:
        for result in transcribe_stream(
            iter_audio(),
            client,
            model="en_v2_lowlatency",
            include_nonfinal=True,
            enable_streaming_speaker_diarization=True,
            enable_speaker_identification=True,
            cand_speaker_names=["John", "Judy"],
        ):
            # Build map from speaker number to name for this result.
            speaker_num_to_name = {entry.speaker: entry.name for entry in result.speakers}
            print(
                " ".join(
                    f"'{w.text}'/{w.speaker}({speaker_num_to_name.get(w.speaker, 'unknown')})"
                    for w in result.words
                )
            )


if __name__ == "__main__":
    main()
Run
python3 streaming_speaker_diarization_speaker_id.py
Output
The script prints the recognized tokens from a live audio stream, each with its assigned speaker number and name. Speaker number 0 means that no speaker has been assigned to that token yet.
'First'/0(unknown)
'First'/1(John)
'First'/1(John) ' '/1(John) 'forward'/1(John)
'First'/1(John) ' '/1(John) 'forward'/1(John)
'First'/1(John) ' '/1(John) 'forward'/1(John)
'First'/1(John) ' '/1(John) 'forward,'/1(John) ' '/0(unknown) 'a'/0(unknown)
'First'/1(John) ' '/1(John) 'forward,'/1(John) ' '/1(John) 'a'/1(John) ' '/0(unknown) 'nation'/0(unknown)
'First'/1(John) ' '/1(John) 'forward,'/1(John) ' '/1(John) 'a'/1(John) ' '/1(John) 'nationwide'/1(John)
streaming_speaker_diarization_speaker_id.js
const fs = require("fs");
const { SpeechClient } = require("@soniox/soniox-node");

// Do not forget to set your API key in the SONIOX_API_KEY environment variable.
const speechClient = new SpeechClient();

(async function () {
  const onDataHandler = async (result) => {
    // Build map from speaker number to name for this result.
    const speaker_num_to_name = {};
    for (const entry of result.speakers) {
      speaker_num_to_name[entry.speaker] = entry.name;
    }
    const getName = (speaker) =>
      speaker in speaker_num_to_name ? speaker_num_to_name[speaker] : "unknown";
    console.log(
      result.words
        .map((word) => `'${word.text}'/${word.speaker}(${getName(word.speaker)})`)
        .join(" ")
    );
  };

  const onEndHandler = (error) => {
    if (error) {
      console.log(`Transcription error: ${error}`);
    }
  };

  // transcribeStream() returns an object with ".writeAsync()" and ".end()" methods.
  // Use them to send data and end the stream when done.
  const stream = speechClient.transcribeStream(
    {
      model: "en_v2_lowlatency",
      include_nonfinal: true,
      enable_streaming_speaker_diarization: true,
      enable_speaker_identification: true,
      cand_speaker_names: ["John", "Judy"],
    },
    onDataHandler,
    onEndHandler
  );

  // Here we simulate the stream by reading a file in small chunks.
  const CHUNK_SIZE = 1024;
  const readable = fs.createReadStream("../test_data/test_audio_sd.flac", {
    highWaterMark: CHUNK_SIZE,
  });
  for await (const chunk of readable) {
    await stream.writeAsync(chunk);
  }
  stream.end();
})();
Run
node streaming_speaker_diarization_speaker_id.js
Output
The script prints the recognized tokens from a live audio stream, each with its assigned speaker number and name. Speaker number 0 means that no speaker has been assigned to that token yet.
'First'/0(unknown)
'First'/1(John)
'First'/1(John) ' '/1(John) 'forward'/1(John)
'First'/1(John) ' '/1(John) 'forward'/1(John)
'First'/1(John) ' '/1(John) 'forward'/1(John)
'First'/1(John) ' '/1(John) 'forward,'/1(John) ' '/0(unknown) 'a'/0(unknown)
'First'/1(John) ' '/1(John) 'forward,'/1(John) ' '/1(John) 'a'/1(John) ' '/0(unknown) 'nation'/0(unknown)
'First'/1(John) ' '/1(John) 'forward,'/1(John) ' '/1(John) 'a'/1(John) ' '/1(John) 'nationwide'/1(John)
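The examples above print "unknown" both for tokens whose speaker is not assigned yet (speaker 0) and for assigned speakers that were not identified. As the output shows, a token labeled 0 can receive a speaker in a later result. A minimal sketch that keeps the two cases apart, assuming words expose .text and .speaker as above; the "pending" label and both helper names are hypothetical.

```python
from types import SimpleNamespace
from typing import Dict, Iterable


def token_label(speaker: int, num_to_name: Dict[int, str]) -> str:
    """Distinguish "not assigned yet" (0) from "assigned but not identified"."""
    if speaker == 0:
        return "pending"  # a later result may still assign a speaker to this token
    return num_to_name.get(speaker, "unknown")


def format_tokens(words: Iterable, num_to_name: Dict[int, str]) -> str:
    """Render tokens in the same 'text'/number(name) style as the examples."""
    return " ".join(f"'{w.text}'/{w.speaker}({token_label(w.speaker, num_to_name)})" for w in words)


words = [
    SimpleNamespace(text="forward,", speaker=1),
    SimpleNamespace(text=" ", speaker=0),
    SimpleNamespace(text="a", speaker=0),
]
print(format_tokens(words, {1: "John"}))  # 'forward,'/1(John) ' '/0(pending) 'a'/0(pending)
```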