Multi-Channel Audio
Certain audio recordings have multiple channels. For example, audio of two people talking over the phone may contain two channels, where each line is recorded separately.
To transcribe audio data that includes multiple channels, specify the following:
- Set num_audio_channels to the number of channels in your audio.
- Set enable_separate_recognition_per_channel to true.
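If you are not sure what value to use for num_audio_channels, you can read the channel count from the audio file's header. The sketch below assumes a WAV file and uses Python's standard-library wave module; it is not part of the Soniox SDK. The wave module cannot read FLAC (the format of the example files in this section), so for FLAC you would need a third-party library such as soundfile. make_silent_wav is a hypothetical helper that only builds demo input.

```python
import io
import wave
from typing import BinaryIO, Union


def count_wav_channels(source: Union[str, BinaryIO]) -> int:
    """Return the number of channels in a WAV file (path or binary file object)."""
    with wave.open(source, "rb") as wav:
        return wav.getnchannels()


def make_silent_wav(num_channels: int, num_frames: int = 160) -> io.BytesIO:
    """Build an in-memory 16-bit, 16 kHz WAV of silence, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(num_channels)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(16000)
        wav.writeframes(b"\x00" * (2 * num_channels * num_frames))
    buf.seek(0)  # rewind so the buffer can be read back
    return buf


if __name__ == "__main__":
    print(count_wav_channels(make_silent_wav(2)))  # prints 2
```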
Each returned Result object now has its channel field set to the number of the channel it corresponds to.
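Because every Result carries its channel number, you can collect the text of each channel yourself. A minimal sketch, relying only on the channel, words and word.text fields that the examples in this section use; text_by_channel is a hypothetical helper, and the SimpleNamespace objects merely stand in for real Result objects so the snippet runs without the SDK or an API key. It simply concatenates words, which suits the one-result-per-channel lists returned for files; streamed results that include non-final words may need different handling.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Dict, Iterable


def text_by_channel(results: Iterable) -> Dict[int, str]:
    """Join the words of all results belonging to each channel into one string."""
    words_per_channel = defaultdict(list)
    for result in results:
        words_per_channel[result.channel].extend(word.text for word in result.words)
    # Sort by channel number so the dictionary is in channel order.
    return {channel: " ".join(words) for channel, words in sorted(words_per_channel.items())}


if __name__ == "__main__":
    # Stand-ins for Result objects; real ones come from the Soniox SDK.
    fake_results = [
        SimpleNamespace(channel=1, words=[SimpleNamespace(text="He"), SimpleNamespace(text="was")]),
        SimpleNamespace(channel=0, words=[SimpleNamespace(text="But"), SimpleNamespace(text="there")]),
    ]
    print(text_by_channel(fake_results))  # {0: 'But there', 1: 'He was'}
```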
Separate recognition per channel is supported by all of our APIs: Transcribe (short audio), TranscribeAsync (files) and TranscribeStream (live streams).
By default, the maximum number of channels is 4. If you need a higher limit, contact our support team.
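To catch requests that exceed the channel limit before they reach the API, you could validate the settings on the client side. The helper below is hypothetical (not part of the Soniox SDK); it hard-codes the default limit of 4, which you would change if your limit has been raised. The returned dictionary can be unpacked into the keyword arguments used by the Python examples, for instance transcribe_file_short(path, client, **options).

```python
from typing import Dict, Union

DEFAULT_MAX_CHANNELS = 4  # default limit; adjust if support has raised yours


def separate_recognition_options(
    num_channels: int, max_channels: int = DEFAULT_MAX_CHANNELS
) -> Dict[str, Union[int, bool]]:
    """Build the two settings needed for separate recognition per channel."""
    if num_channels < 1:
        raise ValueError(f"num_channels must be at least 1, got {num_channels}")
    if num_channels > max_channels:
        raise ValueError(
            f"{num_channels} channels exceeds the limit of {max_channels}; "
            "contact support for a higher limit"
        )
    return {
        "num_audio_channels": num_channels,
        "enable_separate_recognition_per_channel": True,
    }


if __name__ == "__main__":
    print(separate_recognition_options(2))
    # {'num_audio_channels': 2, 'enable_separate_recognition_per_channel': True}
```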
Transcribe Short Audio
In this example, we will transcribe a short audio file (< 60 seconds) with two channels, each channel being transcribed independently.
transcribe_file_short_separate_recognition.py
from soniox.transcribe_file import transcribe_file_short
from soniox.speech_service import SpeechClient, set_api_key

set_api_key("<YOUR-API-KEY>")


def main():
    with SpeechClient() as client:
        channel_results = transcribe_file_short(
            "../test_data/test_audio_multi_channel.flac",
            client,
            num_audio_channels=2,
            enable_separate_recognition_per_channel=True,
        )
        for result in channel_results:
            print(f"Channel {result.channel}: " + " ".join(word.text for word in result.words))


if __name__ == "__main__":
    main()
Run
python3 transcribe_file_short_separate_recognition.py
Output
Channel 0: But there is always a stronger sense of life . And now he is pouring down his beams
Channel 1: He was two years out from the east .
Note that the transcribe_file_short() function now returns a list of Result objects, one for each channel.
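Since the list holds one Result per channel, you may want to check that every channel came back and put the results in channel order before processing them further. A minimal sketch, assuming channels are numbered from 0 as in the output above; results_in_channel_order is a hypothetical helper, and SimpleNamespace objects stand in for real Result objects.

```python
from types import SimpleNamespace
from typing import List, Sequence


def results_in_channel_order(results: Sequence, num_channels: int) -> List:
    """Return results sorted by channel, checking there is exactly one per channel."""
    ordered = sorted(results, key=lambda result: result.channel)
    channels = [result.channel for result in ordered]
    expected = list(range(num_channels))
    if channels != expected:
        # A missing or duplicated channel shows up as a mismatch here.
        raise ValueError(f"expected channels {expected}, got {channels}")
    return ordered


if __name__ == "__main__":
    fake = [SimpleNamespace(channel=1, words=[]), SimpleNamespace(channel=0, words=[])]
    print([r.channel for r in results_in_channel_order(fake, 2)])  # [0, 1]
```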
transcribe_file_short_separate_recognition.js
const { SpeechClient } = require("@soniox/soniox-node");

// Do not forget to set your Soniox API key.
const speechClient = new SpeechClient();

(async function () {
  const channel_results = await speechClient.transcribeFileShort(
    "../test_data/test_audio_multi_channel.flac",
    {
      num_audio_channels: 2,
      enable_separate_recognition_per_channel: true,
    }
  );
  for (const result of channel_results) {
    console.log(
      `Channel ${result.channel}: ${result.words.map((word) => word.text).join(" ")}`
    );
  }
})();
Run
node transcribe_file_short_separate_recognition.js
Output
Channel 0: But there is always a stronger sense of life . And now he is pouring down his beams
Channel 1: He was two years out from the east .
Note that the transcribeFileShort() function now returns a list of Result objects, one for each channel.
Transcribe Files
transcribe_file_async_separate_recognition.py
The code is nearly identical to the previous example, except that num_audio_channels and enable_separate_recognition_per_channel must be set in the transcribe_file_async() call.
Note that GetTranscribeAsyncResult() returns a list of Result objects (one for each channel) when separate recognition per channel is enabled.
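Because the shape of the result depends on whether separate recognition per channel is enabled, code that handles both cases can normalize it first. A minimal sketch, assuming that without separate recognition a single Result object is returned (check this against the API reference); as_result_list is a hypothetical helper, not an SDK function, and it treats anything with a words attribute as a single Result.

```python
from types import SimpleNamespace
from typing import Any, List


def as_result_list(response: Any) -> List[Any]:
    """Return a list of results whether the call gave one Result or a sequence of them."""
    if hasattr(response, "words"):  # looks like a single Result object
        return [response]
    return list(response)  # any iterable of Result objects


if __name__ == "__main__":
    single = SimpleNamespace(channel=0, words=[])
    print(len(as_result_list(single)), len(as_result_list([single, single])))  # 1 2
```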
Run
python3 transcribe_file_async_separate_recognition.py
Output
Uploading file.
File ID: 3476
Calling GetTranscribeAsyncFileStatus.
Status: QUEUED
Calling GetTranscribeAsyncFileStatus.
Status: TRANSCRIBING
Calling GetTranscribeAsyncFileStatus.
Status: COMPLETED
Calling GetTranscribeAsyncResult
Channel 0: But there is always a stronger sense of life . And now he is pouring down his beams
Channel 1: He was two years out from the east .
Calling DeleteTranscribeAsyncFile.
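The log above shows the example polling the file status until it reaches COMPLETED. That loop can be sketched generically as follows. get_status is a hypothetical callable standing in for whatever status call your client makes; only the status names QUEUED, TRANSCRIBING and COMPLETED come from the log. Treating any other status as a failure, and the interval and timeout defaults, are assumptions.

```python
import time
from typing import Callable

PENDING_STATUSES = {"QUEUED", "TRANSCRIBING"}


def wait_until_completed(
    get_status: Callable[[], str],
    poll_interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll get_status() until it returns COMPLETED; raise on other statuses or timeout."""
    waited = 0.0  # approximate: counts only time spent sleeping
    while True:
        status = get_status()
        print(f"Status: {status}")
        if status == "COMPLETED":
            return
        if status not in PENDING_STATUSES:
            raise RuntimeError(f"transcription ended with status {status}")
        if waited >= timeout:
            raise TimeoutError(f"still {status} after {timeout} seconds")
        sleep(poll_interval)
        waited += poll_interval


if __name__ == "__main__":
    # Simulated status sequence matching the log above; no real waiting.
    statuses = iter(["QUEUED", "TRANSCRIBING", "COMPLETED"])
    wait_until_completed(lambda: next(statuses), sleep=lambda seconds: None)
```

Injecting sleep keeps the loop testable without real delays.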
transcribe_file_async_separate_recognition.js
The code is nearly identical to the previous example, except that num_audio_channels and enable_separate_recognition_per_channel must be set in the transcribeFileAsync() call.
Note that GetTranscribeAsyncResult() returns a list of Result objects (one for each channel) when separate recognition per channel is enabled.
Run
node transcribe_file_async_separate_recognition.js
Output
Uploading file.
File ID: 3477
Calling getTranscribeAsyncStatus.
Status: QUEUED
Calling getTranscribeAsyncStatus.
Status: TRANSCRIBING
Calling getTranscribeAsyncStatus.
Status: COMPLETED
Calling GetTranscribeAsyncResult
Channel 0: But there is always a stronger sense of life . And now he is pouring down his beams
Channel 1: He was two years out from the east .
Calling deleteTranscribeAsyncFile.
Transcribe Streams
When you transcribe a multi-channel stream with separate recognition per channel, all channels are transcribed in parallel, in real time and with low latency. This makes it especially suitable for transcribing meetings with multiple participants, where each participant is recorded on a separate channel and transcribed independently of the others to achieve the highest accuracy.
In this example, we will transcribe a live stream with two channels, each channel being transcribed independently. We will simulate the stream by reading a file in small chunks.
transcribe_stream_separate_recognition.py
from typing import Iterable

from soniox.transcribe_live import transcribe_stream
from soniox.speech_service import SpeechClient, set_api_key

set_api_key("<YOUR-API-KEY>")


def iter_audio() -> Iterable[bytes]:
    # This function should yield audio bytes from your stream.
    # Here we simulate the stream by reading a file in small chunks.
    with open("../test_data/test_audio_multi_channel.flac", "rb") as fh:
        while True:
            audio = fh.read(1024)
            if len(audio) == 0:
                break
            yield audio


def main():
    with SpeechClient() as client:
        for result in transcribe_stream(
            iter_audio(),
            client,
            num_audio_channels=2,
            enable_separate_recognition_per_channel=True,
        ):
            print(f"Channel {result.channel}: " + " ".join(w.text for w in result.words))


if __name__ == "__main__":
    main()
Run
python3 transcribe_stream_separate_recognition.py
Output
Channel 0: But
Channel 1:
Channel 0: But there
Channel 1:
Channel 1:
Channel 0: But there is
Channel 1:
Channel 0: But there is always
Channel 1:
Channel 0: But there is always a
Channel 1:
Channel 0: But there is always a strong
Channel 1:
Channel 1:
Channel 0: But there is always a stronger
Channel 1:
Channel 0: But there is always a stronger sense
Channel 1:
Channel 0: But there is always a stronger sense of
Channel 1:
Channel 0: But there is always a stronger sense of life
Channel 0: But there is always a stronger sense of life
Channel 1:
Channel 0: But there is always a stronger sense of life
Channel 1: He was
Channel 0: But there is always a stronger sense of life
Channel 1: He was
Channel 0: But there is always a stronger sense of life
Channel 1: He was two
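The streaming output repeats many lines, including empty "Channel 1:" lines, because the example prints a line for every result it receives. If you only want to print when a channel's text changes, you could remember the last text printed for each channel. ChannelPrinter below is a hypothetical helper that uses only the channel, words and word.text fields from the example; inside the loop of main() you would call printer.handle(result) instead of print(...). SimpleNamespace objects stand in for streamed results.

```python
from types import SimpleNamespace
from typing import Dict, List


class ChannelPrinter:
    """Print a channel's text only when it differs from the last text printed for it."""

    def __init__(self) -> None:
        self._last_text: Dict[int, str] = {}
        self.printed: List[str] = []  # kept so callers can inspect what was printed

    def handle(self, result) -> None:
        text = " ".join(word.text for word in result.words)
        # Defaulting to "" also suppresses a channel's initial empty results.
        if self._last_text.get(result.channel, "") == text:
            return
        self._last_text[result.channel] = text
        line = f"Channel {result.channel}: {text}"
        self.printed.append(line)
        print(line)


if __name__ == "__main__":
    def fake(channel, text):
        return SimpleNamespace(channel=channel, words=[SimpleNamespace(text=t) for t in text.split()])

    printer = ChannelPrinter()
    for r in [fake(0, "But"), fake(1, ""), fake(0, "But there"), fake(1, ""), fake(0, "But there")]:
        printer.handle(r)
    # prints "Channel 0: But" and "Channel 0: But there"
```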
transcribe_stream_separate_recognition.js
const fs = require("fs");
const { SpeechClient } = require("@soniox/soniox-node");

// Do not forget to set your Soniox API key.
const speechClient = new SpeechClient();

(async function () {
  const onDataHandler = async (result) => {
    console.log(
      `Channel ${result.channel}: ${result.words.map((word) => word.text).join(" ")}`
    );
  };

  const onEndHandler = (error) => {
    console.log("END!", error);
  };

  // transcribeStream() returns an object with ".writeAsync()" and ".end()" methods.
  // Use them to send data and to end the stream when done.
  const stream = speechClient.transcribeStream(
    {
      include_nonfinal: true,
      num_audio_channels: 2,
      enable_separate_recognition_per_channel: true,
    },
    onDataHandler,
    onEndHandler
  );

  // Here we simulate the stream by reading a file in small chunks.
  const CHUNK_SIZE = 1024;
  const streamSource = fs.createReadStream(
    "../test_data/test_audio_multi_channel.flac",
    {
      highWaterMark: CHUNK_SIZE,
    }
  );

  for await (const audioChunk of streamSource) {
    await stream.writeAsync(audioChunk);
  }

  stream.end();
})();
Run
node transcribe_stream_separate_recognition.js
Output
Channel 0: But
Channel 1:
Channel 0: But there
Channel 1:
Channel 1:
Channel 0: But there is
Channel 1:
Channel 0: But there is always
Channel 1:
Channel 0: But there is always a
Channel 1:
Channel 0: But there is always a strong
Channel 1:
Channel 1:
Channel 0: But there is always a stronger
Channel 1:
Channel 0: But there is always a stronger sense
Channel 1:
Channel 0: But there is always a stronger sense of
Channel 1:
Channel 0: But there is always a stronger sense of life
Channel 0: But there is always a stronger sense of life
Channel 1:
Channel 0: But there is always a stronger sense of life
Channel 1: He was
Channel 0: But there is always a stronger sense of life
Channel 1: He was
Channel 0: But there is always a stronger sense of life
Channel 1: He was two