Multi-Channel Audio

Certain audio recordings have multiple channels. For example, audio of two people talking over the phone may contain two channels, where each line is recorded separately.

To transcribe audio data that includes multiple channels, specify the following:

  1. Set num_audio_channels to the number of channels in your audio.
  2. Set enable_separate_recognition_per_channel to true.

The returned Result objects now have the channel field set to the number of the channel the result corresponds to.

Separate recognition per channel is supported with all of our APIs: Transcribe (short audio), TranscribeAsync (files), and TranscribeStream (live streams).

The default limit on the number of channels is 4. If you require a higher limit, contact our support team.

Transcribe Short Audio

In this example, we will transcribe a short audio file (< 60 seconds) with two channels, each channel being transcribed independently.

transcribe_file_short_separate_recognition.py

from soniox.transcribe_file import transcribe_file_short
from soniox.speech_service import SpeechClient, set_api_key

set_api_key("<YOUR-API-KEY>")


def main():
    with SpeechClient() as client:
        channel_results = transcribe_file_short(
            "../test_data/test_audio_multi_channel.flac",
            client,
            num_audio_channels=2,
            enable_separate_recognition_per_channel=True,
        )
        for result in channel_results:
            print(f"Channel {result.channel}: " + " ".join(word.text for word in result.words))


if __name__ == "__main__":
    main()

Run

python3 transcribe_file_short_separate_recognition.py

Output

Channel 0: But there is always a stronger sense of life . And now he is pouring down his beams
Channel 1: He was two years out from the east .

Note that the transcribe_file_short() function now returns a list of Result objects, one for each channel.
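If you need the per-channel transcripts as plain strings, the returned list can be folded into a dictionary keyed by channel. The helper below is a small sketch of our own (transcripts_by_channel is not part of the Soniox SDK); it only assumes each result exposes the channel and words fields used in the example above:

```python
def transcripts_by_channel(channel_results):
    # Map each result's channel number to its joined transcript text.
    # Works with any objects exposing .channel and .words (each word has .text),
    # such as the Result objects returned by transcribe_file_short().
    return {
        result.channel: " ".join(word.text for word in result.words)
        for result in channel_results
    }
```
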

transcribe_file_short_separate_recognition.js

const { SpeechClient } = require("@soniox/soniox-node");

// Do not forget to set your Soniox API key.
const speechClient = new SpeechClient();

(async function () {
    const channel_results = await speechClient.transcribeFileShort(
        "../test_data/test_audio_multi_channel.flac",
        {
            num_audio_channels: 2,
            enable_separate_recognition_per_channel: true,
        }
    );

    for (const result of channel_results) {
        console.log(
            `Channel ${result.channel}: ${result.words.map((word) => word.text).join(" ")}`
        );
    }
})();

Run

node transcribe_file_short_separate_recognition.js

Output

Channel 0: But there is always a stronger sense of life . And now he is pouring down his beams
Channel 1: He was two years out from the east .

Note that the transcribeFileShort() function now returns a list of Result objects, one for each channel.

Transcribe Files

transcribe_file_async_separate_recognition.py

The code is nearly identical to our previous example, except that num_audio_channels and enable_separate_recognition_per_channel must be set in the transcribe_file_async() call.

Note that the GetTranscribeAsyncResult() function returns a list of Result objects (one for each channel) when separate recognition per channel is enabled.
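The status-polling loop visible in the log below can be sketched generically. wait_for_completion is a hypothetical helper of our own, not part of the Soniox SDK; it only assumes that the status callable returns one of the status strings shown in the output ("QUEUED", "TRANSCRIBING", "COMPLETED", or "FAILED"):

```python
import time


def wait_for_completion(get_status, poll_interval=2.0, sleep=time.sleep):
    # Repeatedly call get_status() until the transcription finishes.
    # get_status should return one of: "QUEUED", "TRANSCRIBING",
    # "COMPLETED", "FAILED".
    while True:
        status = get_status()
        print(f"Status: {status}")
        if status in ("COMPLETED", "FAILED"):
            return status
        sleep(poll_interval)
```

With the real client, get_status would wrap the status call from the example (e.g. GetTranscribeAsyncFileStatus with your file ID), and on "COMPLETED" you would fetch the per-channel results with GetTranscribeAsyncResult().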

Run

python3 transcribe_file_async_separate_recognition.py

Output

Uploading file.
File ID: 3476
Calling GetTranscribeAsyncFileStatus.
Status: QUEUED
Calling GetTranscribeAsyncFileStatus.
Status: TRANSCRIBING
Calling GetTranscribeAsyncFileStatus.
Status: COMPLETED
Calling GetTranscribeAsyncResult
Channel 0: But there is always a stronger sense of life . And now he is pouring down his beams
Channel 1: He was two years out from the east .
Calling DeleteTranscribeAsyncFile.

transcribe_file_async_separate_recognition.js

The code is nearly identical to our previous example, except that num_audio_channels and enable_separate_recognition_per_channel must be set in the transcribeFileAsync() call.

Note that the GetTranscribeAsyncResult() function returns a list of Result objects (one for each channel) when separate recognition per channel is enabled.

Run

node transcribe_file_async_separate_recognition.js

Output

Uploading file.
File ID: 3477
Calling getTranscribeAsyncStatus.
Status: QUEUED
Calling getTranscribeAsyncStatus.
Status: TRANSCRIBING
Calling getTranscribeAsyncStatus.
Status: COMPLETED
Calling GetTranscribeAsyncResult
Channel 0: But there is always a stronger sense of life . And now he is pouring down his beams
Channel 1: He was two years out from the east .
Calling deleteTranscribeAsyncFile.

Transcribe Streams

When transcribing a multi-channel stream with separate recognition per channel, all channels are transcribed in parallel, in real time and with low latency. This makes it especially suitable for transcribing a meeting with multiple participants, where each participant is transcribed independently of the others for the highest accuracy.

In this example, we will transcribe a live stream with two channels, each channel being transcribed independently. We will simulate the stream by reading a file in small chunks.

transcribe_stream_separate_recognition.py

from typing import Iterable
from soniox.transcribe_live import transcribe_stream
from soniox.speech_service import SpeechClient, set_api_key

set_api_key("<YOUR-API-KEY>")


def iter_audio() -> Iterable[bytes]:
    # This function should yield audio bytes from your stream.

    # Here we simulate the stream by reading a file in small chunks.
    with open("../test_data/test_audio_multi_channel.flac", "rb") as fh:
        while True:
            audio = fh.read(1024)
            if len(audio) == 0:
                break
            yield audio


def main():
    with SpeechClient() as client:
        for result in transcribe_stream(
            iter_audio(),
            client,
            num_audio_channels=2,
            enable_separate_recognition_per_channel=True,
        ):
            print(f"Channel {result.channel}: " + " ".join(w.text for w in result.words))


if __name__ == "__main__":
    main()

Run

python3 transcribe_stream_separate_recognition.py

Output

Channel 0: But
Channel 1:
Channel 0: But there
Channel 1:
Channel 1:
Channel 0: But there is
Channel 1:
Channel 0: But there is always
Channel 1:
Channel 0: But there is always a
Channel 1:
Channel 0: But there is always a strong
Channel 1:
Channel 1:
Channel 0: But there is always a stronger
Channel 1:
Channel 0: But there is always a stronger sense
Channel 1:
Channel 0: But there is always a stronger sense of
Channel 1:
Channel 0: But there is always a stronger sense of life
Channel 0: But there is always a stronger sense of life
Channel 1:
Channel 0: But there is always a stronger sense of life
Channel 1: He was
Channel 0: But there is always a stronger sense of life
Channel 1: He was
Channel 0: But there is always a stronger sense of life
Channel 1: He was two
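As the output above shows, results for the two channels arrive interleaved, and each new result for a channel supersedes the previous one. A small sketch of our own (latest_per_channel is not an SDK function) that folds such a stream into the most recent transcript per channel, assuming the same .channel/.words result shape as the loop above:

```python
def latest_per_channel(results):
    # Fold an interleaved stream of per-channel results into the most
    # recent transcript text seen for each channel. Later results for a
    # channel overwrite earlier (e.g. nonfinal) ones.
    latest = {}
    for result in results:
        latest[result.channel] = " ".join(word.text for word in result.words)
    return latest
```
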

transcribe_stream_separate_recognition.js

const fs = require("fs");
const { SpeechClient } = require("@soniox/soniox-node");

// Do not forget to set your Soniox API key.
const speechClient = new SpeechClient();

(async function () {
    const onDataHandler = async (result) => {
        console.log(
            `Channel ${result.channel}: ${result.words.map((word) => word.text).join(" ")}`
        );
    };

    const onEndHandler = (error) => {
        console.log("END!", error);
    };

    // transcribeStream() returns object with ".writeAsync()" and ".end()" methods.
    // Use them to send data and end the stream when done.
    const stream = speechClient.transcribeStream(
        {
            include_nonfinal: true,
            num_audio_channels: 2,
            enable_separate_recognition_per_channel: true,
        },
        onDataHandler,
        onEndHandler
    );

    // Here we simulate the stream by reading a file in small chunks.
    const CHUNK_SIZE = 1024;
    const streamSource = fs.createReadStream(
        "../test_data/test_audio_multi_channel.flac",
        {
            highWaterMark: CHUNK_SIZE,
        }
    );

    for await (const audioChunk of streamSource) {
        await stream.writeAsync(audioChunk);
    }

    stream.end();
})();

Run

node transcribe_stream_separate_recognition.js

Output

Channel 0: But
Channel 1:
Channel 0: But there
Channel 1:
Channel 1:
Channel 0: But there is
Channel 1:
Channel 0: But there is always
Channel 1:
Channel 0: But there is always a
Channel 1:
Channel 0: But there is always a strong
Channel 1:
Channel 1:
Channel 0: But there is always a stronger
Channel 1:
Channel 0: But there is always a stronger sense
Channel 1:
Channel 0: But there is always a stronger sense of
Channel 1:
Channel 0: But there is always a stronger sense of life
Channel 0: But there is always a stronger sense of life
Channel 1:
Channel 0: But there is always a stronger sense of life
Channel 1: He was
Channel 0: But there is always a stronger sense of life
Channel 1: He was
Channel 0: But there is always a stronger sense of life
Channel 1: He was two
