Multi-channel audio
This page explains how to enable transcription for multi-channel audio.
Certain audio recordings have multiple channels. For example, a recording of a phone call between two people may contain two channels, with each line recorded separately.
By default, multiple audio channels are mixed into one channel before transcription. However, each channel can also be transcribed individually.
To transcribe audio with separate recognition per channel, specify the following TranscriptionConfig fields:
- Set num_audio_channels to the number of channels in your audio.
- Set enable_separate_recognition_per_channel to true.
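The two settings can be illustrated with a local stand-in for TranscriptionConfig. This is a minimal sketch, not the client library's class: only the field names num_audio_channels and enable_separate_recognition_per_channel come from this page, and the validate() helper is hypothetical. It also checks the 4-channel limit for separate recognition described on this page.

```python
from dataclasses import dataclass

# Limit for separate recognition per channel, as stated on this page.
MAX_SEPARATE_RECOGNITION_CHANNELS = 4


# Hypothetical local stand-in for TranscriptionConfig; the real class
# comes from the client library. Only the two field names are from the docs.
@dataclass
class TranscriptionConfig:
    num_audio_channels: int = 1
    enable_separate_recognition_per_channel: bool = False

    def validate(self) -> None:
        """Raise ValueError if the channel settings are inconsistent."""
        if self.num_audio_channels < 1:
            raise ValueError("num_audio_channels must be at least 1")
        if (
            self.enable_separate_recognition_per_channel
            and self.num_audio_channels > MAX_SEPARATE_RECOGNITION_CHANNELS
        ):
            raise ValueError(
                "separate recognition per channel supports at most "
                f"{MAX_SEPARATE_RECOGNITION_CHANNELS} channels"
            )


# Two-channel phone call, each line transcribed separately.
config = TranscriptionConfig(
    num_audio_channels=2,
    enable_separate_recognition_per_channel=True,
)
config.validate()
```

Checking the limit on the client side simply surfaces a misconfiguration before any audio is sent; the service remains the authority on what it accepts.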
Separate recognition per channel is supported by all transcription APIs: Transcribe (short audio), TranscribeAsync (files), and TranscribeStream (live streams).
When using separate recognition per channel, Transcribe and TranscribeAsync return a list of Result objects, one for each consecutive channel, instead of a single result. In all cases, each Result object has a channel field indicating which channel it belongs to.
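Because every Result carries a channel field, results can be keyed by channel rather than by list position. The sketch below assumes a hypothetical minimal Result shape: channel is documented on this page, while text is an illustrative placeholder for whatever transcript data the real object holds, and channels are assumed to be numbered from 0.

```python
from dataclasses import dataclass


# Hypothetical minimal Result: only `channel` is documented on this page;
# `text` stands in for the real transcript data.
@dataclass
class Result:
    channel: int
    text: str


def transcripts_by_channel(results: list[Result]) -> dict[int, str]:
    """Map each channel index to its transcript text.

    Keys on the `channel` field instead of list position, so it works the
    same for results from Transcribe and TranscribeAsync.
    """
    by_channel: dict[int, str] = {}
    for result in results:
        if result.channel in by_channel:
            raise ValueError(f"duplicate result for channel {result.channel}")
        by_channel[result.channel] = result.text
    return by_channel


results = [
    Result(channel=0, text="Hello, how can I help?"),
    Result(channel=1, text="Hi, I have a question about my bill."),
]
for channel, text in sorted(transcripts_by_channel(results).items()):
    print(f"channel {channel}: {text}")
```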
The maximum number of channels for separate recognition per channel is 4. If you require a higher limit, contact our support team.
Using separate recognition per channel increases transcription cost.
Transcribing N channels with separate recognition per channel is billed the same as transcribing N times that duration of audio without separate recognition per channel. For example, a 10-minute two-channel recording is billed as 20 minutes of audio.
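The billing rule reduces to simple arithmetic. The helper below is a hypothetical illustration of that rule only; it does not model pricing tiers or rounding, which this page does not describe.

```python
def billed_seconds(duration_seconds: float, num_channels: int,
                   separate_recognition: bool) -> float:
    """Billed audio duration under the rule described on this page."""
    if separate_recognition:
        # Each channel is billed as if it were a separate recording.
        return duration_seconds * num_channels
    # Channels are mixed into one before transcription.
    return duration_seconds


# A 10-minute, two-channel phone call:
print(billed_seconds(600, 2, separate_recognition=True))   # 1200
print(billed_seconds(600, 2, separate_recognition=False))  # 600
```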
Transcribe short audio
In this example, we will transcribe a short audio file (< 60 seconds) with two channels using separate recognition per channel.
transcribe_file_short_separate_recognition.py
Note that transcribe_file_short() now returns a list of Result objects, one for each channel.
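For a two-party phone call, a common next step is to interleave the per-channel results back into a conversation. The sketch below assumes hypothetical shapes: only the channel field is confirmed on this page, while words, text and start_ms are illustrative assumptions. It merges the per-channel word lists by start time with heapq.merge and groups consecutive words from the same channel into turns.

```python
import heapq
from dataclasses import dataclass, field


# Hypothetical shapes: only `channel` is confirmed by this page.
@dataclass
class Word:
    text: str
    start_ms: int


@dataclass
class Result:
    channel: int
    words: list[Word] = field(default_factory=list)


def to_dialogue(results: list[Result]) -> list[tuple[int, str]]:
    """Merge per-channel words into speaker turns ordered by start time."""
    # Each channel's words are assumed to be sorted by start_ms already,
    # so heapq.merge can interleave them in a single pass.
    streams = [
        [(w.start_ms, r.channel, w.text) for w in r.words] for r in results
    ]
    turns: list[tuple[int, str]] = []
    for _, channel, text in heapq.merge(*streams):
        if turns and turns[-1][0] == channel:
            # Same speaker keeps talking: extend the current turn.
            turns[-1] = (channel, turns[-1][1] + " " + text)
        else:
            turns.append((channel, text))
    return turns


results = [
    Result(0, [Word("Hello", 0), Word("there", 300), Word("Sure", 2000)]),
    Result(1, [Word("Hi", 900), Word("can", 1200), Word("you", 1400),
               Word("help?", 1600)]),
]
for channel, text in to_dialogue(results):
    print(f"[channel {channel}] {text}")
```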
Transcribe files
In this example, we will transcribe a long audio file (> 60 seconds) with two channels using separate recognition per channel.
transcribe_file_async_separate_recognition.py
The code is nearly identical to the previous example, except that num_audio_channels and enable_separate_recognition_per_channel are set in the transcribe_file_async() call.
Note that GetTranscribeAsyncResult() returns a list of Result objects (one for each channel) when separate recognition per channel is enabled.
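Since an async job may be retrieved long after it was submitted, it can help to confirm that the returned list covers every channel requested in num_audio_channels. This is a hypothetical sanity check, not part of any client library; it assumes channels are numbered 0 through num_audio_channels - 1 and uses a minimal Result stand-in with only the documented channel field.

```python
from dataclasses import dataclass


@dataclass
class Result:  # hypothetical stand-in; only `channel` is documented
    channel: int


def check_channel_results(results: list[Result], num_audio_channels: int) -> None:
    """Raise ValueError unless there is exactly one Result per channel.

    Assumes channels are numbered 0 .. num_audio_channels - 1.
    """
    channels = sorted(r.channel for r in results)
    expected = list(range(num_audio_channels))
    if channels != expected:
        raise ValueError(f"expected channels {expected}, got {channels}")


check_channel_results([Result(1), Result(0)], num_audio_channels=2)
```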
Transcribe streams
When transcribing a multi-channel stream with separate recognition per channel, all channels are transcribed in parallel, in real time and with low latency. This makes it especially suitable for transcribing a meeting with a fixed number of participants, where each participant is transcribed independently of the others to achieve the highest level of accuracy.
In this example, we will transcribe a live stream with two channels, each channel being transcribed independently. We will simulate the stream by reading a file in small chunks.
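Simulating a live stream comes down to reading the file in fixed-duration chunks. The sketch below assumes the file holds raw interleaved PCM (16-bit samples by default); iter_audio_chunks is a hypothetical helper, and the call that sends each chunk to TranscribeStream is not shown because its client-side signature is not given here. Chunk sizes are a whole number of frames (one sample per channel), so no chunk splits a frame between channels.

```python
import io
import time
from typing import BinaryIO, Iterator


def iter_audio_chunks(
    stream: BinaryIO,
    num_channels: int,
    sample_rate: int,
    bytes_per_sample: int = 2,
    chunk_ms: int = 100,
    realtime: bool = False,
) -> Iterator[bytes]:
    """Yield fixed-duration chunks of interleaved raw PCM audio."""
    frame_bytes = num_channels * bytes_per_sample
    frames_per_chunk = sample_rate * chunk_ms // 1000
    chunk_bytes = frames_per_chunk * frame_bytes
    while True:
        chunk = stream.read(chunk_bytes)
        if not chunk:
            break
        yield chunk
        if realtime:
            # Pace chunks the way a live source would deliver them.
            time.sleep(chunk_ms / 1000)


# One second of silent 16 kHz, 16-bit, two-channel PCM (64,000 bytes).
audio = io.BytesIO(bytes(16000 * 2 * 2))
sizes = [len(c) for c in iter_audio_chunks(audio, num_channels=2,
                                           sample_rate=16000, chunk_ms=300)]
print(sizes)  # [19200, 19200, 19200, 6400]
```

With a real file, pass open(path, "rb") as the stream and set realtime=True to approximate live pacing.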