2. Transcribe Files in Stream

In the previous example, we transcribed a short audio file. With short audio files, we can transfer the entire audio at once to Soniox Cloud, and wait to receive the entire transcription result.

To transcribe arbitrarily large audio files, we need a bidirectional streaming solution, where we transfer the audio in small chunks and, at the same time, receive partial transcription results.

In this example, we will transcribe a file in a bidirectional streaming paradigm using transcribe_file_stream().

examples/transcribe_file_stream.py

from soniox.transcribe_file import transcribe_file_stream
from soniox.speech_service import Client, set_api_key
from soniox.test_data import TEST_AUDIO_LONG_FLAC

set_api_key("<YOUR-API-KEY>")

def main():
    with Client() as client:
        for result in transcribe_file_stream(TEST_AUDIO_LONG_FLAC, client):
            print(" ".join(w.text for w in result.words))

if __name__ == "__main__":
    main()

To transcribe a file in a streaming paradigm, we call the transcribe_file_stream() generator in a for loop. Under the hood, the generator does two things asynchronously:

  1. Reads and transfers the content of the file in small chunks to Soniox Cloud.
  2. Receives partial transcription results from Soniox Cloud.

The generator yields a sequence of Result structures, each containing the recognized words.
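The two concurrent steps above can be illustrated with a small, self-contained sketch. This is not the Soniox implementation: fake_service() and stream_transcribe() are hypothetical names, and the "service" is a local thread that simply reports how many bytes it received. It only shows the general pattern of sending chunks in the background while the caller iterates over results as they arrive.

```python
import queue
import threading


def fake_service(chunks_in, results_out):
    """Hypothetical stand-in for a remote service: one result per received chunk."""
    while True:
        chunk = chunks_in.get()
        if chunk is None:          # sentinel: the sender has finished
            results_out.put(None)  # signal that no more results will follow
            return
        results_out.put(f"partial result for {len(chunk)} bytes")


def stream_transcribe(audio, chunk_size=4):
    """Yield results while a background thread is still sending chunks."""
    chunks_in, results_out = queue.Queue(), queue.Queue()

    def send():
        # Step 1: transfer the audio in small chunks.
        for start in range(0, len(audio), chunk_size):
            chunks_in.put(audio[start:start + chunk_size])
        chunks_in.put(None)

    threading.Thread(
        target=fake_service, args=(chunks_in, results_out), daemon=True
    ).start()
    threading.Thread(target=send, daemon=True).start()

    # Step 2: receive partial results as soon as they are available.
    while True:
        result = results_out.get()
        if result is None:
            return
        yield result


for result in stream_transcribe(b"0123456789"):
    print(result)
```

Because sending happens on its own thread, the for loop starts receiving results before the whole input has been handed over, which is the property that makes arbitrarily large inputs practical.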

Run!

python3 transcribe_file_stream.py

Output

but there
is always a stronger sense of life when the sun is brilliant after rain and now he 
lighting up every patch of vivid green moss and the red tiles of the cow shed and  
the channel to the drain  
into a mirror for the yellow billed ducks who are seizing the opportunity of getting 

Audio Formats

transcribe_file_stream() supports the same audio formats as transcribe_file_short(), i.e., automatic audio format detection and raw audio transcription.

Transcribe From Memory

If we have the entire audio file in memory, we can use transcribe_bytes_stream().

with open(TEST_AUDIO_LONG_FLAC, "rb") as fh:
    audio_bytes = fh.read()

for result in transcribe_bytes_stream(audio_bytes, client):
    print(" ".join(w.text for w in result.words))

If we are reading the audio in chunks, we can use transcribe_iter_bytes_stream() by passing a generator over the audio chunks.

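def iter_audio():
    # Read the file in 1024-byte chunks and yield each one until end of file.
    with open(TEST_AUDIO_LONG_FLAC, "rb") as fh:
        while True:
            audio = fh.read(1024)
            if not audio:  # read() returns b"" at end of file
                break
            yield audio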

for result in transcribe_iter_bytes_stream(iter_audio(), client):
    print(" ".join(w.text for w in result.words))
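The chunking loop above can also be written as a reusable helper that works with any binary file-like object. The sketch below uses only the Python standard library; iter_chunks() is a hypothetical helper name and is not part of the Soniox package. An in-memory io.BytesIO stands in for the audio file so the example runs on its own.

```python
import io
from functools import partial


def iter_chunks(stream, chunk_size=1024):
    """Yield successive chunks from a binary stream until it is exhausted."""
    # iter(callable, sentinel) keeps calling stream.read(chunk_size)
    # and stops as soon as a call returns the empty bytes object.
    yield from iter(partial(stream.read, chunk_size), b"")


chunks = list(iter_chunks(io.BytesIO(b"abcdefghij"), chunk_size=4))
print(chunks)  # [b'abcd', b'efgh', b'ij']
```

An iterator built this way could be passed wherever a generator over audio chunks is expected, for example in place of iter_audio() above.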

Processed Audio Duration

The transcribed duration of the audio in milliseconds is available in the final_proc_time_ms field in each received result object.

duration_ms = result.final_proc_time_ms
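As a small illustration of using this field, the sketch below keeps the value from the most recent result and formats it for display. It assumes that the last received result carries the total processed duration, which the text above does not state explicitly. format_ms() and last_processed_ms() are hypothetical helpers, and the SimpleNamespace objects are stand-ins that carry only the final_proc_time_ms field, so the example runs without a connection to Soniox Cloud.

```python
from types import SimpleNamespace


def format_ms(ms):
    """Format a millisecond count as H:MM:SS.mmm."""
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def last_processed_ms(results):
    """Return final_proc_time_ms of the last result, or 0 if there were none."""
    duration_ms = 0
    for result in results:
        duration_ms = result.final_proc_time_ms
    return duration_ms


# Stand-in results (assumption): real ones come from the streaming generators.
fake_results = [SimpleNamespace(final_proc_time_ms=ms) for ms in (1200, 2640, 75250)]
print(format_ms(last_processed_ms(fake_results)))  # 0:01:15.250
```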