2. Transcribe Files in Stream
In the previous example, we transcribed a short audio file. With short audio files, we can transfer the entire audio at once to Soniox Cloud, and wait to receive the entire transcription result.
To transcribe arbitrarily large audio files, we need a bidirectional streaming solution, where we transfer the audio in small chunks and, at the same time, receive partial transcription results.
In this example, we will transcribe a file in a bidirectional streaming paradigm using transcribe_file_stream().
    from soniox.transcribe_file import transcribe_file_stream
    from soniox.speech_service import Client, set_api_key
    from soniox.test_data import TEST_AUDIO_LONG_FLAC

    set_api_key("<YOUR-API-KEY>")


    def main():
        with Client() as client:
            for result in transcribe_file_stream(TEST_AUDIO_LONG_FLAC, client):
                print(" ".join(w.text for w in result.words))


    if __name__ == "__main__":
        main()
To transcribe a file in a streaming paradigm, we iterate over the transcribe_file_stream() generator in a for loop. Under the hood, the generator does two things asynchronously:
- Reads and transfers the content of the file in small chunks to Soniox Cloud.
- Receives partial transcription results from Soniox Cloud.
The generator yields a sequence of Result structures, each containing the recognized words.
but there is always a stronger sense of life when the sun is brilliant after rain and now he lighting up every patch of vivid green moss and the red tiles of the cow shed and the channel to the drain into a mirror for the yellow billed ducks who are seizing the opportunity of getting
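The two concurrent activities described above (sending chunks and receiving partial results) can be illustrated with a small, self-contained sketch. This is not how the Soniox library is implemented; it is a conceptual model built on standard-library threads and queues. The names fake_service and stream are hypothetical, and the "service" here only reports the running byte count instead of transcribing anything.

```python
import queue
import threading
from typing import Iterable, Iterator

_END = object()  # sentinel marking the end of a stream


def fake_service(chunks: queue.Queue, results: queue.Queue) -> None:
    """Hypothetical stand-in for the remote service.

    Emits one partial result (the running byte total) per received chunk.
    """
    total = 0
    while True:
        chunk = chunks.get()
        if chunk is _END:
            break
        total += len(chunk)
        results.put(total)
    results.put(_END)


def stream(chunk_iter: Iterable[bytes]) -> Iterator[int]:
    """Send chunks from a background thread while yielding partial results."""
    # Bounded queue, so the sender cannot run arbitrarily far ahead.
    chunks: queue.Queue = queue.Queue(maxsize=4)
    results: queue.Queue = queue.Queue()

    def sender() -> None:
        for chunk in chunk_iter:
            chunks.put(chunk)
        chunks.put(_END)

    threading.Thread(target=sender, daemon=True).start()
    threading.Thread(
        target=fake_service, args=(chunks, results), daemon=True
    ).start()

    # The caller's for loop consumes results as soon as they are available.
    while True:
        item = results.get()
        if item is _END:
            return
        yield item


for partial in stream([b"ab", b"cde", b"f"]):
    print(partial)  # prints 2, then 5, then 6
```

The caller sees only a simple for loop, while the upload and the result handling proceed independently in the background, which is the same shape as the transcribe_file_stream() loop above.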
transcribe_file_stream() supports the same audio formats as transcribe_file_short(), i.e. automatic audio format detection and raw audio transcription.
Transcribe From Memory
If we have the entire audio file in memory, we can use transcribe_bytes_stream().
    with open(TEST_AUDIO_LONG_FLAC, "rb") as fh:
        audio_bytes = fh.read()

    for result in transcribe_bytes_stream(audio_bytes, client):
        print(" ".join(w.text for w in result.words))
If we are reading the audio in chunks, we can use transcribe_iter_bytes_stream() and pass it a generator that yields the audio chunks.
    def iter_audio():
        with open(TEST_AUDIO_LONG_FLAC, "rb") as fh:
            while True:
                audio = fh.read(1024)
                if len(audio) == 0:
                    break
                yield audio

    for result in transcribe_iter_bytes_stream(iter_audio(), client):
        print(" ".join(w.text for w in result.words))
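The same chunking pattern can be written once for any binary stream, not only a file opened by path. The sketch below is a minimal, hypothetical helper (iter_audio_chunks is not part of the Soniox library) that is exercised here with an in-memory buffer. Presumably its output could be passed to transcribe_iter_bytes_stream() in place of iter_audio(), but that pairing is an assumption and is not shown.

```python
import io
from typing import BinaryIO, Iterator


def iter_audio_chunks(source: BinaryIO, chunk_size: int = 1024) -> Iterator[bytes]:
    """Yield successive chunks from a binary stream until it is exhausted."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty bytes object signals end of stream
            break
        yield chunk


# Demonstration with an in-memory buffer standing in for real audio data.
chunks = list(iter_audio_chunks(io.BytesIO(b"x" * 2500), chunk_size=1024))
print([len(c) for c in chunks])  # [1024, 1024, 452]
```

Only the last chunk may be shorter than chunk_size, and concatenating the chunks reproduces the original bytes.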
Processed Audio Duration
The transcribed duration of the audio in milliseconds is available in the final_proc_time_ms field in each received result object.
duration_ms = result.final_proc_time_ms
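Because final_proc_time_ms is a plain millisecond count, it is often convenient to render it in a human-readable form when logging progress. The helper below is a hypothetical convenience function, not part of the Soniox library, and assumes only that the field holds a non-negative integer.

```python
def format_duration_ms(duration_ms: int) -> str:
    """Format a millisecond count as H:MM:SS.mmm."""
    if duration_ms < 0:
        raise ValueError("duration_ms must be non-negative")
    seconds, millis = divmod(duration_ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


# Inside the streaming loop this could be used as:
#     print(format_duration_ms(result.final_proc_time_ms))
print(format_duration_ms(3723456))  # 1:02:03.456
```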