1. Transcribe Short Files

In this example, we will transcribe a short audio file using the transcribe_file_short() function and print out the recognized words.

examples/transcribe_file_short.py

from soniox.transcribe_file import transcribe_file_short
from soniox.speech_service import Client, set_api_key
from soniox.test_data import TEST_AUDIO_FLAC

set_api_key("<YOUR-API-KEY>")

def main():
    with Client() as client:
        result = transcribe_file_short(TEST_AUDIO_FLAC, client)
        for word in result.words:
            print(f"{word.text} {word.start_ms} {word.duration_ms}")

if __name__ == "__main__":
    main()

First, set your API key, which you can find under the Developer tab in Soniox Cloud.

We create a Client object, which handles all communication between your program and Soniox Cloud. The with statement cleans up the client object once it is no longer needed.

We then call the transcribe_file_short() function with the created client object and a short audio file, TEST_AUDIO_FLAC, which is part of the soniox Python package. The function reads the entire audio file and sends it to Soniox Cloud for transcription. It returns an instance of the Result structure, which contains the recognized words. Each word is an instance of the Word structure, which contains the text of the word as well as its start time and duration in milliseconds (start_ms and duration_ms).

Run!

python3 transcribe_file_short.py

Output

he 480 240
was 800 80
two 1200 160
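
As a rough illustration of working with these timestamps, the sketch below derives each word's end time and joins the words into a single transcript line. The Word dataclass here is a hypothetical stand-in that only mirrors the three fields printed above (text, start_ms, duration_ms); it is not the soniox class, and the helper names end_ms and transcript are made up for this example. With a real result you would iterate over result.words instead.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical stand-in mirroring the fields used in the example above.
    text: str
    start_ms: int
    duration_ms: int


def end_ms(word):
    """End time of a word: start plus duration, in milliseconds."""
    return word.start_ms + word.duration_ms


def transcript(words):
    """Join the recognized words into one space-separated string."""
    return " ".join(w.text for w in words)


# Values copied from the sample output above.
words = [Word("he", 480, 240), Word("was", 800, 80), Word("two", 1200, 160)]

for w in words:
    print(f"{w.text}: {w.start_ms} ms to {end_ms(w)} ms")
print(transcript(words))
```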

Automatic Audio-Format Detection

Soniox supports most common audio formats and detects them automatically from the file header. The currently supported file formats are mp3, wav, flac, ogg, aac, aiff, amr, and asf.
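
To give a feel for what header-based detection means in general, here is a minimal sketch that checks the well-known signature bytes at the start of a file. It is not how Soniox implements detection, which the text above does not describe; the function name sniff_audio_format is hypothetical, and the sketch covers only a few of the listed formats.

```python
def sniff_audio_format(header: bytes):
    """Guess a container format from the first bytes of a file, or None."""
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"fLaC":
        return "flac"
    if header[:4] == b"OggS":
        return "ogg"
    if header[:4] == b"FORM" and header[8:12] in (b"AIFF", b"AIFC"):
        return "aiff"
    if header[:5] == b"#!AMR":
        return "amr"
    # Anything else (including mp3, aac, and asf) is left undetected here.
    return None


# The first bytes of a FLAC stream are the marker "fLaC".
print(sniff_audio_format(b"fLaC\x00\x00\x00\x22"))
```

In practice you do not need to do this yourself when calling transcribe_file_short(); a check like this is mainly useful for rejecting unexpected files before uploading them.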

Transcribe Raw Audio

It is possible to send raw audio samples instead of a container format. Because raw audio has no header to detect the format from, the audio format, sample rate, and number of channels must be specified explicitly. The following PCM formats are supported: pcm_f32le, pcm_f32be, pcm_s32le, pcm_s32be, pcm_s16le, pcm_s16be. The example below shows how to transcribe audio encoded as 16-bit little-endian PCM at a 16 kHz sample rate with one channel.

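# Reuses the client from the first example; TEST_AUDIO_RAW is assumed to
# point to a raw PCM file.
result = transcribe_file_short(
    TEST_AUDIO_RAW,
    client,
    audio_format="pcm_s16le",  # 16-bit signed little-endian PCM
    sample_rate_hertz=16000,   # 16 kHz
    num_audio_channels=1,      # mono
)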
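
If the audio starts out as a WAV file, the three values passed in the call above can be read from the WAV header while stripping the container. The sketch below uses Python's standard wave module for this; wav_to_raw and raw_duration_ms are hypothetical helper names, not part of the soniox package, and the format mapping assumes a little-endian host.

```python
import io
import wave

# Sample widths (in bytes) that the wave module reads as signed PCM,
# mapped to the format names listed above (little-endian host assumed).
PCM_FORMAT_BY_WIDTH = {2: "pcm_s16le", 4: "pcm_s32le"}


def wav_to_raw(wav_file):
    """Return (raw_bytes, audio_format, sample_rate_hertz, num_audio_channels)."""
    with wave.open(wav_file, "rb") as wav:
        width = wav.getsampwidth()
        if width not in PCM_FORMAT_BY_WIDTH:
            raise ValueError(f"unsupported sample width: {width} bytes")
        raw = wav.readframes(wav.getnframes())
        return raw, PCM_FORMAT_BY_WIDTH[width], wav.getframerate(), wav.getnchannels()


def raw_duration_ms(num_bytes, sample_rate_hertz, num_audio_channels, bytes_per_sample):
    """Duration implied by a raw buffer; needs the format to be known."""
    frames = num_bytes // (num_audio_channels * bytes_per_sample)
    return frames * 1000 // sample_rate_hertz


# Build one second of 16 kHz mono silence as an in-memory WAV for the demo.
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 16000)
buf.seek(0)

raw, audio_format, rate, channels = wav_to_raw(buf)
print(audio_format, rate, channels, len(raw), raw_duration_ms(len(raw), rate, channels, 2))
```

The duration helper shows why the extra parameters matter: the same byte count maps to a different length of audio depending on the sample rate, channel count, and sample width.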

Transcribe From Memory

If the audio is already in memory as bytes, we can use the transcribe_bytes_short() function instead.

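# Assumption: transcribe_bytes_short lives alongside transcribe_file_short.
from soniox.transcribe_file import transcribe_bytes_short

# Read the whole file into memory as bytes.
with open(TEST_AUDIO_FLAC, "rb") as fh:
    audio_bytes = fh.read()

result = transcribe_bytes_short(audio_bytes, client)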