Get Audio

You can retrieve stored audio using the GetAudio API call. To retrieve audio, you need to specify the object ID and the segment of audio to retrieve. There are two ways of doing the latter:

  1. By specifying a time range using start_ms and duration_ms.
  2. By specifying a token range using token_start and token_end.

The first option returns the audio segment corresponding precisely to the specified time range.

The second option uses token/word timestamps to compute the start and end time of the audio segment based on the specified token range. The time range is automatically extended by a small amount on each side in order to fully include the specified tokens/words in the audio segment.
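The token-range computation described above can be illustrated with a minimal sketch. It is not the service's implementation: the `Token` type, its `start_ms` and `duration_ms` fields, and the `padding_ms` default of 100 are assumptions made for illustration only, since the actual amount of padding is not documented here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    # Hypothetical token record; field names are assumptions for this sketch.
    text: str
    start_ms: int
    duration_ms: int


def token_range_to_time_range(
    tokens: List[Token],
    token_start: int,
    token_end: int,
    padding_ms: int = 100,  # assumed value, not the service's actual padding
    total_duration_ms: Optional[int] = None,
) -> Tuple[int, int]:
    """Return (start_ms, duration_ms) covering tokens[token_start:token_end]."""
    if not 0 <= token_start < token_end <= len(tokens):
        raise ValueError("token range must be non-empty and within bounds")
    selected = tokens[token_start:token_end]  # token_end is non-inclusive
    # Extend the range on each side so the tokens are fully included.
    start = max(0, selected[0].start_ms - padding_ms)
    end = selected[-1].start_ms + selected[-1].duration_ms + padding_ms
    if total_duration_ms is not None:
        end = min(end, total_duration_ms)
    return start, end - start
```

For example, with tokens at 0-200 ms, 300-500 ms and 600-900 ms, selecting tokens 1 to 3 with 50 ms of padding gives a segment starting at 250 ms and lasting 700 ms.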

The result is a stream of responses:

message GetAudioResponse {
    string object_id = 1;
    int32 start_ms = 2;
    int32 duration_ms = 3;
    int32 total_duration_ms = 4;
    int32 num_audio_channels = 6;
    bytes data = 5;
}

The first response carries only the metadata fields (all fields except data). Every subsequent response carries only data; its other fields are left at their default (zero/empty) values and should be ignored.
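One way to consume such a stream is to read the metadata from the first response and concatenate the data of the rest. The sketch below uses a stand-in `FakeResponse` dataclass that mirrors the fields of GetAudioResponse so that it runs without the client library; `collect_audio` is a hypothetical helper, not part of the Soniox API.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class FakeResponse:
    # Stand-in mirroring the GetAudioResponse fields, for illustration only.
    object_id: str = ""
    start_ms: int = 0
    duration_ms: int = 0
    total_duration_ms: int = 0
    num_audio_channels: int = 0
    data: bytes = b""


def collect_audio(responses: Iterable[FakeResponse]) -> Tuple[FakeResponse, bytes]:
    """Split a response stream into (metadata response, concatenated audio bytes)."""
    it = iter(responses)
    metadata = next(it, None)
    if metadata is None:
        raise ValueError("empty response stream")
    # The first response is expected to have empty data; including it is harmless.
    audio = bytearray(metadata.data)
    for response in it:
        audio.extend(response.data)
    return metadata, bytes(audio)
```

Keeping the first response around is useful because it is the only place where values such as num_audio_channels and total_duration_ms are available.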

Audio can be returned in one of the following formats as specified by the audio_bytes_format parameter: wav or pcm_s16le (PCM 16-bit signed little-endian).

In both cases, the returned audio has a sample rate of 16 kHz. The number of audio channels is given in the num_audio_channels field of the first response (this is important to know for pcm_s16le, which has no header). In practice, the number of audio channels is 1 if separate recognition per channel was not used, or equal to the number of audio channels transcribed if it was.
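Because pcm_s16le data carries no header, the sample rate and channel count have to be supplied when the audio is played or saved. As a minimal sketch, the raw bytes can be wrapped in a WAV container with Python's standard `wave` module; `pcm_s16le_to_wav` is a hypothetical helper name, and the 16 kHz rate and 2-byte sample width come from the format described above.

```python
import io
import wave

SAMPLE_RATE_HZ = 16000  # sample rate of returned audio, per the text above
SAMPLE_WIDTH_BYTES = 2  # 16-bit signed samples


def pcm_s16le_to_wav(pcm: bytes, num_audio_channels: int) -> bytes:
    """Wrap raw pcm_s16le bytes in a WAV container and return the WAV bytes."""
    if num_audio_channels < 1:
        raise ValueError("num_audio_channels must be at least 1")
    frame_size = SAMPLE_WIDTH_BYTES * num_audio_channels
    if len(pcm) % frame_size != 0:
        raise ValueError("PCM length is not a whole number of frames")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(num_audio_channels)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(pcm)
    return buf.getvalue()
```

The value passed as `num_audio_channels` would be the one read from the first response of the stream.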

Example

In this example we retrieve two audio segments in wav format: one for a time range starting at 1167 ms and lasting 976 ms, and one for a token range starting at token 17 and ending at token 19 (non-inclusive).

get_audio.py

from typing import Iterable
from soniox.speech_service import SpeechClient
from soniox.speech_service_pb2 import GetAudioResponse
from soniox.storage import get_audio


# Helper function to save audio returned by GetAudio and print information.
def save_audio(responses_iter: Iterable[GetAudioResponse], file_path: str) -> None:
    print(f"Saving audio to: {file_path}")
    audio_file_size = 0
    with open(file_path, "wb") as fh:
        for response in responses_iter:
            fh.write(response.data)
            audio_file_size += len(response.data)
    print(f"Audio file size: {audio_file_size}")


# Do not forget to set your API key in the SONIOX_API_KEY environment variable.
def main():
    with SpeechClient() as client:
        object_id = "my_id_for_audio"

        # Get audio by time range.
        save_audio(
            get_audio(object_id, client, start_ms=1167, duration_ms=976, audio_bytes_format="wav"),
            "get_audio_time.wav",
        )

        # Get audio by token range.
        save_audio(
            get_audio(object_id, client, token_start=17, token_end=19, audio_bytes_format="wav"),
            "get_audio_token.wav",
        )


if __name__ == "__main__":
    main()

Run

python3 get_audio.py

Output

Saving audio to: get_audio_time.wav
Audio file size: 31276
Saving audio to: get_audio_token.wav
Audio file size: 25004
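The reported sizes can be checked against the audio format. The sketch below assumes single-channel audio and a canonical 44-byte WAV header (both assumptions, not stated by the API); the function names are hypothetical.

```python
SAMPLE_RATE_HZ = 16000  # sample rate of returned audio
BYTES_PER_SAMPLE = 2    # 16-bit samples
WAV_HEADER_BYTES = 44   # assumption: canonical PCM WAV header size


def expected_wav_size(duration_ms: int, num_audio_channels: int = 1) -> int:
    """Expected WAV file size in bytes for a segment of the given duration."""
    num_frames = SAMPLE_RATE_HZ * duration_ms // 1000
    return WAV_HEADER_BYTES + num_frames * BYTES_PER_SAMPLE * num_audio_channels


def duration_ms_from_wav_size(size_bytes: int, num_audio_channels: int = 1) -> int:
    """Segment duration in milliseconds implied by a WAV file size."""
    payload = size_bytes - WAV_HEADER_BYTES
    return payload * 1000 // (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * num_audio_channels)


print(expected_wav_size(976))            # 31276
print(duration_ms_from_wav_size(25004))  # 780
```

Under these assumptions, the 976 ms time-range request accounts exactly for the 31276-byte file, and the 25004-byte file corresponds to a 780 ms segment for the token range after padding.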