Audio format

This page outlines audio formats supported for transcription and explains how to configure the audio format in transcribe requests.

This API is deprecated and is being phased out. Please switch to our new multilingual Speech-to-Text API.

Automatic audio format detection

Soniox supports most common audio formats and detects them automatically from the file header, so you do not need to set any audio configuration fields when sending a file in one of the supported formats.

Supported file formats

mp3, wav, flac, ogg, aac, aiff, amr, and asf.
When using a supported file format, do not set the audio_format, sample_rate_hertz, or num_audio_channels fields of TranscriptionConfig.
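This page does not describe how Soniox performs detection. The sketch below only illustrates the general idea of recognizing a container from its leading "magic" bytes, which can also help a client decide whether a file can be sent as-is or needs the raw-audio fields. The name sniff_container and its heuristics are our own illustration, not part of any Soniox API, and the MP3/AAC frame-sync checks are approximate.

```python
from typing import Optional

# Header Object GUID that starts every ASF file, in on-disk byte order.
_ASF_GUID = bytes.fromhex("3026b2758e66cf11a6d900aa0062ce6c")


def sniff_container(header: bytes) -> Optional[str]:
    """Guess the container format from a file's first bytes (read at least 16).

    Returns None when no known signature matches, which is what headerless
    raw audio usually looks like.
    """
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"FORM" and header[8:12] in (b"AIFF", b"AIFC"):
        return "aiff"
    if header.startswith(b"fLaC"):
        return "flac"
    if header.startswith(b"OggS"):
        return "ogg"
    if header.startswith(b"#!AMR"):  # matches both "#!AMR\n" and "#!AMR-WB\n"
        return "amr"
    if header.startswith(_ASF_GUID):
        return "asf"
    if header.startswith(b"ID3"):  # ID3v2 tag, normally followed by MP3 frames
        return "mp3"
    if len(header) >= 2 and header[0] == 0xFF:
        if (header[1] & 0xF6) == 0xF0:  # 12-bit sync + layer bits 00: AAC in ADTS
            return "aac"
        if (header[1] & 0xE6) == 0xE2:  # 11-bit sync + layer bits 01: MPEG Layer III
            return "mp3"
    return None
```

A None result is a hint, not proof, that the data is raw audio; in that case the raw-format fields described in the next section must be set.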

Raw audio samples

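Instead of a container format, you can send raw, headerless audio samples. The supported raw formats are listed in the table below. PCM format names encode the sample type and byte order: for example, pcm_f32le means 32-bit float, little-endian, and pcm_s16be means signed 16-bit integer, big-endian. The mulaw and alaw formats are 8-bit G.711 encodings.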
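To make the naming concrete, the sketch below packs normalized float samples into the byte layout each pcm_* format name describes, using Python's struct byte-order prefixes ("<" for little-endian, ">" for big-endian). encode_samples and PCM_STRUCT_CODES are hypothetical helpers for illustration; they cover only the pcm_* formats, not mulaw or alaw.

```python
import struct
from typing import Sequence

# struct byte-order prefix ("<" little-endian, ">" big-endian) plus type code
# ("f" 32-bit float, "i" signed 32-bit int, "h" signed 16-bit int).
PCM_STRUCT_CODES = {
    "pcm_f32le": "<f", "pcm_f32be": ">f",
    "pcm_s32le": "<i", "pcm_s32be": ">i",
    "pcm_s16le": "<h", "pcm_s16be": ">h",
}


def encode_samples(samples: Sequence[float], audio_format: str) -> bytes:
    """Pack samples in [-1.0, 1.0] into raw bytes for a pcm_* audio_format."""
    order, code = PCM_STRUCT_CODES[audio_format]
    if code == "f":
        values = [float(s) for s in samples]
    else:
        # Scale to the signed integer range and clamp out-of-range input.
        full_scale = 2 ** (8 * struct.calcsize(order + code) - 1) - 1
        values = [max(-full_scale - 1, min(full_scale, round(s * full_scale))) for s in samples]
    return struct.pack(f"{order}{len(values)}{code}", *values)
```

The same sample value produces the same bytes in reverse order for the le and be variants, which is why the byte order must match what audio_format declares.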

When using a raw format, the following TranscriptionConfig fields must be set.

Field	Type	Permitted Values
audio_format	string	pcm_f32le, pcm_f32be, pcm_s32le, pcm_s32be, pcm_s16le, pcm_s16be, mulaw, alaw
sample_rate_hertz	int32	2000 to 96000 Hz
num_audio_channels	int32	1 to 8
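Because these three fields must match the actual audio, a client-side check can catch mistakes before a request is sent. The sketch below is our own helper (check_raw_config and RAW_AUDIO_FORMATS are hypothetical names, not part of the Soniox SDK). It validates the values against the table above and returns the stream's byte rate, assuming 4 bytes per sample for the 32-bit formats, 2 for the 16-bit formats, and 1 for the 8-bit mulaw and alaw encodings.

```python
# Permitted audio_format values from the table, mapped to bytes per sample.
RAW_AUDIO_FORMATS = {
    "pcm_f32le": 4, "pcm_f32be": 4,
    "pcm_s32le": 4, "pcm_s32be": 4,
    "pcm_s16le": 2, "pcm_s16be": 2,
    "mulaw": 1, "alaw": 1,
}


def check_raw_config(audio_format: str, sample_rate_hertz: int, num_audio_channels: int) -> int:
    """Validate raw-audio config fields and return the byte rate in bytes per second."""
    if audio_format not in RAW_AUDIO_FORMATS:
        raise ValueError(f"unsupported raw audio_format: {audio_format!r}")
    if not 2000 <= sample_rate_hertz <= 96000:
        raise ValueError("sample_rate_hertz must be between 2000 and 96000")
    if not 1 <= num_audio_channels <= 8:
        raise ValueError("num_audio_channels must be between 1 and 8")
    return RAW_AUDIO_FORMATS[audio_format] * sample_rate_hertz * num_audio_channels
```

For example, 16 kHz mono pcm_s16le is 2 × 16000 × 1 = 32,000 bytes per second, so a 100 ms chunk is 3,200 bytes.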

Example

This example transcribes raw audio encoded as signed 16-bit little-endian PCM (pcm_s16le) at a 16 kHz sample rate with 1 channel.

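transcribe_any_stream_audio_format.py

```python
# Import paths below assume the legacy `soniox` Python package.
from soniox.speech_service import SpeechClient
from soniox.transcribe_live import transcribe_stream

def iter_audio():
    # Yield 100 ms chunks of raw pcm_s16le audio (16000 Hz * 2 bytes * 0.1 s = 3200 bytes).
    with open("audio.raw", "rb") as fh:  # "audio.raw" is a placeholder path
        while chunk := fh.read(3200):
            yield chunk

with SpeechClient() as client:
    for result in transcribe_stream(
        iter_audio(),
        client,
        model="en_v2_lowlatency",
        include_nonfinal=True,
        audio_format="pcm_s16le",
        sample_rate_hertz=16000,
        num_audio_channels=1,
    ):
        print(" ".join(word.text for word in result.words))  # words of each result
```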
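Raw audio often comes from stripping the header off a file you already have. As a hedged sketch, assuming a 16-bit PCM WAV source, Python's standard wave module can read the sample rate and channel count from the header and return the bare frames together with matching values for the three config fields. raw_config_from_wav is a hypothetical helper; for an unmodified WAV file you could simply send the file and rely on automatic detection instead.

```python
import wave
from typing import BinaryIO, Dict, Tuple, Union


def raw_config_from_wav(source: Union[str, BinaryIO]) -> Tuple[Dict[str, object], bytes]:
    """Return (config fields, raw frames) for a 16-bit PCM WAV file."""
    with wave.open(source, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV")
        config = {
            # 16-bit WAV samples are signed and stored little-endian.
            "audio_format": "pcm_s16le",
            "sample_rate_hertz": wf.getframerate(),
            "num_audio_channels": wf.getnchannels(),
        }
        frames = wf.readframes(wf.getnframes())
    return config, frames
```

The returned dictionary can be passed as keyword arguments alongside the frames, so the declared format always matches the data being streamed.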
