Audio format
This page outlines audio formats supported for transcription and explains how to configure the audio format in transcribe requests.
Automatic audio format detection
Soniox supports most common audio formats and detects them automatically from file headers, so you don't need to set the audio configuration manually when using a supported file format.
Supported file formats
- mp3, wav, flac, ogg, aac, aiff, amr, and asf.
- When using supported file formats, the audio_format, sample_rate_hertz, and num_audio_channels TranscriptionConfig fields should not be set.
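To illustrate what detection from file headers means in general, the sketch below guesses a container format from the first bytes of a file using well-known magic numbers. It is not Soniox's implementation; the function name `sniff_container` is hypothetical, and the checks are a simplified heuristic (for example, an ID3 tag is assumed to indicate mp3).

```python
from typing import Optional


def sniff_container(header: bytes) -> Optional[str]:
    """Guess a container format from leading file bytes (illustrative heuristic only)."""
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"fLaC":
        return "flac"
    if header[:4] == b"OggS":
        return "ogg"
    if header[:4] == b"FORM" and header[8:12] in (b"AIFF", b"AIFC"):
        return "aiff"
    if header[:5] == b"#!AMR":
        return "amr"
    # First four bytes of the ASF header object GUID.
    if header[:4] == bytes.fromhex("3026b275"):
        return "asf"
    # Assumption: an ID3v2 tag at the start is treated as mp3.
    if header[:3] == b"ID3":
        return "mp3"
    if len(header) >= 2 and header[0] == 0xFF:
        # ADTS AAC: 12-bit sync word 0xFFF with the two layer bits set to 00.
        if (header[1] & 0xF6) == 0xF0:
            return "aac"
        # MPEG audio frame: 11-bit sync word.
        if (header[1] & 0xE0) == 0xE0:
            return "mp3"
    return None
```

A return value of `None` means the bytes did not match any of the checks above, which is the situation where raw samples with explicit configuration would be needed.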
Raw audio samples
You can send raw (PCM) audio samples instead of a container format. The supported raw formats are listed below. For example, pcm_f32le denotes 32-bit floating-point samples in little-endian byte order.
When using a raw format, the following TranscriptionConfig fields must be set.
| Field | Type | Permitted Values |
|---|---|---|
| audio_format | string | pcm_f32le, pcm_f32be, pcm_s32le, pcm_s32be, pcm_s16le, pcm_s16be, mulaw, alaw |
| sample_rate_hertz | int32 | 2000 to 96000 Hz |
| num_audio_channels | int32 | 1 to 8 |
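The constraints in the table can be checked on the client before a request is sent. The following is a minimal sketch that validates the three fields and returns them as a plain dictionary. The helper names `raw_audio_config` and `bytes_per_second` are hypothetical and not part of the Soniox client library; how the values are passed to TranscriptionConfig depends on the client you use.

```python
# Bytes per sample for each permitted raw format (mulaw and alaw are 8-bit encodings).
BYTES_PER_SAMPLE = {
    "pcm_f32le": 4, "pcm_f32be": 4,
    "pcm_s32le": 4, "pcm_s32be": 4,
    "pcm_s16le": 2, "pcm_s16be": 2,
    "mulaw": 1, "alaw": 1,
}


def raw_audio_config(audio_format: str, sample_rate_hertz: int, num_audio_channels: int) -> dict:
    """Validate raw-audio settings against the permitted values in the table."""
    if audio_format not in BYTES_PER_SAMPLE:
        raise ValueError(f"unsupported audio_format: {audio_format!r}")
    if not 2000 <= sample_rate_hertz <= 96000:
        raise ValueError("sample_rate_hertz must be between 2000 and 96000")
    if not 1 <= num_audio_channels <= 8:
        raise ValueError("num_audio_channels must be between 1 and 8")
    return {
        "audio_format": audio_format,
        "sample_rate_hertz": sample_rate_hertz,
        "num_audio_channels": num_audio_channels,
    }


def bytes_per_second(config: dict) -> int:
    """Data rate implied by the settings, useful for sizing stream chunks."""
    return (
        BYTES_PER_SAMPLE[config["audio_format"]]
        * config["sample_rate_hertz"]
        * config["num_audio_channels"]
    )
```

For instance, `raw_audio_config("pcm_s16le", 16000, 1)` passes validation, and `bytes_per_second` on the result gives 32000, since each sample occupies 2 bytes.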
Example
This example shows how to transcribe audio encoded as 16-bit little-endian PCM at a 16 kHz sample rate with 1 channel.
transcribe_any_stream_audio_format.py
transcribe_any_stream_audio_format.js
TranscribeAnyStreamAudioFormat.cs
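As a complement to the example files listed above, here is a minimal sketch of how audio in this format can be produced and which field values describe it. It does not call the Soniox API; `float_to_pcm_s16le` is a hypothetical helper, and the generated 440 Hz tone is only placeholder input.

```python
import math
import struct


def float_to_pcm_s16le(samples):
    """Encode floats in [-1.0, 1.0] as signed 16-bit little-endian PCM bytes."""
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    # "<" selects little-endian byte order, "h" is a signed 16-bit integer.
    return struct.pack("<%dh" % len(ints), *ints)


SAMPLE_RATE_HERTZ = 16000

# One second of a 440 Hz sine wave, mono (placeholder audio).
tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE_HERTZ) for n in range(SAMPLE_RATE_HERTZ)]
payload = float_to_pcm_s16le(tone)

# Field values matching the payload; assumed to be set on TranscriptionConfig by the client.
config_fields = {
    "audio_format": "pcm_s16le",
    "sample_rate_hertz": SAMPLE_RATE_HERTZ,
    "num_audio_channels": 1,
}
```

The three field values must describe the bytes exactly, because raw samples carry no header from which the format could be detected.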