Audio formats
Information about the audio formats supported by Soniox Speech-to-Text AI.
Overview
Soniox Speech-to-Text AI supports a wide range of audio formats for both file-based and real-time transcription. In most cases, Soniox automatically detects the format using file or stream headers, requiring no additional configuration.
This page outlines which formats are supported in each mode and how to configure raw audio formats when automatic detection is not applicable.
Automatic audio format detection
Soniox can automatically detect common audio and video container formats by inspecting the file or stream header. No configuration is needed in these cases.
Supported formats (auto-detected)
Mode | Supported formats |
---|---|
File transcription | aac, aiff, amr, asf, flac, mp3, ogg, wav, webm, m4a, mp4 |
Real-time transcription | aac, aiff, amr, asf, flac, mp3, ogg, wav, webm |
No configuration is required for these formats. Note that m4a and mp4 are supported for file transcription only.
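To show what "inspecting the header" means in practice, here is a minimal Python sketch that recognizes several of the containers above by their well-known magic bytes. This is not Soniox's detection logic, and the function name `sniff_container` is hypothetical. Raw PCM has no signature at all, which is why it needs the manual configuration described in the next section.

```python
from typing import Optional


def sniff_container(header: bytes) -> Optional[str]:
    """Guess a container format from the first bytes of a file or stream.

    Illustrative only: checks common public magic-byte signatures.
    """
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"FORM" and header[8:12] in (b"AIFF", b"AIFC"):
        return "aiff"
    if header[:4] == b"fLaC":
        return "flac"
    if header[:4] == b"OggS":
        return "ogg"
    if header[:4] == b"\x1a\x45\xdf\xa3":  # EBML header (WebM / Matroska)
        return "webm"
    if header.startswith(b"#!AMR"):  # AMR and AMR-WB storage format
        return "amr"
    if header[:16] == bytes.fromhex("3026b2758e66cf11a6d900aa0062ce6c"):  # ASF header GUID
        return "asf"
    if header[4:8] == b"ftyp":  # ISO base media file: mp4 and m4a share this box
        return "mp4/m4a"
    if header[:3] == b"ID3":  # mp3 with an ID3v2 tag
        return "mp3"
    if len(header) >= 2 and header[0] == 0xFF:
        if header[1] & 0xF6 == 0xF0:  # ADTS sync word with layer bits 00 (AAC)
            return "aac"
        if header[1] & 0xE0 == 0xE0:  # MPEG audio frame sync (mp3 without a tag)
            return "mp3"
    return None  # unknown, or headerless raw audio such as PCM
```

Real detectors check more bytes than this to avoid false positives; the point is only that each container starts with a recognizable pattern, while raw samples do not.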
Raw audio formats (manual configuration required)
Raw audio (such as headerless PCM samples) carries no header that Soniox could inspect, so you must specify the format manually using the following parameters:
audio_format
: The encoding of the raw audio data (e.g., pcm_s16le, pcm_f32be, mulaw).
sample_rate
: The number of audio samples per second, in Hz (e.g., 16000).
num_channels
: The number of audio channels (e.g., 1 for mono, 2 for stereo).
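The three parameters together fully determine how a raw byte stream is interpreted, including its data rate. The following Python sketch bundles them and computes bytes per second; the helper names `raw_audio_config` and `bytes_per_second` are hypothetical, and how these fields are attached to a request depends on the API you call. It assumes 24-bit formats are packed as 3 bytes per sample.

```python
# Bytes per sample for each encoding in the "Supported raw formats" table.
# Assumption: 24-bit formats are packed as 3 bytes per sample.
BYTES_PER_SAMPLE = {
    "pcm_s8": 1, "pcm_u8": 1, "mulaw": 1, "alaw": 1,
    "pcm_s16le": 2, "pcm_s16be": 2, "pcm_u16le": 2, "pcm_u16be": 2,
    "pcm_s24le": 3, "pcm_s24be": 3, "pcm_u24le": 3, "pcm_u24be": 3,
    "pcm_s32le": 4, "pcm_s32be": 4, "pcm_u32le": 4, "pcm_u32be": 4,
    "pcm_f32le": 4, "pcm_f32be": 4, "pcm_f64le": 8, "pcm_f64be": 8,
}


def raw_audio_config(audio_format: str, sample_rate: int, num_channels: int) -> dict:
    """Return the three raw-audio parameters after basic sanity checks (hypothetical helper)."""
    if audio_format not in BYTES_PER_SAMPLE:
        raise ValueError(f"unsupported raw audio format: {audio_format!r}")
    if sample_rate <= 0 or num_channels <= 0:
        raise ValueError("sample_rate and num_channels must be positive integers")
    return {"audio_format": audio_format, "sample_rate": sample_rate, "num_channels": num_channels}


def bytes_per_second(config: dict) -> int:
    """Data rate of the raw stream: bytes per sample * samples per second * channels."""
    return BYTES_PER_SAMPLE[config["audio_format"]] * config["sample_rate"] * config["num_channels"]
```

For example, `bytes_per_second(raw_audio_config("pcm_s16le", 16000, 1))` is 32000, so 100 ms of such audio is 3,200 bytes. If the declared parameters do not match the actual bytes, the audio is misinterpreted rather than rejected, so it is worth checking them against the source.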
Supported raw formats
Soniox supports a wide range of raw audio encodings, including:
Format | Description |
---|---|
pcm_s8 | Signed 8-bit |
pcm_s16le | Signed 16-bit, little-endian |
pcm_s16be | Signed 16-bit, big-endian |
pcm_s24le | Signed 24-bit, little-endian |
pcm_s24be | Signed 24-bit, big-endian |
pcm_s32le | Signed 32-bit, little-endian |
pcm_s32be | Signed 32-bit, big-endian |
pcm_u8 | Unsigned 8-bit |
pcm_u16le | Unsigned 16-bit, little-endian |
pcm_u16be | Unsigned 16-bit, big-endian |
pcm_u24le | Unsigned 24-bit, little-endian |
pcm_u24be | Unsigned 24-bit, big-endian |
pcm_u32le | Unsigned 32-bit, little-endian |
pcm_u32be | Unsigned 32-bit, big-endian |
pcm_f32le | 32-bit float, little-endian |
pcm_f32be | 32-bit float, big-endian |
pcm_f64le | 64-bit float, little-endian |
pcm_f64be | 64-bit float, big-endian |
mulaw | μ-law encoding (typically 8000 Hz sample rate, 1 channel) |
alaw | A-law encoding (typically 8000 Hz sample rate, 1 channel) |
All of these formats require you to set audio_format, sample_rate, and num_channels explicitly.
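Each row of the table fixes how samples are laid out in bytes. If you generate raw audio yourself, the Python sketch below packs samples into the PCM encodings listed above and includes a G.711 μ-law encoder for 16-bit input. The function names are hypothetical, and the sketch assumes common conventions rather than anything Soniox specifies: unsigned formats use offset binary (signed value plus half the range), 24-bit samples are packed into 3 bytes, and float samples are nominally in [-1.0, 1.0]. A-law is omitted.

```python
import re
import struct


def encode_pcm(samples: list, audio_format: str) -> bytes:
    """Pack samples into one of the raw PCM encodings (e.g. "pcm_s16le", "pcm_f32be").

    Integer formats take signed integers in range for the bit depth
    (e.g. -32768..32767 for 16-bit); unsigned output adds half the range.
    """
    m = re.fullmatch(r"pcm_([suf])(8|16|24|32|64)(le|be)?", audio_format)
    if m is None:
        raise ValueError(f"not a PCM format: {audio_format!r}")
    kind, bits, endian = m.group(1), int(m.group(2)), m.group(3)
    valid = bits in (32, 64) if kind == "f" else bits != 64
    if not valid or (endian is None) != (bits == 8):  # only 8-bit formats omit endianness
        raise ValueError(f"not a PCM format: {audio_format!r}")
    order = "big" if endian == "be" else "little"
    if kind == "f":
        prefix = ">" if order == "big" else "<"
        code = "f" if bits == 32 else "d"
        return struct.pack(prefix + code * len(samples), *samples)
    offset = 1 << (bits - 1) if kind == "u" else 0  # offset binary for unsigned formats
    return b"".join(
        (s + offset).to_bytes(bits // 8, order, signed=(kind == "s")) for s in samples
    )


def encode_mulaw(samples: list) -> bytes:
    """G.711 μ-law: compress signed 16-bit samples to one byte each."""
    BIAS, CLIP = 0x84, 32635
    out = bytearray()
    for s in samples:
        sign = 0x80 if s < 0 else 0
        mag = min(abs(s), CLIP) + BIAS
        exponent = max(mag.bit_length() - 8, 0)  # segment number 0..7
        mantissa = (mag >> (exponent + 3)) & 0x0F
        out.append(~(sign | (exponent << 4) | mantissa) & 0xFF)  # bits are inverted
    return bytes(out)
```

For example, `encode_pcm([1, -2], "pcm_s16le")` yields `b"\x01\x00\xfe\xff"`, while the same samples as `pcm_s16be` are byte-swapped. Declaring the wrong endianness produces loud noise rather than an error, which is a common symptom of a format mismatch.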
Example
The following example demonstrates how to transcribe an audio stream encoded in 16-bit PCM (little-endian), with a 16 kHz sample rate and 1 channel:
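A minimal Python sketch of the client side, under stated assumptions: the first message is a JSON config carrying the three raw-audio parameters, the audio follows as binary chunks, and an empty message marks the end of the audio. The `api_key` and `model` fields, their values, and the end-of-stream convention are assumptions to verify against Soniox's real-time API reference; all function names are hypothetical. The transport is abstracted as a `send` callable so the sketch does not depend on a particular WebSocket library.

```python
import json
import wave
from typing import Callable, Union

SAMPLE_RATE = 16000
NUM_CHANNELS = 1
BYTES_PER_SAMPLE = 2  # pcm_s16le


def build_config(api_key: str, model: str) -> str:
    """First message of the stream: credentials plus the raw-audio parameters.

    Assumption: "api_key" and "model" are placeholders; the three audio
    fields match the parameters described on this page.
    """
    return json.dumps({
        "api_key": api_key,
        "model": model,
        "audio_format": "pcm_s16le",
        "sample_rate": SAMPLE_RATE,
        "num_channels": NUM_CHANNELS,
    })


def read_wav_as_pcm_s16le(path: str) -> bytes:
    """Return the sample data of a 16-bit, 16 kHz, mono WAV file without its header."""
    with wave.open(path, "rb") as wav:
        if (wav.getsampwidth(), wav.getframerate(), wav.getnchannels()) != (
            BYTES_PER_SAMPLE, SAMPLE_RATE, NUM_CHANNELS,
        ):
            raise ValueError("expected 16-bit, 16 kHz, mono audio")
        return wav.readframes(wav.getnframes())  # 16-bit WAV data is signed little-endian


def stream_pcm(
    pcm: bytes,
    send: Callable[[Union[str, bytes]], None],
    api_key: str,
    model: str,
    chunk_ms: int = 100,
) -> None:
    """Send the config, then the audio in fixed-size chunks, then an end marker."""
    send(build_config(api_key, model))
    chunk_bytes = SAMPLE_RATE * NUM_CHANNELS * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        send(pcm[start:start + chunk_bytes])
    send(b"")  # assumption: an empty message signals the end of the audio
```

The WAV reader is only a convenient source of pcm_s16le samples for testing (a WAV file would also be auto-detected if sent as-is); in a live application the bytes would typically come from a microphone or telephony stream. With the third-party `websockets` package (version 11 or later), `send` could be the `send` method of a connection opened with `websockets.sync.client.connect(...)`; the endpoint URL and the format of the transcription responses are defined in Soniox's API reference and are not shown here.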