Audio formats
Supported audio formats for Soniox Text-to-Speech.
Overview
Soniox Text-to-Speech AI supports raw, lossy compressed, and lossless compressed audio formats, so you can pick the best trade-off between quality, bandwidth, and latency.
- Raw → pcm_f32le, pcm_s16le, pcm_mulaw, pcm_alaw, wav.
- Lossless compressed → flac.
- Lossy compressed → mp3, opus, aac.
Set the audio_format field in your request to choose the output format. See the full format reference below for supported sample rates and bitrates per format.
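To make the bandwidth side of that trade-off concrete, here is a small sketch that computes the bit rate of an uncompressed PCM stream so it can be compared with the bitrates of the lossy codecs. The sample widths are the standard sizes for these encodings; the helper name raw_pcm_bits_per_second and the default of one channel (mono) are assumptions, since the channel count is not stated here.

```python
# Bytes per sample for each raw PCM encoding (standard widths).
BYTES_PER_SAMPLE = {
    "pcm_f32le": 4,  # 32-bit float, little-endian
    "pcm_s16le": 2,  # 16-bit signed integer, little-endian
    "pcm_mulaw": 1,  # 8-bit mu-law companded
    "pcm_alaw": 1,   # 8-bit A-law companded
}


def raw_pcm_bits_per_second(audio_format: str, sample_rate: int, channels: int = 1) -> int:
    """Return the bit rate of an uncompressed PCM stream.

    channels=1 (mono) is an assumption made for this illustration.
    """
    return BYTES_PER_SAMPLE[audio_format] * 8 * sample_rate * channels


print(raw_pcm_bits_per_second("pcm_s16le", 24000))  # 384000
print(raw_pcm_bits_per_second("pcm_mulaw", 8000))   # 64000
```

Under the mono assumption, pcm_s16le at 24000 Hz needs 384000 bps, several times more than a lossy stream at a bitrate such as 64000 bps.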
Raw PCM details
For PCM output, use one of the raw PCM encodings and set:
- audio_format: the encoding type (pcm_f32le, pcm_s16le, pcm_mulaw, or pcm_alaw).
- sample_rate (optional): output sample rate in Hz. Defaults to the format's default (see full format reference).
Example:
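A minimal sketch in Python of the format fields for PCM output. The field names audio_format and sample_rate come from the text above, and the allowed sample rates are copied from the full format reference. The helper name pcm_format_options and the choice to express the fields as a JSON object are assumptions for illustration; the other fields a complete request needs are not shown.

```python
import json

# Sample rates per raw PCM encoding, copied from the full format reference.
PCM_SAMPLE_RATES = {
    "pcm_f32le": (8000, 16000, 24000, 44100, 48000),
    "pcm_s16le": (8000, 16000, 24000, 44100, 48000),
    "pcm_mulaw": (8000,),
    "pcm_alaw": (8000,),
}


def pcm_format_options(audio_format, sample_rate=None):
    """Build the format fields for a PCM request (hypothetical helper)."""
    if audio_format not in PCM_SAMPLE_RATES:
        raise ValueError(f"not a raw PCM encoding: {audio_format}")
    options = {"audio_format": audio_format}
    if sample_rate is not None:
        if sample_rate not in PCM_SAMPLE_RATES[audio_format]:
            raise ValueError(f"{audio_format} does not support {sample_rate} Hz")
        options["sample_rate"] = sample_rate
    # When sample_rate is omitted, the format's default applies.
    return options


print(json.dumps(pcm_format_options("pcm_s16le", 24000)))
# {"audio_format": "pcm_s16le", "sample_rate": 24000}
```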
Lossy compressed format details
For lossy compressed formats (mp3, opus, aac), you can set both sample_rate and bitrate independently:
- sample_rate (optional): output sample rate in Hz.
- bitrate (optional): codec bitrate in bps.
Both fields fall back to the format's default when omitted (see full format reference).
Example:
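A minimal sketch in Python of the format fields for a lossy compressed request. The field names audio_format, sample_rate, and bitrate come from the text above, and the allowed values are copied from the full format reference. The helper name lossy_format_options and the JSON representation are assumptions for illustration; the other fields a complete request needs are not shown.

```python
import json

# Supported values per lossy format, copied from the full format reference.
LOSSY_FORMATS = {
    "mp3": {
        "sample_rate": (16000, 24000, 32000, 44100, 48000),
        "bitrate": (32000, 64000, 96000, 128000, 192000, 256000, 320000),
    },
    "opus": {
        "sample_rate": (8000, 16000, 24000, 48000),
        "bitrate": (16000, 32000, 64000, 96000, 128000, 256000),
    },
    "aac": {
        "sample_rate": (16000, 24000, 44100, 48000),
        "bitrate": (32000, 64000, 96000, 128000, 192000, 256000, 320000),
    },
}


def lossy_format_options(audio_format, sample_rate=None, bitrate=None):
    """Build the format fields for a lossy request (hypothetical helper)."""
    if audio_format not in LOSSY_FORMATS:
        raise ValueError(f"not a lossy format: {audio_format}")
    options = {"audio_format": audio_format}
    for name, value in (("sample_rate", sample_rate), ("bitrate", bitrate)):
        if value is None:
            continue  # omitted fields fall back to the format's default
        if value not in LOSSY_FORMATS[audio_format][name]:
            raise ValueError(f"{audio_format} does not support {name}={value}")
        options[name] = value
    return options


print(json.dumps(lossy_format_options("mp3", sample_rate=44100, bitrate=128000)))
# {"audio_format": "mp3", "sample_rate": 44100, "bitrate": 128000}
```

Because the two fields are validated independently, either one can be left out, for example lossy_format_options("opus", bitrate=32000).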
Full format reference
Supported sample rates and bitrates for every format. Defaults are shown in bold.
| Format | Sample rates (Hz) | Bitrates (bps) |
|---|---|---|
| pcm_f32le | 8000, 16000, 24000, 44100, 48000 | — |
| pcm_s16le | 8000, 16000, 24000, 44100, 48000 | — |
| pcm_mulaw | 8000 | — |
| pcm_alaw | 8000 | — |
| wav | 8000, 16000, 24000, 44100, 48000 | — |
| flac | 16000, 24000, 44100, 48000 | — |
| mp3 | 16000, 24000, 32000, 44100, 48000 | 32000, 64000, 96000, 128000, 192000, 256000, 320000 |
| opus | 8000, 16000, 24000, 48000 | 16000, 32000, 64000, 96000, 128000, 256000 |
| aac | 16000, 24000, 44100, 48000 | 32000, 64000, 96000, 128000, 192000, 256000, 320000 |
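
As an illustration of how this reference might be used on the client side, the sketch below encodes the sample-rate column as a Python dictionary and looks up which formats accept a given rate. The names SAMPLE_RATES and formats_supporting are hypothetical; the values are copied from the table.

```python
# Sample-rate column of the format reference, in table order.
SAMPLE_RATES = {
    "pcm_f32le": (8000, 16000, 24000, 44100, 48000),
    "pcm_s16le": (8000, 16000, 24000, 44100, 48000),
    "pcm_mulaw": (8000,),
    "pcm_alaw": (8000,),
    "wav": (8000, 16000, 24000, 44100, 48000),
    "flac": (16000, 24000, 44100, 48000),
    "mp3": (16000, 24000, 32000, 44100, 48000),
    "opus": (8000, 16000, 24000, 48000),
    "aac": (16000, 24000, 44100, 48000),
}


def formats_supporting(sample_rate):
    """Return the formats that list the given sample rate, in table order."""
    return [name for name, rates in SAMPLE_RATES.items() if sample_rate in rates]


print(formats_supporting(8000))
# ['pcm_f32le', 'pcm_s16le', 'pcm_mulaw', 'pcm_alaw', 'wav', 'opus']
```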