Audio formats

Supported audio formats for Soniox Text-to-Speech.

Overview

Soniox Text-to-Speech AI supports raw, lossy compressed, and lossless compressed audio formats, so you can pick the best trade-off between quality, bandwidth, and latency.

  • Raw: pcm_f32le, pcm_s16le, pcm_mulaw, pcm_alaw, wav.
  • Lossless compressed: flac.
  • Lossy compressed: mp3, opus, aac.

Set the audio_format field in your request to choose the output format. See the full format reference below for supported sample rates and bitrates per format.


Raw PCM details

For PCM output, use one of the raw PCM encodings and set:

  • audio_format — the encoding type (pcm_f32le, pcm_s16le, pcm_mulaw, or pcm_alaw)
  • sample_rate (optional) — output sample rate in Hz. Defaults to the format's default (see full format reference).

Example:

{
  "audio_format": "pcm_s16le",
  "sample_rate": 16000
}
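
Because raw PCM has a fixed size per sample, the data rate and the duration of a buffer follow directly from audio_format and sample_rate. The sketch below illustrates that arithmetic. The helper names are hypothetical, and the single-channel default is an assumption, since this page does not state the channel count of the output.

```python
# Bytes per sample for each raw PCM encoding (fixed by the encoding itself).
BYTES_PER_SAMPLE = {
    "pcm_f32le": 4,  # 32-bit float, little-endian
    "pcm_s16le": 2,  # 16-bit signed integer, little-endian
    "pcm_mulaw": 1,  # 8-bit mu-law companded
    "pcm_alaw": 1,   # 8-bit A-law companded
}


def pcm_bytes_per_second(audio_format: str, sample_rate: int, channels: int = 1) -> int:
    """Return the raw data rate in bytes per second.

    channels=1 (mono) is an assumption, not something this page specifies.
    """
    return BYTES_PER_SAMPLE[audio_format] * sample_rate * channels


def pcm_duration_seconds(
    num_bytes: int, audio_format: str, sample_rate: int, channels: int = 1
) -> float:
    """Return the playback duration of a raw PCM buffer of num_bytes bytes."""
    return num_bytes / pcm_bytes_per_second(audio_format, sample_rate, channels)


print(pcm_bytes_per_second("pcm_s16le", 16000))         # 32000
print(pcm_duration_seconds(64000, "pcm_s16le", 16000))  # 2.0
```

With the request shown above (pcm_s16le at 16000 Hz), one second of mono audio occupies 32000 bytes.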

Lossy compressed format details

For lossy compressed formats (mp3, opus, aac), you can set both sample_rate and bitrate independently:

  • sample_rate (optional) — output sample rate in Hz.
  • bitrate (optional) — codec bitrate in bps.

Both fields fall back to the format's default when omitted (see full format reference).

Example:

{
  "audio_format": "mp3",
  "sample_rate": 44100,
  "bitrate": 128000
}
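
To get a feel for what a given bitrate means in bandwidth terms, the following sketch estimates the payload size of a clip from its bitrate and duration. It assumes the encoder holds roughly the requested bitrate and ignores container and header overhead, so treat the result as an approximation. The function name is hypothetical.

```python
def estimated_size_bytes(bitrate_bps: int, duration_seconds: float) -> int:
    """Approximate compressed size: bits per second times seconds, divided by 8.

    Assumes a roughly constant bitrate and no container overhead.
    """
    return int(bitrate_bps * duration_seconds / 8)


# Ten seconds of audio at the 128000 bps used in the example request.
print(estimated_size_bytes(128000, 10))  # 160000
```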

Full format reference

Supported sample rates and bitrates for every format. Defaults are shown in bold.

Format      Sample rates (Hz)                    Bitrates (bps)
pcm_f32le   8000, 16000, 24000, 44100, 48000
pcm_s16le   8000, 16000, 24000, 44100, 48000
pcm_mulaw   8000
pcm_alaw    8000
wav         8000, 16000, 24000, 44100, 48000
flac        16000, 24000, 44100, 48000
mp3         16000, 24000, 32000, 44100, 48000    32000, 64000, 96000, 128000, 192000, 256000, 320000
opus        8000, 16000, 24000, 48000            16000, 32000, 64000, 96000, 128000, 256000
aac         16000, 24000, 44100, 48000           32000, 64000, 96000, 128000, 192000, 256000, 320000
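
The table above can be encoded as a lookup so that a request is checked on the client before it is sent. This is a minimal sketch: the function and variable names are hypothetical, and treating a bitrate on a format with no listed bitrates as an error is an assumption, since this page does not say how the service handles that case.

```python
from typing import Dict, List, Optional, Tuple

PCM_RATES = (8000, 16000, 24000, 44100, 48000)
MP3_AAC_BITRATES = (32000, 64000, 96000, 128000, 192000, 256000, 320000)

# format -> (supported sample rates, supported bitrates or None if none are listed)
SUPPORTED: Dict[str, Tuple[Tuple[int, ...], Optional[Tuple[int, ...]]]] = {
    "pcm_f32le": (PCM_RATES, None),
    "pcm_s16le": (PCM_RATES, None),
    "pcm_mulaw": ((8000,), None),
    "pcm_alaw": ((8000,), None),
    "wav": (PCM_RATES, None),
    "flac": ((16000, 24000, 44100, 48000), None),
    "mp3": ((16000, 24000, 32000, 44100, 48000), MP3_AAC_BITRATES),
    "opus": ((8000, 16000, 24000, 48000),
             (16000, 32000, 64000, 96000, 128000, 256000)),
    "aac": ((16000, 24000, 44100, 48000), MP3_AAC_BITRATES),
}


def validate_audio_options(
    audio_format: str,
    sample_rate: Optional[int] = None,
    bitrate: Optional[int] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the options match the table."""
    if audio_format not in SUPPORTED:
        return [f"unsupported audio_format: {audio_format!r}"]

    sample_rates, bitrates = SUPPORTED[audio_format]
    errors: List[str] = []

    # Omitted fields fall back to the format's default, so only check given values.
    if sample_rate is not None and sample_rate not in sample_rates:
        errors.append(f"{audio_format} does not support sample_rate {sample_rate}")
    if bitrate is not None:
        if bitrates is None:
            # Assumption: formats with no listed bitrates do not accept one.
            errors.append(f"{audio_format} has no configurable bitrate")
        elif bitrate not in bitrates:
            errors.append(f"{audio_format} does not support bitrate {bitrate}")
    return errors


print(validate_audio_options("mp3", sample_rate=44100, bitrate=128000))  # []
print(validate_audio_options("pcm_mulaw", sample_rate=16000))
```

Returning a list of messages rather than raising on the first problem lets a caller report every invalid field at once.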