Audio Format#
Automatic Audio-Format Detection#
Soniox supports and automatically detects most common audio formats from file headers, so you don’t need to manually set audio configs when using the supported file formats.
Supported File Formats#
mp3, wav, flac, ogg, aac, aiff, amr, and asf.
When using a supported file format, do not set the audio_format, sample_rate_hertz, and num_audio_channels fields of TranscriptionConfig.
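To illustrate what detecting a format "from file headers" can look like, here is a minimal sketch that checks the well-known magic bytes of several container formats. It is not Soniox's actual detection logic; the function name sniff_audio_container is hypothetical, and formats without a fixed signature at offset 0 (for example mp3 without an ID3 tag, or raw AAC streams) are deliberately left out.

```python
from typing import Optional

# ASF files start with the 16-byte ASF header object GUID.
ASF_HEADER_GUID = bytes.fromhex("3026b2758e66cf11a6d900aa0062ce6c")


def sniff_audio_container(header: bytes) -> Optional[str]:
    """Guess a container format from the first bytes of a file.

    Illustrative only: covers signatures that sit at a fixed offset.
    """
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"FORM" and header[8:12] in (b"AIFF", b"AIFC"):
        return "aiff"
    if header[:4] == b"fLaC":
        return "flac"
    if header[:4] == b"OggS":
        return "ogg"
    if header[:5] == b"#!AMR":
        return "amr"
    if header[:16] == ASF_HEADER_GUID:
        return "asf"
    if header[:3] == b"ID3":
        return "mp3"  # mp3 with an ID3v2 tag; untagged mp3 needs frame-sync checks
    return None  # unknown: treat as raw audio and set the config fields explicitly
```

A caller could read the first 16 bytes of a file, pass them to this function, and fall back to the raw-audio configuration described in the next section when it returns None.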
Raw Audio Samples#
You can send raw (PCM) audio samples instead of a container format. The supported formats are listed below. For example, pcm_f32le means 32-bit floating-point samples in little-endian byte order.
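The format names encode the sample type and byte order. As a rough sketch of what that means at the byte level, the following uses Python's standard struct module to pack samples as 16-bit signed integers or 32-bit floats in either byte order. The helper names encode_s16 and encode_f32 are hypothetical and not part of any Soniox library.

```python
import struct


def encode_s16(samples, little_endian=True):
    """Pack floats in [-1.0, 1.0] as signed 16-bit PCM (pcm_s16le / pcm_s16be)."""
    ints = [max(-32768, min(32767, round(s * 32767))) for s in samples]
    byte_order = "<" if little_endian else ">"
    return struct.pack(f"{byte_order}{len(ints)}h", *ints)


def encode_f32(samples, little_endian=True):
    """Pack floats as 32-bit IEEE 754 PCM (pcm_f32le / pcm_f32be)."""
    byte_order = "<" if little_endian else ">"
    return struct.pack(f"{byte_order}{len(samples)}f", *samples)


# The same full-scale sample in three of the supported encodings.
print(encode_s16([1.0]).hex())                       # ff7f     (pcm_s16le)
print(encode_s16([1.0], little_endian=False).hex())  # 7fff     (pcm_s16be)
print(encode_f32([1.0]).hex())                       # 0000803f (pcm_f32le)
```

The byte order only changes how each sample is laid out, not its value, which is why the declared audio_format has to match the bytes actually sent.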
When using a raw format, the following TranscriptionConfig fields must be set.
Field | Type | Permitted Values |
---|---|---|
audio_format | string | pcm_f32le, pcm_f32be, pcm_s32le, pcm_s32be, pcm_s16le, pcm_s16be, mulaw, alaw |
sample_rate_hertz | int32 | 2000 to 96000 Hz |
num_audio_channels | int32 | 1 to 8 |
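The permitted values above can be checked on the client side before a request is sent. The following is a minimal sketch of such a check; check_raw_audio_config and bytes_per_second are hypothetical helpers, not part of the Soniox API, and the bytes-per-sample figures assume the usual 8-bit encoding for mulaw and alaw.

```python
# Bytes per single-channel sample for each raw format listed in the table.
BYTES_PER_SAMPLE = {
    "pcm_f32le": 4, "pcm_f32be": 4,
    "pcm_s32le": 4, "pcm_s32be": 4,
    "pcm_s16le": 2, "pcm_s16be": 2,
    "mulaw": 1, "alaw": 1,
}


def check_raw_audio_config(audio_format, sample_rate_hertz, num_audio_channels):
    """Return a list of problems; an empty list means the values are in range."""
    errors = []
    if audio_format not in BYTES_PER_SAMPLE:
        errors.append(f"unsupported audio_format: {audio_format!r}")
    if not 2000 <= sample_rate_hertz <= 96000:
        errors.append("sample_rate_hertz must be between 2000 and 96000")
    if not 1 <= num_audio_channels <= 8:
        errors.append("num_audio_channels must be between 1 and 8")
    return errors


def bytes_per_second(audio_format, sample_rate_hertz, num_audio_channels):
    """Raw data rate implied by the configuration, useful for sizing chunks."""
    return BYTES_PER_SAMPLE[audio_format] * sample_rate_hertz * num_audio_channels
```

For instance, check_raw_audio_config("pcm_s16le", 16000, 1) returns an empty list, and the same configuration corresponds to 32000 bytes of audio per second.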
Example#
This example shows how to transcribe raw audio encoded as 16-bit signed little-endian PCM (pcm_s16le) at a 16 kHz sample rate with 1 channel.
transcribe_any_stream_audio_format.py
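for result in transcribe_stream(
    iter_audio(),
    client,
    model="en_v2_lowlatency",
    include_nonfinal=True,
    # Raw audio has no header, so the format must be declared explicitly.
    audio_format="pcm_s16le",
    sample_rate_hertz=16000,
    num_audio_channels=1,
):
    pass  # Process each transcription result here.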
transcribe_any_stream_audio_format.js
const stream = speechClient.transcribeStream(
  {
    model: "en_v2_lowlatency",
    audio_format: "pcm_s16le",
    sample_rate_hertz: 16000,
    num_audio_channels: 1,
    include_nonfinal: true
  },
  onDataHandler,
  onEndHandler
);
TranscribeAnyStreamAudioFormat.cs
IAsyncEnumerable<Result> resultsEnumerable = client.TranscribeStream(
    EnumerateAudioChunks(),
    new TranscriptionConfig
    {
        Model = "en_v2_lowlatency",
        IncludeNonfinal = true,
        AudioFormat = "pcm_s16le",
        SampleRateHertz = 16000,
        NumAudioChannels = 1,
    });
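The snippets above take an audio source (iter_audio() in Python, EnumerateAudioChunks() in C#) whose definition is not shown here. As a rough idea of what such a source might look like for raw audio, the sketch below yields fixed-size chunks from any binary stream. The name iter_audio_chunks, its parameters, and the default chunk size are assumptions for illustration, not the helper used by the original examples.

```python
import io
from typing import BinaryIO, Iterator


def iter_audio_chunks(source: BinaryIO, chunk_size: int = 1024) -> Iterator[bytes]:
    """Yield successive chunks of raw audio bytes until the stream is exhausted."""
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Usage with an in-memory buffer standing in for a raw pcm_s16le file:
# 2500 bytes of silence split into 1024-byte chunks.
silence = io.BytesIO(b"\x00" * 2500)
print([len(c) for c in iter_audio_chunks(silence)])  # [1024, 1024, 452]
```

With a real file, the source would be an object opened in binary mode. Choosing a chunk size that is a multiple of the frame size (bytes per sample times number of channels) keeps each chunk aligned on whole samples.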