

Speech Recognition AI

Most accurate speech recognition AI

We have released groundbreaking speech recognition AI, achieving extreme levels of accuracy and unlocking new possibilities in human-machine interaction. Compared to other providers, Soniox’s speech recognition AI is in a league of its own.

Any audio input and format


Upload files and get back highly accurate transcripts within seconds to minutes. Fast turnaround even with a large number of files.

Explore docs

Audio format

Soniox automatically detects most common audio formats including mp3, wav, flac, ogg, aac, aiff, amr, asf, and raw PCM samples.
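As an illustration of what automatic format detection can involve, the sketch below guesses a format from the leading "magic" bytes of a file. It is a minimal, assumed approach for illustration only, not Soniox's actual detection logic; the function name `sniff_audio_format` is hypothetical. Raw PCM has no header, so it cannot be identified this way and falls through to "unknown".

```python
# Hypothetical helper: guess an audio format from the first bytes of a file.
# This is an illustrative heuristic, not how Soniox detects formats.

ASF_GUID = bytes.fromhex("3026B2758E66CF11A6D900AA0062CE6C")


def sniff_audio_format(header: bytes) -> str:
    """Return a best-guess format name for the given leading bytes."""
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"FORM" and header[8:12] in (b"AIFF", b"AIFC"):
        return "aiff"
    if header[:4] == b"fLaC":
        return "flac"
    if header[:4] == b"OggS":
        return "ogg"
    if header[:5] == b"#!AMR":
        return "amr"
    if header[:16] == ASF_GUID:
        return "asf"
    if header[:3] == b"ID3":
        # An ID3v2 tag most commonly precedes MP3 data (heuristic).
        return "mp3"
    if len(header) >= 2 and header[0] == 0xFF:
        # AAC in ADTS framing: 12 sync bits set, layer bits always 00.
        if header[1] & 0xF6 == 0xF0:
            return "aac"
        # MPEG audio frame: 11 sync bits set, layer bits non-zero.
        if header[1] & 0xE0 == 0xE0 and header[1] & 0x06 != 0:
            return "mp3"
    return "unknown"
```

In practice, the first 16 bytes of a file are enough input for this sketch, for example `sniff_audio_format(open(path, "rb").read(16))`.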


Live streams

Transcribe live streams with the highest accuracy and sub-200 ms latency. Best auto-captioning experience with the highest comprehension quality.


Multi-channel audio

Merge multiple channels into one channel or transcribe each channel independently with a single API call.
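To make the two options concrete, here is a small sketch of what merging and separating channels mean for interleaved PCM samples. Soniox performs this on the server side; the helpers `split_channels` and `merge_to_mono` are hypothetical and only illustrate the idea, assuming integer samples interleaved frame by frame.

```python
from typing import List


def split_channels(interleaved: List[int], num_channels: int) -> List[List[int]]:
    """Separate interleaved samples into one list per channel."""
    if num_channels < 1 or len(interleaved) % num_channels != 0:
        raise ValueError("sample count must be a multiple of num_channels")
    return [interleaved[c::num_channels] for c in range(num_channels)]


def merge_to_mono(interleaved: List[int], num_channels: int) -> List[int]:
    """Downmix to a single channel by averaging each frame (floor division)."""
    channels = split_channels(interleaved, num_channels)
    return [sum(frame) // num_channels for frame in zip(*channels)]


# Stereo input: frames are (left, right) pairs.
stereo = [100, 200, -50, 50, 0, 10]
left, right = split_channels(stereo, 2)
mono = merge_to_mono(stereo, 2)
```

Transcribing each channel independently corresponds to the split case, for example when each speaker in a call is recorded on a separate channel.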


Complete transcription result

Soniox returns a complete transcription result, including the recognized words, timestamps, confidence scores, and speaker tags.

In streaming speech recognition, Soniox returns "interim results" containing both final words and non-final words. Non-final words may still change as more audio is transcribed.


text: "YouTube";
start_ms: 1450;
duration_ms: 350;
is_final: true;
speaker: 1;
confidence: 0.98;
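The sketch below shows one way a client might consume interim results using the `text`, `start_ms`, `duration_ms`, and `is_final` fields shown above. The `Word` and `TranscriptBuffer` classes are hypothetical and not part of the Soniox library. The sketch also assumes that each interim result carries only newly finalized words plus the current non-final words; check the docs for the exact behavior.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    # Field names follow the sample result above; the class itself is hypothetical.
    text: str
    start_ms: int
    duration_ms: int
    is_final: bool


class TranscriptBuffer:
    """Accumulates final words and keeps only the latest non-final words."""

    def __init__(self) -> None:
        self.final_words: List[Word] = []
        self.non_final_words: List[Word] = []

    def update(self, words: List[Word]) -> None:
        # Assumption: final words are delivered once, so they are appended.
        self.final_words.extend(w for w in words if w.is_final)
        # Non-final words can change, so each result replaces the previous ones.
        self.non_final_words = [w for w in words if not w.is_final]

    def text(self) -> str:
        return " ".join(w.text for w in self.final_words + self.non_final_words)


buf = TranscriptBuffer()
buf.update([Word("You", 1450, 100, False)])
buf.update([Word("YouTube", 1450, 350, True), Word("is", 1800, 100, False)])
```

A captioning UI could render `final_words` as settled text and `non_final_words` in a lighter style, since only the latter may still be revised.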

Speech customization

We invented a novel procedure that customizes speech recognition AI to a specified context on the fly. Simply provide a list of words and phrases, and Soniox will recognize them when they are spoken in the audio.


# Create a speech context on the fly.
speech_context = SpeechContext(
    phrases=["acetylcarnitine", "Zestoretic"],
)

# Pass the speech context to the transcribe API call.
result = transcribe_file_short(
    ...  # remaining arguments are truncated in the original snippet
)

Support for major languages

We build only high-accuracy speech recognition AI solutions, so you can transcribe any audio and get back accurate transcripts.

Soniox supports major languages, including English, Spanish, French, Korean, and Chinese.

For all non-English languages, Soniox’s speech recognition AI is a bilingual solution: it recognizes both the native language and English simultaneously.

See all models and languages

Ready to get started?

Explore Soniox Docs or create an account and start building your audio AI application. You can also contact us to design a custom package for your business.

Always know what you pay

Pay only for what you use. Integrated per-usage pricing with no hidden fees.

Pricing details

Start your integration

Get up and running with Soniox in as little as 5 minutes.

API reference