Speech Recognition AI
Most accurate speech recognition AI
We have released groundbreaking speech recognition AI that achieves very high accuracy and unlocks new possibilities in human-machine interaction. Compared to other providers, Soniox’s speech recognition AI is in a league of its own.
Any audio input and format
Files
Upload files and get back highly accurate transcripts within seconds to minutes. Turnaround stays fast even with a large number of files.
Audio format
Soniox automatically detects the most common audio formats, including mp3, wav, flac, ogg, aac, aiff, amr, and asf, and also accepts raw PCM samples.
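Because format detection happens on the Soniox side, no client code is required for it. If you want to pre-check uploads before sending them, a minimal sketch is to inspect the first bytes of a file for well-known container signatures. The function name `sniff_audio_format` is hypothetical and this is not Soniox's detection logic; it covers only a few of the formats listed above and cannot recognize raw PCM or mp3 files without an ID3 tag, since those carry no such signature.

```python
from typing import Optional


def sniff_audio_format(header: bytes) -> Optional[str]:
    """Guess a container format from the first bytes of a file.

    Hypothetical client-side helper, not part of any Soniox library.
    Returns None when no known signature matches.
    """
    # WAV: RIFF container whose form type is "WAVE".
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    # FLAC streams start with the marker "fLaC".
    if header[:4] == b"fLaC":
        return "flac"
    # Ogg pages start with the capture pattern "OggS".
    if header[:4] == b"OggS":
        return "ogg"
    # AIFF: IFF container whose form type is "AIFF" (or "AIFC" when compressed).
    if header[:4] == b"FORM" and header[8:12] in (b"AIFF", b"AIFC"):
        return "aiff"
    # AMR files start with the text "#!AMR".
    if header[:5] == b"#!AMR":
        return "amr"
    # mp3 files that carry an ID3v2 tag start with "ID3".
    if header[:3] == b"ID3":
        return "mp3"
    return None
```

In practice you would call it with something like `sniff_audio_format(open(path, "rb").read(12))` and treat `None` as "unknown, let the service decide".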
Live streams
Transcribe live streams with the highest accuracy and sub-200 ms latency, for the best auto-captioning experience and the highest comprehension quality.
Multi-channel audio
Merge multiple channels into one, or transcribe each channel independently, with a single API call.
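To make the two options concrete, here is a small sketch of what "merge" and "separate" mean for interleaved PCM samples. It only illustrates the concepts on plain integer samples; the helper names `split_channels` and `merge_channels` are hypothetical, and Soniox performs the equivalent processing for you on the service side.

```python
from typing import List


def split_channels(interleaved: List[int], num_channels: int) -> List[List[int]]:
    """De-interleave samples: [L0, R0, L1, R1, ...] -> [[L0, L1, ...], [R0, R1, ...]]."""
    if num_channels < 1:
        raise ValueError("num_channels must be at least 1")
    if len(interleaved) % num_channels != 0:
        raise ValueError("sample count must be a multiple of num_channels")
    # Every num_channels-th sample, starting at offset c, belongs to channel c.
    return [interleaved[c::num_channels] for c in range(num_channels)]


def merge_channels(interleaved: List[int], num_channels: int) -> List[int]:
    """Downmix to one channel by averaging the samples of each frame."""
    channels = split_channels(interleaved, num_channels)
    # zip(*channels) yields one tuple per frame, with one sample per channel.
    return [sum(frame) // num_channels for frame in zip(*channels)]


# Three stereo frames: (100, 300), (-200, 200), (0, 50).
stereo = [100, 300, -200, 200, 0, 50]
left, right = split_channels(stereo, 2)
mono = merge_channels(stereo, 2)
```

Transcribing each channel independently keeps speakers on separate lines (for example, the two sides of a phone call) apart, while merging produces a single transcript of everything that was said.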
Complete transcription result
Soniox returns a complete transcription result that includes the recognized words, timestamps, confidence scores, and speaker tags.
In streaming speech recognition, Soniox returns "interim results" containing final words and non-final words. Non-final words can still change as more audio is transcribed.
{
  text: "YouTube",
  start_ms: 1450,
  duration_ms: 350,
  is_final: true,
  speaker: 1,
  confidence: 0.98
}
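One way to consume interim results is to keep final words permanently and replace the non-final words on every update. The sketch below mirrors the word fields shown above in a local `Word` dataclass; `Word` and `TranscriptBuilder` are hypothetical names, not Soniox types, and the sketch assumes each interim result carries only the newly finalized words plus the current non-final words.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """Local stand-in for one recognized word (fields as in the example above)."""
    text: str
    start_ms: int
    duration_ms: int
    is_final: bool
    speaker: int
    confidence: float


class TranscriptBuilder:
    """Accumulates final words and tracks the latest non-final words."""

    def __init__(self) -> None:
        self.final_words: List[Word] = []
        self.pending_words: List[Word] = []

    def update(self, words: List[Word]) -> None:
        # Final words never change again, so append them for good.
        self.final_words.extend(w for w in words if w.is_final)
        # Non-final words may be revised, so keep only the latest set.
        self.pending_words = [w for w in words if not w.is_final]

    def text(self) -> str:
        return " ".join(w.text for w in self.final_words + self.pending_words)


builder = TranscriptBuilder()
builder.update([
    Word("I", 0, 100, True, 1, 0.99),
    Word("watch", 120, 200, False, 1, 0.60),  # still non-final
])
first = builder.text()
builder.update([
    Word("watched", 120, 250, True, 1, 0.95),  # revised and now final
    Word("YouTube", 1450, 350, True, 1, 0.98),
])
second = builder.text()
```

A caption display would typically render `final_words` in a stable style and `pending_words` in a lighter one, since the latter may still be rewritten.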
Speech customization
We invented a novel procedure that customizes speech recognition AI to a specified context, effectively and on the fly. Simply provide a list of words and phrases, and Soniox will recognize them when they are spoken in the audio.
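from soniox.speech_service import SpeechClient, SpeechContext, SpeechContextEntry
from soniox.transcribe_file import transcribe_file_short

# Words and phrases to favor during recognition.
speech_context = SpeechContext(
    entries=[
        SpeechContextEntry(
            phrases=["acetylcarnitine", "Zestoretic"],
            boost=15,
        )
    ]
)

with SpeechClient() as client:
    # Pass speech context to transcribe API call.
    result = transcribe_file_short(
        "../test_data/acetylcarnitine_zestoretic.flac",
        client,
        model="en_v2",
        speech_context=speech_context,
    )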
Support for major languages
We build only high-accuracy speech recognition AI solutions, so you can transcribe any audio and get back highly accurate transcripts.
Supported languages include English, Spanish, French, Korean, and Chinese.
For all non-English languages, Soniox’s speech recognition AI is a bilingual solution: it can recognize both the native language and English in the same audio.
Ready to get started?
Explore the Soniox Docs, or create an account and start building your audio AI application. You can also contact us to design a custom package for your business.
Always know what you pay
Pay only for what you use. Usage-based pricing with no hidden fees.