We conducted an extensive evaluation of the word recognition accuracy of speech recognition providers across the industry. The benchmarks are summarized as follows:
- Providers evaluated: Soniox, Google, AWS, Azure, Rev AI, Deepgram, AssemblyAI, OpenAI Whisper, Speechmatics and NVIDIA Riva.
- Languages evaluated: German and Spanish.
- Processing modes evaluated: asynchronous transcription (file) and streaming transcription.
- Evaluation datasets: real-world datasets varying in acoustic conditions, speaking styles, accents and topics.
- Ground truth: transcriptions were produced and double-reviewed by humans, then normalized to ensure a fair comparison across providers.
- Results:
  - Overall, Soniox achieved the most accurate speech recognition results in both async and streaming modes across all German and Spanish datasets. Second place went to Speechmatics for German and Azure for Spanish.
  - Soniox achieved 23% higher accuracy on Spanish and 27% higher accuracy on German than the second-place provider; that is, roughly every fourth word misrecognized by Speechmatics or Azure was correctly recognized by Soniox.
  - Google and AWS had the lowest overall accuracy. The remaining providers fell in the middle of the pack.
- Methodology: we invested significant engineering resources in a benchmarking framework designed to evaluate the accuracy of different speech recognition providers as fairly as possible.
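
To make the "normalized" and "every 4th word" statements above concrete, here is a minimal sketch of how such a comparison could be computed. It assumes the accuracy metric is word error rate (WER) and that the relative improvement is measured as a reduction in word errors; the summary does not state either explicitly. The function names (`normalize`, `word_errors`, `wer`, `relative_error_reduction`) and the normalization rules are hypothetical illustrations, not the framework actually used in the benchmark.

```python
import re
import unicodedata


def normalize(text: str) -> list[str]:
    """Illustrative normalization only: lowercase, drop punctuation, split on whitespace."""
    text = unicodedata.normalize("NFC", text).lower()
    # Replace every character that is neither a word character nor whitespace.
    text = re.sub(r"[^\w\s]", " ", text)
    return text.split()


def word_errors(reference: list[str], hypothesis: list[str]) -> int:
    """Word-level Levenshtein distance (substitutions + insertions + deletions)."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(pairs: list[tuple[str, str]]) -> float:
    """Corpus-level WER over (ground_truth, provider_output) pairs."""
    errors = 0
    words = 0
    for reference, hypothesis in pairs:
        ref_words = normalize(reference)
        errors += word_errors(ref_words, normalize(hypothesis))
        words += len(ref_words)
    if words == 0:
        raise ValueError("reference transcripts contain no words")
    return errors / words


def relative_error_reduction(baseline_wer: float, candidate_wer: float) -> float:
    """Fraction of the baseline's word errors that the candidate avoids."""
    return (baseline_wer - candidate_wer) / baseline_wer
```

Under these assumptions, a relative error reduction of about 0.25 corresponds to the candidate correctly recognizing roughly one in four of the words the baseline got wrong. Applying the same `normalize` step to both the ground truth and every provider's output keeps differences in casing or punctuation from being counted as recognition errors.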