Soniox has conducted an extensive evaluation of the word recognition accuracy of speech-to-text providers across the industry. The benchmarks are summarized as follows:
- Providers evaluated: Soniox, Google, AWS, Azure, Rev AI, Deepgram, AssemblyAI, OpenAI and Speechmatics.
- Processing modes evaluated: asynchronous transcription (file) and streaming transcription.
- Evaluation datasets: 4 real-world English-language datasets varying in acoustic conditions, speaking styles, accents and topics.
- Ground truth: reference transcripts were produced and double-reviewed by humans, then normalized to ensure a fair evaluation across providers.
- Results:
  - Overall, Soniox achieved the most accurate speech recognition results in both async and streaming modes across all 4 datasets, followed by Azure and Speechmatics. Rev AI and AWS were in the middle of the pack. Deepgram, AssemblyAI and Google had the lowest overall performance.
  - In streaming mode, Soniox led the other providers by a wider margin. AssemblyAI and Deepgram had the lowest performance in streaming mode.
- The benchmarks were conducted with a high level of professionalism. Hundreds of hours of human time were invested in developing a benchmarking framework that aims to evaluate the accuracy of different speech-to-text providers fairly.
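
To make the role of normalization concrete, here is a minimal Python sketch of how word-level accuracy is often scored. It assumes the metric is word error rate (WER), which is a common choice but is not stated in this summary, and it uses a deliberately simple normalizer (lowercasing and punctuation removal). The function names `normalize` and `word_error_rate` are hypothetical and are not taken from the Soniox benchmarking framework, whose actual rules may differ.

```python
import re


def normalize(text: str) -> list[str]:
    """Illustrative normalizer (an assumption, not the benchmark's rules):
    lowercase, replace punctuation with spaces, keep apostrophes, split into words."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9' ]+", " ", text)
    return text.split()


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed as a word-level Levenshtein distance after normalization."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Differences in casing and punctuation are removed by normalization,
    # so only genuine word errors are counted.
    print(word_error_rate("Hello, world!", "hello world"))
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Without the normalization step, a provider that outputs "Hello, world!" and one that outputs "hello world" would be scored differently for the same recognized words, which is the kind of unfairness that normalizing the transcripts is meant to avoid.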