We conducted an extensive evaluation of the word recognition accuracy of the Soniox and OpenAI Whisper speech recognition systems. The benchmarks are summarized as follows:
- Evaluation datasets: 5 real-world English-language datasets that vary in acoustic conditions, speaking styles, accents, and topics.
- Ground truth: reference transcriptions were produced and double-reviewed by humans, then normalized to ensure a fair evaluation across providers.
- Results:
  - Soniox achieved the most accurate speech recognition results across all 5 datasets.
  - Soniox was 32.61% more accurate than Whisper: on average, almost every third word that Whisper recognized incorrectly was recognized correctly by Soniox.
  - Whisper sometimes had a high insertion rate, recognizing (hallucinating) words that were not spoken in the audio. It also sometimes had a high deletion rate, missing words that were clearly spoken in the audio.
- The benchmarks were conducted with care: we invested significant engineering resources in a benchmarking framework designed to evaluate the accuracy of different speech recognition providers as fairly as possible.
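
To make the terms above concrete, here is a minimal Python sketch of how insertions, deletions, and substitutions are commonly counted against a normalized reference, and how a relative improvement between two systems can be computed. It is not the benchmarking framework used for this evaluation. The function names (`normalize`, `error_counts`, `wer`, `relative_error_reduction`) and the simple normalization rules are assumptions made for illustration, and the sketch assumes that "more accurate" is read as a relative reduction in word error rate, which the summary does not state explicitly.

```python
import re


def normalize(text):
    """Illustrative normalization: lowercase, drop punctuation, split into words."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9' ]+", " ", text)
    return text.split()


def error_counts(reference, hypothesis):
    """Return (substitutions, deletions, insertions, reference_length).

    Counts come from a minimum edit-distance (Levenshtein) alignment
    between the normalized reference and hypothesis word sequences.
    """
    ref, hyp = normalize(reference), normalize(hypothesis)
    # Each cell holds (total_errors, subs, dels, ins) for ref[:i] vs hyp[:j].
    # Row 0: an empty reference means every hypothesis word is an insertion.
    prev = [(j, 0, 0, j) for j in range(len(hyp) + 1)]
    for i in range(1, len(ref) + 1):
        # Column 0: an empty hypothesis means every reference word is a deletion.
        cur = [(i, 0, i, 0)]
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                best = prev[j - 1]  # match, no new error
            else:
                t, s, d, n = prev[j - 1]
                best = (t + 1, s + 1, d, n)  # substitution
            t, s, d, n = prev[j]
            if t + 1 < best[0]:
                best = (t + 1, s, d + 1, n)  # deletion: reference word missed
            t, s, d, n = cur[j - 1]
            if t + 1 < best[0]:
                best = (t + 1, s, d, n + 1)  # insertion: extra word in hypothesis
            cur.append(best)
        prev = cur
    _, subs, dels, ins = prev[-1]
    return subs, dels, ins, len(ref)


def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    subs, dels, ins, ref_len = error_counts(reference, hypothesis)
    if ref_len == 0:
        raise ValueError("reference must contain at least one word")
    return (subs + dels + ins) / ref_len


def relative_error_reduction(baseline_wer, candidate_wer):
    """Fraction of the baseline system's errors that the candidate system avoids."""
    return (baseline_wer - candidate_wer) / baseline_wer


# Toy sentences, not taken from the benchmark datasets.
reference = "The cat sat on the mat."
with_insertion = "the cat sat on on the hat"  # one extra word, one wrong word
print(error_counts(reference, with_insertion))
print(wer(reference, with_insertion))
```

Under this reading, a relative error reduction of about one third corresponds to the "almost every third word" phrasing in the results: of the words one system gets wrong, roughly a third are recovered by the other.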