Models

Soniox Speech-to-Text AI provides multiple models for real-time and asynchronous transcription and translation. This page lists the currently available models, their capabilities, and important updates.

Current models

Model	Type	Status
stt-rt-v4	Real-time	Active
stt-async-v4	Async	Active
stt-rt-v3	Real-time	Active (After 2026-02-28, requests will automatically route to `stt-rt-v4` with no service interruption. No API changes required.)
stt-async-v3	Async	Active (After 2026-02-28, requests will automatically route to `stt-async-v4` with no service interruption. No API changes required.)

Aliases

Aliases provide a stable reference so you don’t need to change your code when newer versions are released.

Alias	Points to	Notes
stt-rt-v3-preview	`stt-rt-v3`	Always points to the latest real-time active model
stt-rt-preview-v2	`stt-rt-v3`
stt-async-preview-v1	`stt-async-v3`
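
The alias table and the rerouting notes in the model table can be read together as a two-step lookup. The following is a minimal client-side sketch that assumes only the mappings listed on this page. `ALIASES`, `REROUTES`, `REROUTE_CUTOFF`, and `resolve_model` are hypothetical names used for illustration, not part of the Soniox API. The actual routing happens on the server.

```python
from datetime import date

# Alias targets copied from the Aliases table on this page.
ALIASES = {
    "stt-rt-v3-preview": "stt-rt-v3",
    "stt-rt-preview-v2": "stt-rt-v3",
    "stt-async-preview-v1": "stt-async-v3",
}

# Scheduled rerouting copied from the Current models table:
# after the cutoff date, v3 requests are served by v4.
REROUTE_CUTOFF = date(2026, 2, 28)
REROUTES = {
    "stt-rt-v3": "stt-rt-v4",
    "stt-async-v3": "stt-async-v4",
}


def resolve_model(name: str, on: date) -> str:
    """Return the model expected to serve a request for `name` on date `on`."""
    # Step 1: follow an alias if there is one, otherwise keep the name.
    model = ALIASES.get(name, name)
    # Step 2: apply the v3 -> v4 rerouting only after the cutoff date.
    if on > REROUTE_CUTOFF:
        model = REROUTES.get(model, model)
    return model
```

Under these assumptions, `resolve_model("stt-rt-preview-v2", date(2026, 2, 1))` yields `stt-rt-v3`, and the same call with a date after February 28, 2026 yields `stt-rt-v4`.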

Changelog

February 5, 2026

New models: stt-rt-v4

Replaces: stt-rt-v3

Overview

Soniox v4 Real-Time is a next-generation real-time speech recognition model built for low-latency voice interactions. It delivers speaker-native accuracy across 60+ languages with improved latency, reliability, and conversational behavior. The model is production-ready and fully backward-compatible with v3 Real-Time.

Key improvements

Higher accuracy across all supported languages
Better multilingual detection and mid-sentence language switching
Lower endpoint latency with faster final transcription
Improved semantic endpointing for more natural turn-taking
Lower manual finalization latency with faster final transcription
More stable, higher-quality transcription on long and multi-hour recordings
Stronger use of provided context for domain-specific accuracy
More fluent, accurate, and consistent translation across all supported languages
Added max_endpoint_delay_ms for controlling end-of-speech endpoint delay

API compatibility

The stt-rt-v4 model is fully compatible with the existing stt-rt-v3 model and the Soniox API.
To upgrade, simply replace the model name in your API request:
- { "model": "stt-rt-v4" } for real-time
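
As a rough illustration of the upgrade, the sketch below builds the JSON for a real-time request with the new model name and, optionally, the new `max_endpoint_delay_ms` setting. It makes several assumptions: `build_rt_config` is a hypothetical helper, the placement of `max_endpoint_delay_ms` at the top level next to `model` is a guess, and the non-negative check is a client-side sanity check rather than a documented limit. A real request also needs fields that are omitted here, so check the API reference for the exact shape.

```python
import json
from typing import Optional


def build_rt_config(model: str = "stt-rt-v4",
                    max_endpoint_delay_ms: Optional[int] = None) -> str:
    """Serialize a partial real-time request config as JSON."""
    config = {"model": model}
    # Assumption: max_endpoint_delay_ms sits at the top level next to "model".
    if max_endpoint_delay_ms is not None:
        if max_endpoint_delay_ms < 0:
            raise ValueError("max_endpoint_delay_ms must be non-negative")
        config["max_endpoint_delay_ms"] = max_endpoint_delay_ms
    return json.dumps(config)
```

Calling `build_rt_config()` produces only the model field, which matches the one-line change described above. The delay setting is added only when a value is passed.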

Deprecation notice

The stt-rt-v3 model will be removed on February 28, 2026.
After February 28, 2026, requests will automatically route to stt-rt-v4 with no service interruption. No API changes are required.

January 29, 2026

New models: stt-async-v4

Replaces: stt-async-v3

Overview

Soniox v4 Async is the latest generation of Soniox’s asynchronous speech recognition and translation model. This release delivers a significant improvement in accuracy, robustness, and multilingual performance across more than 60 languages. v4 Async reaches human-parity transcription quality in real-world scenarios, while also introducing stronger long-form processing, improved speaker diarization, richer context handling, and higher-quality translation output. The model is designed for production-scale workloads and consistent, high-fidelity results across diverse acoustic environments and language mixes.

Key improvements

Higher transcription accuracy across all languages, reaching speaker-native quality in many domains
More robust performance in noise, accents, overlapping speech, and poor audio
Better language identification and smoother mid-sentence language switching
Improved speaker separation and more consistent labeling in multi-speaker audio
Better normalization of dates, numbers, phone/email addresses, and other structured content
More stable, higher-quality transcription on long and multi-hour recordings
Stronger use of provided context for domain-specific accuracy
More fluent, accurate, and consistent translation across all supported languages

API compatibility

The stt-async-v4 model is fully compatible with the existing stt-async-v3 model and the Soniox API.
To upgrade, simply replace the model name in your API request:
- { "model": "stt-async-v4" } for async
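
For codebases that store request payloads or configuration with the old model names, the swap can be applied mechanically. This is a minimal sketch that assumes payloads are plain dictionaries with a `model` key. `UPGRADES` and `upgrade_request` are hypothetical names, and every other field is passed through untouched, in line with the statement that no other API changes are needed.

```python
# Replacement pairs taken from the "Replaces" lines in this changelog.
UPGRADES = {
    "stt-rt-v3": "stt-rt-v4",
    "stt-async-v3": "stt-async-v4",
}


def upgrade_request(request: dict) -> dict:
    """Return a copy of `request` with a v3 model name swapped for v4."""
    upgraded = dict(request)  # shallow copy; the caller's dict is not modified
    if "model" in upgraded:
        model = upgraded["model"]
        upgraded["model"] = UPGRADES.get(model, model)
    return upgraded
```

Names that are already v4, or that are not in the table, are returned unchanged.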

Deprecation notice

The stt-async-v3 model will be removed on February 28, 2026.
After February 28, 2026, requests will automatically route to stt-async-v4 with no service interruption. No API changes are required.

October 31, 2025

Model retirement and upgrade

We have accelerated the retirement of older models following the overwhelmingly positive response to the new v3 models. The following models have been retired:

stt-async-preview-v1
stt-rt-preview-v2

Both models have been aliased to the new Soniox v3 models. This means all existing requests using the old model names are now automatically served with v3, giving every user our most accurate, capable, and intelligent voice AI experience, without any code changes required.

Context compatibility

The context feature is now backward compatible with v3 models, ensuring smooth migration from older versions. However, we strongly recommend updating to the new context structure for best results and future flexibility. Learn more about context.

October 29, 2025

Model update: v3 enhancements

Applies to: stt-rt-v3, stt-async-v3

New features

Extended audio duration support: both real-time (stt-rt-v3) and asynchronous (stt-async-v3) models now support audio up to 5 hours in a single request.

Quality improvements

Higher transcription accuracy across challenging audio conditions and diverse languages.

Notes

No API changes are required; existing integrations continue to work seamlessly.
For asynchronous processing, large files up to 5 hours can now be uploaded directly without chunking.
For real-time streaming, sessions up to 5 hours are supported under the same WebSocket connection.
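
A client can check a recording against the 5-hour limit before uploading or streaming it. The sketch below assumes the limit is inclusive and that the duration is already known in seconds. `MAX_AUDIO_SECONDS`, `fits_single_request`, and `segments_needed` are hypothetical helper names, and splitting longer audio into segments is an illustrative planning step, not a documented Soniox workflow.

```python
import math

# 5-hour limit stated in the notes above, expressed in seconds.
MAX_AUDIO_SECONDS = 5 * 60 * 60


def fits_single_request(duration_seconds: float) -> bool:
    """True if the audio can be sent in one request or one session."""
    return 0 < duration_seconds <= MAX_AUDIO_SECONDS


def segments_needed(duration_seconds: float) -> int:
    """Number of pieces needed if each piece must respect the limit."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return math.ceil(duration_seconds / MAX_AUDIO_SECONDS)
```

A 5-hour file fits in a single request under this reading, while a 6-hour file would need two pieces.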

October 21, 2025

Key improvements

Higher transcription accuracy across 60+ languages
Improved multilingual switching — seamless recognition when speakers change language mid-sentence
Significantly higher translation quality, especially for languages such as German and Korean
The async model now also supports translation
Support for new advanced structured context, enabling richer domain- and task-specific adaptation
Enhanced alphanumeric accuracy (addresses, IDs, codes, serials)
More accurate speaker diarization, even in overlapping speech
Extended maximum audio duration to 5 hours for both async and real-time models

API compatibility

The v3 models are fully compatible with the existing Soniox API as long as you are not using the context feature.
To upgrade, simply replace the model name in your API request:
- { "model": "stt-rt-v3" } for real-time
- { "model": "stt-async-v3" } for async
If you are using the context feature, update to the new structured context for improved accuracy.

Deprecation notice

The following preview models are deprecated and will be retired on November 30, 2025:

stt-async-preview-v1
stt-rt-preview-v2

Please migrate to the v3 models before that date to ensure uninterrupted service.

August 15, 2025

Deprecated stt-rt-preview-v1

August 5, 2025

Released stt-rt-preview-v2
- Higher transcription accuracy
- Improved translation quality
- Expanded to support all translation pairs
- More reliable automatic language switching
- Replaces: stt-rt-preview-v2, stt-async-preview-v1
