Soniox

Models

Learn about the latest Text-to-Speech models, the changelog, and deprecations.

Soniox Text-to-Speech is built for the hardest parts of speech generation. It delivers native-speaker-quality speech in 60+ languages, with hallucination-free output and accurate pronunciation of alphanumerics such as phone numbers, email addresses, and IDs.

This page lists the currently available models, along with release notes and important updates.


Current models

Model               Type        Status
tts-rt-v1-preview   Real-time   Active

Changelog

April 23, 2026

Overview

tts-rt-v1-preview is the first Soniox Text-to-Speech model, released in preview to gather developer feedback and guide further improvements before general availability.

Key capabilities

  • Native-speaker-quality speech in 60+ languages
  • Hallucination-free generation, with no invented words, dropped content, or unexpected substitutions
  • Accurate rendering of alphanumerics such as email addresses, phone numbers, street addresses, IDs, and codes
  • Streaming generation that begins before a sentence is complete, enabling ultra-low-latency voice systems
  • Multiple voices that work across all supported languages
  • Configurable audio formats, sample rates, and bitrates
  • Support for both WebSocket and REST APIs
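Because the output format and sample rate are configurable, a streaming client often needs to know how much audio each received chunk represents, for example to schedule playback or to measure latency. The following is a minimal sketch, assuming uncompressed 16-bit little-endian linear PCM. The `PcmFormat` class and the 24 kHz mono values are hypothetical and illustrative. They are not Soniox API names or defaults. Compressed formats must be decoded before this arithmetic applies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcmFormat:
    """Hypothetical description of a raw linear PCM stream (assumption, not a Soniox type)."""
    sample_rate_hz: int
    channels: int = 1
    bytes_per_sample: int = 2  # 16-bit samples

    @property
    def frame_size(self) -> int:
        # One frame holds one sample for every channel.
        return self.channels * self.bytes_per_sample

    @property
    def bytes_per_second(self) -> int:
        return self.sample_rate_hz * self.frame_size

    @property
    def bitrate_bps(self) -> int:
        # Uncompressed bitrate: bytes per second times 8 bits per byte.
        return self.bytes_per_second * 8

    def chunk_duration_ms(self, chunk: bytes) -> float:
        # Reject chunks that would split a frame across two network messages.
        if len(chunk) % self.frame_size:
            raise ValueError("chunk does not contain a whole number of frames")
        return 1000.0 * len(chunk) / self.bytes_per_second


fmt = PcmFormat(sample_rate_hz=24_000)    # illustrative: 24 kHz, mono, 16-bit
print(fmt.bitrate_bps)                    # 384000
print(fmt.chunk_duration_ms(bytes(960)))  # 20.0
```

At these illustrative settings, a 960-byte chunk holds 20 ms of audio. A playback buffer can sum these durations to track how far ahead of real time the received audio is.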