API pricing

Fair, flexible pricing.
Built to scale with you.

Pricing calculator

Stop overpaying for speech AI

[Interactive calculator: compares Soniox vs. another provider at a monthly volume of 10 to 100,000 hours of audio; default selection is 1,000 hours of audio / month]

Pricing assumptions

Based on public pay-as-you-go pricing. Enterprise discounts and committed-use contracts may differ. Some providers charge separately for certain features. The calculator uses the public price for the provider configuration that most closely matches Soniox.

Speech-to-Text API

Token-based pricing

All API costs are calculated based on tokens.

This is equivalent to about $0.10 per hour of audio for async (file) transcription and $0.12 per hour for real-time (streaming) transcription.

 
Token type | Async (file) | Real-time (streaming)
Input audio tokens (duration of audio or streaming session) | $1.50 per 1M tokens | $2.00 per 1M tokens
Input text tokens (custom instructions or context you provide) | $3.50 per 1M tokens | $4.00 per 1M tokens
Output text tokens (transcription and, optionally, translation or other text returned by the model) | $3.50 per 1M tokens | $4.00 per 1M tokens

Usage reference:
1 hour of audio is ~30,000 input audio tokens • 1 hour of speech is ~15,000 output text tokens • 1 character of output is ~0.3 tokens
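
As a rough check of the per-hour figures, the sketch below combines the per-token prices with the usage reference. It is an illustration, not an official calculator: the function name `estimate_stt_cost` and the `context_tokens` parameter are hypothetical, and the per-hour token counts are the approximate values quoted above, so real bills will vary with how much speech the audio contains.

```python
# Prices in USD per 1M tokens, copied from the Speech-to-Text pricing above.
STT_PRICES = {
    "async": {"input_audio": 1.50, "input_text": 3.50, "output_text": 3.50},
    "real_time": {"input_audio": 2.00, "input_text": 4.00, "output_text": 4.00},
}

# Approximate conversion factors from the usage reference.
AUDIO_TOKENS_PER_HOUR = 30_000        # input audio tokens per hour of audio
OUTPUT_TEXT_TOKENS_PER_HOUR = 15_000  # output text tokens per hour of speech


def estimate_stt_cost(hours, mode="async", context_tokens=0):
    """Estimate transcription cost in USD (hypothetical helper).

    Assumes the audio is continuous speech, so every hour of audio
    yields roughly 15,000 output text tokens.
    """
    prices = STT_PRICES[mode]
    audio_tokens = hours * AUDIO_TOKENS_PER_HOUR
    output_tokens = hours * OUTPUT_TEXT_TOKENS_PER_HOUR
    total_per_million = (
        audio_tokens * prices["input_audio"]
        + context_tokens * prices["input_text"]
        + output_tokens * prices["output_text"]
    )
    return total_per_million / 1_000_000


if __name__ == "__main__":
    for mode in ("async", "real_time"):
        print(f"{mode}: ${estimate_stt_cost(1, mode):.4f} per hour")
```

Under these assumptions, one hour of async audio comes to $0.045 for audio tokens plus $0.0525 for output text, or about $0.10, and one hour of real-time audio comes to $0.06 plus $0.06, or $0.12. Any input text tokens (custom instructions or context) are billed on top of that.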

Text-to-Speech API

Token-based pricing

All API costs are calculated based on tokens.

This is equivalent to about $0.70 per hour of generated speech.

 
Token type | Real-time (streaming)
Input text tokens (text input to generate) | $4.00 per 1M tokens
Output audio tokens (duration of generated audio) | $21.50 per 1M tokens

Usage reference:
1 character ≈ 0.3 input text tokens • 15,000 input text tokens ≈ 1 hour of generated speech • 1 hour of generated speech ≈ 30,000 output audio tokens
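
The same arithmetic can be sketched for text-to-speech, starting from a character count. This is a minimal illustration under the approximate ratios in the usage reference; the function name `estimate_tts_cost` is hypothetical, and actual token counts depend on the text and the generated audio.

```python
# Prices in USD per 1M tokens, copied from the Text-to-Speech pricing above.
TTS_PRICES = {"input_text": 4.00, "output_audio": 21.50}

# Approximate conversion factors from the usage reference.
TEXT_TOKENS_PER_CHAR = 0.3      # input text tokens per character
TEXT_TOKENS_PER_HOUR = 15_000   # input text tokens per hour of generated speech
AUDIO_TOKENS_PER_HOUR = 30_000  # output audio tokens per hour of generated speech


def estimate_tts_cost(characters):
    """Return (hours_of_speech, cost_usd) for a given input length.

    Hypothetical helper: both values are estimates derived from the
    approximate ratios above.
    """
    input_tokens = characters * TEXT_TOKENS_PER_CHAR
    hours = input_tokens / TEXT_TOKENS_PER_HOUR
    audio_tokens = hours * AUDIO_TOKENS_PER_HOUR
    cost = (
        input_tokens * TTS_PRICES["input_text"]
        + audio_tokens * TTS_PRICES["output_audio"]
    ) / 1_000_000
    return hours, cost


if __name__ == "__main__":
    hours, cost = estimate_tts_cost(50_000)
    print(f"{hours:.2f} hours of speech, about ${cost:.3f}")
```

For roughly 50,000 characters (about 15,000 input text tokens, or one hour of speech), this gives $0.06 for input text plus $0.645 for output audio, which is where the figure of about $0.70 per hour comes from.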

How the pricing works

Breakthrough innovation is why Soniox costs less

Soniox costs less because the technology is fundamentally more efficient. We built the full speech AI stack ourselves, from models to inference to real-time cloud infrastructure, and optimized every layer to process more audio with lower latency and less wasted compute.

That efficiency is what lets us offer production-grade speech AI at a fraction of the price of traditional providers.

Built for real-time speech AI

Soniox models are built from scratch for real-time speech understanding and generation, not adapted from general-purpose models that waste compute.

Custom inference engine

Our inference stack is built for low-latency audio streaming, batching, scheduling, and GPU utilization, so the same hardware processes more audio at lower cost.

Massive concurrency

The Soniox platform is engineered to run hundreds of thousands of concurrent streams efficiently, turning infrastructure scale into lower prices for every customer.