New: Soniox Text-to-Speech is here
API pricing

Fair, flexible pricing. Built to scale with you.

Speech-to-Text API

Token-based pricing

All API costs are calculated based on tokens.

At the usage-reference rates below, this is equivalent to about $0.10 per hour of audio for async (file) transcription and $0.12 per hour for real-time (streaming) transcription.

 
| Token type | Description | Async (file) | Real-time (streaming) |
|---|---|---|---|
| Input audio tokens | Duration of audio or streaming session | $1.50 per 1M tokens | $2.00 per 1M tokens |
| Input text tokens | Custom instructions or context you provide | $3.50 per 1M tokens | $4.00 per 1M tokens |
| Output text tokens | Transcription and optionally translation or other text returned by the model | $3.50 per 1M tokens | $4.00 per 1M tokens |

Usage reference:
1 hour of audio ≈ 30,000 input audio tokens • 1 hour of speech ≈ 15,000 output text tokens • 1 character of output ≈ 0.3 tokens
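
As a rough worked example, the per-hour figures can be reproduced from the listed rates and the usage reference. The sketch below is illustrative only: `estimate_stt_cost` and its parameters are hypothetical names, not part of any Soniox SDK, and it assumes the approximate token counts from the usage reference, with no input text tokens unless you pass some.

```python
# Prices in USD per 1M tokens, copied from the Speech-to-Text pricing table.
STT_PRICES = {
    "async": {"input_audio": 1.50, "input_text": 3.50, "output_text": 3.50},
    "real_time": {"input_audio": 2.00, "input_text": 4.00, "output_text": 4.00},
}

# Approximate conversion factors from the usage reference.
AUDIO_TOKENS_PER_HOUR = 30_000
OUTPUT_TEXT_TOKENS_PER_HOUR = 15_000


def estimate_stt_cost(hours, mode="async", input_text_tokens=0):
    """Estimate the cost in USD of transcribing `hours` of speech.

    Hypothetical helper: actual token counts depend on the audio and
    on how much text the model returns.
    """
    prices = STT_PRICES[mode]
    audio_tokens = hours * AUDIO_TOKENS_PER_HOUR
    output_tokens = hours * OUTPUT_TEXT_TOKENS_PER_HOUR
    total_micro_usd = (
        audio_tokens * prices["input_audio"]
        + input_text_tokens * prices["input_text"]
        + output_tokens * prices["output_text"]
    )
    return total_micro_usd / 1_000_000


print(f"async:     ${estimate_stt_cost(1, 'async'):.4f} per hour")      # $0.0975
print(f"real-time: ${estimate_stt_cost(1, 'real_time'):.4f} per hour")  # $0.1200
```

Under these assumptions, one hour of speech costs about $0.0975 async and $0.12 real-time, which matches the rounded per-hour figures quoted for this API.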

Text-to-Speech API

Token-based pricing

All API costs are calculated based on tokens.

At the usage-reference rates below, this is equivalent to about $0.70 per hour of generated speech.

 
 
| Token type | Description | Real-time (streaming) |
|---|---|---|
| Input text tokens | Text input to generate | $4.00 per 1M tokens |
| Output audio tokens | Duration of generated audio | $21.50 per 1M tokens |

Usage reference:
1 character ≈ 0.3 input text tokens • 15,000 input text tokens ≈ 1 hour of generated speech • 1 hour of generated speech ≈ 30,000 output audio tokens
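
The same arithmetic can be sketched for Text-to-Speech, starting from the length of the input text. This is a minimal estimate under the usage-reference approximations, not an official calculator: `estimate_tts_cost` is a hypothetical name, and real token counts will vary with the text and the generated audio.

```python
# Prices in USD per 1M tokens, copied from the Text-to-Speech pricing table.
TTS_INPUT_TEXT_PRICE = 4.00
TTS_OUTPUT_AUDIO_PRICE = 21.50

# Approximate conversion factors from the usage reference.
INPUT_TEXT_TOKENS_PER_CHARACTER = 0.3
INPUT_TEXT_TOKENS_PER_HOUR = 15_000
OUTPUT_AUDIO_TOKENS_PER_HOUR = 30_000


def estimate_tts_cost(num_characters):
    """Estimate generated-speech duration (hours) and cost (USD) for a text.

    Hypothetical helper built only on the approximate usage-reference ratios.
    """
    input_tokens = num_characters * INPUT_TEXT_TOKENS_PER_CHARACTER
    hours = input_tokens / INPUT_TEXT_TOKENS_PER_HOUR
    output_tokens = hours * OUTPUT_AUDIO_TOKENS_PER_HOUR
    cost_usd = (
        input_tokens * TTS_INPUT_TEXT_PRICE
        + output_tokens * TTS_OUTPUT_AUDIO_PRICE
    ) / 1_000_000
    return {"hours": hours, "cost_usd": cost_usd}


estimate = estimate_tts_cost(50_000)
print(f"~{estimate['hours']:.2f} h of speech, ~${estimate['cost_usd']:.3f}")
```

For 50,000 characters this gives roughly one hour of generated speech at about $0.705, in line with the quoted figure of about $0.70 per hour. Almost all of that cost comes from output audio tokens.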