Pricing

Fair, flexible pricing. Built to scale with you.

Speech-to-Text API

Pay only for what you use.

Token-based pricing

All API costs are calculated based on tokens.

This is equivalent to about $0.10 per hour of audio for async (file) transcription and $0.12 per hour for real-time (streaming) transcription.

| Token type | What it covers | Async (file) | Real-time (streaming) |
|---|---|---|---|
| Input audio tokens | Duration of audio or streaming session | $1.50 per 1M tokens | $2.00 per 1M tokens |
| Input text tokens | Custom instructions or context you provide | $3.50 per 1M tokens | $4.00 per 1M tokens |
| Output text tokens | Transcription and, optionally, translation or other text returned by the model | $3.50 per 1M tokens | $4.00 per 1M tokens |

Usage reference:
• 1 hour of audio is ~30,000 input audio tokens
• 1 hour of speech is ~15,000 output text tokens
• 1 character of output is ~0.3 tokens
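
The per-hour figures quoted earlier can be reproduced from the per-token prices and the usage reference. The sketch below is only an illustration of that arithmetic: the function name `estimate_cost`, the mode keys `"async"` and `"realtime"`, and the dictionary layout are hypothetical and not part of any official SDK. It assumes the approximate token rates above hold for your audio; actual billing depends on the real token counts.

```python
# Prices in USD per 1M tokens, copied from the pricing table.
# The mode keys "async" and "realtime" are illustrative names only.
PRICES_PER_MILLION = {
    "async": {"input_audio": 1.50, "input_text": 3.50, "output_text": 3.50},
    "realtime": {"input_audio": 2.00, "input_text": 4.00, "output_text": 4.00},
}

# Approximate token counts per hour, from the usage reference.
AUDIO_TOKENS_PER_HOUR = 30_000
OUTPUT_TOKENS_PER_HOUR = 15_000


def estimate_cost(hours, mode="async", input_text_tokens=0):
    """Return a rough USD estimate for transcribing `hours` of audio.

    `input_text_tokens` covers any custom instructions or context sent
    with the request; it defaults to 0 (audio only).
    """
    prices = PRICES_PER_MILLION[mode]
    audio_tokens = hours * AUDIO_TOKENS_PER_HOUR
    output_tokens = hours * OUTPUT_TOKENS_PER_HOUR
    total = (
        audio_tokens * prices["input_audio"]
        + input_text_tokens * prices["input_text"]
        + output_tokens * prices["output_text"]
    )
    return total / 1_000_000


print(f"async:     ${estimate_cost(1, 'async'):.4f} per hour")
print(f"real-time: ${estimate_cost(1, 'realtime'):.4f} per hour")
```

With no input text tokens, one hour comes to roughly $0.0975 for async and $0.12 for real-time, which matches the rounded $0.10 and $0.12 per-hour figures. Any instructions or context you send add input text tokens on top of that.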