Pricing
Fair, flexible pricing. Built to scale with you.
Speech-to-Text API
Pay only for what you use.
Token-based pricing
All API costs are calculated based on tokens.
This works out to about $0.10 per hour of audio for async (file) transcription and about $0.12 per hour for real-time (streaming) transcription.
Input audio tokens
Duration of audio or streaming session
Async (file): $1.50 per 1M tokens
Real-time (streaming): $2.00 per 1M tokens
Input text tokens
Custom instructions or context you provide
Async (file): $3.50 per 1M tokens
Real-time (streaming): $4.00 per 1M tokens
Output text tokens
Transcription and, optionally, translation or other text returned by the model
Async (file): $3.50 per 1M tokens
Real-time (streaming): $4.00 per 1M tokens
Usage reference:
1 hour of audio is ~30,000 input audio tokens
1 hour of speech is ~15,000 output text tokens
1 character of output is ~0.3 tokens
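As a rough illustration of how the per-hour figures follow from the token rates, here is a minimal Python sketch. It assumes the approximate conversions from the usage reference (about 30,000 input audio tokens and 15,000 output text tokens per hour). The names `RATES` and `estimate_cost` are hypothetical and not part of any API; actual bills depend on the real token counts of your audio and text.

```python
# Rates in USD per 1M tokens, taken from the pricing listed above.
RATES = {
    "async": {"audio": 1.50, "input_text": 3.50, "output_text": 3.50},
    "realtime": {"audio": 2.00, "input_text": 4.00, "output_text": 4.00},
}

# Approximate conversions from the usage reference (estimates, not guarantees).
AUDIO_TOKENS_PER_HOUR = 30_000
OUTPUT_TOKENS_PER_HOUR = 15_000


def estimate_cost(hours, mode="async", input_text_tokens=0):
    """Estimate the USD cost of transcribing `hours` of audio.

    `mode` is "async" (file) or "realtime" (streaming).
    `input_text_tokens` covers any custom instructions or context you send.
    """
    rates = RATES[mode]
    audio_tokens = hours * AUDIO_TOKENS_PER_HOUR
    output_tokens = hours * OUTPUT_TOKENS_PER_HOUR
    total_micro_usd = (
        audio_tokens * rates["audio"]
        + input_text_tokens * rates["input_text"]
        + output_tokens * rates["output_text"]
    )
    # Rates are quoted per 1M tokens, so scale back down.
    return total_micro_usd / 1_000_000


if __name__ == "__main__":
    print(round(estimate_cost(1, "async"), 4))     # about 0.0975, i.e. roughly $0.10/hour
    print(round(estimate_cost(1, "realtime"), 4))  # about 0.12, i.e. roughly $0.12/hour
```

Under these assumptions, one hour of async audio costs about $0.045 for input audio plus about $0.0525 for output text, which rounds to the $0.10/hour figure. Any input text tokens you provide are billed on top of that.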