Fair, flexible pricing. Built to scale with you.
Speech-to-Text API
Token-based pricing
All API costs are calculated based on tokens.
Equivalent to about $0.10/hour for async (file) and $0.12/hour for real-time (streaming) transcription.
Input audio tokens: duration of the audio file or streaming session
Input text tokens: custom instructions or context you provide
Output text tokens: the transcription, plus any translation or other text returned by the model
Usage reference:
1 hour of audio ≈ 30,000 input audio tokens • 1 hour of speech ≈ 15,000 output text tokens • 1 character of output ≈ 0.3 output text tokens
Text-to-Speech API
Token-based pricing
All API costs are calculated based on tokens.
Equivalent to about $0.70/hour of generated speech.
Input text tokens: the text to be converted into speech
Output audio tokens: duration of the generated audio
Usage reference:
1 character ≈ 0.3 input text tokens • 15,000 input text tokens ≈ 1 hour of generated speech • 1 hour of generated speech ≈ 30,000 output audio tokens