Build voice apps for the real world.
Power transcription, translation, and speaker-aware understanding that keeps up with natural conversations and holds up in production. One API. 60+ languages. High accuracy, low latency.
One API built for how people actually speak
One API for the world
Transcribe and translate in 60+ languages with production-ready accuracy. No model switching. No per-language setup. No glue code.
Real time that’s actually real time
Get token-level updates in milliseconds. No buffering, no batch lag, no awkward handoffs. Feels live because it is.
Built to handle real speech
Detects speakers. Tracks language shifts. Structures chaos. Works the way people talk, not the way models wish they did.
Everything in one call
No patching together transcription, speaker logic, and translation. Soniox handles the full stream — from raw audio to structured output.
Everything you need to build great voice apps
Multilingual transcription
Convert speech to text in over 60 languages. No per-language setup required.
Real-time translation
Get translated output in real time, alongside the original transcription.
Real-time streaming
Receive token-level transcription and translation in milliseconds.
Two-way translation
Stream multilingual conversations and receive real-time transcription and translation in both directions.
Language detection
Automatically identify the spoken language without preconfiguration.
Speaker diarization
Automatically detect and label individual speakers in any conversation.
Endpoint detection
Track when speakers start and stop talking in real time.
Latency control
Adjust how quickly tokens become final — trade off speed for accuracy to match your real-time needs.
Custom vocabulary
Improve accuracy for names, acronyms, or domain-specific terms.
Async file transcription
Submit audio files via URL or upload for offline processing.
Structured output
Get labeled, speaker-aware output ready for downstream use.
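To make the latency-control and structured-output items above concrete, here is a minimal client-side sketch in Python. It assumes each streaming update carries tokens with a text, a speaker label, and a flag saying whether the token is final. The `Token` shape, its field names, and the `TranscriptBuffer` helper are illustrative assumptions, not the documented Soniox response schema; check the API reference for the real fields.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Token:
    # Hypothetical token shape for illustration only.
    text: str
    is_final: bool
    speaker: Optional[str] = None


@dataclass
class Segment:
    speaker: Optional[str]
    text: str


class TranscriptBuffer:
    """Keeps final tokens and replaces provisional ones on every update."""

    def __init__(self) -> None:
        self.final_tokens: List[Token] = []
        self.provisional: List[Token] = []

    def update(self, tokens: Iterable[Token]) -> None:
        # Provisional tokens from the previous update are discarded;
        # final tokens are appended once and never revised.
        self.provisional = []
        for tok in tokens:
            if tok.is_final:
                self.final_tokens.append(tok)
            else:
                self.provisional.append(tok)

    def segments(self) -> List[Segment]:
        # Merge consecutive final tokens from the same speaker into one segment.
        out: List[Segment] = []
        for tok in self.final_tokens:
            if out and out[-1].speaker == tok.speaker:
                out[-1].text += tok.text
            else:
                out.append(Segment(tok.speaker, tok.text))
        return out

    def preview(self) -> str:
        # Text that may still change; show it greyed out in a live UI.
        return "".join(tok.text for tok in self.provisional)
```

A UI would typically render `segments()` as the stable transcript and `preview()` as a tentative tail. The faster tokens become final, the shorter that tail is, at some cost in accuracy.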
Helping startups and enterprises ship real-world voice apps




Power every speech experience, in any language
Transcribe and translate 60+ languages in real time
Get accurate speech-to-text and instant translation. No language config required.
Build fast, responsive voice agents and assistants
Stream audio over WebSocket and receive token-level output that stays in sync with users.
Secure medical transcription with custom vocabulary
Capture clinical conversations with speaker labels, term tuning, and HIPAA-compliant infrastructure, using our REST API.
Generate live captions and subtitle files
Output timestamped, speaker-aware text in formats like SRT or VTT, or display live captions.
Analyze calls with structured transcription
Use custom context to improve accuracy and output labeled, segmented transcripts for QA and insights.
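As an illustration of the captions use case above, the following Python sketch turns timestamped, speaker-labeled segments into SRT text. The `Cue` type and its fields are assumptions made for this example. They stand in for whatever segment structure your application builds from the API output and are not part of the Soniox API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    # Hypothetical caption segment; times are in milliseconds.
    start_ms: int
    end_ms: int
    speaker: str
    text: str


def srt_timestamp(ms: int) -> str:
    """Format milliseconds as the SRT timestamp HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(cues: List[Cue]) -> str:
    """Render cues as numbered SRT blocks separated by blank lines."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(cue.start_ms)} --> {srt_timestamp(cue.end_ms)}\n"
            f"[{cue.speaker}] {cue.text}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([
        Cue(0, 1500, "Speaker 1", "Hello."),
        Cue(1500, 3000, "Speaker 2", "Hi."),
    ]))
```

WebVTT output would follow the same pattern, with a `WEBVTT` header line and a period instead of a comma before the milliseconds.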
Privacy and compliance, built right in
Never stored, never saved.
Audio stays in memory, and everything is processed in real time.
Built for privacy-critical use cases.
SOC 2 Type II–certified and HIPAA-ready from day one.
Trusted where privacy matters most.
Used in industries where speech is sensitive — from healthcare to enterprise.

Get started with the Soniox API
Start building
Create your account and generate an API key. Includes $200 in free credits.
Explore the docs
Find guides, API reference, and code samples to help you build fast.
Join our Discord
Ask questions, get feedback, and connect with other builders.