# Soniox docs
Documentation from the [soniox.com/docs](https://soniox.com/docs) website.
## Links to docs content pages:
- [Community and support](https://soniox.com/docs/community-and-support)
- [FAQ](https://soniox.com/docs/faq)
- [Introduction](https://soniox.com/docs/)
- [AI engineering](https://soniox.com/docs/stt/ai-engineering)
- [Data residency](https://soniox.com/docs/stt/data-residency)
- [Get started](https://soniox.com/docs/stt/get-started)
- [Models](https://soniox.com/docs/stt/models)
- [Security and privacy](https://soniox.com/docs/stt/security-and-privacy)
- [React Native SDK](https://soniox.com/docs/stt/SDKs/react-native-SDK)
- [Async transcription](https://soniox.com/docs/stt/async/async-transcription)
- [Async translation](https://soniox.com/docs/stt/async/async-translation)
- [Error handling](https://soniox.com/docs/stt/async/error-handling)
- [Limits & quotas](https://soniox.com/docs/stt/async/limits-and-quotas)
- [Webhooks](https://soniox.com/docs/stt/async/webhooks)
- [Confidence scores](https://soniox.com/docs/stt/concepts/confidence-scores)
- [Context](https://soniox.com/docs/stt/concepts/context)
- [Language hints](https://soniox.com/docs/stt/concepts/language-hints)
- [Language identification](https://soniox.com/docs/stt/concepts/language-identification)
- [Language restrictions](https://soniox.com/docs/stt/concepts/language-restrictions)
- [Speaker diarization](https://soniox.com/docs/stt/concepts/speaker-diarization)
- [Supported languages](https://soniox.com/docs/stt/concepts/supported-languages)
- [Timestamps](https://soniox.com/docs/stt/concepts/timestamps)
- [Soniox Live](https://soniox.com/docs/stt/demo-apps/soniox-live)
- [API reference](https://soniox.com/docs/stt/api-reference)
- [WebSocket API](https://soniox.com/docs/stt/api-reference/websocket-api)
- [Connection keepalive](https://soniox.com/docs/stt/rt/connection-keepalive)
- [Endpoint detection](https://soniox.com/docs/stt/rt/endpoint-detection)
- [Error handling](https://soniox.com/docs/stt/rt/error-handling)
- [Limits & quotas](https://soniox.com/docs/stt/rt/limits-and-quotas)
- [Manual finalization](https://soniox.com/docs/stt/rt/manual-finalization)
- [Real-time transcription](https://soniox.com/docs/stt/rt/real-time-transcription)
- [Real-time translation](https://soniox.com/docs/stt/rt/real-time-translation)
- [Community integrations](https://soniox.com/docs/stt/integrations/community-integrations)
- [Integrations](https://soniox.com/docs/stt/integrations)
- [LiveKit](https://soniox.com/docs/stt/integrations/livekit)
- [n8n](https://soniox.com/docs/stt/integrations/n8n)
- [Pipecat](https://soniox.com/docs/stt/integrations/pipecat)
- [TanStack AI SDK](https://soniox.com/docs/stt/integrations/tanstack-ai-sdk)
- [Twilio](https://soniox.com/docs/stt/integrations/twilio)
- [Vercel AI SDK](https://soniox.com/docs/stt/integrations/vercel-ai-sdk)
- [Direct stream](https://soniox.com/docs/stt/guides/direct-stream)
- [Proxy stream](https://soniox.com/docs/stt/guides/proxy-stream)
- [Async transcription with Node SDK](https://soniox.com/docs/stt/SDKs/node-SDK/async-transcription)
- [Handling files with Node SDK](https://soniox.com/docs/stt/SDKs/node-SDK/files)
- [Node SDK](https://soniox.com/docs/stt/SDKs/node-SDK)
- [Real-time transcription with Node SDK](https://soniox.com/docs/stt/SDKs/node-SDK/realtime-transcription)
- [Handling webhooks with Node SDK](https://soniox.com/docs/stt/SDKs/node-SDK/webhooks)
- [Async transcription with Python SDK](https://soniox.com/docs/stt/SDKs/python-SDK/async-transcription)
- [Handling files with Python SDK](https://soniox.com/docs/stt/SDKs/python-SDK/files)
- [Python SDK](https://soniox.com/docs/stt/SDKs/python-SDK)
- [Real-time transcription with Python SDK](https://soniox.com/docs/stt/SDKs/python-SDK/realtime-transcription)
- [Sync vs async clients](https://soniox.com/docs/stt/SDKs/python-SDK/sync-vs-async-clients)
- [Handling webhooks with Python SDK](https://soniox.com/docs/stt/SDKs/python-SDK/webhooks)
- [React SDK](https://soniox.com/docs/stt/SDKs/react-SDK)
- [Real-time transcription with React SDK](https://soniox.com/docs/stt/SDKs/react-SDK/realtime-transcription)
- [Web SDK](https://soniox.com/docs/stt/SDKs/web-SDK)
- [Real-time transcription with Web SDK](https://soniox.com/docs/stt/SDKs/web-SDK/realtime-transcription)
- [LangChain.js (JavaScript)](https://soniox.com/docs/stt/integrations/langchain/langchain-js)
- [LangChain (Python)](https://soniox.com/docs/stt/integrations/langchain/langchain)
- [Classes](https://soniox.com/docs/stt/SDKs/node-SDK/reference/classes)
- [Full Node SDK reference](https://soniox.com/docs/stt/SDKs/node-SDK/reference)
- [Types](https://soniox.com/docs/stt/SDKs/node-SDK/reference/types)
- [Async Client](https://soniox.com/docs/stt/SDKs/python-SDK/Full-SDK-reference/async_client)
- [Realtime Client](https://soniox.com/docs/stt/SDKs/python-SDK/Full-SDK-reference/realtime_client)
- [Types](https://soniox.com/docs/stt/SDKs/python-SDK/Full-SDK-reference/types)
- [Full React SDK reference](https://soniox.com/docs/stt/SDKs/react-SDK/reference)
- [Types](https://soniox.com/docs/stt/SDKs/react-SDK/reference/types)
- [Classes](https://soniox.com/docs/stt/SDKs/web-SDK/reference/classes)
- [Full Web SDK reference](https://soniox.com/docs/stt/SDKs/web-SDK/reference)
- [Types](https://soniox.com/docs/stt/SDKs/web-SDK/reference/types)
- [Create temporary API key](https://soniox.com/docs/stt/api-reference/auth/create_temporary_api_key)
- [Delete file](https://soniox.com/docs/stt/api-reference/files/delete_file)
- [Get file](https://soniox.com/docs/stt/api-reference/files/get_file)
- [Get files](https://soniox.com/docs/stt/api-reference/files/get_files)
- [Upload file](https://soniox.com/docs/stt/api-reference/files/upload_file)
- [Get models](https://soniox.com/docs/stt/api-reference/models/get_models)
- [Create transcription](https://soniox.com/docs/stt/api-reference/transcriptions/create_transcription)
- [Delete transcription](https://soniox.com/docs/stt/api-reference/transcriptions/delete_transcription)
- [Get transcription](https://soniox.com/docs/stt/api-reference/transcriptions/get_transcription)
- [Get transcription transcript](https://soniox.com/docs/stt/api-reference/transcriptions/get_transcription_transcript)
- [Get transcriptions](https://soniox.com/docs/stt/api-reference/transcriptions/get_transcriptions)
# Community and support
URL: /community-and-support
Engage with our community to explore new updates, participate in discussions, contribute to our projects, and report any issues you encounter.
## Support
We offer three levels of support depending on your plan:
**Free**
Community-driven support through our [Discord server](https://discord.gg/rWfnk9uM5j).
**Business**
Priority email support and onboarding assistance.
Please contact [sales@soniox.com](mailto:sales@soniox.com) for more information.
**Enterprise**
Dedicated support channels, defined response-time SLAs, and escalation paths for production deployments.
Please contact [sales@soniox.com](mailto:sales@soniox.com) for more information.
***
## GitHub
We use GitHub to track issues related to official Soniox SDKs and integrations.
Check our [Soniox GitHub](https://github.com/soniox) profile for all available code.
***
## Website
For more information about our products, pricing, or Soniox in general, visit our [website](https://soniox.com/).
# FAQ
URL: /faq
Common troubleshooting guidance and answers for integrating with the Soniox API.
This page answers common questions related to integrating with the Soniox API.
High WebSocket connection startup time is usually caused by one or more of the following:
* **Network latency:** High round-trip time between your client and Soniox increases the duration of the TLS handshake and WebSocket upgrade.
* **Region selection:** Using an endpoint located far from your compute environment adds unnecessary cross-region latency during connection establishment. See [Data residency](/stt/data-residency) for more info.
* **Large initial context payload:** Sending a large [context](/stt/concepts/context) during initialization delays readiness because the server must fully receive and process the payload before the session becomes active.
To minimize perceived startup delay, you should always **buffer audio locally before the WebSocket connection is established** and immediately stream all buffered audio chunks after sending the initial configuration message.
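The buffer-then-flush pattern can be sketched as follows. This is a minimal illustration, not the Soniox SDK API; the `Chunk` type and the injected `send` callback are placeholders for your capture pipeline and WebSocket send call:

```typescript
// Minimal sketch of client-side audio buffering: chunks captured before the
// WebSocket is open are queued, then flushed in order once the connection
// is open and the initial configuration message has been sent.
type Chunk = Uint8Array;

class AudioChunkBuffer {
  private queue: Chunk[] = [];
  private ready = false;

  // `send` is a placeholder for however you write to the WebSocket.
  constructor(private send: (chunk: Chunk) => void) {}

  // Called for every chunk from the microphone, before or after connect.
  push(chunk: Chunk): void {
    if (this.ready) {
      this.send(chunk); // connection is open: stream directly
    } else {
      this.queue.push(chunk); // still connecting: buffer locally
    }
  }

  // Called once after the config message goes out on the open connection.
  flush(): void {
    this.ready = true;
    for (const chunk of this.queue) this.send(chunk);
    this.queue = [];
  }
}
```

Because nothing recorded during connection setup is dropped, the first words a user speaks still appear in the transcript; they are simply delivered in a burst once the session is active.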
You can request a limit increase from the [Soniox Console organization limits](https://console.soniox.com/org/limits) page. Requests are reviewed within 1-3 business days.
Yes. Soniox provides standard legal and compliance documentation for companies integrating the Soniox API into their products or services. This may include an MSA, DPA, and security or compliance documentation required for procurement or security review processes.
Documentation is available for Business and Enterprise customers. Please contact [sales@soniox.com](mailto:sales@soniox.com) to request access or begin the review process.
# Introduction
URL: /
Soniox provides powerful, production-ready APIs for transcribing, translating, and understanding audio content.
## Get started with Soniox APIs
Welcome to Soniox — the fastest, most accurate platform for audio and speech intelligence.
Soniox provides powerful, production-ready APIs for transcribing, translating, and understanding audio content.
Whether you are building real-time voice interfaces, analyzing large volumes of audio,
or extracting structured insights from speech, Soniox gives you the tools to do it efficiently and at scale.
You can integrate Soniox into your product, workflow, or pipeline using simple REST or WebSocket APIs,
with support for multiple SDKs and real-time streaming.
## Products
* **Speech-to-Text:** Transcribe and translate speech in 60+ languages with world-leading accuracy and real-time performance. Supports file and real-time modes, high-quality translation with super-low latency, speaker diarization, and advanced customization.
## Before you begin
To start using Soniox, create a [Soniox account](https://console.soniox.com/signup/). Visit the [Soniox Console](https://console.soniox.com/) to generate and manage API keys, view usage, logs, and billing. Soniox Console is your self-service control center for everything Soniox.
# AI engineering
URL: /stt/ai-engineering
Using MCP, the AI assistant, and LLMs with Soniox for AI-powered development.
Soniox provides easy-to-use AI tools that help you explore documentation, generate code, and get guidance, even if you're new to programming. These tools work directly with your coding environment, so you can focus on building instead of searching for answers.
With Soniox AI engineering, you can:
* Browse documentation via the **MCP server** without leaving your coding tools
* Ask the **AI assistant** for explanations, examples, or code help
* Use **LLM context files** so AI models understand Soniox APIs and examples
* Copy page content or open it directly in your preferred AI tool
These features reduce friction, help you learn faster, and make working with Soniox APIs simple and efficient.
***
## MCP server
The **MCP server** lets you access Soniox documentation right from tools like Cursor, Windsurf, or Claude Code. You can search guides, view examples, and explore APIs without switching windows.
### How to set it up
Add the following configuration to your coding tool:
```json
"soniox-docs": {
"command": "npx",
"args": [
"-y",
"mcp-remote",
"https://soniox.com/docs/api/mcp/mcp"
]
}
```
Follow your tool's instructions for adding a remote server. Once set up, you can quickly explore Soniox docs and code samples from within your coding environment.
***
## AI assistant
The **Soniox AI assistant** is available directly from the docs. It can:
* Answer questions about Soniox APIs
* Explain example code or suggest modifications
* Provide guidance in context, so you don't need to guess
Even if you're new to programming, the AI assistant can help you understand code and API workflows quickly.
***
## LLM context files
Soniox provides two files that give AI models context about our APIs and examples:
* [llms.txt](/llms.txt) – core context for general tasks
* [llms-full.txt](/llms-full.txt) – extended context for advanced workflows
Adding these files to your AI tool ensures the model can provide accurate, context-aware help.
***
## Copy and open buttons
At the top of each documentation page, the **Copy page** button makes it easy to bring content into your workflow:
* **Copy Markdown** – copy the full page content instantly
* **Open in ChatGPT or Claude** – send the page context for live AI interaction
These features help you experiment and learn by bringing examples and documentation directly into your coding environment.
***
For more information about Soniox products, pricing, or general resources, visit our [website](https://soniox.com/).
# Data residency
URL: /stt/data-residency
Learn about data residency.
Soniox keeps your data yours. Any content you send to the Soniox API — audio, transcripts, or metadata — is **never used to train or improve our models.** For more information, see our [Security and privacy](/stt/security-and-privacy) page.
***
## What is data residency
Data residency lets you choose **where** Soniox processes and stores your content. When you select a region for a project, **all audio and transcript data for that project stays in that region** — for both processing and storage.
To get access to regional deployments, contact us: [sales@soniox.com](mailto:sales@soniox.com).
***
## How data residency works
When data residency is enabled for your account:
* You choose a **region** when creating a new project.
* Any API requests made using that project's API key are handled **fully within the selected region.**
* All **content data** (audio + transcripts) remains within that region for processing and storage.
### System data
Data residency **does not apply to system data** such as account and project metadata, usage statistics, and billing data.
This system data may be processed outside the selected region.
**Your content (audio + transcripts) never leaves the region you choose.**
***
## Using data residency
Data residency is set **per project** within your Soniox organization.
### 1. Create a project with a region
When creating a new project:
* Select the region from the **region** dropdown.
* Each project receives region-specific API keys.
### 2. Use the region-specific API domain
To ensure processing stays in the region, use:
* The **API key** from the regional project.
* The **correct API domain** for that region (see below).
***
## Regional endpoints
| Region | Regional storage | Regional processing | Capabilities | API domain |
| ------------------ | ---------------- | ------------------- | ------------------ | ------------------------------------------------- |
| **United States** | ✅ Yes | ✅ Yes | Full API supported | `api.soniox.com` `stt-rt.soniox.com` |
| **European Union** | ✅ Yes | ✅ Yes | Full API supported | `api.eu.soniox.com` `stt-rt.eu.soniox.com` |
| **Japan** | ✅ Yes | ✅ Yes | Full API supported | `api.jp.soniox.com` `stt-rt.jp.soniox.com` |
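The table above can be expressed as a simple lookup from region to the two domains. This is an illustrative sketch, not an official SDK parameter; the region identifiers (`us`, `eu`, `jp`) are placeholders you would define in your own code:

```typescript
// Map each region to its REST API and real-time (WebSocket) domains from
// the table above. Which entry applies is determined by the project whose
// API key you are using.
type Region = "us" | "eu" | "jp";

const ENDPOINTS: Record<Region, { api: string; realtime: string }> = {
  us: { api: "api.soniox.com", realtime: "stt-rt.soniox.com" },
  eu: { api: "api.eu.soniox.com", realtime: "stt-rt.eu.soniox.com" },
  jp: { api: "api.jp.soniox.com", realtime: "stt-rt.jp.soniox.com" },
};

// Return the REST API domain for a region.
function apiDomain(region: Region): string {
  return ENDPOINTS[region].api;
}
```

Pairing the regional API key with the matching domain is what keeps processing inside the selected region; a regional key used against the wrong domain will not achieve residency.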
If you'd like help enabling data residency or need a custom region, reach out: [sales@soniox.com](mailto:sales@soniox.com).
# Get started
URL: /stt/get-started
Learn how to use the Soniox Speech-to-Text API.
## Learn how to use the Soniox API in minutes
Soniox Speech-to-Text is a **universal speech AI** that lets you transcribe and
translate speech in 60+ languages — from recorded files (async) or live audio
streams (real-time). Languages can be freely mixed within the same conversation,
and Soniox will handle them seamlessly with high accuracy and low latency.
In just a few steps, you can run your first transcription or translation. The
examples also cover advanced features such as speaker diarization, real-time
translation, context customization, and automatic language identification — all
through the same simple API.
### Get API key
Create a [Soniox account](https://console.soniox.com/signup) and log in to
the [Console](https://console.soniox.com) to get your API key.
API keys are created per project. In the Console, go to **My First Project** and click **API Keys** to generate one.
Export it as an environment variable (replace with your key):
```sh title="Terminal"
export SONIOX_API_KEY=
```
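In application code, the exported variable can then be read from the environment instead of being hard-coded. This is a generic sketch; only the variable name `SONIOX_API_KEY` comes from these docs, and the helper function is hypothetical:

```typescript
// Read the API key from an environment map and fail fast if it is missing,
// so the key never has to appear in source code. Pass `process.env` in a
// Node.js application.
function getApiKey(env: Record<string, string | undefined>): string {
  const key = env.SONIOX_API_KEY;
  if (!key) {
    throw new Error("SONIOX_API_KEY is not set");
  }
  return key;
}
```

Failing fast on a missing key turns a confusing authentication error later in the request path into an immediate, self-explanatory startup error.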
### Get examples
Clone the official examples repo:
```sh title="Terminal"
git clone https://github.com/soniox/soniox_examples
cd soniox_examples/speech_to_text
```
### Run examples
Choose your language and run the ready-to-use examples below.
| Example | What it does | Output |
| --- | --- | --- |
| **Real-time transcription** | Transcribes speech in any language in real-time. | Transcript streamed to console. |
| **Real-time one-way translation** | Transcribes speech in any language and translates it into Spanish in real-time. | Transcript + Spanish translation streamed together. |
| **Real-time two-way translation** | Transcribes speech in any language and translates English ↔ Spanish in real-time. Spanish → English, English → Spanish. | Transcript + bidirectional translations streamed together. |
| **Transcribe file from URL** | Transcribes an audio file directly from a public URL. | Transcript printed to console. |
| **Transcribe local file** | Uploads and transcribes an audio file from your computer. | Transcript printed to console. |
{/* TABLE END */}
**Python SDK**
```sh title="Terminal"
cd python_sdk
# Set up environment
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Real-time examples
python soniox_sdk_realtime.py --audio_path ../assets/coffee_shop.mp3
python soniox_sdk_realtime.py --audio_path ../assets/coffee_shop.mp3 --translation one_way
python soniox_sdk_realtime.py --audio_path ../assets/two_way_translation.mp3 --translation two_way
# Async examples
python soniox_sdk_async.py --audio_url "https://soniox.com/media/examples/coffee_shop.mp3"
python soniox_sdk_async.py --audio_path ../assets/coffee_shop.mp3
```
**Node.js SDK**
```sh title="Terminal"
cd nodejs_sdk
# Install dependencies
npm install
# Real-time examples
node soniox_sdk_realtime.js --audio_path ../assets/coffee_shop.mp3
node soniox_sdk_realtime.js --audio_path ../assets/coffee_shop.mp3 --translation one_way
node soniox_sdk_realtime.js --audio_path ../assets/two_way_translation.mp3 --translation two_way
# Async examples
node soniox_sdk_async.js --audio_url "https://soniox.com/media/examples/coffee_shop.mp3"
node soniox_sdk_async.js --audio_path ../assets/coffee_shop.mp3
```
**Python (direct API)**
```sh title="Terminal"
cd python
# Set up environment
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Real-time examples
python soniox_realtime.py --audio_path ../assets/coffee_shop.mp3
python soniox_realtime.py --audio_path ../assets/coffee_shop.mp3 --translation one_way
python soniox_realtime.py --audio_path ../assets/two_way_translation.mp3 --translation two_way
# Async examples
python soniox_async.py --audio_url "https://soniox.com/media/examples/coffee_shop.mp3"
python soniox_async.py --audio_path ../assets/coffee_shop.mp3
```
**Node.js (direct API)**
```sh title="Terminal"
cd nodejs
# Install dependencies
npm install
# Real-time examples
node soniox_realtime.js --audio_path ../assets/coffee_shop.mp3
node soniox_realtime.js --audio_path ../assets/coffee_shop.mp3 --translation one_way
node soniox_realtime.js --audio_path ../assets/two_way_translation.mp3 --translation two_way
# Async examples
node soniox_async.js --audio_url "https://soniox.com/media/examples/coffee_shop.mp3"
node soniox_async.js --audio_path ../assets/coffee_shop.mp3
```
## Next steps
* **Dive into the [Real-time API](/stt/rt/real-time-transcription)** → Run live transcription, translations, and endpoint detection.
* **Explore the [Async API](/stt/async/async-transcription)** → Transcribe and translate (recorded) files at scale and integrate with webhooks.
# Models
URL: /stt/models
Learn about latest models, changelog, and deprecations.
Soniox Speech-to-Text provides multiple models for real-time and asynchronous
transcription and translation. This page lists the currently available models,
their capabilities, and important updates.
***
## Current models
| Model | Type | Status |
| --- | --- | --- |
| **stt-rt-v4** | Real-time | **Active** |
| **stt-async-v4** | Async | **Active** |
| **stt-rt-v3** | Real-time | **Active** (After 2026-02-28, requests will automatically route to `stt-rt-v4` with no service interruption. No API changes required.) |
| **stt-async-v3** | Async | **Active** (After 2026-02-28, requests will automatically route to `stt-async-v4` with no service interruption. No API changes required.) |
{/* TABLE END */}
***
## Aliases
Aliases provide a stable reference so you don’t need to change your code when newer versions are released.
| Alias | Points to | Notes |
| ------------------------ | -------------- | -------------------------------------------------- |
| **stt-rt-v3-preview** | `stt-rt-v3` | Always points to the latest real-time active model |
| **stt-rt-preview-v2** | `stt-rt-v3` | |
| **stt-async-preview-v1** | `stt-async-v3` | |
***
## Changelog
### February 5, 2026
**New models:** stt-rt-v4
**Replaces:** stt-rt-v3
#### Overview
**Soniox v4 Real-Time** is a next-generation real-time speech recognition model built for low-latency voice interactions.
It delivers native-speaker-level accuracy across 60+ languages with improved latency, reliability, and conversational behavior.
The model is production-ready and fully backward-compatible with v3 Real-Time.
#### Key improvements
* Higher accuracy across all supported languages
* Better multilingual detection and mid-sentence language switching
* Lower endpoint latency with faster final transcription
* Improved semantic endpointing for more natural turn-taking
* Lower manual finalization latency with faster final transcription
* More stable, higher-quality transcription on long and multi-hour recordings
* Stronger use of provided context for domain-specific accuracy
* More fluent, accurate, and consistent translation across all supported languages
* Added `max_endpoint_delay_ms` for controlling end-of-speech endpoint delay
#### API compatibility
* The stt-rt-v4 model is fully compatible with the existing stt-rt-v3 model and Soniox API
* To upgrade, simply replace the model name in your API request:
* `{ "model": "stt-rt-v4" }` for real-time
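The upgrade amounts to changing a single field in the request configuration, for example (a sketch; only the `model` value comes from this page, and the `audio_format` field stands in for whatever other parameters your existing request already uses):

```typescript
// Upgrading from v3 to v4 only changes the model name; all other request
// parameters remain valid and are carried over unchanged.
const oldConfig = { model: "stt-rt-v3", audio_format: "auto" };
const newConfig = { ...oldConfig, model: "stt-rt-v4" };
```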
#### Deprecation notice
* The stt-rt-v3 model will be removed on February 28, 2026
* After February 28, 2026, requests will automatically route to stt-rt-v4 with no service interruption. No API changes required
### January 29, 2026
**New models:** stt-async-v4
**Replaces:** stt-async-v3
#### Overview
**Soniox v4 Async** is the latest generation of Soniox’s asynchronous speech recognition and translation model. This release delivers a significant improvement in accuracy, robustness, and multilingual performance across more than 60 languages. v4 Async reaches human-parity transcription quality in real-world scenarios, while also introducing stronger long-form processing, improved speaker diarization, richer context handling, and higher-quality translation output. The model is designed for production-scale workloads and consistent, high-fidelity results across diverse acoustic environments and language mixes.
#### Key improvements
* Higher transcription accuracy across all languages, reaching native-speaker-level quality in many domains
* More robust performance in noise, accents, overlapping speech, and poor audio
* Better language identification and smoother mid-sentence language switching
* Improved speaker separation and more consistent labeling in multi-speaker audio
* Better normalization of dates, numbers, phone/email addresses, and other structured content
* More stable, higher-quality transcription on long and multi-hour recordings
* Stronger use of provided context for domain-specific accuracy
* More fluent, accurate, and consistent translation across all supported languages
#### API compatibility
* The stt-async-v4 model is fully compatible with the existing stt-async-v3 model and Soniox API
* To upgrade, simply replace the model name in your API request:
* `{ "model": "stt-async-v4" }` for async
#### Deprecation notice
* The stt-async-v3 model will be removed on February 28, 2026
* After February 28, 2026, requests will automatically route to stt-async-v4 with no service interruption. No API changes required
### October 31, 2025
#### Model retirement and upgrade
We have accelerated the retirement of older models following the overwhelmingly positive response to the new v3 models. The following models have been retired:
* stt-async-preview-v1
* stt-rt-preview-v2
Both models have been **aliased to the new Soniox v3 models.**
This means all existing requests using the old model names are now automatically served with v3, giving every user our most accurate, capable, and intelligent voice AI experience, without any code changes required.
#### Context compatibility
The context feature is now backward compatible with v3 models, ensuring smooth migration from older versions. However, we **strongly recommend updating to the new context** structure for best results and future flexibility. Learn more about [context](/stt/concepts/context).
### October 29, 2025
**Model update:** v3 enhancements
**Applies to:** stt-rt-v3, stt-async-v3
#### New features
* **Extended audio duration support:** both real-time (stt-rt-v3) and asynchronous (stt-async-v3) models now support **audio up to 5 hours** in a single request.
#### Quality improvements
* **Higher transcription accuracy** across challenging audio conditions and diverse languages.
#### Notes
* No API changes are required; existing integrations continue to work seamlessly.
* For asynchronous processing, large files up to 5 hours can now be uploaded directly without chunking.
* For real-time streaming, sessions up to 5 hours are supported under the same WebSocket connection.
### October 21, 2025
**New models:** stt-rt-v3, stt-async-v3
**Replaces:** stt-rt-preview-v2, stt-async-preview-v1
#### Overview
The **v3 models** introduce major improvements across recognition, translation, and reasoning — making Soniox faster, more accurate, and more capable than ever before.
These models power real-time and asynchronous speech processing in 60+ languages, with enhanced accuracy, robustness, and context understanding.
#### Key improvements
* Higher transcription accuracy across 60+ languages
* Improved multilingual switching — seamless recognition when speakers change language mid-sentence
* Significantly higher translation quality, especially for languages such as German and Korean
* The async model now also supports translation
* Support for new advanced structured context, enabling richer domain- and task-specific adaptation
* Enhanced alphanumeric accuracy (addresses, IDs, codes, serials)
* More accurate speaker diarization, even in overlapping speech
* Extended maximum audio duration to 5 hours for both async and real-time models
#### API compatibility
* The v3 models are fully compatible with the existing Soniox API, if you are not using the context feature.
* To upgrade, simply replace the model name in your API request:
* `{ "model": "stt-rt-v3" }` for real-time
* `{ "model": "stt-async-v3" }` for async
* If you are using the context feature, update to the new structured [context](/stt/concepts/context) for improved accuracy.
#### Deprecation notice
The following preview models are **deprecated** and will be retired on **November 30, 2025:**
* stt-async-preview-v1
* stt-rt-preview-v2
Please migrate to the v3 models before that date to ensure uninterrupted service.
### August 15, 2025
* Deprecated `stt-rt-preview-v1`
### August 5, 2025
* Released `stt-rt-preview-v2`
* Higher transcription accuracy
* Improved translation quality
* Expanded to support all translation pairs
* More reliable automatic language switching
* **Replaces:** stt-rt-preview-v1
# Security and privacy
URL: /stt/security-and-privacy
Learn about security and privacy policies.
At Soniox, we take security and privacy seriously. Our platform is designed to keep your data protected while reducing compliance burdens for your business. This page outlines how Soniox handles data, meets compliance requirements, and ensures secure communication.
***
## Compliance
Soniox meets industry-leading certification standards:
* **SOC 2 Type 2** – auditing standard that evaluates an organization’s controls for security, availability, processing integrity, confidentiality, and privacy over an extended period of time.
* **ISO/IEC 27001:2022** – internationally recognized standard for Information Security Management Systems (ISMS).
* **GDPR** – European Union regulation that governs the collection, processing, and protection of personal data and privacy rights.
* **HIPAA** – U.S. regulatory framework that establishes requirements for protecting sensitive healthcare data, including Protected Health Information (PHI).
To request compliance documentation, contact us at [support@soniox.com](mailto:support@soniox.com).
***
## Data handling
* **No model training** – your audio and transcripts are never used to improve Soniox models or services.
* **No retention** – Soniox does not store your audio or transcript data unless explicitly requested through a service that supports storage, such as the async API.
* **Storage** – when you choose to store data, it is securely isolated within your Soniox Account.
* **Data deletion** – you can delete all stored audio and transcripts at any time via the Soniox Console or API.
***
## Logging
* Minimal logging is performed for service reliability, debugging, and billing.
* Logs **never** contain raw audio or transcript content.
* Diagnostic metadata (such as request IDs or error traces) may be retained temporarily for operational purposes.
***
## Encryption & security
* **In transit** – all communication between your application and Soniox services is encrypted using **TLS 1.2+**.
* **Access control** – stored data is restricted to your account namespace, accessible only by your API keys.
# React Native SDK
URL: /stt/SDKs/react-native-SDK
Build speech-to-text workflows in React Native with the real-time API.
Soniox [React SDK](/stt/SDKs/react-SDK) works with React Native and Expo out of the box, providing the same hooks for real-time speech-to-text.
It lets you:
* Capture audio from the device microphone with a single hook
* Stream audio to Soniox in real time
* Receive transcription and translation results as reactive state
## Quickstart
### Install
Install via your preferred package manager:
```bash tab
npm install @soniox/react @soniox/client
```
```bash tab
yarn add @soniox/react @soniox/client
```
```bash tab
pnpm add @soniox/react @soniox/client
```
```bash tab
bun add @soniox/react @soniox/client
```
### Set up your temporary API key endpoint
In client environments (browser, mobile app, React Native, etc.), you don't want to expose your permanent API key.
Instead, create a temporary API key endpoint on your server and use it to issue short-lived API keys for the client.
Read more about using temporary API keys with the [React SDK](/stt/SDKs/react-SDK#setup-you-temporary-api-key-endpoint).
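The server side of this flow can be sketched as a small request builder. This is a minimal sketch: the `/v1/auth/temporary-api-key` path, the `usage_type` value, and the `expires_in_seconds` field are assumptions here, so verify them against the Soniox API reference before use. The key point is that the permanent key stays on the server and only the short-lived key from the response is forwarded to the client.

```python
import json

SONIOX_API_BASE_URL = "https://api.soniox.com"


def build_temporary_key_request(api_key: str, expires_in_seconds: int = 60) -> dict:
    # Build the HTTP request your server would send to Soniox to mint a
    # short-lived key for a client. Endpoint path and body fields are
    # assumptions - check the Soniox API reference for the exact shape.
    return {
        "method": "POST",
        "url": f"{SONIOX_API_BASE_URL}/v1/auth/temporary-api-key",
        "headers": {
            # The permanent key never leaves the server; only the temporary
            # key in the response is returned to the client.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps(
            {
                "usage_type": "transcribe_websocket",
                "expires_in_seconds": expires_in_seconds,
            }
        ),
    }
```

Your `/api/soniox-temporary-key` handler would send this request with its server-side key and relay only the temporary key back to the app.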
### Create a custom audio source
Wrap any RN audio streaming library (e.g. `@siteed/expo-audio-studio`) with the `AudioSource` interface to stream PCM audio chunks to Soniox:
```ts
import type { AudioSource, AudioSourceHandlers } from "@soniox/client";

class MyAudioSource implements AudioSource {
  private handlers: AudioSourceHandlers | null = null;

  async start(handlers: AudioSourceHandlers): Promise<void> {
    this.handlers = handlers;
    // Start your audio capture here.
    // Call handlers.onData(chunk) with each audio chunk as an ArrayBuffer.
    // Call handlers.onError(error) if something goes wrong.
    // Call handlers.onMuted?.() / handlers.onUnmuted?.() when the mic is
    // muted or unmuted externally (e.g. OS-level, hardware switch).
  }

  stop(): void {
    // Stop audio capture and release resources.
    this.handlers = null;
  }
}
```
### Create your first real-time session
The core hooks (e.g. [`useRecording`](/stt/SDKs/react-SDK/realtime-transcription#userecording)) are platform-agnostic.
To use them in React Native, provide a custom [`AudioSource`](/stt/SDKs/web-SDK/reference/types#audiosource) that streams PCM audio chunks:
```tsx
import { useRef } from "react";
// The Text and Button elements below are placeholder UI - swap in your own components.
import { Button, Text } from "react-native";
import { SonioxProvider, useRecording } from "@soniox/react";
import { MyAudioSource } from "./MyAudioSource";

// Create a temporary API key endpoint on your server and use it to issue
// temporary API keys for the client.
async function fetchApiKey() {
  const res = await fetch("/api/soniox-temporary-key", { method: "POST" });
  const { api_key } = await res.json();
  return api_key;
}

function App() {
  return (
    // Wrap your app with a SonioxProvider and pass the temporary API key getter function.
    <SonioxProvider apiKey={fetchApiKey}>
      <Transcription />
    </SonioxProvider>
  );
}

function Transcription() {
  // Create a custom audio source.
  const source = useRef(new MyAudioSource()).current;

  // Create a recording session.
  const { state, isActive, finalText, partialText, start, stop } = useRecording({
    model: "stt-rt-v4",
    audio_format: "pcm_s16le",
    sample_rate: 16000,
    num_channels: 1,
    source,
  });

  return (
    <>
      <Text>
        {finalText}
        {partialText}
      </Text>
      {isActive ? (
        <Button title="Stop" onPress={() => stop()} />
      ) : (
        <Button title="Start" onPress={() => start()} />
      )}
    </>
  );
}
```
## Next steps
* [Real-time transcription](/stt/SDKs/react-SDK/realtime-transcription)
* [Full SDK reference](/stt/SDKs/react-SDK/reference)
## Package links
* [GitHub repository](https://github.com/soniox/soniox-js)
* [@soniox/react NPM package](https://www.npmjs.com/package/@soniox/react)
* [@soniox/client NPM package](https://www.npmjs.com/package/@soniox/client)
# Async transcription
URL: /stt/async/async-transcription
Learn about async transcription for audio files.
## Overview
Soniox supports **asynchronous transcription** for audio files. This allows you to
transcribe recordings without maintaining a live connection or streaming
pipeline.
You can submit audio from:
* A **public URL** (`audio_url`).
* A **local file** uploaded via the **Soniox Files API** (`file_id`).
Once submitted, jobs are processed in the background. You can poll for
status/results, or use **webhooks** to get notified when transcription is complete.
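When polling, the decision at each step reduces to a small status check. A minimal sketch: `completed` and `error` are the terminal statuses used by the async API, while the `"keep_polling"` branch covers any in-progress status.

```python
def next_action(status: str) -> str:
    # Map a transcription job's status to the client's next step.
    # "completed" and "error" are terminal; anything else means the
    # job is still being processed and the client should poll again.
    if status == "completed":
        return "fetch_transcript"
    if status == "error":
        return "handle_failure"
    return "keep_polling"
```

The full polling loops in the code examples below implement exactly this logic, sleeping between checks.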
***
## Audio input options
### Transcribe from public URL
If your audio is publicly accessible via HTTP, use the `audio_url` parameter:
```json
{"audio_url": "https://example.com/audio.mp3"}
```
### Transcribe from local file
For local files, upload to Soniox using the **Files API**. Then reference the
returned `file_id` when creating the transcription request:
```json
{"file_id": "your_file_id"}
```
***
## Audio formats
Soniox automatically detects audio formats for file transcription — no configuration required.
Supported formats:
```text
aac, aiff, amr, asf, flac, mp3, ogg, wav, webm, m4a, mp4
```
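Although detection happens server-side, you may want to reject unsupported files before uploading. A small client-side check over the list above (a convenience sketch, not part of the Soniox SDK):

```python
from pathlib import Path

# Formats supported by Soniox file transcription.
SUPPORTED_FORMATS = {
    "aac", "aiff", "amr", "asf", "flac", "mp3",
    "ogg", "wav", "webm", "m4a", "mp4",
}


def is_supported_audio(filename: str) -> bool:
    # Compare the file extension (case-insensitively) against the list above.
    return Path(filename).suffix.lstrip(".").lower() in SUPPORTED_FORMATS
```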
***
## Tracking requests
Optionally, add a client-defined identifier to track requests:
```json
{"client_reference_id": "MyReferenceId"}
```
***
## Code examples
**Prerequisite:** Complete the steps in [Get started](/stt/get-started).
See on GitHub: [soniox\_sdk\_async.py](https://github.com/soniox/soniox_examples/blob/master/speech_to_text/python_sdk/soniox_sdk_async.py).
```python
import os
import argparse
from typing import Optional

from soniox import SonioxClient
from soniox.types import (
    CreateTranscriptionConfig,
    StructuredContext,
    TranslationConfig,
    StructuredContextGeneralItem,
    StructuredContextTranslationTerm,
)
from soniox.utils import render_tokens


def get_config(translation: Optional[str]) -> CreateTranscriptionConfig:
    config = CreateTranscriptionConfig(
        # Select the model to use.
        # See: soniox.com/docs/stt/models
        model="stt-async-v4",
        #
        # Set language hints when possible to significantly improve accuracy.
        # See: soniox.com/docs/stt/concepts/language-hints
        language_hints=["en", "es"],
        #
        # Enable language identification. Each token will include a "language" field.
        # See: soniox.com/docs/stt/concepts/language-identification
        enable_language_identification=True,
        #
        # Enable speaker diarization. Each token will include a "speaker" field.
        # See: soniox.com/docs/stt/concepts/speaker-diarization
        enable_speaker_diarization=True,
        #
        # Set context to help the model understand your domain, recognize important terms,
        # and apply custom vocabulary and translation preferences.
        # See: soniox.com/docs/stt/concepts/context
        context=StructuredContext(
            general=[
                StructuredContextGeneralItem(key="domain", value="Healthcare"),
                StructuredContextGeneralItem(
                    key="topic", value="Diabetes management consultation"
                ),
                StructuredContextGeneralItem(key="doctor", value="Dr. Martha Smith"),
                StructuredContextGeneralItem(key="patient", value="Mr. David Miller"),
                StructuredContextGeneralItem(
                    key="organization", value="St John's Hospital"
                ),
            ],
            text="Mr. David Miller visited his healthcare provider last month for a routine follow-up related to diabetes care. The clinician reviewed his recent test results, noted improved glucose levels, and adjusted his medication schedule accordingly. They also discussed meal planning strategies and scheduled the next check-up for early spring.",
            terms=[
                "Celebrex",
                "Zyrtec",
                "Xanax",
                "Prilosec",
                "Amoxicillin Clavulanate Potassium",
            ],
            translation_terms=[
                StructuredContextTranslationTerm(
                    source="Mr. Smith", target="Sr. Smith"
                ),
                StructuredContextTranslationTerm(
                    source="St John's", target="St John's"
                ),
                StructuredContextTranslationTerm(source="stroke", target="ictus"),
            ],
        ),
        #
        # Optional identifier to track this request (client-defined).
        # See: https://soniox.com/docs/stt/api-reference/transcriptions/create_transcription#request
        client_reference_id="MyReferenceId",
    )

    # Webhook.
    # You can set a webhook to get notified when the transcription finishes or fails.
    # See: https://soniox.com/docs/stt/api-reference/transcriptions/create_transcription#request
    # In the SDK you can set the following fields:
    # - config.webhook_url
    # - config.webhook_auth_header_name
    # - config.webhook_auth_header_value

    # Translation options.
    # See: soniox.com/docs/stt/rt/real-time-translation#translation-modes
    if translation == "none":
        pass
    elif translation == "one_way":
        # Translates all languages into the target language.
        config.translation = TranslationConfig(
            type="one_way",
            target_language="es",
        )
    elif translation == "two_way":
        # Translates from language_a to language_b and back from language_b to language_a.
        config.translation = TranslationConfig(
            type="two_way",
            language_a="en",
            language_b="es",
        )
    else:
        raise ValueError(f"Unsupported translation: {translation}")

    return config


def transcribe_file(
    client: SonioxClient,
    audio_url: Optional[str],
    audio_path: Optional[str],
    translation: Optional[str],
) -> None:
    if audio_url is not None:
        # Public URL of the audio file to transcribe.
        assert audio_path is None
        file = None
    elif audio_path is not None:
        # Local file to be uploaded to obtain file id.
        assert audio_url is None
        file = client.files.upload(audio_path)
    else:
        raise ValueError("Missing audio: audio_url or audio_path must be specified.")

    config = get_config(translation)

    print("Creating transcription...")
    transcription = client.stt.create(
        config=config, file_id=file.id if file else None, audio_url=audio_url
    )

    print("Waiting for transcription...")
    client.stt.wait(transcription.id)

    result = client.stt.get_transcript(transcription.id)
    print(render_tokens(result.tokens, []))

    client.stt.delete(transcription.id)
    if file is not None:
        client.files.delete(file.id)


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--audio_url", help="Public URL of the audio file to transcribe."
    )
    parser.add_argument(
        "--audio_path", help="Path to a local audio file to transcribe."
    )
    parser.add_argument("--delete_all_files", action="store_true")
    parser.add_argument("--delete_all_transcriptions", action="store_true")
    parser.add_argument("--translation", default="none")
    args = parser.parse_args()

    api_key = os.environ.get("SONIOX_API_KEY")
    if not api_key:
        raise RuntimeError(
            "Missing SONIOX_API_KEY.\n"
            "1. Get your API key at https://console.soniox.com\n"
            "2. Run: export SONIOX_API_KEY=<your_api_key>"
        )

    client = SonioxClient()

    # Delete all uploaded files.
    if args.delete_all_files:
        print("Deleting all files...")
        client.files.delete_all()
        return

    # Delete all transcriptions.
    if args.delete_all_transcriptions:
        print("Deleting all transcriptions...")
        client.stt.delete_all()
        return

    # If not deleting, require one audio source.
    if not (args.audio_url or args.audio_path):
        parser.error("Provide --audio_url or --audio_path (or use a delete flag).")

    transcribe_file(client, args.audio_url, args.audio_path, args.translation)


if __name__ == "__main__":
    main()
```
```sh title="Terminal"
# Transcribe file from URL
python soniox_sdk_async.py --audio_url "https://soniox.com/media/examples/coffee_shop.mp3"
# Transcribe from local file
python soniox_sdk_async.py --audio_path ../assets/coffee_shop.mp3
# Delete all uploaded files
python soniox_sdk_async.py --delete_all_files
# Delete all transcriptions
python soniox_sdk_async.py --delete_all_transcriptions
```
See on GitHub: [soniox\_sdk\_async.js](https://github.com/soniox/soniox_examples/blob/master/speech_to_text/nodejs_sdk/soniox_sdk_async.js).
```js
import { SonioxNodeClient } from "@soniox/node";
import fs from "fs";
import { parseArgs } from "node:util";
import process from "process";

// Initialize the client.
// The API key is read from the SONIOX_API_KEY environment variable.
const client = new SonioxNodeClient();

// Convert transcript into a readable output.
function renderTranscript(transcript) {
  return transcript
    .segments()
    .map((s) => {
      const speaker = s.speaker ? `Speaker ${s.speaker}` : "";
      const isTranslation = s.tokens[0]?.translation_status === "translation";
      const lang = isTranslation
        ? `[Translation][${s.language}]`
        : `[${s.language}]`;
      return `${speaker} ${lang}: ${s.text.trim()}`;
    })
    .join("\n");
}

// Build transcription options.
function getTranscriptionOptions(audioUrl, audioPath, translation) {
  if (!audioUrl && !audioPath) {
    throw new Error("Missing audio: audio_url or audio_path must be specified.");
  }

  const options = {
    // Select the model to use.
    // See: soniox.com/docs/stt/models
    model: "stt-async-v4",

    // Set language hints when possible to significantly improve accuracy.
    // See: soniox.com/docs/stt/concepts/language-hints
    language_hints: ["en", "es"],

    // Enable language identification. Each token will include a "language" field.
    // See: soniox.com/docs/stt/concepts/language-identification
    enable_language_identification: true,

    // Enable speaker diarization. Each token will include a "speaker" field.
    // See: soniox.com/docs/stt/concepts/speaker-diarization
    enable_speaker_diarization: true,

    // Set context to help the model understand your domain, recognize important terms,
    // and apply custom vocabulary and translation preferences.
    // See: soniox.com/docs/stt/concepts/context
    context: {
      general: [
        { key: "domain", value: "Healthcare" },
        { key: "topic", value: "Diabetes management consultation" },
        { key: "doctor", value: "Dr. Martha Smith" },
        { key: "patient", value: "Mr. David Miller" },
        { key: "organization", value: "St John's Hospital" },
      ],
      text: "Mr. David Miller visited his healthcare provider last month for a routine follow-up related to diabetes care. The clinician reviewed his recent test results, noted improved glucose levels, and adjusted his medication schedule accordingly. They also discussed meal planning strategies and scheduled the next check-up for early spring.",
      terms: [
        "Celebrex",
        "Zyrtec",
        "Xanax",
        "Prilosec",
        "Amoxicillin Clavulanate Potassium",
      ],
      translation_terms: [
        { source: "Mr. Smith", target: "Sr. Smith" },
        { source: "St John's", target: "St John's" },
        { source: "stroke", target: "ictus" },
      ],
    },

    // Optional identifier to track this request (client-defined).
    client_reference_id: "MyReferenceId",

    // Wait for transcription to complete and fetch the transcript.
    wait: true,

    // Automatically clean up the file and transcription after we're done.
    cleanup: ["file", "transcription"],
  };

  // Audio source: either a local file or a public URL.
  if (audioPath) {
    options.file = fs.readFileSync(audioPath);
    options.filename = audioPath;
  } else {
    options.audio_url = audioUrl;
  }

  // Translation options.
  // See: soniox.com/docs/stt/rt/real-time-translation#translation-modes
  if (translation === "one_way") {
    options.translation = { type: "one_way", target_language: "es" };
  } else if (translation === "two_way") {
    options.translation = {
      type: "two_way",
      language_a: "en",
      language_b: "es",
    };
  } else if (translation !== "none") {
    throw new Error(`Unsupported translation: ${translation}`);
  }

  return options;
}

async function transcribeFile(audioUrl, audioPath, translation) {
  console.log("Starting transcription...");
  const transcription = await client.stt.transcribe(
    getTranscriptionOptions(audioUrl, audioPath, translation),
  );
  console.log(renderTranscript(transcription.transcript));
}

async function deleteAllFiles() {
  const { deleted } = await client.files.delete_all();
  console.log(
    deleted === 0 ? "No files to delete." : `Deleted ${deleted} files.`,
  );
}

async function deleteAllTranscriptions() {
  const { deleted } = await client.stt.delete_all();
  console.log(
    deleted === 0
      ? "No transcriptions to delete."
      : `Deleted ${deleted} transcriptions.`,
  );
}

async function main() {
  const { values: argv } = parseArgs({
    options: {
      audio_url: {
        type: "string",
        description: "Public URL of the audio file to transcribe",
      },
      audio_path: {
        type: "string",
        description: "Path to a local audio file to transcribe",
      },
      delete_all_files: {
        type: "boolean",
        description: "Delete all uploaded files",
      },
      delete_all_transcriptions: {
        type: "boolean",
        description: "Delete all transcriptions",
      },
      translation: { type: "string", default: "none" },
    },
  });

  if (argv.delete_all_files) {
    await deleteAllFiles();
    return;
  }
  if (argv.delete_all_transcriptions) {
    await deleteAllTranscriptions();
    return;
  }

  await transcribeFile(argv.audio_url, argv.audio_path, argv.translation);
}

main().catch((err) => {
  console.error("Error:", err.message);
  process.exit(1);
});
```
```sh title="Terminal"
# Transcribe file from URL
node soniox_sdk_async.js --audio_url "https://soniox.com/media/examples/coffee_shop.mp3"
# Transcribe from local file
node soniox_sdk_async.js --audio_path ../assets/coffee_shop.mp3
# Delete all uploaded files
node soniox_sdk_async.js --delete_all_files
# Delete all transcriptions
node soniox_sdk_async.js --delete_all_transcriptions
```
See on GitHub: [soniox\_async.py](https://github.com/soniox/soniox_examples/blob/master/speech_to_text/python/soniox_async.py).
```python
import os
import time
import argparse
from typing import Optional

import requests
from requests import Session

SONIOX_API_BASE_URL = "https://api.soniox.com"


# Get Soniox STT config.
def get_config(
    audio_url: Optional[str], file_id: Optional[str], translation: Optional[str]
) -> dict:
    config = {
        # Select the model to use.
        # See: soniox.com/docs/stt/models
        "model": "stt-async-v4",
        #
        # Set language hints when possible to significantly improve accuracy.
        # See: soniox.com/docs/stt/concepts/language-hints
        "language_hints": ["en", "es"],
        #
        # Enable language identification. Each token will include a "language" field.
        # See: soniox.com/docs/stt/concepts/language-identification
        "enable_language_identification": True,
        #
        # Enable speaker diarization. Each token will include a "speaker" field.
        # See: soniox.com/docs/stt/concepts/speaker-diarization
        "enable_speaker_diarization": True,
        #
        # Set context to help the model understand your domain, recognize important terms,
        # and apply custom vocabulary and translation preferences.
        # See: soniox.com/docs/stt/concepts/context
        "context": {
            "general": [
                {"key": "domain", "value": "Healthcare"},
                {"key": "topic", "value": "Diabetes management consultation"},
                {"key": "doctor", "value": "Dr. Martha Smith"},
                {"key": "patient", "value": "Mr. David Miller"},
                {"key": "organization", "value": "St John's Hospital"},
            ],
            "text": "Mr. David Miller visited his healthcare provider last month for a routine follow-up related to diabetes care. The clinician reviewed his recent test results, noted improved glucose levels, and adjusted his medication schedule accordingly. They also discussed meal planning strategies and scheduled the next check-up for early spring.",
            "terms": [
                "Celebrex",
                "Zyrtec",
                "Xanax",
                "Prilosec",
                "Amoxicillin Clavulanate Potassium",
            ],
            "translation_terms": [
                {"source": "Mr. Smith", "target": "Sr. Smith"},
                {"source": "St John's", "target": "St John's"},
                {"source": "stroke", "target": "ictus"},
            ],
        },
        #
        # Optional identifier to track this request (client-defined).
        # See: https://soniox.com/docs/stt/api-reference/transcriptions/create_transcription#request
        "client_reference_id": "MyReferenceId",
        #
        # Audio source (only one can be specified):
        # - Public URL of the audio file.
        # - File ID of a previously uploaded file.
        # See: https://soniox.com/docs/stt/api-reference/transcriptions/create_transcription#request
        "audio_url": audio_url,
        "file_id": file_id,
    }

    # Webhook.
    # You can set a webhook to get notified when the transcription finishes or fails.
    # See: https://soniox.com/docs/stt/api-reference/transcriptions/create_transcription#request

    # Translation options.
    # See: soniox.com/docs/stt/rt/real-time-translation#translation-modes
    if translation == "none":
        pass
    elif translation == "one_way":
        # Translates all languages into the target language.
        config["translation"] = {
            "type": "one_way",
            "target_language": "es",
        }
    elif translation == "two_way":
        # Translates from language_a to language_b and back from language_b to language_a.
        config["translation"] = {
            "type": "two_way",
            "language_a": "en",
            "language_b": "es",
        }
    else:
        raise ValueError(f"Unsupported translation: {translation}")

    return config


def upload_audio(session: Session, audio_path: str) -> str:
    print("Starting file upload...")
    with open(audio_path, "rb") as audio_file:
        res = session.post(
            f"{SONIOX_API_BASE_URL}/v1/files",
            files={"file": audio_file},
        )
    res.raise_for_status()
    file_id = res.json()["id"]
    print(f"File ID: {file_id}")
    return file_id


def create_transcription(session: Session, config: dict) -> str:
    print("Creating transcription...")
    res = session.post(
        f"{SONIOX_API_BASE_URL}/v1/transcriptions",
        json=config,
    )
    res.raise_for_status()
    transcription_id = res.json()["id"]
    print(f"Transcription ID: {transcription_id}")
    return transcription_id


def wait_until_completed(session: Session, transcription_id: str) -> None:
    print("Waiting for transcription...")
    while True:
        res = session.get(f"{SONIOX_API_BASE_URL}/v1/transcriptions/{transcription_id}")
        res.raise_for_status()
        data = res.json()
        if data["status"] == "completed":
            return
        elif data["status"] == "error":
            raise Exception(f"Error: {data.get('error_message', 'Unknown error')}")
        time.sleep(1)


def get_transcription(session: Session, transcription_id: str) -> dict:
    res = session.get(
        f"{SONIOX_API_BASE_URL}/v1/transcriptions/{transcription_id}/transcript"
    )
    res.raise_for_status()
    return res.json()


def delete_transcription(session: Session, transcription_id: str) -> None:
    res = session.delete(f"{SONIOX_API_BASE_URL}/v1/transcriptions/{transcription_id}")
    res.raise_for_status()


def delete_file(session: Session, file_id: str) -> None:
    res = session.delete(f"{SONIOX_API_BASE_URL}/v1/files/{file_id}")
    res.raise_for_status()


def delete_all_files(session: Session) -> None:
    files: list[dict] = []
    cursor: str = ""
    while True:
        print("Getting files...")
        res = session.get(f"{SONIOX_API_BASE_URL}/v1/files?cursor={cursor}")
        res.raise_for_status()
        res_json = res.json()
        files.extend(res_json["files"])
        cursor = res_json["next_page_cursor"]
        if cursor is None:
            break
    total = len(files)
    if total == 0:
        print("No files to delete.")
        return
    print(f"Deleting {total} files...")
    for idx, file in enumerate(files):
        file_id = file["id"]
        print(f"Deleting file: {file_id} ({idx + 1}/{total})")
        delete_file(session, file_id)


def delete_all_transcriptions(session: Session) -> None:
    transcriptions: list[dict] = []
    cursor: str = ""
    while True:
        print("Getting transcriptions...")
        res = session.get(f"{SONIOX_API_BASE_URL}/v1/transcriptions?cursor={cursor}")
        res.raise_for_status()
        res_json = res.json()
        for transcription in res_json["transcriptions"]:
            status = transcription["status"]
            # Delete only transcriptions with completed or error status.
            if status in ("completed", "error"):
                transcriptions.append(transcription)
        cursor = res_json["next_page_cursor"]
        if cursor is None:
            break
    total = len(transcriptions)
    if total == 0:
        print("No transcriptions to delete.")
        return
    print(f"Deleting {total} transcriptions...")
    for idx, transcription in enumerate(transcriptions):
        transcription_id = transcription["id"]
        print(f"Deleting transcription: {transcription_id} ({idx + 1}/{total})")
        delete_transcription(session, transcription_id)


# Convert tokens into a readable transcript.
def render_tokens(final_tokens: list[dict]) -> str:
    text_parts: list[str] = []
    current_speaker: Optional[str] = None
    current_language: Optional[str] = None

    # Process all tokens in order.
    for token in final_tokens:
        text = token["text"]
        speaker = token.get("speaker")
        language = token.get("language")
        is_translation = token.get("translation_status") == "translation"

        # Speaker changed -> add a speaker tag.
        if speaker is not None and speaker != current_speaker:
            if current_speaker is not None:
                text_parts.append("\n\n")
            current_speaker = speaker
            current_language = None  # Reset language on speaker changes.
            text_parts.append(f"Speaker {current_speaker}:")

        # Language changed -> add a language or translation tag.
        if language is not None and language != current_language:
            current_language = language
            prefix = "[Translation] " if is_translation else ""
            text_parts.append(f"\n{prefix}[{current_language}] ")
            text = text.lstrip()

        text_parts.append(text)

    return "".join(text_parts)


def transcribe_file(
    session: Session,
    audio_url: Optional[str],
    audio_path: Optional[str],
    translation: Optional[str],
) -> None:
    if audio_url is not None:
        # Public URL of the audio file to transcribe.
        assert audio_path is None
        file_id = None
    elif audio_path is not None:
        # Local file to be uploaded to obtain file id.
        assert audio_url is None
        file_id = upload_audio(session, audio_path)
    else:
        raise ValueError("Missing audio: audio_url or audio_path must be specified.")

    config = get_config(audio_url, file_id, translation)

    transcription_id = create_transcription(session, config)
    wait_until_completed(session, transcription_id)

    result = get_transcription(session, transcription_id)
    text = render_tokens(result["tokens"])
    print(text)

    delete_transcription(session, transcription_id)
    if file_id is not None:
        delete_file(session, file_id)


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--audio_url", help="Public URL of the audio file to transcribe."
    )
    parser.add_argument(
        "--audio_path", help="Path to a local audio file to transcribe."
    )
    parser.add_argument("--delete_all_files", action="store_true")
    parser.add_argument("--delete_all_transcriptions", action="store_true")
    parser.add_argument("--translation", default="none")
    args = parser.parse_args()

    api_key = os.environ.get("SONIOX_API_KEY")
    if not api_key:
        raise RuntimeError(
            "Missing SONIOX_API_KEY.\n"
            "1. Get your API key at https://console.soniox.com\n"
            "2. Run: export SONIOX_API_KEY=<your_api_key>"
        )

    # Create an authenticated session.
    session = requests.Session()
    session.headers["Authorization"] = f"Bearer {api_key}"

    # Delete all uploaded files.
    if args.delete_all_files:
        delete_all_files(session)
        return

    # Delete all transcriptions.
    if args.delete_all_transcriptions:
        delete_all_transcriptions(session)
        return

    # If not deleting, require one audio source.
    if not (args.audio_url or args.audio_path):
        parser.error("Provide --audio_url or --audio_path (or use a delete flag).")

    transcribe_file(session, args.audio_url, args.audio_path, args.translation)


if __name__ == "__main__":
    main()
```
```sh title="Terminal"
# Transcribe file from URL
python soniox_async.py --audio_url "https://soniox.com/media/examples/coffee_shop.mp3"
# Transcribe from local file
python soniox_async.py --audio_path ../assets/coffee_shop.mp3
# Delete all uploaded files
python soniox_async.py --delete_all_files
# Delete all transcriptions
python soniox_async.py --delete_all_transcriptions
```
See on GitHub: [soniox\_async.js](https://github.com/soniox/soniox_examples/blob/master/speech_to_text/nodejs/soniox_async.js).
```js
import fs from "fs";
import { parseArgs } from "node:util";
import process from "process";

const SONIOX_API_BASE_URL = "https://api.soniox.com";

// Get Soniox STT config.
function getConfig(audioUrl, fileId, translation) {
  const config = {
    // Select the model to use.
    // See: soniox.com/docs/stt/models
    model: "stt-async-v4",

    // Set language hints when possible to significantly improve accuracy.
    // See: soniox.com/docs/stt/concepts/language-hints
    language_hints: ["en", "es"],

    // Enable language identification. Each token will include a "language" field.
    // See: soniox.com/docs/stt/concepts/language-identification
    enable_language_identification: true,

    // Enable speaker diarization. Each token will include a "speaker" field.
    // See: soniox.com/docs/stt/concepts/speaker-diarization
    enable_speaker_diarization: true,

    // Set context to help the model understand your domain, recognize important terms,
    // and apply custom vocabulary and translation preferences.
    // See: soniox.com/docs/stt/concepts/context
    context: {
      general: [
        { key: "domain", value: "Healthcare" },
        { key: "topic", value: "Diabetes management consultation" },
        { key: "doctor", value: "Dr. Martha Smith" },
        { key: "patient", value: "Mr. David Miller" },
        { key: "organization", value: "St John's Hospital" },
      ],
      text: "Mr. David Miller visited his healthcare provider last month for a routine follow-up related to diabetes care. The clinician reviewed his recent test results, noted improved glucose levels, and adjusted his medication schedule accordingly. They also discussed meal planning strategies and scheduled the next check-up for early spring.",
      terms: [
        "Celebrex",
        "Zyrtec",
        "Xanax",
        "Prilosec",
        "Amoxicillin Clavulanate Potassium",
      ],
      translation_terms: [
        { source: "Mr. Smith", target: "Sr. Smith" },
        { source: "St John's", target: "St John's" },
        { source: "stroke", target: "ictus" },
      ],
    },

    // Optional identifier to track this request (client-defined).
    // See: https://soniox.com/docs/stt/api-reference/transcriptions/create_transcription#request
    client_reference_id: "MyReferenceId",

    // Audio source (only one can be specified):
    // - Public URL of the audio file.
    // - File ID of a previously uploaded file.
    // See: https://soniox.com/docs/stt/api-reference/transcriptions/create_transcription#request
    audio_url: audioUrl,
    file_id: fileId,
  };

  // Webhook.
  // You can set a webhook to get notified when the transcription finishes or fails.
  // See: https://soniox.com/docs/stt/api-reference/transcriptions/create_transcription#request

  // Translation options.
  // See: soniox.com/docs/stt/rt/real-time-translation#translation-modes
  if (translation === "one_way") {
    // Translates all languages into the target language.
    config.translation = { type: "one_way", target_language: "es" };
  } else if (translation === "two_way") {
    // Translates from language_a to language_b and back from language_b to language_a.
    config.translation = {
      type: "two_way",
      language_a: "en",
      language_b: "es",
    };
  } else if (translation !== "none") {
    throw new Error(`Unsupported translation: ${translation}`);
  }

  return config;
}

// Adds the Soniox API key to each request.
async function apiFetch(endpoint, { method = "GET", body, headers = {} } = {}) {
  const apiKey = process.env.SONIOX_API_KEY;
  if (!apiKey) {
    throw new Error(
      "Missing SONIOX_API_KEY.\n" +
        "1. Get your API key at https://console.soniox.com\n" +
        "2. Run: export SONIOX_API_KEY=<your_api_key>",
    );
  }
  const res = await fetch(`${SONIOX_API_BASE_URL}${endpoint}`, {
    method,
    headers: {
      Authorization: `Bearer ${apiKey}`,
      ...headers,
    },
    body,
  });
  if (!res.ok) {
    const msg = await res.text();
    throw new Error(`HTTP ${res.status} ${res.statusText}: ${msg}`);
  }
  return method !== "DELETE" ? res.json() : null;
}

async function uploadAudio(audioPath) {
  console.log("Starting file upload...");
  const form = new FormData();
  form.append("file", new Blob([fs.readFileSync(audioPath)]), audioPath);
  const res = await apiFetch("/v1/files", {
    method: "POST",
    body: form,
  });
  console.log(`File ID: ${res.id}`);
  return res.id;
}

async function createTranscription(config) {
  console.log("Creating transcription...");
  const res = await apiFetch("/v1/transcriptions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(config),
  });
  console.log(`Transcription ID: ${res.id}`);
  return res.id;
}

async function waitUntilCompleted(transcriptionId) {
  console.log("Waiting for transcription...");
  while (true) {
    const res = await apiFetch(`/v1/transcriptions/${transcriptionId}`);
    if (res.status === "completed") return;
    if (res.status === "error") throw new Error(`Error: ${res.error_message}`);
    await new Promise((r) => setTimeout(r, 1000));
  }
}

async function getTranscription(transcriptionId) {
  return apiFetch(`/v1/transcriptions/${transcriptionId}/transcript`);
}

async function deleteTranscription(transcriptionId) {
  await apiFetch(`/v1/transcriptions/${transcriptionId}`, { method: "DELETE" });
}

async function deleteFile(fileId) {
  await apiFetch(`/v1/files/${fileId}`, { method: "DELETE" });
}

async function deleteAllFiles() {
  let files = [];
  let cursor = "";
  while (true) {
    const res = await apiFetch(`/v1/files?cursor=${cursor}`);
    files = files.concat(res.files);
    cursor = res.next_page_cursor;
    if (!cursor) break;
  }
  if (files.length === 0) {
    console.log("No files to delete.");
    return;
  }
  console.log(`Deleting ${files.length} files...`);
  for (let i = 0; i < files.length; i++) {
    console.log(`Deleting file: ${files[i].id} (${i + 1}/${files.length})`);
    await deleteFile(files[i].id);
  }
}

async function deleteAllTranscriptions() {
  let transcriptions = [];
  let cursor = "";
  while (true) {
    const res = await apiFetch(`/v1/transcriptions?cursor=${cursor}`);
    // Delete only transcriptions with completed or error status.
    transcriptions = transcriptions.concat(
      res.transcriptions.filter(
        (t) => t.status === "completed" || t.status === "error",
      ),
    );
    cursor = res.next_page_cursor;
    if (!cursor) break;
  }
  if (transcriptions.length === 0) {
    console.log("No transcriptions to delete.");
    return;
  }
  console.log(`Deleting ${transcriptions.length} transcriptions...`);
  for (let i = 0; i < transcriptions.length; i++) {
    console.log(
      `Deleting transcription: ${transcriptions[i].id} (${i + 1}/${transcriptions.length})`,
    );
    await deleteTranscription(transcriptions[i].id);
  }
}

// Convert tokens into a readable transcript.
function renderTokens(finalTokens) {
  const textParts = [];
  let currentSpeaker = null;
  let currentLanguage = null;

  // Process all tokens in order.
  for (const token of finalTokens) {
    let { text, speaker, language } = token;
    const isTranslation = token.translation_status === "translation";

    // Speaker changed -> add a speaker tag.
    if (speaker !== undefined && speaker !== currentSpeaker) {
      if (currentSpeaker !== null) textParts.push("\n\n");
      currentSpeaker = speaker;
      currentLanguage = null; // Reset language on speaker changes.
      textParts.push(`Speaker ${currentSpeaker}:`);
    }

    // Language changed -> add a language or translation tag.
    if (language !== undefined && language !== currentLanguage) {
      currentLanguage = language;
      const prefix = isTranslation ? "[Translation] " : "";
      textParts.push(`\n${prefix}[${currentLanguage}] `);
      text = text.trimStart();
    }

    textParts.push(text);
  }

  return textParts.join("");
}

async function transcribeFile(audioUrl, audioPath, translation) {
  let fileId = null;
  if (!audioUrl && !audioPath) {
    throw new Error("Missing audio: audio_url or audio_path must be specified.");
  }
  if (audioPath) {
    fileId = await uploadAudio(audioPath);
  }

  const config = getConfig(audioUrl, fileId, translation);
  const transcriptionId = await createTranscription(config);
  await waitUntilCompleted(transcriptionId);

  const result = await getTranscription(transcriptionId);
  const text = renderTokens(result.tokens);
  console.log(text);

  await deleteTranscription(transcriptionId);
  if (fileId) await deleteFile(fileId);
}

async function main() {
  const { values: argv } = parseArgs({
    options: {
      audio_url: {
        type: "string",
        description: "Public URL of the audio file to transcribe",
      },
      audio_path: {
        type: "string",
        description: "Path to a local audio file to transcribe",
      },
      delete_all_files: {
        type: "boolean",
        description: "Delete all uploaded files",
      },
      delete_all_transcriptions: {
        type: "boolean",
description: "Delete all transcriptions",
},
translation: { type: "string", default: "none" },
},
});
if (argv.delete_all_files) {
await deleteAllFiles();
return;
}
if (argv.delete_all_transcriptions) {
await deleteAllTranscriptions();
return;
}
await transcribeFile(argv.audio_url, argv.audio_path, argv.translation);
}
main().catch((err) => {
console.error("Error:", err.message);
process.exit(1);
});
```
```sh title="Terminal"
# Transcribe file from URL
node soniox_async.js --audio_url "https://soniox.com/media/examples/coffee_shop.mp3"
# Transcribe from local file
node soniox_async.js --audio_path ../assets/coffee_shop.mp3
# Delete all uploaded files
node soniox_async.js --delete_all_files
# Delete all transcriptions
node soniox_async.js --delete_all_transcriptions
```
# Async translation
URL: /stt/async/async-translation
Learn about async translation for audio files.
## Overview
Soniox also supports **asynchronous transcription with translation**, allowing you to process recorded audio files in a single API call; no live connection or streaming is required.
To get started:
1. Review the [Async transcription](/stt/async/async-transcription) guide to understand how asynchronous processing works.
2. Then see [Real-time translation](/stt/rt/real-time-translation) for a detailed explanation of translation concepts that also apply to async mode.
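Concretely, an async translation request is an ordinary async transcription request with a `translation` object added to the config. A minimal sketch of such a config (the helper name is illustrative; the model and field names are taken from the examples below):

```python
def build_translation_config(audio_url: str, target_language: str) -> dict:
    # A plain async transcription config, plus a one-way translation block
    # that translates all detected languages into the target language.
    return {
        "model": "stt-async-v4",
        "audio_url": audio_url,
        "translation": {
            "type": "one_way",
            "target_language": target_language,
        },
    }
```

The resulting config is submitted to the standard create-transcription endpoint exactly as in the examples below; switching to two-way translation only changes the `translation` object.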
***
## Code examples
**Prerequisite:** Complete the steps in [Get started](/stt/get-started).
See on GitHub: [soniox\_sdk\_async.py](https://github.com/soniox/soniox_examples/blob/master/speech_to_text/python_sdk/soniox_sdk_async.py).
```python
import os
import argparse
from typing import Optional
from soniox import SonioxClient
from soniox.types import (
CreateTranscriptionConfig,
StructuredContext,
TranslationConfig,
StructuredContextGeneralItem,
StructuredContextTranslationTerm,
)
from soniox.utils import render_tokens
def get_config(translation: Optional[str]) -> CreateTranscriptionConfig:
config = CreateTranscriptionConfig(
# Select the model to use.
# See: soniox.com/docs/stt/models
model="stt-async-v4",
#
# Set language hints when possible to significantly improve accuracy.
# See: soniox.com/docs/stt/concepts/language-hints
language_hints=["en", "es"],
#
# Enable language identification. Each token will include a "language" field.
# See: soniox.com/docs/stt/concepts/language-identification
enable_language_identification=True,
#
# Enable speaker diarization. Each token will include a "speaker" field.
# See: soniox.com/docs/stt/concepts/speaker-diarization
enable_speaker_diarization=True,
#
# Set context to help the model understand your domain, recognize important terms,
# and apply custom vocabulary and translation preferences.
# See: soniox.com/docs/stt/concepts/context
context=StructuredContext(
general=[
StructuredContextGeneralItem(key="domain", value="Healthcare"),
StructuredContextGeneralItem(
key="topic", value="Diabetes management consultation"
),
StructuredContextGeneralItem(key="doctor", value="Dr. Martha Smith"),
StructuredContextGeneralItem(key="patient", value="Mr. David Miller"),
StructuredContextGeneralItem(
key="organization", value="St John's Hospital"
),
],
text="Mr. David Miller visited his healthcare provider last month for a routine follow-up related to diabetes care. The clinician reviewed his recent test results, noted improved glucose levels, and adjusted his medication schedule accordingly. They also discussed meal planning strategies and scheduled the next check-up for early spring.",
terms=[
"Celebrex",
"Zyrtec",
"Xanax",
"Prilosec",
"Amoxicillin Clavulanate Potassium",
],
translation_terms=[
StructuredContextTranslationTerm(
source="Mr. Smith", target="Sr. Smith"
),
StructuredContextTranslationTerm(
source="St John's", target="St John's"
),
StructuredContextTranslationTerm(source="stroke", target="ictus"),
],
),
#
# Optional identifier to track this request (client-defined).
# See: https://soniox.com/docs/stt/api-reference/transcriptions/create_transcription#request
client_reference_id="MyReferenceId",
)
# Webhook.
# You can set a webhook to get notified when the transcription finishes or fails.
# See: https://soniox.com/docs/stt/api-reference/transcriptions/create_transcription#request
    # In the SDK you can set the following fields:
# - config.webhook_url
# - config.webhook_auth_header_name
# - config.webhook_auth_header_value
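    # For example (hypothetical endpoint and header values):
    # config.webhook_url = "https://example.com/soniox-webhook"
    # config.webhook_auth_header_name = "X-Webhook-Secret"
    # config.webhook_auth_header_value = "my-secret-value"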
# Translation options.
# See: soniox.com/docs/stt/rt/real-time-translation#translation-modes
if translation == "none":
pass
elif translation == "one_way":
# Translates all languages into the target language.
config.translation = TranslationConfig(
type="one_way",
target_language="es",
)
elif translation == "two_way":
# Translates from language_a to language_b and back from language_b to language_a.
config.translation = TranslationConfig(
type="two_way",
language_a="en",
language_b="es",
)
else:
raise ValueError(f"Unsupported translation: {translation}")
return config
def transcribe_file(
client: SonioxClient,
audio_url: Optional[str],
audio_path: Optional[str],
translation: Optional[str],
) -> None:
if audio_url is not None:
# Public URL of the audio file to transcribe.
assert audio_path is None
file = None
elif audio_path is not None:
# Local file to be uploaded to obtain file id.
assert audio_url is None
file = client.files.upload(audio_path)
else:
raise ValueError("Missing audio: audio_url or audio_path must be specified.")
config = get_config(translation)
print("Creating transcription...")
transcription = client.stt.create(
config=config, file_id=file.id if file else None, audio_url=audio_url
)
print("Waiting for transcription...")
client.stt.wait(transcription.id)
result = client.stt.get_transcript(transcription.id)
print(render_tokens(result.tokens, []))
client.stt.delete(transcription.id)
if file is not None:
client.files.delete(file.id)
def main():
parser = argparse.ArgumentParser()
parser.add_argument(
"--audio_url", help="Public URL of the audio file to transcribe."
)
parser.add_argument(
"--audio_path", help="Path to a local audio file to transcribe."
)
parser.add_argument("--delete_all_files", action="store_true")
parser.add_argument("--delete_all_transcriptions", action="store_true")
parser.add_argument("--translation", default="none")
args = parser.parse_args()
api_key = os.environ.get("SONIOX_API_KEY")
if not api_key:
raise RuntimeError(
"Missing SONIOX_API_KEY.\n"
"1. Get your API key at https://console.soniox.com\n"
"2. Run: export SONIOX_API_KEY="
)
client = SonioxClient()
# Delete all uploaded files.
if args.delete_all_files:
print("Deleting all files...")
client.files.delete_all()
return
# Delete all transcriptions.
if args.delete_all_transcriptions:
print("Deleting all transcriptions...")
client.stt.delete_all()
return
# If not deleting, require one audio source.
if not (args.audio_url or args.audio_path):
parser.error("Provide --audio_url or --audio_path (or use a delete flag).")
transcribe_file(client, args.audio_url, args.audio_path, args.translation)
if __name__ == "__main__":
main()
```
```sh title="Terminal"
# One-way translation of a local file
python soniox_sdk_async.py --audio_path ../assets/coffee_shop.mp3 --translation one_way
# Two-way translation of a local file
python soniox_sdk_async.py --audio_path ../assets/two_way_translation.mp3 --translation two_way
```
See on GitHub: [soniox\_sdk\_async.js](https://github.com/soniox/soniox_examples/blob/master/speech_to_text/nodejs_sdk/soniox_sdk_async.js).
```javascript
import { SonioxNodeClient } from "@soniox/node";
import fs from "fs";
import { parseArgs } from "node:util";
import process from "process";
// Initialize the client.
// The API key is read from the SONIOX_API_KEY environment variable.
const client = new SonioxNodeClient();
// Convert transcript into a readable output.
function renderTranscript(transcript) {
return transcript
.segments()
.map((s) => {
const speaker = s.speaker ? `Speaker ${s.speaker}` : "";
const isTranslation = s.tokens[0]?.translation_status === "translation";
const lang = isTranslation
? `[Translation][${s.language}]`
: `[${s.language}]`;
return `${speaker} ${lang}: ${s.text.trim()}`;
})
.join("\n");
}
// Build transcription options.
function getTranscriptionOptions(audioUrl, audioPath, translation) {
if (!audioUrl && !audioPath) {
throw new Error(
"Missing audio: audio_url or audio_path must be specified.",
);
}
const options = {
// Select the model to use.
// See: soniox.com/docs/stt/models
model: "stt-async-v4",
// Set language hints when possible to significantly improve accuracy.
// See: soniox.com/docs/stt/concepts/language-hints
language_hints: ["en", "es"],
// Enable language identification. Each token will include a "language" field.
// See: soniox.com/docs/stt/concepts/language-identification
enable_language_identification: true,
// Enable speaker diarization. Each token will include a "speaker" field.
// See: soniox.com/docs/stt/concepts/speaker-diarization
enable_speaker_diarization: true,
// Set context to help the model understand your domain, recognize important terms,
// and apply custom vocabulary and translation preferences.
// See: soniox.com/docs/stt/concepts/context
context: {
general: [
{ key: "domain", value: "Healthcare" },
{ key: "topic", value: "Diabetes management consultation" },
{ key: "doctor", value: "Dr. Martha Smith" },
{ key: "patient", value: "Mr. David Miller" },
{ key: "organization", value: "St John's Hospital" },
],
text: "Mr. David Miller visited his healthcare provider last month for a routine follow-up related to diabetes care. The clinician reviewed his recent test results, noted improved glucose levels, and adjusted his medication schedule accordingly. They also discussed meal planning strategies and scheduled the next check-up for early spring.",
terms: [
"Celebrex",
"Zyrtec",
"Xanax",
"Prilosec",
"Amoxicillin Clavulanate Potassium",
],
translation_terms: [
{ source: "Mr. Smith", target: "Sr. Smith" },
{ source: "St John's", target: "St John's" },
{ source: "stroke", target: "ictus" },
],
},
// Optional identifier to track this request (client-defined).
client_reference_id: "MyReferenceId",
// Wait for transcription to complete and fetch the transcript.
wait: true,
// Automatically clean up the file and transcription after we're done.
cleanup: ["file", "transcription"],
};
// Audio source: either a local file or a public URL.
if (audioPath) {
options.file = fs.readFileSync(audioPath);
options.filename = audioPath;
} else {
options.audio_url = audioUrl;
}
// Translation options.
// See: soniox.com/docs/stt/rt/real-time-translation#translation-modes
if (translation === "one_way") {
options.translation = { type: "one_way", target_language: "es" };
} else if (translation === "two_way") {
options.translation = {
type: "two_way",
language_a: "en",
language_b: "es",
};
} else if (translation !== "none") {
throw new Error(`Unsupported translation: ${translation}`);
}
return options;
}
async function transcribeFile(audioUrl, audioPath, translation) {
console.log("Starting transcription...");
const transcription = await client.stt.transcribe(
getTranscriptionOptions(audioUrl, audioPath, translation),
);
console.log(renderTranscript(transcription.transcript));
}
async function deleteAllFiles() {
const { deleted } = await client.files.delete_all();
console.log(
deleted === 0 ? "No files to delete." : `Deleted ${deleted} files.`,
);
}
async function deleteAllTranscriptions() {
const { deleted } = await client.stt.delete_all();
console.log(
deleted === 0
? "No transcriptions to delete."
: `Deleted ${deleted} transcriptions.`,
);
}
async function main() {
const { values: argv } = parseArgs({
options: {
audio_url: {
type: "string",
description: "Public URL of the audio file to transcribe",
},
audio_path: {
type: "string",
description: "Path to a local audio file to transcribe",
},
delete_all_files: {
type: "boolean",
description: "Delete all uploaded files",
},
delete_all_transcriptions: {
type: "boolean",
description: "Delete all transcriptions",
},
translation: { type: "string", default: "none" },
},
});
if (argv.delete_all_files) {
await deleteAllFiles();
return;
}
if (argv.delete_all_transcriptions) {
await deleteAllTranscriptions();
return;
}
await transcribeFile(argv.audio_url, argv.audio_path, argv.translation);
}
main().catch((err) => {
console.error("Error:", err.message);
process.exit(1);
});
```
```sh title="Terminal"
# One-way translation of a local file
node soniox_sdk_async.js --audio_path ../assets/coffee_shop.mp3 --translation one_way
# Two-way translation of a local file
node soniox_sdk_async.js --audio_path ../assets/two_way_translation.mp3 --translation two_way
```
See on GitHub: [soniox\_async.py](https://github.com/soniox/soniox_examples/blob/master/speech_to_text/python/soniox_async.py).
```python
import os
import time
import argparse
from typing import Optional
import requests
from requests import Session
SONIOX_API_BASE_URL = "https://api.soniox.com"
# Get Soniox STT config.
def get_config(
audio_url: Optional[str], file_id: Optional[str], translation: Optional[str]
) -> dict:
config = {
# Select the model to use.
# See: soniox.com/docs/stt/models
"model": "stt-async-v4",
#
# Set language hints when possible to significantly improve accuracy.
# See: soniox.com/docs/stt/concepts/language-hints
"language_hints": ["en", "es"],
#
# Enable language identification. Each token will include a "language" field.
# See: soniox.com/docs/stt/concepts/language-identification
"enable_language_identification": True,
#
# Enable speaker diarization. Each token will include a "speaker" field.
# See: soniox.com/docs/stt/concepts/speaker-diarization
"enable_speaker_diarization": True,
#
# Set context to help the model understand your domain, recognize important terms,
# and apply custom vocabulary and translation preferences.
# See: soniox.com/docs/stt/concepts/context
"context": {
"general": [
{"key": "domain", "value": "Healthcare"},
{"key": "topic", "value": "Diabetes management consultation"},
{"key": "doctor", "value": "Dr. Martha Smith"},
{"key": "patient", "value": "Mr. David Miller"},
{"key": "organization", "value": "St John's Hospital"},
],
"text": "Mr. David Miller visited his healthcare provider last month for a routine follow-up related to diabetes care. The clinician reviewed his recent test results, noted improved glucose levels, and adjusted his medication schedule accordingly. They also discussed meal planning strategies and scheduled the next check-up for early spring.",
"terms": [
"Celebrex",
"Zyrtec",
"Xanax",
"Prilosec",
"Amoxicillin Clavulanate Potassium",
],
"translation_terms": [
{"source": "Mr. Smith", "target": "Sr. Smith"},
{"source": "St John's", "target": "St John's"},
{"source": "stroke", "target": "ictus"},
],
},
#
# Optional identifier to track this request (client-defined).
# See: https://soniox.com/docs/stt/api-reference/transcriptions/create_transcription#request
"client_reference_id": "MyReferenceId",
#
        # Audio source (only one can be specified):
        # - Public URL of the audio file.
        # - File ID of a previously uploaded file.
# See: https://soniox.com/docs/stt/api-reference/transcriptions/create_transcription#request
"audio_url": audio_url,
"file_id": file_id,
}
# Webhook.
# You can set a webhook to get notified when the transcription finishes or fails.
# See: https://soniox.com/docs/stt/api-reference/transcriptions/create_transcription#request
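    # For example (hypothetical endpoint and header values):
    # config["webhook_url"] = "https://example.com/soniox-webhook"
    # config["webhook_auth_header_name"] = "X-Webhook-Secret"
    # config["webhook_auth_header_value"] = "my-secret-value"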
# Translation options.
# See: soniox.com/docs/stt/rt/real-time-translation#translation-modes
if translation == "none":
pass
elif translation == "one_way":
# Translates all languages into the target language.
config["translation"] = {
"type": "one_way",
"target_language": "es",
}
elif translation == "two_way":
# Translates from language_a to language_b and back from language_b to language_a.
config["translation"] = {
"type": "two_way",
"language_a": "en",
"language_b": "es",
}
else:
raise ValueError(f"Unsupported translation: {translation}")
return config
def upload_audio(session: Session, audio_path: str) -> str:
    print("Starting file upload...")
    # Use a context manager so the file handle is closed after the upload.
    with open(audio_path, "rb") as f:
        res = session.post(
            f"{SONIOX_API_BASE_URL}/v1/files",
            files={"file": f},
        )
    res.raise_for_status()
    file_id = res.json()["id"]
    print(f"File ID: {file_id}")
    return file_id
def create_transcription(session: Session, config: dict) -> str:
    print("Creating transcription...")
    res = session.post(
        f"{SONIOX_API_BASE_URL}/v1/transcriptions",
        json=config,
    )
    res.raise_for_status()
    transcription_id = res.json()["id"]
    print(f"Transcription ID: {transcription_id}")
    return transcription_id
def wait_until_completed(session: Session, transcription_id: str) -> None:
print("Waiting for transcription...")
while True:
res = session.get(f"{SONIOX_API_BASE_URL}/v1/transcriptions/{transcription_id}")
res.raise_for_status()
data = res.json()
if data["status"] == "completed":
return
elif data["status"] == "error":
raise Exception(f"Error: {data.get('error_message', 'Unknown error')}")
time.sleep(1)
def get_transcription(session: Session, transcription_id: str) -> dict:
res = session.get(
f"{SONIOX_API_BASE_URL}/v1/transcriptions/{transcription_id}/transcript"
)
res.raise_for_status()
return res.json()
def delete_transcription(session: Session, transcription_id: str) -> None:
res = session.delete(f"{SONIOX_API_BASE_URL}/v1/transcriptions/{transcription_id}")
res.raise_for_status()
def delete_file(session: Session, file_id: str) -> None:
res = session.delete(f"{SONIOX_API_BASE_URL}/v1/files/{file_id}")
res.raise_for_status()
def delete_all_files(session: Session) -> None:
files: list[dict] = []
cursor: str = ""
while True:
print("Getting files...")
res = session.get(f"{SONIOX_API_BASE_URL}/v1/files?cursor={cursor}")
res.raise_for_status()
res_json = res.json()
files.extend(res_json["files"])
cursor = res_json["next_page_cursor"]
if cursor is None:
break
total = len(files)
if total == 0:
print("No files to delete.")
return
print(f"Deleting {total} files...")
for idx, file in enumerate(files):
file_id = file["id"]
print(f"Deleting file: {file_id} ({idx + 1}/{total})")
delete_file(session, file_id)
def delete_all_transcriptions(session: Session) -> None:
transcriptions: list[dict] = []
cursor: str = ""
while True:
print("Getting transcriptions...")
res = session.get(f"{SONIOX_API_BASE_URL}/v1/transcriptions?cursor={cursor}")
res.raise_for_status()
res_json = res.json()
for transcription in res_json["transcriptions"]:
status = transcription["status"]
# Delete only transcriptions with completed or error status.
if status in ("completed", "error"):
transcriptions.append(transcription)
cursor = res_json["next_page_cursor"]
if cursor is None:
break
total = len(transcriptions)
if total == 0:
print("No transcriptions to delete.")
return
print(f"Deleting {total} transcriptions...")
for idx, transcription in enumerate(transcriptions):
transcription_id = transcription["id"]
print(f"Deleting transcription: {transcription_id} ({idx + 1}/{total})")
delete_transcription(session, transcription_id)
# Convert tokens into a readable transcript.
def render_tokens(final_tokens: list[dict]) -> str:
text_parts: list[str] = []
current_speaker: Optional[str] = None
current_language: Optional[str] = None
# Process all tokens in order.
for token in final_tokens:
text = token["text"]
speaker = token.get("speaker")
language = token.get("language")
is_translation = token.get("translation_status") == "translation"
# Speaker changed -> add a speaker tag.
if speaker is not None and speaker != current_speaker:
if current_speaker is not None:
text_parts.append("\n\n")
current_speaker = speaker
current_language = None # Reset language on speaker changes.
text_parts.append(f"Speaker {current_speaker}:")
# Language changed -> add a language or translation tag.
if language is not None and language != current_language:
current_language = language
prefix = "[Translation] " if is_translation else ""
text_parts.append(f"\n{prefix}[{current_language}] ")
text = text.lstrip()
text_parts.append(text)
return "".join(text_parts)
def transcribe_file(
session: Session,
audio_url: Optional[str],
audio_path: Optional[str],
translation: Optional[str],
) -> None:
if audio_url is not None:
# Public URL of the audio file to transcribe.
assert audio_path is None
file_id = None
elif audio_path is not None:
# Local file to be uploaded to obtain file id.
assert audio_url is None
file_id = upload_audio(session, audio_path)
else:
raise ValueError("Missing audio: audio_url or audio_path must be specified.")
config = get_config(audio_url, file_id, translation)
transcription_id = create_transcription(session, config)
wait_until_completed(session, transcription_id)
result = get_transcription(session, transcription_id)
text = render_tokens(result["tokens"])
print(text)
delete_transcription(session, transcription_id)
if file_id is not None:
delete_file(session, file_id)
def main():
parser = argparse.ArgumentParser()
parser.add_argument(
"--audio_url", help="Public URL of the audio file to transcribe."
)
parser.add_argument(
"--audio_path", help="Path to a local audio file to transcribe."
)
parser.add_argument("--delete_all_files", action="store_true")
parser.add_argument("--delete_all_transcriptions", action="store_true")
parser.add_argument("--translation", default="none")
args = parser.parse_args()
api_key = os.environ.get("SONIOX_API_KEY")
if not api_key:
raise RuntimeError(
"Missing SONIOX_API_KEY.\n"
"1. Get your API key at https://console.soniox.com\n"
"2. Run: export SONIOX_API_KEY="
)
# Create an authenticated session.
session = requests.Session()
session.headers["Authorization"] = f"Bearer {api_key}"
# Delete all uploaded files.
if args.delete_all_files:
delete_all_files(session)
return
# Delete all transcriptions.
if args.delete_all_transcriptions:
delete_all_transcriptions(session)
return
# If not deleting, require one audio source.
if not (args.audio_url or args.audio_path):
parser.error("Provide --audio_url or --audio_path (or use a delete flag).")
transcribe_file(session, args.audio_url, args.audio_path, args.translation)
if __name__ == "__main__":
main()
```
```sh title="Terminal"
# One-way translation of a local file
python soniox_async.py --audio_path ../assets/coffee_shop.mp3 --translation one_way
# Two-way translation of a local file
python soniox_async.py --audio_path ../assets/two_way_translation.mp3 --translation two_way
```
See on GitHub: [soniox\_async.js](https://github.com/soniox/soniox_examples/blob/master/speech_to_text/nodejs/soniox_async.js).
```javascript
import fs from "fs";
import { parseArgs } from "node:util";
import process from "process";
const SONIOX_API_BASE_URL = "https://api.soniox.com";
// Get Soniox STT config.
function getConfig(audioUrl, fileId, translation) {
const config = {
// Select the model to use.
// See: soniox.com/docs/stt/models
model: "stt-async-v4",
// Set language hints when possible to significantly improve accuracy.
// See: soniox.com/docs/stt/concepts/language-hints
language_hints: ["en", "es"],
// Enable language identification. Each token will include a "language" field.
// See: soniox.com/docs/stt/concepts/language-identification
enable_language_identification: true,
// Enable speaker diarization. Each token will include a "speaker" field.
// See: soniox.com/docs/stt/concepts/speaker-diarization
enable_speaker_diarization: true,
// Set context to help the model understand your domain, recognize important terms,
// and apply custom vocabulary and translation preferences.
// See: soniox.com/docs/stt/concepts/context
context: {
general: [
{ key: "domain", value: "Healthcare" },
{ key: "topic", value: "Diabetes management consultation" },
{ key: "doctor", value: "Dr. Martha Smith" },
{ key: "patient", value: "Mr. David Miller" },
{ key: "organization", value: "St John's Hospital" },
],
text: "Mr. David Miller visited his healthcare provider last month for a routine follow-up related to diabetes care. The clinician reviewed his recent test results, noted improved glucose levels, and adjusted his medication schedule accordingly. They also discussed meal planning strategies and scheduled the next check-up for early spring.",
terms: [
"Celebrex",
"Zyrtec",
"Xanax",
"Prilosec",
"Amoxicillin Clavulanate Potassium",
],
translation_terms: [
{ source: "Mr. Smith", target: "Sr. Smith" },
{ source: "St John's", target: "St John's" },
{ source: "stroke", target: "ictus" },
],
},
// Optional identifier to track this request (client-defined).
// See: https://soniox.com/docs/stt/api-reference/transcriptions/create_transcription#request
client_reference_id: "MyReferenceId",
    // Audio source (only one can be specified):
    // - Public URL of the audio file.
    // - File ID of a previously uploaded file.
// See: https://soniox.com/docs/stt/api-reference/transcriptions/create_transcription#request
audio_url: audioUrl,
file_id: fileId,
};
// Webhook.
// You can set a webhook to get notified when the transcription finishes or fails.
// See: https://soniox.com/docs/stt/api-reference/transcriptions/create_transcription#request
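  // For example (hypothetical endpoint and header values):
  // config.webhook_url = "https://example.com/soniox-webhook";
  // config.webhook_auth_header_name = "X-Webhook-Secret";
  // config.webhook_auth_header_value = "my-secret-value";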
// Translation options.
// See: soniox.com/docs/stt/rt/real-time-translation#translation-modes
if (translation === "one_way") {
// Translates all languages into the target language.
config.translation = { type: "one_way", target_language: "es" };
} else if (translation === "two_way") {
// Translates from language_a to language_b and back from language_b to language_a.
config.translation = {
type: "two_way",
language_a: "en",
language_b: "es",
};
} else if (translation !== "none") {
throw new Error(`Unsupported translation: ${translation}`);
}
return config;
}
// Adds the Soniox API key (from SONIOX_API_KEY) to each request.
async function apiFetch(endpoint, { method = "GET", body, headers = {} } = {}) {
const apiKey = process.env.SONIOX_API_KEY;
if (!apiKey) {
throw new Error(
"Missing SONIOX_API_KEY.\n" +
"1. Get your API key at https://console.soniox.com\n" +
"2. Run: export SONIOX_API_KEY=",
);
}
const res = await fetch(`${SONIOX_API_BASE_URL}${endpoint}`, {
method,
headers: {
Authorization: `Bearer ${apiKey}`,
...headers,
},
body,
});
  if (!res.ok) {
    const msg = await res.text();
    throw new Error(`HTTP ${res.status} ${res.statusText}: ${msg}`);
  }
return method !== "DELETE" ? res.json() : null;
}
async function uploadAudio(audioPath) {
console.log("Starting file upload...");
const form = new FormData();
form.append("file", new Blob([fs.readFileSync(audioPath)]), audioPath);
const res = await apiFetch("/v1/files", {
method: "POST",
body: form,
});
console.log(`File ID: ${res.id}`);
return res.id;
}
async function createTranscription(config) {
console.log("Creating transcription...");
const res = await apiFetch("/v1/transcriptions", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify(config),
});
console.log(`Transcription ID: ${res.id}`);
return res.id;
}
async function waitUntilCompleted(transcriptionId) {
console.log("Waiting for transcription...");
while (true) {
const res = await apiFetch(`/v1/transcriptions/${transcriptionId}`);
if (res.status === "completed") return;
if (res.status === "error") throw new Error(`Error: ${res.error_message}`);
await new Promise((r) => setTimeout(r, 1000));
}
}
async function getTranscription(transcriptionId) {
return apiFetch(`/v1/transcriptions/${transcriptionId}/transcript`);
}
async function deleteTranscription(transcriptionId) {
await apiFetch(`/v1/transcriptions/${transcriptionId}`, { method: "DELETE" });
}
async function deleteFile(fileId) {
await apiFetch(`/v1/files/${fileId}`, { method: "DELETE" });
}
async function deleteAllFiles() {
let files = [];
let cursor = "";
while (true) {
const res = await apiFetch(`/v1/files?cursor=${cursor}`);
files = files.concat(res.files);
cursor = res.next_page_cursor;
if (!cursor) break;
}
if (files.length === 0) {
console.log("No files to delete.");
return;
}
console.log(`Deleting ${files.length} files...`);
for (let i = 0; i < files.length; i++) {
console.log(`Deleting file: ${files[i].id} (${i + 1}/${files.length})`);
await deleteFile(files[i].id);
}
}
async function deleteAllTranscriptions() {
let transcriptions = [];
let cursor = "";
while (true) {
const res = await apiFetch(`/v1/transcriptions?cursor=${cursor}`);
// Delete only transcriptions with completed or error status.
transcriptions = transcriptions.concat(
res.transcriptions.filter(
(t) => t.status === "completed" || t.status === "error",
),
);
cursor = res.next_page_cursor;
if (!cursor) break;
}
if (transcriptions.length === 0) {
console.log("No transcriptions to delete.");
return;
}
console.log(`Deleting ${transcriptions.length} transcriptions...`);
for (let i = 0; i < transcriptions.length; i++) {
console.log(
`Deleting transcription: ${transcriptions[i].id} (${i + 1}/${transcriptions.length})`,
);
await deleteTranscription(transcriptions[i].id);
}
}
// Convert tokens into a readable transcript.
function renderTokens(finalTokens) {
const textParts = [];
let currentSpeaker = null;
let currentLanguage = null;
// Process all tokens in order.
for (const token of finalTokens) {
let { text, speaker, language } = token;
const isTranslation = token.translation_status === "translation";
// Speaker changed -> add a speaker tag.
if (speaker !== undefined && speaker !== currentSpeaker) {
if (currentSpeaker !== null) textParts.push("\n\n");
currentSpeaker = speaker;
currentLanguage = null; // Reset language on speaker changes.
textParts.push(`Speaker ${currentSpeaker}:`);
}
// Language changed -> add a language or translation tag.
if (language !== undefined && language !== currentLanguage) {
currentLanguage = language;
const prefix = isTranslation ? "[Translation] " : "";
textParts.push(`\n${prefix}[${currentLanguage}] `);
text = text.trimStart();
}
textParts.push(text);
}
return textParts.join("");
}
async function transcribeFile(audioUrl, audioPath, translation) {
let fileId = null;
if (!audioUrl && !audioPath) {
throw new Error(
"Missing audio: audio_url or audio_path must be specified.",
);
}
if (audioPath) {
fileId = await uploadAudio(audioPath);
}
const config = getConfig(audioUrl, fileId, translation);
const transcriptionId = await createTranscription(config);
await waitUntilCompleted(transcriptionId);
const result = await getTranscription(transcriptionId);
const text = renderTokens(result.tokens);
console.log(text);
await deleteTranscription(transcriptionId);
if (fileId) await deleteFile(fileId);
}
async function main() {
const { values: argv } = parseArgs({
options: {
audio_url: {
type: "string",
description: "Public URL of the audio file to transcribe",
},
audio_path: {
type: "string",
description: "Path to a local audio file to transcribe",
},
delete_all_files: {
type: "boolean",
description: "Delete all uploaded files",
},
delete_all_transcriptions: {
type: "boolean",
description: "Delete all transcriptions",
},
translation: {
type: "string",
default: "none",
description: "Translation mode: none, one_way, or two_way",
},
},
});
if (argv.delete_all_files) {
await deleteAllFiles();
return;
}
if (argv.delete_all_transcriptions) {
await deleteAllTranscriptions();
return;
}
await transcribeFile(argv.audio_url, argv.audio_path, argv.translation);
}
main().catch((err) => {
console.error("Error:", err.message);
process.exit(1);
});
```
```sh title="Terminal"
# One-way translation of a local file
node soniox_async.js --audio_path ../assets/coffee_shop.mp3 --translation one_way
# Two-way translation of a local file
node soniox_async.js --audio_path ../assets/two_way_translation.mp3 --translation two_way
```
# Error handling
URL: /stt/async/error-handling
Learn about async API error handling.
When using the Async API, errors can occur at different stages of the workflow — from uploading files, to creating transcription requests, to webhook delivery.
This guide explains how to detect, handle, and recover from errors in a robust way.
***
## File upload errors
When uploading files:
* Ensure the file **duration is ≤ 300 minutes** (cannot be increased).
* Stay within your **storage and file count quotas.**
* If you exceed a limit, you’ll get an error.
**How to recover:**
* Delete old files to free up space.
* Request higher limits in the [Soniox Console](https://console.soniox.com/).
***
## Transcription request errors
When creating transcriptions:
* Stay below **100 pending transcriptions** at once.
* Keep your total (pending + completed + failed) **under 2,000.**
* If you exceed a limit, you’ll get an error.
**How to recover:**
* Wait for some pending jobs to complete.
* Delete completed/failed jobs to stay under the quota.
* Request higher limits in the [Soniox Console](https://console.soniox.com/).
***
## Webhook delivery failures
Webhook delivery may fail if:
* Your server is unavailable.
* Your endpoint does not respond in time.
* An invalid response is returned.
### Retry behavior
* Soniox automatically retries multiple times over a short period.
* If all attempts fail, the webhook is marked as permanently failed.
### Recovery options
* Retrieve the transcription result manually using the transcription ID from the create transcription request.
Example:
```sh
curl https://api.soniox.com/v1/transcriptions/<TRANSCRIPTION_ID> \
-H "Authorization: Bearer $SONIOX_API_KEY"
```
# Limits & quotas
URL: /stt/async/limits-and-quotas
Learn about async API limits and quotas.
## File limits
| Limit | Default | Notes |
| ------------------ | --------------- | -------------------------------------- |
| Total file storage | **10 GB** | Across all uploaded files |
| Uploaded files | **1,000** | Maximum number of files stored at once |
| File duration | **300 minutes** | Cannot be increased |
You must **manually delete files** after obtaining transcription results. Files are
**not deleted automatically**.
Limit increases (except file duration) can be requested in the [Soniox Console](https://console.soniox.com).
***
## Transcription limits
| Limit | Default | Notes |
| ---------------------- | --------------- | --------------------------------------- |
| Pending transcriptions | **100** | Requests created but not yet processing |
| Total transcriptions | **2,000** | Includes pending + completed + failed |
| File duration | **300 minutes** | Cannot be increased |
To keep creating new transcriptions:
* Stay below **100 pending** at a time
* Remove completed/failed transcriptions so total stays **under 2,000**
Limit increases (except file duration) can be requested in the [Soniox Console](https://console.soniox.com).
# Webhooks
URL: /stt/async/webhooks
Learn how to set up webhooks for the Soniox Speech-to-Text API.
## Overview
Soniox supports webhooks to notify your service when a transcription job is complete or
if an error occurs. This enables fully asynchronous processing — no need to poll the API.
When you provide a webhook URL in your transcription request, Soniox will send a POST request
to that URL once the transcription finishes or fails.
***
## How it works
1. You start an asynchronous transcription job with a webhook URL.
2. Soniox processes the audio in the background.
3. When the job completes (or if an error occurs), Soniox sends a POST request to your webhook endpoint with the result.
***
## Set up a webhook for a transcription
To use a webhook, simply pass the `webhook_url` parameter when creating a transcription job.
The URL must be publicly accessible from Soniox servers.
```json
{
"webhook_url": "https://example.com/webhook"
}
```
During development, you can test webhooks on your local machine using tools like [Cloudflare
tunnel](https://developers.cloudflare.com/pages/how-to/preview-with-cloudflare-tunnel/),
[ngrok](https://ngrok.com/) or [VS Code port
forwarding](https://code.visualstudio.com/docs/editor/port-forwarding).
***
## Handle webhook requests
When a transcription is complete (or if an error occurs), Soniox sends a POST
request to your webhook URL with the following parameters:
* `id` → The transcription ID that was assigned when the job was created.
* `status` → Status of the transcription. Possible values: `completed` or `error`.
### Example
```json
{
"id": "548d023b-2b3d-4dc2-a3ef-cca26d05fd9a",
"status": "completed"
}
```
***
## Add authentication to webhooks
You can secure your webhook endpoint by requiring an authentication header. Soniox allows you to include
a custom HTTP header in every webhook request by setting the following parameters when creating a transcription:
* `webhook_auth_header_name` → The name of the HTTP header to include in the webhook request. For example, use `Authorization` for standard auth headers.
* `webhook_auth_header_value` → The value of the header to include. This could be an API key, bearer token, or any secret that your server expects.
When Soniox sends the webhook request, it will include the specified header,
allowing your server to verify that the request came from Soniox.
### Example
```json
{
"webhook_url": "https://example.com/webhook",
"webhook_auth_header_name": "Authorization",
"webhook_auth_header_value": "Bearer <YOUR_SECRET_TOKEN>"
}
```
***
## Add metadata to webhook deliveries
You can attach custom metadata (e.g. customer ID, request ID) to the webhook by
encoding it in the URL as query parameters:
```text
https://example.com/webhook?customer_id=1234&order_id=5678
```
These parameters will be included in the webhook request URL, helping you
associate the callback with the original request.
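The query-parameter approach above can be sketched with Node's standard `URL` API. The base URL and metadata keys here are illustrative:

```javascript
// Build a webhook URL that carries custom metadata as query parameters.
function buildWebhookUrl(baseUrl, metadata) {
  const url = new URL(baseUrl);
  for (const [key, value] of Object.entries(metadata)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// Example: attach a customer and order ID to the callback URL.
const webhookUrl = buildWebhookUrl("https://example.com/webhook", {
  customer_id: 1234,
  order_id: 5678,
});
console.log(webhookUrl);
// https://example.com/webhook?customer_id=1234&order_id=5678
```

Pass the resulting string as `webhook_url` when creating the transcription.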
***
## Failed webhook delivery and retries
Webhook delivery may fail if your server is unavailable or does not respond in time.
If delivery fails, Soniox will automatically retry multiple times over a short period.
If all attempts fail, the delivery is considered permanently failed.
You can still retrieve the transcription result manually using the transcription
ID returned when the job was created.
We recommend logging transcription IDs on your side in case webhook delivery fails
and you need to fetch results manually.
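Putting the pieces together, here is a minimal sketch of handling a webhook payload on your server. Only the `{ id, status }` payload shape comes from the API; the secret value, function name, and return shape are illustrative:

```javascript
// The secret you configured via webhook_auth_header_value (illustrative).
const EXPECTED_AUTH = "Bearer my-secret-token";

// Validate the auth header, then decide what to do with the payload.
function handleWebhookPayload(headers, body) {
  // Reject requests that do not carry the configured auth header.
  if (headers["authorization"] !== EXPECTED_AUTH) {
    return { ok: false, reason: "unauthorized" };
  }
  if (body.status === "completed") {
    // Fetch the transcript next using this ID.
    return { ok: true, fetchId: body.id };
  }
  // status === "error": log the ID for later inspection.
  return { ok: true, failedId: body.id };
}

const result = handleWebhookPayload(
  { authorization: "Bearer my-secret-token" },
  { id: "548d023b-2b3d-4dc2-a3ef-cca26d05fd9a", status: "completed" },
);
console.log(result);
```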
# Confidence scores
URL: /stt/concepts/confidence-scores
Learn how to use confidence scores of recognized tokens.
## Overview
Soniox Speech-to-Text AI provides a **confidence score** for every recognized token (word or sub-word) in the transcript.
The confidence score represents the model’s estimate of how likely the token was recognized correctly.
Confidence values are floating-point numbers between **0.0** and **1.0**:
* **1.0** → very high confidence.
* **0.0** → very low confidence.
Low confidence values typically occur when recognition is uncertain due to factors like background noise, heavy accents, unclear speech, or uncommon vocabulary.
You can use confidence scores to:
* Assess overall transcription quality.
* Flag or highlight uncertain words in a transcript.
* Trigger post-processing, e.g., request user confirmation or re-check with additional context.
**Confidence scores are always included** by default — no extra configuration needed.
***
## Output format
Each token in the API response includes:
* `text` → the recognized token.
* `confidence` → the confidence score for that token.
***
## Example response
In this example, the word **“Beautiful”** is split into three tokens, each with its own confidence score:
```json
{
"tokens": [
{"text": "Beau", "confidence": 0.82},
{"text": "ti", "confidence": 0.87},
{"text": "ful", "confidence": 0.98}
]
}
```
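One practical use of these scores is flagging uncertain tokens for review. A minimal sketch, using the token shape shown above (the threshold value is an application choice, not an API default):

```javascript
// Return the text of tokens whose confidence falls below a threshold.
function flagLowConfidence(tokens, threshold = 0.85) {
  return tokens.filter((t) => t.confidence < threshold).map((t) => t.text);
}

const tokens = [
  { text: "Beau", confidence: 0.82 },
  { text: "ti", confidence: 0.87 },
  { text: "ful", confidence: 0.98 },
];
console.log(flagLowConfidence(tokens)); // [ 'Beau' ]
```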
# Context
URL: /stt/concepts/context
Learn how to use custom context to enhance transcription accuracy.
## Overview
Soniox Speech-to-Text AI lets you improve both **transcription** and
**translation** accuracy by providing **context** with each session.
Context helps the model **understand your domain**, **recognize important terms**,
and apply **custom vocabulary** and **translation preferences**.
Think of it as giving the model **your world** —
what the conversation is about, which words are important, and how certain terms should be translated.
***
## Context sections
You provide context through the `context` object that can include up to **four sections**,
each improving accuracy in different ways:
| Section | Type | Description |
| ------------------- | --------------------- | -------------------------------------------------------------- |
| `general` | array of JSON objects | Structured key-value information (domain, topic, intent, etc.) |
| `text` | string | Longer free-form background text or related documents |
| `terms` | array of strings | Domain-specific or uncommon words |
| `translation_terms` | array of JSON objects | Custom translations for ambiguous terms |
All sections are optional — include only what's relevant for your use case.
### General
General information provides **baseline context** which guides the AI model.
It helps the model adapt its vocabulary to the correct domain, improving **transcription** and
**translation** quality and clarifying ambiguous words.
It consists of structured **key-value pairs** describing the conversation **domain**, **topic**, **intent**, and other
relevant metadata such as participants' names, organization, setting, location, etc.
#### Example
```json
{
"context": {
"general": [
{ "key": "domain", "value": "Healthcare" },
{ "key": "topic", "value": "Diabetes management consultation" },
{ "key": "doctor", "value": "Dr. Martha Smith" },
{ "key": "patient", "value": "Mr. David Miller" },
{ "key": "organization", "value": "St John's Hospital" }
]
}
}
```
### Text
Provide longer unstructured text that expands on general information — examples include:
* History of prior interactions with a customer.
* Reference documents.
* Background summaries.
* Meeting notes.
#### Example
```json
{
"context": {
"text": "The customer, Maria Lopez, contacted BrightWay Insurance to update her auto policy after purchasing a new vehicle. Agent Daniel Kim reviewed the changes, explained the premium adjustment, and offered a bundling discount. Maria agreed to update the policy and scheduled a follow-up to consider additional options."
}
}
```
### Transcription terms
Improve transcription accuracy of important or uncommon words and phrases
that you expect in the audio — such as:
* Domain or industry-specific terminology.
* Brand or product names.
* Rare, uncommon, or invented words.
#### Example
```json
{
"context": {
"terms": [
"Celebrex",
"Zyrtec",
"Xanax",
"Prilosec",
"Amoxicillin Clavulanate Potassium"
]
}
}
```
### Translation terms
Control how specific words or phrases are translated — useful for:
* Technical terminology.
* Entity names.
* Words with ambiguous domain-specific translations.
* Idioms and figurative speech with non-literal meaning.
#### Example for English → Spanish translation
```json
{
"context": {
"translation_terms": [
{ "source": "Mr. Smith", "target": "Sr. Smith" },
{ "source": "MRI", "target": "RM" },
{ "source": "St John's", "target": "St John's" },
{ "source": "stroke", "target": "ictus" }
]
}
}
```
***
## Tips
* Start with `general` context to provide broad information about the audio, such as **domain**, **topic**, or **setting**.
This helps the model understand what the audio is about, without needing prior knowledge of exact words or phrases.
* Key-value pairs in `general` can be arbitrary, but keep them relatively short — ideally **10 or fewer**.
* If specific words or names are important, add them to `terms`.
This ensures consistent spelling and casing for difficult entities.
* Use `text` context only for large supporting documents. It is less influential than `general` or `terms`.
* `translation_terms` are only valuable for translation, otherwise use `terms`.
* If you want translated names or brands unchanged, specify them like `"St John's"` → `"St John's"`.
#### Example: Restaurant takeaway order
```json
{
"context": {
"general": [
{ "key": "restaurant", "value": "Spice India" },
{ "key": "location", "value": "London, UK" },
{ "key": "setting", "value": "Phone ordering" },
{ "key": "topic", "value": "Customer placing a takeaway order" }
],
"terms": [
"butter chicken",
"paneer tikka",
"naan",
"biryani",
"tandoori chicken",
"masala dosa",
"samosa",
"mango lassi"
],
"text": "Spice India is a casual Indian restaurant serving a variety of traditional and popular dishes from across India. Customers can order flavorful curries, grilled specialties, rice dishes, and vegetarian options. The restaurant offers takeaway and delivery services. Customers typically call to ask about menu options, portion sizes, dietary preferences, or popular dishes. Conversations often focus on ordering food efficiently and clarifying customer choices."
}
}
```
***
## Experimental features
* **Improving language detection:** The primary method for ensuring consistent single-language output is
[language restrictions](/stt/concepts/language-restrictions). In rare cases, when the model still transcribes
in the wrong language, `general` context can help by including **language** and **instructions** keys or
adding `terms` in the correct language.
#### Example for English language
```json
{
"context": {
"general": [
{
"key": "language",
"value": "English"
},
{
"key": "instructions",
"value": "Conversation is in English. Output transcription only in English language."
}
],
"terms": [
"concurrency",
"polymorphism",
"serialization",
"idempotency"
]
}
}
```
* **Improving speaker diarization:** Providing speaker information in `general` context can help
the model more reliably separate voices.
#### Example
```json
{
"context": {
"general": [
{ "key": "setting", "value": "Talk show interview" },
{ "key": "topic", "value": "How AI is transforming the modern business landscape" },
{ "key": "speakers", "value": "2 speakers (1 male host, 1 female guest)" }
]
}
}
```
***
## Context size limit
* Maximum **8,000 tokens** (\~10,000 characters).
* Supports large blocks of text: glossaries, scripts, domain summaries.
* If you exceed the limit, the API will return an error → trim or summarize first.
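A rough client-side pre-flight check can catch oversized context before the API rejects it. This sketch approximates the limit by character count only; the actual limit is 8,000 tokens, so treat the constant as a heuristic:

```javascript
// Approximate character budget for the context object (~10,000 chars).
const MAX_CONTEXT_CHARS = 10000;

// Serialize the context and compare against the character budget.
function contextTooLarge(context) {
  return JSON.stringify(context).length > MAX_CONTEXT_CHARS;
}

const smallContext = { terms: ["Celebrex", "Zyrtec"] };
console.log(contextTooLarge(smallContext)); // false
```

If the check fails, trim or summarize the `text` section first, since it is the least influential.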
# Language hints
URL: /stt/concepts/language-hints
Learn about supported languages and how to specify language hints.
## Overview
Soniox Speech-to-Text AI is a powerful, multilingual model that transcribes
speech in **60+ languages** with world-leading accuracy.
By default, you don’t need to pre-select a language — the model automatically
detects and transcribes any supported language. It also handles **multilingual
speech seamlessly,** even when multiple languages are mixed within a single
sentence or conversation.
When you already know which languages are most likely to appear in your audio,
you **should provide language hints** to guide the model. This **improves accuracy** by
biasing recognition toward the specified languages, while still allowing other
languages to be detected if present.
***
## How language hints work
* Use the `language_hints` parameter to provide a list of expected **ISO language codes** (e.g., `en` for English, `es` for Spanish).
* Language hints **do not restrict** recognition to those languages — they only **bias** the model toward them.
**Example: Hinting English and Spanish**
```json
{
"language_hints": ["en", "es"]
}
```
This biases transcription toward **English** and **Spanish,** while still allowing other languages to be detected if spoken.
***
## When to use language hints
Provide `language_hints` when:
* You know or expect certain languages in the audio.
* You want to improve accuracy for specific languages.
* Audio includes **uncommon** or **similar-sounding** languages.
* You’re transcribing content for a **specific audience or market.**
***
## Supported languages
See the list of [supported languages](/stt/concepts/supported-languages) and their ISO codes.
# Language identification
URL: /stt/concepts/language-identification
Learn how to identify one or more spoken languages within an audio stream.
## Overview
Soniox Speech-to-Text AI can **automatically identify spoken languages** within an
audio stream — whether the speech is entirely in one language or mixes multiple
languages. This lets you handle **real-world multilingual conversations** naturally
and accurately, without requiring users to specify languages in advance.
***
## How it works
Language identification in Soniox is performed **at the token level.** Each token
in the transcript is tagged with a language code. However, the model is trained
to maintain **sentence-level coherence,** not just word-level decisions.
***
### Examples
**Example 1: embedded foreign word**
```text
[en] Hello, my dear amigo, how are you doing?
```
All tokens are labeled as English (`en`), even though “amigo” is Spanish.
**Example 2: distinct sentences in multiple languages**
```text
[en] How are you?
[de] Guten Morgen!
[es] Cómo está everyone?
[en] Great! Let’s begin with the agenda.
```
Here, language tags align with sentence boundaries, making the transcript easier to read and interpret in multilingual conversations.
***
## Enabling language identification
Enable automatic language identification by setting the flag in your request:
```json
{
"enable_language_identification": true
}
```
***
## Output format
When enabled, each token includes a language field alongside the text:
```json
{"text": "How", "language": "en"}
{"text": " are", "language": "en"}
{"text": " you", "language": "en"}
{"text": "?", "language": "en"}
{"text": "Gu", "language": "de"}
{"text": "ten", "language": "de"}
{"text": " Morgen", "language": "de"}
{"text": "!", "language": "de"}
{"text": "Cómo", "language": "es"}
{"text": " está", "language": "es"}
{"text": " every", "language": "es"}
{"text": "one", "language": "es"}
{"text": "?", "language": "es"}
```
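The token stream above can be folded into the `[lang] text` lines shown earlier. A minimal sketch:

```javascript
// Group language-tagged tokens into "[lang] text" lines,
// starting a new line whenever the language changes.
function renderByLanguage(tokens) {
  const lines = [];
  let current = null;
  for (const { text, language } of tokens) {
    if (language !== current) {
      current = language;
      lines.push(`[${language}] ${text.trimStart()}`);
    } else {
      lines[lines.length - 1] += text;
    }
  }
  return lines;
}

const tokens = [
  { text: "How", language: "en" },
  { text: " are", language: "en" },
  { text: " you", language: "en" },
  { text: "?", language: "en" },
  { text: "Gu", language: "de" },
  { text: "ten", language: "de" },
  { text: " Morgen", language: "de" },
  { text: "!", language: "de" },
];
console.log(renderByLanguage(tokens).join("\n"));
// [en] How are you?
// [de] Guten Morgen!
```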
***
## Language hints
Use [Language hints](/stt/concepts/language-hints) whenever possible to improve the accuracy of language identification.
***
## Real-time considerations
Language identification in **real-time** is more challenging due to
low-latency constraints. The model has less context available, which may cause:
* Temporary misclassification of language.
* Language tags being **revised** as more speech context arrives.
Despite this, Soniox provides highly reliable detection of language switches in real-time.
***
## Supported languages
Language identification is available for all [supported languages](/stt/concepts/supported-languages).
# Language restrictions
URL: /stt/concepts/language-restrictions
Understand how to restrict the model to avoid accidental transcription in unwanted languages.
## Overview
Soniox Speech-to-Text AI supports **restricting recognition to specific languages**. This is useful when your application expects speech in a known language and you want to **avoid accidental transcription in other languages**, especially in cases of heavy accents or ambiguous pronunciation.
Language restriction is **best-effort, not a hard guarantee**. While the model is strongly biased toward the specified languages, it may still occasionally output another language in rare edge cases. In practice, this happens very infrequently when configured correctly.
***
## How language restrictions work
Language restriction is enabled using two parameters:
* `language_hints`
A list of expected spoken languages, provided as ISO language codes (e.g. `en` for English, `es` for Spanish).
* `language_hints_strict`
A boolean flag that enables language restriction based on the provided hints.
When `language_hints_strict` is set to `true`, the model will strongly prefer producing output **only in the specified languages**.
Best results are achieved when specifying a single language.
***
## Recommended usage
### ✅ Use a single language whenever possible
Language restriction is most robust when **only one language** is provided. This is strongly recommended for production use.
For example, restricting to English only:
```json
{
"language_hints": ["en"],
"language_hints_strict": true
}
```
### ⚠️ Multiple languages reduce robustness
You may specify multiple languages, but accuracy can degrade when language identification becomes ambiguous, especially with heavy accents or acoustically similar languages.
Example (English + Spanish):
```json
{
"language_hints": ["en", "es"],
"language_hints_strict": true
}
```
In difficult cases (e.g. heavily accented English spoken by a Hindi speaker), the model may still choose the “wrong” language and transcribe using the wrong script. This is why **single-language restriction is strongly recommended** when correctness is critical.
***
## When to use language restrictions
Use language restriction when:
* Your application expects **only one known language**
* You want to **avoid transliteration into the wrong alphabet**
* You want **higher accuracy** than using `language_hints` alone
* You are processing speech with **strong accents**
Language restriction provides a stronger signal than language hints without restriction.
***
## Language identification behavior
Automatic language identification remains technically active when language restriction is enabled. However,
**language restriction is intended for cases where the spoken language is already known.**
If you need full automatic language detection across many languages, do not enable strict language restriction.
***
## Supported languages
See the full list of supported languages and their ISO codes in the [supported languages](/stt/concepts/supported-languages) section.
***
## Supported models
Language restriction is supported on:
* `stt-rt-v4`
* `stt-rt-v3`
* `stt-async-v4`
# Speaker diarization
URL: /stt/concepts/speaker-diarization
Learn how to separate speakers in both real-time and asynchronous processing.
## Overview
Soniox Speech-to-Text AI supports **speaker diarization** — the ability to
automatically detect and separate speakers in an audio stream. This allows you
to generate **speaker-labeled transcripts** for conversations, meetings, interviews,
podcasts, and other multi-speaker scenarios — without any manual labeling or
extra metadata.
***
## What is speaker diarization?
Speaker diarization answers the question: **Who spoke when?**
When enabled, Soniox automatically detects speaker changes and assigns each
spoken segment to a speaker label (e.g., `Speaker 1`, `Speaker 2`). This lets you
structure transcripts into clear, speaker-attributed sections.
### Example
Input audio:
```text
How are you? I am fantastic. What about you? Feeling great today. Hey everyone!
```
Output with diarization enabled:
```text
Speaker 1: How are you?
Speaker 2: I am fantastic. What about you?
Speaker 1: Feeling great today.
Speaker 3: Hey everyone!
```
***
## How to enable speaker diarization
Enable diarization by setting this parameter in your API request:
```json
{
"enable_speaker_diarization": true
}
```
***
## Output format
When speaker diarization is enabled, each token includes a `speaker` field:
```json
{"text": "How", "speaker": "1"}
{"text": " are", "speaker": "1"}
{"text": " you", "speaker": "1"}
{"text": "?", "speaker": "1"}
{"text": "I", "speaker": "2"}
{"text": " am", "speaker": "2"}
{"text": " fan", "speaker": "2"}
{"text": "tastic", "speaker": "2"}
{"text": ".", "speaker": "2"}
```
You can group tokens by speaker in your application to create readable segments, or display speaker labels directly in your UI.
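The grouping described above can be sketched as follows, producing the `Speaker N:` segments shown earlier:

```javascript
// Group speaker-tagged tokens into "Speaker N: ..." segments,
// starting a new segment whenever the speaker changes.
function renderBySpeaker(tokens) {
  const segments = [];
  let current = null;
  for (const { text, speaker } of tokens) {
    if (speaker !== current) {
      current = speaker;
      segments.push(`Speaker ${speaker}: ${text.trimStart()}`);
    } else {
      segments[segments.length - 1] += text;
    }
  }
  return segments;
}

const tokens = [
  { text: "How", speaker: "1" },
  { text: " are", speaker: "1" },
  { text: " you", speaker: "1" },
  { text: "?", speaker: "1" },
  { text: "I", speaker: "2" },
  { text: " am", speaker: "2" },
  { text: " fantastic", speaker: "2" },
  { text: ".", speaker: "2" },
];
console.log(renderBySpeaker(tokens).join("\n"));
// Speaker 1: How are you?
// Speaker 2: I am fantastic.
```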
***
## Real-time considerations
Real-time speaker diarization is more challenging due to low-latency constraints. You may observe:
* Higher speaker attribution errors compared to async mode.
* Temporary speaker switches that stabilize as more context is available.
Even with these limitations, real-time diarization is valuable for
**live meetings, conferences, customer support calls, and conversational AI
interfaces.**
***
## Number of supported speakers
* Up to **15 different speakers** are supported per transcription session.
* Accuracy may decrease when many speakers have **similar voice characteristics.**
***
## Best practice
For the most accurate and reliable speaker separation, use **asynchronous
transcription** — it provides significantly higher diarization accuracy because
the model has access to the full audio context. Real-time diarization is best
when you need immediate speaker attribution, but expect lower accuracy due to
low-latency constraints.
***
## Supported languages
Speaker diarization is available for all [supported languages](/stt/concepts/supported-languages).
# Supported languages
URL: /stt/concepts/supported-languages
List of supported languages by Soniox Speech-to-Text AI.
## Overview
Soniox Speech-to-Text AI supports **transcription and translation in 60+ languages** with world-leading accuracy — all powered by a **single, unified AI model.**
* **Transcription** → Available in every supported language.
* **Translation** → Works between any pair of supported languages.
All languages are available in both:
* **Real-time API** → Stream live audio with transcription + translation.
* **Async API** → Transcribe recorded files at scale.
To programmatically retrieve the full list of supported languages, use the [Get models](/stt/api-reference/models/get_models) endpoint.
For detailed accuracy comparisons, see our [Benchmark Report](https://soniox.com/media/SonioxSTTBenchmarks2025.pdf).
***
## Supported languages
| Language | ISO Code |
| ----------- | -------- |
| Afrikaans | af |
| Albanian | sq |
| Arabic | ar |
| Azerbaijani | az |
| Basque | eu |
| Belarusian | be |
| Bengali | bn |
| Bosnian | bs |
| Bulgarian | bg |
| Catalan | ca |
| Chinese | zh |
| Croatian | hr |
| Czech | cs |
| Danish | da |
| Dutch | nl |
| English | en |
| Estonian | et |
| Finnish | fi |
| French | fr |
| Galician | gl |
| German | de |
| Greek | el |
| Gujarati | gu |
| Hebrew | he |
| Hindi | hi |
| Hungarian | hu |
| Indonesian | id |
| Italian | it |
| Japanese | ja |
| Kannada | kn |
| Kazakh | kk |
| Korean | ko |
| Latvian | lv |
| Lithuanian | lt |
| Macedonian | mk |
| Malay | ms |
| Malayalam | ml |
| Marathi | mr |
| Norwegian | no |
| Persian | fa |
| Polish | pl |
| Portuguese | pt |
| Punjabi | pa |
| Romanian | ro |
| Russian | ru |
| Serbian | sr |
| Slovak | sk |
| Slovenian | sl |
| Spanish | es |
| Swahili | sw |
| Swedish | sv |
| Tagalog | tl |
| Tamil | ta |
| Telugu | te |
| Thai | th |
| Turkish | tr |
| Ukrainian | uk |
| Urdu | ur |
| Vietnamese | vi |
| Welsh | cy |
# Timestamps
URL: /stt/concepts/timestamps
Learn how to use timestamps and understand their granularity.
## Overview
Soniox Speech-to-Text AI provides **precise timestamps** for every recognized token (word or sub-word).
Timestamps let you align transcriptions with audio, so you know exactly when each word was spoken.
**Timestamps are always included** by default — no extra configuration needed.
***
## Output format
Each token in the response includes:
* `text` → The recognized token.
* `start_ms` → Token start time (in milliseconds).
* `end_ms` → Token end time (in milliseconds).
***
## Example response
In this example, the word **“Beautiful”** is split into three tokens, each with its own timestamp range:
```json
{
"tokens": [
{"text": "Beau", "start_ms": 300, "end_ms": 420},
{"text": "ti", "start_ms": 420, "end_ms": 540},
{"text": "ful", "start_ms": 540, "end_ms": 780}
]
}
```
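For display or caption alignment, the millisecond values above can be formatted as readable time ranges. A small sketch (the `mm:ss.mmm` format is an application choice):

```javascript
// Format a millisecond offset as "mm:ss.mmm".
function formatMs(ms) {
  const m = Math.floor(ms / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  const frac = ms % 1000;
  return `${String(m).padStart(2, "0")}:${String(s).padStart(2, "0")}.${String(frac).padStart(3, "0")}`;
}

const tokens = [
  { text: "Beau", start_ms: 300, end_ms: 420 },
  { text: "ti", start_ms: 420, end_ms: 540 },
  { text: "ful", start_ms: 540, end_ms: 780 },
];
for (const t of tokens) {
  console.log(`${formatMs(t.start_ms)}-${formatMs(t.end_ms)}  ${t.text}`);
}
// 00:00.300-00:00.420  Beau
// 00:00.420-00:00.540  ti
// 00:00.540-00:00.780  ful
```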
# Soniox Live
URL: /stt/demo-apps/soniox-live
Demo apps showing how to add Soniox to your product.
## Overview
Soniox Live is a **demo app** that shows how to stream audio from your microphone
directly to the Soniox [Real-time API](/stt/api-reference/websocket-api) for instant transcription and translation.
This is not the [Soniox App](https://soniox.com/soniox-app) (our end-user product). Instead, it is a
**reference implementation** for developers who want to learn how to embed
Soniox into their own web or mobile applications.
## Features
* Stream audio from your mic to Soniox in real time
* Low-latency, high-accuracy transcription in 60+ languages
* Low-latency speech translation to 60+ languages
* Runs in the browser (React) and mobile (React Native)
* Lightweight server issues temporary client keys for secure access
## Usage flow
1. Tap **Start** to begin streaming from your mic
2. **Live captions** appear word by word, then finalize
3. Toggle **Translation** and choose a target language for live translated captions
4. Tap **Stop** to end the session
## Architecture
* **Server (Python):** Stores your secret Soniox API key and issues **temporary API keys** to clients
* **Frontend (React & React Native):** Requests a temporary API key from your server,
then streams microphone audio directly to Soniox servers for real-time transcription and translation
We provide all the implementations with links to GitHub:
* [Python server](https://github.com/soniox/soniox_examples/tree/master/speech_to_text/apps/soniox-live-demo/server)
* [React frontend](https://github.com/soniox/soniox_examples/tree/master/speech_to_text/apps/soniox-live-demo/react) (web)
* [React Native frontend](https://github.com/soniox/soniox_examples/tree/master/speech_to_text/apps/soniox-live-demo/react-native) (mobile)
# API reference
URL: /stt/api-reference
Soniox Speech-to-Text API delivers highly accurate, scalable audio transcription.
## REST API
The REST API is available at [https://api.soniox.com/v1](https://api.soniox.com/v1) and is divided into:
* **[Auth API](/stt/api-reference/auth/create_temporary_api_key)**: Create temporary API keys.
* **[Files API](/stt/api-reference/files/get_files)**: Manage audio files by uploading, listing, retrieving, and deleting them.
* **[Models API](/stt/api-reference/models/get_models)**: List available models.
* **[Transcriptions API](/stt/api-reference/transcriptions/get_transcriptions)**: Create and manage transcriptions for audio files uploaded via the Files API.
OpenAPI schema: [https://api.soniox.com/v1/openapi.json](https://api.soniox.com/v1/openapi.json)
***
## WebSocket API
[WebSocket API](/stt/api-reference/websocket-api) transcribes and translates live audio streams — such as
conference calls, broadcasts, or direct microphone input — over a WebSocket
connection.
***
See the [Get started](/stt/get-started) page for an introduction on how to integrate with Soniox API.
# WebSocket API
URL: /stt/api-reference/websocket-api
Learn how to use and integrate Soniox Speech-to-Text WebSocket API.
## Overview
The **Soniox WebSocket API** provides real-time **transcription and translation** of
live audio with ultra-low latency. It supports advanced features like **speaker
diarization, context customization,** and **manual finalization** — all over a
persistent WebSocket connection. Ideal for live scenarios such as meetings,
broadcasts, multilingual communication, and voice interfaces.
***
## WebSocket endpoint
Connect to the API using:
```text
wss://stt-rt.soniox.com/transcribe-websocket
```
***
## Configuration
Before streaming audio, configure the transcription session by sending a JSON message such as:
```json
{
"api_key": "",
"model": "stt-rt-preview",
"audio_format": "auto",
"language_hints": ["en", "es"],
"context": {
"general": [
{ "key": "domain", "value": "Healthcare" },
{ "key": "topic", "value": "Diabetes management consultation" },
{ "key": "doctor", "value": "Dr. Martha Smith" },
{ "key": "patient", "value": "Mr. David Miller" },
{ "key": "organization", "value": "St John's Hospital" }
],
"text": "Mr. David Miller visited his healthcare provider last month for a routine follow-up related to diabetes care. The clinician reviewed his recent test results, noted improved glucose levels, and adjusted his medication schedule accordingly. They also discussed meal planning strategies and scheduled the next check-up for early spring.",
"terms": [
"Celebrex",
"Zyrtec",
"Xanax",
"Prilosec",
"Amoxicillin Clavulanate Potassium"
],
"translation_terms": [
{ "source": "Mr. Smith", "target": "Sr. Smith" },
{ "source": "St John's", "target": "St John's" },
{ "source": "stroke", "target": "ictus" }
]
},
"enable_speaker_diarization": true,
"enable_language_identification": true,
"translation": {
"type": "two_way",
"language_a": "en",
"language_b": "es"
}
}
```
***
### Parameters
* `api_key` (required): Your Soniox API key. Create API keys in the [Soniox Console](https://console.soniox.com/). For client apps, generate a [temporary API key](/stt/api-reference/auth/create_temporary_api_key) from your server to keep secrets secure.
* `model` (required): Real-time model to use. Example: `"stt-rt-preview"`. See [models](/stt/models).
* `audio_format`: Audio format of the stream. See [audio formats](/stt/rt/real-time-transcription#audio-formats).
* `sample_rate`: Required for raw audio formats. See [audio formats](/stt/rt/real-time-transcription#audio-formats).
* `num_channels`: Required for raw audio formats. See [audio formats](/stt/rt/real-time-transcription#audio-formats).
* `language_hints`: See [language hints](/stt/concepts/language-hints).
* Language restriction options: See [language restrictions](/stt/concepts/language-restrictions).
* `context`: See [context](/stt/concepts/context).
* `enable_speaker_diarization`: See [speaker diarization](/stt/concepts/speaker-diarization).
* `enable_language_identification`: See [language identification](/stt/concepts/language-identification).
* `enable_endpoint_detection`: See [endpoint detection](/stt/rt/endpoint-detection).
* `max_endpoint_delay_ms`: Must be between 500 and 3000. Default value is 2000. See [endpoint detection](/stt/rt/endpoint-detection).
* `client_reference_id`: Optional identifier to track this request (client-defined).
* `translation`: See [real-time translation](/stt/rt/real-time-translation).
  * **One-way translation:** `type` must be set to `one_way`; `target_language` is the language to translate the transcript into.
  * **Two-way translation:** `type` must be set to `two_way`; `language_a` and `language_b` are the first and second languages for two-way translation.
***
## Audio streaming
After configuration, start streaming audio:
* Send audio as binary WebSocket frames.
* Each stream supports up to 300 minutes of audio.
***
## Ending the stream
To gracefully close a streaming session:
* Send an **empty WebSocket frame** (binary or text).
* The server will return one or more responses, including [finished response](#finished-response), and then close the connection.
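Put together, the full lifecycle (configure, stream, end, drain responses) can be sketched in Python with the third-party `websockets` package, the same library used in the raw Python example in these docs. The chunk size and model name below are illustrative, not required values:

```python
# Minimal session lifecycle sketch: send the JSON config, stream binary audio
# frames, send an empty frame to end the stream, then read responses until
# "finished". Chunk size (3840 bytes) and model name are illustrative.
import json

WEBSOCKET_URL = "wss://stt-rt.soniox.com/transcribe-websocket"

def build_start_request(api_key: str, model: str = "stt-rt-preview") -> str:
    """The first message on the socket is the JSON configuration."""
    return json.dumps({"api_key": api_key, "model": model, "audio_format": "auto"})

def stream_file(api_key: str, audio_path: str) -> list:
    # Third-party "websockets" package, imported lazily so the helper above
    # stays usable without it.
    from websockets import ConnectionClosedOK
    from websockets.sync.client import connect

    tokens = []
    with connect(WEBSOCKET_URL) as ws:
        ws.send(build_start_request(api_key))   # 1. configure the session
        with open(audio_path, "rb") as f:
            while chunk := f.read(3840):
                ws.send(chunk)                  # 2. binary audio frames
        ws.send(b"")                            # 3. empty frame ends the stream
        try:
            while True:                         # 4. drain remaining responses
                res = json.loads(ws.recv())
                tokens.extend(res.get("tokens", []))
                if res.get("finished"):
                    break                       # server closes after this
        except ConnectionClosedOK:
            pass
    return tokens
```
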
***
## Response
Soniox returns **responses** in JSON format. A typical successful response looks like:
```json
{
"tokens": [
{
"text": "Hello",
"start_ms": 600,
"end_ms": 760,
"confidence": 0.97,
"is_final": true,
"speaker": "1"
}
],
"final_audio_proc_ms": 760,
"total_audio_proc_ms": 880
}
```
### Field descriptions
* `tokens`: List of processed tokens (words or subwords). Each token may include:
  * `text`: Token text.
  * `start_ms`: Start timestamp of the token (in milliseconds). Not included if `translation_status` is `translation`.
  * `end_ms`: End timestamp of the token (in milliseconds). Not included if `translation_status` is `translation`.
  * `confidence`: Confidence score (`0.0`–`1.0`).
  * `is_final`: Whether the token is finalized.
  * `speaker`: Speaker label (if diarization is enabled).
  * `translation_status`: See [real-time translation](/stt/rt/real-time-translation).
  * `language`: Language of the `token.text`.
  * `source_language`: See [real-time translation](/stt/rt/real-time-translation).
* `final_audio_proc_ms`: Audio processed into final tokens.
* `total_audio_proc_ms`: Audio processed into final + non-final tokens.
***
## Finished response
At the end of a stream, Soniox sends a **final message** to indicate the session is complete:
```json
{
"tokens": [],
"final_audio_proc_ms": 1560,
"total_audio_proc_ms": 1680,
"finished": true
}
```
After this, the server closes the WebSocket connection.
***
## Error response
If an error occurs, the server returns an **error message** and immediately closes the connection:
```json
{
"tokens": [],
"error_code": 503,
"error_message": "Cannot continue request (code N). Please restart the request. ..."
}
```
* `error_code`: Standard HTTP status code.
* `error_message`: A description of the error encountered.
Full list of possible error codes and messages:
### 400 Bad request
The request is malformed or contains invalid parameters.
* `Audio data channels must be specified for PCM formats`
* `Audio data sample rate must be specified for PCM formats`
* `Audio decode error`
* `Audio is too long.`
* `Client reference ID is too long (max length 256)`
* `Context is too long (max length 10000).`
* `Control request invalid type.`
* `Control request is malformed.`
* `Invalid audio data format: avi`
* `Invalid base64.`
* `Invalid language hint.`
* `Invalid model specified.`
* `Invalid translation target language.`
* `Language hints must be unique.`
* `Missing audio format. Specify a valid audio format (e.g. s16le, f32le, wav, ogg, flac...) or "auto" for auto format detection.`
* `Model does not support translations.`
* `No audio received.`
* `Prompt too long for model`
* `Received too much audio data in total.`
* `Start request is malformed.`
* `Start request must be a text message.`
### 401 Unauthorized
Authentication is missing or incorrect. Ensure a valid API key is provided before retrying.
* `Invalid API key.`
* `Invalid/expired temporary API key.`
* `Missing API key.`
### 402 Payment required
The organization's balance or monthly usage limit has been reached.
Additional credits are required before making further requests.
* `Organization balance exhausted. Please either add funds manually or enable autopay.`
* `Organization monthly budget exhausted. Please increase it.`
* `Project monthly budget exhausted. Please increase it.`
### 408 Request timeout
The client did not send a start message or sufficient audio data within the required timeframe.
The connection was closed due to inactivity.
* `Audio data decode timeout`
* `Input too slow`
* `Request timeout.`
* `Start request timeout`
* `Timed out while waiting for the first audio chunk`
### 429 Too many requests
A usage or rate limit has been exceeded. You may retry after a delay or request
an increase in limits via the Soniox Console.
* `Rate limit for your organization has been exceeded.`
* `Rate limit for your project has been exceeded.`
* `Your organization has exceeded max number of concurrent requests.`
* `Your project has exceeded max number of concurrent requests.`
### 500 Internal server error
An unexpected server-side error occurred. The request may be retried.
* `The server had an error processing your request. Sorry about that! You can retry your request, or contact us through our support email support@soniox.com if you keep seeing this error.`
### 503 Service unavailable
Cannot continue request or accept new requests.
* `Cannot continue request (code N). Please restart the request. Refer to: https://soniox.com/url/cannot-continue-request`
# Connection keepalive
URL: /stt/rt/connection-keepalive
Learn how connection keepalive works.
## Overview
In real-time transcription, you may have periods where you are not sending any audio frames — for example when
using client-side VAD (voice activity detection), during pauses in speech, or
when you intentionally stop streaming audio.
To keep the session alive and preserve context, you must send a **keepalive control message:**
```json
{"type": "keepalive"}
```
This prevents the WebSocket connection from timing out when no audio is being sent.
***
## When to use
Send a keepalive message whenever:
* You only stream audio during speech (client-side VAD).
* You temporarily pause audio streaming but want to keep the session active.
This ensures that:
* The connection stays open.
* Session context (e.g., speaker labels, language tracking, prompt) is preserved.
***
## Key points
* **Send at least once every 20 seconds** when not sending audio.
* You may send more frequently (every 5–10s is common).
* If no keepalive or audio is received for >20s, the connection may be closed.
* You are charged for the **full stream duration,** not just the audio processed.
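A background keepalive loop can be sketched as follows. The 10-second interval and the `Event`-based stop mechanism are illustrative choices; `ws` is assumed to be any open WebSocket connection object with a `send` method:

```python
# Sketch: send {"type": "keepalive"} on a timer while no audio is being sent.
# The 10-second default interval stays safely under the 20-second limit.
import json
import threading

KEEPALIVE_MESSAGE = json.dumps({"type": "keepalive"})

def start_keepalive(ws, stop_event: threading.Event,
                    interval_s: float = 10.0) -> threading.Thread:
    """Send a keepalive message every interval_s seconds until stop_event is set."""
    def loop():
        while not stop_event.wait(interval_s):
            ws.send(KEEPALIVE_MESSAGE)
    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

Typical usage: set `stop_event` (or simply resume sending audio) when streaming restarts, and start the loop again at the next pause.
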
# Endpoint detection
URL: /stt/rt/endpoint-detection
Learn how speech endpoint detection works.
## Overview
Endpoint detection lets you know when a speaker has finished speaking. This is
critical for real-time voice AI assistants, command-and-response systems, and
conversational apps where you want to respond immediately without waiting for
long silences.
Unlike traditional endpoint detection based on voice activity detection (VAD),
Soniox provides semantic endpointing where the speech model listens to intonations, pauses, and
conversational context to determine when an utterance has ended. This makes it
far more advanced — delivering **lower latency, fewer false triggers,** and a
noticeably **smoother product experience.**
***
## How it works
When `enable_endpoint_detection` is **enabled**:
* Soniox monitors pauses in speech to determine the end of an utterance.
* As soon as speech ends:
* **All preceding tokens** are marked as final.
* A special `<end>` **token** is returned.
* The `<end>` token:
* Always appears **once** at the end of the segment.
* Is **always final**.
* Can be treated as a reliable signal to trigger downstream logic (e.g., calling an LLM or executing a command).
***
## Enabling endpoint detection
Add the flag in your real-time request:
```json
{
"enable_endpoint_detection": true
}
```
***
## Example
1. **Streaming phase:** tokens are delivered in real-time as the user
speaks. They are marked `is_final: false`, meaning the transcript is still being
processed and may change.
2. **Endpoint detection:** once the speaker stops, the model recognizes the end of the utterance.
3. **Finalization phase:** previously non-final tokens are re-emitted with `is_final: true`, followed by the `<end>` token (also final).
4. **Usage tip:** display non-final tokens immediately for live captions, but switch to final tokens once `<end>` arrives before triggering any downstream actions.
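The finalized token stream can then be split into utterances at each endpoint marker. A minimal sketch, assuming the marker token's text is `<end>` (the helper name is ours; token dicts follow the response format from the API reference):

```python
# Sketch: group final tokens into utterances, using the "<end>" marker token
# as the boundary signal for downstream logic (LLM call, command execution, ...).
END_TOKEN = "<end>"

def split_utterances(final_tokens):
    """Return (completed utterances, tokens still awaiting an endpoint)."""
    utterances, current = [], []
    for token in final_tokens:
        if token["text"] == END_TOKEN:
            if current:
                utterances.append("".join(t["text"] for t in current))
            current = []
        else:
            current.append(token)
    return utterances, current
```
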
***
## Controlling endpoint delay
In addition to semantic endpoint detection, you can also control the maximum delay between
the end of speech and returned endpoint using `max_endpoint_delay_ms`.
Lower values cause the endpoint to be returned sooner.
Allowed values for maximum delay are between 500ms and 3000ms.
The default value is 2000ms.
Example configuration:
```json
{
"max_endpoint_delay_ms": 500
}
```
# Error handling
URL: /stt/rt/error-handling
Learn about real-time API error handling.
In the Soniox Real-time API, all errors are returned as JSON error responses before the
connection is closed. Your application should always log and inspect these
errors to determine the cause.
## Error responses
If an error occurs, Soniox will:
1. Send an error response containing an **error code** and **error message.**
2. Immediately close the WebSocket connection.
Example:
```json
{
"error_code": 400,
"error_message": "Invalid model specified."
}
```
Always print out or log the error response to capture both the code and message.
The complete list of error codes and their meanings can be found under [Error codes](/stt/api-reference/websocket-api#error-response).
***
## Request termination
Real-time sessions run on a **best-effort basis.**
While most sessions last until the maximum supported audio duration (see [Limits & quotas](/stt/rt/limits-and-quotas)), early termination may occur.
If a session is closed early, you’ll receive a 503 error:
```text
Cannot continue request (code N). Please restart the request.
```
Your application should:
* Detect this error.
* Immediately start a **new request** to continue streaming.
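A restart loop can be sketched like this. `run_stream` and `SonioxStreamError` are hypothetical names standing in for your own session function and error wrapper; only the 503 "cannot continue" case is treated as retryable here:

```python
# Sketch: restart a streaming session when the server ends it with a 503
# "Cannot continue request" error; other error codes are re-raised.
import time

class SonioxStreamError(Exception):
    """Hypothetical wrapper around a JSON error response."""
    def __init__(self, code: int, message: str):
        super().__init__(message)
        self.code = code

def run_with_restart(run_stream, max_restarts: int = 3, backoff_s: float = 0.5):
    for attempt in range(max_restarts + 1):
        try:
            return run_stream()
        except SonioxStreamError as err:
            if err.code != 503 or attempt == max_restarts:
                raise
            time.sleep(backoff_s * (attempt + 1))  # brief backoff, then restart
```
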
***
## Real-time cadence
You should send audio data to Soniox **in real-time or near real-time speed.** Small
deviations are tolerated — such as brief buffering or network jitter — but
prolonged bursts or lags may result in disconnection.
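For pre-recorded audio, keeping real-time cadence usually means throttling sends. A sketch for raw `pcm_s16le` (the 120 ms chunk duration is an illustrative choice; at 16 kHz mono, 16-bit, it works out to 3840 bytes per chunk):

```python
# Sketch: compute the byte size of a chunk of raw PCM audio, then pace sends
# so the stream arrives at roughly real-time speed.
import time

def chunk_size_bytes(chunk_ms: int, sample_rate: int = 16000,
                     num_channels: int = 1, bytes_per_sample: int = 2) -> int:
    return sample_rate * num_channels * bytes_per_sample * chunk_ms // 1000

def paced_chunks(data: bytes, chunk_ms: int = 120, **fmt):
    """Yield chunks of `data`, sleeping chunk_ms between them."""
    size = chunk_size_bytes(chunk_ms, **fmt)
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]
        time.sleep(chunk_ms / 1000)  # keep sends near real-time speed
```
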
# Limits & quotas
URL: /stt/rt/limits-and-quotas
Learn about real-time API limits and quotas.
## WebSocket API limits
Soniox applies limits to real-time WebSocket sessions to ensure stability and fair use.
Make sure your application respects these constraints and implements graceful recovery when a limit is reached.
| Limit | Value | Notes |
| ------------------- | ----------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
| Requests per minute | 100 | Exceeding this may result in rate limiting |
| Concurrent requests | 10 | Maximum number of simultaneous active WebSocket connections |
| Stream duration | 300 minutes | Each real-time session is capped at 300 minutes. To continue beyond this, open a new session. This limit is fixed and cannot be increased |
You can request higher limits (except for stream duration) in the [Soniox Console](https://console.soniox.com/).
# Manual finalization
URL: /stt/rt/manual-finalization
Learn how manual finalization works.
## Overview
Soniox supports **manual finalization** in addition to automatic mechanisms like
[endpoint detection](/stt/rt/endpoint-detection). Manual finalization
gives you precise control over when audio should be finalized — useful for:
* Push-to-talk systems.
* Client-side voice activity detection (VAD).
* Segment-based transcription pipelines.
* Applications where automatic endpoint detection is not ideal.
***
## How to finalize
Send a control message over the WebSocket connection:
```json
{"type": "finalize"}
```
When received:
* Soniox finalizes all audio up to that point.
* All tokens from that audio are returned with `"is_final": true`.
* The model emits a special marker token:
```json
{"text": "", "is_final": true}
```
The `<fin>` token signals that finalization is complete.
***
## Key points
* You can call `finalize` multiple times per session.
* You may continue streaming audio after each `finalize` call.
* The `<fin>` token is always returned as final and can be used to trigger downstream processing.
* Do not send `finalize` too frequently (every few seconds is fine; too often may cause disconnections).
* Call `finalize` only after sending approximately 200ms of silence following
the end of speech to balance high accuracy and low latency. Adjust the VAD sensitivity accordingly. Triggering `finalize` too early can degrade model accuracy.
* You are charged for the **full stream duration,** not just the audio processed.
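The 200 ms silence rule above can be enforced with a small gate around your VAD output. `FinalizeGate` is a hypothetical helper name; timestamps are injected so the logic stays testable:

```python
# Sketch: with client-side VAD, send finalize only after ~200ms of observed
# silence following the end of speech.
import json

FINALIZE_MESSAGE = json.dumps({"type": "finalize"})
SILENCE_BEFORE_FINALIZE_MS = 200

class FinalizeGate:
    """Tracks VAD output and decides when a single finalize should be sent."""
    def __init__(self):
        self.silence_started_ms = None
        self.sent = False

    def on_frame(self, is_speech: bool, now_ms: int) -> bool:
        """Return True when the caller should send FINALIZE_MESSAGE."""
        if is_speech:
            self.silence_started_ms = None
            self.sent = False
            return False
        if self.silence_started_ms is None:
            self.silence_started_ms = now_ms
        if (not self.sent
                and now_ms - self.silence_started_ms >= SILENCE_BEFORE_FINALIZE_MS):
            self.sent = True
            return True
        return False
```
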
***
## Connection keepalive
Combine with [connection keepalive](/stt/rt/connection-keepalive): use keepalive messages to prevent timeouts when no audio is being sent (e.g., during long pauses).
# Real-time transcription
URL: /stt/rt/real-time-transcription
Learn about real-time transcription with low latency and high accuracy for all 60+ languages.
## Overview
Soniox Speech-to-Text AI lets you transcribe audio in real time with **low latency**
and **high accuracy** in over 60 languages. This is ideal for use cases like **live
captions, voice assistants, streaming analytics, and conversational AI.**
Real-time transcription is provided through our [WebSocket API](/stt/api-reference/websocket-api), which streams
results back to you as the audio is processed.
***
## How processing works
As audio is streamed into the API, Soniox returns a continuous stream of **tokens** — small units of text such as subwords, words, or spaces.
Each token carries a status flag (`is_final`) that tells you whether the token is **provisional** or **confirmed:**
* **Non-final token** (`is_final: false`) → Provisional text. Appears instantly but may change, disappear, or be replaced as more audio arrives.
* **Final token** (`is_final: true`) → Confirmed text. Once marked final, it will never change in future responses.
This means you get text right away (non-final for instant feedback), followed by the confirmed version (final for stable output).
Non-final tokens may appear multiple times and change slightly until they stabilize into a final token. Final tokens are sent only once and never repeated.
### Example token evolution
Here’s how `"How are you doing?"` might arrive over time:
**Initial guess (non-final):**
```json
{"tokens": [{"text": "How", "is_final": false},
{"text": "'re", "is_final": false}]}
```
**Refined guess (non-final):**
```json
{"tokens": [{"text": "How", "is_final": false},
{"text": " ", "is_final": false},
{"text": "are", "is_final": false}]}
```
**Mixed output (final + non-final):**
```json
{"tokens": [{"text": "How", "is_final": true},
{"text": " ", "is_final": true},
{"text": "are", "is_final": false},
{"text": " ", "is_final": false},
{"text": "you", "is_final": false}]}
```
**Mixed output (final + non-final):**
```json
{"tokens": [{"text": "are", "is_final": true},
{"text": " ", "is_final": true},
{"text": "you", "is_final": true},
{"text": " ", "is_final": false},
{"text": "do", "is_final": false},
{"text": "ing", "is_final": false},
{"text": "?", "is_final": false}]}
```
**Confirmed tokens (final):**
```json
{"tokens": [{"text": " ", "is_final": true},
{"text": "do", "is_final": true},
{"text": "ing", "is_final": true},
{"text": "?", "is_final": true}]}
```
**Bottom line:** The model may start with a shorthand guess like “How’re”, then
refine it into “How are you”, and finally extend it into “How are you doing?”.
Non-final tokens update instantly, while final tokens never change once
confirmed.
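The standard client-side pattern falls out of these rules: append final tokens once, and replace the non-final tail wholesale on every response. A minimal sketch (helper names are ours):

```python
# Sketch: maintain a live transcript from token responses. Final tokens
# accumulate; non-final tokens are replaced with each new response.
def apply_response(final_tokens: list, response: dict) -> list:
    """Append finals to final_tokens; return the fresh non-final tail."""
    non_final = []
    for token in response["tokens"]:
        if token["is_final"]:
            final_tokens.append(token)
        else:
            non_final.append(token)
    return non_final

def render(final_tokens: list, non_final_tokens: list) -> str:
    return "".join(t["text"] for t in final_tokens + non_final_tokens)
```

Replaying the example responses above through these two helpers reproduces the evolution from "How're" to the confirmed "How are you doing?".
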
***
## Audio progress tracking
Each response also tells you **how much audio has been processed**:
* `final_audio_proc_ms` — audio processed into **final tokens.**
* `total_audio_proc_ms` — audio processed into **final + non-final tokens.**
Example:
```json
{
"final_audio_proc_ms": 4800,
"total_audio_proc_ms": 5250
}
```
```
**This means:**
* Audio up to **4.8s** has been processed and finalized (final tokens).
* Audio up to **5.25s** has been processed in total (final + non-final tokens).
***
## Getting final tokens sooner
There are two ways to obtain final tokens more quickly:
1. [Endpoint detection](/stt/rt/endpoint-detection) — the model can detect when a speaker has stopped talking and finalize tokens immediately.
2. [Manual finalization](/stt/rt/manual-finalization) — you can send a `"type": "finalize"` message over the WebSocket to force all pending tokens to finalize.
***
## Audio formats
Soniox supports both **auto-detected formats** (no configuration required) and **raw audio formats** (manual configuration required).
### Auto-detected formats
Soniox can automatically detect common container formats from stream headers.
No configuration needed — just set:
```json
{
"audio_format": "auto"
}
```
Supported auto formats:
```text
aac, aiff, amr, asf, flac, mp3, ogg, wav, webm
```
### Raw audio formats
For raw audio streams without headers, you must provide:
* `audio_format` → encoding type.
* `sample_rate` → sample rate in Hz.
* `num_channels` → number of channels (e.g. 1 (mono) or 2 (stereo)).
**Supported encodings:**
* PCM (signed): `pcm_s8`, `pcm_s16`, `pcm_s24`, `pcm_s32` (`le`/`be`).
* PCM (unsigned): `pcm_u8`, `pcm_u16`, `pcm_u24`, `pcm_u32` (`le`/`be`).
* Float PCM: `pcm_f32`, `pcm_f64` (`le`/`be`).
* Companded: `mulaw`, `alaw`.
**Example: raw PCM (16-bit, 16kHz, mono)**
```json
{
"audio_format": "pcm_s16le",
"sample_rate": 16000,
"num_channels": 1
}
```
***
## Code example
**Prerequisite:** Complete the steps in [Get started](/stt/get-started).
See on GitHub: [soniox\_sdk\_realtime.py](https://github.com/soniox/soniox_examples/blob/master/speech_to_text/python_sdk/soniox_sdk_realtime.py).
```python
import os
import argparse
from typing import Optional
from soniox import SonioxClient
from soniox.types import (
RealtimeSTTConfig,
StructuredContext,
TranslationConfig,
StructuredContextGeneralItem,
StructuredContextTranslationTerm,
)
from soniox.utils import render_tokens, start_audio_thread, throttle_audio
def get_config(audio_format: str, translation: Optional[str]) -> RealtimeSTTConfig:
config = RealtimeSTTConfig(
# Select the model to use.
# See: soniox.com/docs/stt/models
model="stt-rt-v4",
#
# Set language hints when possible to significantly improve accuracy.
# See: soniox.com/docs/stt/concepts/language-hints
language_hints=["en", "es"],
#
# Enable language identification. Each token will include a "language" field.
# See: soniox.com/docs/stt/concepts/language-identification
enable_language_identification=True,
#
# Enable speaker diarization. Each token will include a "speaker" field.
# See: soniox.com/docs/stt/concepts/speaker-diarization
enable_speaker_diarization=True,
#
# Set context to help the model understand your domain, recognize important terms,
# and apply custom vocabulary and translation preferences.
# See: soniox.com/docs/stt/concepts/context
context=StructuredContext(
general=[
StructuredContextGeneralItem(key="domain", value="Healthcare"),
StructuredContextGeneralItem(
key="topic", value="Diabetes management consultation"
),
StructuredContextGeneralItem(key="doctor", value="Dr. Martha Smith"),
StructuredContextGeneralItem(key="patient", value="Mr. David Miller"),
StructuredContextGeneralItem(
key="organization", value="St John's Hospital"
),
],
text="Mr. David Miller visited his healthcare provider last month for a routine follow-up related to diabetes care. The clinician reviewed his recent test results, noted improved glucose levels, and adjusted his medication schedule accordingly. They also discussed meal planning strategies and scheduled the next check-up for early spring.",
terms=[
"Celebrex",
"Zyrtec",
"Xanax",
"Prilosec",
"Amoxicillin Clavulanate Potassium",
],
translation_terms=[
StructuredContextTranslationTerm(
source="Mr. Smith", target="Sr. Smith"
),
StructuredContextTranslationTerm(
source="St John's", target="St John's"
),
StructuredContextTranslationTerm(source="stroke", target="ictus"),
],
),
#
# Use endpointing to detect when the speaker stops.
# It finalizes all non-final tokens right away, minimizing latency.
# See: soniox.com/docs/stt/rt/endpoint-detection
enable_endpoint_detection=True,
)
# Audio format.
# See: soniox.com/docs/stt/rt/real-time-transcription#audio-formats
if audio_format == "auto":
# Set to "auto" to let Soniox detect the audio format automatically.
config.audio_format = "auto"
elif audio_format == "pcm_s16le":
# Example of a raw audio format; Soniox supports many others as well.
config.audio_format = "pcm_s16le"
config.sample_rate = 16000
config.num_channels = 1
else:
raise ValueError(f"Unsupported audio_format: {audio_format}")
# Translation options.
# See: soniox.com/docs/stt/rt/real-time-translation#translation-modes
if translation == "none":
pass
elif translation == "one_way":
# Translates all languages into the target language.
config.translation = TranslationConfig(
type="one_way",
target_language="es",
)
elif translation == "two_way":
# Translates from language_a to language_b and back from language_b to language_a.
config.translation = TranslationConfig(
type="two_way",
language_a="en",
language_b="es",
)
else:
raise ValueError(f"Unsupported translation: {translation}")
return config
def run_session(
client: SonioxClient,
audio_path: str,
audio_format: str,
translation: str,
) -> None:
config = get_config(audio_format, translation)
print("Connecting to Soniox...")
with client.realtime.stt.connect(config=config) as session:
final_tokens = []
start_audio_thread(session, throttle_audio(audio_path, delay_seconds=0.1))
print("Session started.")
for event in session.receive_events():
# Error from server.
# See: https://soniox.com/docs/stt/api-reference/websocket-api#error-response
if event.error_code:
print(f"Error: {event.error_code} - {event.error_message}")
# Parse tokens from current response.
non_final_tokens = []
for token in event.tokens:
if token.is_final:
# Final tokens are returned once and should be appended to final_tokens.
final_tokens.append(token)
else:
# Non-final tokens update as more audio arrives; reset them on every response.
non_final_tokens.append(token)
# Render tokens.
print(render_tokens(final_tokens, non_final_tokens))
# Session finished.
if event.finished:
print("Session finished.")
def main():
parser = argparse.ArgumentParser()
parser.add_argument("--audio_path", type=str)
parser.add_argument("--audio_format", default="auto")
parser.add_argument("--translation", default="none")
args = parser.parse_args()
api_key = os.environ.get("SONIOX_API_KEY")
if api_key is None:
raise RuntimeError("Missing SONIOX_API_KEY.")
client = SonioxClient()
run_session(client, args.audio_path, args.audio_format, args.translation)
if __name__ == "__main__":
main()
```
```sh title="Terminal"
# Transcribe a live audio stream
python soniox_sdk_realtime.py --audio_path ../assets/coffee_shop.mp3
# Transcribe a raw audio live stream
python soniox_sdk_realtime.py --audio_path ../assets/coffee_shop.pcm_s16le --audio_format pcm_s16le
```
See on GitHub: [soniox\_sdk\_realtime.js](https://github.com/soniox/soniox_examples/blob/master/speech_to_text/nodejs_sdk/soniox_sdk_realtime.js).
```javascript
import { RealtimeUtteranceBuffer, SonioxNodeClient } from "@soniox/node";
import fs from "fs";
import { parseArgs } from "node:util";
import process from "process";
// Initialize the client.
// The API key is read from the SONIOX_API_KEY environment variable.
const client = new SonioxNodeClient();
// Get session config based on CLI arguments.
function getSessionConfig(audioFormat, translation) {
const config = {
// Select the model to use.
// See: soniox.com/docs/stt/models
model: "stt-rt-v4",
// Set language hints when possible to significantly improve accuracy.
// See: soniox.com/docs/stt/concepts/language-hints
language_hints: ["en", "es"],
// Enable language identification. Each token will include a "language" field.
// See: soniox.com/docs/stt/concepts/language-identification
enable_language_identification: true,
// Enable speaker diarization. Each token will include a "speaker" field.
// See: soniox.com/docs/stt/concepts/speaker-diarization
enable_speaker_diarization: true,
// Set context to help the model understand your domain, recognize important terms,
// and apply custom vocabulary and translation preferences.
// See: soniox.com/docs/stt/concepts/context
context: {
general: [
{ key: "domain", value: "Healthcare" },
{ key: "topic", value: "Diabetes management consultation" },
{ key: "doctor", value: "Dr. Martha Smith" },
{ key: "patient", value: "Mr. David Miller" },
{ key: "organization", value: "St John's Hospital" },
],
text: "Mr. David Miller visited his healthcare provider last month for a routine follow-up related to diabetes care. The clinician reviewed his recent test results, noted improved glucose levels, and adjusted his medication schedule accordingly. They also discussed meal planning strategies and scheduled the next check-up for early spring.",
terms: [
"Celebrex",
"Zyrtec",
"Xanax",
"Prilosec",
"Amoxicillin Clavulanate Potassium",
],
translation_terms: [
{ source: "Mr. Smith", target: "Sr. Smith" },
{ source: "St John's", target: "St John's" },
{ source: "stroke", target: "ictus" },
],
},
// Use endpointing to detect when the speaker stops.
// It finalizes all non-final tokens right away, minimizing latency.
// See: soniox.com/docs/stt/rt/endpoint-detection
enable_endpoint_detection: true,
};
// Audio format.
// See: soniox.com/docs/stt/rt/real-time-transcription#audio-formats
if (audioFormat === "auto") {
config.audio_format = "auto";
} else if (audioFormat === "pcm_s16le") {
config.audio_format = "pcm_s16le";
config.sample_rate = 16000;
config.num_channels = 1;
} else {
throw new Error(`Unsupported audio_format: ${audioFormat}`);
}
// Translation options.
// See: soniox.com/docs/stt/rt/real-time-translation#translation-modes
if (translation === "one_way") {
config.translation = { type: "one_way", target_language: "es" };
} else if (translation === "two_way") {
config.translation = {
type: "two_way",
language_a: "en",
language_b: "es",
};
} else if (translation !== "none") {
throw new Error(`Unsupported translation: ${translation}`);
}
return config;
}
// Render a single utterance as readable text.
function renderUtterance(utterance) {
return utterance.segments
.map((segment) => {
const speaker = segment.speaker ? `Speaker ${segment.speaker}:` : "";
const isTranslation =
segment.tokens[0]?.translation_status === "translation";
const lang = segment.language
? `${isTranslation ? "[Translation] " : ""}[${segment.language}]`
: "";
return `${speaker} ${lang} ${segment.text.trimStart()}`;
})
.join("\n");
}
async function runSession(audioPath, audioFormat, translation) {
const config = getSessionConfig(audioFormat, translation);
// Create a real-time STT session.
const session = client.realtime.stt(config);
// Utterance buffer collects tokens and flushes complete utterances on endpoints.
const buffer = new RealtimeUtteranceBuffer();
// Feed every result into the buffer.
session.on("result", (result) => {
buffer.addResult(result);
});
// When an endpoint is detected, flush the buffer into a complete utterance.
session.on("endpoint", () => {
const utterance = buffer.markEndpoint();
if (utterance) {
console.log(renderUtterance(utterance));
}
});
session.on("finished", () => {
// Flush any remaining tokens after the session ends.
const utterance = buffer.markEndpoint();
if (utterance) {
console.log(renderUtterance(utterance));
}
console.log("Session finished.");
});
session.on("error", (err) => {
console.error("Session error:", err);
});
// Connect to the Soniox realtime API.
console.log("Connecting to Soniox...");
await session.connect();
console.log("Session started.");
// Stream the audio file and finish when done.
await session.sendStream(
fs.createReadStream(audioPath, { highWaterMark: 3840 }),
{ pace_ms: 120, finish: true },
);
}
async function main() {
const { values: argv } = parseArgs({
options: {
audio_path: { type: "string" },
audio_format: { type: "string", default: "auto" },
translation: { type: "string", default: "none" },
},
});
if (!argv.audio_path) {
throw new Error("Missing --audio_path argument.");
}
await runSession(argv.audio_path, argv.audio_format, argv.translation);
}
main().catch((err) => {
console.error("Error:", err.message);
process.exit(1);
});
```
```sh title="Terminal"
# Transcribe a live audio stream
node soniox_sdk_realtime.js --audio_path ../assets/coffee_shop.mp3
# Transcribe a raw audio live stream
node soniox_sdk_realtime.js --audio_path ../assets/coffee_shop.pcm_s16le --audio_format pcm_s16le
```
See on GitHub: [soniox\_realtime.py](https://github.com/soniox/soniox_examples/blob/master/speech_to_text/python/soniox_realtime.py).
```python
import json
import os
import threading
import time
import argparse
from typing import Optional
from websockets import ConnectionClosedOK
from websockets.sync.client import connect
SONIOX_WEBSOCKET_URL = "wss://stt-rt.soniox.com/transcribe-websocket"
# Get Soniox STT config.
def get_config(api_key: str, audio_format: str, translation: str) -> dict:
config = {
# Get your API key at console.soniox.com, then run: export SONIOX_API_KEY=
"api_key": api_key,
#
# Select the model to use.
# See: soniox.com/docs/stt/models
"model": "stt-rt-v4",
#
# Set language hints when possible to significantly improve accuracy.
# See: soniox.com/docs/stt/concepts/language-hints
"language_hints": ["en", "es"],
#
# Enable language identification. Each token will include a "language" field.
# See: soniox.com/docs/stt/concepts/language-identification
"enable_language_identification": True,
#
# Enable speaker diarization. Each token will include a "speaker" field.
# See: soniox.com/docs/stt/concepts/speaker-diarization
"enable_speaker_diarization": True,
#
# Set context to help the model understand your domain, recognize important terms,
# and apply custom vocabulary and translation preferences.
# See: soniox.com/docs/stt/concepts/context
"context": {
"general": [
{"key": "domain", "value": "Healthcare"},
{"key": "topic", "value": "Diabetes management consultation"},
{"key": "doctor", "value": "Dr. Martha Smith"},
{"key": "patient", "value": "Mr. David Miller"},
{"key": "organization", "value": "St John's Hospital"},
],
"text": "Mr. David Miller visited his healthcare provider last month for a routine follow-up related to diabetes care. The clinician reviewed his recent test results, noted improved glucose levels, and adjusted his medication schedule accordingly. They also discussed meal planning strategies and scheduled the next check-up for early spring.",
"terms": [
"Celebrex",
"Zyrtec",
"Xanax",
"Prilosec",
"Amoxicillin Clavulanate Potassium",
],
"translation_terms": [
{"source": "Mr. Smith", "target": "Sr. Smith"},
{"source": "St John's", "target": "St John's"},
{"source": "stroke", "target": "ictus"},
],
},
#
# Use endpointing to detect when the speaker stops.
# It finalizes all non-final tokens right away, minimizing latency.
# See: soniox.com/docs/stt/rt/endpoint-detection
"enable_endpoint_detection": True,
}
# Audio format.
# See: soniox.com/docs/stt/rt/real-time-transcription#audio-formats
if audio_format == "auto":
# Set to "auto" to let Soniox detect the audio format automatically.
config["audio_format"] = "auto"
elif audio_format == "pcm_s16le":
# Example of a raw audio format; Soniox supports many others as well.
config["audio_format"] = "pcm_s16le"
config["sample_rate"] = 16000
config["num_channels"] = 1
else:
raise ValueError(f"Unsupported audio_format: {audio_format}")
# Translation options.
# See: soniox.com/docs/stt/rt/real-time-translation#translation-modes
if translation == "none":
pass
elif translation == "one_way":
# Translates all languages into the target language.
config["translation"] = {
"type": "one_way",
"target_language": "es",
}
elif translation == "two_way":
# Translates from language_a to language_b and back from language_b to language_a.
config["translation"] = {
"type": "two_way",
"language_a": "en",
"language_b": "es",
}
else:
raise ValueError(f"Unsupported translation: {translation}")
return config
# Read the audio file and send its bytes to the websocket.
def stream_audio(audio_path: str, ws) -> None:
with open(audio_path, "rb") as fh:
while True:
data = fh.read(3840)
if len(data) == 0:
break
ws.send(data)
# Sleep for 120 ms to simulate real-time streaming.
time.sleep(0.120)
# Empty string signals end-of-audio to the server
ws.send("")
# Convert tokens into a readable transcript.
def render_tokens(final_tokens: list[dict], non_final_tokens: list[dict]) -> str:
text_parts: list[str] = []
current_speaker: Optional[str] = None
current_language: Optional[str] = None
# Process all tokens in order.
for token in final_tokens + non_final_tokens:
text = token["text"]
speaker = token.get("speaker")
language = token.get("language")
is_translation = token.get("translation_status") == "translation"
# Speaker changed -> add a speaker tag.
if speaker is not None and speaker != current_speaker:
if current_speaker is not None:
text_parts.append("\n\n")
current_speaker = speaker
current_language = None # Reset language on speaker changes.
text_parts.append(f"Speaker {current_speaker}:")
# Language changed -> add a language or translation tag.
if language is not None and language != current_language:
current_language = language
prefix = "[Translation] " if is_translation else ""
text_parts.append(f"\n{prefix}[{current_language}] ")
text = text.lstrip()
text_parts.append(text)
text_parts.append("\n===============================")
return "".join(text_parts)
def run_session(
api_key: str,
audio_path: str,
audio_format: str,
translation: str,
) -> None:
config = get_config(api_key, audio_format, translation)
print("Connecting to Soniox...")
with connect(SONIOX_WEBSOCKET_URL) as ws:
# Send first request with config.
ws.send(json.dumps(config))
# Start streaming audio in the background.
threading.Thread(
target=stream_audio,
args=(audio_path, ws),
daemon=True,
).start()
print("Session started.")
final_tokens: list[dict] = []
try:
while True:
message = ws.recv()
res = json.loads(message)
# Error from server.
# See: https://soniox.com/docs/stt/api-reference/websocket-api#error-response
if res.get("error_code") is not None:
print(f"Error: {res['error_code']} - {res['error_message']}")
break
# Parse tokens from current response.
non_final_tokens: list[dict] = []
for token in res.get("tokens", []):
if token.get("text"):
if token.get("is_final"):
# Final tokens are returned once and should be appended to final_tokens.
final_tokens.append(token)
else:
# Non-final tokens update as more audio arrives; reset them on every response.
non_final_tokens.append(token)
# Render tokens.
text = render_tokens(final_tokens, non_final_tokens)
print(text)
# Session finished.
if res.get("finished"):
print("Session finished.")
except ConnectionClosedOK:
# Normal, server closed after finished.
pass
except KeyboardInterrupt:
print("\nInterrupted by user.")
except Exception as e:
print(f"Error: {e}")
def main():
parser = argparse.ArgumentParser()
parser.add_argument("--audio_path", type=str)
parser.add_argument("--audio_format", default="auto")
parser.add_argument("--translation", default="none")
args = parser.parse_args()
api_key = os.environ.get("SONIOX_API_KEY")
if api_key is None:
raise RuntimeError("Missing SONIOX_API_KEY.")
run_session(api_key, args.audio_path, args.audio_format, args.translation)
if __name__ == "__main__":
main()
```
```sh title="Terminal"
# Transcribe a live audio stream
python soniox_realtime.py --audio_path ../assets/coffee_shop.mp3
# Transcribe a raw audio live stream
python soniox_realtime.py --audio_path ../assets/coffee_shop.pcm_s16le --audio_format pcm_s16le
```
See on GitHub: [soniox\_realtime.js](https://github.com/soniox/soniox_examples/blob/master/speech_to_text/nodejs/soniox_realtime.js).
```js
import fs from "fs";
import WebSocket from "ws";
import { parseArgs } from "node:util";
const SONIOX_WEBSOCKET_URL = "wss://stt-rt.soniox.com/transcribe-websocket";
// Get Soniox STT config
function getConfig(apiKey, audioFormat, translation) {
const config = {
// Get your API key at console.soniox.com, then run: export SONIOX_API_KEY=
api_key: apiKey,
// Select the model to use.
// See: soniox.com/docs/stt/models
model: "stt-rt-v4",
// Set language hints when possible to significantly improve accuracy.
// See: soniox.com/docs/stt/concepts/language-hints
language_hints: ["en", "es"],
// Enable language identification. Each token will include a "language" field.
// See: soniox.com/docs/stt/concepts/language-identification
enable_language_identification: true,
// Enable speaker diarization. Each token will include a "speaker" field.
// See: soniox.com/docs/stt/concepts/speaker-diarization
enable_speaker_diarization: true,
// Set context to help the model understand your domain, recognize important terms,
// and apply custom vocabulary and translation preferences.
// See: soniox.com/docs/stt/concepts/context
context: {
general: [
{ key: "domain", value: "Healthcare" },
{ key: "topic", value: "Diabetes management consultation" },
{ key: "doctor", value: "Dr. Martha Smith" },
{ key: "patient", value: "Mr. David Miller" },
{ key: "organization", value: "St John's Hospital" },
],
text: "Mr. David Miller visited his healthcare provider last month for a routine follow-up related to diabetes care. The clinician reviewed his recent test results, noted improved glucose levels, and adjusted his medication schedule accordingly. They also discussed meal planning strategies and scheduled the next check-up for early spring.",
terms: [
"Celebrex",
"Zyrtec",
"Xanax",
"Prilosec",
"Amoxicillin Clavulanate Potassium",
],
translation_terms: [
{ source: "Mr. Smith", target: "Sr. Smith" },
{ source: "St John's", target: "St John's" },
{ source: "stroke", target: "ictus" },
],
},
// Use endpointing to detect when the speaker stops.
// It finalizes all non-final tokens right away, minimizing latency.
// See: soniox.com/docs/stt/rt/endpoint-detection
enable_endpoint_detection: true,
};
// Audio format.
// See: soniox.com/docs/stt/rt/real-time-transcription#audio-formats
if (audioFormat === "auto") {
// Set to "auto" to let Soniox detect the audio format automatically.
config.audio_format = "auto";
} else if (audioFormat === "pcm_s16le") {
// Example of a raw audio format; Soniox supports many others as well.
config.audio_format = "pcm_s16le";
config.sample_rate = 16000;
config.num_channels = 1;
} else {
throw new Error(`Unsupported audio_format: ${audioFormat}`);
}
// Translation options.
// See: soniox.com/docs/stt/rt/real-time-translation#translation-modes
if (translation === "one_way") {
// Translates all languages into the target language.
config.translation = { type: "one_way", target_language: "es" };
} else if (translation === "two_way") {
// Translates from language_a to language_b and back from language_b to language_a.
config.translation = {
type: "two_way",
language_a: "en",
language_b: "es",
};
} else if (translation !== "none") {
throw new Error(`Unsupported translation: ${translation}`);
}
return config;
}
// Read the audio file and send its bytes to the websocket.
async function streamAudio(audioPath, ws) {
const stream = fs.createReadStream(audioPath, { highWaterMark: 3840 });
for await (const chunk of stream) {
ws.send(chunk);
// Sleep for 120 ms to simulate real-time streaming.
await new Promise((res) => setTimeout(res, 120));
}
// Empty string signals end-of-audio to the server
ws.send("");
}
// Convert tokens into readable transcript
function renderTokens(finalTokens, nonFinalTokens) {
let textParts = [];
let currentSpeaker = null;
let currentLanguage = null;
const allTokens = [...finalTokens, ...nonFinalTokens];
// Process all tokens in order.
for (const token of allTokens) {
let { text, speaker, language } = token;
const isTranslation = token.translation_status === "translation";
// Speaker changed -> add a speaker tag.
if (speaker && speaker !== currentSpeaker) {
if (currentSpeaker !== null) textParts.push("\n\n");
currentSpeaker = speaker;
currentLanguage = null; // Reset language on speaker changes.
textParts.push(`Speaker ${currentSpeaker}:`);
}
// Language changed -> add a language or translation tag.
if (language && language !== currentLanguage) {
currentLanguage = language;
const prefix = isTranslation ? "[Translation] " : "";
textParts.push(`\n${prefix}[${currentLanguage}] `);
text = text.trimStart();
}
textParts.push(text);
}
textParts.push("\n===============================");
return textParts.join("");
}
function runSession(apiKey, audioPath, audioFormat, translation) {
const config = getConfig(apiKey, audioFormat, translation);
console.log("Connecting to Soniox...");
const ws = new WebSocket(SONIOX_WEBSOCKET_URL);
let finalTokens = [];
ws.on("open", () => {
// Send first request with config.
ws.send(JSON.stringify(config));
// Start streaming audio in the background.
streamAudio(audioPath, ws).catch((err) =>
console.error("Audio stream error:", err),
);
console.log("Session started.");
});
ws.on("message", (msg) => {
const res = JSON.parse(msg.toString());
// Error from server.
// See: https://soniox.com/docs/stt/api-reference/websocket-api#error-response
if (res.error_code) {
console.error(`Error: ${res.error_code} - ${res.error_message}`);
ws.close();
return;
}
// Parse tokens from current response.
let nonFinalTokens = [];
if (res.tokens) {
for (const token of res.tokens) {
if (token.text) {
if (token.is_final) {
// Final tokens are returned once and should be appended to final_tokens.
finalTokens.push(token);
} else {
// Non-final tokens update as more audio arrives; reset them on every response.
nonFinalTokens.push(token);
}
}
}
}
// Render tokens.
const text = renderTokens(finalTokens, nonFinalTokens);
console.log(text);
// Session finished.
if (res.finished) {
console.log("Session finished.");
ws.close();
}
});
ws.on("error", (err) => console.error("WebSocket error:", err));
}
async function main() {
const { values: argv } = parseArgs({
options: {
audio_path: { type: "string" },
audio_format: { type: "string", default: "auto" },
translation: { type: "string", default: "none" },
},
});
// Note: node:util parseArgs has no "required" option, so check explicitly.
if (!argv.audio_path) {
throw new Error("Missing --audio_path argument.");
}
const apiKey = process.env.SONIOX_API_KEY;
if (!apiKey) {
throw new Error(
"Missing SONIOX_API_KEY.\n" +
"1. Get your API key at https://console.soniox.com\n" +
"2. Run: export SONIOX_API_KEY=",
);
}
runSession(apiKey, argv.audio_path, argv.audio_format, argv.translation);
}
main().catch((err) => {
console.error("Error:", err.message);
process.exit(1);
});
```
```sh title="Terminal"
# Transcribe a live audio stream
node soniox_realtime.js --audio_path ../assets/coffee_shop.mp3
# Transcribe a raw audio live stream
node soniox_realtime.js --audio_path ../assets/coffee_shop.pcm_s16le --audio_format pcm_s16le
```
# Real-time translation
URL: /stt/rt/real-time-translation
Learn how real-time translation works.
import { CodeBlock, Pre } from "@/components/codeblock";
import { DynamicCodeBlock } from "@/components/dynamic-codeblock";
import { LuTriangleAlert } from "react-icons/lu";
## Overview
Soniox Speech-to-Text AI introduces a new kind of translation designed
for low-latency applications. Unlike traditional systems that wait until
the end of a sentence before producing a translation, Soniox translates
**mid-sentence**—as words are spoken. This enables a completely new
experience: you can follow conversations across languages in real time, without
delays.
***
## How it works
* **Always transcribes speech:** every spoken word is transcribed, regardless of translation settings.
* **Translation:** choose between:
* **One-way translation** → translate all speech into a single target language.
* **Two-way translation** → translate back and forth between two languages.
* **Low latency:** translations are streamed in chunks, balancing speed and accuracy.
* **Unified token stream:** transcriptions and translations arrive together, labeled for easy handling.
### Example
Speaker says:
```json
"Hello everyone, thank you for joining us today."
```
The token stream unfolds like this:
```text
[Transcription] Hello everyone,
[Translation] Bonjour à tous,
[Transcription] thank you
[Translation] merci
[Transcription] for joining us
[Translation] de nous avoir rejoints
[Transcription] today.
[Translation] aujourd'hui.
```
Notice how:
* **Transcription tokens arrive first,** as soon as words are recognized.
* **Translation tokens follow,** chunk by chunk, without waiting for the full sentence.
* Developers can display tokens immediately for **low latency transcription and translation.**
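The interleaving above can be reproduced with a minimal rendering sketch. The token list is illustrative (real responses carry more fields); it uses only the `text` and `translation_status` fields documented on this page:

```python
# Minimal sketch: label each token as transcription or translation.
def render_stream(tokens):
    lines = []
    for token in tokens:
        is_translation = token.get("translation_status") == "translation"
        tag = "[Translation]" if is_translation else "[Transcription]"
        lines.append(f"{tag} {token['text']}")
    return "\n".join(lines)

# Illustrative tokens for the example sentence above.
tokens = [
    {"text": "Hello everyone,", "translation_status": "original"},
    {"text": "Bonjour à tous,", "translation_status": "translation"},
    {"text": "thank you", "translation_status": "original"},
    {"text": "merci", "translation_status": "translation"},
]
print(render_stream(tokens))
# [Transcription] Hello everyone,
# [Translation] Bonjour à tous,
# [Transcription] thank you
# [Translation] merci
```

Because each token is labeled, the UI can render transcription and translation side by side as tokens arrive, without buffering a full sentence.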
***
## Translation modes
Soniox provides two translation modes: translate all speech into a single target language, or enable seamless two-way conversations between languages.
### One-way translation
Translate **all spoken languages** into a single target language.
**Example: translate everything into French**
```json
{
"translation": {
"type": "one_way",
"target_language": "fr"
}
}
```
* All speech is **transcribed.**
* All speech is **translated into French.**
### Two-way translation
Translate **back and forth** between two specified languages.
**Example: Japanese ⟷ Korean**
```json
{
"translation": {
"type": "two_way",
"language_a": "ja",
"language_b": "ko"
}
}
```
* All speech is **transcribed.**
* Japanese speech is **translated into Korean.**
* Korean speech is **translated into Japanese.**
***
## Token format
Each result (transcription or translation) is returned as a **token** with clear metadata.
| Field | Description |
| -------------------- | ---------------------------------------------------------------------------------------------------- |
| `text` | Token text |
| `translation_status` | `"none"` (not translated), `"original"` (spoken text), `"translation"` (translated text) |
| `language` | Language of the token |
| `source_language` | Original language (only for translated tokens) |
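A common first step is splitting the unified stream into spoken and translated text. This sketch uses the field names from the table above; the token list itself is illustrative:

```python
# Separate spoken text from translated text using translation_status.
def split_stream(tokens):
    spoken, translated = [], []
    for token in tokens:
        if token.get("translation_status") == "translation":
            translated.append(token["text"])
        else:
            # "none" or "original" means the token was actually spoken.
            spoken.append(token["text"])
    return " ".join(spoken), " ".join(translated)

# Illustrative tokens.
tokens = [
    {"text": "Good morning", "translation_status": "original",
     "language": "en"},
    {"text": "Guten Morgen", "translation_status": "translation",
     "language": "de", "source_language": "en"},
]
spoken, translated = split_stream(tokens)
print(spoken)      # Good morning
print(translated)  # Guten Morgen
```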
### Example: two-way translation
Two-way translation between English (`en`) and German (`de`).
**Config**
```json
{
"translation": {
"type": "two_way",
"language_a": "en",
"language_b": "de"
}
}
```
**Text**
```text
[en] Good morning
[de] Guten Morgen
[de] Wie geht’s?
[en] How are you?
[fr] Bonjour à tous
(fr is only transcribed, not translated)
[en] I’m fine, thanks.
[de] Mir geht’s gut, danke.
```
**Tokens**
Transcription and translation chunks follow each
other, but tokens are not mapped one-to-one and may not align.
***
## Supported languages
**All pairs supported** — translate between any two [supported languages](/stt/concepts/supported-languages).
***
## Timestamps
* **Spoken tokens** (`translation_status: "none"` or `"original"`) include timestamps (`start_ms`, `end_ms`) that align to the exact position in the audio.
* **Translated tokens do not** include timestamps, since they are generated
immediately after the spoken tokens and directly follow their timing.
This way, you can always align transcripts to the original audio, while translations stream naturally in sequence.
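The rule above can be sketched as follows: collect audio-aligned spans from spoken tokens only, skipping translated tokens, which carry no `start_ms`/`end_ms`. The token list is illustrative:

```python
# Collect (start_ms, end_ms, text) spans for spoken tokens only.
# Translated tokens have no timestamps, so they are skipped here.
def spoken_spans(tokens):
    spans = []
    for token in tokens:
        if token.get("translation_status") in ("none", "original"):
            spans.append((token["start_ms"], token["end_ms"], token["text"]))
    return spans

# Illustrative tokens: one spoken, one translated.
tokens = [
    {"text": "Good morning", "translation_status": "original",
     "start_ms": 120, "end_ms": 860},
    {"text": "Guten Morgen", "translation_status": "translation"},
]
print(spoken_spans(tokens))  # [(120, 860, 'Good morning')]
```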
***
## Code example
**Prerequisite:** Complete the steps in [Get started](/stt/get-started).
See on GitHub: [soniox\_sdk\_realtime.py](https://github.com/soniox/soniox_examples/blob/master/speech_to_text/python_sdk/soniox_sdk_realtime.py).
```python
import os
import argparse
from typing import Optional
from soniox import SonioxClient
from soniox.types import (
RealtimeSTTConfig,
StructuredContext,
TranslationConfig,
StructuredContextGeneralItem,
StructuredContextTranslationTerm,
)
from soniox.utils import render_tokens, start_audio_thread, throttle_audio
def get_config(audio_format: str, translation: Optional[str]) -> RealtimeSTTConfig:
config = RealtimeSTTConfig(
# Select the model to use.
# See: soniox.com/docs/stt/models
model="stt-rt-v4",
#
# Set language hints when possible to significantly improve accuracy.
# See: soniox.com/docs/stt/concepts/language-hints
language_hints=["en", "es"],
#
# Enable language identification. Each token will include a "language" field.
# See: soniox.com/docs/stt/concepts/language-identification
enable_language_identification=True,
#
# Enable speaker diarization. Each token will include a "speaker" field.
# See: soniox.com/docs/stt/concepts/speaker-diarization
enable_speaker_diarization=True,
#
# Set context to help the model understand your domain, recognize important terms,
# and apply custom vocabulary and translation preferences.
# See: soniox.com/docs/stt/concepts/context
context=StructuredContext(
general=[
StructuredContextGeneralItem(key="domain", value="Healthcare"),
StructuredContextGeneralItem(
key="topic", value="Diabetes management consultation"
),
StructuredContextGeneralItem(key="doctor", value="Dr. Martha Smith"),
StructuredContextGeneralItem(key="patient", value="Mr. David Miller"),
StructuredContextGeneralItem(
key="organization", value="St John's Hospital"
),
],
text="Mr. David Miller visited his healthcare provider last month for a routine follow-up related to diabetes care. The clinician reviewed his recent test results, noted improved glucose levels, and adjusted his medication schedule accordingly. They also discussed meal planning strategies and scheduled the next check-up for early spring.",
terms=[
"Celebrex",
"Zyrtec",
"Xanax",
"Prilosec",
"Amoxicillin Clavulanate Potassium",
],
translation_terms=[
StructuredContextTranslationTerm(
source="Mr. Smith", target="Sr. Smith"
),
StructuredContextTranslationTerm(
source="St John's", target="St John's"
),
StructuredContextTranslationTerm(source="stroke", target="ictus"),
],
),
#
# Use endpointing to detect when the speaker stops.
# It finalizes all non-final tokens right away, minimizing latency.
# See: soniox.com/docs/stt/rt/endpoint-detection
enable_endpoint_detection=True,
)
# Audio format.
# See: soniox.com/docs/stt/rt/real-time-transcription#audio-formats
if audio_format == "auto":
# Set to "auto" to let Soniox detect the audio format automatically.
config.audio_format = "auto"
elif audio_format == "pcm_s16le":
# Example of a raw audio format; Soniox supports many others as well.
config.audio_format = "pcm_s16le"
config.sample_rate = 16000
config.num_channels = 1
else:
raise ValueError(f"Unsupported audio_format: {audio_format}")
# Translation options.
# See: soniox.com/docs/stt/rt/real-time-translation#translation-modes
if translation == "none":
pass
elif translation == "one_way":
# Translates all languages into the target language.
config.translation = TranslationConfig(
type="one_way",
target_language="es",
)
elif translation == "two_way":
# Translates from language_a to language_b and back from language_b to language_a.
config.translation = TranslationConfig(
type="two_way",
language_a="en",
language_b="es",
)
else:
raise ValueError(f"Unsupported translation: {translation}")
return config
def run_session(
client: SonioxClient,
audio_path: str,
audio_format: str,
translation: str,
) -> None:
config = get_config(audio_format, translation)
print("Connecting to Soniox...")
with client.realtime.stt.connect(config=config) as session:
final_tokens = []
start_audio_thread(session, throttle_audio(audio_path, delay_seconds=0.1))
print("Session started.")
for event in session.receive_events():
# Error from server.
# See: https://soniox.com/docs/stt/api-reference/websocket-api#error-response
if event.error_code:
print(f"Error: {event.error_code} - {event.error_message}")
# Parse tokens from current response.
non_final_tokens = []
for token in event.tokens:
if token.is_final:
# Final tokens are returned once and should be appended to final_tokens.
final_tokens.append(token)
else:
# Non-final tokens update as more audio arrives; reset them on every response.
non_final_tokens.append(token)
# Render tokens.
print(render_tokens(final_tokens, non_final_tokens))
# Session finished.
if event.finished:
print("Session finished.")
def main():
parser = argparse.ArgumentParser()
parser.add_argument("--audio_path", type=str)
parser.add_argument("--audio_format", default="auto")
parser.add_argument("--translation", default="none")
args = parser.parse_args()
api_key = os.environ.get("SONIOX_API_KEY")
if api_key is None:
raise RuntimeError("Missing SONIOX_API_KEY.")
client = SonioxClient()
run_session(client, args.audio_path, args.audio_format, args.translation)
if __name__ == "__main__":
main()
```
```sh title="Terminal"
# One-way translation of a live audio stream
python soniox_sdk_realtime.py --audio_path ../assets/coffee_shop.mp3 --translation one_way
# Two-way translation of a live audio stream
python soniox_sdk_realtime.py --audio_path ../assets/two_way_translation.mp3 --translation two_way
```
See on GitHub: [soniox\_sdk\_realtime.js](https://github.com/soniox/soniox_examples/blob/master/speech_to_text/nodejs_sdk/soniox_sdk_realtime.js).
```js
import { RealtimeUtteranceBuffer, SonioxNodeClient } from "@soniox/node";
import fs from "fs";
import { parseArgs } from "node:util";
import process from "process";
// Initialize the client.
// The API key is read from the SONIOX_API_KEY environment variable.
const client = new SonioxNodeClient();
// Get session config based on CLI arguments.
function getSessionConfig(audioFormat, translation) {
const config = {
// Select the model to use.
// See: soniox.com/docs/stt/models
model: "stt-rt-v4",
// Set language hints when possible to significantly improve accuracy.
// See: soniox.com/docs/stt/concepts/language-hints
language_hints: ["en", "es"],
// Enable language identification. Each token will include a "language" field.
// See: soniox.com/docs/stt/concepts/language-identification
enable_language_identification: true,
// Enable speaker diarization. Each token will include a "speaker" field.
// See: soniox.com/docs/stt/concepts/speaker-diarization
enable_speaker_diarization: true,
// Set context to help the model understand your domain, recognize important terms,
// and apply custom vocabulary and translation preferences.
// See: soniox.com/docs/stt/concepts/context
context: {
general: [
{ key: "domain", value: "Healthcare" },
{ key: "topic", value: "Diabetes management consultation" },
{ key: "doctor", value: "Dr. Martha Smith" },
{ key: "patient", value: "Mr. David Miller" },
{ key: "organization", value: "St John's Hospital" },
],
text: "Mr. David Miller visited his healthcare provider last month for a routine follow-up related to diabetes care. The clinician reviewed his recent test results, noted improved glucose levels, and adjusted his medication schedule accordingly. They also discussed meal planning strategies and scheduled the next check-up for early spring.",
terms: [
"Celebrex",
"Zyrtec",
"Xanax",
"Prilosec",
"Amoxicillin Clavulanate Potassium",
],
translation_terms: [
{ source: "Mr. Smith", target: "Sr. Smith" },
{ source: "St John's", target: "St John's" },
{ source: "stroke", target: "ictus" },
],
},
// Use endpointing to detect when the speaker stops.
// It finalizes all non-final tokens right away, minimizing latency.
// See: soniox.com/docs/stt/rt/endpoint-detection
enable_endpoint_detection: true,
};
// Audio format.
// See: soniox.com/docs/stt/rt/real-time-transcription#audio-formats
if (audioFormat === "auto") {
config.audio_format = "auto";
} else if (audioFormat === "pcm_s16le") {
config.audio_format = "pcm_s16le";
config.sample_rate = 16000;
config.num_channels = 1;
} else {
throw new Error(`Unsupported audio_format: ${audioFormat}`);
}
// Translation options.
// See: soniox.com/docs/stt/rt/real-time-translation#translation-modes
if (translation === "one_way") {
config.translation = { type: "one_way", target_language: "es" };
} else if (translation === "two_way") {
config.translation = {
type: "two_way",
language_a: "en",
language_b: "es",
};
} else if (translation !== "none") {
throw new Error(`Unsupported translation: ${translation}`);
}
return config;
}
// Render a single utterance as readable text.
function renderUtterance(utterance) {
return utterance.segments
.map((segment) => {
const speaker = segment.speaker ? `Speaker ${segment.speaker}:` : "";
const isTranslation =
segment.tokens[0]?.translation_status === "translation";
const lang = segment.language
? `${isTranslation ? "[Translation] " : ""}[${segment.language}]`
: "";
return `${speaker} ${lang} ${segment.text.trimStart()}`;
})
.join("\n");
}
async function runSession(audioPath, audioFormat, translation) {
const config = getSessionConfig(audioFormat, translation);
// Create a real-time STT session.
const session = client.realtime.stt(config);
// Utterance buffer collects tokens and flushes complete utterances on endpoints.
const buffer = new RealtimeUtteranceBuffer();
// Feed every result into the buffer.
session.on("result", (result) => {
buffer.addResult(result);
});
// When an endpoint is detected, flush the buffer into a complete utterance.
session.on("endpoint", () => {
const utterance = buffer.markEndpoint();
if (utterance) {
console.log(renderUtterance(utterance));
}
});
session.on("finished", () => {
// Flush any remaining tokens after the session ends.
const utterance = buffer.markEndpoint();
if (utterance) {
console.log(renderUtterance(utterance));
}
console.log("Session finished.");
});
session.on("error", (err) => {
console.error("Session error:", err);
});
// Connect to the Soniox realtime API.
console.log("Connecting to Soniox...");
await session.connect();
console.log("Session started.");
// Stream the audio file and finish when done.
await session.sendStream(
fs.createReadStream(audioPath, { highWaterMark: 3840 }),
{ pace_ms: 120, finish: true },
);
}
async function main() {
const { values: argv } = parseArgs({
options: {
audio_path: { type: "string" },
audio_format: { type: "string", default: "auto" },
translation: { type: "string", default: "none" },
},
});
if (!argv.audio_path) {
throw new Error("Missing --audio_path argument.");
}
await runSession(argv.audio_path, argv.audio_format, argv.translation);
}
main().catch((err) => {
console.error("Error:", err.message);
process.exit(1);
});
```
```sh title="Terminal"
# One-way translation of a live audio stream
node soniox_sdk_realtime.js --audio_path ../assets/coffee_shop.mp3 --translation one_way
# Two-way translation of a live audio stream
node soniox_sdk_realtime.js --audio_path ../assets/two_way_translation.mp3 --translation two_way
```
See on GitHub: [soniox\_realtime.py](https://github.com/soniox/soniox_examples/blob/master/speech_to_text/python/soniox_realtime.py).
```python
import json
import os
import threading
import time
import argparse
from typing import Optional
from websockets import ConnectionClosedOK
from websockets.sync.client import connect
SONIOX_WEBSOCKET_URL = "wss://stt-rt.soniox.com/transcribe-websocket"
# Get Soniox STT config.
def get_config(api_key: str, audio_format: str, translation: str) -> dict:
    config = {
        # Get your API key at console.soniox.com, then run: export SONIOX_API_KEY=
        "api_key": api_key,
        #
        # Select the model to use.
        # See: soniox.com/docs/stt/models
        "model": "stt-rt-v4",
        #
        # Set language hints when possible to significantly improve accuracy.
        # See: soniox.com/docs/stt/concepts/language-hints
        "language_hints": ["en", "es"],
        #
        # Enable language identification. Each token will include a "language" field.
        # See: soniox.com/docs/stt/concepts/language-identification
        "enable_language_identification": True,
        #
        # Enable speaker diarization. Each token will include a "speaker" field.
        # See: soniox.com/docs/stt/concepts/speaker-diarization
        "enable_speaker_diarization": True,
        #
        # Set context to help the model understand your domain, recognize important terms,
        # and apply custom vocabulary and translation preferences.
        # See: soniox.com/docs/stt/concepts/context
        "context": {
            "general": [
                {"key": "domain", "value": "Healthcare"},
                {"key": "topic", "value": "Diabetes management consultation"},
                {"key": "doctor", "value": "Dr. Martha Smith"},
                {"key": "patient", "value": "Mr. David Miller"},
                {"key": "organization", "value": "St John's Hospital"},
            ],
            "text": "Mr. David Miller visited his healthcare provider last month for a routine follow-up related to diabetes care. The clinician reviewed his recent test results, noted improved glucose levels, and adjusted his medication schedule accordingly. They also discussed meal planning strategies and scheduled the next check-up for early spring.",
            "terms": [
                "Celebrex",
                "Zyrtec",
                "Xanax",
                "Prilosec",
                "Amoxicillin Clavulanate Potassium",
            ],
            "translation_terms": [
                {"source": "Mr. Smith", "target": "Sr. Smith"},
                {"source": "St John's", "target": "St John's"},
                {"source": "stroke", "target": "ictus"},
            ],
        },
        #
        # Use endpointing to detect when the speaker stops.
        # It finalizes all non-final tokens right away, minimizing latency.
        # See: soniox.com/docs/stt/rt/endpoint-detection
        "enable_endpoint_detection": True,
    }

    # Audio format.
    # See: soniox.com/docs/stt/rt/real-time-transcription#audio-formats
    if audio_format == "auto":
        # Set to "auto" to let Soniox detect the audio format automatically.
        config["audio_format"] = "auto"
    elif audio_format == "pcm_s16le":
        # Example of a raw audio format; Soniox supports many others as well.
        config["audio_format"] = "pcm_s16le"
        config["sample_rate"] = 16000
        config["num_channels"] = 1
    else:
        raise ValueError(f"Unsupported audio_format: {audio_format}")

    # Translation options.
    # See: soniox.com/docs/stt/rt/real-time-translation#translation-modes
    if translation == "none":
        pass
    elif translation == "one_way":
        # Translates all languages into the target language.
        config["translation"] = {
            "type": "one_way",
            "target_language": "es",
        }
    elif translation == "two_way":
        # Translates from language_a to language_b and back from language_b to language_a.
        config["translation"] = {
            "type": "two_way",
            "language_a": "en",
            "language_b": "es",
        }
    else:
        raise ValueError(f"Unsupported translation: {translation}")

    return config


# Read the audio file and send its bytes to the websocket.
def stream_audio(audio_path: str, ws) -> None:
    with open(audio_path, "rb") as fh:
        while True:
            data = fh.read(3840)
            if len(data) == 0:
                break
            ws.send(data)
            # Sleep for 120 ms to simulate real-time streaming.
            time.sleep(0.120)
    # Empty string signals end-of-audio to the server.
    ws.send("")


# Convert tokens into a readable transcript.
def render_tokens(final_tokens: list[dict], non_final_tokens: list[dict]) -> str:
    text_parts: list[str] = []
    current_speaker: Optional[str] = None
    current_language: Optional[str] = None

    # Process all tokens in order.
    for token in final_tokens + non_final_tokens:
        text = token["text"]
        speaker = token.get("speaker")
        language = token.get("language")
        is_translation = token.get("translation_status") == "translation"

        # Speaker changed -> add a speaker tag.
        if speaker is not None and speaker != current_speaker:
            if current_speaker is not None:
                text_parts.append("\n\n")
            current_speaker = speaker
            current_language = None  # Reset language on speaker changes.
            text_parts.append(f"Speaker {current_speaker}:")

        # Language changed -> add a language or translation tag.
        if language is not None and language != current_language:
            current_language = language
            prefix = "[Translation] " if is_translation else ""
            text_parts.append(f"\n{prefix}[{current_language}] ")
            text = text.lstrip()

        text_parts.append(text)

    text_parts.append("\n===============================")
    return "".join(text_parts)


def run_session(
    api_key: str,
    audio_path: str,
    audio_format: str,
    translation: str,
) -> None:
    config = get_config(api_key, audio_format, translation)

    print("Connecting to Soniox...")
    with connect(SONIOX_WEBSOCKET_URL) as ws:
        # Send first request with config.
        ws.send(json.dumps(config))

        # Start streaming audio in the background.
        threading.Thread(
            target=stream_audio,
            args=(audio_path, ws),
            daemon=True,
        ).start()
        print("Session started.")

        final_tokens: list[dict] = []
        try:
            while True:
                message = ws.recv()
                res = json.loads(message)

                # Error from server.
                # See: https://soniox.com/docs/stt/api-reference/websocket-api#error-response
                if res.get("error_code") is not None:
                    print(f"Error: {res['error_code']} - {res['error_message']}")
                    break

                # Parse tokens from current response.
                non_final_tokens: list[dict] = []
                for token in res.get("tokens", []):
                    if token.get("text"):
                        if token.get("is_final"):
                            # Final tokens are returned once and should be appended to final_tokens.
                            final_tokens.append(token)
                        else:
                            # Non-final tokens update as more audio arrives; reset them on every response.
                            non_final_tokens.append(token)

                # Render tokens.
                text = render_tokens(final_tokens, non_final_tokens)
                print(text)

                # Session finished.
                if res.get("finished"):
                    print("Session finished.")
        except ConnectionClosedOK:
            # Normal, server closed after finished.
            pass
        except KeyboardInterrupt:
            print("\nInterrupted by user.")
        except Exception as e:
            print(f"Error: {e}")


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--audio_path", type=str)
    parser.add_argument("--audio_format", default="auto")
    parser.add_argument("--translation", default="none")
    args = parser.parse_args()

    api_key = os.environ.get("SONIOX_API_KEY")
    if api_key is None:
        raise RuntimeError("Missing SONIOX_API_KEY.")

    run_session(args.audio_path and api_key or api_key, args.audio_path, args.audio_format, args.translation) if False else run_session(api_key, args.audio_path, args.audio_format, args.translation)


if __name__ == "__main__":
    main()
```
```sh title="Terminal"
# One-way translation of a live audio stream
python soniox_realtime.py --audio_path ../assets/coffee_shop.mp3 --translation one_way
# Two-way translation of a live audio stream
python soniox_realtime.py --audio_path ../assets/two_way_translation.mp3 --translation two_way
```
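The 3840-byte chunk size and 120 ms sleep in `stream_audio` / `streamAudio` are not arbitrary: for the `pcm_s16le` settings in the example (16 kHz, mono, 2 bytes per sample), 3840 bytes is exactly 120 ms of audio, so the file is sent at real-time speed. A quick check:

```python
# Chunk pacing for the pcm_s16le settings used in the example:
# 16 kHz mono audio, 2 bytes per sample.
sample_rate = 16000     # samples per second
bytes_per_sample = 2    # pcm_s16le = signed 16-bit samples
chunk_bytes = 3840      # bytes read per ws.send()

chunk_ms = chunk_bytes / (sample_rate * bytes_per_sample) * 1000
print(chunk_ms)  # 120.0
```

For compressed formats sent with `audio_format: "auto"` (such as the mp3 files in the commands above), the same pacing is only an approximation of real time.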
See on GitHub: [soniox\_realtime.js](https://github.com/soniox/soniox_examples/blob/master/speech_to_text/nodejs/soniox_realtime.js).
```javascript
import fs from "fs";
import WebSocket from "ws";
import { parseArgs } from "node:util";

const SONIOX_WEBSOCKET_URL = "wss://stt-rt.soniox.com/transcribe-websocket";

// Get Soniox STT config.
function getConfig(apiKey, audioFormat, translation) {
  const config = {
    // Get your API key at console.soniox.com, then run: export SONIOX_API_KEY=
    api_key: apiKey,
    // Select the model to use.
    // See: soniox.com/docs/stt/models
    model: "stt-rt-v4",
    // Set language hints when possible to significantly improve accuracy.
    // See: soniox.com/docs/stt/concepts/language-hints
    language_hints: ["en", "es"],
    // Enable language identification. Each token will include a "language" field.
    // See: soniox.com/docs/stt/concepts/language-identification
    enable_language_identification: true,
    // Enable speaker diarization. Each token will include a "speaker" field.
    // See: soniox.com/docs/stt/concepts/speaker-diarization
    enable_speaker_diarization: true,
    // Set context to help the model understand your domain, recognize important terms,
    // and apply custom vocabulary and translation preferences.
    // See: soniox.com/docs/stt/concepts/context
    context: {
      general: [
        { key: "domain", value: "Healthcare" },
        { key: "topic", value: "Diabetes management consultation" },
        { key: "doctor", value: "Dr. Martha Smith" },
        { key: "patient", value: "Mr. David Miller" },
        { key: "organization", value: "St John's Hospital" },
      ],
      text: "Mr. David Miller visited his healthcare provider last month for a routine follow-up related to diabetes care. The clinician reviewed his recent test results, noted improved glucose levels, and adjusted his medication schedule accordingly. They also discussed meal planning strategies and scheduled the next check-up for early spring.",
      terms: [
        "Celebrex",
        "Zyrtec",
        "Xanax",
        "Prilosec",
        "Amoxicillin Clavulanate Potassium",
      ],
      translation_terms: [
        { source: "Mr. Smith", target: "Sr. Smith" },
        { source: "St John's", target: "St John's" },
        { source: "stroke", target: "ictus" },
      ],
    },
    // Use endpointing to detect when the speaker stops.
    // It finalizes all non-final tokens right away, minimizing latency.
    // See: soniox.com/docs/stt/rt/endpoint-detection
    enable_endpoint_detection: true,
  };

  // Audio format.
  // See: soniox.com/docs/stt/rt/real-time-transcription#audio-formats
  if (audioFormat === "auto") {
    // Set to "auto" to let Soniox detect the audio format automatically.
    config.audio_format = "auto";
  } else if (audioFormat === "pcm_s16le") {
    // Example of a raw audio format; Soniox supports many others as well.
    config.audio_format = "pcm_s16le";
    config.sample_rate = 16000;
    config.num_channels = 1;
  } else {
    throw new Error(`Unsupported audio_format: ${audioFormat}`);
  }

  // Translation options.
  // See: soniox.com/docs/stt/rt/real-time-translation#translation-modes
  if (translation === "one_way") {
    // Translates all languages into the target language.
    config.translation = { type: "one_way", target_language: "es" };
  } else if (translation === "two_way") {
    // Translates from language_a to language_b and back from language_b to language_a.
    config.translation = {
      type: "two_way",
      language_a: "en",
      language_b: "es",
    };
  } else if (translation !== "none") {
    throw new Error(`Unsupported translation: ${translation}`);
  }

  return config;
}

// Read the audio file and send its bytes to the websocket.
async function streamAudio(audioPath, ws) {
  const stream = fs.createReadStream(audioPath, { highWaterMark: 3840 });
  for await (const chunk of stream) {
    ws.send(chunk);
    // Sleep for 120 ms to simulate real-time streaming.
    await new Promise((res) => setTimeout(res, 120));
  }
  // Empty string signals end-of-audio to the server.
  ws.send("");
}

// Convert tokens into a readable transcript.
function renderTokens(finalTokens, nonFinalTokens) {
  let textParts = [];
  let currentSpeaker = null;
  let currentLanguage = null;
  const allTokens = [...finalTokens, ...nonFinalTokens];

  // Process all tokens in order.
  for (const token of allTokens) {
    let { text, speaker, language } = token;
    const isTranslation = token.translation_status === "translation";

    // Speaker changed -> add a speaker tag.
    if (speaker && speaker !== currentSpeaker) {
      if (currentSpeaker !== null) textParts.push("\n\n");
      currentSpeaker = speaker;
      currentLanguage = null; // Reset language on speaker changes.
      textParts.push(`Speaker ${currentSpeaker}:`);
    }

    // Language changed -> add a language or translation tag.
    if (language && language !== currentLanguage) {
      currentLanguage = language;
      const prefix = isTranslation ? "[Translation] " : "";
      textParts.push(`\n${prefix}[${currentLanguage}] `);
      text = text.trimStart();
    }

    textParts.push(text);
  }

  textParts.push("\n===============================");
  return textParts.join("");
}

function runSession(apiKey, audioPath, audioFormat, translation) {
  const config = getConfig(apiKey, audioFormat, translation);

  console.log("Connecting to Soniox...");
  const ws = new WebSocket(SONIOX_WEBSOCKET_URL);
  let finalTokens = [];

  ws.on("open", () => {
    // Send first request with config.
    ws.send(JSON.stringify(config));
    // Start streaming audio in the background.
    streamAudio(audioPath, ws).catch((err) =>
      console.error("Audio stream error:", err),
    );
    console.log("Session started.");
  });

  ws.on("message", (msg) => {
    const res = JSON.parse(msg.toString());

    // Error from server.
    // See: https://soniox.com/docs/stt/api-reference/websocket-api#error-response
    if (res.error_code) {
      console.error(`Error: ${res.error_code} - ${res.error_message}`);
      ws.close();
      return;
    }

    // Parse tokens from current response.
    let nonFinalTokens = [];
    if (res.tokens) {
      for (const token of res.tokens) {
        if (token.text) {
          if (token.is_final) {
            // Final tokens are returned once and should be appended to finalTokens.
            finalTokens.push(token);
          } else {
            // Non-final tokens update as more audio arrives; reset them on every response.
            nonFinalTokens.push(token);
          }
        }
      }
    }

    // Render tokens.
    const text = renderTokens(finalTokens, nonFinalTokens);
    console.log(text);

    // Session finished.
    if (res.finished) {
      console.log("Session finished.");
      ws.close();
    }
  });

  ws.on("error", (err) => console.error("WebSocket error:", err));
}

async function main() {
  // Note: parseArgs does not support a "required" flag, so check explicitly.
  const { values: argv } = parseArgs({
    options: {
      audio_path: { type: "string" },
      audio_format: { type: "string", default: "auto" },
      translation: { type: "string", default: "none" },
    },
  });
  if (!argv.audio_path) {
    throw new Error("Missing required argument: --audio_path");
  }

  const apiKey = process.env.SONIOX_API_KEY;
  if (!apiKey) {
    throw new Error(
      "Missing SONIOX_API_KEY.\n" +
        "1. Get your API key at https://console.soniox.com\n" +
        "2. Run: export SONIOX_API_KEY=",
    );
  }

  runSession(apiKey, argv.audio_path, argv.audio_format, argv.translation);
}

main().catch((err) => {
  console.error("Error:", err.message);
  process.exit(1);
});
```
```sh title="Terminal"
# One-way translation of a live audio stream
node soniox_realtime.js --audio_path ../assets/coffee_shop.mp3 --translation one_way
# Two-way translation of a live audio stream
node soniox_realtime.js --audio_path ../assets/two_way_translation.mp3 --translation two_way
```
# Community integrations
URL: /stt/integrations/community-integrations
Discover useful Soniox integrations built by the community
import Image from "next/image";
## Overview
This page lists code repositories, SDKs, and other useful tools built by our user community. We thank these developers for investing their time and giving back to the community.
Note that the projects linked here are community-built integrations and are not officially supported by Soniox. For any questions or support requests, please contact the respective developers or project maintainers.
***
## Agent Voice Response (AVR)
[Agent Voice Response](https://github.com/agentvoiceresponse) is the ultimate conversational AI platform for Asterisk PBX systems. Experience ultra-low latency speech-to-speech, advanced Voice Activity Detection, and intelligent noise suppression. Choose between cloud and local AI providers based on your needs. Perfect for FreePBX, Asterisk-based contact centers, and enterprise telephony solutions.
Integration wiki: [https://wiki.agentvoiceresponse.com/en/avr-soniox-speech-to-text](https://wiki.agentvoiceresponse.com/en/avr-soniox-speech-to-text).
***
## Go SDK (unofficial)
Unofficial Go SDK for the Soniox Speech-to-Text real-time WebSocket API. Enable Soniox real-time speech-to-text transcription and translation in your Go applications.
GitHub repository: [https://github.com/moxierobots/soniox-stt-go](https://github.com/moxierobots/soniox-stt-go).
***
# Integrations
URL: /stt/integrations
Explore Soniox Speech-to-Text integrations for real-time, multilingual speech recognition. Connect Soniox with LiveKit, LangChain, Twilio, Vercel AI SDK, and more.
Soniox Speech-to-Text integrates seamlessly with leading real-time communication platforms, AI frameworks, automation tools, and developer SDKs. These integrations make it easy to add high-accuracy, low-latency, multilingual speech recognition to live audio, voice agents, call centers, and AI-powered applications, without building everything from scratch.
Whether you’re streaming audio in real time, orchestrating AI workflows, or deploying speech recognition at scale, Soniox integrations let you move faster while maintaining enterprise-grade accuracy and performance.
Explore the available integrations to quickly connect Soniox Speech-to-Text to your existing stack and start transcribing speech in real time.
***
* **[LiveKit](/stt/integrations/livekit)**
* **[Pipecat](/stt/integrations/pipecat)**
* **[Vercel AI SDK](/stt/integrations/vercel-ai-sdk)**
* **[TanStack AI SDK](/stt/integrations/tanstack-ai-sdk)**
* **[LangChain (Python)](/stt/integrations/langchain/langchain)**
* **[LangChain.js (JavaScript)](/stt/integrations/langchain/langchain-js)**
* **[Twilio](/stt/integrations/twilio)**
* **[n8n](/stt/integrations/n8n)**
* **[Community integrations](/stt/integrations/community-integrations)**
***
# LiveKit
URL: /stt/integrations/livekit
How to use Soniox Speech-to-Text AI with LiveKit
import Image from "next/image";
import { LinkCards } from "@/components/link-card";
## Overview
Soniox Speech-to-Text AI turns audio into highly accurate text in real time. Paired with LiveKit, you can create powerful, responsive voice agents.
Use Soniox in your LiveKit agents to:
* Transcribe live audio from voice or video sessions in real time
* Build custom voice agents powered by Soniox
* Deploy voice-driven experiences at enterprise scale
All at lightning speed.
***
## Getting started
To use Soniox with LiveKit, you'll need a LiveKit [user account](https://cloud.livekit.io/).
***
## Installation
Soniox provides Speech-to-Text through a [WebSocket API](/stt/api-reference/websocket-api), which is integrated into the official [LiveKit Python plugin](https://github.com/livekit/agents/tree/main/livekit-plugins/livekit-plugins-soniox).
Install the Soniox plugin for the LiveKit Agents library from PyPI:
```bash
pip install livekit-plugins-soniox
```
### Get Soniox API key
The Soniox plugin requires an API key to authenticate. You can find your API key in the [Soniox Console](https://console.soniox.com/).
Set your Soniox API key in your `.env` file:
```bash
SONIOX_API_KEY=
```
***
## Usage
Use Soniox STT in an `AgentSession` or as a standalone transcription service:
```python
from livekit.plugins import soniox

session = AgentSession(
    stt=soniox.STT(),
    # ... llm, tts, etc.
)
```
See the [LiveKit Soniox docs](https://docs.livekit.io/agents/models/stt/plugins/soniox/) for more details.
See the [LiveKit Soniox API reference](https://docs.livekit.io/reference/python/v1/livekit/plugins/soniox/index.html) for full API details.
Congratulations! You are now ready to use Soniox STT in your LiveKit agents.
***
## Advanced usage
### Regional endpoints
If you want to use a different region than the default (US), you can pass a different `base_url` to the `STT` constructor:
```python
from livekit.plugins import soniox

session = AgentSession(
    stt=soniox.STT(base_url='wss://stt-rt.eu.soniox.com/transcribe-websocket'),
)
```
See the full [list of regional endpoints](/stt/data-residency#regional-endpoints).
### Language hints
There's no need to specify a language in advance — the model automatically detects and transcribes any supported language. It also handles multilingual audio effortlessly, even when multiple languages appear within the same sentence or conversation.
If you already know which languages are likely to be spoken, you can provide language hints to help the model prioritize those languages and improve accuracy:
```python
from livekit.plugins import soniox

options = soniox.STTOptions(
    language_hints=["en", "es"],
)

session = AgentSession(
    stt=soniox.STT(params=options),
)
```
See the full [list of supported languages](/stt/concepts/supported-languages).
You can learn more about language hints [here](/stt/concepts/language-hints).
### Customization with context
By providing context, you help the AI model better understand and anticipate the language in your audio, even if some terms do not appear clearly or completely.
```python
from livekit.plugins import soniox
from livekit.plugins.soniox import ContextObject, ContextGeneralItem, ContextTranslationTerm

# Context can be a simple text string:
options = soniox.STTOptions(
    context="Celebrex, Zyrtec, Xanax, Prilosec, Amoxicillin Clavulanate Potassium",
)

# Or a structured object:
session = AgentSession(
    stt=soniox.STT(
        params=soniox.STTOptions(
            context=ContextObject(
                general=[ContextGeneralItem(key="domain", value="Healthcare")],
                terms=["Celebrex", "Zyrtec", "Xanax", "Prilosec", "Amoxicillin Clavulanate Potassium"],
                translation_terms=[ContextTranslationTerm(source="Mr. Smith", target="Sr. Smith")],
            )
        )
    ),
    # ... llm, tts, etc.
)
```
Learn more about customizing with context [here](/stt/concepts/context).
# n8n
URL: /stt/integrations/n8n
How to use Soniox Speech-to-Text AI with n8n
import Image from "next/image";
import VideoPlayer from "@/components/video-player";
import { Callout } from "@/components/callout";
## Overview
Soniox Speech-to-Text AI turns audio into highly accurate text. Paired with n8n, you can build powerful automation workflows that transcribe audio from any source.
Use the Soniox node in your n8n workflows to:
* Transcribe audio files uploaded to cloud storage
* Process voice messages from messaging platforms
* Build automated transcription pipelines at scale
* Combine speech-to-text with other n8n integrations
All with enterprise-grade accuracy.
***
## Getting started
To use Soniox with n8n, you'll need:
* An [n8n](https://n8n.io/) instance (self-hosted or cloud)
* A [Soniox account](https://console.soniox.com/) with an API key
***
## Installation
Soniox provides a first-party verified node in the n8n marketplace. Search for "Soniox" in the node panel to find it.
Alternatively, you can install via npm:
```bash
npm install @soniox/n8n-nodes-soniox
```
## Credentials
The Soniox node requires an API key to authenticate.
### Get your API key
1. Sign in to the [Soniox Console](https://console.soniox.com/)
2. Navigate to **API Keys**
3. Create a new key or copy an existing one
### Add credentials in n8n
1. In n8n, go to **Credentials** > **Add Credential**
2. Search for **Soniox API**
3. Enter your API key
4. Click **Save**
The credentials will be tested automatically. If successful, you're ready to use the Soniox node.
***
## Operations
The Soniox node supports three operations:
### Create transcription
Creates a new transcription job from an audio source. You can choose to wait for completion or receive results asynchronously via webhook.
**Audio sources:**
| Source | Description |
| ----------- | ------------------------------------------------------------------------ |
| Binary File | Upload audio from a previous node (e.g., HTTP Request, Read Binary File) |
| Audio URL | Provide a publicly accessible URL to the audio file |
| File ID | Use a file previously uploaded to Soniox |
### Get results
Retrieves the status and transcript for an existing transcription job. Use this when processing transcriptions asynchronously.
### Delete
Deletes a transcription and its associated file from Soniox. Simply provide the transcription ID — the node automatically fetches the file ID and deletes both resources. Use this to clean up after async workflows.
Soniox does not delete uploaded files automatically; make sure each file is deleted once its transcription has completed. See [limits and quotas](https://soniox.com/docs/stt/async/limits-and-quotas) for more information.
***
## Basic usage
### Transcribe from URL
The simplest way to transcribe audio is from a public URL:
1. Add the **Soniox** node to your workflow
2. Select **Create** operation
3. Set **Audio Source** to **Audio URL**
4. Enter the URL to your audio file
5. Execute the workflow
The node will wait for the transcription to complete and return the full transcript.
### Transcribe from binary data
To transcribe audio from another node (like HTTP Request or Read Binary File):
1. Connect the source node to the Soniox node
2. Select **Create** operation
3. Set **Audio Source** to **Binary File**
4. Set **Binary Property Name** to the property containing your audio (default: `data`)
5. Execute the workflow
### Polling settings
When **Wait for Completion** is enabled, you can configure:
| Setting | Default | Description |
| ------------------- | ------- | -------------------------------------- |
| Poll Interval (Sec) | 1 | How often to check for completion |
| Max Wait (Sec) | 300 | Maximum time to wait before timing out |
If the **Max Wait (Sec)** value is too large, the n8n cloud platform may timeout before the transcription completes. For long audio files, consider using [async processing with webhooks](#async-processing-with-webhooks) instead.
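The polling behavior can be sketched in a few lines of Python (a generic illustration, not part of the n8n node; `get_status` stands in for the node's internal status check):

```python
import time

def wait_for_completion(get_status, poll_interval_sec=1, max_wait_sec=300):
    # Poll until the job reaches a terminal state or max_wait_sec elapses.
    deadline = time.monotonic() + max_wait_sec
    while time.monotonic() < deadline:
        status = get_status()
        if status in ("completed", "error"):
            return status
        time.sleep(poll_interval_sec)
    raise TimeoutError("Transcription did not complete within max wait")

# Example: a fake job that completes on the third poll.
statuses = iter(["queued", "processing", "completed"])
print(wait_for_completion(lambda: next(statuses), poll_interval_sec=0))
# completed
```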
***
## Advanced usage
### Language hints
The model automatically detects and transcribes any supported language. It also handles multilingual audio, even when multiple languages appear within the same conversation.
If you know which languages are likely to be spoken, you can provide language hints to improve accuracy:
1. In the Soniox node, find **Language Hints**
2. Click **Add Language**
3. Enter the language code (e.g., `en`, `es`, `fr`)
4. Repeat for additional languages
See [list of supported languages](/stt/concepts/supported-languages) for all available language codes.
Learn more about [language hints](/stt/concepts/language-hints).
### Speaker diarization
Enable **Enable Speaker Diarization** to identify and separate different speakers in the audio. The transcript will include speaker labels for each segment.
### Language identification
Enable **Enable Language Identification** to include detected language information in the transcript output.
### Customization with context
Provide context to help the model better understand domain-specific terminology, names, or phrases.
**Simple text context:**
1. Set **Context Mode** to **Text**
2. Enter relevant terms or phrases in the **Context Text** field
```
Celebrex, Zyrtec, Xanax, Prilosec, Amoxicillin
```
**Structured JSON context:**
For more control, use structured context:
1. Set **Context Mode** to **Structured JSON**
2. Enter a JSON object in the **Context JSON** field
```json
{
"general": [
{"key": "domain", "value": "Healthcare"}
],
"text": "Medical consultation recording",
"terms": ["Celebrex", "Zyrtec", "Xanax"],
"translation_terms": [
{"source": "Dr. Smith", "target": "Dr. Smith"}
]
}
```
Learn more about [customizing with context](/stt/concepts/context).
### Translation
Soniox can translate the transcript to another language during transcription.
**One-way translation:**
Translate the transcript to a single target language:
1. Set **Translation Type** to **One Way**
2. Enter the **Target Language** code (e.g., `es` for Spanish)
**Two-way translation:**
For conversations between speakers of two languages, translate each speaker to the other's language:
1. Set **Translation Type** to **Two Way**
2. Enter **Language A** (e.g., `en`)
3. Enter **Language B** (e.g., `es`)
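These node settings map onto the `translation` object of the underlying Soniox API config; the two-way settings above, for example, correspond to:

```json
{
  "translation": {
    "type": "two_way",
    "language_a": "en",
    "language_b": "es"
  }
}
```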
***
## Async processing with webhooks
For long audio files or high-volume processing, you can use webhooks instead of waiting for completion:
1. Set **Wait for Completion** to **false**
2. Enter your **Webhook URL**
3. Optionally set **Webhook Auth Header Name** and **Webhook Auth Header Value** for authentication
The node will immediately return the transcription ID. Soniox will send the results to your webhook when processing completes.
To fetch results later, use the **Get Results** operation with the transcription ID.
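If your webhook URL points at a custom service rather than an n8n Webhook node, the handler only needs to verify the optional auth header and read the transcription `id` from the callback body. A minimal sketch (the header name, header value, and exact payload shape here are illustrative):

```python
import json

def handle_webhook(headers: dict, body: bytes,
                   auth_name: str = "X-Webhook-Secret",
                   auth_value: str = "my-secret") -> str:
    # Reject callbacks that do not carry the configured auth header.
    if headers.get(auth_name) != auth_value:
        raise PermissionError("Webhook auth header mismatch")
    # The callback body identifies the finished transcription.
    payload = json.loads(body)
    return payload["id"]

tid = handle_webhook({"X-Webhook-Secret": "my-secret"}, b'{"id": "trx_123"}')
print(tid)  # trx_123
```

Use the returned ID with the **Get Results** operation, then with **Delete** for cleanup.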
***
## Output options
### Output mode
Choose what data to return when the transcription completes:
| Mode | Description |
| ------------- | -------------------------------------------------------------------------------------- |
| Full Response | Returns the complete transcript with all metadata, timestamps, and speaker information |
| Text Only | Returns only the transcribed text as a simple string |
***
## Cleanup and resource management
When you upload binary files, Soniox stores them temporarily. To avoid accumulating unused files, use the cleanup features:
### Automatic cleanup (recommended)
When **Wait for Completion** is enabled, the **Auto Delete** option is available (enabled by default).
When enabled, the node automatically deletes:
* The **transcription** — always deleted regardless of audio source
* The **uploaded file** — only deleted when using Binary File as the audio source (since that's when a file is uploaded)
This keeps your Soniox account clean without extra workflow steps.
### Manual cleanup
For async workflows (when not waiting for completion), use the **Delete** operation to clean up after processing:
1. Add a new **Soniox** node after receiving the webhook callback
2. Select **Delete** operation
3. Enter the **Transcription ID**
4. Execute
The Delete operation automatically fetches the transcription details to find the associated file ID, then deletes both the transcription and its file (if one exists).
**Example async cleanup workflow:**
1. **Soniox** (Create) — Upload file, don't wait.
2. **Webhook** trigger — Receives completion callback from Soniox with `id` (transcription ID)
3. Process the transcript as needed
4. **Soniox** (Delete) — Clean up using the transcription ID from step 2
***
## Resources
* [Soniox API Reference](/stt/api-reference)
* [Supported Languages](/stt/concepts/supported-languages)
* [n8n Documentation](https://docs.n8n.io/)
* [GitHub Repository](https://github.com/soniox/n8n-nodes-soniox)
# Pipecat
URL: /stt/integrations/pipecat
Integrate Soniox Speech-to-Text into a Pipecat pipeline.
import { Step, Steps } from "fumadocs-ui/components/steps";
import Image from "next/image";
import { LinkCards } from "@/components/link-card";
## Overview
Pipecat is a framework for building voice-enabled, real-time, multimodal AI applications. Pipecat's pipeline for real-time voice applications looks like this:
1. **Send Audio** - Transmit and capture streamed audio from the user
2. **Transcribe Speech** - Convert speech to text as the user is talking
3. **Process with LLM** - Generate responses using a large language model
4. **Convert to Speech** - Transform text responses into natural speech
5. **Play Audio** - Stream the audio response back to the user
At each step, there are multiple options for services to use. Soniox provides `SonioxSTTService`, which handles the **Transcribe Speech** step. For more details on how Pipecat works, check the [Pipecat documentation](https://docs.pipecat.ai/getting-started/introduction).
## Installation
To use `SonioxSTTService` in Pipecat projects, you need to install the Soniox dependencies:
```bash
pip install "pipecat-ai[soniox]"
```
You'll also need to set up your Soniox API key as an environment variable: `SONIOX_API_KEY`. You can obtain a Soniox API key by signing up at [Soniox Console](https://console.soniox.com/).
## Usage example
To integrate `SonioxSTTService` into a Pipecat pipeline for real-time speech-to-text transcription,
you can simply create an instance of the service and add it to your pipeline:
```python
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.soniox.stt import SonioxSTTService

# Configure the service.
stt = SonioxSTTService(
    api_key=os.getenv("SONIOX_API_KEY"),
)

# Use in pipeline.
pipeline = Pipeline([
    transport.input(),
    stt,
    llm,
    ...
])
```
## Complete examples
The following examples demonstrate how to use `SonioxSTTService` in Pipecat projects:
## Advanced usage
### Regional endpoints
If you want to use a different region than the default (US), you can pass a different `url` to the `SonioxSTTService` constructor:
```python
from pipecat.services.soniox.stt import SonioxSTTService
stt = SonioxSTTService(url='wss://stt-rt.eu.soniox.com/transcribe-websocket')
```
See the full [list of regional endpoints](/stt/data-residency#regional-endpoints).
### Language hints
There is no need to pre-select a language — the model automatically detects and transcribes any supported language. It also handles multilingual audio seamlessly, even when multiple languages are mixed within a single sentence or conversation.
However, when you have prior knowledge of the languages likely to be spoken in your audio, you can use language hints to guide the model toward those languages for even greater recognition accuracy.
```python
from pipecat.services.soniox.stt import SonioxInputParams
from pipecat.transcriptions.language import Language

SonioxInputParams(
    language_hints=[Language.EN, Language.ES, Language.JA, Language.ZH],
)
```
Language variants are ignored; for example, `Language.EN_GB` is treated the same as `Language.EN`. See the full [list of supported languages](/stt/concepts/supported-languages).
You can learn more about language hints [here](/stt/concepts/language-hints).
### Customization with context
By providing context, you help the AI model better understand and anticipate the language in your audio, even if some terms do not appear clearly or completely.
```python
from pipecat.services.soniox.config import SonioxInputParams

SonioxInputParams(
    context="Celebrex, Zyrtec, Xanax, Prilosec, Amoxicillin Clavulanate Potassium",
)
```
### Endpoint Detection and VAD
The `SonioxSTTService` processes your speech and has two ways of knowing when to finalize the text.
#### Automatic Pause Detection
By default, the service listens for natural pauses in your speech. When it detects that you've likely finished a sentence, it finalizes the transcription.
You can learn more about Endpoint Detection [here](/stt/rt/endpoint-detection).
#### Using Voice Activity Detection (VAD)
For more explicit control, you can use a dedicated Voice Activity Detection (VAD) component within your Pipecat pipeline. The VAD's job is to detect when a user has completely stopped talking.
To enable this behavior, set `vad_force_turn_endpoint` to `True`. This will disable the automatic endpoint detection and force the service to return transcription results as soon as the user stops talking.
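As a configuration sketch (the flag is shown here as a `SonioxSTTService` constructor argument, which is an assumption; confirm the exact parameter placement against the Pipecat API reference, and note that a VAD analyzer must be configured on the transport elsewhere in the pipeline):

```python
import os

from pipecat.services.soniox.stt import SonioxSTTService

# Let the pipeline's VAD decide turn boundaries instead of
# Soniox's automatic endpoint detection.
stt = SonioxSTTService(
    api_key=os.getenv("SONIOX_API_KEY"),
    vad_force_turn_endpoint=True,
)
```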
# TanStack AI SDK
URL: /stt/integrations/tanstack-ai-sdk
Soniox transcription adapter for the TanStack AI SDK.
## Overview
[TanStack AI](https://tanstack.com/ai) is a TypeScript toolkit for building AI applications. It provides a unified API that abstracts away the differences between various AI providers, allowing developers to switch models with just a few lines of code.
This package (`@soniox/tanstack-ai-adapter`) implements the SDK's transcription adapter, enabling you to use Soniox's Speech-to-Text models directly within the standard TanStack AI workflow.
## Installation
```bash
npm install @soniox/tanstack-ai-adapter
```
## Authentication
Set `SONIOX_API_KEY` in your environment or pass `apiKey` when creating the adapter.
Get your API key from the [Soniox Console](https://console.soniox.com).
## Example
```ts
import { generateTranscription } from '@tanstack/ai';
import { sonioxTranscription } from '@soniox/tanstack-ai-adapter';
const result = await generateTranscription({
adapter: sonioxTranscription('stt-async-v4'),
audio: new URL(
'https://soniox.com/media/examples/coffee_shop.mp3',
),
modelOptions: {
enableLanguageIdentification: true,
enableSpeakerDiarization: true,
},
});
console.log(result.text);
console.log(result.segments); // Timestamped segments with speaker info
```
## Adapter configuration
Use `createSonioxTranscription` to customize the adapter instance:
```ts
import { createSonioxTranscription } from '@soniox/tanstack-ai-adapter';
const adapter = createSonioxTranscription('stt-async-v4', process.env.SONIOX_API_KEY!, {
baseUrl: 'https://api.soniox.com',
pollingIntervalMs: 1000,
timeout: 180000,
});
```
Options:
* `apiKey`: override `SONIOX_API_KEY` (required when using `createSonioxTranscription`).
* `baseUrl`: custom API base URL. See the [list of regional API endpoints](/stt/data-residency#regional-endpoints). Default is `https://api.soniox.com`.
* `headers`: additional request headers.
* `timeout`: transcription timeout in milliseconds. Default is 180000ms (3 minutes).
* `pollingIntervalMs`: transcription polling interval in milliseconds. Default is 1000ms.
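Conceptually, `pollingIntervalMs` and `timeout` drive a simple polling loop. A simplified sketch (the `getStatus` function and status strings below are illustrative, not the adapter's real internals):

```ts
// Poll a status function every pollingIntervalMs until it reports
// completion, giving up after `timeout` milliseconds. `getStatus`
// stands in for the adapter's real transcription-status request.
async function pollUntilComplete(
  getStatus: () => Promise<string>,
  pollingIntervalMs = 1000,
  timeout = 180000,
): Promise<string> {
  const deadline = Date.now() + timeout;
  while (Date.now() < deadline) {
    const status = await getStatus();
    if (status === "completed") return status;
    if (status === "error") throw new Error("transcription failed");
    await new Promise((resolve) => setTimeout(resolve, pollingIntervalMs));
  }
  throw new Error("transcription timed out");
}
```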
## Transcription options
Per-request options are passed via `modelOptions`:
```ts
const result = await generateTranscription({
adapter: sonioxTranscription('stt-async-v4'),
audio,
modelOptions: {
languageHints: ['en', 'es'],
enableLanguageIdentification: true,
enableSpeakerDiarization: true,
context: {
terms: ['Soniox', 'TanStack'],
},
},
});
```
Available options:
* `languageHints` - Array of ISO language codes to bias recognition. If you pass the TanStack `language` option, this adapter will merge it into `languageHints` for convenience.
* `languageHintsStrict` - When true, rely more heavily on language hints (note: not supported by all models)
* `enableLanguageIdentification` - Automatically detect spoken language
* `enableSpeakerDiarization` - Identify and separate different speakers
* `context` - Additional context to improve accuracy
* `clientReferenceId` - Optional client-defined reference ID
* `webhookUrl` - Webhook URL for transcription completion notifications
* `webhookAuthHeaderName` - Webhook authentication header name
* `webhookAuthHeaderValue` - Webhook authentication header value
* `translation` - Translation configuration
For more information on the available options, see the [Speech-to-Text API reference](/stt/api-reference/transcriptions/create_transcription).
## Accessing raw tokens
When using translation or working with multilingual audio, you may need access to raw tokens with per-token language information and translation status. The adapter attaches a non-standard `providerMetadata` field at runtime:
```ts
const result = await generateTranscription({
adapter: sonioxTranscription('stt-async-v4'),
audio,
modelOptions: {
translation: { type: 'one_way', targetLanguage: 'es' },
},
});
// Access raw Soniox tokens with full metadata
const rawTokens = (result as any).providerMetadata?.soniox?.tokens;
if (rawTokens) {
rawTokens.forEach((token) => {
// token.text - token text
// token.start_ms - start time in milliseconds
// token.end_ms - end time in milliseconds
// token.language - detected language for this token
// token.translation_status - translation status (if translation enabled)
// token.speaker - speaker identifier
// token.confidence - confidence score
});
}
```
**Note:** When using translation, the API returns both transcription tokens (original) and translation tokens. The `segments` array always includes only transcription tokens. To access translation tokens, filter by `translation_status === 'translation'`.
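A small helper can split the raw tokens into the two groups using the `translation_status` field described above (the token objects here are hypothetical examples):

```ts
// Split raw Soniox tokens into transcription vs. translation tokens.
interface RawToken {
  text: string;
  translation_status?: string;
}

function splitTokens(tokens: RawToken[]) {
  const translation = tokens.filter((t) => t.translation_status === "translation");
  const transcription = tokens.filter((t) => t.translation_status !== "translation");
  return { transcription, translation };
}

// Hypothetical tokens for illustration:
const { transcription, translation } = splitTokens([
  { text: "Hello" },
  { text: "Hola", translation_status: "translation" },
]);
// transcription holds "Hello", translation holds "Hola"
```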
# Twilio
URL: /stt/integrations/twilio
Stream Twilio call audio to Soniox Speech-to-Text API and get real-time transcriptions.
## Overview
This guide demonstrates how to stream live Twilio call audio to the Soniox Speech-to-Text API and receive real-time transcription via WebSockets.
If you want to see a complete example, check out the [soniox-twilio-realtime-transcription](https://github.com/soniox/soniox-twilio-realtime-transcription) repository on GitHub.
## Preparation
### Create a Twilio account
To get started, you'll need a Twilio account. If you don't have one, you can [sign up](https://www.twilio.com/try-twilio).
You will also need two phone numbers to test the integration:
* A phone number you own, which must be verified with Twilio.
* A Twilio-owned phone number to place the test call from.
### Get your Soniox API key
To use Soniox Speech-to-Text API in your application, you'll need to obtain an API key. You can get one by signing up at [Soniox Console](https://console.soniox.com/).
## Running the example
### Clone the repository
Clone the repository and install the dependencies:
```bash
git clone https://github.com/soniox/soniox-twilio-realtime-transcription.git
cd soniox-twilio-realtime-transcription
pip install -r requirements.txt
```
### Configure server environment
Copy the `.env.example` file to `.env` and update the values with your Twilio account credentials and Soniox API key:
```bash
cp .env.example .env
```
### Run the server and expose it to Twilio
Run the server:
```bash
python server.py
```
This will start the server and listen for incoming Twilio calls. You will specify where the phone call audio is streamed later. To expose the server to Twilio, you can use [ngrok](https://ngrok.com/).
```bash
ngrok http 5000
```
Note the forwarding URL that ngrok provides. It should look like `https://xxxxx.ngrok.io` or `https://xxxxx.ngrok-free.app`.
### Run the client
Edit `client.html` and set `WEBSOCKET_URL` to your ngrok URL with `/client` at the end, e.g. `wss://xxxxx.ngrok.io/client`.
Open `client.html` in your browser to view live call transcriptions.
### Start a Twilio call
You can configure Twilio calls with `TwiML Bin` files. More information about streaming can be found in the [Twilio documentation](https://www.twilio.com/docs/voice/twiml/stream).
Here is an example `TwiML Bin` that streams the call audio to your WebSocket server and plays the test messages. The markup below is a minimal sketch; replace `WEBSOCKET_URL` with your ngrok URL ending in `/twilio`:
```xml
<Response>
  <Start>
    <Stream url="WEBSOCKET_URL" />
  </Start>
  <Say>Hello, this is a test call. How are you?</Say>
  <Pause length="10" />
  <Say>Thank you, bye!</Say>
</Response>
```
To start, we recommend using the provided `call_me.py` script to start a Twilio call. Simply set the following environment variables:
* `TWILIO_ACCOUNT_SID` and `TWILIO_AUTH_TOKEN` (from Twilio)
* `TWILIO_PHONE_NUMBER` (your Twilio number, rented on Twilio)
* `WEBSOCKET_URL` with your ngrok URL with `/twilio` at the end, e.g. `wss://xxxxx.ngrok.io/twilio`.
* `USER_PHONE_NUMBER` with your Twilio-verified phone number.
```bash
python call_me.py
```
You should hear a voice message saying "Hello, this is a test call. How are you?" and then a message saying "Thank you, bye!". Simultaneously, you should see the transcription in the browser.
# Vercel AI SDK
URL: /stt/integrations/vercel-ai-sdk
Soniox transcription provider for the Vercel AI SDK.
## Overview
[Vercel AI SDK](https://sdk.vercel.ai/) is a TypeScript toolkit for building AI applications. It provides a unified API that abstracts away the differences between various AI providers, allowing developers to switch models with just a few lines of code.
The [`@soniox/vercel-ai-sdk-provider`](https://www.npmjs.com/package/@soniox/vercel-ai-sdk-provider) package implements the SDK's Transcription Interface, enabling you to use Soniox's Speech-to-Text models directly within the standard Vercel AI workflow. Learn more about the Soniox provider in the [Vercel AI SDK Community Providers documentation](https://ai-sdk.dev/providers/community-providers/soniox).
## Installation
```bash
npm install @soniox/vercel-ai-sdk-provider
```
## Authentication
Set `SONIOX_API_KEY` in your environment or pass `apiKey` when creating the provider.
## Example
```ts
import { soniox } from '@soniox/vercel-ai-sdk-provider';
import { experimental_transcribe as transcribe } from 'ai';
const { text } = await transcribe({
model: soniox.transcription('stt-async-v4'),
audio: new URL(
'https://soniox.com/media/examples/coffee_shop.mp3',
),
});
```
## Provider options
Use `createSoniox` to customize the provider instance:
```ts
import { createSoniox } from '@soniox/vercel-ai-sdk-provider';
const soniox = createSoniox({
apiKey: process.env.SONIOX_API_KEY,
apiBaseUrl: 'https://api.soniox.com',
});
```
Options:
* `apiKey`: override `SONIOX_API_KEY`.
* `apiBaseUrl`: custom API base URL. See the [list of regional API endpoints](/stt/data-residency#regional-endpoints).
* `headers`: additional request headers.
* `fetch`: custom fetch implementation.
* `pollingIntervalMs`: transcription polling interval in milliseconds. Default is 1000ms.
## Transcription options
Per-request options are passed via `providerOptions`:
```ts
const { text } = await transcribe({
model: soniox.transcription('stt-async-v4'),
audio,
providerOptions: {
soniox: {
languageHints: ['en', 'es'],
enableLanguageIdentification: true,
enableSpeakerDiarization: true,
context: {
terms: ["Soniox", "Vercel"]
},
},
},
});
```
Available options:
* `languageHints` - Array of ISO language codes to bias recognition
* `languageHintsStrict` - When true, rely more heavily on language hints (note: not supported by all models)
* `enableLanguageIdentification` - Automatically detect spoken language
* `enableSpeakerDiarization` - Identify and separate different speakers
* `context` - Additional context to improve accuracy
* `clientReferenceId` - Optional client-defined reference ID
* `webhookUrl` - Webhook URL for transcription completion notifications
* `webhookAuthHeaderName` - Webhook authentication header name
* `webhookAuthHeaderValue` - Webhook authentication header value
* `translation` - Translation configuration
For more information on the available options, see the [Speech-to-Text API reference](/stt/api-reference/transcriptions/create_transcription).
# Direct stream
URL: /stt/guides/direct-stream
Stream directly from microphone to Soniox Speech-to-Text WebSocket API to minimize latency.
## Overview
This guide walks you through capturing and transcribing microphone audio in real
time using the Soniox [WebSocket API](/stt/api-reference/websocket-api) — optimized for the lowest possible latency.
The direct stream approach enables the browser to send audio directly to the Soniox WebSocket API
over a WebSocket connection, eliminating the need for any intermediary server.
This results in faster transcription and a simpler architecture.
Soniox's [Web Library](/stt/SDKs/web-library) handles everything client-side — capturing microphone input,
managing the WebSocket connection, and authenticating using temporary API keys.
Use this setup when you want real-time speech-to-text performance directly in the browser **with minimal delay**.
***
## Temporary API keys
[Temporary API keys](/stt/api-reference/auth/create_temporary_api_key)
(obtained from [REST API](/stt/api-reference/auth/create_temporary_api_key))
are required solely to establish the WebSocket connection. Once the connection is established,
it will be kept alive for as long as it remains active. The `expires_in_seconds` configuration parameter
can therefore be set to a short duration.
The following parameters are required to create a temporary API key:
```json
{
"usage_type": "transcribe_websocket",
"expires_in_seconds": 60
}
```
API request limits apply when creating temporary API keys. See **Limits** section
in the [Soniox Console](https://console.soniox.com).
***
## Example
This is an example of browser-based transcription, but the same principle applies to any other type of
client: you minimize latency by connecting the client directly to the WebSocket API using a temporary API key.
First we create a simple HTTP server that on request:
1. Renders the `index.html` template.
2. Exposes an endpoint to serve the temporary API key (`/temporary-api-key`).
Python server using [FastAPI](https://fastapi.tiangolo.com/):
```python
import os
import requests
import uvicorn
from dotenv import load_dotenv
from fastapi import FastAPI, Request
from fastapi.responses import HTMLResponse, JSONResponse
from fastapi.templating import Jinja2Templates
load_dotenv()
templates = Jinja2Templates(directory="templates")
app = FastAPI()
@app.get("/", response_class=HTMLResponse)
async def get_index(request: Request):
return templates.TemplateResponse(
request=request,
name="index.html",
)
@app.get("/temporary-api-key", response_class=JSONResponse)
async def get_temporary_api_key():
try:
response = requests.post(
"https://api.soniox.com/v1/auth/temporary-api-key",
headers={
"Authorization": f"Bearer {os.getenv('SONIOX_API_KEY')}",
"Content-Type": "application/json",
},
json={
"usage_type": "transcribe_websocket",
"expires_in_seconds": 60,
},
)
if not response.ok:
raise Exception(f"Error: {response.json()}")
temporary_api_key_data = response.json()
return temporary_api_key_data
except Exception as error:
print(error)
return JSONResponse(
status_code=500,
content={"error": f"Server failed to obtain temporary api key: {error}"},
)
if __name__ == "__main__":
port = int(os.getenv("PORT", 3001))
uvicorn.run(app, host="0.0.0.0", port=port)
```
[View example on GitHub](https://github.com/soniox/soniox_examples/tree/master/speech_to_text/python/real_time/browser_direct_stream)
Node.js server using [Express](https://expressjs.com/):
```js
require("dotenv").config();
const http = require("http");
const express = require("express");
const fetch = require("node-fetch");
const path = require("path");
const fs = require("fs").promises;
const app = express();
app.use("/templates", express.static(path.join(__dirname, "templates")));
app.get("/", async (req, res) => {
const index = await fs.readFile(
path.join(__dirname, "templates/index.html"),
"utf8"
);
res.send(index);
});
app.get("/temporary-api-key", async (req, res) => {
try {
const response = await fetch(
"https://api.soniox.com/v1/auth/temporary-api-key",
{
method: "POST",
headers: {
Authorization: `Bearer ${process.env.SONIOX_API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
usage_type: "transcribe_websocket",
expires_in_seconds: 60,
}),
}
);
if (!response.ok) {
throw await response.json();
}
const temporaryApiKeyData = await response.json();
res.json(temporaryApiKeyData);
} catch (error) {
console.error(error);
res.status(500).json({
error: `Server failed to obtain temporary api key: ${JSON.stringify(error)}`,
});
}
});
// Create HTTP server with Express
const server = http.createServer(app);
server.listen(process.env.PORT, () => {
console.log(
`HTTP server listening on http://0.0.0.0:${process.env.PORT}`
);
});
```
[View example on GitHub](https://github.com/soniox/soniox_examples/tree/master/speech_to_text/nodejs/real_time/browser_direct_stream/server.js)
Our HTML client template contains a single "Start" button that, when clicked:
1. Requests microphone permissions.
2. Calls the `/temporary-api-key` endpoint to obtain a temporary API key.
3. Creates a new [`RecordTranscribe`](/stt/SDKs/web-library) instance, passing the temporary API key as the `apiKey` parameter.
4. Connects to the [WebSocket API](/stt/api-reference/websocket-api).
5. Starts transcribing from microphone input and renders transcribed text into a `div` in real-time.
```html
<!-- Minimal sketch of templates/index.html. The RecordTranscribe options
     shown are illustrative; see the Web Library docs and the GitHub example
     below for the exact API. Assumes a bundler resolves the package import. -->
<!DOCTYPE html>
<html>
  <body>
    <h1>Browser direct stream example</h1>
    <button id="start">Start</button>
    <div id="transcript"></div>
    <script type="module">
      import { RecordTranscribe } from "@soniox/speech-to-text-web";

      document.getElementById("start").addEventListener("click", async () => {
        // Fetch a short-lived key from our server endpoint.
        const response = await fetch("/temporary-api-key");
        const { api_key } = await response.json();

        // Start capturing the microphone and streaming to the WebSocket API.
        const recordTranscribe = new RecordTranscribe({ apiKey: api_key });
        recordTranscribe.start({
          model: "stt-rt-preview-v2",
          onPartialResult: (result) => {
            document.getElementById("transcript").textContent +=
              result.tokens.map((t) => t.text).join("");
          },
        });
      });
    </script>
  </body>
</html>
```
[View example on GitHub](https://github.com/soniox/soniox_examples/tree/master/speech_to_text/python/real_time/browser_direct_stream)
# Proxy stream
URL: /stt/guides/proxy-stream
How to stream audio from a client app to Soniox Speech-to-Text WebSocket API through a proxy server.
## Overview
This guide explains how to stream microphone audio from a client to the Soniox
[WebSocket API](/stt/api-reference/websocket-api) through a proxy server.
In this architecture, the client captures audio and sends it over WebSocket to a proxy server. The proxy
server establishes a connection to the Soniox WebSocket API, authenticates the session, streams the
audio for transcription, and relays the transcribed results back to the client in real time.
This setup is useful when you want to **inspect, transform, or store audio and transcription
data on the server side** before passing it to the client. If your goal is simply to transcribe
audio and return results with the lowest possible latency, consider using the
[direct stream](/stt/guides/direct-stream) approach instead.
## Example
In the following example, we create a proxy HTTP server that:
1. Listens for incoming WebSocket connections from the client.
2. Forwards audio data from the client to the [WebSocket API](/stt/api-reference/websocket-api).
3. Relays transcription results back to the client.
Authentication with the [WebSocket API](/stt/api-reference/websocket-api) is handled by the proxy server using the `SONIOX_API_KEY`.
Python server that will act as a proxy between our client and [WebSocket API](/stt/api-reference/websocket-api).
```python
import os
import json
import asyncio
from dotenv import load_dotenv
import websockets
load_dotenv()
async def handle_client(websocket):
print("Browser client connected")
# create a message queue to store client messages received before
# Soniox WebSocket API connection is ready, so we don't lose any
message_queue = []
soniox_ws = None
soniox_ws_ready = False
async def init_soniox_connection():
nonlocal soniox_ws, soniox_ws_ready
try:
soniox_ws = await websockets.connect(
"wss://stt-rt.soniox.com/transcribe-websocket"
)
print("Connected to Soniox STT WebSocket API")
# Send initial configuration message
start_message = json.dumps(
{
"api_key": os.getenv("SONIOX_API_KEY"),
"audio_format": "auto",
"model": "stt-rt-preview-v2",
"language_hints": ["en"],
}
)
await soniox_ws.send(start_message)
print("Sent start message to Soniox")
# mark connection as ready
soniox_ws_ready = True
# process any queued messages
while len(message_queue) > 0 and soniox_ws_ready:
data = message_queue.pop(0)
await forward_data(data)
# receive messages from Soniox STT WebSocket API
async for message in soniox_ws:
try:
await websocket.send(message)
except Exception as e:
print(f"Error forwarding Soniox response: {e}")
break
except Exception as e:
print(f"Soniox WebSocket error: {e}")
soniox_ws_ready = False
finally:
if soniox_ws:
await soniox_ws.close()
soniox_ws_ready = False
print("Soniox WebSocket closed")
async def forward_data(data):
try:
if soniox_ws:
await soniox_ws.send(data)
except Exception as e:
print(f"Error forwarding data to Soniox: {e}")
# initialize Soniox connection
soniox_task = asyncio.create_task(init_soniox_connection())
try:
# receive messages from browser client
async for data in websocket:
if soniox_ws_ready:
# forward messages instantly
await forward_data(data)
else:
# queue the message to be processed
# as soon as connection to Soniox STT WebSocket API is ready
message_queue.append(data)
except Exception as e:
print(f"Error with browser client: {e}")
finally:
print("Browser client disconnected")
soniox_task.cancel()
try:
await soniox_task
except asyncio.CancelledError:
pass
async def main():
port = int(os.getenv("PORT", 3001))
server = await websockets.serve(handle_client, "0.0.0.0", port)
print(f"WebSocket proxy server listening on ws://0.0.0.0:{port}")
await server.wait_closed()
if __name__ == "__main__":
asyncio.run(main())
```
[View example on GitHub](https://github.com/soniox/soniox_examples/tree/master/speech_to_text/python/real_time/browser_proxy_stream)
Node.js server that will act as a proxy between our client and [WebSocket API](/stt/api-reference/websocket-api).
```js
require("dotenv").config();
const WebSocket = require("ws");
const http = require("http");
const server = http.createServer();
const wss = new WebSocket.Server({ server });
wss.on("connection", (ws) => {
console.log("Browser client connected");
// create a message queue to store client messages received before
// Soniox WebSocket API connection is ready, so we don't lose any
const messageQueue = [];
let sonioxWs = null;
let sonioxWsReady = false;
function initSonioxConnection() {
sonioxWs = new WebSocket("wss://stt-rt.soniox.com/transcribe-websocket");
sonioxWs.on("open", () => {
console.log("Connected to Soniox STT WebSocket API");
// send initial configuration message
const startMessage = JSON.stringify({
api_key: process.env.SONIOX_API_KEY,
audio_format: "auto",
model: "stt-rt-preview-v2",
language_hints: ["en"],
});
sonioxWs.send(startMessage);
console.log("Sent start message to Soniox");
// mark connection as ready
sonioxWsReady = true;
// process any queued messages
while (messageQueue.length > 0 && sonioxWsReady) {
const data = messageQueue.shift();
forwardData(data);
}
});
// receive messages from Soniox STT WebSocket API
sonioxWs.on("message", (data) => {
// note:
// at this point we could manipulate and enhance the transcribed data
try {
ws.send(data.toString());
} catch (err) {
console.error("Error forwarding Soniox response:", err);
}
});
sonioxWs.on("error", (error) => {
console.log("Soniox WebSocket error:", error);
sonioxWsReady = false;
});
sonioxWs.on("close", (code, reason) => {
console.log("Soniox WebSocket closed:", code, reason);
sonioxWsReady = false;
ws.close();
});
}
// forward message data to Soniox STT WebSocket API
function forwardData(data) {
try {
sonioxWs.send(data);
} catch (err) {
console.error("Error forwarding data to Soniox:", err);
}
}
// initialize Soniox connection
initSonioxConnection();
// receive messages from browser client
ws.on("message", (data) => {
if (sonioxWsReady) {
// forward messages instantly
forwardData(data);
} else {
// queue the message to be processed
// as soon as connection to Soniox STT WebSocket API is ready
messageQueue.push(data);
}
});
ws.on("close", () => {
console.log("Browser client disconnected");
if (sonioxWs) {
try {
sonioxWs.close();
} catch (err) {
console.error("Error closing Soniox connection:", err);
}
}
});
});
server.listen(process.env.PORT, () => {
console.log(
`WebSocket proxy server listening on ws://0.0.0.0:${process.env.PORT}`
);
});
```
[View example on GitHub](https://github.com/soniox/soniox_examples/tree/master/speech_to_text/nodejs/real_time/browser_proxy_stream)
Next, we create a basic HTML page as the client (same concept works for any other app framework).
The HTML client:
1. Connects to the proxy server via WebSocket.
2. Captures audio stream from the microphone through the [`MediaRecorder`](https://developer.mozilla.org/en-US/docs/Web/API/MediaRecorder).
3. Streams audio data to the proxy server.
4. Receives messages from the proxy server and renders transcribed text into a `div`.
```html
<!-- Minimal sketch of the client page; see the GitHub example below for the
     complete version. -->
<!DOCTYPE html>
<html>
  <body>
    <h1>Browser proxy stream example</h1>
    <button id="start">Start</button>
    <div id="transcript"></div>
    <script>
      document.getElementById("start").addEventListener("click", async () => {
        // 1. Connect to the proxy server.
        const ws = new WebSocket("ws://localhost:3001");

        // 4. Render transcription messages relayed from the proxy.
        ws.onmessage = (event) => {
          const message = JSON.parse(event.data);
          const text = (message.tokens || []).map((t) => t.text).join("");
          document.getElementById("transcript").textContent += text;
        };

        // 2.-3. Capture microphone audio and stream it to the proxy.
        const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
        const recorder = new MediaRecorder(stream);
        recorder.ondataavailable = (event) => {
          if (ws.readyState === WebSocket.OPEN) ws.send(event.data);
        };
        ws.onopen = () => recorder.start(250); // send audio chunks every 250 ms
      });
    </script>
  </body>
</html>
```
[View example on GitHub](https://github.com/soniox/soniox_examples/tree/master/speech_to_text/python/real_time/browser_proxy_stream)
# Async transcription with Node SDK
URL: /stt/SDKs/node-SDK/async-transcription
Transcribe audio files asynchronously with the Soniox Node SDK
The Soniox Node SDK supports asynchronous transcription of audio files, letting you transcribe recordings without maintaining a live connection or streaming pipeline.
You can either wait for completion or create a job and retrieve the results based on the webhook event.
## Quickstart
The SDK provides a convenient method to transcribe audio from a local file, a public URL, or a previously uploaded file.
The **[`transcribe`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-transcribe) method will:**
1. Upload the file to Soniox if it's not already uploaded (if `file` is provided)
2. Transcribe the audio
3. Wait for the transcription to complete (if `wait: true` is provided)
4. Return the transcription object and final transcript (you can disable this by setting `fetch_transcript: false` and fetch transcript later using [`getTranscript`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-gettranscript) method)
5. Delete the file from Soniox if it was uploaded (configurable with the `cleanup` option)
If the `cleanup` option is not set, don't forget to remove files and transcriptions from Soniox after you're done with them.
**Transcribe from a local file and delete everything after transcription is complete**
```ts
const transcription = await client.stt.transcribe({
model: 'stt-async-v4',
file: audio, // Buffer, Uint8Array, Blob, ReadableStream
filename: 'audio.mp3',
wait: true,
cleanup: ['file', 'transcription'],
});
```
**Transcribe from a public URL and fetch the transcript later using [`getTranscript`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-gettranscript) method**
```ts
const transcription = await client.stt.transcribe({
model: 'stt-async-v4',
audio_url: 'https://soniox.com/media/examples/coffee_shop.mp3',
wait: true,
});
const transcript = await transcription.getTranscript();
```
**Transcribe from a previously uploaded file and set up a [webhook](/stt/SDKs/node-SDK/webhooks) to get the transcription when it's complete**
```ts
const transcription = await client.stt.transcribe({
model: 'stt-async-v4',
file_id: file.id,
wait: false,
webhook_url: 'https://example.com/webhook',
});
```
Learn more about [testing webhooks locally](/stt/SDKs/node-SDK/webhooks#testing-webhooks-locally).
## Retrieve list of transcriptions
You can retrieve a list of transcriptions using the [`list`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-list) method.
```ts
const transcriptions = await client.stt.list({
limit: 100,
});
```
The returned result is async iterable - use `for await...of` to iterate through all pages.
```ts
for await (const transcription of transcriptions) {
console.log(transcription.id, transcription.status);
}
```
## Get transcription
You can get a transcription by ID using the [`get`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-get) method.
```ts
const transcription = await client.stt.get(transcription.id);
console.log(transcription.id, transcription.status);
```
## Get transcription transcript
You can get the transcript of a transcription using the [`getTranscript`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-gettranscript) method.
```ts
const transcript = await transcription.getTranscript();
console.log(transcript.text);
```
Or get transcript by transcription ID.
```ts
const transcript = await client.stt.getTranscript(transcription.id);
console.log(transcript.text);
```
## Segmenting transcripts
Group tokens by speaker and language changes:
```ts
const transcript = await transcription.getTranscript();
for (const segment of transcript?.segments() ?? []) {
console.log(`[${segment.speaker}][${segment.language}] ${segment.text}`);
}
```
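Conceptually, segmentation groups consecutive tokens whose speaker and language match. A simplified sketch of that logic (not the SDK's actual implementation):

```ts
interface Token {
  text: string;
  speaker?: string;
  language?: string;
}

interface Segment {
  speaker?: string;
  language?: string;
  text: string;
}

// Start a new segment whenever the speaker or language changes;
// otherwise append the token text to the current segment.
function segmentTokens(tokens: Token[]): Segment[] {
  const segments: Segment[] = [];
  for (const token of tokens) {
    const last = segments[segments.length - 1];
    if (last && last.speaker === token.speaker && last.language === token.language) {
      last.text += token.text;
    } else {
      segments.push({ speaker: token.speaker, language: token.language, text: token.text });
    }
  }
  return segments;
}
```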
## Delete or destroy transcription
You can delete or destroy a transcription using the [`delete`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-delete) or [`destroy`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-destroy) method.
**Delete transcription only**
```ts
await client.stt.delete(transcription.id);
```
**Delete transcription and its file if it was uploaded**
```ts
await client.stt.destroy(transcription.id);
```
## Delete all transcriptions and files from your account
### Delete all transcriptions
You can delete all transcriptions using the [`stt.delete_all`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-delete_all) method.
```ts
await client.stt.delete_all();
```
### Delete all transcriptions and their files
You can delete all transcriptions and their files using the [`stt.destroy_all`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-destroy_all) method.
```ts
await client.stt.destroy_all();
```
The `delete_all` and `destroy_all` operations are irreversible.
# Handling files with Node SDK
URL: /stt/SDKs/node-SDK/files
Upload audio files and manage them with the Soniox Node SDK
The Node SDK provides helpers for working with the [Files API](/stt/api-reference/files/upload_file) to upload audio for async transcription or to reuse files across multiple jobs.
## Upload
[`upload()`](/stt/SDKs/node-SDK/reference/classes#sonioxfilesapi-upload) accepts a `Buffer`, `Uint8Array`, `Blob`, or `ReadableStream`:
```ts
import { readFile } from 'node:fs/promises';
const audio = await readFile('audio.mp3');
const file = await client.files.upload(audio, {
filename: 'audio.mp3',
client_reference_id: 'meeting-42',
});
console.log(file.id, file.filename, file.size);
```
Read more about [Supported audio formats](/stt/async/async-transcription#audio-formats).
## List files
[`list()`](/stt/SDKs/node-SDK/reference/classes#sonioxfilesapi-list) returns a paginated list of all uploaded files. Use `for await...of` to iterate through all pages.
```ts
const result = await client.files.list({ limit: 100 });
// Automatic pagination
for await (const file of result) {
console.log(file.id, file.filename);
}
```
## Get file
Get a file by ID using the [`get()`](/stt/SDKs/node-SDK/reference/classes#sonioxfilesapi-get) method:
```ts
const file = await client.files.get('file-id');
```
## Delete file
Delete a file via its instance using the [`file.delete()`](/stt/SDKs/node-SDK/reference/classes#sonioxfilesapi-delete) method:
```ts
const file = await client.files.get('file-id');
if (file) {
await file.delete();
}
```
Or delete by ID using the [`delete()`](/stt/SDKs/node-SDK/reference/classes#sonioxfilesapi-delete) method:
```ts
await client.files.delete('file-id');
```
## Delete all files from your account
You can delete all files using the `files.delete_all` method.
```ts
await client.files.delete_all();
```
The `delete_all` operation is irreversible.
# Node SDK
URL: /stt/SDKs/node-SDK
Build speech-to-text workflows in Node with async and real-time APIs.
The Soniox [Node SDK](https://github.com/soniox/soniox-js) gives you fully typed access to our Async and Real-time Speech-to-Text APIs.
## Quickstart
### Install
Install via your preferred package manager:
```bash tab
npm install @soniox/node
```
```bash tab
yarn add @soniox/node
```
```bash tab
pnpm add @soniox/node
```
```bash tab
bun add @soniox/node
```
### Set your API key
```sh title="Terminal"
export SONIOX_API_KEY=
```
Create a [Soniox account](https://console.soniox.com/signup) and log in to
the [Console](https://console.soniox.com) to get your API key.
See all available environment variables in the [SDK reference](/stt/SDKs/node-SDK/reference#environment-variables).
### Create your first real-time session
```ts
import { SonioxNodeClient } from "@soniox/node";

const stream = await createFakeAudioStream();

// Create a Soniox client
// The API key is read from the SONIOX_API_KEY environment variable.
const client = new SonioxNodeClient();

// Create a real-time session
const session = client.realtime.stt({
  model: "stt-rt-v4",
});

// Listen for transcription results
session.on("result", (result) => {
  const text = result.tokens.map((t) => t.text).join("");
  if (text) console.log(text);
});

// Listen for errors
session.on("error", (err) => console.error("Error:", err));

// Connect to Soniox and stream audio
await session.connect();

// Stream audio in chunks
session.sendStream(stream, {
  pace_ms: 60, // Send audio in chunks of 60ms to simulate real-time transcription
  finish: true, // Gracefully end the session after the stream is finished
});

// Fake streaming: read audio.mp3 in chunks and send at 60ms intervals to simulate real-time transcription.
// In a real use case, you would stream audio from a microphone or other stream source.
async function createFakeAudioStream() {
  const res = await fetch("https://soniox.com/media/examples/coffee_shop.mp3");
  if (!res.ok) throw new Error(`Request failed: ${res.status} ${res.statusText}`);
  if (!res.body) throw new Error("No response body");
  return res.body;
}
```
Learn more about [Real-time transcription](/stt/SDKs/node-SDK/realtime-transcription).
### Create your first async transcription
```ts
import { SonioxNodeClient } from '@soniox/node';
import { readFile } from 'node:fs/promises';
const audio = await readFile('audio.mp3');
const client = new SonioxNodeClient();
const transcription = await client.stt.transcribe({
model: 'stt-async-v4',
file: audio,
filename: 'audio.mp3',
wait: true,
});
console.log(transcription.transcript?.text);
```
Learn more about [Async transcription](/stt/SDKs/node-SDK/async-transcription).
## Next steps
* [Real-time transcription](/stt/SDKs/node-SDK/realtime-transcription)
* [Async transcription](/stt/SDKs/node-SDK/async-transcription)
* [Webhooks](/stt/SDKs/node-SDK/webhooks)
* [Files and models](/stt/SDKs/node-SDK/files)
* [Full SDK reference](/stt/SDKs/node-SDK/reference)
## Package links
* [GitHub repository](https://github.com/soniox/soniox-js)
* [NPM package](https://www.npmjs.com/package/@soniox/node)
# Real-time transcription with Node SDK
URL: /stt/SDKs/node-SDK/realtime-transcription
Create and manage real-time speech-to-text sessions with the Soniox Node SDK
The Soniox Node SDK supports real-time streaming transcription over WebSocket. This lets you transcribe live audio with low latency, which is ideal for voice agents, live captions, and interactive experiences.
You can consume results via events, async iteration, or buffers that group tokens into utterances. The SDK provides helper methods for both direct and proxy streaming.
## Direct stream and temporary API keys
Read more about [Direct stream](/stt/guides/direct-stream)
The Node SDK provides a helper method to issue [temporary API keys](/stt/api-reference/auth/create_temporary_api_key) for use with [Direct stream](/stt/guides/direct-stream) from the client's browser.
```ts
const { api_key, expires_at } = await client.auth.createTemporaryKey({
usage_type: 'transcribe_websocket',
expires_in_seconds: 3600,
client_reference_id: 'support-call-123',
});
console.log(api_key, expires_at);
```
Soniox's [Web Library](/stt/SDKs/web-library) handles everything client-side — capturing microphone input,
managing the WebSocket connection, and authenticating using temporary API keys.
## Proxy stream helpers
Read more about [Proxy stream](/stt/guides/proxy-stream)
Use the SDK's real-time session for low-latency transcription, live captions, and voice agent experiences.
## Create a real-time session
```ts
const session = client.realtime.stt({
model: 'stt-rt-v4',
audio_format: 'pcm_s16le',
sample_rate: 16000,
num_channels: 1,
enable_endpoint_detection: true,
enable_speaker_diarization: true,
language_hints: ['en'],
context: {
text: 'Support call about billing',
terms: ['invoice', 'refund'],
},
});
```
## Connect and stream
Use [`sendAudio`](/stt/SDKs/node-SDK/reference/classes#realtimesttsession-sendaudio) to send audio chunks to the session.
```ts
await session.connect();

session.on('result', (result) => {
  process.stdout.write(result.tokens.map(t => t.text).join(''));
});

for await (const chunk of audioStream) {
  session.sendAudio(chunk);
}

await session.finish();
```
See the full example with a demo stream in the quickstart: [Create your first real-time session](/stt/SDKs/node-SDK#create-your-first-real-time-session)
## Handle session events
```ts
session.on('connected', () => console.log('connected'));
session.on('disconnected', (reason) => console.log('disconnected:', reason));
session.on('error', (error) => console.error('error:', error));
session.on('result', (result) => console.log(result.tokens.map(t => t.text).join('')));
session.on('endpoint', () => console.log('endpoint'));
session.on('finalized', () => console.log('finalized'));
session.on('finished', () => console.log('finished'));
```
## Session lifecycle
```ts
// Connect to the session
await session.connect(); // idle -> connected

// Send audio chunks to the session
for await (const chunk of audioStream) {
  session.sendAudio(chunk);
}

// Gracefully end the session (signal end of audio and wait for remaining results from the server)
await session.finish();

// Or cancel immediately:
session.close(); // connected -> closed
```
## Endpoint detection and manual finalization
Endpoint detection lets you know when a speaker has finished speaking. This is critical for real-time voice AI assistants, command-and-response systems, and conversational apps where you want to respond immediately without waiting for long silences.
Read more about [Endpoint detection](/stt/rt/endpoint-detection)
Enable endpoint detection by setting `enable_endpoint_detection: true` in the session configuration.
```ts
const session = client.realtime.stt({
model: 'stt-rt-v4',
enable_endpoint_detection: true,
});
```
Manual finalization gives you precise control over when audio should be finalized, which is useful for push-to-talk systems and client-side voice activity detection (VAD).
Read more about [Manual finalization](/stt/rt/manual-finalization)
```ts
session.finalize();
```
## Pause and resume
```ts
session.pause(); // keeps connection alive, drops audio while paused
session.resume(); // resume sending audio
```
You are billed for the full stream duration even when the session is paused.
In a typical voice agent loop, you pause the STT session while the agent is responding to avoid transcribing the agent's own audio or processing overlapping speech:
```ts
session.on("endpoint", async () => {
  const utterance = utteranceBuffer.markEndpoint(); // Read more about the utterance buffer below
  if (!utterance) return;

  // Pause STT while the agent processes and responds
  session.pause();
  const response = await myAgent.respond(utterance.text);
  // ... send response audio to the client ...

  // Resume listening for the next utterance
  session.resume();
});
```
The SDK will `finalize` audio on pause. Make sure to adjust your VAD sensitivity so there is enough silence before pausing. Learn more about [Manual finalization](/stt/rt/manual-finalization#key-points).
## Keepalive
Read more about [Connection keepalive](/stt/rt/connection-keepalive)
The Node SDK **automatically sends** keepalive messages when the session is paused via [`session.pause()`](/stt/SDKs/node-SDK/reference/classes#realtimesttsession-pause).
You can also send keepalive messages manually:
```ts
session.sendKeepalive();
```
## Detecting utterances for voice agents
When building voice AI agents, you need to know when the user has finished speaking so you can process their input. The SDK provides [`RealtimeUtteranceBuffer`](/stt/SDKs/node-SDK/reference/classes#realtimeutterancebuffer) to collect streaming tokens into complete utterances, driven by the server's endpoint detection.
### How it works
1. Set `enable_endpoint_detection: true` in the session config – the server detects when the user stops speaking and emits an endpoint event.
2. Feed every result event into the buffer with [`addResult()`](/stt/SDKs/node-SDK/reference/classes#realtimeutterancebuffer-addresult).
3. When an endpoint fires, call [`markEndpoint()`](/stt/SDKs/node-SDK/reference/classes#realtimeutterancebuffer-markendpoint) to flush the buffer and get the complete utterance.
### Example
```ts
import { SonioxNodeClient, RealtimeUtteranceBuffer } from "@soniox/node";

const client = new SonioxNodeClient();

// Call this for each new user/connection - each session needs its own buffer
function createAgentSession(onUtterance: (text: string) => void) {
  const session = client.realtime.stt({
    model: "stt-rt-v4",
    enable_endpoint_detection: true,
  });

  // Each session gets its own buffer
  const utteranceBuffer = new RealtimeUtteranceBuffer({
    final_only: true,
  });

  session.on("result", (result) => {
    utteranceBuffer.addResult(result);
  });

  session.on("endpoint", () => {
    const utterance = utteranceBuffer.markEndpoint();
    if (utterance) {
      onUtterance(utterance.text);
    }
  });

  return session;
}

// Usage: create a session per user connection
const session = createAgentSession((text) => {
  console.log("User said:", text);
  // Pass to your LLM / agent pipeline
});

await session.connect();
session.sendAudio(audioChunk);
```
## Streaming audio from a file
Use [`sendStream()`](/stt/SDKs/node-SDK/reference/classes#realtimesttsession-sendstream) to pipe audio directly from a file (or any async source) into a real-time session. It accepts any `AsyncIterable` – Node.js file streams, Web `ReadableStream`, Bun file streams, fetch response bodies, or custom async generators.
### Simulating real-time pace
When streaming pre-recorded files, you can throttle sending with `pace_ms` to simulate how audio would arrive from a live source (e.g. a microphone). This isn't needed for live audio, which naturally arrives at real-time pace.
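To make the `pace_ms` idea concrete, here is a hedged sketch of what 60 ms pacing means in terms of raw audio. This is not the SDK's internal implementation; `chunkForPace` is an illustrative helper that slices 16-bit mono PCM into fixed-duration chunks:

```typescript
// Split raw PCM (16-bit mono, pcm_s16le) into chunks of `paceMs` milliseconds.
// Illustrative only - the SDK's sendStream() does this pacing for you.
function chunkForPace(pcm: Uint8Array, sampleRate: number, paceMs: number): Uint8Array[] {
  const bytesPerSample = 2; // 16-bit samples
  const chunkBytes = Math.floor((sampleRate * bytesPerSample * paceMs) / 1000);
  const chunks: Uint8Array[] = [];
  for (let off = 0; off < pcm.length; off += chunkBytes) {
    chunks.push(pcm.subarray(off, off + chunkBytes));
  }
  return chunks;
}

// 1 second of 16 kHz mono s16le audio = 32000 bytes.
const oneSecond = new Uint8Array(32000);
const chunks = chunkForPace(oneSecond, 16000, 60);
console.log(chunks.length, chunks[0].length); // → 17 1920
```

With `pace_ms: 60`, `sendStream()` would deliver chunks of roughly this size on a 60 ms interval; live sources produce this cadence naturally.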
Use [`sendAudio`](/stt/SDKs/node-SDK/realtime-transcription#connect-and-stream) if you need more control.
# Handling webhooks with Node SDK
URL: /stt/SDKs/node-SDK/webhooks
Use webhooks to receive transcription results with the Soniox Node SDK
The SDK provides helper methods to handle [Webhooks](/stt/async/webhooks) from the Soniox API and transform them into typed objects.
## Configure webhook delivery
If a webhook is configured during [transcription creation](/stt/SDKs/node-SDK/async-transcription#quickstart), Soniox will send a POST request to your webhook URL with the transcription result.
```ts
await client.stt.transcribe({
model: 'stt-async-v4',
audio_url: 'https://soniox.com/media/examples/coffee_shop.mp3',
webhook_url: 'https://your-server.com/webhooks/soniox',
webhook_auth_header_name: 'X-Webhook-Secret',
webhook_auth_header_value: process.env.SONIOX_API_WEBHOOK_SECRET,
});
```
You can also append metadata as query parameters:
```ts
await client.stt.transcribe({
model: 'stt-async-v4',
audio_url: 'https://soniox.com/media/examples/coffee_shop.mp3',
webhook_url: 'https://your-server.com/webhooks/soniox',
webhook_query: {
request_id: 'abc-123'
},
});
```
Learn more about [testing webhooks locally](/stt/SDKs/node-SDK/webhooks#testing-webhooks-locally).
## Handling webhooks
The SDK provides both framework-agnostic and framework-specific handlers that parse the request body, verify authentication, and return a typed [`WebhookHandlerResultWithFetch`](/stt/SDKs/node-SDK/reference/types#webhookhandlerresultwithfetch).
All handlers return:
* `ok` — whether the webhook was handled successfully
* `status` — HTTP status code to return to Soniox
* `event` — the parsed [`WebhookEvent`](/stt/SDKs/node-SDK/reference/types#webhookevent) (when `ok=true`)
* `error` — error message (when `ok=false`)
* `fetchTranscript()` — lazily fetch the full transcript (when `event.status === 'completed'`)
* `fetchTranscription()` — lazily fetch the transcription object
```ts
import express from 'express';

const app = express();
app.use(express.json());

app.post('/webhooks/soniox', async (req, res) => {
  const result = client.webhooks.handleExpress(req);
  if (result.ok && result.event.status === 'completed') {
    const transcript = await result.fetchTranscript();
    console.log(transcript?.text);
  }
  res.status(result.status).json({ received: true });
});
```
```ts
import Fastify from 'fastify';

const app = Fastify();

app.post('/webhooks/soniox', async (request, reply) => {
  const result = client.webhooks.handleFastify(request);
  if (result.ok && result.event.status === 'completed') {
    const transcript = await result.fetchTranscript();
    console.log(transcript?.text);
  }
  return reply.status(result.status).send({ received: true });
});
```
```ts
import { Hono } from 'hono';

const app = new Hono();

app.post('/webhooks/soniox', async (c) => {
  const result = await client.webhooks.handleHono(c);
  if (result.ok && result.event.status === 'completed') {
    const transcript = await result.fetchTranscript();
    console.log(transcript?.text);
  }
  return c.json({ received: true }, result.status);
});
```
`handleHono` is async because it reads the request body from the Hono context.
```ts
import { Controller, Post, Req, Res } from '@nestjs/common';
import { Request, Response } from 'express';

@Controller('webhooks')
export class WebhooksController {
  @Post('soniox')
  async handleSoniox(@Req() req: Request, @Res() res: Response) {
    const result = client.webhooks.handleNestJS(req);
    if (result.ok && result.event.status === 'completed') {
      const transcript = await result.fetchTranscript();
      console.log(transcript?.text);
    }
    res.status(result.status).json({ received: true });
  }
}
```
Use `handleRequest` with any framework that provides a standard Fetch API `Request` object:
```ts
export default {
  async fetch(request: Request) {
    if (new URL(request.url).pathname === '/webhooks/soniox') {
      const result = await client.webhooks.handleRequest(request);
      if (result.ok && result.event.status === 'completed') {
        const transcript = await result.fetchTranscript();
        console.log(transcript?.text);
      }
      return Response.json({ received: true }, { status: result.status });
    }
    return new Response('Not found', { status: 404 });
  },
};
```
`handleRequest` is async because it reads the request body from the `Request` object.
The `handle` method is a framework-agnostic handler. You provide the method, headers, and parsed body directly:
```ts
const result = client.webhooks.handle({
  method: req.method,
  headers: req.headers,
  body: req.body,
});

if (result.ok && result.event.status === 'completed') {
  const transcript = await result.fetchTranscript();
  console.log(transcript?.text);
}

if (result.ok && result.event.status === 'error') {
  const transcription = await result.fetchTranscription();
  console.log(transcription?.error_message);
}
```
See [`HandleWebhookOptions`](/stt/SDKs/node-SDK/reference/types#handlewebhookoptions) for all available options.
## Webhook auth helpers
By default, webhook handlers read auth from `SONIOX_API_WEBHOOK_HEADER` and `SONIOX_API_WEBHOOK_SECRET`. You can override auth explicitly:
```ts
const result = client.webhooks.handleExpress(req, {
name: 'X-Webhook-Secret',
value: process.env.SONIOX_API_WEBHOOK_SECRET,
});
```
Learn more about [Environment Variables](/stt/SDKs/node-SDK/reference#environment-variables).
You can also verify the auth manually:
```ts
const auth = client.webhooks.getAuthFromEnv();
if (!auth) {
  throw new Error('Missing webhook auth');
}
const isValid = client.webhooks.verifyAuth(req.headers, auth);
```
## Webhook event helpers
```ts
const event = client.webhooks.parseEvent(req.body);
const isEvent = client.webhooks.isEvent(req.body);
```
## Testing webhooks locally
Since Soniox needs to reach your server over the internet, you'll need a tunnel to expose your local development server. You can use [Cloudflare Tunnel](https://developers.cloudflare.com/pages/how-to/preview-with-cloudflare-tunnel/) or [ngrok](https://ngrok.com/).
[Cloudflare Tunnel](https://developers.cloudflare.com/pages/how-to/preview-with-cloudflare-tunnel/) provides a quick way to expose your local server — no account required.
Install `cloudflared` and start a tunnel pointing to your local server:
```bash
# macOS
brew install cloudflared
# Start a tunnel to your local server on port 3000
cloudflared tunnel --url http://localhost:3000
```
The command will output a public URL like `https://random-name.trycloudflare.com`.
[ngrok](https://ngrok.com/) creates a secure tunnel to your local server and provides a stable public URL.
Install ngrok, authenticate, and start a tunnel:
```bash
# macOS
brew install ngrok
# Authenticate (one-time setup)
ngrok config add-authtoken <YOUR_AUTHTOKEN>
# Start a tunnel to your local server on port 3000
ngrok http 3000
```
The command will output a public URL like `https://abcd-1234.ngrok-free.app`.
Once you have your public tunnel URL, use it as the `webhook_url` when creating a transcription:
```ts
import express from 'express';
import { SonioxNodeClient } from '@soniox/node';

const client = new SonioxNodeClient();
const app = express();
app.use(express.json());

// Handle incoming webhook events
app.post('/webhooks/soniox', async (req, res) => {
  const result = client.webhooks.handleExpress(req);
  // You will receive the webhook event when the transcription is completed
  if (result.ok && result.event.status === 'completed') {
    const transcript = await result.fetchTranscript(); // Lazily fetch the transcript
    console.log(transcript?.text);
  }
  res.status(result.status).json({ received: true });
});

app.listen(3000, () => console.log('Listening on port 3000'));

// Start a transcription with the tunnel URL as webhook
await client.stt.transcribe({
  model: 'stt-async-v4',
  audio_url: 'https://soniox.com/media/examples/coffee_shop.mp3',
  webhook_url: 'https://<your-tunnel-url>/webhooks/soniox',
});
```
# Async transcription with Python SDK
URL: /stt/SDKs/python-SDK/async-transcription
Transcribe audio files asynchronously with the Soniox Python SDK
The Soniox Python SDK supports asynchronous transcription of audio files. This allows you to transcribe recordings without maintaining a live connection or streaming pipeline.
You can either wait for completion or create a job and retrieve the results based on the webhook event.
## Quickstart
The SDK provides a convenient `transcribe` method that accepts a local file, public URL, or previously uploaded file ID.
It will upload the file (if provided) and create the transcription job.
Don't forget to remove files and transcriptions from Soniox after you're done with them.
```python
from soniox import SonioxClient
client = SonioxClient()
# Transcribe from a local file
transcription = client.stt.transcribe(
model="stt-async-v4",
file="audio.mp3",
)
# Transcribe from a public URL and fetch the transcript later
transcription = client.stt.transcribe(
model="stt-async-v4",
audio_url="https://soniox.com/media/examples/coffee_shop.mp3",
)
# Transcribe from a previously uploaded file
transcription = client.stt.transcribe(
model="stt-async-v4",
file_id="uploaded-file-id",
)
```
After creating the job, you can poll the status with `client.stt.get` or wait for completion with
`client.stt.wait`. To get the final transcript, call `client.stt.get_transcript`.
```python
# Check status
transcription = client.stt.get("transcription-id")
print(transcription.status)
# Wait for completion
client.stt.wait("transcription-id")
# Fetch transcript
transcript = client.stt.get_transcript("transcription-id")
print(transcript.text)
```
## Get transcription
You can get a transcription by ID using the `get` method.
```python
transcription = client.stt.get("transcription-id")
print(transcription.id, transcription.status)
```
Get a transcription or return `None` if it doesn't exist:
```python
transcription = client.stt.get_or_none("transcription-id")
if transcription is None:
    print("Transcription not found")
```
## Get transcription transcript
If you want to retrieve the text or tokens of a transcription, fetch its transcript with `get_transcript`.
```python
transcript = client.stt.get_transcript("transcription-id")
print(transcript.text)
```
## Retrieve list of transcriptions
You can retrieve a list of transcriptions using the `list` method.
```python
from soniox import SonioxClient
client = SonioxClient()
response = client.stt.list(limit=100)
for transcription in response.transcriptions:
    print(transcription.id, transcription.status)

# Use pagination to list more transcriptions
while response.next_page_cursor:
    response = client.stt.list(
        limit=100,
        cursor=response.next_page_cursor,
    )
    for transcription in response.transcriptions:
        print(transcription.id, transcription.status)
```
## Delete or destroy transcription
You can delete or destroy a transcription using the `delete` or `destroy` method.
**Delete transcription only:**
```python
client.stt.delete("transcription-id")
```
**Delete a transcription only if it exists:**
```python
client.stt.delete_if_exists("transcription-id")
```
**Delete transcription and its file if it was uploaded:**
```python
client.stt.destroy("transcription-id")
```
## Delete all transcriptions and files from your account
You have limited space for files and transcriptions; see [Limits and quotas](https://soniox.com/docs/stt/async/limits-and-quotas).
These operations are irreversible and cannot be undone.
### Delete all transcriptions
You can delete all transcriptions using `transcriptions.delete_all`.
```python
client.stt.delete_all()
```
### Delete all files
You can delete all files using `files.delete_all`.
```python
client.files.delete_all()
```
### Delete all transcriptions and their files
You can delete all transcriptions and their files (if they exist) using `files.destroy_all`.
```python
client.files.destroy_all()
```
# Handling files with Python SDK
URL: /stt/SDKs/python-SDK/files
Upload audio files and manage them with the Soniox Python SDK
Use the Files API to upload audio for async transcription or to reuse files across multiple jobs.
## Upload
`upload()` accepts `bytes`, file paths (`str` or `Path`), or a file-like object (`BinaryIO`).
```python
from soniox import SonioxClient
client = SonioxClient()
file = client.files.upload("audio.mp3")
print(file.id, file.filename, file.size)
```
Read more about [Supported audio formats](/stt/async/async-transcription#audio-formats).
## Get file
Get a file by ID; throws `SonioxNotFoundError` if the file does not exist:
```python
file = client.files.get("file-id")
print(file.id, file.filename)
```
Get a file, or `None` if it doesn't exist:
```python
file = client.files.get_or_none("file-id")
if file is None:
    print("File not found")
```
## List files
`list()` returns a paginated response. Use `next_page_cursor` to fetch additional pages until it is `None`.
```python
from soniox import SonioxClient
client = SonioxClient()
response = client.files.list(limit=100)
for file in response.files:
    print(file.id, file.filename)

# Pagination
while response.next_page_cursor:
    response = client.files.list(limit=100, cursor=response.next_page_cursor)
    for file in response.files:
        print(file.id, file.filename)
```
## Delete file
Delete a file by ID; throws `SonioxNotFoundError` if the file does not exist:
```python
client.files.delete("file-id")
```
Delete a file only if it exists:
```python
client.files.delete_if_exists("file-id")
```
## Delete all files
`delete_all()` iterates through every page and removes each file.
```python
client.files.delete_all()
```
# Python SDK
URL: /stt/SDKs/python-SDK
Python SDK for Soniox REST and realtime APIs.
The Soniox [Python SDK](https://github.com/soniox/soniox-python) gives you fully typed access to our Async and Real-time Speech-to-Text APIs.
## Quickstart
### Install
```bash
pip install soniox
```
### Set your API key
```bash
export SONIOX_API_KEY=<YOUR_API_KEY>
```
Create a [Soniox account](https://console.soniox.com/signup) and log in to
the [Console](https://console.soniox.com) to get your API key.
### Create your first real-time session
```python
from soniox import SonioxClient
from soniox.types import RealtimeSTTConfig, Token
from soniox.utils import render_tokens, throttle_audio, start_audio_thread
# Grab the demo file from:
# https://github.com/soniox/soniox_examples/raw/refs/heads/master/speech_to_text/assets/coffee_shop.mp3
AUDIO_FILE = "coffee_shop.mp3"
client = SonioxClient()
config = RealtimeSTTConfig(model="stt-rt-v4", audio_format="mp3")
final_tokens: list[Token] = []
non_final_tokens: list[Token] = []
def realtime():
    # Create new real-time websocket session
    with client.realtime.stt.connect(config=config) as session:
        # Stream audio to websocket
        start_audio_thread(session, throttle_audio(AUDIO_FILE, delay_seconds=0.1))
        # Receive events from Soniox Real-time STT
        for event in session.receive_events():
            for token in event.tokens:
                if token.is_final:
                    final_tokens.append(token)
                else:
                    non_final_tokens.append(token)
            print(render_tokens(final_tokens, non_final_tokens))
            non_final_tokens.clear()

realtime()
```
Learn more about [Real-time transcription](/stt/SDKs/python-SDK/realtime-transcription).
### Create your first async transcription
```python
from soniox import SonioxClient
client = SonioxClient()
# Create new transcription from `audio_url`
transcription = client.stt.transcribe(
audio_url="https://soniox.com/media/examples/coffee_shop.mp3",
)
# Wait until transcription processing is finished
client.stt.wait(transcription.id)
# Get transcription transcript and print it
transcript = client.stt.get_transcript(transcription.id)
print(transcript.text)
```
Learn more about [Async transcription](/stt/SDKs/python-SDK/async-transcription).
## Next steps
* [Real-time streaming](/stt/SDKs/python-SDK/realtime-transcription)
* [Async transcription](/stt/SDKs/python-SDK/async-transcription)
* [Webhooks](/stt/SDKs/python-SDK/webhooks)
* [Files and models](/stt/SDKs/python-SDK/files)
* [Sync vs async clients](/stt/SDKs/python-SDK/sync-vs-async-clients)
* [Full SDK reference](/stt/SDKs/python-SDK/Full-SDK-reference/__init__)
## Package links
* [GitHub repository](https://github.com/soniox/soniox-python)
* [PyPI package](https://pypi.org/project/soniox/)
# Real-time transcription with Python SDK
URL: /stt/SDKs/python-SDK/realtime-transcription
Create and connect to Soniox real-time speech-to-text sessions with the Python SDK
The Soniox Python SDK supports transcribing audio in real time with **low latency** and **high accuracy**.
This makes it ideal for voice assistants, live captions, and conversational AI.
## Connect to a real-time session
The example below streams audio from live radio to the Soniox real-time API. If you want to stream from a file instead, see: [Create your first real-time session](/stt/SDKs/python-SDK#create-your-first-real-time-session).
```python
from typing import Iterator
from soniox import SonioxClient
from soniox.types import (
RealtimeSTTConfig,
Token,
StructuredContext,
StructuredContextGeneralItem,
)
from soniox.utils import render_tokens, start_audio_thread
import httpx
AUDIO_URL = "https://npr-ice.streamguys1.com/live.mp3?ck=1742897559135"
# Fetch audio from a live radio stream and yield it in chunks.
def stream_audio_from_url(audio_url) -> Iterator[bytes]:
    with httpx.Client() as client:
        with client.stream("GET", audio_url) as response:
            response.raise_for_status()
            for chunk in response.iter_bytes(4096):
                if chunk:
                    yield chunk

client = SonioxClient()

# Create config, see below for all parameters
config = RealtimeSTTConfig(
    model="stt-rt-v4",
    audio_format="mp3",
    enable_endpoint_detection=True,
    enable_speaker_diarization=True,
    language_hints=["en"],
    context=StructuredContext(
        general=[StructuredContextGeneralItem(key="domain", value="live radio / news broadcast")],
        text="Live NPR news and talk radio stream, including interviews, music, and commentary.",
        terms=["NPR", "news", "interview", "music", "commentary", "report", "broadcast", "anchor"],
    ),
)

final_tokens: list[Token] = []
non_final_tokens: list[Token] = []

def realtime():
    # Create new real-time websocket session
    with client.realtime.stt.connect(config=config) as session:
        # Stream audio from live radio to websocket
        start_audio_thread(session, stream_audio_from_url(AUDIO_URL))
        # Receive events from Soniox Real-time STT
        for event in session.receive_events():
            for token in event.tokens:
                if token.is_final:
                    final_tokens.append(token)
                else:
                    non_final_tokens.append(token)
            print(render_tokens(final_tokens, non_final_tokens))
            non_final_tokens.clear()

realtime()
```
For config options see: [WebSocket API](/stt/api-reference/websocket-api#configuration) or [RealtimeSTTConfig reference](/stt/SDKs/python-SDK/Full-SDK-reference/types/types_realtime#class-realtimesttconfig).
## Endpoint detection
Endpoint detection lets you know when a speaker has finished speaking.
This is critical for real-time voice AI assistants, command-and-response systems,
and conversational apps where you want to respond immediately without waiting for long silences.
Read more about [Endpoint detection](/stt/rt/endpoint-detection)
Enable endpoint detection by setting `enable_endpoint_detection=True` in the session config.
You will receive the special `<end>` token when speech ends.
```python
# Enable endpoint detection
config = RealtimeSTTConfig(
    enable_endpoint_detection=True,
    ...
)
# When receiving events, check for the special token
for event in session.receive_events():
    for token in event.tokens:
        if token.text == "<end>":
            print("Endpoint detected")
```
## Manual finalization
Manual finalization gives you precise control over when audio should be finalized. When you know the user stopped talking (push-to-talk or client-side VAD), call `finalize` to mark all outstanding tokens as final.
Read more about [Manual finalization](/stt/rt/manual-finalization)
```python
# Finalize current buffered audio without closing the session.
session.finalize()
```
## Pause and resume
```python
session.pause()   # keeps connection alive, drops audio while paused
session.resume()  # resume sending audio
```
You are billed for the full stream duration even when the session is paused.
## Keepalive
Soniox terminates your session if no audio arrives for ~20 seconds. To keep the connection alive, send a keepalive control message or run a background keepalive loop.
The Python SDK **automatically sends** keepalive messages when the session is paused via `session.pause()`.
```python
# Send a keepalive message manually
session.keep_alive()
```
Read more about [Connection keepalive](/stt/rt/connection-keepalive)
## Streaming audio from a file
Use `stream_audio()` with `start_audio_thread()` to stream from a file while receiving events.
If you are streaming live audio (microphone, client stream, etc.), you can feed raw chunks without throttling.
If you are streaming a prerecorded file, throttle chunks to simulate real-time delivery.
```python
from soniox.utils import stream_audio, start_audio_thread, throttle_audio
...
with client.realtime.stt.connect(config=config) as session:
    # Start streaming audio on a background thread.
    start_audio_thread(session, stream_audio("audio.wav"))
    # Or throttle a local audio file to simulate streaming (sends a chunk every 100 ms)
    start_audio_thread(session, throttle_audio("audio.wav", delay_seconds=0.1))
    ...
```
Use [`send_bytes`](/stt/SDKs/python-SDK/Full-SDK-reference/realtime/__init__#send_bytes) if you need more control.
## Direct stream and proxy stream
Read more about [Direct stream](/stt/guides/direct-stream) and [Proxy
stream](/stt/guides/proxy-stream).
For direct streaming from a client, issue a temporary API key and pass it to the browser or device that will open the WebSocket connection:
```python
from soniox import SonioxClient
client = SonioxClient()
key = client.auth.create_temporary_api_key(
expires_in_seconds=3600,
client_reference_id="support-call-123",
)
print(key.api_key, key.expires_at)
```
For proxy streaming, keep the WebSocket connection on your server and stream audio through your backend.
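A proxy server essentially relays chunks from the client connection into the Soniox session (for example via `send_bytes`). Here is a minimal, hedged sketch of that relay loop; `relay_audio` is an illustrative helper, and the iterable and callable stand in for your client WebSocket and the session:

```python
from typing import Callable, Iterable

def relay_audio(recv_chunks: Iterable[bytes], send: Callable[[bytes], None]) -> int:
    """Forward audio chunks from a client connection into a Soniox session.

    In a real proxy, `recv_chunks` would read from your client WebSocket and
    `send` would be `session.send_bytes`. Returns the number of bytes relayed.
    """
    total = 0
    for chunk in recv_chunks:
        if not chunk:  # skip empty frames
            continue
        send(chunk)
        total += len(chunk)
    return total

# Demo with fakes: relay three frames into a list instead of a session.
out: list[bytes] = []
n = relay_audio([b"abc", b"", b"defg"], out.append)
print(n, out)  # → 7 [b'abc', b'defg']
```

In a real proxy you would run this inside the `connect()` context, passing `session.send_bytes` as `send`, while a background thread consumes `session.receive_events()` and forwards results back to the client.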
# Sync vs async clients
URL: /stt/SDKs/python-SDK/sync-vs-async-clients
Choose between SonioxClient and AsyncSonioxClient based on your app.
The Soniox Python SDK provides two clients:
* `SonioxClient` for synchronous code (scripts, CLIs, simple services)
* `AsyncSonioxClient` for asyncio apps (FastAPI, aiohttp, background workers)
The `files`, `models`, `auth`, `webhooks`, `transcriptions`, and `realtime` APIs accept the same parameters and return the same types in both clients.
## Sync client (`SonioxClient`)
```python
from soniox import SonioxClient
client = SonioxClient()
file = client.files.upload("audio.mp3")
transcription = client.stt.transcribe(
audio_url="https://soniox.com/media/examples/coffee_shop.mp3"
)
print(file.id, transcription.id)
```
## Async client (`AsyncSonioxClient`)
```python
import asyncio
from soniox import AsyncSonioxClient
async def main():
    client = AsyncSonioxClient()
    file = await client.files.upload("audio.mp3")
    transcription = await client.stt.transcribe(
        audio_url="https://soniox.com/media/examples/coffee_shop.mp3"
    )
    print(file.id, transcription.id)

asyncio.run(main())
```
# Handling webhooks with Python SDK
URL: /stt/SDKs/python-SDK/webhooks
Use webhooks to receive transcription results with the Soniox Python SDK
The Python SDK provides helper functions for working with [Webhooks](/stt/async/webhooks).
## Configure webhook delivery
If a webhook is configured when the transcription is created, Soniox will send a POST request to your webhook URL with the [transcription status result](/stt/async/webhooks#example).
```python
from soniox import SonioxClient
from soniox.types import CreateTranscriptionConfig
client = SonioxClient()
config = CreateTranscriptionConfig(
webhook_url="https://your-server.com/webhooks/soniox",
webhook_auth_header_name="X-Webhook-Secret",
webhook_auth_header_value="your-secret",
)
transcription = client.stt.transcribe(
audio_url="https://soniox.com/media/examples/coffee_shop.mp3",
config=config,
)
```
For `transcribe`, you must pass `webhook_auth_header_name` and `webhook_auth_header_value` explicitly in the config.
Environment variables (`SONIOX_API_WEBHOOK_HEADER` and `SONIOX_API_WEBHOOK_SECRET`) are only used by webhook helpers
and verification (see below).
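For example, set both variables in the environment where your webhook server runs (the values below are placeholders):

```shell
export SONIOX_API_WEBHOOK_HEADER="X-Webhook-Secret"
export SONIOX_API_WEBHOOK_SECRET="your-secret"
```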
You can also append additional metadata as query parameters:
```python
config = CreateTranscriptionConfig(
webhook_url="https://your-server.com/webhooks/soniox?request_id=abc-123",
)
```
If you are uploading a local file, you can also use the convenience helper (it reads the webhook header and secret from the environment automatically when present):
```python
from soniox import SonioxClient
client = SonioxClient()
transcription = client.stt.transcribe_file_with_webhook(
model="stt-async-v4",
file="audio.mp3",
webhook_url="https://your-server.com/webhooks/soniox",
)
```
## Example (FastAPI + ngrok)
Expose your local server (for example with [ngrok](https://ngrok.com/)), then create a transcription that points to the public ngrok URL and verify the webhook payload on your FastAPI server:
```python
from fastapi import FastAPI, Request
from soniox import SonioxClient
from soniox.errors import InvalidWebhookSignatureError
from soniox.types import CreateTranscriptionConfig, WebhookAuthConfig
app = FastAPI()
client = SonioxClient()
# Replace with your public ngrok URL.
NGROK_URL = "https://your-subdomain.ngrok-free.app"
WEBHOOK_SECRET_NAME = "X-Webhook-Secret"
WEBHOOK_SECRET_VALUE = "your-secret"
# When creating the transcription, you must provide the correct webhook auth header name and value:
# config = CreateTranscriptionConfig(
# webhook_url=f"{NGROK_URL}/webhooks/soniox",
# webhook_auth_header_name=WEBHOOK_SECRET_NAME,
# webhook_auth_header_value=WEBHOOK_SECRET_VALUE,
# )
# client.stt.transcribe(
# model="stt-async-v4",
# audio_url="https://soniox.com/media/examples/coffee_shop.mp3",
# config=config,
# )
@app.post("/webhooks/soniox")
async def soniox_webhook(request: Request):
payload = await request.body()
headers = dict(request.headers)
try:
event = client.webhooks.unwrap(
payload,
headers,
# This can be omitted if you have set the env variables SONIOX_API_WEBHOOK_SECRET and SONIOX_API_WEBHOOK_HEADER
auth=WebhookAuthConfig(
name=WEBHOOK_SECRET_NAME,
value=WEBHOOK_SECRET_VALUE,
),
)
except InvalidWebhookSignatureError:
print("InvalidWebhookSignatureError")
return
if event.status == "completed":
transcript = client.stt.get_transcript(event.id)
print(transcript.text)
if __name__ == "__main__":
import uvicorn
uvicorn.run(app, host="0.0.0.0", port=8080)
```
## Webhook verification
Verify webhook signatures to ensure the request really came from Soniox (and not a third party posting to your endpoint).
You can verify signatures manually:
```python
from soniox import SonioxClient
from soniox.types import WebhookAuthConfig
client = SonioxClient()
client.webhooks.verify_signature(
headers={"X-Webhook-Secret": "your-secret"},
)
```
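If you verify the header yourself instead of using the SDK helpers, compare secrets in constant time to avoid timing side channels. A hypothetical helper (not part of the SDK):

```python
import hmac

def is_valid_webhook(headers: dict[str, str], expected_secret: str,
                     header_name: str = "X-Webhook-Secret") -> bool:
    """Constant-time comparison of the shared webhook secret."""
    received = headers.get(header_name, "")
    return hmac.compare_digest(received.encode(), expected_secret.encode())
```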
Or rely on `unwrap` to validate and parse in one step:
```python
from soniox import SonioxClient
from soniox.types import WebhookAuthConfig
client = SonioxClient()
event = client.webhooks.unwrap(
payload=request_body,
headers={"X-Webhook-Secret": "your-secret"},
)
print(event.id, event.status)
```
If you prefer, you can also use `client.stt.transcribe_file_with_webhook` and `client.webhooks`
with `SONIOX_API_WEBHOOK_HEADER` and `SONIOX_API_WEBHOOK_SECRET` set in your environment.
# React SDK
URL: /stt/SDKs/react-SDK
Build speech-to-text workflows in React with real-time API.
Soniox [React SDK](https://www.npmjs.com/package/@soniox/react) provides React hooks and components for real-time speech-to-text, built on top of the [Web SDK](/stt/SDKs/web-SDK).
It lets you:
* Capture audio from the user's microphone with a single hook
* Stream audio to Soniox in real time
* Receive transcription and translation results as reactive state
## Quickstart
### Install
Install via your preferred package manager:
```bash tab
npm install @soniox/react
```
```bash tab
yarn add @soniox/react
```
```bash tab
pnpm add @soniox/react
```
```bash tab
bun add @soniox/react
```
### Set up your temporary API key endpoint
In a client environment (browser, mobile app, React Native, etc.), you don't want to expose your API key to the client.
For this reason, you can create a temporary API key endpoint on your server and use it to issue temporary API keys for the client.
For example, you can use our [Node SDK](/stt/SDKs/node-SDK) to create a temporary API key endpoint.
```ts
import express from 'express';
import { SonioxNodeClient } from '@soniox/node';
const app = express();
const client = new SonioxNodeClient(); // reads SONIOX_API_KEY from env
// Create a temporary API key endpoint
app.post('/tmp-key', async (_req, res) => {
try {
const { api_key, expires_at } = await client.auth.createTemporaryKey({
usage_type: 'transcribe_websocket',
expires_in_seconds: 300, // 1..3600
});
res.json({ api_key, expires_at });
} catch (err) {
res.status(500).json({ error: err instanceof Error ? err.message : 'Failed to create temporary key' });
}
});
app.listen(3000, () => {
console.log('Server listening on http://localhost:3000');
});
```
Read more about our [Node SDK](/stt/SDKs/node-SDK) and [Temporary API keys](/stt/api-reference/auth/create_temporary_api_key).
### Create your first real-time session
```ts
import { SonioxProvider, useRecording } from "@soniox/react";
// Fetch a temporary API key from the endpoint on your server
async function getAPIKey() {
const res = await fetch("/tmp-key", { method: "POST" });
const { api_key } = await res.json();
return api_key;
}
function App() {
  return (
    // Wrap your app with a SonioxProvider and pass the temporary API key getter function
    <SonioxProvider apiKey={getAPIKey}>
      <Transcription />
    </SonioxProvider>
  );
}
function Transcription() {
  // Create a recording session
  const { state, finalText, partialText, start, stop } = useRecording({
    model: "stt-rt-v4",
  });
  return (
    <div>
      <p>
        {finalText}
        <span>{partialText}</span>
      </p>
      <p>State: {state}</p>
      {state === "recording" || state === "connecting" || state === "starting" ? (
        <button onClick={() => void stop()}>Stop</button>
      ) : (
        <button onClick={() => start()}>Start</button>
      )}
    </div>
  );
}
```
Learn more about [Real-time transcription](/stt/SDKs/react-SDK/realtime-transcription)
## Next steps
* [Real-time transcription](/stt/SDKs/react-SDK/realtime-transcription)
* [Full SDK reference](/stt/SDKs/react-SDK/reference)
## Package links
* [GitHub repository](https://github.com/soniox/soniox-js)
* [NPM package](https://www.npmjs.com/package/@soniox/react)
# Real-time transcription with React SDK
URL: /stt/SDKs/react-SDK/realtime-transcription
Create and manage real-time speech-to-text sessions with the Soniox React SDK
Soniox React SDK supports real-time transcription via React hooks, built on top of the [@soniox/client](/stt/SDKs/web-SDK) Web SDK.
This allows you to transcribe live audio with low latency — ideal for live captions, voice input, and interactive experiences.
You can capture audio from the user's microphone, receive transcription results as reactive state, and control sessions with simple `start`/`stop` calls.
## Soniox Provider
[`SonioxProvider`](/stt/SDKs/react-SDK/reference/types#sonioxprovider) creates and shares a single [`SonioxClient`](/stt/SDKs/web-SDK/reference/classes#sonioxclient) instance via React context. Place it near the root of your component tree.
### With configuration props
```tsx
import { SonioxProvider } from "@soniox/react";
function App({ children }) {
  return (
    <SonioxProvider
      apiKey={async () => {
        const res = await fetch("/api/get-temporary-key", { method: "POST" });
        return (await res.json()).api_key;
      }}
    >
      {children}
    </SonioxProvider>
  );
}
```
### With a pre-built client
```tsx
import { SonioxClient } from "@soniox/client";
import { SonioxProvider } from "@soniox/react";
const client = new SonioxClient({
api_key: async () => fetchKey(),
});
function App({ children }) {
  return <SonioxProvider client={client}>{children}</SonioxProvider>;
}
```
## `useRecording`
`useRecording` is the primary hook for real-time speech-to-text. It returns
[`UseRecordingReturn`](/stt/SDKs/react-SDK/reference/types#userecordingreturn), which contains reactive
transcript state and control methods.
```tsx
function Transcriber() {
const recording = useRecording({
model: "stt-rt-v4",
language_hints: ["en", "es"],
enable_endpoint_detection: true,
});
  return (
    <div>
      <p>State: {recording.state}</p>
      <p>{recording.text}</p>
      <button onClick={() => recording.start()}>Start</button>
      <button onClick={() => void recording.stop()}>Stop</button>
    </div>
  );
}
```
### Handle session events
| Callback | Signature | Description |
| ----------------- | -------------------------------------------- | ------------------------------------------------- |
| `onResult` | `(result: RealtimeResult) => void` | Called on each result from the server. |
| `onEndpoint` | `() => void` | Called when an endpoint is detected. |
| `onError` | `(error: Error) => void` | Called when an error occurs. |
| `onStateChange` | `(update: { old_state, new_state }) => void` | Called on each state transition. |
| `onFinished` | `() => void` | Called when the recording session finishes. |
| `onConnected` | `() => void` | Called when the WebSocket connects. |
| `onSourceMuted` | `() => void` | Called when the audio source is muted externally. |
| `onSourceUnmuted` | `() => void` | Called when the audio source is unmuted. |
### Session lifecycle
#### Recording state
| Field | Type | Description |
| --------------- | ---------------- | -------------------------------------------------------------------- |
| `state` | `RecordingState` | Current lifecycle state (`'idle'`, `'recording'`, `'paused'`, etc.). |
| `isActive` | `boolean` | `true` when state is not `idle`/`stopped`/`canceled`/`error`. |
| `isRecording` | `boolean` | `true` when `state === 'recording'`. |
| `isPaused` | `boolean` | `true` when `state === 'paused'`. |
| `isSourceMuted` | `boolean` | `true` when the audio source is muted externally. |
#### Available methods
| Method | Signature | Description |
| ----------------- | --------------------- | ------------------------------------------------------------------------ |
| `start` | `() => void` | Start a new recording. Aborts any in-flight recording first. |
| `stop` | `() => Promise<void>` | Gracefully stop — waits for final results from the server. |
| `cancel` | `() => void` | Immediately cancel — does not wait for final results. |
| `pause` | `() => void` | Pause audio capture (keepalive keeps connection open). |
| `resume` | `() => void` | Resume after pause. |
| `finalize` | `(options?) => void` | Request the server to finalize current non-final tokens. |
| `clearTranscript` | `() => void` | Clear transcript state (`finalText`, `partialText`, `utterances`, etc.). |
### Endpoint detection and manual finalization
Endpoint detection lets you know when a speaker has finished speaking.
This is critical for real-time voice AI assistants, command-and-response systems, and conversational apps where you want to respond immediately without waiting for long silences.
Read more about [Endpoint detection](/stt/rt/endpoint-detection)
Enable endpoint detection by setting `enable_endpoint_detection: true` in the hook configuration.
Use the `onEndpoint` callback to know when a speaker has finished speaking.
```tsx
const { start, stop, text } = useRecording({
apiKey: "",
model: "stt-rt-v4",
enable_endpoint_detection: true,
onEndpoint: () => { console.log("--- speaker finished ---"); },
});
```
Manual finalization gives you precise control over when audio should be finalized — useful for push-to-talk systems and client-side voice activity detection (VAD).
Read more about [Manual finalization](/stt/rt/manual-finalization)
The `finalize` function is returned by `useRecording` and can be called at any time during an active recording:
```tsx
const { start, stop, finalize } = useRecording({});
// Later, when you want to force finalization:
finalize();
```
### Pause, resume and muting audio source
The pause and resume functions are returned by `useRecording`. The `isPaused` flag reflects the current pause state reactively.
```tsx
const { start, stop, pause, resume, isPaused } = useRecording({});
pause();  // keeps connection alive, drops audio while paused
resume(); // resume sending audio
```
The SDK will `finalize` audio on pause. Make sure to adjust your VAD sensitivity so there is enough silence before pausing. Learn more about [Manual finalization](/stt/rt/manual-finalization#key-points)
The hook also tracks system-level mute events via `isSourceMuted`.
When the audio source is muted externally (e.g. OS-level or hardware mute), keepalive messages are sent automatically to keep the session alive.
You can listen for mute state changes with the `onSourceMuted` and `onSourceUnmuted` callbacks.
```tsx
const { isSourceMuted } = useRecording({
onSourceMuted: () => {
console.log("Microphone muted externally");
},
onSourceUnmuted: () => {
console.log("Microphone unmuted");
},
});
```
You are billed for the full stream duration even when the session is paused.
### Handling translation
The React SDK supports one-way and two-way real-time translation. Configure translation in the `useRecording` hook config.
The hook automatically groups tokens by translation status or language via the `groups` snapshot field, so you can render original and translated text separately without manual filtering.
#### One-way translation
Translates all spoken audio into a single target language.
When translation is provided with type: `one_way`, the hook automatically sets `groupBy: 'translation'`, splitting tokens into `original` and `translation` groups.
```tsx
const { groups } = useRecording({
apiKey: "",
model: "stt-rt-preview",
translation: {
type: "one_way",
target_language: "es", // Translate everything to Spanish
},
});
// Render grouped text
return (
  <div>
    <p>Original: {groups.original?.text}</p>
    <p>Translated: {groups.translation?.text}</p>
  </div>
);
```
#### Two-way translation
Translates between two languages — each speaker's speech is translated into the other language.
When translation is provided with type: `two_way`, the hook automatically sets `groupBy: 'language'`, splitting tokens by language code (e.g. `en`, `fr`).
```tsx
const { groups } = useRecording({
apiKey: "",
model: "stt-rt-preview",
translation: {
type: "two_way",
language_a: "en",
language_b: "fr",
},
});
// Render grouped text by language
return (
  <div>
    <p>English: {groups.en?.text}</p>
    <p>French: {groups.fr?.text}</p>
  </div>
);
```
Learn more about [Real-time translation](/stt/rt/real-time-translation)
### Utterances
When `enable_endpoint_detection` is enabled, the `utterances` array accumulates utterances separated by natural pauses:
```tsx
function TranscriptWithUtterances() {
const { utterances, partialText, start, stop, isActive } = useRecording({
model: "stt-rt-v4",
enable_endpoint_detection: true,
});
  return (
    <div>
      <button onClick={() => (isActive ? void stop() : start())}>
        {isActive ? "Stop" : "Start"}
      </button>
      {/* Reconstructed rendering; the utterance item shape is assumed */}
      {utterances.map((utterance, index) => (
        <p key={index}>{utterance.text}</p>
      ))}
      <p>{partialText}</p>
    </div>
  );
}
```
Learn more about [Endpoint detection](/stt/rt/endpoint-detection)
### Token grouping
The `groupBy` option splits tokens into named groups, accessible via
`recording.groups`. This is particularly useful for translation and
multi-speaker scenarios.
#### `groupBy` strategies
| Value | Keys | Description |
| ------------------- | ------------------------------------ | -------------------------------- |
| `'translation'` | `"original"`, `"translation"` | Group by `translation_status`. |
| `'language'` | Language codes (e.g. `"en"`, `"fr"`) | Group by token `language` field. |
| `'speaker'` | Speaker IDs (e.g. `"1"`) | Group by token `speaker` field. |
| `(token) => string` | Custom keys | Custom grouping function. |
Learn more about [Speaker diarization](/stt/concepts/speaker-diarization)
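Under the hood, grouping amounts to applying a key function to each token and concatenating text per key. A simplified sketch of what a custom `(token) => string` strategy does (the token shape is reduced to the fields used here; this is not the SDK's implementation):

```typescript
type Token = { text: string; speaker?: string; language?: string };

// Partition tokens into named groups using a key function.
function groupTokens(tokens: Token[], keyFn: (t: Token) => string): Record<string, string> {
  const groups: Record<string, string> = {};
  for (const token of tokens) {
    const key = keyFn(token);
    groups[key] = (groups[key] ?? "") + token.text;
  }
  return groups;
}

// Example: group by speaker, falling back to "unknown".
const bySpeaker = groupTokens(
  [
    { text: "Hello ", speaker: "1" },
    { text: "world", speaker: "2" },
  ],
  (t) => t.speaker ?? "unknown",
);
```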
#### TokenGroup fields
Each group in `recording.groups` contains:
| Field | Type | Description |
| --------------- | ----------------- | ------------------------------------------------------- |
| `text` | `string` | Full text: `finalText + partialText`. |
| `finalText` | `string` | Accumulated finalized text in this group. |
| `partialText` | `string` | Text from current non-final tokens. |
| `partialTokens` | `RealtimeToken[]` | Current non-final tokens (from the latest result only). |
#### Automatic grouping for translation
When a `translation` config is provided, `groupBy` is set automatically:
* `one_way` translation → groups by `'translation'` (keys: `"original"`, `"translation"`)
* `two_way` translation → groups by `'language'` (keys: language codes like `"en"`, `"es"`)
```tsx
function TranslatedTranscript() {
const { groups, start, stop, isActive } = useRecording({
model: "stt-rt-v4",
translation: { type: "one_way", target_language: "es" },
});
  return (
    <div>
      <button onClick={() => (isActive ? void stop() : start())}>
        {isActive ? "Stop" : "Start"}
      </button>
      <h3>Original</h3>
      <p>{groups.original?.text}</p>
      <h3>Translation</h3>
      <p>{groups.translation?.text}</p>
    </div>
  );
}
```
## `useSoniox`
Returns the `SonioxClient` instance from the nearest `SonioxProvider`. Useful for low-level session access.
```tsx
import { useSoniox } from "@soniox/react";
function MyComponent() {
const client = useSoniox();
// Use client.realtime.stt() for low-level session access
// Use client.permissions for permission checks
}
```
## `useMicrophonePermission`
Hook for checking and requesting microphone permission before recording.
Requires a `SonioxProvider` with a permission resolver configured (default in browsers).
```tsx
import { useMicrophonePermission } from "@soniox/react";
function PermissionGate({ children }) {
const mic = useMicrophonePermission({ autoCheck: true });
  if (!mic.isSupported) {
    return <p>Microphone permissions are not available.</p>;
  }
  if (mic.status === "unknown") {
    return <p>Checking permission...</p>;
  }
  if (mic.isDenied) {
    return (
      <div>
        <p>Microphone access denied.</p>
        {!mic.canRequest && (
          <p>Please enable microphone access in your browser settings.</p>
        )}
      </div>
    );
  }
  if (mic.status === "prompt") {
    return (
      <button onClick={() => void mic.check()}>Allow microphone access</button>
    );
  }
return children;
}
```
### Options
| Option | Type | Default | Description |
| ----------- | --------- | ------- | ---------------------------------------- |
| `autoCheck` | `boolean` | `false` | Automatically check permission on mount. |
### Return value
| Field | Type | Description |
| ------------- | --------------------- | ------------------------------------------------------------------------------------------------------ |
| `status` | `MicPermissionStatus` | Current status: `'granted'`, `'denied'`, `'prompt'`, `'unavailable'`, `'unsupported'`, or `'unknown'`. |
| `canRequest` | `boolean` | Whether the user can be prompted again. `false` when permanently denied. |
| `isGranted` | `boolean` | `status === 'granted'`. |
| `isDenied` | `boolean` | `status === 'denied'`. |
| `isSupported` | `boolean` | Whether permission checking is available. |
| `check` | `() => Promise<void>` | Check (or re-check) the microphone permission. No-op when unsupported. |
### Status values
| Status | Description |
| --------------- | --------------------------------------------------- |
| `'granted'` | Microphone access is granted. |
| `'denied'` | Microphone access is denied. |
| `'prompt'` | User hasn't been asked yet. |
| `'unavailable'` | Permissions API not available in this browser. |
| `'unsupported'` | No `PermissionResolver` configured in the provider. |
| `'unknown'` | Initial state before the first `check()` call. |
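These statuses map naturally onto UI states. A hypothetical helper translating each status into a user-facing message (the messages are illustrative):

```typescript
type MicPermissionStatus =
  | "granted" | "denied" | "prompt" | "unavailable" | "unsupported" | "unknown";

// Map each permission status to a message the UI can show.
function permissionMessage(status: MicPermissionStatus): string {
  switch (status) {
    case "granted":
      return "Microphone ready.";
    case "denied":
      return "Microphone access denied. Enable it in your browser settings.";
    case "prompt":
      return "Click to allow microphone access.";
    case "unavailable":
      return "Permissions API not available in this browser.";
    case "unsupported":
      return "Permission checks are not configured.";
    default:
      return "Checking permission...";
  }
}
```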
## `useAudioLevel`
Hook for real-time audio volume metering. Useful for building recording indicators and animations.
```tsx
import { useAudioLevel } from "@soniox/react";
function VolumeIndicator({ isActive }) {
const { volume } = useAudioLevel({ active: isActive }); // float value between 0 and 1
  return (
    // Bar width scales with the current volume (reconstructed rendering)
    <div style={{ width: `${Math.round(volume * 100)}%`, height: 8, background: "green" }} />
  );
}
```
## Next.js (App Router)
The package declares `'use client'` at the entry point. All hooks must be used
inside Client Components. Server Components cannot use `useRecording` or other
hooks directly.
# Web SDK
URL: /stt/SDKs/web-SDK
Build speech-to-text workflows in browser with real-time API.
Soniox [Web SDK](https://www.npmjs.com/package/@soniox/client) is the official JavaScript/TypeScript SDK for using the Soniox [Real-time API](/stt/api-reference/websocket-api) directly in the browser.
It lets you:
* Capture audio from the user's microphone
* Stream audio to Soniox in real time
* Receive transcription and translation results instantly
## Quickstart
### Install
Install via your preferred package manager:
```bash tab
npm install @soniox/client
```
```bash tab
yarn add @soniox/client
```
```bash tab
pnpm add @soniox/client
```
```bash tab
bun add @soniox/client
```
### Set up your temporary API key endpoint
In a client environment (browser, mobile app, React Native, etc.), you don't want to expose your API key to the client.
For this reason, you can create a temporary API key endpoint on your server and use it to issue temporary API keys for the client.
For example, you can use our [Node SDK](/stt/SDKs/node-SDK) to create a temporary API key endpoint.
```ts
import express from 'express';
import { SonioxNodeClient } from '@soniox/node';
const app = express();
const client = new SonioxNodeClient(); // reads SONIOX_API_KEY from env
// Create a temporary API key endpoint
app.post('/tmp-key', async (_req, res) => {
try {
const { api_key, expires_at } = await client.auth.createTemporaryKey({
usage_type: 'transcribe_websocket',
expires_in_seconds: 300, // 1..3600
});
res.json({ api_key, expires_at });
} catch (err) {
res.status(500).json({ error: err instanceof Error ? err.message : 'Failed to create temporary key' });
}
});
app.listen(3000, () => {
console.log('Server listening on http://localhost:3000');
});
```
Read more about our [Node SDK](/stt/SDKs/node-SDK) and [Temporary API keys](/stt/api-reference/auth/create_temporary_api_key).
### Create your first real-time session
```ts
import { SonioxClient } from "@soniox/client";
// Create a Soniox client
const client = new SonioxClient({
// Pass a function that fetches a temporary API key from your server
api_key: async () => {
const res = await fetch("/tmp-key", { method: "POST" });
const { api_key } = await res.json();
return api_key;
},
});
// Create a recording session
const recording = client.realtime.record({ model: "stt-rt-v4" });
// Listen for transcription results
recording.on("result", (result) => {
const text = result.tokens.map((t) => t.text).join("");
if (text) console.log(text);
});
// Listen for errors
recording.on("error", (err) => console.error("Error:", err));
// Later, stop gracefully (waits for final results):
// await recording.stop();
```
Learn more about [Real-time transcription](/stt/SDKs/web-SDK/realtime-transcription)
## Next steps
* [Real-time transcription](/stt/SDKs/web-SDK/realtime-transcription)
* [Full SDK reference](/stt/SDKs/web-SDK/reference)
## Package links
* [GitHub repository](https://github.com/soniox/soniox-js)
* [NPM package](https://www.npmjs.com/package/@soniox/client)
# Real-time transcription with Web SDK
URL: /stt/SDKs/web-SDK/realtime-transcription
Create and manage real-time speech-to-text sessions with the Soniox Web SDK
Soniox Web SDK supports real-time transcription over WebSocket directly in the browser.
This allows you to transcribe live audio with low latency — ideal for live captions, voice input, and interactive experiences.
You can capture audio from the user's microphone, consume results via events or buffers that group tokens into utterances, and manage sessions with built-in connection handling.
## Create a real-time recording session
`client.realtime.record()` is the high-level API for capturing audio and streaming it to Soniox for real-time transcription.
It returns a [`Recording`](/stt/SDKs/web-SDK/reference/classes#recording) instance synchronously so you can attach event listeners before any async work
(microphone access, API key fetch, WebSocket connection) begins.
```typescript
const recording = client.realtime.record({
// speech-to-text model to use
model: "stt-rt-v4",
// Optional: hint expected languages
language_hints: ["en", "es"],
// Optional: enable speaker identification
enable_speaker_diarization: true,
// Optional: detect utterance boundaries (useful for voice agents)
enable_endpoint_detection: true,
// Optional: provide domain context to improve accuracy
context: {
terms: ["Soniox", "WebSocket"],
general: [{ key: "domain", value: "technology" }],
},
// ... other options ...
});
```
### Listen for results
The `result` event fires every time the server returns a transcription update.
Each `RealtimeResult` contains an array of `RealtimeToken` objects — both
finalized and in-progress tokens.
```typescript
recording.on("result", (result) => {
const text = result.tokens.map((t) => t.text).join("");
if (text) console.log(text);
});
```
## Handle session events
| Event | Payload | Description |
| ---------------- | -------------------------- | ------------------------------------------------------------------- |
| `result` | `RealtimeResult` | Transcription result received from the server. |
| `error` | `Error` | An error occurred during recording. |
| `endpoint` | — | Endpoint detected (speaker finished talking). |
| `finalized` | — | Server completed finalization of current tokens. |
| `finished` | — | Server acknowledged end of stream. Fires before `stopped` state. |
| `connected` | — | WebSocket connected and streaming. |
| `state_change` | `{ old_state, new_state }` | Recording state transition. |
| `source_muted` | — | Audio source was muted externally (e.g. OS-level or hardware mute). |
| `source_unmuted` | — | Audio source was unmuted after an external mute. |
## Session lifecycle
A `Recording` transitions through a set of states. The lifecycle is fully managed — audio buffering during connection, keepalive during pause, and cleanup on stop or error are all handled automatically.
### States
| State | Description |
| ------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `idle` | Initial state before any work begins. |
| `starting` | Audio source is starting, API key is being fetched. Audio is buffered. |
| `connecting` | WebSocket connection is being established. |
| `recording` | Actively capturing and streaming audio. |
| `paused` | Audio capture and streaming paused. Keepalive messages maintain the connection. **You are still charged for the open session even when it is paused.** |
| `stopping` | `stop()` called. Waiting for the server to finish processing remaining audio. |
| `stopped` | Gracefully stopped. All final results have been received. |
| `error` | An error occurred. Resources have been cleaned up. |
| `canceled` | Canceled via `cancel()` or `AbortSignal`. |
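A common pattern is to derive an "active" flag from this table: a session still holds resources in every state except `idle`, `stopped`, `canceled`, and `error` (this mirrors the `isActive` flag exposed by the React SDK). A small sketch, not the SDK's implementation:

```typescript
type RecordingState =
  | "idle" | "starting" | "connecting" | "recording"
  | "paused" | "stopping" | "stopped" | "error" | "canceled";

// True in any state where the session still holds resources open.
function isActive(state: RecordingState): boolean {
  return !["idle", "stopped", "canceled", "error"].includes(state);
}
```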
### Methods
#### `stop(): Promise<void>`
Gracefully stops the recording. Stops the audio source and waits for the server
to process all remaining audio and return final results.
```typescript
await recording.stop();
// All final results have been received at this point
```
#### `cancel(): void`
Immediately cancels the recording without waiting for final results. Closes the
WebSocket connection and releases all resources.
```typescript
recording.cancel();
```
#### `pause(): void`
Pauses audio capture and streaming. The WebSocket connection stays open with
automatic keepalive messages.
```typescript
recording.pause();
console.log(recording.state); // 'paused'
```
You are charged for the full stream duration even when the session is paused.
#### `resume(): void`
Resumes audio capture and streaming after a pause.
```typescript
recording.resume();
console.log(recording.state); // 'recording'
```
#### `finalize(options?): void`
Requests the server to finalize current non-final tokens. Useful for forcing
finalization at a specific point (e.g. before displaying a completed sentence).
```typescript
recording.finalize();
// With trailing silence trimming:
recording.finalize({ trailing_silence_ms: 500 });
```
### Tracking state changes
```typescript
recording.on("state_change", ({ old_state, new_state }) => {
console.log(`${old_state} → ${new_state}`);
});
```
## Endpoint detection and manual finalization
Endpoint detection lets you know when a speaker has finished speaking. This is critical for real-time voice AI assistants, command-and-response systems, and conversational apps where you want to respond immediately without waiting for long silences.
Read more about [Endpoint detection](/stt/rt/endpoint-detection)
Enable endpoint detection by setting `enable_endpoint_detection: true` in the session configuration.
Listen for the `endpoint` event to know when a speaker has finished speaking.
```typescript
recording.on("endpoint", () => {
console.log("--- speaker finished ---");
});
```
Manual finalization gives you precise control over when audio should be finalized — useful for push-to-talk systems and client-side voice activity detection (VAD).
Read more about [Manual finalization](/stt/rt/manual-finalization)
```ts
recording.finalize();
```
## Pause, resume and muting audio source
```ts
recording.pause(); // keeps connection alive, drops audio while paused
recording.resume(); // resume sending audio
```
The SDK will `finalize` audio on pause. Make sure to adjust your VAD sensitivity so there is enough silence before pausing. Learn more about [Manual finalization](/stt/rt/manual-finalization#key-points)
The recording also reacts to system-level mute events and sends keepalive messages to keep the session alive while muted.
You are billed for the full stream duration even when the session is paused.
## Handling translation
The SDK supports one-way and two-way real-time translation. Configure translation in the session config, then filter tokens by `translation_status` to separate original and translated text.
### One-way translation
Translates all spoken audio into a single target language.
```typescript
const recording = client.realtime.record({
model: "stt-rt-v4",
translation: {
type: "one_way",
target_language: "es", // Translate everything to Spanish
},
});
recording.on("result", (result) => {
for (const token of result.tokens) {
if (token.translation_status === "original") {
console.log("[Original]", token.text);
} else if (token.translation_status === "translation") {
console.log("[Translated]", token.text);
}
}
});
```
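The filtering pattern above generalizes to a small helper that splits a result's tokens into original and translated text (the token shape is reduced to the fields used here; the helper name is illustrative):

```typescript
type TranslationToken = {
  text: string;
  translation_status: "none" | "original" | "translation";
};

// Accumulate original and translated text from one result's tokens.
function splitByTranslation(tokens: TranslationToken[]): { original: string; translation: string } {
  let original = "";
  let translation = "";
  for (const token of tokens) {
    if (token.translation_status === "original") original += token.text;
    else if (token.translation_status === "translation") translation += token.text;
  }
  return { original, translation };
}
```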
### Two-way translation
Translates between two languages — each speaker's speech is translated into the other language.
```typescript
const recording = client.realtime.record({
model: "stt-rt-v4",
translation: {
type: "two_way",
language_a: "en",
language_b: "fr",
},
});
```
### Translation token fields
When translation is enabled, each `RealtimeToken` includes:
| Field | Type | Description |
| -------------------- | --------------------------------------- | ------------------------------------------------------- |
| `translation_status` | `'none' \| 'original' \| 'translation'` | Whether this token is original speech or a translation. |
| `source_language` | `string` | The source language code for translated tokens. |
| `language` | `string` | The language of this token's text. |
Learn more about [Real-time translation](/stt/rt/real-time-translation)
You can provide [custom translation terms](/stt/concepts/context#translation-terms) in the context to improve translation accuracy.
## Handle permissions
The SDK provides a platform-agnostic permission system for checking and requesting microphone access before starting a recording.
This is optional but recommended for a good user experience — you can show appropriate UI based on
the permission state rather than waiting for the recording to fail.
### Setup
Pass a [`BrowserPermissionResolver`](/stt/SDKs/web-SDK/reference/classes#browserpermissionresolver) when creating the client:
```typescript
import { SonioxClient, BrowserPermissionResolver } from "@soniox/client";

const client = new SonioxClient({
  api_key: fetchKey,
  permissions: new BrowserPermissionResolver(),
});
```
### Check permission status
`check()` queries the current microphone permission without prompting the user:
```typescript
const result = await client.permissions?.check("microphone");

switch (result?.status) {
  case "granted":
    // Microphone access already granted — safe to record
    break;
  case "prompt":
    // User hasn't been asked yet — show a "start recording" button
    break;
  case "denied":
    if (!result.can_request) {
      // Permanently denied — show "go to browser settings" instructions
    }
    break;
  case "unavailable":
    // No microphone or getUserMedia not supported
    break;
}
```
### Request permission
`request()` triggers the browser permission prompt. On platforms where
permission is already granted, this is a no-op.
```typescript
const result = await client.permissions?.request("microphone");

if (result?.status === "granted") {
  startRecording();
} else if (result?.status === "denied") {
  showPermissionDeniedMessage();
}
```
Only create `BrowserPermissionResolver` in browser environments.
## Use custom audio source
By default, `client.realtime.record()` uses the built-in [`MicrophoneSource`](/stt/SDKs/web-SDK/reference/classes#microphonesource) which captures audio via `getUserMedia` and [`MediaRecorder`](https://developer.mozilla.org/en-US/docs/Web/API/MediaRecorder).
You can replace it with any object that implements the [`AudioSource`](/stt/SDKs/web-SDK/reference/types#audiosource) interface.
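As a purely illustrative sketch of the idea (the real `AudioSource` contract is defined in the linked reference; the `start`/`stop` method names and chunk callback below are assumptions, not the SDK's actual signature), a source that replays in-memory audio chunks might look like:

```typescript
// Illustrative only: the real AudioSource interface is defined in the SDK
// reference linked above. `start`/`stop` and the chunk callback are assumed.
class PrerecordedSource {
  constructor(private readonly chunks: Uint8Array[]) {}

  // Hand each stored chunk to the consumer, as a microphone would stream audio.
  start(onChunk: (chunk: Uint8Array) => void): void {
    for (const chunk of this.chunks) {
      onChunk(chunk);
    }
  }

  stop(): void {
    // Nothing to release for an in-memory source.
  }
}
```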
# LangChain.js (JavaScript)
URL: /stt/integrations/langchain/langchain-js
Soniox document loader for LangChain.js
## Overview
[LangChain](https://www.langchain.com/) is a popular framework for building applications powered by large language models (LLMs).
The `@soniox/langchain` package provides a document loader that transcribes audio files using Soniox's speech-to-text API, making it easy to incorporate audio transcription into your LangChain pipelines.
## Setup
Install the package:
```bash npm2yarn
npm install @soniox/langchain
```
### Credentials
Get your Soniox API key from the [Soniox Console](https://console.soniox.com) and set it as an environment variable:
```bash
export SONIOX_API_KEY=your_api_key
```
## Usage
### Basic transcription
Transcribe audio files using the `SonioxAudioTranscriptLoader`:
```typescript
import { SonioxAudioTranscriptLoader } from "@soniox/langchain";

// Fetch the file
const response = await fetch(
  "https://github.com/soniox/soniox_examples/raw/refs/heads/master/speech_to_text/assets/coffee_shop.mp3",
);
const audioBuffer = await response.bytes(); // Uint8Array

const loader = new SonioxAudioTranscriptLoader({
  audio: audioBuffer, // Or you can pass in a URL string
});

const docs = await loader.load();
console.log(docs[0].pageContent); // Transcribed text
```
### Two-way translation
Transcribe and translate between two languages simultaneously:
```typescript
const loader = new SonioxAudioTranscriptLoader(
  {
    audio: audioBuffer,
  },
  {
    translation: {
      type: "two_way",
      language_a: "en",
      language_b: "es",
    },
    language_hints: ["en", "es"],
  },
);

const docs = await loader.load();
```
### One-way translation
Translate from any detected language to a target language:
```typescript
const loader = new SonioxAudioTranscriptLoader(
  {
    audio: audioBuffer,
  },
  {
    translation: {
      type: "one_way",
      target_language: "fr",
    },
    language_hints: ["en", "es"],
  },
);

const docs = await loader.load();
```
## Advanced usage
### Language hints
Provide [language hints](/stt/concepts/language-hints) to improve transcription accuracy:
```typescript
const loader = new SonioxAudioTranscriptLoader(
  {
    audio: audioBuffer,
  },
  {
    language_hints: ["en", "es"],
  },
);
```
### Context for improved accuracy
Provide domain-specific [context](/stt/concepts/context) to improve transcription accuracy:
```typescript
const loader = new SonioxAudioTranscriptLoader(
  {
    audio: audioBuffer,
  },
  {
    context: {
      general: [
        { key: "industry", value: "healthcare" },
        { key: "meeting_type", value: "consultation" },
      ],
      terms: ["hypertension", "cardiology", "metformin"],
      translation_terms: [
        { source: "blood pressure", target: "presión arterial" },
        { source: "medication", target: "medicamento" },
      ],
    },
  },
);
```
## API reference
### Constructor parameters
#### SonioxLoaderParams (required)
| Parameter | Type | Required | Description |
| ------------------- | ---------------------- | -------- | ------------------------------------------------------------------------------------------------------------------------- |
| `audio` | `Uint8Array \| string` | Yes | Audio file as buffer or URL |
| `audioFormat` | `SonioxAudioFormat` | No | Audio file format |
| `apiKey` | `string` | No | Soniox API key (defaults to `SONIOX_API_KEY` env var) |
| `apiBaseUrl` | `string` | No | API base URL (defaults to `https://api.soniox.com/v1`). See [regional endpoints](/stt/data-residency#regional-endpoints). |
| `pollingIntervalMs` | `number` | No | Polling interval in ms (min: 1000, default: 1000) |
| `pollingTimeoutMs` | `number` | No | Polling timeout in ms (default: 180000) |
#### SonioxLoaderOptions (optional)
| Parameter | Type | Description |
| -------------------------------- | ---------------------------- | ---------------------------------------- |
| `model` | `SonioxTranscriptionModelId` | Model to use (default: `"stt-async-v4"`) |
| `translation` | `object` | Translation configuration |
| `language_hints` | `string[]` | Language hints for transcription |
| `language_hints_strict` | `boolean` | Enforce strict language hints |
| `enable_speaker_diarization` | `boolean` | Enable speaker identification |
| `enable_language_identification` | `boolean` | Enable language detection |
| `context` | `object` | Context for improved accuracy |
Browse the [API reference](/stt/api-reference/transcriptions/create_transcription) for a full list of supported options.
### Supported audio formats
* `aac` - Advanced Audio Coding
* `aiff` - Audio Interchange File Format
* `amr` - Adaptive Multi-Rate
* `asf` - Advanced Systems Format
* `flac` - Free Lossless Audio Codec
* `mp3` - MPEG Audio Layer III
* `ogg` - Ogg Vorbis
* `wav` - Waveform Audio File Format
* `webm` - WebM Audio
### Return value
The `load()` method returns an array containing a single `Document` object:
```typescript
Document {
  pageContent: string, // The transcribed text
  metadata: SonioxTranscriptResponse // Full transcript with metadata
}
```
The metadata includes transcribed text, speaker information (if diarization enabled), language information (if identification enabled), translation data (if translation enabled), and timing information.
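For example, when speaker diarization is enabled, the token-level metadata can be regrouped into per-speaker lines. A sketch assuming each token carries `text` and `speaker` fields (see the API reference for the exact shape; `formatBySpeaker` is our own helper):

```typescript
// Assumed token shape: `text` plus a `speaker` label when diarization is on.
interface TokenLike {
  text: string;
  speaker?: string;
}

// Group consecutive tokens by speaker into labeled lines.
function formatBySpeaker(tokens: TokenLike[]): string {
  let output = "";
  let currentSpeaker: string | undefined;
  for (const token of tokens) {
    if (token.speaker !== currentSpeaker) {
      currentSpeaker = token.speaker;
      output += `\nSpeaker ${currentSpeaker}: ${token.text.trimStart()}`;
    } else {
      output += token.text;
    }
  }
  return output.trimStart();
}
```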
## Related
* [LangChain documentation](https://docs.langchain.com/oss/javascript/integrations/document_loaders/web_loaders/soniox)
* [Package on NPM](https://www.npmjs.com/package/@soniox/langchain)
# LangChain (Python)
URL: /stt/integrations/langchain/langchain
Soniox document loader for LangChain
## Overview
[LangChain](https://www.langchain.com/) is a popular framework for building applications powered by large language models (LLMs).
The `langchain-soniox` package provides a document loader that transcribes audio files using Soniox's speech-to-text API,
making it easy to incorporate audio transcription into your LangChain pipelines.
## Setup
Install the package:
```bash
pip install langchain-soniox
```
### Credentials
Get your Soniox API key from the [Soniox Console](https://console.soniox.com) and set it as an environment variable:
```bash
export SONIOX_API_KEY=your_api_key
```
## Usage
### Basic transcription
Transcribe audio files using the `SonioxDocumentLoader`:
```python
from langchain_soniox import SonioxDocumentLoader

# Using a URL
loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3"
)
docs = list(loader.lazy_load())
print(docs[0].page_content)  # Transcribed text
```
You can also load audio from a local file or from bytes:
```python
# Using a local file path
loader = SonioxDocumentLoader(file_path="/path/to/audio.mp3")

# Using binary data
with open("/path/to/audio.mp3", "rb") as f:
    audio_bytes = f.read()
loader = SonioxDocumentLoader(file_data=audio_bytes)
```
### Async loading
For async operations, use `alazy_load()`:
```python
import asyncio

from langchain_soniox import SonioxDocumentLoader

async def transcribe_async():
    loader = SonioxDocumentLoader(
        file_url="https://soniox.com/media/examples/coffee_shop.mp3"
    )
    docs = [doc async for doc in loader.alazy_load()]
    print(docs[0].page_content)

asyncio.run(transcribe_async())
```
## Advanced usage
### Language hints
Soniox automatically detects and transcribes speech in [**60+ languages**](https://soniox.com/docs/stt/concepts/supported-languages). When you know which languages are likely to appear in your audio, provide `language_hints` to improve accuracy by biasing recognition toward those languages.
Language hints **do not restrict** recognition — they only **bias** the model toward the specified languages, while still allowing other languages to be detected if present.
```python
from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
)

loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        language_hints=["en", "es"],
    ),
)
docs = list(loader.lazy_load())
```
For more details, see the [Soniox language hints documentation](https://soniox.com/docs/stt/concepts/language-hints).
### Speaker diarization
Enable speaker identification to distinguish between different speakers:
```python
from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
)

loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        enable_speaker_diarization=True,
    ),
)
docs = list(loader.lazy_load())

# Access speaker information in the metadata
current_speaker = None
output = ""
for token in docs[0].metadata["tokens"]:
    if current_speaker != token["speaker"]:
        current_speaker = token["speaker"]
        output += f"\nSpeaker {current_speaker}: {token['text'].lstrip()}"
    else:
        output += token["text"]
print(output)
```
### Language identification
Enable automatic language detection and identification:
```python
from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
)

loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        enable_language_identification=True,
    ),
)
docs = list(loader.lazy_load())

# Access language information in the metadata
current_language = None
output = ""
for token in docs[0].metadata["tokens"]:
    if current_language != token["language"]:
        current_language = token["language"]
        output += f"\n[{current_language}] {token['text'].lstrip()}"
    else:
        output += token["text"]
print(output)
```
### Context for improved accuracy
Provide domain-specific [context](https://soniox.com/docs/stt/concepts/context) to improve transcription accuracy. Context helps the model understand your domain, recognize important terms, and apply custom vocabulary.
The `context` object supports four optional sections:
```python
from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
    StructuredContext,
    StructuredContextGeneralItem,
    StructuredContextTranslationTerm,
)

loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        context=StructuredContext(
            # Structured key-value information (domain, topic, intent, etc.)
            general=[
                StructuredContextGeneralItem(key="domain", value="Healthcare"),
                StructuredContextGeneralItem(
                    key="topic", value="Diabetes management consultation"
                ),
                StructuredContextGeneralItem(key="doctor", value="Dr. Martha Smith"),
            ],
            # Longer free-form background text or related documents
            text="The patient has a history of...",
            # Domain-specific or uncommon words
            terms=["Celebrex", "Zyrtec", "Xanax"],
            # Custom translations for ambiguous terms
            translation_terms=[
                StructuredContextTranslationTerm(
                    source="Mr. Smith", target="Sr. Smith"
                ),
                StructuredContextTranslationTerm(source="MRI", target="RM"),
            ],
        ),
    ),
)
docs = list(loader.lazy_load())
```
For more details, see the [Soniox context documentation](https://soniox.com/docs/stt/concepts/context).
### Translation
Translate from any detected language to a target language:
```python
from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
    TranslationConfig,
)

loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        translation=TranslationConfig(
            type="one_way",
            target_language="fr",
        ),
        language_hints=["en"],
    ),
)
docs = list(loader.lazy_load())

original_text = ""
translated_text = ""
for token in docs[0].metadata["tokens"]:
    if token["translation_status"] == "translation":
        translated_text += token["text"]
    else:
        original_text += token["text"]
print(original_text)
print(translated_text)
```
You can also transcribe and translate between two languages simultaneously using `two_way` translation type. Learn more about translation [here](https://soniox.com/docs/stt/async/async-translation).
## API reference
### Constructor parameters
| Parameter | Type | Required | Default | Description |
| ------------------------------ | ---------------------------- | -------- | ------------------------------ | -------------------------------------------------- |
| `file_path` | `str` | No\* | `None` | Path to local audio file to transcribe |
| `file_data` | `bytes` | No\* | `None` | Binary data of audio file to transcribe |
| `file_url` | `str` | No\* | `None` | URL of audio file to transcribe |
| `api_key` | `str` | No | `SONIOX_API_KEY` env var | Soniox API key |
| `base_url` | `str` | No | `https://api.soniox.com/v1` | API base URL (see [regional endpoints][endpoints]) |
| `options` | `SonioxTranscriptionOptions` | No | `SonioxTranscriptionOptions()` | Transcription options |
| `polling_interval_seconds` | `float` | No | `1.0` | Time between status polls (seconds) |
| `timeout_seconds` | `float` | No | `300.0` (5 minutes) | Maximum time to wait for transcription |
| `http_request_timeout_seconds` | `float` | No | `60.0` | Timeout for individual HTTP requests |
\* You must specify **exactly one** of: `file_path`, `file_data`, or `file_url`.
[endpoints]: https://soniox.com/docs/stt/data-residency#regional-endpoints
### Transcription options
The `SonioxTranscriptionOptions` class supports these parameters:
| Parameter | Type | Description |
| -------------------------------- | ------------------- | ----------------------------------------------------- |
| `model` | `str` | Async model to use (see [available models][models]) |
| `language_hints` | `list[str]` | Language hints for transcription (ISO language codes) |
| `language_hints_strict` | `bool` | Enforce strict language hints |
| `enable_speaker_diarization` | `bool` | Enable speaker identification |
| `enable_language_identification` | `bool` | Enable language detection |
| `translation` | `TranslationConfig` | Translation configuration |
| `context` | `StructuredContext` | Context for improved accuracy |
| `client_reference_id` | `str` | Custom reference ID for your records |
| `webhook_url` | `str` | Webhook URL for completion notifications |
| `webhook_auth_header_name` | `str` | Custom auth header name for webhook |
| `webhook_auth_header_value` | `str` | Custom auth header value for webhook |
Browse the [API documentation](https://soniox.com/docs/stt/api-reference/transcriptions/create_transcription) for a full list of supported options.
[models]: https://soniox.com/docs/stt/models
### Return value
The `lazy_load()` and `alazy_load()` methods yield a single `Document` object:
```python
Document(
    page_content=str,  # The transcribed text
    metadata={
        "source": str,  # File URL, path, or "file_upload"
        "transcription_id": str,  # Unique transcription ID
        "audio_duration_ms": int,  # Audio duration in milliseconds
        "model": str,  # Model used for transcription
        "created_at": str,  # ISO 8601 timestamp
        "tokens": list[dict],  # Detailed token-level information
    }
)
```
The `tokens` array in metadata includes detailed information for each transcribed word:
* `text`: The transcribed text
* `start_ms`: Start time in milliseconds
* `end_ms`: End time in milliseconds
* `speaker`: Speaker ID (if diarization enabled), for example `"1"`, `"2"`, etc.
* `language`: Detected language (if identification enabled), for example `"en"`, `"fr"`, etc.
* `translation_status`: Translation status (`"none"`, `"original"`, or `"translation"`)
Learn more about the [Soniox API reference](https://soniox.com/docs/stt/api-reference/transcriptions/get_transcription_transcript).
## Related
* [LangChain documentation](https://docs.langchain.com/oss/python/integrations/document_loaders/soniox)
* [Package on PyPI](https://pypi.org/project/langchain-soniox/)
# Classes
URL: /stt/SDKs/node-SDK/reference/classes
Soniox Node SDK — Class Reference
## SonioxNodeClient
Soniox Node Client
### Example
```typescript
import { SonioxNodeClient } from '@soniox/node';

const client = new SonioxNodeClient({
  api_key: 'your-api-key',
});
```
### Constructor
```ts
new SonioxNodeClient(options): SonioxNodeClient;
```
**Parameters**
| Parameter | Type |
| --------- | ---------------------------------------------------------- |
| `options` | [`SonioxNodeClientOptions`](types#sonioxnodeclientoptions) |
**Returns**
`SonioxNodeClient`
### Properties
| Property | Type |
| ---------- | ------------------------------------------------ |
| `auth` | [`SonioxAuthAPI`](classes#sonioxauthapi) |
| `files` | [`SonioxFilesAPI`](classes#sonioxfilesapi) |
| `models` | [`SonioxModelsAPI`](classes#sonioxmodelsapi) |
| `realtime` | [`SonioxRealtimeApi`](classes#sonioxrealtimeapi) |
| `stt` | [`SonioxSttApi`](classes#sonioxsttapi) |
| `webhooks` | [`SonioxWebhooksAPI`](classes#sonioxwebhooksapi) |
***
## SonioxFilesAPI
### delete()
```ts
delete(file, signal?): Promise;
```
Permanently deletes a file.
This operation is idempotent: it succeeds even if the file doesn't exist.
**Parameters**
| Parameter | Type | Description |
| --------- | ---------------------------------------- | --------------------------------------------- |
| `file` | [`FileIdentifier`](types#fileidentifier) | The UUID of the file or a SonioxFile instance |
| `signal?` | `AbortSignal` | Optional AbortSignal for cancellation |
**Returns**
`Promise`\<`void`>
**Throws**
[SonioxHttpError](classes#sonioxhttperror) On API errors (except 404)
**Example**
```typescript
// Delete by ID
await client.files.delete('550e8400-e29b-41d4-a716-446655440000');

// Or delete a file instance
const file = await client.files.get('550e8400-e29b-41d4-a716-446655440000');
if (file) {
  await client.files.delete(file);
}

// Or just use the instance method
await file.delete();
```
***
### delete\_all()
```ts
delete_all(options): Promise;
```
Permanently deletes all uploaded files.
Iterates through all pages of files and deletes each one.
**Parameters**
| Parameter | Type | Description |
| --------- | ------------------------------------------------------ | -------------------------------------- |
| `options` | [`DeleteAllFilesOptions`](types#deleteallfilesoptions) | Optional signal and progress callback. |
**Returns**
`Promise`\<`void`>
**Throws**
[SonioxHttpError](classes#sonioxhttperror) On API errors.
**Throws**
`Error` If the operation is aborted via signal.
**Example**
```typescript
// Delete all files
await client.files.delete_all();
console.log(`Deleted all files.`);

// With cancellation
const controller = new AbortController();
await client.files.delete_all({ signal: controller.signal });
```
***
### get()
```ts
get(file, signal?): Promise;
```
Retrieve metadata for an uploaded file.
**Parameters**
| Parameter | Type | Description |
| --------- | ---------------------------------------- | --------------------------------------------- |
| `file` | [`FileIdentifier`](types#fileidentifier) | The UUID of the file or a SonioxFile instance |
| `signal?` | `AbortSignal` | Optional AbortSignal for cancellation |
**Returns**
`Promise`\<[`SonioxFile`](classes#sonioxfile) | `null`>
The file instance, or null if not found
**Throws**
[SonioxHttpError](classes#sonioxhttperror) On API errors (except 404)
**Example**
```typescript
const file = await client.files.get('550e8400-e29b-41d4-a716-446655440000');
if (file) {
  console.log(file.filename, file.size);
}
```
***
### list()
```ts
list(options): Promise;
```
Retrieves the list of uploaded files.
The returned result is async iterable; use `for await...of` to iterate through all pages.
**Parameters**
| Parameter | Type | Description |
| --------- | -------------------------------------------- | ----------------------------------------------- |
| `options` | [`ListFilesOptions`](types#listfilesoptions) | Optional pagination and cancellation parameters |
**Returns**
`Promise`\<[`FileListResult`](classes#filelistresult)>
FileListResult
**Throws**
[SonioxHttpError](classes#sonioxhttperror)
**Example**
```typescript
const result = await client.files.list();

// Automatic paging - iterates through ALL files across all pages
for await (const file of result) {
  console.log(file.filename, file.size);
}

// Or access just the first page
for (const file of result.files) {
  console.log(file.filename);
}

// Check if there are more pages
if (result.isPaged()) {
  console.log('More pages available');
}

// Manual paging using cursor
const page1 = await client.files.list({ limit: 10 });
if (page1.next_page_cursor) {
  const page2 = await client.files.list({ cursor: page1.next_page_cursor });
}

// With cancellation
const controller = new AbortController();
const result = await client.files.list({ signal: controller.signal });
```
***
### upload()
```ts
upload(file, options): Promise;
```
Uploads a file to Soniox for transcription
**Parameters**
| Parameter | Type | Description |
| --------- | ---------------------------------------------- | ------------------------------------------- |
| `file` | [`UploadFileInput`](types#uploadfileinput) | Buffer, Uint8Array, Blob, or ReadableStream |
| `options` | [`UploadFileOptions`](types#uploadfileoptions) | Upload options |
**Returns**
`Promise`\<[`SonioxFile`](classes#sonioxfile)>
The uploaded file metadata
**Throws**
[SonioxHttpError](classes#sonioxhttperror) On API errors
**Throws**
`Error` On validation errors (file too large, invalid input)
**Examples**
```typescript
import * as fs from 'node:fs';

const buffer = await fs.promises.readFile('/path/to/audio.mp3');
const file = await client.files.upload(buffer, { filename: 'audio.mp3' });
```
```typescript
const file = await client.files.upload(Bun.file('/path/to/audio.mp3'));
```
```typescript
const file = await client.files.upload(buffer, {
  filename: 'audio.mp3',
  client_reference_id: 'order-12345',
});
```
```typescript
const controller = new AbortController();
setTimeout(() => controller.abort(), 30000);

const file = await client.files.upload(buffer, {
  filename: 'audio.mp3',
  signal: controller.signal,
});
```
***
## SonioxSttApi
### create()
```ts
create(options, signal?): Promise;
```
Creates a new transcription from audio\_url or file\_id
**Parameters**
| Parameter | Type | Description |
| --------- | ---------------------------------------------------------------- | ------------------------------------------------------- |
| `options` | [`CreateTranscriptionOptions`](types#createtranscriptionoptions) | Transcription options including model and audio source. |
| `signal?` | `AbortSignal` | - |
**Returns**
`Promise`\<[`SonioxTranscription`](classes#sonioxtranscription)>
The created transcription.
**Throws**
[SonioxHttpError](classes#sonioxhttperror) On API errors.
**Example**
```typescript
// Transcribe from URL
const transcription = await client.stt.create({
  model: 'stt-async-v4',
  audio_url: 'https://soniox.com/media/examples/coffee_shop.mp3',
});

// Transcribe from uploaded file
const file = await client.files.upload(buffer);
const transcription = await client.stt.create({
  model: 'stt-async-v4',
  file_id: file.id,
});

// With speaker diarization
const transcription = await client.stt.create({
  model: 'stt-async-v4',
  audio_url: 'https://soniox.com/media/examples/coffee_shop.mp3',
  enable_speaker_diarization: true,
});
```
***
### delete()
```ts
delete(id, signal?): Promise;
```
Permanently deletes a transcription.
This operation is idempotent: it succeeds even if the transcription doesn't exist.
**Parameters**
| Parameter | Type | Description |
| --------- | ---------------------------------------------------------- | --------------------------------------------------------------- |
| `id` | [`TranscriptionIdentifier`](types#transcriptionidentifier) | The UUID of the transcription or a SonioxTranscription instance |
| `signal?` | `AbortSignal` | - |
**Returns**
`Promise`\<`void`>
**Throws**
[SonioxHttpError](classes#sonioxhttperror) On API errors (except 404)
**Example**
```typescript
// Delete by ID
await client.stt.delete('550e8400-e29b-41d4-a716-446655440000');

// Or delete a transcription instance
const transcription = await client.stt.get('550e8400-e29b-41d4-a716-446655440000');
if (transcription) {
  await client.stt.delete(transcription);
}
```
***
### delete\_all()
```ts
delete_all(options): Promise;
```
Permanently deletes all transcriptions.
Iterates through all pages of transcriptions and deletes each one.
**Parameters**
| Parameter | Type | Description |
| --------- | ------------------------------------------------------------------------ | ---------------- |
| `options` | [`DeleteAllTranscriptionsOptions`](types#deletealltranscriptionsoptions) | Optional signal. |
**Returns**
`Promise`\<`void`>
**Throws**
[SonioxHttpError](classes#sonioxhttperror) On API errors.
**Throws**
`Error` If the operation is aborted via signal.
**Example**
```typescript
// Delete all transcriptions
await client.stt.delete_all();
console.log(`Deleted all transcriptions.`);

// With cancellation
const controller = new AbortController();
await client.stt.delete_all({ signal: controller.signal });
```
***
### destroy()
```ts
destroy(id): Promise;
```
Permanently deletes a transcription and its associated file (if any).
This operation is idempotent: it succeeds even if the resources don't exist.
**Parameters**
| Parameter | Type | Description |
| --------- | ---------------------------------------------------------- | --------------------------------------------------------------- |
| `id` | [`TranscriptionIdentifier`](types#transcriptionidentifier) | The UUID of the transcription or a SonioxTranscription instance |
**Returns**
`Promise`\<`void`>
**Throws**
[SonioxHttpError](classes#sonioxhttperror) On API errors (except 404)
**Example**
```typescript
// Clean up both transcription and uploaded file
const transcription = await client.stt.transcribe({
  model: 'stt-async-v4',
  file: buffer,
  wait: true,
});
// ... use transcription ...
await client.stt.destroy(transcription); // Deletes both

// Or by ID
await client.stt.destroy('550e8400-e29b-41d4-a716-446655440000');
```
***
### destroy\_all()
```ts
destroy_all(options): Promise;
```
Permanently deletes all transcriptions and their associated files.
Iterates through all pages of transcriptions and calls [destroy](#destroy)
on each one, removing both the transcription and its uploaded file.
**Parameters**
| Parameter | Type | Description |
| --------- | ------------------------------------------------------------------------ | -------------------------------------- |
| `options` | [`DeleteAllTranscriptionsOptions`](types#deletealltranscriptionsoptions) | Optional signal and progress callback. |
**Returns**
`Promise`\<`void`>
**Throws**
[SonioxHttpError](classes#sonioxhttperror) On API errors.
**Throws**
`Error` If the operation is aborted via signal.
**Example**
```typescript
// Destroy all transcriptions and their files
await client.stt.destroy_all();
console.log(`Destroyed all transcriptions and their files.`);

// With cancellation
const controller = new AbortController();
await client.stt.destroy_all({ signal: controller.signal });
```
***
### get()
```ts
get(id, signal?): Promise;
```
Retrieves a transcription by ID
**Parameters**
| Parameter | Type | Description |
| --------- | ---------------------------------------------------------- | ---------------------------------------------------------------- |
| `id` | [`TranscriptionIdentifier`](types#transcriptionidentifier) | The UUID of the transcription or a SonioxTranscription instance. |
| `signal?` | `AbortSignal` | - |
**Returns**
`Promise`\<[`SonioxTranscription`](classes#sonioxtranscription) | `null`>
The transcription, or null if not found.
**Throws**
[SonioxHttpError](classes#sonioxhttperror) On API errors (except 404).
**Example**
```typescript
const transcription = await client.stt.get('550e8400-e29b-41d4-a716-446655440000');
if (transcription) {
  console.log(transcription.status, transcription.model);
}
```
***
### getTranscript()
```ts
getTranscript(id, signal?): Promise;
```
Retrieves the full transcript text and tokens for a completed transcription.
Only available for successfully completed transcriptions.
**Parameters**
| Parameter | Type | Description |
| --------- | ---------------------------------------------------------- | --------------------------------------------------------------- |
| `id` | [`TranscriptionIdentifier`](types#transcriptionidentifier) | The UUID of the transcription or a SonioxTranscription instance |
| `signal?` | `AbortSignal` | - |
**Returns**
`Promise`\<[`SonioxTranscript`](classes#sonioxtranscript) | `null`>
The transcript with text and detailed tokens, or null if not found
**Throws**
[SonioxHttpError](classes#sonioxhttperror) On API errors (except 404)
**Example**
```typescript
const transcript = await client.stt.getTranscript('550e8400-e29b-41d4-a716-446655440000');
if (transcript) {
  console.log(transcript.text);
  for (const token of transcript.tokens) {
    console.log(token.text, token.start_ms, token.end_ms, token.confidence);
  }
}
```
***
### list()
```ts
list(options, signal?): Promise;
```
Retrieves the list of transcriptions.
The returned result is async iterable; use `for await...of` to iterate through all pages.
**Parameters**
| Parameter | Type | Description |
| --------- | -------------------------------------------------------------- | ------------------------------------------ |
| `options` | [`ListTranscriptionsOptions`](types#listtranscriptionsoptions) | Optional pagination and filter parameters. |
| `signal?` | `AbortSignal` | - |
**Returns**
`Promise`\<[`TranscriptionListResult`](classes#transcriptionlistresult)>
TranscriptionListResult with async iteration support.
**Throws**
[SonioxHttpError](classes#sonioxhttperror) On API errors.
**Example**
```typescript
const result = await client.stt.list();

// Automatic paging - iterates through ALL transcriptions across all pages
for await (const transcription of result) {
  console.log(transcription.id, transcription.status);
}

// Or access just the first page
for (const transcription of result.transcriptions) {
  console.log(transcription.id);
}

// Check if there are more pages
if (result.isPaged()) {
  console.log('More pages available');
}
```
***
### transcribe()
```ts
transcribe(options): Promise;
```
Unified transcribe method that supports direct file upload.
When `file` is provided, it is uploaded first and a transcription is then created.
When `wait: true`, the call waits for completion before returning.
When `cleanup` is specified (requires `wait: true`), resources are cleaned up after completion or on error/timeout.
**Parameters**
| Parameter | Type | Description |
| --------- | ---------------------------------------------- | -------------------------------------------------------------------- |
| `options` | [`TranscribeOptions`](types#transcribeoptions) | Transcribe options including model, audio source, and wait settings. |
**Returns**
`Promise`\<[`SonioxTranscription`](classes#sonioxtranscription)>
The transcription (completed if wait=true, otherwise in queued/processing state).
**Throws**
[SonioxHttpError](classes#sonioxhttperror) On API errors.
**Throws**
`Error` On validation errors or wait timeout.
**Example**
```typescript
// Transcribe from URL and wait for completion
const result = await client.stt.transcribe({
model: 'stt-async-v4',
audio_url: 'https://soniox.com/media/examples/coffee_shop.mp3',
wait: true,
});
// Upload file and transcribe in one call
const result = await client.stt.transcribe({
model: 'stt-async-v4',
file: buffer, // or Blob, ReadableStream
filename: 'meeting.mp3',
enable_speaker_diarization: true,
wait: true,
});
// With wait progress callback
const result = await client.stt.transcribe({
model: 'stt-async-v4',
file: buffer,
wait: true,
wait_options: {
interval_ms: 2000,
on_status_change: (status) => console.log(`Status: ${status}`),
},
});
// Auto-cleanup uploaded file after transcription
const result = await client.stt.transcribe({
model: 'stt-async-v4',
file: buffer,
wait: true,
cleanup: ['file'], // Deletes uploaded file, keeps transcription record
});
// Auto-cleanup everything after transcription
const result = await client.stt.transcribe({
model: 'stt-async-v4',
file: buffer,
wait: true,
cleanup: ['file', 'transcription'], // Deletes both file and transcription record
});
```
***
### transcribeFromFile()
```ts
transcribeFromFile(file, options): Promise<SonioxTranscription>;
```
Wrapper to transcribe from raw file data.
**Parameters**
| Parameter | Type | Description |
| --------- | -------------------------------------------------------------- | ------------------------------------------- |
| `file` | [`UploadFileInput`](types#uploadfileinput) | Buffer, Uint8Array, Blob, or ReadableStream |
| `options` | [`TranscribeFromFileOptions`](types#transcribefromfileoptions) | Transcription options (excluding file) |
**Returns**
`Promise`\<[`SonioxTranscription`](classes#sonioxtranscription)>
The transcription (completed if wait=true, otherwise in queued/processing state).
***
### transcribeFromFileId()
```ts
transcribeFromFileId(file_id, options): Promise<SonioxTranscription>;
```
Wrapper to transcribe from an uploaded file ID.
**Parameters**
| Parameter | Type | Description |
| --------- | ------------------------------------------------------------------ | ------------------------------------------ |
| `file_id` | `string` | ID of a previously uploaded file |
| `options` | [`TranscribeFromFileIdOptions`](types#transcribefromfileidoptions) | Transcription options (excluding file\_id) |
**Returns**
`Promise`\<[`SonioxTranscription`](classes#sonioxtranscription)>
The transcription (completed if wait=true, otherwise in queued/processing state).
***
### transcribeFromUrl()
```ts
transcribeFromUrl(audio_url, options): Promise<SonioxTranscription>;
```
Wrapper to transcribe from a URL.
**Parameters**
| Parameter | Type | Description |
| ----------- | ------------------------------------------------------------ | -------------------------------------------- |
| `audio_url` | `string` | Publicly accessible audio URL |
| `options` | [`TranscribeFromUrlOptions`](types#transcribefromurloptions) | Transcription options (excluding audio\_url) |
**Returns**
`Promise`\<[`SonioxTranscription`](classes#sonioxtranscription)>
The transcription (completed if wait=true, otherwise in queued/processing state).
***
### wait()
```ts
wait(id, options?): Promise<SonioxTranscription>;
```
Waits for a transcription to complete
**Parameters**
| Parameter | Type | Description |
| ---------- | ---------------------------------------------------------- | ---------------------------------------------------------------- |
| `id` | [`TranscriptionIdentifier`](types#transcriptionidentifier) | The UUID of the transcription or a SonioxTranscription instance. |
| `options?` | [`WaitOptions`](types#waitoptions) | Wait options including polling interval, timeout, and callbacks. |
**Returns**
`Promise`\<[`SonioxTranscription`](classes#sonioxtranscription)>
The completed or errored transcription.
**Throws**
`Error` If the wait times out or is aborted.
**Throws**
[SonioxHttpError](classes#sonioxhttperror) On API errors.
**Example**
```typescript
const completed = await client.stt.wait('550e8400-e29b-41d4-a716-446655440000');
// With progress callback
const completed = await client.stt.wait('id', {
interval_ms: 2000,
on_status_change: (status) => console.log(`Status: ${status}`),
});
```
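Conceptually, `wait()` is a polling loop. The following is a minimal sketch of that loop under stated assumptions (the real implementation also supports `AbortSignal` and other options; `getStatus` is a hypothetical stand-in for refetching the transcription):

```typescript
// The terminal statuses are 'completed' and 'error'.
type Status = 'queued' | 'processing' | 'completed' | 'error';

// Poll getStatus() until a terminal status is reached or the timeout elapses,
// invoking on_status_change whenever the observed status changes.
async function pollUntilDone(
  getStatus: () => Promise<Status>,
  options: {
    interval_ms?: number;
    timeout_ms?: number;
    on_status_change?: (status: Status) => void;
  } = {},
): Promise<Status> {
  const { interval_ms = 1000, timeout_ms = 60_000, on_status_change } = options;
  const deadline = Date.now() + timeout_ms;
  let last: Status | undefined;
  for (;;) {
    const status = await getStatus();
    if (status !== last) {
      last = status;
      on_status_change?.(status);
    }
    if (status === 'completed' || status === 'error') return status;
    if (Date.now() >= deadline) throw new Error('wait timed out');
    await new Promise((resolve) => setTimeout(resolve, interval_ms));
  }
}
```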
***
## SonioxModelsAPI
### list()
```ts
list(signal?): Promise<SonioxModel[]>;
```
Lists available models and their attributes.
**Parameters**
| Parameter | Type | Description |
| --------- | ------------- | ------------------------------------- |
| `signal?` | `AbortSignal` | Optional AbortSignal for cancellation |
**Returns**
`Promise`\<[`SonioxModel`](types#sonioxmodel)\[]>
List of available models and their attributes.
**See**
[https://soniox.com/docs/stt/api-reference/models/get\_models](https://soniox.com/docs/stt/api-reference/models/get_models)
***
## SonioxWebhooksAPI
Webhook utilities API accessible via client.webhooks
Provides methods for handling incoming Soniox webhook requests.
When used via the client, results include lazy fetch helpers for transcripts.
### getAuthFromEnv()
```ts
getAuthFromEnv(): WebhookAuthConfig | undefined;
```
Get webhook authentication configuration from environment variables.
Reads `SONIOX_API_WEBHOOK_HEADER` and `SONIOX_API_WEBHOOK_SECRET` environment variables.
Returns undefined if either variable is not set (both are required for authentication).
**Returns**
[`WebhookAuthConfig`](types#webhookauthconfig) | `undefined`
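The documented behavior can be sketched as follows. This is an illustration only: the function name, the `env` parameter, and the `header_name`/`header_value` field names are assumptions for the sketch (see `WebhookAuthConfig` for the real shape); only the two environment variable names come from the documentation above.

```typescript
// Hypothetical shape standing in for WebhookAuthConfig.
interface WebhookAuthConfigSketch {
  header_name: string;
  header_value: string;
}

// Both variables must be present; otherwise no auth config is returned.
function getAuthFromEnvSketch(
  env: Record<string, string | undefined>,
): WebhookAuthConfigSketch | undefined {
  const header_name = env.SONIOX_API_WEBHOOK_HEADER;
  const header_value = env.SONIOX_API_WEBHOOK_SECRET;
  if (!header_name || !header_value) return undefined;
  return { header_name, header_value };
}
```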
***
### handle()
```ts
handle(options): WebhookHandlerResultWithFetch;
```
Framework-agnostic webhook handler
**Parameters**
| Parameter | Type |
| --------- | ---------------------------------------------------- |
| `options` | [`HandleWebhookOptions`](types#handlewebhookoptions) |
**Returns**
[`WebhookHandlerResultWithFetch`](types#webhookhandlerresultwithfetch)
***
### handleExpress()
```ts
handleExpress(req, auth?): WebhookHandlerResultWithFetch;
```
Handle a webhook from an Express-like request
**Parameters**
| Parameter | Type |
| --------- | ------------------------------------------------ |
| `req` | [`ExpressLikeRequest`](types#expresslikerequest) |
| `auth?` | [`WebhookAuthConfig`](types#webhookauthconfig) |
**Returns**
[`WebhookHandlerResultWithFetch`](types#webhookhandlerresultwithfetch)
**Example**
```typescript
app.post('/webhook', async (req, res) => {
const result = soniox.webhooks.handleExpress(req);
if (result.ok && result.event.status === 'completed') {
const transcript = await result.fetchTranscript();
console.log(transcript?.text);
}
res.status(result.status).json({ received: true });
});
```
***
### handleFastify()
```ts
handleFastify(req, auth?): WebhookHandlerResultWithFetch;
```
Handle a webhook from a Fastify request
**Parameters**
| Parameter | Type |
| --------- | ------------------------------------------------ |
| `req` | [`FastifyLikeRequest`](types#fastifylikerequest) |
| `auth?` | [`WebhookAuthConfig`](types#webhookauthconfig) |
**Returns**
[`WebhookHandlerResultWithFetch`](types#webhookhandlerresultwithfetch)
***
### handleHono()
```ts
handleHono(c, auth?): Promise<WebhookHandlerResultWithFetch>;
```
Handle a webhook from a Hono context
**Parameters**
| Parameter | Type |
| --------- | ---------------------------------------------- |
| `c` | [`HonoLikeContext`](types#honolikecontext) |
| `auth?` | [`WebhookAuthConfig`](types#webhookauthconfig) |
**Returns**
`Promise`\<[`WebhookHandlerResultWithFetch`](types#webhookhandlerresultwithfetch)>
***
### handleNestJS()
```ts
handleNestJS(req, auth?): WebhookHandlerResultWithFetch;
```
Handle a webhook from a NestJS request
**Parameters**
| Parameter | Type |
| --------- | ---------------------------------------------- |
| `req` | [`NestJSLikeRequest`](types#nestjslikerequest) |
| `auth?` | [`WebhookAuthConfig`](types#webhookauthconfig) |
**Returns**
[`WebhookHandlerResultWithFetch`](types#webhookhandlerresultwithfetch)
***
### handleRequest()
```ts
handleRequest(request, auth?): Promise<WebhookHandlerResultWithFetch>;
```
Handle a webhook from a Fetch API Request
**Parameters**
| Parameter | Type |
| --------- | ---------------------------------------------- |
| `request` | `Request` |
| `auth?` | [`WebhookAuthConfig`](types#webhookauthconfig) |
**Returns**
`Promise`\<[`WebhookHandlerResultWithFetch`](types#webhookhandlerresultwithfetch)>
***
### isEvent()
```ts
isEvent(payload): payload is WebhookEvent;
```
Type guard to check if a value is a valid WebhookEvent
**Parameters**
| Parameter | Type |
| --------- | --------- |
| `payload` | `unknown` |
**Returns**
`payload is WebhookEvent`
***
### parseEvent()
```ts
parseEvent(payload): WebhookEvent;
```
Parse and validate a webhook event payload
**Parameters**
| Parameter | Type |
| --------- | --------- |
| `payload` | `unknown` |
**Returns**
[`WebhookEvent`](types#webhookevent)
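The relationship between `isEvent()` and `parseEvent()` can be sketched with a simplified type guard. The field checks below are assumptions for illustration (the SDK validates the real `WebhookEvent` schema, which has more fields); the sketch only shows the guard-then-throw pattern.

```typescript
// Hypothetical, reduced event shape for the sketch.
interface WebhookEventSketch {
  id: string;
  status: 'completed' | 'error';
}

// Type guard: narrow unknown to the event shape without throwing.
function isEventSketch(payload: unknown): payload is WebhookEventSketch {
  if (typeof payload !== 'object' || payload === null) return false;
  const p = payload as Record<string, unknown>;
  return typeof p.id === 'string' && (p.status === 'completed' || p.status === 'error');
}

// Parse: same check, but throw on invalid input instead of returning false.
function parseEventSketch(payload: unknown): WebhookEventSketch {
  if (!isEventSketch(payload)) throw new Error('invalid webhook payload');
  return payload;
}
```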
***
### verifyAuth()
```ts
verifyAuth(headers, auth): boolean;
```
Verify webhook authentication header
**Parameters**
| Parameter | Type |
| --------- | ---------------------------------------------- |
| `headers` | [`WebhookHeaders`](types#webhookheaders) |
| `auth` | [`WebhookAuthConfig`](types#webhookauthconfig) |
**Returns**
`boolean`
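The verification idea can be illustrated with a small sketch: look up the configured header name case-insensitively and compare its value to the shared secret. This is a simplification under stated assumptions (inline config shape instead of `WebhookAuthConfig`); the SDK's `verifyAuth()` is the authoritative implementation.

```typescript
// Return true only if the configured header is present (any casing) and its
// value matches the configured secret exactly.
function verifyAuthSketch(
  headers: Record<string, string | undefined>,
  auth: { header_name: string; header_value: string },
): boolean {
  const key = Object.keys(headers).find(
    (k) => k.toLowerCase() === auth.header_name.toLowerCase(),
  );
  return key !== undefined && headers[key] === auth.header_value;
}
```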
***
## SonioxAuthAPI
### createTemporaryKey()
```ts
createTemporaryKey(request, signal?): Promise<TemporaryApiKeyResponse>;
```
Creates a temporary API key for client-side use.
**Parameters**
| Parameter | Type | Description |
| --------- | -------------------------------------------------------- | ---------------------------------------- |
| `request` | [`TemporaryApiKeyRequest`](types#temporaryapikeyrequest) | Request parameters for the temporary key |
| `signal?` | `AbortSignal` | Optional AbortSignal for cancellation |
**Returns**
`Promise`\<[`TemporaryApiKeyResponse`](types#temporaryapikeyresponse)>
The temporary API key response
***
## SonioxRealtimeApi
Real-time API factory for creating STT sessions.
### Example
```typescript
const session = client.realtime.stt({
model: 'stt-rt-v4',
enable_endpoint_detection: true,
});
await session.connect();
```
### stt()
```ts
stt(config, options?): RealtimeSttSession;
```
Create a new Speech-to-Text session.
**Parameters**
| Parameter | Type | Description |
| ---------- | ---------------------------------------------- | -------------------------------------- |
| `config` | [`SttSessionConfig`](types#sttsessionconfig) | Session configuration (sent to server) |
| `options?` | [`SttSessionOptions`](types#sttsessionoptions) | Session options (SDK-level settings) |
**Returns**
[`RealtimeSttSession`](classes#realtimesttsession)
New STT session instance
***
## SonioxFile
Uploaded file
### Constructor
```ts
new SonioxFile(data, _http): SonioxFile;
```
**Parameters**
| Parameter | Type |
| --------- | ---------------------------------------- |
| `data` | [`SonioxFileData`](types#sonioxfiledata) |
| `_http` | [`HttpClient`](types#httpclient) |
**Returns**
`SonioxFile`
### delete()
```ts
delete(signal?): Promise<void>;
```
Permanently deletes this file.
This operation is idempotent - succeeds even if the file doesn't exist.
**Parameters**
| Parameter | Type | Description |
| --------- | ------------- | ------------------------------------- |
| `signal?` | `AbortSignal` | Optional AbortSignal for cancellation |
**Returns**
`Promise`\<`void`>
**Throws**
[SonioxHttpError](classes#sonioxhttperror) On API errors (except 404)
**Example**
```typescript
const file = await client.files.get('550e8400-e29b-41d4-a716-446655440000');
if (file) {
await file.delete();
}
```
***
### toJSON()
```ts
toJSON(): SonioxFileData;
```
Returns the raw data for this file.
**Returns**
[`SonioxFileData`](types#sonioxfiledata)
### Properties
| Property | Type |
| --------------------- | ----------------------- |
| `client_reference_id` | `string` \| `undefined` |
| `created_at` | `string` |
| `filename` | `string` |
| `id` | `string` |
| `size` | `number` |
***
## SonioxTranscription
A Transcription instance
### Constructor
```ts
new SonioxTranscription(
data,
_http,
transcript?): SonioxTranscription;
```
**Parameters**
| Parameter | Type |
| ------------- | ---------------------------------------------------------- |
| `data` | [`SonioxTranscriptionData`](types#sonioxtranscriptiondata) |
| `_http` | [`HttpClient`](types#httpclient) |
| `transcript?` | [`SonioxTranscript`](classes#sonioxtranscript) \| `null` |
**Returns**
`SonioxTranscription`
### delete()
```ts
delete(): Promise<void>;
```
Permanently deletes this transcription.
This operation is idempotent - succeeds even if the transcription doesn't exist.
**Returns**
`Promise`\<`void`>
**Throws**
[SonioxHttpError](classes#sonioxhttperror) On API errors (except 404)
**Example**
```typescript
const transcription = await client.stt.get('550e8400-e29b-41d4-a716-446655440000');
await transcription.delete();
```
***
### destroy()
```ts
destroy(): Promise<void>;
```
Permanently deletes this transcription and its associated file (if any).
This operation is idempotent - succeeds even if resources don't exist.
**Returns**
`Promise`\<`void`>
**Throws**
[SonioxHttpError](classes#sonioxhttperror) On API errors (except 404)
**Example**
```typescript
// Clean up both transcription and uploaded file
const transcription = await client.stt.transcribe({
model: 'stt-async-v4',
file: buffer,
wait: true,
});
// ... use transcription ...
await transcription.destroy(); // Deletes both transcription and file
```
***
### getTranscript()
```ts
getTranscript(options?): Promise<SonioxTranscript | null>;
```
Retrieves the full transcript text and tokens for this transcription.
Only available for successfully completed transcriptions.
Returns cached transcript if available (when using `transcribe()` with `wait: true`).
Use `force: true` to bypass the cache and fetch fresh data from the API.
**Parameters**
| Parameter | Type | Description |
| ----------------- | --------------------------------------------------- | -------------------------------------------------------- |
| `options?` | \{ `force?`: `boolean`; `signal?`: `AbortSignal`; } | Optional settings |
| `options.force?` | `boolean` | If true, bypasses cached transcript and fetches from API |
| `options.signal?` | `AbortSignal` | Optional AbortSignal for request cancellation |
**Returns**
`Promise`\<[`SonioxTranscript`](classes#sonioxtranscript) | `null`>
The transcript with text and detailed tokens, or null if not found.
**Throws**
[SonioxHttpError](classes#sonioxhttperror) On API errors (except 404).
**Example**
```typescript
const transcription = await client.stt.get('550e8400-e29b-41d4-a716-446655440000');
if (transcription) {
const transcript = await transcription.getTranscript();
if (transcript) {
console.log(transcript.text);
}
}
// Force re-fetch from API
const freshTranscript = await transcription.getTranscript({ force: true });
```
***
### refresh()
```ts
refresh(signal?): Promise<SonioxTranscription>;
```
Re-fetches this transcription to get the latest status.
**Parameters**
| Parameter | Type | Description |
| --------- | ------------- | ---------------------------------------------- |
| `signal?` | `AbortSignal` | Optional AbortSignal for request cancellation. |
**Returns**
`Promise`\<`SonioxTranscription`>
A new SonioxTranscription instance with updated data.
**Throws**
[SonioxHttpError](classes#sonioxhttperror)
**Example**
```typescript
let transcription = await client.stt.get('550e8400-e29b-41d4-a716-446655440000');
transcription = await transcription.refresh();
console.log(transcription.status);
```
***
### toJSON()
```ts
toJSON(): SonioxTranscriptionData;
```
Returns the raw data for this transcription.
**Returns**
[`SonioxTranscriptionData`](types#sonioxtranscriptiondata)
***
### wait()
```ts
wait(options?): Promise<SonioxTranscription>;
```
Waits for the transcription to complete or fail.
Polls the API at the specified interval until the status is 'completed' or 'error'.
**Parameters**
| Parameter | Type | Description |
| --------- | ---------------------------------- | ---------------------------------------------------------------- |
| `options?` | [`WaitOptions`](types#waitoptions) | Wait options including polling interval, timeout, and callbacks. |
**Returns**
`Promise`\<`SonioxTranscription`>
The completed or errored transcription.
**Throws**
`Error` If the wait times out or is aborted.
**Throws**
[SonioxHttpError](classes#sonioxhttperror) On API errors.
**Example**
```typescript
const transcription = await client.stt.create({
model: 'stt-async-v4',
audio_url: 'https://soniox.com/media/examples/coffee_shop.mp3',
});
// Simple wait
const completed = await transcription.wait();
// Wait with progress callback
const completed = await transcription.wait({
interval_ms: 2000,
on_status_change: (status) => console.log(`Status: ${status}`),
});
```
### Properties
| Property | Type | Description |
| -------------------------------- | -------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `audio_duration_ms` | `number` \| `null` \| `undefined` | Duration of the audio in milliseconds. Only available after processing begins. |
| `audio_url` | `string` \| `null` \| `undefined` | URL of the audio file being transcribed. |
| `client_reference_id` | `string` \| `null` \| `undefined` | Optional tracking identifier. |
| `context` | \| [`TranscriptionContext`](types#transcriptioncontext) \| `null` \| `undefined` | Additional context provided for the transcription. |
| `created_at` | `string` | UTC timestamp when the transcription was created. |
| `enable_language_identification` | `boolean` | When true, language is detected for each part of the transcription. |
| `enable_speaker_diarization` | `boolean` | When true, speakers are identified and separated in the transcription output. |
| `error_message` | `string` \| `null` \| `undefined` | Error message if transcription failed. |
| `error_type` | `string` \| `null` \| `undefined` | Error type if transcription failed. |
| `file_id` | `string` \| `null` \| `undefined` | ID of the uploaded file being transcribed. |
| `filename` | `string` | Name of the file being transcribed. |
| `id` | `string` | Unique identifier of the transcription. |
| `language_hints` | `string`\[] \| `undefined` | Expected languages in the audio. |
| `model` | `string` | Speech-to-text model used. |
| `status` | [`TranscriptionStatus`](types#transcriptionstatus) | Current status of the transcription. |
| `transcript` | [`SonioxTranscript`](classes#sonioxtranscript) \| `null` \| `undefined` | Pre-fetched transcript. Only available when using `transcribe()` with `wait: true`, `fetch_transcript !== false`, and the transcription completed successfully |
| `webhook_auth_header_name` | `string` \| `null` \| `undefined` | Name of the authentication header sent with webhook notifications. |
| `webhook_auth_header_value` | `string` \| `null` \| `undefined` | Authentication header value (masked). |
| `webhook_status_code` | `number` \| `null` \| `undefined` | HTTP status code received when webhook was delivered. |
| `webhook_url` | `string` \| `null` \| `undefined` | URL to receive webhook notifications. |
***
## SonioxTranscript
A Transcript result containing the transcribed text and tokens.
### Constructor
```ts
new SonioxTranscript(data): SonioxTranscript;
```
**Parameters**
| Parameter | Type |
| --------- | ------------------------------------------------ |
| `data` | [`TranscriptResponse`](types#transcriptresponse) |
**Returns**
`SonioxTranscript`
### segments()
```ts
segments(options?): TranscriptSegment[];
```
Groups tokens into segments based on specified grouping keys.
A new segment starts when any of the `group_by` fields changes.
**Parameters**
| Parameter | Type | Description |
| ---------- | ------------------------------------------------------------ | -------------------- |
| `options?` | [`SegmentTranscriptOptions`](types#segmenttranscriptoptions) | Segmentation options |
**Returns**
[`TranscriptSegment`](types#transcriptsegment)\[]
Array of segments with combined text and timing
**Example**
```typescript
const transcript = await transcription.getTranscript();
// Group by both speaker and language (default)
const segments = transcript.segments();
// Group by speaker only
const bySpeaker = transcript.segments({ group_by: ['speaker'] });
for (const s of segments) {
console.log(`[Speaker ${s.speaker}] ${s.text}`);
}
```
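The segmentation rule above can be modeled with a small sketch. The token and segment shapes below mirror a simplified subset of `TranscriptToken`/`TranscriptSegment` (an assumption for illustration): a new segment starts whenever any of the `group_by` fields changes between consecutive tokens.

```typescript
// Reduced token/segment shapes for the sketch.
interface Tok { text: string; speaker?: string; language?: string; }
interface Seg { text: string; speaker?: string; language?: string; }

// Merge consecutive tokens into one segment until a group_by field changes.
function groupTokens(tokens: Tok[], group_by: Array<'speaker' | 'language'>): Seg[] {
  const segments: Seg[] = [];
  for (const t of tokens) {
    const last = segments[segments.length - 1];
    if (!last || group_by.some((k) => last[k] !== t[k])) {
      segments.push({ text: t.text, speaker: t.speaker, language: t.language });
    } else {
      last.text += t.text;
    }
  }
  return segments;
}
```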
### Properties
| Property | Type | Description |
| -------- | --------------------------------------------- | ------------------------------------------------------------------ |
| `id` | `string` | Unique identifier of the transcription this transcript belongs to. |
| `text` | `string` | Complete transcribed text content. |
| `tokens` | [`TranscriptToken`](types#transcripttoken)\[] | List of detailed token information with timestamps and metadata. |
***
## FileListResult
Result set for file listing
### Constructor
```ts
new FileListResult(
initialResponse,
_http,
_limit,
_signal): FileListResult;
```
**Parameters**
| Parameter | Type | Default value |
| ----------------- | ------------------------------------------------------------------------------------------ | ------------- |
| `initialResponse` | [`ListFilesResponse`](types#listfilesresponset)\<[`SonioxFileData`](types#sonioxfiledata)> | `undefined` |
| `_http` | [`HttpClient`](types#httpclient) | `undefined` |
| `_limit` | `number` \| `undefined` | `undefined` |
| `_signal` | `AbortSignal` \| `undefined` | `undefined` |
**Returns**
`FileListResult`
### \[asyncIterator]\()
```ts
[Symbol.asyncIterator](): AsyncIterator<SonioxFile>;
```
Async iterator that automatically fetches all pages
Use with `for await...of` to iterate through all files
**Returns**
`AsyncIterator`\<[`SonioxFile`](classes#sonioxfile)>
***
### isPaged()
```ts
isPaged(): boolean;
```
Returns true if there are more pages of results beyond the first page
**Returns**
`boolean`
***
### toJSON()
```ts
toJSON(): ListFilesResponse<SonioxFileData>;
```
Returns the raw data for this list result.
Also used by JSON.stringify() to prevent serialization of internal HTTP client.
**Returns**
[`ListFilesResponse`](types#listfilesresponset)\<[`SonioxFileData`](types#sonioxfiledata)>
### Properties
| Property | Type | Description |
| ------------------ | ------------------------------------- | ---------------------------------------------------------- |
| `files` | [`SonioxFile`](classes#sonioxfile)\[] | Files from the first page of results |
| `next_page_cursor` | `string` \| `null` | Pagination cursor for the next page. Null if no more pages |
***
## TranscriptionListResult
Result set for transcription listing.
### Constructor
```ts
new TranscriptionListResult(
initialResponse,
_http,
_options,
_signal?): TranscriptionListResult;
```
**Parameters**
| Parameter | Type |
| ----------------- | ------------------------------------------------------------------------------------------------------------------------------ |
| `initialResponse` | [`ListTranscriptionsResponse`](types#listtranscriptionsresponset)\<[`SonioxTranscriptionData`](types#sonioxtranscriptiondata)> |
| `_http` | [`HttpClient`](types#httpclient) |
| `_options` | [`ListTranscriptionsOptions`](types#listtranscriptionsoptions) |
| `_signal?` | `AbortSignal` |
**Returns**
`TranscriptionListResult`
### \[asyncIterator]\()
```ts
[Symbol.asyncIterator](): AsyncIterator<SonioxTranscription>;
```
Async iterator that automatically fetches all pages.
Use with `for await...of` to iterate through all transcriptions.
**Returns**
`AsyncIterator`\<[`SonioxTranscription`](classes#sonioxtranscription)>
***
### isPaged()
```ts
isPaged(): boolean;
```
Returns true if there are more pages of results beyond the first page.
**Returns**
`boolean`
***
### toJSON()
```ts
toJSON(): ListTranscriptionsResponse<SonioxTranscriptionData>;
```
Returns the raw data for this list result
**Returns**
[`ListTranscriptionsResponse`](types#listtranscriptionsresponset)\<[`SonioxTranscriptionData`](types#sonioxtranscriptiondata)>
### Properties
| Property | Type | Description |
| ------------------ | ------------------------------------------------------- | ----------------------------------------------------------- |
| `next_page_cursor` | `string` \| `null` | Pagination cursor for the next page. Null if no more pages. |
| `transcriptions` | [`SonioxTranscription`](classes#sonioxtranscription)\[] | Transcriptions from the first page of results. |
***
## RealtimeSttSession
Real-time Speech-to-Text session
Provides WebSocket-based streaming transcription with support for:
* Event-based and async iterator consumption
* Pause/resume with automatic keepalive while paused
* AbortSignal cancellation
### Example
```typescript
const session = new RealtimeSttSession(apiKey, wsUrl, { model: 'stt-rt-v4' });
session.on('result', (result) => {
console.log(result.tokens.map(t => t.text).join(''));
});
await session.connect();
session.sendAudio(audioChunk);
await session.finish();
```
### paused
```ts
get paused(): boolean;
```
Whether the session is currently paused.
**Returns**
`boolean`
***
### state
```ts
get state(): SttSessionState;
```
Current session state.
**Returns**
[`SttSessionState`](types#sttsessionstate)
### Constructor
```ts
new RealtimeSttSession(
apiKey,
wsBaseUrl,
config,
options?): RealtimeSttSession;
```
**Parameters**
| Parameter | Type |
| ----------- | ---------------------------------------------- |
| `apiKey` | `string` |
| `wsBaseUrl` | `string` |
| `config` | [`SttSessionConfig`](types#sttsessionconfig) |
| `options?` | [`SttSessionOptions`](types#sttsessionoptions) |
**Returns**
`RealtimeSttSession`
### \[asyncIterator]\()
```ts
[Symbol.asyncIterator](): AsyncIterator<RealtimeEvent>;
```
Async iterator for consuming events.
**Returns**
`AsyncIterator`\<[`RealtimeEvent`](types#realtimeevent)>
***
### close()
```ts
close(): void;
```
Close (cancel) the session immediately without waiting
**Returns**
`void`
***
### connect()
```ts
connect(): Promise<void>;
```
Connect to the Soniox WebSocket API.
**Returns**
`Promise`\<`void`>
**Throws**
[AbortError](classes#aborterror) If aborted
**Throws**
[ConnectionError](classes#connectionerror) If connection fails
**Throws**
[StateError](classes#stateerror) If already connected
***
### finalize()
```ts
finalize(options?): void;
```
Requests the server to finalize the current transcription
**Parameters**
| Parameter | Type |
| ------------------------------ | -------------------------------------- |
| `options?` | \{ `trailing_silence_ms?`: `number`; } |
| `options.trailing_silence_ms?` | `number` |
**Returns**
`void`
***
### finish()
```ts
finish(): Promise<void>;
```
Gracefully finish the session
**Returns**
`Promise`\<`void`>
***
### keepAlive()
```ts
keepAlive(): void;
```
Send a keepalive message
**Returns**
`void`
***
### off()
```ts
off(event, handler): this;
```
Remove an event handler
**Type Parameters**
| Type Parameter |
| ---------------------------------------------------------------- |
| `E` *extends* keyof [`SttSessionEvents`](types#sttsessionevents) |
**Parameters**
| Parameter | Type |
| --------- | -------------------------------------------------- |
| `event` | `E` |
| `handler` | [`SttSessionEvents`](types#sttsessionevents)\[`E`] |
**Returns**
`this`
***
### on()
```ts
on(event, handler): this;
```
Register an event handler
**Type Parameters**
| Type Parameter |
| ---------------------------------------------------------------- |
| `E` *extends* keyof [`SttSessionEvents`](types#sttsessionevents) |
**Parameters**
| Parameter | Type |
| --------- | -------------------------------------------------- |
| `event` | `E` |
| `handler` | [`SttSessionEvents`](types#sttsessionevents)\[`E`] |
**Returns**
`this`
***
### once()
```ts
once(event, handler): this;
```
Register a one-time event handler
**Type Parameters**
| Type Parameter |
| ---------------------------------------------------------------- |
| `E` *extends* keyof [`SttSessionEvents`](types#sttsessionevents) |
**Parameters**
| Parameter | Type |
| --------- | -------------------------------------------------- |
| `event` | `E` |
| `handler` | [`SttSessionEvents`](types#sttsessionevents)\[`E`] |
**Returns**
`this`
***
### pause()
```ts
pause(): void;
```
Pause audio transmission and start automatic keepalive messages
**Returns**
`void`
***
### resume()
```ts
resume(): void;
```
Resume audio transmission
**Returns**
`void`
***
### sendAudio()
```ts
sendAudio(data): void;
```
Send audio data to the server
**Parameters**
| Parameter | Type | Description |
| --------- | ----------- | --------------------------------------- |
| `data` | `AudioData` | Audio data as Uint8Array or ArrayBuffer |
**Returns**
`void`
**Throws**
[AbortError](classes#aborterror) If aborted
**Throws**
[StateError](classes#stateerror) If not connected
***
### sendStream()
```ts
sendStream(stream, options?): Promise<void>;
```
Stream audio data from an async iterable source.
**Parameters**
| Parameter | Type | Description |
| ---------- | ---------------------------------------------- | ---------------------------------------- |
| `stream` | `AsyncIterable`\<`AudioData`> | Async iterable yielding audio chunks |
| `options?` | [`SendStreamOptions`](types#sendstreamoptions) | Optional pacing and auto-finish settings |
**Returns**
`Promise`\<`void`>
**Throws**
[AbortError](classes#aborterror) If aborted during streaming
**Throws**
[StateError](classes#stateerror) If not connected
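Conceptually, `sendStream()` forwards each chunk from the async iterable to the session and can finish the session when the source is exhausted. The sketch below illustrates that flow under stated assumptions: `send`, `finish`, and the `auto_finish` option name are hypothetical stand-ins (see `SendStreamOptions` for the real settings), and pacing is omitted.

```typescript
// Forward every chunk to the session, optionally finishing at end of stream.
async function pumpStream(
  stream: AsyncIterable<Uint8Array>,
  send: (chunk: Uint8Array) => void,
  options: { auto_finish?: boolean; finish?: () => Promise<void> } = {},
): Promise<void> {
  for await (const chunk of stream) {
    send(chunk);
  }
  if (options.auto_finish) {
    await options.finish?.();
  }
}
```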
***
## RealtimeSegmentBuffer
Rolling buffer for turning real-time results into stable segments.
### size
```ts
get size(): number;
```
Number of tokens currently buffered.
**Returns**
`number`
### Constructor
```ts
new RealtimeSegmentBuffer(options?): RealtimeSegmentBuffer;
```
**Parameters**
| Parameter | Type |
| ---------- | -------------------------------------------------------------------- |
| `options?` | [`RealtimeSegmentBufferOptions`](types#realtimesegmentbufferoptions) |
**Returns**
`RealtimeSegmentBuffer`
### add()
```ts
add(result): RealtimeSegment[];
```
Add a real-time result and return stable segments.
**Parameters**
| Parameter | Type |
| --------- | ---------------------------------------- |
| `result` | [`RealtimeResult`](types#realtimeresult) |
**Returns**
[`RealtimeSegment`](types#realtimesegment)\[]
***
### flushAll()
```ts
flushAll(): RealtimeSegment[];
```
Flush all buffered tokens into segments and clear the buffer.
Includes tokens that are not yet stable by final\_audio\_proc\_ms.
**Returns**
[`RealtimeSegment`](types#realtimesegment)\[]
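The `final_audio_proc_ms` cutoff mentioned above can be illustrated with a small sketch. This is a hypothetical simplification (the buffer's real stability logic lives in the SDK): a buffered token is treated as stable once its end time falls at or before the server's `final_audio_proc_ms` watermark.

```typescript
// Reduced token shape for the sketch.
interface BufTok { text: string; end_ms: number; }

// Split buffered tokens into those already stable at the watermark and those
// still pending (which flushAll() would also emit).
function splitStable(
  tokens: BufTok[],
  final_audio_proc_ms: number,
): { stable: BufTok[]; pending: BufTok[] } {
  const stable = tokens.filter((t) => t.end_ms <= final_audio_proc_ms);
  const pending = tokens.filter((t) => t.end_ms > final_audio_proc_ms);
  return { stable, pending };
}
```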
***
### reset()
```ts
reset(): void;
```
Clear all buffered tokens.
**Returns**
`void`
***
## RealtimeUtteranceBuffer
Collects real-time results into utterances for endpoint-driven workflows.
### Constructor
```ts
new RealtimeUtteranceBuffer(options?): RealtimeUtteranceBuffer;
```
**Parameters**
| Parameter | Type |
| ---------- | ------------------------------------------------------------------------ |
| `options?` | [`RealtimeUtteranceBufferOptions`](types#realtimeutterancebufferoptions) |
**Returns**
`RealtimeUtteranceBuffer`
### addResult()
```ts
addResult(result): RealtimeSegment[];
```
Add a real-time result and collect stable segments.
**Parameters**
| Parameter | Type |
| --------- | ---------------------------------------- |
| `result` | [`RealtimeResult`](types#realtimeresult) |
**Returns**
[`RealtimeSegment`](types#realtimesegment)\[]
***
### markEndpoint()
```ts
markEndpoint(): RealtimeUtterance | undefined;
```
Mark an endpoint and flush the current utterance.
**Returns**
[`RealtimeUtterance`](types#realtimeutterance) | `undefined`
***
### reset()
```ts
reset(): void;
```
Clear buffered segments and tokens.
**Returns**
`void`
***
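In an endpoint-driven workflow, results are added as they arrive and the utterance is flushed when the server signals an endpoint. A minimal sketch, assuming the import path and that `session` is a connected real-time session (`handleUtterance` is a hypothetical handler):
```ts
import { RealtimeUtteranceBuffer } from "@soniox/node"; // import path is an assumption

const buffer = new RealtimeUtteranceBuffer();

session.on("result", (result) => {
  buffer.addResult(result); // collects stable segments into the current utterance
});

session.on("endpoint", () => {
  // The endpoint closes the current utterance; undefined if nothing was buffered.
  const utterance = buffer.markEndpoint();
  if (utterance) {
    handleUtterance(utterance); // hypothetical handler
  }
});
```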
## SonioxError
### Extends
* `Error`
### Extended by
* [`SonioxHttpError`](classes#sonioxhttperror)
* [`RealtimeError`](classes#realtimeerror)
### toJSON()
```ts
toJSON(): Record<string, unknown>;
```
Converts to a plain object for logging/serialization
**Returns**
`Record`\<`string`, `unknown`>
***
### toString()
```ts
toString(): string;
```
Creates a human-readable string representation
**Returns**
`string`
### Properties
| Property | Type | Description |
| ------------ | --------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `cause` | `unknown` | The underlying error that caused this error, if any. |
| `code` | `SonioxErrorCode` \| `string & {}` | Error code describing the type of error. Typed as `string` at the base level to allow subclasses (e.g. HTTP errors) to use their own error code unions. |
| `statusCode` | `number` \| `undefined` | HTTP status code when applicable (e.g., 401 for auth errors, 500 for server errors). |
***
## SonioxHttpError
HTTP error class for all HTTP-related failures (REST API).
Thrown when HTTP requests fail due to network issues, timeouts,
server errors, or response parsing failures.
### Extends
* [`SonioxError`](classes#sonioxerror)
### toJSON()
```ts
toJSON(): Record<string, unknown>;
```
Converts to a plain object for logging/serialization
**Returns**
`Record`\<`string`, `unknown`>
**Overrides**
[`SonioxError`](classes#sonioxerror).[`toJSON`](classes#sonioxerror-tojson)
***
### toString()
```ts
toString(): string;
```
Creates a human-readable string representation
**Returns**
`string`
**Overrides**
[`SonioxError`](classes#sonioxerror).[`toString`](classes#sonioxerror-tostring)
### Properties
| Property | Type | Description |
| ------------ | -------------------------------------------- | ------------------------------------------------------------------------------------ |
| `bodyText` | `string` \| `undefined` | Response body text, capped at 4KB (only for http\_error/parse\_error) |
| `cause` | `unknown` | The underlying error that caused this error, if any. |
| `code` | [`HttpErrorCode`](types#httperrorcode) | Categorized HTTP error code |
| `headers` | `Record`\<`string`, `string`> \| `undefined` | Response headers (only for http\_error) |
| `method` | [`HttpMethod`](types#httpmethod) | HTTP method |
| `statusCode` | `number` \| `undefined` | HTTP status code when applicable (e.g., 401 for auth errors, 500 for server errors). |
| `url` | `string` | Request URL |
***
## RealtimeError
Base error class for all real-time (WebSocket) SDK errors
### Extends
* [`SonioxError`](classes#sonioxerror)
### Extended by
* [`AuthError`](classes#autherror)
* [`BadRequestError`](classes#badrequesterror)
* [`QuotaError`](classes#quotaerror)
* [`ConnectionError`](classes#connectionerror)
* [`NetworkError`](classes#networkerror)
* [`AbortError`](classes#aborterror)
* [`StateError`](classes#stateerror)
### toJSON()
```ts
toJSON(): Record<string, unknown>;
```
Converts to a plain object for logging/serialization
**Returns**
`Record`\<`string`, `unknown`>
**Overrides**
[`SonioxError`](classes#sonioxerror).[`toJSON`](classes#sonioxerror-tojson)
***
### toString()
```ts
toString(): string;
```
Creates a human-readable string representation
**Returns**
`string`
**Overrides**
[`SonioxError`](classes#sonioxerror).[`toString`](classes#sonioxerror-tostring)
### Properties
| Property | Type | Description |
| ------------ | ---------------------------------------------- | -------------------------------------------------------------------------------------------------- |
| `cause` | `unknown` | The underlying error that caused this error, if any. |
| `code` | [`RealtimeErrorCode`](types#realtimeerrorcode) | Real-time error code |
| `raw` | `unknown` | Original response payload for debugging. Contains the raw WebSocket message that caused the error. |
| `statusCode` | `number` \| `undefined` | HTTP status code when applicable (e.g., 401 for auth errors, 500 for server errors). |
***
## AuthError
Authentication error (401).
Thrown when the API key is invalid or expired.
### Extends
* [`RealtimeError`](classes#realtimeerror)
### toJSON()
```ts
toJSON(): Record<string, unknown>;
```
Converts to a plain object for logging/serialization
**Returns**
`Record`\<`string`, `unknown`>
**Inherited from**
[`RealtimeError`](classes#realtimeerror).[`toJSON`](classes#realtimeerror-tojson)
***
### toString()
```ts
toString(): string;
```
Creates a human-readable string representation
**Returns**
`string`
**Inherited from**
[`RealtimeError`](classes#realtimeerror).[`toString`](classes#realtimeerror-tostring)
### Properties
| Property | Type | Description |
| ------------ | ---------------------------------------------- | -------------------------------------------------------------------------------------------------- |
| `cause` | `unknown` | The underlying error that caused this error, if any. |
| `code` | [`RealtimeErrorCode`](types#realtimeerrorcode) | Real-time error code |
| `raw` | `unknown` | Original response payload for debugging. Contains the raw WebSocket message that caused the error. |
| `statusCode` | `number` \| `undefined` | HTTP status code when applicable (e.g., 401 for auth errors, 500 for server errors). |
***
## BadRequestError
Bad request error (400).
Thrown for invalid configuration or parameters.
### Extends
* [`RealtimeError`](classes#realtimeerror)
### toJSON()
```ts
toJSON(): Record<string, unknown>;
```
Converts to a plain object for logging/serialization
**Returns**
`Record`\<`string`, `unknown`>
**Inherited from**
[`RealtimeError`](classes#realtimeerror).[`toJSON`](classes#realtimeerror-tojson)
***
### toString()
```ts
toString(): string;
```
Creates a human-readable string representation
**Returns**
`string`
**Inherited from**
[`RealtimeError`](classes#realtimeerror).[`toString`](classes#realtimeerror-tostring)
### Properties
| Property | Type | Description |
| ------------ | ---------------------------------------------- | -------------------------------------------------------------------------------------------------- |
| `cause` | `unknown` | The underlying error that caused this error, if any. |
| `code` | [`RealtimeErrorCode`](types#realtimeerrorcode) | Real-time error code |
| `raw` | `unknown` | Original response payload for debugging. Contains the raw WebSocket message that caused the error. |
| `statusCode` | `number` \| `undefined` | HTTP status code when applicable (e.g., 401 for auth errors, 500 for server errors). |
***
## QuotaError
Quota error (402, 429).
Thrown when rate limits are exceeded or quota is exhausted.
### Extends
* [`RealtimeError`](classes#realtimeerror)
### toJSON()
```ts
toJSON(): Record<string, unknown>;
```
Converts to a plain object for logging/serialization
**Returns**
`Record`\<`string`, `unknown`>
**Inherited from**
[`RealtimeError`](classes#realtimeerror).[`toJSON`](classes#realtimeerror-tojson)
***
### toString()
```ts
toString(): string;
```
Creates a human-readable string representation
**Returns**
`string`
**Inherited from**
[`RealtimeError`](classes#realtimeerror).[`toString`](classes#realtimeerror-tostring)
### Properties
| Property | Type | Description |
| ------------ | ---------------------------------------------- | -------------------------------------------------------------------------------------------------- |
| `cause` | `unknown` | The underlying error that caused this error, if any. |
| `code` | [`RealtimeErrorCode`](types#realtimeerrorcode) | Real-time error code |
| `raw` | `unknown` | Original response payload for debugging. Contains the raw WebSocket message that caused the error. |
| `statusCode` | `number` \| `undefined` | HTTP status code when applicable (e.g., 401 for auth errors, 500 for server errors). |
***
## ConnectionError
Connection error.
Thrown for WebSocket connection failures and transport errors.
### Extends
* [`RealtimeError`](classes#realtimeerror)
### toJSON()
```ts
toJSON(): Record<string, unknown>;
```
Converts to a plain object for logging/serialization
**Returns**
`Record`\<`string`, `unknown`>
**Inherited from**
[`RealtimeError`](classes#realtimeerror).[`toJSON`](classes#realtimeerror-tojson)
***
### toString()
```ts
toString(): string;
```
Creates a human-readable string representation
**Returns**
`string`
**Inherited from**
[`RealtimeError`](classes#realtimeerror).[`toString`](classes#realtimeerror-tostring)
### Properties
| Property | Type | Description |
| ------------ | ---------------------------------------------- | -------------------------------------------------------------------------------------------------- |
| `cause` | `unknown` | The underlying error that caused this error, if any. |
| `code` | [`RealtimeErrorCode`](types#realtimeerrorcode) | Real-time error code |
| `raw` | `unknown` | Original response payload for debugging. Contains the raw WebSocket message that caused the error. |
| `statusCode` | `number` \| `undefined` | HTTP status code when applicable (e.g., 401 for auth errors, 500 for server errors). |
***
## NetworkError
Network error.
Thrown for server-side network issues (408, 500, 503).
### Extends
* [`RealtimeError`](classes#realtimeerror)
### toJSON()
```ts
toJSON(): Record<string, unknown>;
```
Converts to a plain object for logging/serialization
**Returns**
`Record`\<`string`, `unknown`>
**Inherited from**
[`RealtimeError`](classes#realtimeerror).[`toJSON`](classes#realtimeerror-tojson)
***
### toString()
```ts
toString(): string;
```
Creates a human-readable string representation
**Returns**
`string`
**Inherited from**
[`RealtimeError`](classes#realtimeerror).[`toString`](classes#realtimeerror-tostring)
### Properties
| Property | Type | Description |
| ------------ | ---------------------------------------------- | -------------------------------------------------------------------------------------------------- |
| `cause` | `unknown` | The underlying error that caused this error, if any. |
| `code` | [`RealtimeErrorCode`](types#realtimeerrorcode) | Real-time error code |
| `raw` | `unknown` | Original response payload for debugging. Contains the raw WebSocket message that caused the error. |
| `statusCode` | `number` \| `undefined` | HTTP status code when applicable (e.g., 401 for auth errors, 500 for server errors). |
***
## AbortError
Abort error.
Thrown when an operation is cancelled via AbortSignal.
### Extends
* [`RealtimeError`](classes#realtimeerror)
### toJSON()
```ts
toJSON(): Record<string, unknown>;
```
Converts to a plain object for logging/serialization
**Returns**
`Record`\<`string`, `unknown`>
**Inherited from**
[`RealtimeError`](classes#realtimeerror).[`toJSON`](classes#realtimeerror-tojson)
***
### toString()
```ts
toString(): string;
```
Creates a human-readable string representation
**Returns**
`string`
**Inherited from**
[`RealtimeError`](classes#realtimeerror).[`toString`](classes#realtimeerror-tostring)
### Properties
| Property | Type | Description |
| ------------ | ---------------------------------------------- | -------------------------------------------------------------------------------------------------- |
| `cause` | `unknown` | The underlying error that caused this error, if any. |
| `code` | [`RealtimeErrorCode`](types#realtimeerrorcode) | Real-time error code |
| `raw` | `unknown` | Original response payload for debugging. Contains the raw WebSocket message that caused the error. |
| `statusCode` | `number` \| `undefined` | HTTP status code when applicable (e.g., 401 for auth errors, 500 for server errors). |
***
## StateError
State error.
Thrown when an operation is attempted in an invalid state.
### Extends
* [`RealtimeError`](classes#realtimeerror)
### toJSON()
```ts
toJSON(): Record<string, unknown>;
```
Converts to a plain object for logging/serialization
**Returns**
`Record`\<`string`, `unknown`>
**Inherited from**
[`RealtimeError`](classes#realtimeerror).[`toJSON`](classes#realtimeerror-tojson)
***
### toString()
```ts
toString(): string;
```
Creates a human-readable string representation
**Returns**
`string`
**Inherited from**
[`RealtimeError`](classes#realtimeerror).[`toString`](classes#realtimeerror-tostring)
### Properties
| Property | Type | Description |
| ------------ | ---------------------------------------------- | -------------------------------------------------------------------------------------------------- |
| `cause` | `unknown` | The underlying error that caused this error, if any. |
| `code` | [`RealtimeErrorCode`](types#realtimeerrorcode) | Real-time error code |
| `raw` | `unknown` | Original response payload for debugging. Contains the raw WebSocket message that caused the error. |
| `statusCode` | `number` \| `undefined` | HTTP status code when applicable (e.g., 401 for auth errors, 500 for server errors). |
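Because the error classes form a hierarchy, catch blocks can dispatch with `instanceof`, checking the most specific classes first. A sketch, where the import path is an assumption and `someSdkCall` stands in for any SDK operation:
```ts
import {
  SonioxError,
  SonioxHttpError,
  RealtimeError,
  QuotaError,
} from "@soniox/node"; // import path is an assumption

try {
  await someSdkCall(); // placeholder for any REST or real-time operation
} catch (err) {
  if (err instanceof QuotaError) {
    // 402/429: back off before retrying.
  } else if (err instanceof RealtimeError) {
    console.error("real-time failure:", err.code, err.raw);
  } else if (err instanceof SonioxHttpError) {
    console.error("HTTP failure:", err.code, err.statusCode, err.url);
  } else if (err instanceof SonioxError) {
    console.error(err.toString());
  } else {
    throw err; // not an SDK error
  }
}
```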
# Full Node SDK reference
URL: /stt/SDKs/node-SDK/reference
Full SDK reference for the Node SDK
## Environment variables
Environment variables are used to configure the client. You can set them in your environment or pass them explicitly to the client.
| Variable | Description | Default |
| --------------------------- | ---------------------------- | ---------------------------------------------- |
| `SONIOX_API_KEY` | API key for REST requests | - |
| `SONIOX_API_BASE_URL` | REST base URL | `https://api.soniox.com` |
| `SONIOX_WS_URL` | Real-time WebSocket base URL | `wss://stt-rt.soniox.com/transcribe-websocket` |
| `SONIOX_API_WEBHOOK_HEADER` | Webhook auth header name | - |
| `SONIOX_API_WEBHOOK_SECRET` | Webhook auth header value | - |
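With `SONIOX_API_KEY` set, the client can typically be constructed without explicit options. A sketch, where the package name and `SonioxClient` export are assumptions (check the SDK install docs for the exact names):
```ts
import { SonioxClient } from "@soniox/node"; // package and class names are assumptions

// Picks up SONIOX_API_KEY (and the other variables above) from the environment.
const client = new SonioxClient();

// Or pass the key explicitly to override the environment:
const explicitClient = new SonioxClient({ api_key: process.env.SONIOX_API_KEY! });
```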
## Client
### Available client methods
| Method | Description |
| ------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------- |
| [`client.files.upload(file, options)`](/stt/SDKs/node-SDK/reference/classes#sonioxfilesapi-upload) | Upload file |
| [`client.files.list(options)`](/stt/SDKs/node-SDK/reference/classes#sonioxfilesapi-list) | List files |
| [`client.files.get(id)`](/stt/SDKs/node-SDK/reference/classes#sonioxfilesapi-get) | Get file |
| [`client.files.delete(id)`](/stt/SDKs/node-SDK/reference/classes#sonioxfilesapi-delete) | Delete file |
| [`client.files.delete_all()`](/stt/SDKs/node-SDK/reference/classes#sonioxfilesapi-delete_all) | Delete all files |
| --- | --- |
| [`client.stt.create(options)`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-create) | Create transcription |
| [`client.stt.list(options)`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-list) | List transcriptions |
| [`client.stt.get(id)`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-get) | Get transcription |
| [`client.stt.getTranscript(id)`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-gettranscript) | Get transcription transcript |
| [`client.stt.delete(id)`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-delete) | Delete transcription |
| [`client.stt.destroy(id)`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-destroy) | Delete transcription and its file |
| [`client.stt.wait(id)`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-wait) | Wait for transcription to complete |
| [`client.stt.transcribe(options)`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-transcribe) | Transcribe audio |
| [`client.stt.transcribeFromUrl(url, options)`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-transcribefromurl) | Transcribe audio from URL |
| [`client.stt.transcribeFromFile(file, options)`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-transcribefromfile) | Transcribe audio from file |
| [`client.stt.transcribeFromFileId(id, options)`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-transcribefromfileid) | Transcribe audio from file ID |
| [`client.stt.delete_all()`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-delete_all) | Delete all transcriptions |
| [`client.stt.destroy_all()`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-destroy_all) | Delete all transcriptions and their files |
| --- | --- |
| [`client.models.list()`](/stt/SDKs/node-SDK/reference/classes#sonioxmodelsapi-list) | List available models |
| --- | --- |
| [`client.webhooks.handle(options)`](/stt/SDKs/node-SDK/reference/classes#sonioxwebhooksapi-handle) | Handle webhook |
| [`client.webhooks.handleRequest(request, options)`](/stt/SDKs/node-SDK/reference/classes#sonioxwebhooksapi-handlerequest) | Handle webhook with Fetch API |
| [`client.webhooks.handleExpress(req, options)`](/stt/SDKs/node-SDK/reference/classes#sonioxwebhooksapi-handleexpress) | Handle webhook with Express |
| [`client.webhooks.handleFastify(req, options)`](/stt/SDKs/node-SDK/reference/classes#sonioxwebhooksapi-handlefastify) | Handle webhook with Fastify |
| [`client.webhooks.handleNestJS(req, options)`](/stt/SDKs/node-SDK/reference/classes#sonioxwebhooksapi-handlenestjs) | Handle webhook with NestJS |
| [`client.webhooks.handleHono(c, options)`](/stt/SDKs/node-SDK/reference/classes#sonioxwebhooksapi-handlehono) | Handle webhook with Hono |
| [`client.webhooks.getAuthFromEnv()`](/stt/SDKs/node-SDK/reference/classes#sonioxwebhooksapi-getauthfromenv) | Get webhook auth from environment variables |
| [`client.webhooks.verifyAuth(headers, auth)`](/stt/SDKs/node-SDK/reference/classes#sonioxwebhooksapi-verifyauth) | Verify webhook auth |
| [`client.webhooks.parseEvent(body)`](/stt/SDKs/node-SDK/reference/classes#sonioxwebhooksapi-parseevent) | Parse webhook event |
| [`client.webhooks.isEvent(body)`](/stt/SDKs/node-SDK/reference/classes#sonioxwebhooksapi-isevent) | Check if body is a webhook event |
| --- | --- |
| [`client.auth.createTemporaryKey(options)`](/stt/SDKs/node-SDK/reference/classes#sonioxauthapi-createtemporarykey) | Create temporary API key |
| --- | --- |
| [`client.realtime.stt(options)`](/stt/SDKs/node-SDK/reference/classes#sonioxrealtimeapi-stt) | Create real-time session |
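For example, a one-shot async transcription might look like this rough sketch, assuming `client` is a configured SDK client instance; the model name is illustrative and exact return shapes are documented in the linked class reference:
```ts
const transcription = await client.stt.transcribeFromUrl(
  "https://example.com/meeting.mp3", // placeholder URL
  { model: "stt-async-preview" }     // illustrative model name
);

const transcript = await transcription.getTranscript();
console.log(transcript.text);
```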
## File
### Available file instance methods
| Method | Description |
| ------------------------------------------------------------------------- | ----------- |
| [`file.delete()`](/stt/SDKs/node-SDK/reference/classes#sonioxfile-delete) | Delete file |
## Transcription
### Available transcription instance methods
| Method | Description |
| --------------------------------------------------------------------------------------------------------- | ---------------------------------- |
| [`transcription.getTranscript()`](/stt/SDKs/node-SDK/reference/classes#sonioxtranscription-gettranscript) | Get transcription transcript |
| [`transcription.delete()`](/stt/SDKs/node-SDK/reference/classes#sonioxtranscription-delete) | Delete transcription |
| [`transcription.destroy()`](/stt/SDKs/node-SDK/reference/classes#sonioxtranscription-destroy) | Delete transcription and its file |
| [`transcription.wait()`](/stt/SDKs/node-SDK/reference/classes#sonioxtranscription-wait) | Wait for transcription to complete |
| [`transcription.refresh()`](/stt/SDKs/node-SDK/reference/classes#sonioxtranscription-refresh) | Refresh transcription |
## Transcript
| Method | Description |
| ----------------------------------------------------------------------------------------- | ----------------------- |
| [`transcript.segments()`](/stt/SDKs/node-SDK/reference/classes#sonioxtranscript-segments) | Get transcript segments |
| [`transcript.text`](/stt/SDKs/node-SDK/reference/classes#sonioxtranscript-text) | Transcript text |
| [`transcript.tokens`](/stt/SDKs/node-SDK/reference/classes#sonioxtranscript-tokens) | Transcript tokens |
## Real-time STT Session
| Method | Description |
| ---------------------------------------------------------------------------------------------------------------- | ------------------------------------------------ |
| [`realtime.stt.connect()`](/stt/SDKs/node-SDK/reference/classes#realtimesttsession-connect) | Establish websocket connection |
| [`realtime.stt.close()`](/stt/SDKs/node-SDK/reference/classes#realtimesttsession-close) | Close websocket connection |
| [`realtime.stt.finalize()`](/stt/SDKs/node-SDK/reference/classes#realtimesttsession-finalize) | Request server to finalize current transcription |
| [`realtime.stt.finish()`](/stt/SDKs/node-SDK/reference/classes#realtimesttsession-finish) | Gracefully finish the session |
| [`realtime.stt.keepAlive()`](/stt/SDKs/node-SDK/reference/classes#realtimesttsession-keepalive) | Send keepalive message |
| [`realtime.stt.off()`](/stt/SDKs/node-SDK/reference/classes#realtimesttsession-off) | Remove event handler |
| [`realtime.stt.on()`](/stt/SDKs/node-SDK/reference/classes#realtimesttsession-on) | Register event handler |
| [`realtime.stt.once()`](/stt/SDKs/node-SDK/reference/classes#realtimesttsession-once) | Register one-time event handler |
| [`realtime.stt.sendAudio(audio)`](/stt/SDKs/node-SDK/reference/classes#realtimesttsession-sendaudio) | Send audio chunk |
| [`realtime.stt.sendStream(stream, options)`](/stt/SDKs/node-SDK/reference/classes#realtimesttsession-sendstream) | Send audio stream |
| [`realtime.stt.pause()`](/stt/SDKs/node-SDK/reference/classes#realtimesttsession-pause) | Pause audio transmission |
| [`realtime.stt.resume()`](/stt/SDKs/node-SDK/reference/classes#realtimesttsession-resume) | Resume audio transmission |
| [`realtime.stt.paused`](/stt/SDKs/node-SDK/reference/classes#realtimesttsession-paused) | Whether the session is currently paused |
| [`realtime.stt.state`](/stt/SDKs/node-SDK/reference/classes#realtimesttsession-state) | Current session state |
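Tying these methods together, a session sketch, assuming `client` is a configured SDK client and `audioChunks` yields raw audio buffers; the option names are assumptions based on the WebSocket API parameters:
```ts
const session = client.realtime.stt({
  model: "stt-rt-preview",   // illustrative model name
  audio_format: "pcm_s16le", // option names assumed from the WebSocket API
  sample_rate: 16000,
  num_channels: 1,
});

session.on("result", (result) => {
  // Handle incremental tokens here.
});

await session.connect();
for await (const chunk of audioChunks) {
  session.sendAudio(chunk);
}
await session.finish(); // graceful shutdown: waits for remaining results
```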
# Types
URL: /stt/SDKs/node-SDK/reference/types
Soniox Node SDK — Types Reference
## AudioData
```ts
type AudioData = Buffer | Uint8Array | ArrayBuffer;
```
Audio data types accepted by `sendAudio`.
In Node.js, `Buffer` is also accepted since `Buffer` extends `Uint8Array`.
***
## AudioFormat
```ts
type AudioFormat =
| "pcm_s8"
| "pcm_s8le"
| "pcm_s8be"
| "pcm_s16le"
| "pcm_s16be"
| "pcm_s24le"
| "pcm_s24be"
| "pcm_s32le"
| "pcm_s32be"
| "pcm_u8"
| "pcm_u8le"
| "pcm_u8be"
| "pcm_u16le"
| "pcm_u16be"
| "pcm_u24le"
| "pcm_u24be"
| "pcm_u32le"
| "pcm_u32be"
| "pcm_f32le"
| "pcm_f32be"
| "pcm_f64le"
| "pcm_f64be"
| "mulaw"
| "alaw"
| "aac"
| "aiff"
| "amr"
| "asf"
| "wav"
| "mp3"
| "flac"
| "ogg"
| "webm";
```
Supported audio formats for real-time transcription.
***
## CleanupTarget
```ts
type CleanupTarget = "file" | "transcription";
```
Resource types that can be cleaned up after transcription completes.
* `'file'` - The uploaded file
* `'transcription'` - The transcription record
***
## ContextGeneralEntry
```ts
type ContextGeneralEntry = {
key: string;
value: string;
};
```
Key-value pair for general context information.
**Properties**
| Property | Type | Description |
| -------- | -------- | ------------------------------------------------------------------------ |
| `key` | `string` | The key describing the context type (e.g., "domain", "topic", "doctor"). |
| `value` | `string` | The value for the context key. |
***
## ContextTranslationTerm
```ts
type ContextTranslationTerm = {
source: string;
target: string;
};
```
Custom translation term mapping.
**Properties**
| Property | Type | Description |
| -------- | -------- | ------------------------------------ |
| `source` | `string` | The source term to translate. |
| `target` | `string` | The target translation for the term. |
***
## CreateTranscriptionOptions
```ts
type CreateTranscriptionOptions = {
audio_url?: string;
client_reference_id?: string;
context?: TranscriptionContext;
enable_language_identification?: boolean;
enable_speaker_diarization?: boolean;
file_id?: string;
language_hints?: string[];
language_hints_strict?: boolean;
model: string;
translation?: TranslationConfig;
webhook_auth_header_name?: string;
webhook_auth_header_value?: string;
webhook_url?: string;
};
```
Options for creating a transcription.
**Properties**
| Property | Type | Description |
| --------------------------------- | ---------------------------------------------------- | ------------------------------------------------------------------------------------------------- |
| `audio_url?` | `string` | URL of a publicly accessible audio file. **Max Length** 4096 |
| `client_reference_id?` | `string` | Optional tracking identifier. **Max Length** 256 |
| `context?` | [`TranscriptionContext`](types#transcriptioncontext) | Additional context to improve transcription accuracy and formatting of specialized terms. |
| `enable_language_identification?` | `boolean` | Enable automatic language identification. |
| `enable_speaker_diarization?` | `boolean` | Enable speaker diarization to identify different speakers. |
| `file_id?` | `string` | ID of a previously uploaded file. **Format** uuid |
| `language_hints?` | `string`\[] | Array of expected ISO language codes to bias recognition. |
| `language_hints_strict?` | `boolean` | When true, model relies more heavily on language hints. |
| `model` | `string` | Speech-to-text model to use. **Max Length** 32 |
| `translation?` | [`TranslationConfig`](types#translationconfig) | Translation configuration. |
| `webhook_auth_header_name?` | `string` | Name of the authentication header sent with webhook notifications. **Max Length** 256 |
| `webhook_auth_header_value?` | `string` | Authentication header value sent with webhook notifications. **Max Length** 256 |
| `webhook_url?` | `string` | URL to receive webhook notifications when transcription is completed or fails. **Max Length** 256 |
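As an illustration, a minimal options object — only `model` is required; the model name and URL below are placeholders:
```typescript
// Minimal CreateTranscriptionOptions-shaped object; only `model` is required.
const options = {
  model: "stt-async-preview",                   // illustrative model name
  audio_url: "https://example.com/meeting.mp3", // placeholder URL
  enable_speaker_diarization: true,
  language_hints: ["en", "es"],                 // ISO codes to bias recognition
  client_reference_id: "meeting-2024-001",
};
```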
***
## DeleteAllFilesOptions
```ts
type DeleteAllFilesOptions = {
signal?: AbortSignal;
};
```
Options for purging all files.
**Properties**
| Property | Type | Description |
| --------- | ------------- | ----------------------------------------------------- |
| `signal?` | `AbortSignal` | AbortSignal for cancelling the delete\_all operation. |
***
## DeleteAllTranscriptionsOptions
```ts
type DeleteAllTranscriptionsOptions = {
on_progress?: (transcription, index) => void;
signal?: AbortSignal;
};
```
Options for deleting all transcriptions.
**Properties**
| Property | Type | Description |
| -------------- | ------------------------------------ | ------------------------------------------------------------------------------------------------------------- |
| `on_progress?` | (`transcription`, `index`) => `void` | Callback invoked before each transcription is deleted. Receives the transcription data and its 0-based index. |
| `signal?` | `AbortSignal` | AbortSignal for cancelling the delete\_all operation. |
***
## ExpressLikeRequest
```ts
type ExpressLikeRequest = {
body?: unknown;
headers: Record<string, string | string[] | undefined>;
method: string;
};
```
Express/Connect-style request object
**Properties**
| Property | Type |
| --------- | ----------------------------------------------------------- |
| `body?` | `unknown` |
| `headers` | `Record`\<`string`, `string` \| `string`\[] \| `undefined`> |
| `method` | `string` |
***
## FastifyLikeRequest
```ts
type FastifyLikeRequest = {
body?: unknown;
headers: Record<string, string | string[] | undefined>;
method: string;
};
```
Fastify-style request object
**Properties**
| Property | Type |
| --------- | ----------------------------------------------------------- |
| `body?` | `unknown` |
| `headers` | `Record`\<`string`, `string` \| `string`\[] \| `undefined`> |
| `method` | `string` |
***
## FileIdentifier
```ts
type FileIdentifier =
| string
| {
id: string;
};
```
File identifier - either a string ID or an object with an id property.
***
## HandleWebhookOptions
```ts
type HandleWebhookOptions = {
auth?: WebhookAuthConfig;
body: unknown;
headers: WebhookHeaders;
method: string;
};
```
Options for the handleWebhook function
**Properties**
| Property | Type | Description |
| --------- | ---------------------------------------------- | ---------------------------------------- |
| `auth?` | [`WebhookAuthConfig`](types#webhookauthconfig) | Optional authentication configuration |
| `body` | `unknown` | Request body (parsed JSON or raw string) |
| `headers` | [`WebhookHeaders`](types#webhookheaders) | Request headers |
| `method` | `string` | HTTP method of the request |
***
## HonoLikeContext
```ts
type HonoLikeContext = {
req: {
method: string;
header: string | undefined;
json: Promise<unknown>;
};
};
```
Hono context object
**Properties**
| Property | Type |
| ------------ | ------------------------------------------------------------------------------------------ |
| `req` | \{ `method`: `string`; `header`: `string` \| `undefined`; `json`: `Promise`\<`unknown`>; } |
| `req.method` | `string` |
| `req.header` | `string` \| `undefined` |
| `req.json` | `Promise`\<`unknown`> |
***
## HttpErrorCode
```ts
type HttpErrorCode = "network_error" | "timeout" | "aborted" | "http_error" | "parse_error";
```
Error codes for HTTP client errors
***
## HttpMethod
```ts
type HttpMethod = "GET" | "POST" | "PUT" | "PATCH" | "DELETE" | "HEAD";
```
HTTP methods supported by the client
***
## HttpRequestBody
```ts
type HttpRequestBody =
| string
| Record
| ArrayBuffer
| Uint8Array
| FormData
| null;
```
Request body types
***
## HttpResponseType
```ts
type HttpResponseType = "json" | "text" | "arrayBuffer";
```
Response types
***
## ListFilesOptions
```ts
type ListFilesOptions = {
cursor?: string;
limit?: number;
signal?: AbortSignal;
};
```
Options for listing files.
**Properties**
| Property | Type | Description |
| --------- | ------------- | ------------------------------------------------------------------------------------ |
| `cursor?` | `string` | Pagination cursor for the next page of results. |
| `limit?` | `number` | Maximum number of files to return. **Default** `1000` **Minimum** 1 **Maximum** 1000 |
| `signal?` | `AbortSignal` | AbortSignal for cancelling the request |
***
## ListFilesResponse\<T>
```ts
type ListFilesResponse = {
files: T[];
next_page_cursor: string | null;
};
```
Response from listing files.
**Type Parameters**
| Type Parameter |
| -------------- |
| `T` |
**Properties**
| Property | Type | Description |
| ------------------ | ------------------ | ------------------------------------------------------------------------------------------------------------ |
| `files` | `T`\[] | List of uploaded files. |
| `next_page_cursor` | `string` \| `null` | A pagination token that references the next page of results. When null, no additional results are available. |
***
## ListTranscriptionsOptions
```ts
type ListTranscriptionsOptions = {
cursor?: string;
limit?: number;
};
```
Options for listing transcriptions
**Properties**
| Property | Type | Description |
| --------- | -------- | --------------------------------------------------------------------------------------------- |
| `cursor?` | `string` | Pagination cursor for the next page of results |
| `limit?` | `number` | Maximum number of transcriptions to return. **Default** `1000` **Minimum** 1 **Maximum** 1000 |
***
## ListTranscriptionsResponse\<T>
```ts
type ListTranscriptionsResponse = {
next_page_cursor: string | null;
transcriptions: T[];
};
```
Response from listing transcriptions.
**Type Parameters**
| Type Parameter |
| -------------- |
| `T` |
**Properties**
| Property | Type | Description |
| ------------------ | ------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------ |
| `next_page_cursor` | `string` \| `null` | A pagination token that references the next page of results. When null, no additional results are available. |
| `transcriptions` | `T`\[] | List of transcriptions. |
***
## NestJSLikeRequest
```ts
type NestJSLikeRequest = {
body?: unknown;
headers: Record<string, string | string[] | undefined>;
method: string;
};
```
NestJS-style request object (uses Express under the hood by default)
**Properties**
| Property | Type |
| --------- | ----------------------------------------------------------- |
| `body?` | `unknown` |
| `headers` | `Record`\<`string`, `string` \| `string`\[] \| `undefined`> |
| `method` | `string` |
***
## OneWayTranslationConfig
```ts
type OneWayTranslationConfig = {
target_language: string;
type: "one_way";
};
```
One-way translation configuration.
Translates all spoken languages into a single target language.
**Properties**
| Property | Type | Description |
| ----------------- | ----------- | -------------------------------------------------------------- |
| `target_language` | `string` | Target language code for translation (e.g., "fr", "es", "de"). |
| `type` | `"one_way"` | Translation type. |
***
## QueryParams
```ts
type QueryParams = Record;
```
Query parameters
***
## RealtimeClientOptions
```ts
type RealtimeClientOptions = {
api_key: string;
default_session_options?: SttSessionOptions;
ws_base_url: string;
};
```
Real-time API configuration options for the client.
**Properties**
| Property | Type | Description |
| -------------------------- | ---------------------------------------------- | ---------------------------------------------------------------------------------------------------------- |
| `api_key` | `string` | API key for real-time sessions. |
| `default_session_options?` | [`SttSessionOptions`](types#sttsessionoptions) | Default session options applied to all real-time sessions. Can be overridden per-session. |
| `ws_base_url` | `string` | WebSocket base URL for real-time connections. **Default** `'wss://stt-rt.soniox.com/transcribe-websocket'` |
***
## RealtimeErrorCode
```ts
type RealtimeErrorCode =
| "auth_error"
| "bad_request"
| "quota_exceeded"
| "connection_error"
| "network_error"
| "aborted"
| "state_error"
| "realtime_error";
```
Error codes for Real-time (WebSocket) API errors
***
## RealtimeEvent
```ts
type RealtimeEvent =
| {
data: RealtimeResult;
kind: "result";
}
| {
kind: "endpoint";
}
| {
kind: "finalized";
}
| {
kind: "finished";
};
```
Typed event for async iterator consumption.
***
## RealtimeOptions
```ts
type RealtimeOptions = {
default_session_options?: SttSessionOptions;
ws_base_url?: string;
};
```
Real-time configuration options for the main client.
**Properties**
| Property | Type | Description |
| -------------------------- | ---------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `default_session_options?` | [`SttSessionOptions`](types#sttsessionoptions) | Default session options applied to all real-time sessions. Can be overridden per-session. |
| `ws_base_url?` | `string` | WebSocket base URL for real-time connections. Falls back to the SONIOX\_WS\_URL environment variable, then to `'wss://stt-rt.soniox.com/transcribe-websocket'`. |
***
## RealtimeResult
```ts
type RealtimeResult = {
final_audio_proc_ms: number;
finished?: boolean;
tokens: RealtimeToken[];
total_audio_proc_ms: number;
};
```
A result message from the real-time WebSocket.
**Properties**
| Property | Type | Description |
| --------------------- | ----------------------------------------- | -------------------------------------------------- |
| `final_audio_proc_ms` | `number` | Milliseconds of audio that have been finalized. |
| `finished?` | `boolean` | Whether this is the final result (session ending). |
| `tokens` | [`RealtimeToken`](types#realtimetoken)\[] | Tokens in this result. |
| `total_audio_proc_ms` | `number` | Total milliseconds of audio processed. |
***
## RealtimeSegment
```ts
type RealtimeSegment = {
end_ms?: number;
language?: string;
speaker?: string;
start_ms?: number;
text: string;
tokens: RealtimeToken[];
};
```
A segment of contiguous real-time tokens grouped by speaker/language.
**Properties**
| Property | Type | Description |
| ----------- | ----------------------------------------- | ------------------------------------------------------------- |
| `end_ms?` | `number` | End time of the segment in milliseconds (from last token). |
| `language?` | `string` | Detected language code (if language identification enabled). |
| `speaker?` | `string` | Speaker identifier (if diarization enabled). |
| `start_ms?` | `number` | Start time of the segment in milliseconds (from first token). |
| `text` | `string` | Concatenated text of all tokens in this segment. |
| `tokens` | [`RealtimeToken`](types#realtimetoken)\[] | Original tokens in this segment. |
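
A minimal illustration of the grouping rule (not the SDK's implementation): start a new segment whenever a grouping field changes between consecutive tokens.

```typescript
type RealtimeToken = {
  text: string;
  is_final: boolean;
  confidence: number;
  speaker?: string;
  language?: string;
  start_ms?: number;
  end_ms?: number;
};
type RealtimeSegment = {
  text: string;
  tokens: RealtimeToken[];
  speaker?: string;
  language?: string;
  start_ms?: number;
  end_ms?: number;
};
type SegmentGroupKey = "speaker" | "language";

// A new segment starts when any group_by field differs from the current one.
function segmentTokens(
  tokens: RealtimeToken[],
  group_by: SegmentGroupKey[] = ["speaker", "language"],
): RealtimeSegment[] {
  const segments: RealtimeSegment[] = [];
  for (const token of tokens) {
    const last = segments[segments.length - 1];
    const boundary = !last || group_by.some((key) => last[key] !== token[key]);
    if (boundary) {
      segments.push({
        text: token.text,
        tokens: [token],
        speaker: token.speaker,
        language: token.language,
        start_ms: token.start_ms,
        end_ms: token.end_ms,
      });
    } else {
      last.text += token.text;
      last.tokens.push(token);
      last.end_ms = token.end_ms ?? last.end_ms;
    }
  }
  return segments;
}
```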
***
## RealtimeSegmentBufferOptions
```ts
type RealtimeSegmentBufferOptions = {
final_only?: boolean;
group_by?: SegmentGroupKey[];
max_ms?: number;
max_tokens?: number;
};
```
Options for rolling real-time segmentation buffers.
**Properties**
| Property | Type | Description |
| ------------- | --------------------------------------------- | --------------------------------------------------------------------------------------------------------------- |
| `final_only?` | `boolean` | When true, only tokens marked as final are buffered. **Default** `true` |
| `group_by?` | [`SegmentGroupKey`](types#segmentgroupkey)\[] | Fields to group by. A new segment starts when any of these fields changes. **Default** `['speaker', 'language']` |
| `max_ms?` | `number` | Maximum time window to keep in milliseconds (requires token timings). |
| `max_tokens?` | `number` | Maximum number of tokens to keep in the buffer. **Default** `2000` |
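
A sketch of the buffering behavior these options describe, assuming a simple drop-oldest policy when `max_tokens` is exceeded (the SDK's internals may differ):

```typescript
type Token = { text: string; is_final: boolean; end_ms?: number };

// Rolling buffer honoring final_only (default true) and max_tokens (default 2000).
class RollingTokenBuffer {
  private tokens: Token[] = [];

  constructor(
    private readonly opts: { final_only?: boolean; max_tokens?: number } = {},
  ) {}

  push(token: Token): void {
    if ((this.opts.final_only ?? true) && !token.is_final) return;
    this.tokens.push(token);
    const max = this.opts.max_tokens ?? 2000;
    if (this.tokens.length > max) {
      // Drop the oldest tokens to stay within the window.
      this.tokens.splice(0, this.tokens.length - max);
    }
  }

  text(): string {
    return this.tokens.map((t) => t.text).join("");
  }
}
```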
***
## RealtimeSegmentOptions
```ts
type RealtimeSegmentOptions = {
final_only?: boolean;
group_by?: SegmentGroupKey[];
};
```
Options for segmenting real-time tokens.
**Properties**
| Property | Type | Description |
| ------------- | --------------------------------------------- | --------------------------------------------------------------------------------------------------------------- |
| `final_only?` | `boolean` | When true, only tokens marked as final are included. **Default** `false` |
| `group_by?` | [`SegmentGroupKey`](types#segmentgroupkey)\[] | Fields to group by. A new segment starts when any of these fields changes. **Default** `['speaker', 'language']` |
***
## RealtimeToken
```ts
type RealtimeToken = {
confidence: number;
end_ms?: number;
is_final: boolean;
language?: string;
source_language?: string;
speaker?: string;
start_ms?: number;
text: string;
translation_status?: "none" | "original" | "translation";
};
```
A single token from the real-time transcription.
**Properties**
| Property | Type | Description |
| --------------------- | ------------------------------------------- | ------------------------------------------------------------ |
| `confidence` | `number` | Confidence score (0.0 to 1.0). |
| `end_ms?` | `number` | End time in milliseconds relative to audio start. |
| `is_final` | `boolean` | Whether this is a finalized token. |
| `language?` | `string` | Detected language code (if language identification enabled). |
| `source_language?` | `string` | Source language for translated tokens. |
| `speaker?` | `string` | Speaker identifier (if diarization enabled). |
| `start_ms?` | `number` | Start time in milliseconds relative to audio start. |
| `text` | `string` | The transcribed text. |
| `translation_status?` | `"none"` \| `"original"` \| `"translation"` | Translation status of this token. |
***
## RealtimeUtterance
```ts
type RealtimeUtterance = {
end_ms?: number;
final_audio_proc_ms?: number;
language?: string;
segments: RealtimeSegment[];
speaker?: string;
start_ms?: number;
text: string;
tokens: RealtimeToken[];
total_audio_proc_ms?: number;
};
```
A single utterance built from real-time segments.
**Properties**
| Property | Type | Description |
| ---------------------- | --------------------------------------------- | ----------------------------------------------------------------- |
| `end_ms?` | `number` | End time of the utterance in milliseconds (from last segment). |
| `final_audio_proc_ms?` | `number` | Milliseconds of audio that have been finalized at flush time. |
| `language?` | `string` | Detected language code when consistent across segments. |
| `segments` | [`RealtimeSegment`](types#realtimesegment)\[] | Segments included in this utterance. |
| `speaker?` | `string` | Speaker identifier when consistent across segments. |
| `start_ms?` | `number` | Start time of the utterance in milliseconds (from first segment). |
| `text` | `string` | Concatenated text of all segments in this utterance. |
| `tokens` | [`RealtimeToken`](types#realtimetoken)\[] | Tokens included in this utterance. |
| `total_audio_proc_ms?` | `number` | Total milliseconds of audio processed at flush time. |
***
## RealtimeUtteranceBufferOptions
```ts
type RealtimeUtteranceBufferOptions = {
final_only?: boolean;
group_by?: SegmentGroupKey[];
max_ms?: number;
max_tokens?: number;
};
```
Options for buffering real-time utterances.
**Properties**
| Property | Type | Description |
| ------------- | --------------------------------------------- | --------------------------------------------------------------------------------------------------------------- |
| `final_only?` | `boolean` | When true, only tokens marked as final are buffered. **Default** `true` |
| `group_by?` | [`SegmentGroupKey`](types#segmentgroupkey)\[] | Fields to group by. A new segment starts when any of these fields changes. **Default** `['speaker', 'language']` |
| `max_ms?` | `number` | Maximum time window to keep in milliseconds (requires token timings). |
| `max_tokens?` | `number` | Maximum number of tokens to keep in the buffer. **Default** `2000` |
***
## SegmentGroupKey
```ts
type SegmentGroupKey = "speaker" | "language";
```
Fields that can be used to group tokens into segments
***
## SegmentTranscriptOptions
```ts
type SegmentTranscriptOptions = {
group_by?: SegmentGroupKey[];
};
```
Options for segmenting a transcript
**Properties**
| Property | Type | Description |
| ----------- | --------------------------------------------- | --------------------------------------------------------------------------------------------------------------- |
| `group_by?` | [`SegmentGroupKey`](types#segmentgroupkey)\[] | Fields to group by. A new segment starts when any of these fields changes. **Default** `['speaker', 'language']` |
***
## SendStreamOptions
```ts
type SendStreamOptions = {
finish?: boolean;
pace_ms?: number;
};
```
Options for streaming audio from an async iterable source.
**Properties**
| Property | Type | Description |
| ---------- | --------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `finish?` | `boolean` | When true, calls finish() automatically after the stream ends. **Default** `false` |
| `pace_ms?` | `number` | Delay in milliseconds between sending chunks. Useful for simulating real-time pace when streaming pre-recorded files. Not needed for live audio sources. |
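
A sketch of how `pace_ms` and `finish` interact; `sendChunk` and `finish` below are placeholders for the session's send and finish calls, not SDK APIs:

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Forward each chunk, optionally pausing pace_ms between chunks to simulate
// real-time playback of a pre-recorded file; call finish() when requested.
async function sendStream(
  chunks: AsyncIterable<Uint8Array> | Iterable<Uint8Array>,
  sendChunk: (chunk: Uint8Array) => void,
  opts: { pace_ms?: number; finish?: boolean } = {},
  finish?: () => void,
): Promise<void> {
  for await (const chunk of chunks) {
    sendChunk(chunk);
    if (opts.pace_ms) await sleep(opts.pace_ms);
  }
  if (opts.finish ?? false) finish?.();
}
```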
***
## SonioxErrorCode
```ts
type SonioxErrorCode =
| RealtimeErrorCode
| "soniox_error"
| HttpErrorCode;
```
All possible SDK error codes (core real-time + HTTP-specific codes)
***
## SonioxFileData
```ts
type SonioxFileData = {
client_reference_id?: string | null;
created_at: string;
filename: string;
id: string;
size: number;
};
```
Raw file metadata from the API.
**Properties**
| Property | Type | Description |
| ---------------------- | ------------------ | ------------------------------------------------------------------------- |
| `client_reference_id?` | `string` \| `null` | Optional tracking identifier string. |
| `created_at` | `string` | UTC timestamp indicating when the file was uploaded. **Format** date-time |
| `filename` | `string` | Name of the file. |
| `id` | `string` | Unique identifier of the file. **Format** uuid |
| `size` | `number` | Size of the file in bytes. |
***
## SonioxLanguage
```ts
type SonioxLanguage = {
code: string;
name: string;
};
```
**Properties**
| Property | Type | Description |
| -------- | -------- | ----------------------- |
| `code` | `string` | 2-letter language code. |
| `name` | `string` | Language name. |
***
## SonioxModel
```ts
type SonioxModel = {
aliased_model_id: string | null;
context_version: number | null;
id: string;
languages: SonioxLanguage[];
name: string;
one_way_translation: string | null;
supports_language_hints_strict: boolean;
supports_max_endpoint_delay: boolean;
transcription_mode: SonioxTranscriptionMode;
translation_targets: SonioxTranslationTarget[];
two_way_translation: string | null;
two_way_translation_pairs: string[];
};
```
**Properties**
| Property | Type | Description |
| -------------------------------- | ------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------ |
| `aliased_model_id` | `string` \| `null` | If this is an alias, the id of the aliased model. Null for non-alias models. |
| `context_version` | `number` \| `null` | Version of context supported. |
| `id` | `string` | Unique identifier of the model. |
| `languages` | [`SonioxLanguage`](types#sonioxlanguage)\[] | List of languages supported by the model. |
| `name` | `string` | Name of the model. |
| `one_way_translation` | `string` \| `null` | If set to the string 'all\_languages', any language from `languages` can be used as a source for one-way translation. |
| `supports_language_hints_strict` | `boolean` | Whether the model supports the `language_hints_strict` option. |
| `supports_max_endpoint_delay` | `boolean` | Whether the model supports the `max_endpoint_delay_ms` option. |
| `transcription_mode` | [`SonioxTranscriptionMode`](types#sonioxtranscriptionmode) | Transcription mode of the model. |
| `translation_targets` | [`SonioxTranslationTarget`](types#sonioxtranslationtarget)\[] | List of supported one-way translation targets. If the list is empty, check the one\_way\_translation field. |
| `two_way_translation` | `string` \| `null` | If set to the string 'all\_languages', any language pair from `languages` can be used for two-way translation. |
| `two_way_translation_pairs` | `string`\[] | List of supported two-way translation pairs. If the list is empty, check the two\_way\_translation field. |
***
## SonioxNodeClientOptions
```ts
type SonioxNodeClientOptions = {
api_key?: string;
base_url?: string;
http_client?: HttpClient;
realtime?: RealtimeOptions;
};
```
**Properties**
| Property | Type | Description |
| -------------- | ------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------- |
| `api_key?` | `string` | API key for authentication. Falls back to SONIOX\_API\_KEY environment variable if not provided. |
| `base_url?` | `string` | Base URL for the REST API. Falls back to SONIOX\_API\_BASE\_URL environment variable, then to '[https://api.soniox.com](https://api.soniox.com)'. |
| `http_client?` | [`HttpClient`](types#httpclient) | Custom HTTP client implementation. |
| `realtime?` | [`RealtimeOptions`](types#realtimeoptions) | Real-time API configuration options. |
***
## SonioxTranscriptionData
```ts
type SonioxTranscriptionData = {
audio_duration_ms?: number | null;
audio_url?: string | null;
client_reference_id?: string | null;
context?: TranscriptionContext | null;
created_at: string;
enable_language_identification: boolean;
enable_speaker_diarization: boolean;
error_message?: string | null;
error_type?: string | null;
file_id?: string | null;
filename: string;
id: string;
language_hints?: string[] | null;
model: string;
status: TranscriptionStatus;
webhook_auth_header_name?: string | null;
webhook_auth_header_value?: string | null;
webhook_status_code?: number | null;
webhook_url?: string | null;
};
```
Raw transcription metadata from the API.
**Properties**
| Property | Type | Description |
| -------------------------------- | -------------------------------------------------------------- | -------------------------------------------------------------------------------------------- |
| `audio_duration_ms?` | `number` \| `null` | Duration of the audio in milliseconds. Only available after processing begins. |
| `audio_url?` | `string` \| `null` | URL of the audio file being transcribed. |
| `client_reference_id?` | `string` \| `null` | Optional tracking identifier. **Max Length** 256 |
| `context?` | [`TranscriptionContext`](types#transcriptioncontext) \| `null` | Additional context provided for the transcription. |
| `created_at` | `string` | UTC timestamp when the transcription was created. **Format** date-time |
| `enable_language_identification` | `boolean` | When true, language is detected for each part of the transcription. |
| `enable_speaker_diarization` | `boolean` | When true, speakers are identified and separated in the transcription output. |
| `error_message?` | `string` \| `null` | Error message if transcription failed. Null for successful or in-progress transcriptions. |
| `error_type?` | `string` \| `null` | Error type if transcription failed. Null for successful or in-progress transcriptions. |
| `file_id?` | `string` \| `null` | ID of the uploaded file being transcribed. **Format** uuid |
| `filename` | `string` | Name of the file being transcribed. |
| `id` | `string` | Unique identifier of the transcription. **Format** uuid |
| `language_hints?` | `string`\[] \| `null` | Expected languages in the audio. If not specified, languages are automatically detected. |
| `model` | `string` | Speech-to-text model used. |
| `status` | [`TranscriptionStatus`](types#transcriptionstatus) | Current status of the transcription. |
| `webhook_auth_header_name?` | `string` \| `null` | Name of the authentication header sent with webhook notifications. |
| `webhook_auth_header_value?` | `string` \| `null` | Authentication header value. Always returned masked. |
| `webhook_status_code?` | `number` \| `null` | HTTP status code received from your server when webhook was delivered. Null if not yet sent. |
| `webhook_url?` | `string` \| `null` | URL to receive webhook notifications when transcription is completed or fails. |
***
## SonioxTranscriptionMode
```ts
type SonioxTranscriptionMode = "real_time" | "async";
```
Transcription mode of the model.
***
## SonioxTranslationTarget
```ts
type SonioxTranslationTarget = {
exclude_source_languages: string[];
source_languages: string[];
target_language: string;
};
```
**Properties**
| Property | Type |
| -------------------------- | ----------- |
| `exclude_source_languages` | `string`\[] |
| `source_languages` | `string`\[] |
| `target_language` | `string` |
***
## SttSessionConfig
```ts
type SttSessionConfig = {
audio_format?: "auto" | AudioFormat;
client_reference_id?: string;
context?: TranscriptionContext;
enable_endpoint_detection?: boolean;
enable_language_identification?: boolean;
enable_speaker_diarization?: boolean;
language_hints?: string[];
language_hints_strict?: boolean;
max_endpoint_delay_ms?: number;
model: string;
num_channels?: number;
sample_rate?: number;
translation?: TranslationConfig;
};
```
Configuration sent to the Soniox WebSocket API when starting a session.
**Properties**
| Property | Type | Description |
| --------------------------------- | ---------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `audio_format?` | `"auto"` \| [`AudioFormat`](types#audioformat) | Audio format. Use 'auto' for automatic detection of container formats. For raw PCM formats, also set sample\_rate and num\_channels. **Default** `'auto'` |
| `client_reference_id?` | `string` | Optional tracking identifier (max 256 chars). |
| `context?` | [`TranscriptionContext`](types#transcriptioncontext) | Additional context to improve transcription accuracy. |
| `enable_endpoint_detection?` | `boolean` | Enable endpoint detection for utterance boundaries. Useful for voice AI agents. |
| `enable_language_identification?` | `boolean` | Enable automatic language detection. |
| `enable_speaker_diarization?` | `boolean` | Enable speaker identification. |
| `language_hints?` | `string`\[] | Expected languages in the audio (ISO language codes). |
| `language_hints_strict?` | `boolean` | When true, recognition is strongly biased toward language hints. Best-effort only, not a hard guarantee. |
| `max_endpoint_delay_ms?` | `number` | Maximum delay between the end of speech and the returned endpoint. Allowed values are between 500 and 3000 milliseconds. **Default** `2000` |
| `model` | `string` | Speech-to-text model to use. |
| `num_channels?` | `number` | Number of audio channels (required for raw audio formats). |
| `sample_rate?` | `number` | Sample rate in Hz (required for PCM formats). |
| `translation?` | [`TranslationConfig`](types#translationconfig) | Translation configuration. |
***
## SttSessionEvents
```ts
type SttSessionEvents = {
connected: () => void;
disconnected: (reason?) => void;
endpoint: () => void;
error: (error) => void;
finalized: () => void;
finished: () => void;
result: (result) => void;
state_change: (update) => void;
token: (token) => void;
};
```
Event handlers for the STT session.
**Properties**
| Property | Type | Description |
| -------------- | --------------------- | ------------------------------------------------- |
| `connected` | () => `void` | Session connected and ready. |
| `disconnected` | (`reason?`) => `void` | Session disconnected. |
| `endpoint` | () => `void` | Endpoint detected. |
| `error` | (`error`) => `void` | Error occurred. |
| `finalized` | () => `void` | Finalization complete. |
| `finished` | () => `void` | Session finished (server signaled end of stream). |
| `result` | (`result`) => `void` | Parsed result received. |
| `state_change` | (`update`) => `void` | Session state transition. |
| `token` | (`token`) => `void` | Individual token received. |
***
## SttSessionOptions
```ts
type SttSessionOptions = {
keepalive_interval_ms?: number;
signal?: AbortSignal;
};
```
SDK-level session options (not sent to the server).
**Properties**
| Property | Type | Description |
| ------------------------ | ------------- | --------------------------------------------------------------------------------------- |
| `keepalive_interval_ms?` | `number` | Interval for sending keepalive messages while paused (milliseconds). **Default** `5000` |
| `signal?` | `AbortSignal` | AbortSignal for cancellation. |
***
## SttSessionState
```ts
type SttSessionState =
| "idle"
| "connecting"
| "connected"
| "finishing"
| "finished"
| "canceled"
| "closed"
| "error";
```
Session lifecycle states.
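
As an example of how an application might consume these states, the set below treats `finished`, `canceled`, `closed`, and `error` as terminal; which states are terminal is an application-level assumption, not stated by the SDK:

```typescript
type SttSessionState =
  | "idle"
  | "connecting"
  | "connected"
  | "finishing"
  | "finished"
  | "canceled"
  | "closed"
  | "error";

// Assumed terminal states: once reached, no further transitions are expected.
const TERMINAL_STATES: ReadonlySet<SttSessionState> = new Set<SttSessionState>([
  "finished",
  "canceled",
  "closed",
  "error",
]);

function isTerminal(state: SttSessionState): boolean {
  return TERMINAL_STATES.has(state);
}
```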
***
## TemporaryApiKeyRequest
```ts
type TemporaryApiKeyRequest = {
client_reference_id?: string;
expires_in_seconds: number;
usage_type: TemporaryApiKeyUsageType;
};
```
**Properties**
| Property | Type | Description |
| ---------------------- | ------------------------------------------------------------ | -------------------------------------------------------------------------------------- |
| `client_reference_id?` | `string` | Optional tracking identifier string. Does not need to be unique. **Max Length** 256 |
| `expires_in_seconds` | `number` | Duration in seconds until the temporary API key expires. **Minimum** 1 **Maximum** 3600 |
| `usage_type` | [`TemporaryApiKeyUsageType`](types#temporaryapikeyusagetype) | Intended usage of the temporary API key. |
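
A client-side validator for the bounds listed above (illustrative; the API enforces the same limits server-side):

```typescript
// Mirrors the documented bounds: expires_in_seconds in [1, 3600],
// client_reference_id at most 256 characters.
function validateTemporaryApiKeyRequest(request: {
  expires_in_seconds: number;
  client_reference_id?: string;
}): string[] {
  const errors: string[] = [];
  if (
    !Number.isInteger(request.expires_in_seconds) ||
    request.expires_in_seconds < 1 ||
    request.expires_in_seconds > 3600
  ) {
    errors.push("expires_in_seconds must be an integer between 1 and 3600");
  }
  if (
    request.client_reference_id !== undefined &&
    request.client_reference_id.length > 256
  ) {
    errors.push("client_reference_id must be at most 256 characters");
  }
  return errors;
}
```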
***
## TemporaryApiKeyResponse
```ts
type TemporaryApiKeyResponse = {
api_key: string;
expires_at: string;
};
```
**Properties**
| Property | Type | Description |
| ------------ | -------- | ------------------------------------------------------------------------------------------ |
| `api_key` | `string` | Created temporary API key. |
| `expires_at` | `string` | UTC timestamp indicating when the generated temporary API key will expire. **Format** date-time |
***
## TemporaryApiKeyUsageType
```ts
type TemporaryApiKeyUsageType = "transcribe_websocket";
```
***
## TranscribeBaseOptions
```ts
type TranscribeBaseOptions = {
cleanup?: CleanupTarget[];
client_reference_id?: string;
context?: TranscriptionContext;
enable_language_identification?: boolean;
enable_speaker_diarization?: boolean;
fetch_transcript?: boolean;
language_hints?: string[];
language_hints_strict?: boolean;
model: string;
signal?: AbortSignal;
timeout_ms?: number;
translation?: TranslationConfig;
wait?: boolean;
wait_options?: WaitOptions;
webhook_auth_header_name?: string;
webhook_auth_header_value?: string;
webhook_query?: string | URLSearchParams | Record<string, string>;
webhook_url?: string;
};
```
Base options shared by all audio source variants.
**Properties**
| Property | Type | Description |
| --------------------------------- | -------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `cleanup?` | [`CleanupTarget`](types#cleanuptarget)\[] | Resources to clean up after transcription completes or on error/timeout. Only applies when `wait: true`, and then runs in all cases: after successful completion, after transcription errors (status: 'error'), and on timeout or abort, so no orphaned resources are left behind. **Example** `cleanup: ['file']` deletes only the uploaded file, `cleanup: ['transcription']` deletes only the transcription record, and `cleanup: ['file', 'transcription']` deletes both. |
| `client_reference_id?` | `string` | Optional tracking identifier. **Max Length** 256 |
| `context?` | [`TranscriptionContext`](types#transcriptioncontext) | Additional context to improve transcription accuracy and formatting of specialized terms. |
| `enable_language_identification?` | `boolean` | Enable automatic language identification. |
| `enable_speaker_diarization?` | `boolean` | Enable speaker diarization to identify different speakers. |
| `fetch_transcript?` | `boolean` | When true (default), fetches the transcript and attaches it to the result when wait=true and the transcription completes successfully. Set to false to skip fetching the full transcript payload. **Default** `true` |
| `language_hints?` | `string`\[] | Array of expected ISO language codes to bias recognition. |
| `language_hints_strict?` | `boolean` | When true, model relies more heavily on language hints. |
| `model` | `string` | Speech-to-text model to use. **Max Length** 32 |
| `signal?` | `AbortSignal` | AbortSignal to cancel the operation |
| `timeout_ms?` | `number` | Timeout in milliseconds |
| `translation?` | [`TranslationConfig`](types#translationconfig) | Translation configuration. |
| `wait?` | `boolean` | When true, waits for transcription to complete before returning. **Default** `false` |
| `wait_options?` | [`WaitOptions`](types#waitoptions) | Options for waiting (only used when wait=true). |
| `webhook_auth_header_name?` | `string` | Name of the authentication header sent with webhook notifications. **Max Length** 256 |
| `webhook_auth_header_value?` | `string` | Authentication header value sent with webhook notifications. **Max Length** 256 |
| `webhook_query?` | `string` \| `URLSearchParams` \| `Record`\<`string`, `string`> | Query parameters to append to the webhook URL. Useful for encoding metadata like transcription ID in the webhook callback. Can be a string, a `URLSearchParams` instance, or a record of string key-value pairs. |
| `webhook_url?` | `string` | URL to receive webhook notifications when transcription is completed or fails. **Max Length** 256 |
***
## TranscribeFromFile
```ts
type TranscribeFromFile = TranscribeBaseOptions & {
audio_url?: never;
file: UploadFileInput;
file_id?: never;
filename?: string;
};
```
Transcribe from a direct file upload (Buffer, Uint8Array, Blob, or ReadableStream)
**Type Declaration**
| Name | Type | Description |
| ------------ | ------------------------------------------ | ----------------------------------- |
| `audio_url?` | `never` | - |
| `file` | [`UploadFileInput`](types#uploadfileinput) | File data to upload and transcribe. |
| `file_id?` | `never` | - |
| `filename?` | `string` | - |
***
## TranscribeFromFileId
```ts
type TranscribeFromFileId = TranscribeBaseOptions & {
audio_url?: never;
file?: never;
file_id: string;
filename?: never;
};
```
Transcribe from a previously uploaded file
**Type Declaration**
| Name | Type | Description |
| ------------ | -------- | ------------------------------------------------- |
| `audio_url?` | `never` | - |
| `file?` | `never` | - |
| `file_id` | `string` | ID of a previously uploaded file. **Format** uuid |
| `filename?` | `never` | - |
***
## TranscribeFromFileIdOptions
```ts
type TranscribeFromFileIdOptions = Omit;
```
Options for transcribing from an uploaded file ID via `transcribeFromFileId`.
***
## TranscribeFromFileOptions
```ts
type TranscribeFromFileOptions = Omit;
```
Options for transcribing from a file via `transcribeFromFile`.
***
## TranscribeFromUrl
```ts
type TranscribeFromUrl = TranscribeBaseOptions & {
audio_url: string;
file?: never;
file_id?: never;
filename?: never;
};
```
Transcribe from a publicly accessible audio URL
**Type Declaration**
| Name | Type | Description |
| ----------- | -------- | ------------------------------------------------------------ |
| `audio_url` | `string` | URL of a publicly accessible audio file. **Max Length** 4096 |
| `file?` | `never` | - |
| `file_id?` | `never` | - |
| `filename?` | `never` | - |
***
## TranscribeFromUrlOptions
```ts
type TranscribeFromUrlOptions = Omit;
```
Options for transcribing from a URL via `transcribeFromUrl`.
***
## TranscribeOptions
```ts
type TranscribeOptions =
| TranscribeFromFile
| TranscribeFromFileId
| TranscribeFromUrl;
```
Options for the unified transcribe method
Exactly one audio source must be provided: `file`, `file_id`, or `audio_url`
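
The `never` fields on each variant enforce this exclusivity at compile time. When options arrive from untyped input, the same rule can be checked at runtime; `assertSingleSource` below is illustrative, not an SDK function:

```typescript
type AudioSourceOptions = {
  file?: unknown;
  file_id?: string;
  audio_url?: string;
};

// Count how many of the three mutually exclusive sources are present.
function countAudioSources(options: AudioSourceOptions): number {
  return [options.file, options.file_id, options.audio_url].filter(
    (source) => source !== undefined,
  ).length;
}

function assertSingleSource(options: AudioSourceOptions): void {
  if (countAudioSources(options) !== 1) {
    throw new Error("Provide exactly one of: file, file_id, audio_url");
  }
}
```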
***
## TranscriptResponse
```ts
type TranscriptResponse = {
id: string;
text: string;
tokens: TranscriptToken[];
};
```
Response from getting a transcription transcript.
**Properties**
| Property | Type | Description |
| -------- | --------------------------------------------- | ---------------------------------------------------------------------------------- |
| `id` | `string` | Unique identifier of the transcription this transcript belongs to. **Format** uuid |
| `text` | `string` | Complete transcribed text content. |
| `tokens` | [`TranscriptToken`](types#transcripttoken)\[] | List of detailed token information with timestamps and metadata. |
***
## TranscriptSegment
```ts
type TranscriptSegment = {
end_ms: number;
language?: string;
speaker?: string;
start_ms: number;
text: string;
tokens: TranscriptToken[];
};
```
A segment of contiguous tokens grouped by speaker and language
**Properties**
| Property | Type | Description |
| ----------- | --------------------------------------------- | ---------------------------------------------------------------- |
| `end_ms` | `number` | End time of the segment in milliseconds (from last token). |
| `language?` | `string` | Detected language code (if language identification was enabled). |
| `speaker?` | `string` | Speaker identifier (if speaker diarization was enabled). |
| `start_ms` | `number` | Start time of the segment in milliseconds (from first token). |
| `text` | `string` | Concatenated text of all tokens in this segment. |
| `tokens` | [`TranscriptToken`](types#transcripttoken)\[] | Original tokens in this segment. |
***
## TranscriptToken
```ts
type TranscriptToken = {
confidence: number;
end_ms: number;
is_audio_event?: boolean | null;
language?: string | null;
speaker?: string | null;
start_ms: number;
text: string;
translation_status?: "none" | "original" | "translation" | null;
};
```
A single token from the transcript with timing and confidence information.
**Properties**
| Property | Type | Description |
| --------------------- | ----------------------------------------------------- | ---------------------------------------------------------------- |
| `confidence` | `number` | Confidence score for this token (0.0 to 1.0). |
| `end_ms` | `number` | End time of the token in milliseconds. |
| `is_audio_event?` | `boolean` \| `null` | Whether this token represents an audio event. |
| `language?` | `string` \| `null` | Detected language code (if language identification was enabled). |
| `speaker?` | `string` \| `null` | Speaker identifier (if speaker diarization was enabled). |
| `start_ms` | `number` | Start time of the token in milliseconds. |
| `text` | `string` | The text content of this token. |
| `translation_status?` | `"none"` \| `"original"` \| `"translation"` \| `null` | Translation status for this token. |
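A common use of these fields is to reassemble only the original spoken text, skipping audio events and translated tokens. A minimal Python sketch (treating tokens as plain dicts with the fields above):

```python
def original_text(tokens, min_confidence=0.0):
    """Concatenate token text, skipping audio events and translations.

    Tokens below min_confidence are dropped as well.
    """
    parts = []
    for t in tokens:
        if t.get("is_audio_event"):
            continue  # e.g. non-speech markers
        if t.get("translation_status") == "translation":
            continue  # keep only original-language tokens
        if t["confidence"] < min_confidence:
            continue
        parts.append(t["text"])
    return "".join(parts)
```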
***
## TranscriptionContext
```ts
type TranscriptionContext = {
general?: ContextGeneralEntry[];
terms?: string[];
text?: string;
translation_terms?: ContextTranslationTerm[];
};
```
Additional context to improve transcription and translation accuracy.
All sections are optional: include only what's relevant for your use case.
**Properties**
| Property | Type | Description |
| -------------------- | ----------------------------------------------------------- | --------------------------------------------------------------------------------------------------- |
| `general?` | [`ContextGeneralEntry`](types#contextgeneralentry)\[] | Structured key-value pairs describing domain, topic, intent, participant names, etc. |
| `terms?` | `string`\[] | Domain-specific or uncommon words to recognize. |
| `text?` | `string` | Longer free-form background text, prior interaction history, reference documents, or meeting notes. |
| `translation_terms?` | [`ContextTranslationTerm`](types#contexttranslationterm)\[] | Custom translations for ambiguous terms. |
***
## TranscriptionIdentifier
```ts
type TranscriptionIdentifier =
| string
| {
id: string;
};
```
Transcription identifier: either a string ID or an object with an `id` property.
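Accepting both forms takes a one-step normalization; a minimal sketch of the same contract in Python:

```python
def transcription_id(identifier):
    """Normalize a TranscriptionIdentifier to a plain string ID."""
    if isinstance(identifier, str):
        return identifier
    return identifier["id"]
```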
***
## TranscriptionStatus
```ts
type TranscriptionStatus = "queued" | "processing" | "completed" | "error";
```
Status of a transcription request.
***
## TranslationConfig
```ts
type TranslationConfig =
| OneWayTranslationConfig
| TwoWayTranslationConfig;
```
Translation configuration.
***
## TwoWayTranslationConfig
```ts
type TwoWayTranslationConfig = {
language_a: string;
language_b: string;
type: "two_way";
};
```
Two-way translation configuration.
Translates between two specified languages.
**Properties**
| Property | Type | Description |
| ------------ | ----------- | --------------------- |
| `language_a` | `string` | First language code. |
| `language_b` | `string` | Second language code. |
| `type` | `"two_way"` | Translation type. |
***
## UploadFileInput
```ts
type UploadFileInput =
| Buffer
| Uint8Array
| Blob
| ReadableStream
| NodeJS.ReadableStream;
```
Supported input types for file upload.
***
## UploadFileOptions
```ts
type UploadFileOptions = {
client_reference_id?: string;
filename?: string;
signal?: AbortSignal;
timeout_ms?: number;
};
```
Options for uploading a file.
**Properties**
| Property | Type | Description |
| ---------------------- | ------------- | ---------------------------------------------------------------------------------- |
| `client_reference_id?` | `string` | Optional tracking identifier string. Does not need to be unique. **Max Length** 256 |
| `filename?` | `string` | Custom filename for the uploaded file. |
| `signal?` | `AbortSignal` | AbortSignal for cancelling the upload. |
| `timeout_ms?` | `number` | Request timeout in milliseconds. |
***
## WaitOptions
```ts
type WaitOptions = {
interval_ms?: number;
on_status_change?: (status, transcription) => void;
signal?: AbortSignal;
timeout_ms?: number;
};
```
Options for polling/waiting for transcription completion.
**Properties**
| Property | Type | Description |
| ------------------- | ------------------------------------- | ---------------------------------------------------------------------- |
| `interval_ms?` | `number` | Polling interval in milliseconds. **Default** `1000` **Minimum** 1000 |
| `on_status_change?` | (`status`, `transcription`) => `void` | Callback invoked when status changes. |
| `signal?` | `AbortSignal` | AbortSignal to cancel waiting. |
| `timeout_ms?` | `number` | Maximum time to wait in milliseconds. **Default** `300000 (5 minutes)` |
***
## WebhookAuthConfig
```ts
type WebhookAuthConfig = {
name: string;
value: string;
};
```
Authentication configuration for webhook verification.
**Properties**
| Property | Type | Description |
| -------- | -------- | -------------------------------------------------- |
| `name` | `string` | Expected header name (case-insensitive comparison). |
| `value` | `string` | Expected header value (exact match). |
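Verification amounts to a case-insensitive lookup of the header name and an exact comparison of the value. A Python sketch (the constant-time comparison is a precaution on our part, not necessarily what the SDK itself does):

```python
import hmac

def verify_webhook_auth(headers, name, value):
    """Check an incoming request against a WebhookAuthConfig.

    Header names are compared case-insensitively; the value must match exactly.
    """
    lowered = {k.lower(): v for k, v in headers.items()}
    received = lowered.get(name.lower(), "")
    # compare_digest avoids leaking the match position via timing.
    return hmac.compare_digest(received, value)
```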
***
## WebhookEvent
```ts
type WebhookEvent = {
id: string;
status: WebhookEventStatus;
};
```
Webhook event payload sent by Soniox when a transcription completes or fails.
**Properties**
| Property | Type | Description |
| -------- | ------------------------------------------------ | -------------------------------- |
| `id` | `string` | Transcription ID **Format** uuid |
| `status` | [`WebhookEventStatus`](types#webhookeventstatus) | Transcription result status |
***
## WebhookEventStatus
```ts
type WebhookEventStatus = "completed" | "error";
```
Webhook event status values.
***
## WebhookHandlerResult
```ts
type WebhookHandlerResult = {
error?: string;
event?: WebhookEvent;
ok: boolean;
status: number;
};
```
Result of webhook handling.
**Properties**
| Property | Type | Description |
| -------- | ------------------------------------ | ------------------------------------------------ |
| `error?` | `string` | Error message (only present when `ok` is `false`). |
| `event?` | [`WebhookEvent`](types#webhookevent) | Parsed webhook event (only present when `ok` is `true`). |
| `ok` | `boolean` | Whether the webhook was handled successfully. |
| `status` | `number` | HTTP status code to return. |
***
## WebhookHandlerResultWithFetch
```ts
type WebhookHandlerResultWithFetch = WebhookHandlerResult & {
fetchTranscript: (() => Promise<ISonioxTranscript | null>) | undefined;
fetchTranscription: (() => Promise<ISonioxTranscription | null>) | undefined;
};
```
Result of webhook handling with lazy fetch capabilities.
When using `client.webhooks.handleExpress()` (or other framework handlers),
the result includes helper methods to fetch the transcript or transcription.
**Type Declaration**
| Name | Type | Description |
| -------------------- | -------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `fetchTranscript` | \| () => `Promise`\<[`ISonioxTranscript`](types#isonioxtranscript) \| `null`> \| `undefined` | Fetch the transcript for a completed transcription. Only available when `ok=true` and `event.status='completed'`. **Example** `const result = soniox.webhooks.handleExpress(req); if (result.ok && result.event.status === 'completed') { const transcript = await result.fetchTranscript(); console.log(transcript?.text); }` |
| `fetchTranscription` | \| () => `Promise`\<[`ISonioxTranscription`](types#isonioxtranscription) \| `null`> \| `undefined` | Fetch the full transcription object. Useful for both completed (metadata) and error (error details) statuses. **Example** `const result = soniox.webhooks.handleExpress(req); if (result.ok && result.event.status === 'error') { const transcription = await result.fetchTranscription(); console.log(transcription?.error_message); }` |
***
## WebhookHeaders
```ts
type WebhookHeaders =
| Headers
| Record<string, string | string[] | undefined>
| {
get: (name: string) => string | null;
};
```
Headers object type. Supports standard `Headers` objects, plain records, and anything exposing a `get()` method.
***
## HttpClient
Pluggable HTTP client interface.
**Methods**
**request()**
```ts
request<T>(request: HttpRequest): Promise<HttpResponse<T>>;
```
Perform an HTTP request.
**Type Parameters**
| Type Parameter |
| -------------- |
| `T` |
**Parameters**
| Parameter | Type | Description |
| --------- | ---------------------------------- | --------------------- |
| `request` | [`HttpRequest`](types#httprequest) | Request configuration |
**Returns**
`Promise`\<[`HttpResponse`](types#httpresponset)\<`T`>>
Promise resolving to the response
**Throws**
[SonioxHttpError](classes#sonioxhttperror) On network errors, timeouts, HTTP errors, or parse errors
***
## HttpErrorDetails
Error details for `SonioxHttpError`.
**Properties**
| Property | Type | Description |
| ------------- | -------------------------------------- | ---------------------------------- |
| `bodyText?` | `string` | Response body text (capped at 4KB) |
| `cause?` | `unknown` | - |
| `code` | [`HttpErrorCode`](types#httperrorcode) | - |
| `headers?` | `Record`\<`string`, `string`> | - |
| `message` | `string` | - |
| `method` | [`HttpMethod`](types#httpmethod) | - |
| `statusCode?` | `number` | - |
| `url` | `string` | - |
***
## HttpRequest
HTTP request configuration.
**Properties**
| Property | Type | Description |
| --------------- | -------------------------------------------- | ------------------------------------------------------------------------------------------------------ |
| `body?` | [`HttpRequestBody`](types#httprequestbody) | Request body. |
| `headers?` | `Record`\<`string`, `string`> | Request headers. |
| `method` | [`HttpMethod`](types#httpmethod) | HTTP method. |
| `path` | `string` | URL path (relative to baseUrl) or absolute URL. |
| `query?` | [`QueryParams`](types#queryparams) | Query parameters (will be URL-encoded). |
| `responseType?` | [`HttpResponseType`](types#httpresponsetype) | Expected response type. **Default** `'json'` |
| `signal?` | `AbortSignal` | Optional AbortSignal for request cancellation. If provided along with `timeoutMs`, both are respected. |
| `timeoutMs?` | `number` | Request timeout in milliseconds. If not specified, the client's default timeout is used. |
***
## HttpResponse\<T>
HTTP response from the client.
**Type Parameters**
| Type Parameter |
| -------------- |
| `T` |
**Properties**
| Property | Type | Description |
| --------- | ----------------------------- | ----------------------------------------------- |
| `data` | `T` | Parsed response data |
| `headers` | `Record`\<`string`, `string`> | Response headers (normalized to lowercase keys) |
| `status` | `number` | HTTP status code |
***
## ISonioxTranscript
Type contract for SonioxTranscript class.
**See**
SonioxTranscript for full documentation.
**Methods**
**segments()**
```ts
segments(options?): TranscriptSegment[];
```
**Parameters**
| Parameter | Type |
| ---------- | ------------------------------------------------------------ |
| `options?` | [`SegmentTranscriptOptions`](types#segmenttranscriptoptions) |
**Returns**
[`TranscriptSegment`](types#transcriptsegment)\[]
**Properties**
| Property | Type |
| -------- | --------------------------------------------- |
| `id` | `string` |
| `text` | `string` |
| `tokens` | [`TranscriptToken`](types#transcripttoken)\[] |
***
## ISonioxTranscription
Type contract for SonioxTranscription class.
**See**
SonioxTranscription for full documentation.
**Methods**
**delete()**
```ts
delete(): Promise<void>;
```
**Returns**
`Promise`\<`void`>
***
**destroy()**
```ts
destroy(): Promise<void>;
```
**Returns**
`Promise`\<`void`>
***
**getTranscript()**
```ts
getTranscript(options?): Promise<ISonioxTranscript | null>;
```
**Parameters**
| Parameter | Type |
| ----------------- | --------------------------------------------------- |
| `options?` | \{ `force?`: `boolean`; `signal?`: `AbortSignal`; } |
| `options.force?` | `boolean` |
| `options.signal?` | `AbortSignal` |
**Returns**
`Promise`\<[`ISonioxTranscript`](types#isonioxtranscript) | `null`>
***
**refresh()**
```ts
refresh(signal?): Promise<ISonioxTranscription>;
```
**Parameters**
| Parameter | Type |
| --------- | ------------- |
| `signal?` | `AbortSignal` |
**Returns**
`Promise`\<`ISonioxTranscription`>
***
**toJSON()**
```ts
toJSON(): SonioxTranscriptionData;
```
**Returns**
[`SonioxTranscriptionData`](types#sonioxtranscriptiondata)
***
**wait()**
```ts
wait(options?): Promise<ISonioxTranscription>;
```
**Parameters**
| Parameter | Type |
| ---------- | ---------------------------------- |
| `options?` | [`WaitOptions`](types#waitoptions) |
**Returns**
`Promise`\<`ISonioxTranscription`>
**Properties**
| Property | Type |
| -------------------------------- | -------------------------------------------------------------------------------- |
| `audio_duration_ms` | `number` \| `null` \| `undefined` |
| `audio_url` | `string` \| `null` \| `undefined` |
| `client_reference_id` | `string` \| `null` \| `undefined` |
| `context` | \| [`TranscriptionContext`](types#transcriptioncontext) \| `null` \| `undefined` |
| `created_at` | `string` |
| `enable_language_identification` | `boolean` |
| `enable_speaker_diarization` | `boolean` |
| `error_message` | `string` \| `null` \| `undefined` |
| `error_type` | `string` \| `null` \| `undefined` |
| `file_id` | `string` \| `null` \| `undefined` |
| `filename` | `string` |
| `id` | `string` |
| `language_hints` | `string`\[] \| `undefined` |
| `model` | `string` |
| `status` | [`TranscriptionStatus`](types#transcriptionstatus) |
| `transcript` | [`ISonioxTranscript`](types#isonioxtranscript) \| `null` \| `undefined` |
| `webhook_auth_header_name` | `string` \| `null` \| `undefined` |
| `webhook_auth_header_value` | `string` \| `null` \| `undefined` |
| `webhook_status_code` | `number` \| `null` \| `undefined` |
| `webhook_url` | `string` \| `null` \| `undefined` |
# Async Client
URL: /stt/SDKs/python-SDK/Full-SDK-reference/async_client
Soniox Python SDK - Async Client Reference
***
## AsyncSonioxClient
Asynchronous Soniox REST client exposing HTTP and realtime helpers.
### Constructor
```python
AsyncSonioxClient(api_key: str | None = None, api_base_url: str | None = None, websocket_base_url: str | None = None, timeout_sec: float | None = None, webhook_secret: str | None = None, webhook_signature_header: str | None = None, **client_kwargs: Any)
```
**Parameters**
| Parameter | Type | Description |
| -------------------------- | --------------- | ------------------------------------------------ |
| `api_key` | `str \| None` | API key used for authentication. |
| `api_base_url` | `str \| None` | Base URL for Soniox REST API requests. |
| `websocket_base_url` | `str \| None` | Base URL for Soniox realtime WebSocket endpoint. |
| `timeout_sec` | `float \| None` | Default timeout in seconds for HTTP requests. |
| `webhook_secret` | `str \| None` | Webhook secret used for signature verification. |
| `webhook_signature_header` | `str \| None` | Webhook signature header name. |
| `client_kwargs` | `Any` | Additional HTTP client keyword arguments. |
**Returns**
`None`
### Properties
| Property | Type | Description |
| ---------- | ------------------------ | ----------------------------------------------------------- |
| `files` | `AsyncFilesAPI` | Files API namespace. |
| `stt` | `AsyncSttAPI` | Speech-to-text API namespace. |
| `models` | `AsyncModelsAPI` | Models API namespace. |
| `auth` | `AsyncAuthAPI` | Authentication API namespace. |
| `webhooks` | `AsyncSonioxWebhooksAPI` | Webhook utilities API namespace. |
| `realtime` | `AsyncRealtimeAPI` | Entrypoint for async realtime helpers on AsyncSonioxClient. |
### request()
```python
request(method: str, path: str, *, params: Mapping[str, Any] | None = None, json: Any | None = None, data: Mapping[str, Any] | None = None, files: Mapping[str, Any] | None = None) -> httpx.Response
```
Perform a request against the configured Soniox REST endpoint.
**Parameters**
| Parameter | Type | Description |
| --------- | --------------------------- | ----------------------------------- |
| `method` | `str` | HTTP method to use for the request. |
| `path` | `str` | Relative API path for the request. |
| `params` | `Mapping[str, Any] \| None` | Query parameters for the request. |
| `json` | `Any \| None` | JSON request payload. |
| `data` | `Mapping[str, Any] \| None` | Form-encoded request payload. |
| `files` | `Mapping[str, Any] \| None` | Multipart file payload mapping. |
**Returns**
`httpx.Response`
***
### aclose()
```python
aclose() -> None
```
Close any outstanding async HTTP connections.
**Returns**
`None`
***
## AsyncFilesAPI
### Constructor
```python
AsyncFilesAPI(client: AsyncSonioxClient)
```
**Parameters**
| Parameter | Type | Description |
| --------- | ------------------- | ----------------------- |
| `client` | `AsyncSonioxClient` | Soniox client instance. |
**Returns**
`None`
### list()
```python
list(limit: int = 100, cursor: str | None = None) -> GetFilesResponse
```
List uploaded files.
Performs a GET request to `/files` with optional pagination.
**Parameters**
| Parameter | Type | Description |
| --------- | ------------- | ----------------------------------------------- |
| `limit` | `int` | Maximum number of files to return. |
| `cursor` | `str \| None` | Pagination cursor for the next page of results. |
**Returns**
`GetFilesResponse`
**Raises**
* `SonioxAPIError` When the API returns an error.
***
### list\_all()
```python
list_all(limit: int = 100) -> AsyncGenerator[File, None]
```
Iterate through all uploaded files across all pages.
**Parameters**
| Parameter | Type | Description |
| --------- | ----- | ---------------------------------- |
| `limit` | `int` | Maximum number of files to fetch per page. |
**Yields**
`AsyncGenerator[File, None]`
File: The next file object from the API.
**Raises**
* `SonioxAPIError` When the API returns an error.
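Under the hood this is ordinary cursor pagination: fetch a page, yield its items, and repeat with the returned cursor until none remains. A sketch of that loop (the `files` and `next_page_cursor` field names here are assumptions for illustration, not taken from the response schema):

```python
def iter_all(fetch_page, limit=100):
    """Drive cursor pagination until the API stops returning a cursor.

    fetch_page(limit, cursor) stands in for a call like client.files.list().
    """
    cursor = None
    while True:
        page = fetch_page(limit, cursor)
        yield from page["files"]
        cursor = page.get("next_page_cursor")
        if not cursor:
            return
```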
***
### get()
```python
get(file_id: str) -> File
```
Retrieve a file by ID.
Performs a GET request to `/files/{file_id}`.
**Parameters**
| Parameter | Type | Description |
| --------- | ----- | --------------------------------- |
| `file_id` | `str` | ID of a previously uploaded file. |
**Returns**
`File`
**Raises**
* `SonioxAPIError` When the API returns an error.
***
### get\_or\_none()
```python
get_or_none(file_id: str) -> File | None
```
Retrieve a file by ID.
Returns `None` if the file does not exist.
**Parameters**
| Parameter | Type | Description |
| --------- | ----- | --------------------------------- |
| `file_id` | `str` | ID of a previously uploaded file. |
**Returns**
`File | None`
**Raises**
* `SonioxAPIError` When the API returns an error.
***
### delete()
```python
delete(file_id: str) -> None
```
Delete a file by ID.
Performs a DELETE request to `/files/{file_id}`.
**Parameters**
| Parameter | Type | Description |
| --------- | ----- | --------------------------------- |
| `file_id` | `str` | ID of a previously uploaded file. |
**Returns**
`None`
**Raises**
* `SonioxAPIError` When the API returns an error.
***
### delete\_if\_exists()
```python
delete_if_exists(file_id: str) -> None
```
Delete a file by ID if it exists.
Ignores missing files.
**Parameters**
| Parameter | Type | Description |
| --------- | ----- | --------------------------------- |
| `file_id` | `str` | ID of a previously uploaded file. |
**Returns**
`None`
**Raises**
* `SonioxAPIError` When the API returns an error.
***
### upload()
```python
upload(file: BinaryIO | bytes | Path | str, *, filename: str | None = None, client_reference_id: str | None = None) -> File
```
Upload a file.
Performs a multipart POST request to `/files`.
**Parameters**
| Parameter | Type | Description |
| --------------------- | ---------------------------------- | --------------------------------------------------------------- |
| `file` | `BinaryIO \| bytes \| Path \| str` | File input to upload or transcribe. |
| `filename` | `str \| None` | Filename associated with uploaded file data. |
| `client_reference_id` | `str \| None` | Optional tracking identifier string. Does not need to be unique. |
**Returns**
`File`
**Raises**
* `SonioxAPIError` When the API returns an error.
***
### delete\_all()
```python
delete_all(limit: int = 100) -> None
```
Delete all files.
Iterates through all pages and deletes each file. Stops and raises on the first failed deletion.
**Parameters**
| Parameter | Type | Description |
| --------- | ----- | ---------------------------------- |
| `limit` | `int` | Maximum number of files to fetch per page while iterating. |
**Returns**
`None`
**Raises**
* `SonioxAPIError` When the API returns an error.
***
## AsyncSttAPI
### Constructor
```python
AsyncSttAPI(client: AsyncSonioxClient)
```
**Parameters**
| Parameter | Type | Description |
| --------- | ------------------- | ----------------------- |
| `client` | `AsyncSonioxClient` | Soniox client instance. |
**Returns**
`None`
### list()
```python
list(limit: int = 100, cursor: str | None = None) -> GetTranscriptionsResponse
```
List transcriptions.
Performs a GET request to `/transcriptions` with optional pagination.
**Parameters**
| Parameter | Type | Description |
| --------- | ------------- | ----------------------------------------------- |
| `limit` | `int` | Maximum number of transcriptions to return. |
| `cursor` | `str \| None` | Pagination cursor for the next page of results. |
**Returns**
`GetTranscriptionsResponse`
**Raises**
* `SonioxAPIError` When the API returns an error.
***
### list\_all()
```python
list_all(limit: int = 100) -> AsyncGenerator[Transcription, None]
```
Iterate through all transcriptions across all pages.
**Parameters**
| Parameter | Type | Description |
| --------- | ----- | ------------------------------------------- |
| `limit` | `int` | Maximum number of transcriptions to fetch per page. |
**Yields**
`AsyncGenerator[Transcription, None]`
Transcription: The next transcription object from the API.
**Raises**
* `SonioxAPIError` When the API returns an error.
***
### delete\_all()
```python
delete_all(limit: int = 100) -> None
```
Delete all transcriptions.
Iterates through all pages and deletes each transcription. Stops and raises on the first failed deletion.
**Parameters**
| Parameter | Type | Description |
| --------- | ----- | ------------------------------------------- |
| `limit` | `int` | Maximum number of transcriptions to fetch per page while iterating. |
**Returns**
`None`
**Raises**
* `SonioxAPIError` When the API returns an error.
***
### create()
```python
create(*, model: str = DEFAULT_MODEL, file_id: str | None = None, audio_url: str | None = None, client_reference_id: str | None = None, config: CreateTranscriptionConfig | None = None) -> Transcription
```
Create a transcription.
Performs a POST request to `/transcriptions`.
**Parameters**
| Parameter | Type | Description |
| --------------------- | ----------------------------------- | ----------------------------------------- |
| `model` | `str` | Speech-to-text model to use. |
| `file_id` | `str \| None` | ID of a previously uploaded file. |
| `audio_url` | `str \| None` | Publicly accessible audio URL. |
| `client_reference_id` | `str \| None` | Optional tracking identifier. |
| `config` | `CreateTranscriptionConfig \| None` | Configuration options for this operation. |
**Returns**
`Transcription`
**Raises**
* `SonioxAPIError` When the API returns an error.
***
### get()
```python
get(transcription_id: str) -> Transcription
```
Retrieve a transcription by ID.
Performs a GET request to `/transcriptions/{transcription_id}`.
**Parameters**
| Parameter | Type | Description |
| ------------------ | ----- | ------------------------- |
| `transcription_id` | `str` | Transcription identifier. |
**Returns**
`Transcription`
**Raises**
* `SonioxAPIError` When the API returns an error.
***
### get\_or\_none()
```python
get_or_none(transcription_id: str) -> Transcription | None
```
Retrieve a transcription by ID.
Returns `None` if the transcription does not exist.
**Parameters**
| Parameter | Type | Description |
| ------------------ | ----- | ------------------------- |
| `transcription_id` | `str` | Transcription identifier. |
**Returns**
`Transcription | None`
**Raises**
* `SonioxAPIError` When the API returns an error.
***
### delete()
```python
delete(transcription_id: str) -> None
```
Delete a transcription by ID.
Performs a DELETE request to `/transcriptions/{transcription_id}`.
**Parameters**
| Parameter | Type | Description |
| ------------------ | ----- | ------------------------- |
| `transcription_id` | `str` | Transcription identifier. |
**Returns**
`None`
**Raises**
* `SonioxAPIError` When the API returns an error.
***
### delete\_if\_exists()
```python
delete_if_exists(transcription_id: str) -> None
```
Delete a transcription by ID if it exists.
Ignores missing transcriptions.
**Parameters**
| Parameter | Type | Description |
| ------------------ | ----- | ------------------------- |
| `transcription_id` | `str` | Transcription identifier. |
**Returns**
`None`
**Raises**
* `SonioxAPIError` When the API returns an error.
***
### destroy()
```python
destroy(transcription_id: str) -> None
```
Delete a transcription and its associated uploaded file.
**Parameters**
| Parameter | Type | Description |
| ------------------ | ----- | ------------------------- |
| `transcription_id` | `str` | Transcription identifier. |
**Returns**
`None`
**Raises**
* `SonioxAPIError` When the API returns an error.
***
### destroy\_all()
```python
destroy_all(limit: int = 100) -> None
```
Delete all transcriptions and their associated files. Stops and raises on the first failed deletion.
**Parameters**
| Parameter | Type | Description |
| --------- | ----- | ------------------------------------------- |
| `limit` | `int` | Maximum number of transcriptions to fetch per page while iterating. |
**Returns**
`None`
**Raises**
* `SonioxAPIError` When the API returns an error during listing.
***
### get\_transcript()
```python
get_transcript(transcription_id: str) -> TranscriptionTranscript
```
Retrieve the transcript for a transcription.
Performs a GET request to `/transcriptions/{transcription_id}/transcript`.
**Parameters**
| Parameter | Type | Description |
| ------------------ | ----- | ------------------------- |
| `transcription_id` | `str` | Transcription identifier. |
**Returns**
`TranscriptionTranscript`
**Raises**
* `SonioxAPIError` When the API returns an error.
***
### wait()
```python
wait(transcription_id: str, *, interval_sec: float = 5.0, timeout_sec: float | None = None) -> Transcription
```
Poll a transcription until it leaves the queued or processing state.
**Parameters**
| Parameter | Type | Description |
| ------------------ | --------------- | ----------------------------- |
| `transcription_id` | `str` | Transcription identifier. |
| `interval_sec` | `float` | Polling interval in seconds. |
| `timeout_sec` | `float \| None` | Maximum wait time in seconds. |
**Returns**
`Transcription`
**Raises**
* `SonioxAPIError` When the API returns an error.
* `TimeoutError` Waiting for the transcription to finish exceeded `timeout_sec`.
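The polling loop behind `wait()` can be sketched without any network access; `get_status` below stands in for fetching the transcription and reading its `status`:

```python
import time

def wait_until_done(get_status, interval_sec=5.0, timeout_sec=None):
    """Poll until the status leaves 'queued'/'processing' or time runs out."""
    deadline = None if timeout_sec is None else time.monotonic() + timeout_sec
    while True:
        status = get_status()
        if status not in ("queued", "processing"):
            return status  # 'completed' or 'error'
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError("transcription did not finish within timeout_sec")
        time.sleep(interval_sec)
```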
***
### transcribe\_from\_url()
```python
transcribe_from_url(*, model: str = DEFAULT_MODEL, audio_url: str, client_reference_id: str | None = None, config: CreateTranscriptionConfig | None = None) -> Transcription
```
Create a transcription from an audio URL.
**Parameters**
| Parameter | Type | Description |
| --------------------- | ----------------------------------- | ----------------------------------------- |
| `model` | `str` | Speech-to-text model to use. |
| `audio_url` | `str` | Publicly accessible audio URL. |
| `client_reference_id` | `str \| None` | Optional tracking identifier. |
| `config` | `CreateTranscriptionConfig \| None` | Configuration options for this operation. |
**Returns**
`Transcription`
**Raises**
* `SonioxAPIError` When the API returns an error.
***
### transcribe\_from\_file\_id()
```python
transcribe_from_file_id(*, model: str = DEFAULT_MODEL, file_id: str, client_reference_id: str | None = None, config: CreateTranscriptionConfig | None = None) -> Transcription
```
Create a transcription from an existing uploaded file.
**Parameters**
| Parameter | Type | Description |
| --------------------- | ----------------------------------- | ----------------------------------------- |
| `model` | `str` | Speech-to-text model to use. |
| `file_id` | `str` | ID of a previously uploaded file. |
| `client_reference_id` | `str \| None` | Optional tracking identifier. |
| `config` | `CreateTranscriptionConfig \| None` | Configuration options for this operation. |
**Returns**
`Transcription`
**Raises**
* `SonioxAPIError` When the API returns an error.
***
### transcribe\_from\_file()
```python
transcribe_from_file(*, model: str = DEFAULT_MODEL, file: BinaryIO | bytes | Path | str, filename: str | None = None, client_reference_id: str | None = None, config: CreateTranscriptionConfig | None = None) -> Transcription
```
Upload a file and create a transcription from it.
**Parameters**
| Parameter | Type | Description |
| --------------------- | ----------------------------------- | -------------------------------------------- |
| `model` | `str` | Speech-to-text model to use. |
| `file` | `BinaryIO \| bytes \| Path \| str` | File input to upload or transcribe. |
| `filename` | `str \| None` | Filename associated with uploaded file data. |
| `client_reference_id` | `str \| None` | Optional tracking identifier. |
| `config` | `CreateTranscriptionConfig \| None` | Configuration options for this operation. |
**Returns**
`Transcription`
**Raises**
* `SonioxAPIError` When the API returns an error.
***
### transcribe()
```python
transcribe(*, model: str = DEFAULT_MODEL, audio_url: str | None = None, file_id: str | None = None, file: BinaryIO | bytes | Path | str | None = None, filename: str | None = None, client_reference_id: str | None = None, config: CreateTranscriptionConfig | None = None) -> Transcription
```
Create a transcription from a file, file ID, or audio URL.
Validates mutually exclusive inputs before submission.
**Parameters**
| Parameter | Type | Description |
| --------------------- | ------------------------------------------ | -------------------------------------------- |
| `model` | `str` | Speech-to-text model to use. |
| `audio_url` | `str \| None` | Publicly accessible audio URL. |
| `file_id` | `str \| None` | ID of a previously uploaded file. |
| `file` | `BinaryIO \| bytes \| Path \| str \| None` | File input to upload or transcribe. |
| `filename` | `str \| None` | Filename associated with uploaded file data. |
| `client_reference_id` | `str \| None` | Optional tracking identifier. |
| `config` | `CreateTranscriptionConfig \| None` | Configuration options for this operation. |
**Returns**
`Transcription`
**Raises**
* `SonioxAPIError` When the API returns an error.
* `SonioxValidationError` When the payload fails validation.
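The mutual-exclusivity check can be expressed in a few lines. A sketch (the SDK raises `SonioxValidationError`; plain `ValueError` stands in here):

```python
def validate_audio_source(audio_url=None, file_id=None, file=None):
    """Exactly one of the three audio sources must be provided."""
    provided = sum(x is not None for x in (audio_url, file_id, file))
    if provided != 1:
        raise ValueError("provide exactly one of: audio_url, file_id, file")
```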
***
### transcribe\_file\_with\_webhook()
```python
transcribe_file_with_webhook(*, model: str = DEFAULT_MODEL, file: BinaryIO | bytes | Path | str, webhook_url: str, filename: str | None = None, client_reference_id: str | None = None, webhook_auth: WebhookAuthConfig | None = None, config: CreateTranscriptionConfig | None = None) -> Transcription
```
Upload a file, configure a webhook, and start transcription.
**Parameters**
| Parameter | Type | Description |
| --------------------- | ----------------------------------- | -------------------------------------------- |
| `model` | `str` | Speech-to-text model to use. |
| `file` | `BinaryIO \| bytes \| Path \| str` | File input to upload or transcribe. |
| `webhook_url` | `str` | URL to receive webhook notifications. |
| `filename` | `str \| None` | Filename associated with uploaded file data. |
| `client_reference_id` | `str \| None` | Optional tracking identifier. |
| `webhook_auth` | `WebhookAuthConfig \| None` | Webhook authentication configuration. |
| `config` | `CreateTranscriptionConfig \| None` | Configuration options for this operation. |
**Returns**
`Transcription`
**Raises**
* `SonioxAPIError` When the API returns an error.
***
### transcribe\_and\_wait()
```python
transcribe_and_wait(*, model: str = DEFAULT_MODEL, audio_url: str | None = None, file_id: str | None = None, file: BinaryIO | bytes | Path | str | None = None, filename: str | None = None, client_reference_id: str | None = None, delete_after: bool = False, wait_interval_sec: float = 5.0, wait_timeout_sec: float | None = None, config: CreateTranscriptionConfig | None = None) -> Transcription
```
Create a transcription and wait for completion.
Returns a Transcription object after it is completed. Optionally deletes
the transcription and the uploaded file after completion.
**Parameters**
| Parameter | Type | Description |
| --------------------- | ------------------------------------------ | ----------------------------------------------------- |
| `model` | `str` | Speech-to-text model to use. |
| `audio_url` | `str \| None` | Publicly accessible audio URL. |
| `file_id` | `str \| None` | ID of a previously uploaded file. |
| `file` | `BinaryIO \| bytes \| Path \| str \| None` | File input to upload or transcribe. |
| `filename` | `str \| None` | Filename associated with uploaded file data. |
| `client_reference_id` | `str \| None` | Optional tracking identifier. |
| `delete_after` | `bool` | Whether to delete created resources after completion. |
| `wait_interval_sec` | `float` | Polling interval in seconds while waiting. |
| `wait_timeout_sec` | `float \| None` | Maximum wait time in seconds while polling. |
| `config` | `CreateTranscriptionConfig \| None` | Configuration options for this operation. |
**Returns**
`Transcription`
**Raises**
* `SonioxAPIError` When the API returns an error.
* `SonioxValidationError` When the payload fails validation.
* `TimeoutError` Waiting for the transcription to finish exceeded `wait_timeout_sec`.
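The polling and timeout behavior described above can be sketched as a plain loop. This is an illustrative, self-contained sketch: `fetch_status` and the status strings are hypothetical stand-ins, not part of the SDK.

```python
import time

def wait_for_completion(fetch_status, *, wait_interval_sec=5.0, wait_timeout_sec=None):
    # fetch_status is a hypothetical callable standing in for a GET on the
    # transcription resource; assumed to return "queued", "processing",
    # "completed", or "error".
    deadline = None if wait_timeout_sec is None else time.monotonic() + wait_timeout_sec
    while True:
        status = fetch_status()
        if status in ("completed", "error"):
            return status
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError("transcription did not finish within wait_timeout_sec")
        time.sleep(wait_interval_sec)
```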
***
### transcribe\_and\_wait\_with\_tokens()
```python
transcribe_and_wait_with_tokens(*, model: str = DEFAULT_MODEL, audio_url: str | None = None, file_id: str | None = None, file: BinaryIO | bytes | Path | str | None = None, filename: str | None = None, client_reference_id: str | None = None, delete_after: bool = False, wait_interval_sec: float = 5.0, wait_timeout_sec: float | None = None, config: CreateTranscriptionConfig | None = None) -> TranscriptionTranscript
```
Create a transcription, wait for completion, and return the transcript.
Optionally deletes the transcription and uploaded file after completion.
**Parameters**
| Parameter | Type | Description |
| --------------------- | ------------------------------------------ | ----------------------------------------------------- |
| `model` | `str` | Speech-to-text model to use. |
| `audio_url` | `str \| None` | Publicly accessible audio URL. |
| `file_id` | `str \| None` | ID of a previously uploaded file. |
| `file` | `BinaryIO \| bytes \| Path \| str \| None` | File input to upload or transcribe. |
| `filename` | `str \| None` | Filename associated with uploaded file data. |
| `client_reference_id` | `str \| None` | Optional tracking identifier. |
| `delete_after` | `bool` | Whether to delete created resources after completion. |
| `wait_interval_sec` | `float` | Polling interval in seconds while waiting. |
| `wait_timeout_sec` | `float \| None` | Maximum wait time in seconds while polling. |
| `config` | `CreateTranscriptionConfig \| None` | Configuration options for this operation. |
**Returns**
`TranscriptionTranscript`
**Raises**
* `SonioxAPIError` When the API returns an error.
* `SonioxValidationError` When the payload fails validation.
* `TimeoutError` Waiting for the transcription to finish exceeded `wait_timeout_sec`.
***
## AsyncModelsAPI
### Constructor
```python
AsyncModelsAPI(client: AsyncSonioxClient)
```
**Parameters**
| Parameter | Type | Description |
| --------- | ------------------- | ----------------------- |
| `client` | `AsyncSonioxClient` | Soniox client instance. |
**Returns**
`None`
### list()
```python
list() -> GetModelsResponse
```
List available models.
Performs a GET request to `/models`.
**Returns**
`GetModelsResponse`
**Raises**
* `SonioxAPIError` When the API returns an error.
***
## AsyncAuthAPI
### Constructor
```python
AsyncAuthAPI(client: AsyncSonioxClient)
```
**Parameters**
| Parameter | Type | Description |
| --------- | ------------------- | ----------------------- |
| `client` | `AsyncSonioxClient` | Soniox client instance. |
**Returns**
`None`
### create\_temporary\_api\_key()
```python
create_temporary_api_key(*, usage_type: TemporaryApiKeyUsageType = 'transcribe_websocket', expires_in_seconds: int = 5 * 60, client_reference_id: str | None = None) -> CreateTemporaryApiKeyResponse
```
Create a temporary API key.
Performs a POST request to `/auth/temporary-api-key`.
**Parameters**
| Parameter | Type | Description |
| --------------------- | -------------------------- | --------------------------------------------------------------- |
| `usage_type` | `TemporaryApiKeyUsageType` | Intended usage of the temporary API key. |
| `expires_in_seconds`  | `int`                      | Duration in seconds until the temporary API key expires.         |
| `client_reference_id` | `str \| None`              | Optional tracking identifier string. Does not need to be unique. |
**Returns**
`CreateTemporaryApiKeyResponse`
**Raises**
* `SonioxAPIError` When the API returns an error.
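Temporary keys expire after `expires_in_seconds`, so a browser or mobile client typically requests a fresh key shortly before the current one lapses. A minimal sketch of that check (the `margin_sec` safety margin is an illustrative convention, not an SDK parameter):

```python
from datetime import datetime, timedelta, timezone

def needs_refresh(expires_at, *, margin_sec=30, now=None):
    # expires_at mirrors CreateTemporaryApiKeyResponse.expires_at (a UTC
    # timestamp); returns True once the key is within margin_sec of expiring.
    now = now or datetime.now(timezone.utc)
    return now >= expires_at - timedelta(seconds=margin_sec)
```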
***
## AsyncSonioxWebhooksAPI
# Realtime Client
URL: /stt/SDKs/python-SDK/Full-SDK-reference/realtime_client
Soniox Python SDK - Realtime Client Reference
***
## RealtimeAPI
Entrypoint for realtime helpers on SonioxClient.
### Constructor
```python
RealtimeAPI(client: SonioxClient)
```
**Parameters**
| Parameter | Type | Description |
| --------- | -------------- | ----------------------- |
| `client` | `SonioxClient` | Soniox client instance. |
**Returns**
`None`
### Properties
| Property | Type | Description |
| -------- | ------------------- | ----------------------------- |
| `stt` | `RealtimeSTTClient` | Speech-to-text API namespace. |
***
## AsyncRealtimeAPI
Entrypoint for async realtime helpers on AsyncSonioxClient.
### Constructor
```python
AsyncRealtimeAPI(client: AsyncSonioxClient)
```
**Parameters**
| Parameter | Type | Description |
| --------- | ------------------- | ----------------------- |
| `client` | `AsyncSonioxClient` | Soniox client instance. |
**Returns**
`None`
### Properties
| Property | Type | Description |
| -------- | ------------------------ | ----------------------------- |
| `stt` | `AsyncRealtimeSTTClient` | Speech-to-text API namespace. |
***
## RealtimeSTTClient
Factory for creating synchronous realtime speech-to-text sessions.
This class validates credentials and prepares session configuration,
but does not itself manage WebSocket connections.
### Constructor
```python
RealtimeSTTClient(client: SonioxClient)
```
Create a realtime STT client bound to an existing API client.
**Parameters**
| Parameter | Type | Description |
| --------- | -------------- | ------------------------------------------------------------- |
| `client` | `SonioxClient` | Parent Soniox client providing configuration and credentials. |
**Returns**
`None`
### connect()
```python
connect(*, config: RealtimeSTTConfig, api_key: str | None = None) -> RealtimeSTTSession
```
Create a new realtime STT session.
The returned session is not connected until entered as a
context manager.
**Parameters**
| Parameter | Type | Description |
| --------- | ------------------- | --------------------------------------------------------------------------------- |
| `config` | `RealtimeSTTConfig` | Realtime transcription configuration. |
| `api_key` | `str \| None` | Optional API key override. If not provided, the client's default API key is used. |
**Returns**
`RealtimeSTTSession`
A new RealtimeSTTSession instance.
**Raises**
* `SonioxValidationError` If no API key is available.
***
## RealtimeSTTSession
Synchronous WebSocket session for a single real-time speech-to-text stream.
This class manages the full lifecycle of a real-time transcription session:
connecting to the WebSocket endpoint, streaming audio data, receiving events,
and gracefully closing the stream. A session is stateful and represents
exactly one streaming interaction with the Soniox realtime API.
Instances are designed to be used as context managers.
### Constructor
```python
RealtimeSTTSession(url: str, config: RealtimeSTTConfig)
```
Create a new realtime STT session.
This does not open a network connection. The WebSocket connection
is established when entering the context manager.
**Parameters**
| Parameter | Type | Description |
| --------- | ------------------- | -------------------------------------------------------------------------------------- |
| `url` | `str` | WebSocket URL for the realtime transcription endpoint. |
| `config` | `RealtimeSTTConfig` | Configuration describing the audio format and transcription behavior for this session. |
**Returns**
`None`
### Properties
| Property | Type | Description |
| -------------- | ----------------------- | --------------------------------------------------------- |
| `config` | `RealtimeSTTConfig` | Return the configuration used to initialize this session. |
| `paused` | `bool` | Return True if the session is currently paused. |
| `last_message` | `RealtimeEvent \| None` | Return the most recently received realtime event, if any. |
### close()
```python
close() -> None
```
Gracefully close the realtime session.
Sends a final empty message to signal end-of-stream, then closes
the WebSocket connection. Calling this method multiple times is safe.
**Returns**
`None`
***
### send\_byte\_chunk()
```python
send_byte_chunk(chunk: bytes) -> None
```
Send a single chunk of raw audio bytes to the realtime stream.
The audio data must match the format declared in the session
configuration (sample rate, channels, encoding).
**Parameters**
| Parameter | Type | Description |
| --------- | ------- | ------------------------ |
| `chunk` | `bytes` | Raw audio bytes to send. |
**Returns**
`None`
**Raises**
* `SonioxRealtimeError` If the session is not connected or the send operation fails.
***
### send\_bytes()
```python
send_bytes(chunks: bytes | Iterator[bytes], *, finish: bool = True) -> None
```
Send audio data to the realtime stream.
This method accepts either a single bytes object or an iterator
yielding audio chunks. When an iterator is provided, a FINISH
control message is sent automatically after all chunks have
been transmitted.
**Parameters**
| Parameter | Type | Description |
| --------- | -------------------------- | ---------------------------------------------------------- |
| `chunks` | `bytes \| Iterator[bytes]` | Audio data as raw bytes or an iterator of byte chunks. |
| `finish` | `bool` | Whether to send a finish signal after streaming completes. |
**Returns**
`None`
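`send_bytes()` is commonly fed an iterator of fixed-duration chunks sliced from raw PCM audio. A sketch of such a chunker (the 16 kHz / 16-bit mono / 120 ms figures are illustrative defaults; the actual format must match the session's `RealtimeSTTConfig`):

```python
def pcm_chunks(audio: bytes, *, sample_rate=16000, bytes_per_sample=2, chunk_ms=120):
    # Yield fixed-duration slices of mono PCM audio suitable for send_bytes().
    chunk_size = sample_rate * bytes_per_sample * chunk_ms // 1000
    for offset in range(0, len(audio), chunk_size):
        yield audio[offset:offset + chunk_size]
```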
***
### send\_control\_message()
```python
send_control_message(control_type: RealtimeControlType) -> None
```
Send a control message to the realtime session.
Control messages modify the state of the stream, such as signaling
end-of-audio or requesting finalization.
**Parameters**
| Parameter | Type | Description |
| -------------- | --------------------- | ------------------------------------ |
| `control_type` | `RealtimeControlType` | The type of control message to send. |
**Returns**
`None`
**Raises**
* `SonioxRealtimeError` If the session is not connected or the message cannot be sent.
***
### finish()
```python
finish() -> None
```
Signal that no more audio will be sent for this session.
**Returns**
`None`
***
### keep\_alive()
```python
keep_alive() -> None
```
Send a keep-alive message to prevent the session from timing out.
**Returns**
`None`
***
### finalize()
```python
finalize() -> None
```
Finalize all outstanding non-final tokens while keeping the session open.
Subsequent tokens will be delivered with `is_final=True`.
**Returns**
`None`
***
### recv\_bytes()
```python
recv_bytes() -> bytes
```
Receive a raw message from the WebSocket connection.
**Returns**
`bytes`
The received message as bytes. An empty bytes object indicates
that the connection has been closed.
***
### parse\_event()
```python
parse_event(raw: str | bytes) -> RealtimeEvent
```
Parse a raw WebSocket message into a structured realtime event.
**Parameters**
| Parameter | Type | Description |
| --------- | -------------- | --------------------------------------------- |
| `raw` | `str \| bytes` | Raw message payload received from the server. |
**Returns**
`RealtimeEvent`
A validated RealtimeEvent instance.
***
### receive\_event()
```python
receive_event() -> RealtimeEvent | None
```
Receive and parse the next realtime event from the server.
**Returns**
`RealtimeEvent | None`
The next RealtimeEvent, or None if the connection has closed.
**Raises**
* `SonioxRealtimeError` If the session is not connected.
***
### receive\_events()
```python
receive_events() -> Iterator[RealtimeEvent]
```
Yield realtime events as they are received from the server.
Iteration stops automatically when the connection is closed.
**Returns**
`Iterator[RealtimeEvent]`
***
### handle\_events()
```python
handle_events(handler: Callable[[RealtimeEvent], None]) -> None
```
Receive realtime events and dispatch them to a handler callback.
**Parameters**
| Parameter | Type | Description |
| --------- | --------------------------------- | ------------------------------------------------- |
| `handler` | `Callable[[RealtimeEvent], None]` | Callable invoked for each received RealtimeEvent. |
**Returns**
`None`
***
### pause()
```python
pause() -> None
```
Pause the session, suppressing outgoing audio and starting a
background keepalive thread.
While paused, calls to `send_byte_chunk()` are silently dropped.
A background thread sends a keepalive message every
`KEEP_ALIVE_INTERVAL_SEC` seconds to prevent the server from
timing out the session.
Calling `pause` on an already-paused session is a no-op.
**Returns**
`None`
**Raises**
* `SonioxRealtimeError` If the session is not connected.
***
### resume()
```python
resume() -> None
```
Resume a paused session, stopping the keepalive thread and
allowing audio to be sent again.
Calling `resume` on a session that is not paused is a no-op.
**Returns**
`None`
**Raises**
* `SonioxRealtimeError` If the session is not connected.
***
## AsyncRealtimeSTTClient
Factory for creating asynchronous realtime speech-to-text sessions.
This class validates credentials and prepares session configuration,
but does not itself manage WebSocket connections.
### Constructor
```python
AsyncRealtimeSTTClient(client: AsyncSonioxClient)
```
Create a realtime STT client bound to an existing API client.
**Parameters**
| Parameter | Type | Description |
| --------- | ------------------- | ------------------------------------------------------------- |
| `client` | `AsyncSonioxClient` | Parent Soniox client providing configuration and credentials. |
**Returns**
`None`
### connect()
```python
connect(*, config: RealtimeSTTConfig, api_key: str | None = None) -> AsyncRealtimeSTTSession
```
Create a new realtime STT session.
The returned session is not connected until entered as an async
context manager.
**Parameters**
| Parameter | Type | Description |
| --------- | ------------------- | --------------------------------------------------------------------------------- |
| `config` | `RealtimeSTTConfig` | Realtime transcription configuration. |
| `api_key` | `str \| None` | Optional API key override. If not provided, the client's default API key is used. |
**Returns**
`AsyncRealtimeSTTSession`
A new AsyncRealtimeSTTSession instance.
**Raises**
* `SonioxValidationError` If no API key is available.
***
## AsyncRealtimeSTTSession
Asynchronous WebSocket session for a single real-time speech-to-text stream.
This class manages the full lifecycle of a real-time transcription session:
connecting to the WebSocket endpoint, streaming audio data, receiving events,
and gracefully closing the stream. A session is stateful and represents
exactly one streaming interaction with the Soniox realtime API.
Instances are designed to be used as async context managers.
### Constructor
```python
AsyncRealtimeSTTSession(url: str, config: RealtimeSTTConfig)
```
Create a new realtime STT session.
This does not open a network connection. The WebSocket connection
is established when entering the async context manager.
**Parameters**
| Parameter | Type | Description |
| --------- | ------------------- | -------------------------------------------------------------------------------------- |
| `url` | `str` | WebSocket URL for the realtime transcription endpoint. |
| `config` | `RealtimeSTTConfig` | Configuration describing the audio format and transcription behavior for this session. |
**Returns**
`None`
### Properties
| Property | Type | Description |
| -------------- | ----------------------- | --------------------------------------------------------- |
| `config` | `RealtimeSTTConfig` | Return the configuration used to initialize this session. |
| `paused` | `bool` | Return True if the session is currently paused. |
| `last_message` | `RealtimeEvent \| None` | Return the most recently received realtime event, if any. |
### close()
```python
close() -> None
```
Gracefully close the realtime session.
Sends a final empty message to signal end-of-stream, then closes
the WebSocket connection. Calling this method multiple times is safe.
**Returns**
`None`
***
### send\_byte\_chunk()
```python
send_byte_chunk(chunk: bytes) -> None
```
Send a single chunk of raw audio bytes to the realtime stream.
The audio data must match the format declared in the session
configuration (sample rate, channels, encoding).
**Parameters**
| Parameter | Type | Description |
| --------- | ------- | ------------------------ |
| `chunk` | `bytes` | Raw audio bytes to send. |
**Returns**
`None`
**Raises**
* `SonioxRealtimeError` If the session is not connected or the send operation fails.
***
### send\_bytes()
```python
send_bytes(chunks: bytes | AsyncIterator[bytes], *, finish: bool = True) -> None
```
Send audio data to the realtime stream.
This method accepts either a single bytes object or an async iterator
yielding audio chunks. When an iterator is provided, a
FINISH control message is sent automatically after all chunks
have been transmitted.
**Parameters**
| Parameter | Type | Description |
| --------- | ------------------------------- | ---------------------------------------------------------- |
| `chunks`  | `bytes \| AsyncIterator[bytes]` | Audio data as raw bytes or an async iterator of byte chunks. |
| `finish` | `bool` | Whether to send a finish signal after streaming completes. |
**Returns**
`None`
***
### send\_control\_message()
```python
send_control_message(control_type: RealtimeControlType) -> None
```
Send a control message to the realtime session.
Control messages modify the state of the stream, such as signaling
end-of-audio or requesting finalization.
**Parameters**
| Parameter | Type | Description |
| -------------- | --------------------- | ------------------------------------ |
| `control_type` | `RealtimeControlType` | The type of control message to send. |
**Returns**
`None`
**Raises**
* `SonioxRealtimeError` If the session is not connected or the message cannot be sent.
***
### finish()
```python
finish() -> None
```
Signal that no more audio will be sent for this session.
**Returns**
`None`
***
### keep\_alive()
```python
keep_alive() -> None
```
Send a keep-alive message to prevent the session from timing out.
**Returns**
`None`
***
### finalize()
```python
finalize() -> None
```
Finalize all outstanding non-final tokens while keeping the session open.
Subsequent tokens will be delivered with `is_final=True`.
**Returns**
`None`
***
### recv\_bytes()
```python
recv_bytes() -> bytes
```
Receive a raw message from the WebSocket connection.
**Returns**
`bytes`
The received message as bytes. An empty bytes object indicates
that the connection has been closed.
***
### parse\_event()
```python
parse_event(raw: str | bytes) -> RealtimeEvent
```
Parse a raw WebSocket message into a structured realtime event.
**Parameters**
| Parameter | Type | Description |
| --------- | -------------- | --------------------------------------------- |
| `raw` | `str \| bytes` | Raw message payload received from the server. |
**Returns**
`RealtimeEvent`
A validated RealtimeEvent instance.
***
### receive\_event()
```python
receive_event() -> RealtimeEvent | None
```
Receive and parse the next realtime event from the server.
**Returns**
`RealtimeEvent | None`
The next RealtimeEvent, or None if the connection has closed.
**Raises**
* `SonioxRealtimeError` If the session is not connected.
***
### receive\_events()
```python
receive_events() -> AsyncIterator[RealtimeEvent]
```
Yield realtime events as they are received from the server.
Iteration stops automatically when the connection is closed.
**Returns**
`AsyncIterator[RealtimeEvent]`
***
### handle\_events()
```python
handle_events(handler: Callable[[RealtimeEvent], Awaitable[None]]) -> None
```
Receive realtime events and dispatch them to a handler callback.
**Parameters**
| Parameter | Type | Description |
| --------- | -------------------------------------------- | ------------------------------------------------- |
| `handler` | `Callable[[RealtimeEvent], Awaitable[None]]` | Callable invoked for each received RealtimeEvent. |
**Returns**
`None`
***
### pause()
```python
pause() -> None
```
Pause the session, suppressing outgoing audio and starting a
background keepalive task.
While paused, calls to `send_byte_chunk()` are silently dropped.
A background task sends a keepalive message every
`KEEP_ALIVE_INTERVAL_SEC` seconds to prevent the server from
timing out the session.
Calling `pause` on an already-paused session is a no-op.
**Returns**
`None`
**Raises**
* `SonioxRealtimeError` If the session is not connected.
***
### resume()
```python
resume() -> None
```
Resume a paused session, stopping the keepalive task and
allowing audio to be sent again.
Calling `resume` on a session that is not paused is a no-op.
**Returns**
`None`
**Raises**
* `SonioxRealtimeError` If the session is not connected.
# Types
URL: /stt/SDKs/python-SDK/Full-SDK-reference/types
Soniox Python SDK - Types Reference
***
## Token
Token metadata emitted during realtime streaming transcriptions.
### Properties
| Property | Type | Description |
| -------------------- | --------------- | ------------------------------------------------------------ |
| `text` | `str` | The transcribed text. |
| `start_ms` | `int \| None` | Start time in milliseconds relative to audio start. |
| `end_ms` | `int \| None` | End time in milliseconds relative to audio start. |
| `confidence` | `float \| None` | Confidence score (0.0 to 1.0). |
| `is_final` | `bool \| None` | Whether this is a finalized token. |
| `speaker` | `str \| None` | Speaker identifier (if diarization enabled). |
| `translation_status` | `str \| None` | Translation status of this token. |
| `language` | `str \| None` | Detected language code (if language identification enabled). |
| `source_language` | `str \| None` | Source language for translated tokens. |
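A realtime transcript is typically rendered by keeping finalized tokens and redrawing the non-final tail on every event. A sketch over plain dicts that mirror the `text` and `is_final` fields above (real events carry `Token` objects with the same attributes):

```python
def render_transcript(tokens):
    # Split tokens into committed (is_final=True) and provisional text.
    final = "".join(t["text"] for t in tokens if t.get("is_final"))
    nonfinal = "".join(t["text"] for t in tokens if not t.get("is_final"))
    return final, nonfinal
```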
***
## ApiError
Structured representation of a non-2xx API response payload.
### Properties
| Property | Type | Description |
| ------------------- | ------------------------------- | ------------------------------------------------------------------------------------------ |
| `status_code` | `int` | HTTP status code. |
| `error_type` | `str` | High-level error code (e.g., 'bad\_request', 'quota\_exceeded') for programmatic handling. |
| `message` | `str` | Detailed error message describing the failure. |
| `validation_errors` | `list[ApiErrorValidationError]` | List of specific field validation failures, if applicable. |
| `request_id` | `str \| None` | Unique identifier for the request, useful for troubleshooting. |
***
## ApiErrorValidationError
Details a single validation error reported by the Soniox API.
### Properties
| Property | Type | Description |
| ------------ | ----- | -------------------------------------------------------- |
| `error_type` | `str` | The category of validation error. |
| `location` | `str` | The location of the error, e.g. \['body', 'audio\_url']. |
| `message` | `str` | A human-readable description of the validation failure. |
***
## CreateTemporaryApiKeyPayload
Payload for requesting a temporary API key (e.g., for WebSocket transcription).
### Properties
| Property | Type | Description |
| --------------------- | -------------------------- | --------------------------------------------------------------- |
| `usage_type` | `TemporaryApiKeyUsageType` | Intended usage of the temporary API key. |
| `expires_in_seconds`  | `int`                      | Duration in seconds until the temporary API key expires.         |
| `client_reference_id` | `str \| None`              | Optional tracking identifier string. Does not need to be unique. |
***
## CreateTemporaryApiKeyResponse
Response data for a temp API key request.
### Properties
| Property | Type | Description |
| ------------ | ---------- | --------------------------------------------------------------------- |
| `api_key` | `str` | Created temporary API key. |
| `expires_at` | `datetime` | UTC timestamp indicating when the generated temporary API key will expire. |
***
## CreateTranscriptionPayload
Payload sent to create an asynchronous transcription job.
### Properties
| Property | Type | Description |
| -------------------------------- | --------------------------- | ------------------------------------------------------------------------------------------------- |
| `model` | `str` | Speech-to-text model to use. |
| `audio_url` | `str \| None` | URL of a publicly accessible audio file. |
| `file_id` | `str \| None` | ID of a previously uploaded file (UUID). |
| `language_hints` | `list[str] \| None` | Array of expected ISO language codes to bias recognition. |
| `language_hints_strict` | `bool \| None` | When true, model relies more heavily on language hints (best results with one language hint set). |
| `enable_speaker_diarization` | `bool \| None` | Enable speaker diarization to identify different speakers. |
| `enable_language_identification` | `bool \| None` | Enable automatic language identification. |
| `translation` | `TranslationConfig \| None` | Translation configuration. |
| `context` | `StructuredContext \| None` | Additional context to improve transcription accuracy and formatting of specialized terms. |
| `webhook_url` | `str \| None` | URL to receive webhook notifications when transcription is completed or fails. |
| `webhook_auth_header_name`       | `str \| None`               | Name of the authentication header sent with webhook notifications.                                |
| `webhook_auth_header_value` | `str \| None` | Authentication header value sent with webhook notifications. |
| `client_reference_id` | `str \| None` | Optional tracking identifier. |
***
## CreateTranscriptionConfig
Helper config used when building transcription payloads.
### Properties
| Property | Type | Description |
| -------------------------------- | --------------------------- | ----------------------------------------------------------------------------------------- |
| `model` | `str \| None` | Speech-to-text model to use. |
| `language_hints` | `list[str] \| None` | Array of expected ISO language codes to bias recognition. |
| `language_hints_strict` | `bool \| None` | When true, model relies more heavily on language hints. |
| `enable_speaker_diarization` | `bool \| None` | Enable speaker diarization to identify different speakers. |
| `enable_language_identification` | `bool \| None`              | Enable automatic language identification.                                                 |
| `translation`                    | `TranslationConfig \| None` | Translation configuration.                                                                |
| `context`                        | `StructuredContext \| None` | Additional context to improve transcription accuracy and formatting of specialized terms. |
| `webhook_url`                    | `str \| None`               | URL to receive webhook notifications when transcription is completed or fails.            |
| `webhook_auth_header_name`       | `str \| None`               | Name of the authentication header sent with webhook notifications.                        |
| `webhook_auth_header_value`      | `str \| None`               | Authentication header value sent with webhook notifications.                              |
| `client_reference_id`            | `str \| None`               | Optional tracking identifier.                                                             |
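Merging a config like this into a request body usually means dropping unset fields before serialization. A minimal sketch (field names mirror `CreateTranscriptionPayload` above; dropping `None` values is an illustrative convention, not necessarily what the SDK sends on the wire):

```python
def build_payload(model, *, audio_url=None, file_id=None, **options):
    # Assemble a create-transcription request body, omitting unset fields.
    body = {"model": model, "audio_url": audio_url, "file_id": file_id, **options}
    return {k: v for k, v in body.items() if v is not None}
```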
***
## File
Metadata describing an uploaded file in the Soniox API.
### Properties
| Property | Type | Description |
| --------------------- | ------------- | ---------------------------------------------------- |
| `id` | `str` | Unique identifier of the file (UUID). |
| `filename` | `str` | Name of the file. |
| `size` | `int` | Size of the file in bytes. |
| `created_at` | `datetime` | UTC timestamp indicating when the file was uploaded. |
| `client_reference_id` | `str \| None` | Optional tracking identifier string. |
***
## GetFilesPayload
Parameters accepted by the file listing endpoint.
### Properties
| Property | Type | Description |
| -------- | ------------- | ----------------------------------------------- |
| `limit` | `int` | Maximum number of files to return. |
| `cursor` | `str \| None` | Pagination cursor for the next page of results. |
***
## GetFilesResponse
Paginated response returned when listing uploaded files.
### Properties
| Property | Type | Description |
| ------------------ | ------------- | ------------------------------------------------------------------------------------------------------------ |
| `files` | `list[File]` | List of uploaded files. |
| `next_page_cursor` | `str \| None` | A pagination token that references the next page of results. When None, no additional results are available. |
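Draining a cursor-paginated listing such as this one follows a standard loop. A sketch (`fetch_page` is a hypothetical callable taking a cursor and returning `(items, next_page_cursor)`, mirroring `GetFilesResponse`):

```python
def list_all(fetch_page):
    # Follow next_page_cursor until the server returns None for it.
    items, cursor = [], None
    while True:
        page, cursor = fetch_page(cursor)
        items.extend(page)
        if cursor is None:
            return items
```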
***
## GetModelsResponse
Response returned when listing available models.
### Properties
| Property | Type | Description |
| -------- | ------------- | ----------------------------- |
| `models` | `list[Model]` | List of all available models. |
***
## GetTranscriptionsPayload
Parameters for listing transcription jobs.
### Properties
| Property | Type | Description |
| -------- | ------------- | ----------------------------------------------- |
| `limit` | `int` | Maximum number of transcriptions to return. |
| `cursor` | `str \| None` | Pagination cursor for the next page of results. |
***
## GetTranscriptionsResponse
Paginated response for transcription listings.
### Properties
| Property | Type | Description |
| ------------------ | --------------------- | ------------------------------------------------------------------------------------------------------------ |
| `transcriptions` | `list[Transcription]` | List of transcriptions. |
| `next_page_cursor` | `str \| None` | A pagination token that references the next page of results. When None, no additional results are available. |
***
## Model
Describes a Soniox transcription model.
### Properties
| Property | Type | Description |
| -------------------------------- | ------------------------- | ------------------------------------------------------------------------------------------------------- |
| `id` | `str` | Unique identifier of the model. |
| `aliased_model_id` | `str \| None` | If this is an alias, the id of the aliased model. None for non-alias models. |
| `name` | `str` | Name of the model. |
| `context_version` | `int \| None` | Version of context supported. |
| `transcription_mode` | `TranscriptionMode` | Transcription mode of the model. |
| `languages` | `list[Language]` | List of languages supported by the model. |
| `supports_language_hints_strict` | `bool`                    | Whether the model supports the `language_hints_strict` option.                                              |
| `translation_targets`            | `list[TranslationTarget]` | List of supported one-way translation targets. If the list is empty, check the `one_way_translation` field. |
| `two_way_translation_pairs`      | `list[str]`               | List of supported two-way translation pairs. If the list is empty, check the `two_way_translation` field.   |
| `one_way_translation`            | `str \| None`             | When set to `'all_languages'`, any language from `languages` can be used.                                   |
| `two_way_translation`            | `str \| None`             | When set to `'all_languages'`, any language pair from `languages` can be used.                              |
***
## StructuredContext
Optional structured context provided to the transcription engine.
### Properties
| Property | Type | Description |
| ------------------- | ------------------------------------------------ | --------------------------------------------------------------------------------------------------- |
| `general` | `list[StructuredContextGeneralItem] \| None` | Structured key-value pairs describing domain, topic, intent, participant names, etc. |
| `text` | `str \| None` | Longer free-form background text, prior interaction history, reference documents, or meeting notes. |
| `terms` | `list[str] \| None` | Domain-specific or uncommon words to recognize. |
| `translation_terms` | `list[StructuredContextTranslationTerm] \| None` | Custom translations for ambiguous terms. |
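As a sketch, the payload shape these context models describe can be assembled from plain dicts using the field names in the tables above; the values below are purely illustrative:

```python
# Illustrative structured-context payload mirroring the models above.
# Plain dicts are used as a sketch; the SDK's own model classes
# (StructuredContext, etc.) are assumed to accept the same field names.
context = {
    "general": [
        {"key": "domain", "value": "healthcare"},
        {"key": "topic", "value": "patient intake"},
    ],
    "text": "Follow-up visit; see prior appointment notes.",
    "terms": ["metoprolol", "hyperlipidemia"],
    "translation_terms": [
        {"source": "blood pressure", "target": "tension artérielle"},
    ],
}
```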
***
## StructuredContextGeneralItem
Single general context key/value pair for transcription context.
### Properties
| Property | Type | Description |
| -------- | ----- | ------------------------------------------------------------------------ |
| `key` | `str` | The key describing the context type (e.g., "domain", "topic", "doctor"). |
| `value` | `str` | The value for the context key. |
***
## StructuredContextTranslationTerm
Defines a translation term mapping used in structured context.
### Properties
| Property | Type | Description |
| -------- | ----- | ------------------------------------ |
| `source` | `str` | The source term to translate. |
| `target` | `str` | The target translation for the term. |
***
## Transcription
Represents a transcription job tracked by Soniox.
### Properties
| Property | Type | Description |
| -------------------------------- | --------------------- | -------------------------------------------------------------------------------------------- |
| `id` | `str` | Unique identifier of the transcription (UUID). |
| `status` | `TranscriptionStatus` | Current status of the transcription. |
| `created_at` | `datetime` | UTC timestamp when the transcription was created. |
| `model` | `str` | Speech-to-text model used. |
| `audio_url` | `str \| None` | URL of the audio file being transcribed. |
| `file_id` | `str \| None` | ID of the uploaded file being transcribed (UUID). |
| `filename` | `str` | Name of the file being transcribed. |
| `language_hints` | `list[str] \| None` | Expected languages in the audio. If not specified, languages are automatically detected. |
| `enable_speaker_diarization` | `bool` | When true, speakers are identified and separated in the transcription output. |
| `enable_language_identification` | `bool` | When true, language is detected for each part of the transcription. |
| `audio_duration_ms` | `int \| None` | Duration of the audio in milliseconds. Only available after processing begins. |
| `error_type` | `str \| None` | Error type if transcription failed. None for successful or in-progress transcriptions. |
| `error_message` | `str \| None` | Error message if transcription failed. None for successful or in-progress transcriptions. |
| `webhook_url` | `str \| None` | URL to receive webhook notifications when transcription is completed or fails. |
| `webhook_auth_header_name` | `str \| None` | Name of the authentication header sent with webhook notifications. |
| `webhook_auth_header_value` | `str \| None` | Authentication header value. Always returned masked. |
| `webhook_status_code` | `int \| None` | HTTP status code received from your server when webhook was delivered. None if not yet sent. |
| `client_reference_id` | `str \| None` | Optional tracking identifier. |
***
## TranscriptionStatus
```python
TranscriptionStatus = Literal["queued", "processing", "completed", "error"]
```
Current status of the transcription job.
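Since "queued" and "processing" are non-terminal, callers typically poll until a terminal status is reached. A minimal sketch, with the fetch function supplied by the caller (the SDK's own retrieval method is not assumed here):

```python
import time

TERMINAL_STATUSES = {"completed", "error"}

def wait_for_transcription(get_status, transcription_id, poll_interval_s=1.0):
    """Poll `get_status(transcription_id)` until a terminal status is returned."""
    while True:
        status = get_status(transcription_id)
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_interval_s)
```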
***
## TranscriptionTranscript
Transcript data including the full text and tokens.
### Properties
| Property | Type | Description |
| -------- | ------------- | ------------------------------------------------------------------------- |
| `id` | `str` | Unique identifier of the transcription this transcript belongs to (UUID). |
| `text` | `str` | Complete transcribed text content. |
| `tokens` | `list[Token]` | List of detailed token information with timestamps and metadata. |
***
## TranslationConfig
Configuration describing how translation should be performed.
### Properties
| Property | Type | Description |
| ----------------- | ----------------- | ------------------------------------------------------------------------- |
| `type` | `TranslationType` | Translation type. |
| `target_language` | `str \| None` | Target language code for translation (e.g., "fr", "es", "de") (one\_way). |
| `language_a` | `str \| None` | First language code (two\_way). |
| `language_b` | `str \| None` | Second language code (two\_way). |
### validate\_logic()
```python
validate_logic() -> TranslationConfig
```
**Returns**
`TranslationConfig`
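The reference does not spell out what `validate_logic()` checks, but the field descriptions suggest the constraints below. This is an assumption about the intended rules, not the SDK's actual implementation:

```python
def validate_translation_config(cfg: dict) -> dict:
    """Assumed constraints: one_way needs target_language;
    two_way needs both language_a and language_b."""
    if cfg.get("type") == "one_way":
        if not cfg.get("target_language"):
            raise ValueError("one_way translation requires target_language")
    elif cfg.get("type") == "two_way":
        if not (cfg.get("language_a") and cfg.get("language_b")):
            raise ValueError("two_way translation requires language_a and language_b")
    else:
        raise ValueError(f"unknown translation type: {cfg.get('type')}")
    return cfg
```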
***
## TranslationTarget
Describes translation targets offered by a model.
### Properties
| Property | Type | Description |
| -------------------------- | ----------- | ------------------------------------------------------------------------- |
| `target_language` | `str` | Target language code for translation (e.g., "fr", "es", "de") (one\_way). |
| `source_languages` | `list[str]` | List of source language codes. |
| `exclude_source_languages` | `list[str]` | Source language codes excluded for this target. |
***
## TranslationType
```python
TranslationType = Literal["one_way", "two_way"]
```
Supported translation configuration types.
***
## TemporaryApiKeyUsageType
```python
TemporaryApiKeyUsageType = Literal["transcribe_websocket"]
```
Intended usage for temporary API keys.
***
## UploadFilePayload
Optional metadata supplied at upload time.
### Properties
| Property | Type | Description |
| --------------------- | ------------- | --------------------------------------------------------------- |
| `client_reference_id` | `str \| None` | Optional tracking identifier string. Does not need to be unique. |
***
## RealtimeEvent
Event payload received from the realtime STT websocket.
### Properties
| Property | Type | Description |
| --------------------- | ------------- | -------------------------------------------------- |
| `tokens` | `list[Token]` | Tokens in this result. |
| `final_audio_proc_ms` | `int \| None` | Milliseconds of audio that have been finalized. |
| `total_audio_proc_ms` | `int \| None` | Total milliseconds of audio processed. |
| `finished` | `bool` | Whether this is the final result (session ending). |
| `error_code` | `int \| None` | Error code if the realtime operation failed. |
| `error_message` | `str \| None` | Human-readable description of the error. |
### validate\_event()
```python
validate_event(raw: str | bytes) -> RealtimeEvent
```
**Parameters**
| Parameter | Type | Description |
| --------- | -------------- | ---------------------------------------- |
| `raw` | `str \| bytes` | Raw event payload from the realtime API. |
**Returns**
`RealtimeEvent`
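A hedged sketch of consuming such an event from raw JSON, using only the field names in the table above (the SDK's `validate_event` presumably performs stricter validation than this):

```python
import json

def handle_realtime_event(raw):
    """Parse a raw realtime event and surface errors; returns (tokens, finished)."""
    event = json.loads(raw)
    if event.get("error_code") is not None:
        raise RuntimeError(
            f"realtime error {event['error_code']}: {event.get('error_message')}"
        )
    return event.get("tokens", []), event.get("finished", False)
```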
***
## RealtimeSTTConfig
Configuration for initiating a realtime transcription session.
### Properties
| Property | Type | Description |
| -------------------------------- | --------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `api_key` | `str \| None` | API key for real-time sessions. |
| `model` | `str` | Speech-to-text model to use. |
| `audio_format` | `str` | Audio format. Use 'auto' for automatic detection of container formats. |
| `num_channels` | `int \| None` | Number of audio channels (required for raw audio formats). |
| `sample_rate` | `int \| None` | Sample rate in Hz (required for PCM formats). |
| `language_hints` | `list[str] \| None` | Expected languages in the audio (ISO language codes). |
| `language_hints_strict` | `bool \| None` | When true, recognition is strongly biased toward language hints (best results when using one language in language\_hints). |
| `context` | `StructuredContext \| None` | Additional context to improve transcription accuracy. |
| `enable_speaker_diarization` | `bool \| None` | Enable speaker identification. |
| `enable_language_identification` | `bool \| None` | Enable automatic language detection. |
| `enable_endpoint_detection` | `bool \| None` | Enable endpoint detection for utterance boundaries. |
| `max_endpoint_delay_ms` | `int \| None` | Maximum delay between the end of speech and the returned endpoint. Allowed values range from 500ms to 3000ms; the default is 2000ms. |
| `translation` | `TranslationConfig \| None` | Translation configuration. |
| `client_reference_id` | `str \| None` | Optional tracking identifier (max 256 chars). |
### build\_payload()
```python
build_payload(api_key: str) -> RealtimeSTTConfig
```
**Parameters**
| Parameter | Type | Description |
| --------- | ----- | -------------------------------- |
| `api_key` | `str` | API key used for authentication. |
**Returns**
`RealtimeSTTConfig`
***
## Headers
```python
Headers = Mapping[str, str]
```
***
## WebhookAuthConfig
Configuration for webhook authentication headers.
### Properties
| Property | Type | Description |
| -------- | ----- | --------------------------------------------------- |
| `name` | `str` | Expected header name (case-insensitive comparison). |
| `value` | `str` | Expected header value (exact match). |
***
## WebhookEvent
Basic webhook event metadata.
### Properties
| Property | Type | Description |
| -------- | ------------------------------- | ---------------------------- |
| `id` | `str` | Transcription ID (UUID). |
| `status` | `Literal['completed', 'error']` | Transcription result status. |
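Per `WebhookAuthConfig` above, the header name compares case-insensitively while the value must match exactly. That check can be sketched framework-agnostically, with `headers` as any name-to-value mapping:

```python
def webhook_auth_ok(headers, expected_name, expected_value):
    """Case-insensitive header-name lookup, exact value comparison."""
    for name, value in headers.items():
        if name.lower() == expected_name.lower():
            return value == expected_value
    return False
```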
# Full React SDK reference
URL: /stt/SDKs/react-SDK/reference
Full SDK reference for the React SDK
## Components
| Component | Description |
| ---------------------------------------------------------------------- | --------------------------------------------------------------------- |
| [`SonioxProvider`](/stt/SDKs/react-SDK/reference/types#sonioxprovider) | Provider component for the Soniox client |
| [`AudioLevel`](/stt/SDKs/react-SDK/reference/types#audiolevel) | Component to display the audio level (not available for React Native) |
## Hooks
| Hook | Description |
| ---------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------- |
| [`useRecording`](/stt/SDKs/react-SDK/reference/types#userecording) | Main hook for real-time speech-to-text session |
| [`useMicrophonePermission`](/stt/SDKs/react-SDK/reference/types#usemicrophonepermission) | Hook for checking microphone permission |
| [`useAudioLevel`](/stt/SDKs/react-SDK/reference/types#useaudiolevel) | Hook for real-time audio volume and spectrum data (not available for React Native) |
| [`useSoniox`](/stt/SDKs/react-SDK/reference/types#usesoniox)                             | Hook to access the SonioxClient from context                                       |
# Types
URL: /stt/SDKs/react-SDK/reference/types
Soniox React SDK — Types Reference
## SonioxProviderProps
```ts
type SonioxProviderProps = {
children: ReactNode;
} & SonioxProviderConfigProps | SonioxProviderClientProps;
```
Props for SonioxProvider.
Supply either a pre-built `client` instance or configuration props.
**Type Declaration**
| Name | Type |
| ---------- | ----------- |
| `children` | `ReactNode` |
***
## UnsupportedReason
```ts
type UnsupportedReason = "ssr" | "no-mediadevices" | "no-getusermedia" | "insecure-context";
```
Reason why the built-in browser `MicrophoneSource` is unavailable:
* `'ssr'` — `navigator` is undefined (SSR, React Native, or other non-browser JS runtimes).
* `'no-mediadevices'` — `navigator` exists but `navigator.mediaDevices` is missing.
* `'no-getusermedia'` — `navigator.mediaDevices` exists but `getUserMedia` is not a function.
* `'insecure-context'` — the page is not served over HTTPS.
This only reflects whether the **default** `MicrophoneSource` can work.
Custom `AudioSource` implementations (e.g. for React Native) bypass this
check entirely and can record regardless of this value.
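The checks behind each value can be sketched against a plain environment object rather than the real `navigator`. This mirrors the documented reasons, not the SDK's actual source:

```ts
type UnsupportedReason = "ssr" | "no-mediadevices" | "no-getusermedia" | "insecure-context";

interface EnvLike {
  navigator?: { mediaDevices?: { getUserMedia?: unknown } };
  isSecureContext?: boolean;
}

// Returns null when the built-in MicrophoneSource would be usable.
function classifySupport(env: EnvLike): UnsupportedReason | null {
  if (!env.navigator) return "ssr";
  if (!env.navigator.mediaDevices) return "no-mediadevices";
  if (typeof env.navigator.mediaDevices.getUserMedia !== "function") return "no-getusermedia";
  if (!env.isSecureContext) return "insecure-context";
  return null;
}
```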
***
## AudioLevelProps
**Extends**
* [`UseAudioLevelOptions`](types#useaudioleveloptions)
**Properties**
| Property | Type | Description |
| ------------ | ------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `active?` | `boolean` | Whether volume metering is active. When false, resources are released. |
| `bands?` | `number` | Number of frequency bands to return. When set, the `bands` array is populated with per-band levels (0-1). Useful for spectrum/equalizer visualizations. |
| `children` | (`state`) => `ReactNode` | - |
| `fftSize?` | `number` | FFT size for the AnalyserNode. Must be a power of 2. Higher values give more frequency resolution (more bins per band) but update less frequently. **Default** `256` |
| `smoothing?` | `number` | Exponential smoothing factor (0-1). Higher = smoother/slower decay. **Default** `0.85` |
***
## AudioSupportResult
**Properties**
| Property | Type |
| ------------- | ---------------------------------------------- |
| `isSupported` | `boolean` |
| `reason?` | [`UnsupportedReason`](types#unsupportedreason) |
***
## MicrophonePermissionState
**Properties**
| Property | Type | Description |
| ------------- | ------------------------ | ---------------------------------------------------------------------- |
| `canRequest` | `boolean` | Whether the permission can be requested (e.g., via a prompt). |
| `check` | () => `Promise`\<`void`> | Check (or re-check) the microphone permission. No-op when unsupported. |
| `isDenied` | `boolean` | `status === 'denied'`. |
| `isGranted` | `boolean` | `status === 'granted'`. |
| `isSupported` | `boolean` | Whether permission checking is available. |
| `status` | `MicPermissionStatus` | Current permission status. |
***
## RecordingSnapshot
Immutable snapshot of the recording state exposed to React.
**Extended by**
* [`UseRecordingReturn`](types#userecordingreturn)
**Properties**
| Property | Type | Description |
| --------------- | ---------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `error` | `Error` \| `null` | Latest error, if any. |
| `finalText` | `string` | Accumulated finalized text. |
| `finalTokens` | readonly `RealtimeToken`\[] | All finalized tokens in chronological order. Useful for rendering per-token metadata (language, speaker, etc.) in the order tokens were spoken. Pair with `partialTokens` for the complete ordered stream. |
| `groups` | `Readonly`\<`Record`\<`string`, `TokenGroup`>> | Tokens grouped by the active `groupBy` strategy. Auto-populated when `translation` config is provided: - `one_way` → keys: `"original"`, `"translation"` - `two_way` → keys: language codes (e.g. `"en"`, `"es"`) Empty `{}` when no grouping is active. |
| `isActive` | `boolean` | `true` when state is not idle/stopped/canceled/error. |
| `isPaused` | `boolean` | `true` when `state === 'paused'`. |
| `isRecording` | `boolean` | `true` when `state === 'recording'`. |
| `isSourceMuted` | `boolean` | `true` when the audio source is muted externally (e.g. OS-level or hardware mute). |
| `partialText` | `string` | Text from current non-final tokens. |
| `partialTokens` | readonly `RealtimeToken`\[] | Non-final tokens from the latest result. |
| `result` | `RealtimeResult` \| `null` | Latest raw result from the server. |
| `segments` | readonly `RealtimeSegment`\[] | Accumulated final segments. |
| `state` | `RecordingState` | Current recording lifecycle state. |
| `text` | `string` | Full transcript: `finalText + partialText`. |
| `tokens` | readonly `RealtimeToken`\[] | Tokens from the latest result message. |
| `utterances` | readonly `RealtimeUtterance`\[] | Accumulated utterances (one per endpoint). |
***
## UseAudioLevelOptions
**Extended by**
* [`AudioLevelProps`](types#audiolevelprops)
**Properties**
| Property | Type | Description |
| ------------ | --------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `active?` | `boolean` | Whether volume metering is active. When false, resources are released. |
| `bands?` | `number` | Number of frequency bands to return. When set, the `bands` array is populated with per-band levels (0-1). Useful for spectrum/equalizer visualizations. |
| `fftSize?` | `number` | FFT size for the AnalyserNode. Must be a power of 2. Higher values give more frequency resolution (more bins per band) but update less frequently. **Default** `256` |
| `smoothing?` | `number` | Exponential smoothing factor (0-1). Higher = smoother/slower decay. **Default** `0.85` |
***
## UseAudioLevelReturn
**Properties**
| Property | Type | Description |
| -------- | -------------------- | ------------------------------------------------------------------------------------ |
| `bands` | readonly `number`\[] | Per-band frequency levels, each 0-1. Empty array when the `bands` option is not set. |
| `volume` | `number` | Current volume level, 0 to 1. Updated every animation frame. |
***
## UseMicrophonePermissionOptions
**Properties**
| Property | Type | Description |
| ------------ | --------- | ---------------------------------------- |
| `autoCheck?` | `boolean` | Automatically check permission on mount. |
***
## UseRecordingConfig
Configuration for useRecording.
Extends the STT session config (model, language\_hints, etc.) with
recording-specific and React-specific options.
Can be used **with or without** a `SonioxProvider`:
* **With Provider:** omit `apiKey` — the client is read from context.
* **Without Provider:** pass `apiKey` directly — a client is created internally.
**Extends**
* `SttSessionConfig`
**Properties**
| Property | Type | Description |
| --------------------------------- | ----------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `apiKey?`                         | `ApiKeyConfig`                                                          | API key — string or async function that fetches a temporary key. Required when not using a `SonioxProvider`. |
| `audio_format?` | `"auto"` \| `AudioFormat` | Audio format. Use 'auto' for automatic detection of container formats. For raw PCM formats, also set sample\_rate and num\_channels. **Default** `'auto'` |
| `buffer_queue_size?` | `number` | Maximum audio chunks to buffer during connection setup. |
| `client_reference_id?` | `string` | Optional tracking identifier (max 256 chars). |
| `context?` | `TranscriptionContext` | Additional context to improve transcription accuracy. |
| `enable_endpoint_detection?` | `boolean` | Enable endpoint detection for utterance boundaries. Useful for voice AI agents. |
| `enable_language_identification?` | `boolean` | Enable automatic language detection. |
| `enable_speaker_diarization?` | `boolean` | Enable speaker identification. |
| `groupBy?` | `"translation"` \| `"language"` \| `"speaker"` \| (`token`) => `string` | Group tokens by a key for easy splitting (e.g. translation, language, speaker). - `'translation'` — group by `translation_status`: keys `"original"` and `"translation"` - `'language'` — group by token `language` field: keys are language codes - `'speaker'` — group by token `speaker` field: keys are speaker identifiers - `(token) => string` — custom grouping function **Auto-defaults** when `translation` config is provided: - `one_way` → `'translation'` - `two_way` → `'language'` |
| `language_hints?` | `string`\[] | Expected languages in the audio (ISO language codes). |
| `language_hints_strict?` | `boolean` | When true, recognition is strongly biased toward language hints. Best-effort only, not a hard guarantee. |
| `max_endpoint_delay_ms?`          | `number`                                                                | Maximum delay between the end of speech and the returned endpoint. Allowed values range from 500ms to 3000ms; the default is 2000ms. |
| `model` | `string` | Speech-to-text model to use. |
| `num_channels?` | `number` | Number of audio channels (required for raw audio formats). |
| `onConnected?` | () => `void` | Called when the WebSocket connects. |
| `onEndpoint?` | () => `void` | Called when an endpoint is detected. |
| `onError?` | (`error`) => `void` | Called when an error occurs. |
| `onFinished?` | () => `void` | Called when the recording session finishes. |
| `onResult?` | (`result`) => `void` | Called on each result from the server. |
| `onSourceMuted?` | () => `void` | Called when the audio source is muted externally (e.g. OS-level or hardware mute). |
| `onSourceUnmuted?` | () => `void` | Called when the audio source is unmuted after an external mute. |
| `onStateChange?` | (`update`) => `void` | Called on each state transition. |
| `permissions?` | `PermissionResolver` \| `null` | Permission resolver override (only used when `apiKey` is provided). Pass `null` to explicitly disable. |
| `resetOnStart?` | `boolean` | Reset transcript state when `start()` is called. **Default** `true` |
| `sample_rate?` | `number` | Sample rate in Hz (required for PCM formats). |
| `session_options?` | `SttSessionOptions` | SDK-level session options (signal, etc.). |
| `source?` | `AudioSource` | Custom audio source (bypasses default MicrophoneSource). |
| `translation?` | `TranslationConfig` | Translation configuration. |
| `wsBaseUrl?` | `string` | WebSocket URL override (only used when `apiKey` is provided). |
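The built-in `groupBy` strategies can be sketched as a pure function over tokens. The token fields (`translation_status`, `language`, `speaker`) are the ones named in the description above; the fallback keys for missing fields are illustrative assumptions:

```ts
type GroupableToken = {
  text: string;
  language?: string;
  speaker?: string;
  translation_status?: "original" | "translation";
};

type GroupBy = "translation" | "language" | "speaker" | ((t: GroupableToken) => string);

function groupTokens(
  tokens: GroupableToken[],
  groupBy: GroupBy,
): Record<string, GroupableToken[]> {
  const keyOf = (t: GroupableToken): string => {
    if (typeof groupBy === "function") return groupBy(t);
    if (groupBy === "translation") return t.translation_status ?? "original";
    if (groupBy === "language") return t.language ?? "unknown";
    return t.speaker ?? "unknown";
  };
  const groups: Record<string, GroupableToken[]> = {};
  for (const t of tokens) (groups[keyOf(t)] ??= []).push(t);
  return groups;
}
```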
***
## UseRecordingReturn
Immutable snapshot of the recording state exposed to React.
**Extends**
* [`RecordingSnapshot`](types#recordingsnapshot)
**Properties**
| Property | Type | Description |
| ------------------- | ------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `cancel` | () => `void` | Immediately cancel — does not wait for final results. |
| `clearTranscript` | () => `void` | Clear transcript state (finalText, partialText, utterances, segments). |
| `error` | `Error` \| `null` | Latest error, if any. |
| `finalize` | (`options?`) => `void` | Request the server to finalize current non-final tokens. |
| `finalText` | `string` | Accumulated finalized text. |
| `finalTokens` | readonly `RealtimeToken`\[] | All finalized tokens in chronological order. Useful for rendering per-token metadata (language, speaker, etc.) in the order tokens were spoken. Pair with `partialTokens` for the complete ordered stream. |
| `groups` | `Readonly`\<`Record`\<`string`, `TokenGroup`>> | Tokens grouped by the active `groupBy` strategy. Auto-populated when `translation` config is provided: - `one_way` → keys: `"original"`, `"translation"` - `two_way` → keys: language codes (e.g. `"en"`, `"es"`) Empty `{}` when no grouping is active. |
| `isActive` | `boolean` | `true` when state is not idle/stopped/canceled/error. |
| `isPaused` | `boolean` | `true` when `state === 'paused'`. |
| `isRecording` | `boolean` | `true` when `state === 'recording'`. |
| `isSourceMuted` | `boolean` | `true` when the audio source is muted externally (e.g. OS-level or hardware mute). |
| `isSupported` | `boolean` | Whether the built-in browser `MicrophoneSource` is available. Custom `AudioSource` implementations work regardless of this value. |
| `partialText` | `string` | Text from current non-final tokens. |
| `partialTokens` | readonly `RealtimeToken`\[] | Non-final tokens from the latest result. |
| `pause` | () => `void` | Pause recording — pauses audio capture and activates keepalive. |
| `result` | `RealtimeResult` \| `null` | Latest raw result from the server. |
| `resume` | () => `void` | Resume recording after pause. |
| `segments` | readonly `RealtimeSegment`\[] | Accumulated final segments. |
| `start` | () => `void` | Start a new recording. Aborts any in-flight recording first. |
| `state` | `RecordingState` | Current recording lifecycle state. |
| `stop` | () => `Promise`\<`void`> | Gracefully stop — waits for final results from the server. |
| `text` | `string` | Full transcript: `finalText + partialText`. |
| `tokens` | readonly `RealtimeToken`\[] | Tokens from the latest result message. |
| `unsupportedReason` | [`UnsupportedReason`](types#unsupportedreason) \| `undefined` | Why the built-in `MicrophoneSource` is unavailable, if applicable. Custom `AudioSource` implementations bypass this check entirely. |
| `utterances` | readonly `RealtimeUtterance`\[] | Accumulated utterances (one per endpoint). |
***
## AudioLevel()
```ts
function AudioLevel(__namedParameters): ReactNode;
```
**Parameters**
| Parameter | Type |
| ------------------- | ------------------------------------------ |
| `__namedParameters` | [`AudioLevelProps`](types#audiolevelprops) |
**Returns**
`ReactNode`
***
## SonioxProvider()
```ts
function SonioxProvider(props): ReactNode;
```
**Parameters**
| Parameter | Type |
| --------- | -------------------------------------------------- |
| `props` | [`SonioxProviderProps`](types#sonioxproviderprops) |
**Returns**
`ReactNode`
***
## checkAudioSupport()
```ts
function checkAudioSupport(): AudioSupportResult;
```
Check whether the current environment supports the built-in browser
`MicrophoneSource` (which uses `navigator.mediaDevices.getUserMedia`).
This does **not** reflect general recording capability — custom `AudioSource`
implementations (e.g. for React Native) bypass this check entirely and can
record regardless of the result.
**Returns**
[`AudioSupportResult`](types#audiosupportresult)
**Platform**
browser
***
## useAudioLevel()
```ts
function useAudioLevel(options?): UseAudioLevelReturn;
```
**Parameters**
| Parameter | Type |
| ---------- | ---------------------------------------------------- |
| `options?` | [`UseAudioLevelOptions`](types#useaudioleveloptions) |
**Returns**
[`UseAudioLevelReturn`](types#useaudiolevelreturn)
***
## useMicrophonePermission()
```ts
function useMicrophonePermission(options?): MicrophonePermissionState;
```
**Parameters**
| Parameter | Type |
| ---------- | ------------------------------------------------------------------------ |
| `options?` | [`UseMicrophonePermissionOptions`](types#usemicrophonepermissionoptions) |
**Returns**
[`MicrophonePermissionState`](types#microphonepermissionstate)
***
## useRecording()
```ts
function useRecording(config): UseRecordingReturn;
```
**Parameters**
| Parameter | Type |
| --------- | ------------------------------------------------ |
| `config` | [`UseRecordingConfig`](types#userecordingconfig) |
**Returns**
[`UseRecordingReturn`](types#userecordingreturn)
***
## useSoniox()
```ts
function useSoniox(): SonioxClient;
```
Returns the `SonioxClient` instance provided by the nearest `SonioxProvider`
**Returns**
`SonioxClient`
**Throws**
Error if called outside a `SonioxProvider`
# Classes
URL: /stt/SDKs/web-SDK/reference/classes
Soniox Client SDK — Class Reference
## SonioxClient
Main entry point for the Soniox client SDK.
### Example
```typescript
const client = new SonioxClient({
api_key: async () => {
const res = await fetch('/api/get-temporary-key', { method: 'POST' });
return (await res.json()).api_key;
},
});
// High-level: record from microphone
const recording = client.realtime.record({ model: 'stt-rt-v4' });
recording.on('result', (r) => console.log(r.tokens));
await recording.stop();
// Low-level: direct session access
const session = client.realtime.stt({ model: 'stt-rt-v4' }, { api_key: key });
await session.connect();
```
### permissions
```ts
get permissions(): PermissionResolver | undefined;
```
Permission resolver, if configured.
Returns `undefined` if no resolver was provided (SSR-safe).
**Example**
```typescript
const mic = await client.permissions?.check('microphone');
if (mic?.status === 'denied') {
showSettingsMessage();
}
```
**Returns**
[`PermissionResolver`](types#permissionresolver) | `undefined`
### Constructor
```ts
new SonioxClient(options): SonioxClient;
```
**Parameters**
| Parameter | Type |
| --------- | -------------------------------------------------- |
| `options` | [`SonioxClientOptions`](types#sonioxclientoptions) |
**Returns**
`SonioxClient`
### Properties
| Property | Type | Description |
| ----------------- | --------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `realtime` | \{ `record`: (`options`) => [`Recording`](classes#recording); `stt`: (`config`, `options`) => `RealtimeSttSession`; } | Real-time API namespace |
| `realtime.record` | (`options`) => [`Recording`](classes#recording) | Start a high-level recording session. Returns synchronously so callers can attach event listeners before any async work (key fetch, mic access, connection) begins. |
| `realtime.stt` | (`config`, `options`) => `RealtimeSttSession` | Create a low-level STT session |
***
## Recording
High-level recording orchestrator
Manages the lifecycle of audio capture and real-time transcription:
1. Starts audio source immediately (buffers chunks)
2. Resolves the API key (from string or async function)
3. Connects to the Soniox WebSocket API
4. Drains buffered audio, then pipes live audio to the session
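Steps 1 and 4 imply a buffer-then-drain pattern: chunks captured before the WebSocket opens are queued, then flushed in order once the session connects. A minimal sketch of that pattern (not the SDK's internal class):

```typescript
class ChunkBuffer {
  private queue: Uint8Array[] = [];
  private sink: ((chunk: Uint8Array) => void) | null = null;

  // Called by the audio source for every captured chunk.
  push(chunk: Uint8Array): void {
    if (this.sink) this.sink(chunk);
    else this.queue.push(chunk);
  }

  // Called once the session is connected: drain the backlog, then go live.
  attach(sink: (chunk: Uint8Array) => void): void {
    this.sink = sink;
    for (const chunk of this.queue) sink(chunk);
    this.queue = [];
  }
}
```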
### Example
```typescript
const recording = client.realtime.record({ model: 'stt-rt-v4' });
recording.on('result', (r) => console.log(r.tokens));
recording.on('error', (e) => console.error(e));
// Later:
await recording.stop();
```
### state
```ts
get state(): RecordingState;
```
Current recording state
**Returns**
[`RecordingState`](types#recordingstate)
### cancel()
```ts
cancel(): void;
```
Immediately cancel recording without waiting for final results
**Returns**
`void`
***
### finalize()
```ts
finalize(options?): void;
```
Request the server to finalize current non-final tokens.
**Parameters**
| Parameter | Type |
| ------------------------------ | -------------------------------------- |
| `options?` | \{ `trailing_silence_ms?`: `number`; } |
| `options.trailing_silence_ms?` | `number` |
**Returns**
`void`
***
### off()
```ts
off(event, handler): this;
```
Remove an event handler
**Type Parameters**
| Type Parameter |
| -------------------------------------------------------------- |
| `E` *extends* keyof [`RecordingEvents`](types#recordingevents) |
**Parameters**
| Parameter | Type |
| --------- | ------------------------------------------------ |
| `event` | `E` |
| `handler` | [`RecordingEvents`](types#recordingevents)\[`E`] |
**Returns**
`this`
***
### on()
```ts
on(event, handler): this;
```
Register an event handler
**Type Parameters**
| Type Parameter |
| -------------------------------------------------------------- |
| `E` *extends* keyof [`RecordingEvents`](types#recordingevents) |
**Parameters**
| Parameter | Type |
| --------- | ------------------------------------------------ |
| `event` | `E` |
| `handler` | [`RecordingEvents`](types#recordingevents)\[`E`] |
**Returns**
`this`
***
### once()
```ts
once(event, handler): this;
```
Register a one-time event handler
**Type Parameters**
| Type Parameter |
| -------------------------------------------------------------- |
| `E` *extends* keyof [`RecordingEvents`](types#recordingevents) |
**Parameters**
| Parameter | Type |
| --------- | ------------------------------------------------ |
| `event` | `E` |
| `handler` | [`RecordingEvents`](types#recordingevents)\[`E`] |
**Returns**
`this`
***
### pause()
```ts
pause(): void;
```
Pause recording.
Pauses the audio source (stops microphone capture) and pauses the
session (activates automatic keepalive to prevent server disconnect).
**Returns**
`void`
***
### resume()
```ts
resume(): void;
```
Resume recording after pause.
Resumes the audio source and session. Audio capture and transmission
continue from where they left off.
**Returns**
`void`
***
### stop()
```ts
stop(): Promise<void>;
```
Gracefully stop recording.
Stops the audio source and waits for the server to process all
buffered audio and return final results.
**Returns**
`Promise`\<`void`>
Promise that resolves when the server acknowledges completion
***
## MicrophoneSource
Browser microphone audio source.
Uses `navigator.mediaDevices.getUserMedia` to capture audio from the microphone
and `MediaRecorder` to encode it into chunks.
### Example
```typescript
const source = new MicrophoneSource();
await source.start({
onData: (chunk) => session.sendAudio(chunk),
onError: (err) => console.error(err),
});
// Later:
source.stop();
```
### Constructor
```ts
new MicrophoneSource(options): MicrophoneSource;
```
**Parameters**
| Parameter | Type |
| --------- | ---------------------------------------------------------- |
| `options` | [`MicrophoneSourceOptions`](types#microphonesourceoptions) |
**Returns**
`MicrophoneSource`
### pause()
```ts
pause(): void;
```
Pause audio capture.
**Returns**
`void`
***
### resume()
```ts
resume(): void;
```
Resume audio capture.
**Returns**
`void`
***
### start()
```ts
start(handlers): Promise<void>;
```
Request microphone access and start recording.
**Parameters**
| Parameter | Type |
| ---------- | -------------------------------------------------- |
| `handlers` | [`AudioSourceHandlers`](types#audiosourcehandlers) |
**Returns**
`Promise`\<`void`>
**Throws**
AudioUnavailableError if getUserMedia or MediaRecorder is not supported
**Throws**
AudioPermissionError if microphone access is denied
**Throws**
AudioDeviceError if no microphone is found
***
### stop()
```ts
stop(): void;
```
Stop recording and release all resources.
**Returns**
`void`
***
## BrowserPermissionResolver
Browser permission resolver for checking and requesting microphone access.
### Example
```typescript
const resolver = new BrowserPermissionResolver();
const mic = await resolver.check('microphone');
if (mic.status === 'prompt') {
const result = await resolver.request('microphone');
if (result.status === 'denied') {
showDeniedMessage();
}
}
```
### Constructor
```ts
new BrowserPermissionResolver(): BrowserPermissionResolver;
```
**Returns**
`BrowserPermissionResolver`
### check()
```ts
check(permission): Promise<PermissionResult>;
```
Check current microphone permission status without prompting the user.
**Parameters**
| Parameter | Type |
| ------------ | -------------- |
| `permission` | `"microphone"` |
**Returns**
`Promise`\<[`PermissionResult`](types#permissionresult)>
***
### request()
```ts
request(permission): Promise<PermissionResult>;
```
Request microphone permission from the user.
This may show a browser permission prompt.
**Parameters**
| Parameter | Type |
| ------------ | -------------- |
| `permission` | `"microphone"` |
**Returns**
`Promise`\<[`PermissionResult`](types#permissionresult)>
***
## AudioPermissionError
Thrown when microphone access is denied by the user or blocked by the browser.
Maps to `getUserMedia` `NotAllowedError` DOMException.
### Extends
* `SonioxError`
### toJSON()
```ts
toJSON(): Record<string, unknown>;
```
Converts to a plain object for logging/serialization.
**Returns**
`Record`\<`string`, `unknown`>
**Inherited from**
```ts
SonioxError.toJSON
```
***
### toString()
```ts
toString(): string;
```
Creates a human-readable string representation.
**Returns**
`string`
**Inherited from**
```ts
SonioxError.toString
```
### Properties
| Property | Type | Description |
| ------------ | --------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `cause` | `unknown` | The underlying error that caused this error, if any. |
| `code` | \| `SonioxErrorCode` \| `string` & \{ } | Error code describing the type of error. Typed as `string` at the base level to allow subclasses (e.g. HTTP errors) to use their own error code unions. |
| `statusCode` | `number` \| `undefined` | HTTP status code when applicable (e.g., 401 for auth errors, 500 for server errors). |
***
## AudioDeviceError
Thrown when no audio input device is found.
Maps to `getUserMedia` `NotFoundError` DOMException.
### Extends
* `SonioxError`
### toJSON()
```ts
toJSON(): Record<string, unknown>;
```
Converts to a plain object for logging/serialization.
**Returns**
`Record`\<`string`, `unknown`>
**Inherited from**
```ts
SonioxError.toJSON
```
***
### toString()
```ts
toString(): string;
```
Creates a human-readable string representation.
**Returns**
`string`
**Inherited from**
```ts
SonioxError.toString
```
### Properties
| Property | Type | Description |
| ------------ | --------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `cause` | `unknown` | The underlying error that caused this error, if any. |
| `code` | \| `SonioxErrorCode` \| `string` & \{ } | Error code describing the type of error. Typed as `string` at the base level to allow subclasses (e.g. HTTP errors) to use their own error code unions. |
| `statusCode` | `number` \| `undefined` | HTTP status code when applicable (e.g., 401 for auth errors, 500 for server errors). |
***
## AudioUnavailableError
Thrown when audio capture is not supported in the current environment.
For example, when `getUserMedia` or `MediaRecorder` is not available.
### Extends
* `SonioxError`
### toJSON()
```ts
toJSON(): Record<string, unknown>;
```
Converts to a plain object for logging/serialization.
**Returns**
`Record`\<`string`, `unknown`>
**Inherited from**
```ts
SonioxError.toJSON
```
***
### toString()
```ts
toString(): string;
```
Creates a human-readable string representation.
**Returns**
`string`
**Inherited from**
```ts
SonioxError.toString
```
### Properties
| Property | Type | Description |
| ------------ | --------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `cause` | `unknown` | The underlying error that caused this error, if any. |
| `code` | \| `SonioxErrorCode` \| `string` & \{ } | Error code describing the type of error. Typed as `string` at the base level to allow subclasses (e.g. HTTP errors) to use their own error code unions. |
| `statusCode` | `number` \| `undefined` | HTTP status code when applicable (e.g., 401 for auth errors, 500 for server errors). |
# Full Web SDK reference
URL: /stt/SDKs/web-SDK/reference
Full SDK reference for the Web SDK
## Client
### Available client methods
| Method | Description |
| --------------------------------------------------------------------------- | --------------------------- |
| [`client.realtime.record()`](/stt/SDKs/web-SDK/reference/classes#recording) | Create a recording instance |
## Recording
### Available recording methods
| Method | Description |
| ---------------------------------------------------------------------- | ------------------------------------------------------- |
| [`recording.finalize()`](/stt/SDKs/web-SDK/reference/classes#finalize) | Request the server to finalize current non-final tokens |
| [`recording.on()`](/stt/SDKs/web-SDK/reference/classes#on) | Register an event handler |
| [`recording.once()`](/stt/SDKs/web-SDK/reference/classes#once) | Register a one-time event handler |
| [`recording.off()`](/stt/SDKs/web-SDK/reference/classes#off) | Remove an event handler |
| [`recording.pause()`](/stt/SDKs/web-SDK/reference/classes#pause) | Pause recording |
| [`recording.resume()`](/stt/SDKs/web-SDK/reference/classes#resume) | Resume recording |
| [`recording.stop()`](/stt/SDKs/web-SDK/reference/classes#stop) | Stop recording |
| [`recording.cancel()`](/stt/SDKs/web-SDK/reference/classes#cancel) | Cancel recording |
## AudioSource
### Available audio source methods
| Method | Description |
| ------------------------------------------------------------------ | ---------------------- |
| [`source.start()`](/stt/SDKs/web-SDK/reference/types#audiosource) | Start capturing audio |
| [`source.stop()`](/stt/SDKs/web-SDK/reference/types#audiosource) | Stop capturing audio |
| [`source.pause()`](/stt/SDKs/web-SDK/reference/types#audiosource) | Pause capturing audio |
| [`source.resume()`](/stt/SDKs/web-SDK/reference/types#audiosource) | Resume capturing audio |
## PermissionResolver
### Available browser permission resolver methods
| Method | Description |
| ---------------------------------------------------------------------------- | -------------------------------- |
| [`resolver.check()`](/stt/SDKs/web-SDK/reference/types#permissionresolver) | Check current permission status |
| [`resolver.request()`](/stt/SDKs/web-SDK/reference/types#permissionresolver) | Request permission from the user |
# Types
URL: /stt/SDKs/web-SDK/reference/types
Soniox Client SDK — Types Reference
## ApiKeyConfig
```ts
type ApiKeyConfig = string | (() => Promise<string>);
```
API key configuration.
* `string` - A pre-fetched temporary API key (e.g., injected from SSR)
* `() => Promise<string>` - An async function that fetches a fresh temporary key
from your backend. Called once per recording session.
**Example**
```typescript
// Static key (for demos or SSR-injected keys)
const client = new SonioxClient({ api_key: 'temp:...' });
// Async function (recommended for production)
const client = new SonioxClient({
api_key: async () => {
const res = await fetch('/api/get-temporary-key', { method: 'POST' });
const { api_key } = await res.json();
return api_key;
},
});
```
Note: If you use Node.js, you can use the `SonioxNodeClient` to fetch a temporary API key via `client.auth.createTemporaryKey()`.
***
## AudioErrorCode
```ts
type AudioErrorCode = "permission_denied" | "device_not_found" | "audio_unavailable";
```
Error codes for audio-related errors.
***
## AudioSourceHandlers
```ts
type AudioSourceHandlers = {
onData: (chunk) => void;
onError: (error) => void;
onMuted?: () => void;
onUnmuted?: () => void;
};
```
Callbacks for receiving audio data and errors from an AudioSource.
**Properties**
| Property | Type | Description |
| ------------ | ------------------- | ---------------------------------------------------------------------------------- |
| `onData` | (`chunk`) => `void` | Called when an audio chunk is available. |
| `onError` | (`error`) => `void` | Called when a runtime error occurs during audio capture (after start). |
| `onMuted?` | () => `void` | Called when the audio source is muted externally (e.g. OS-level or hardware mute). |
| `onUnmuted?` | () => `void` | Called when the audio source is unmuted after an external mute. |
***
## MicrophoneSourceOptions
```ts
type MicrophoneSourceOptions = {
constraints?: MediaTrackConstraints;
recorderOptions?: MediaRecorderOptions;
timesliceMs?: number;
};
```
Options for MicrophoneSource.
**Properties**
| Property | Type | Description |
| ------------------ | ----------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `constraints?` | `MediaTrackConstraints` | MediaTrackConstraints for the audio track. **Default** `{ echoCancellation: false, noiseSuppression: false, autoGainControl: false, channelCount: 1, sampleRate: 44100 }` |
| `recorderOptions?` | `MediaRecorderOptions` | MediaRecorder options. **See** [https://developer.mozilla.org/en-US/docs/Web/API/MediaRecorder/MediaRecorder](https://developer.mozilla.org/en-US/docs/Web/API/MediaRecorder/MediaRecorder) |
| `timesliceMs?` | `number` | Time interval in milliseconds between audio data chunks. **Default** `60` |
***
## PermissionResult
```ts
type PermissionResult = {
can_request: boolean;
status: PermissionStatus;
};
```
Result of a permission check or request.
**Properties**
| Property | Type | Description |
| ------------- | -------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `can_request` | `boolean` | Whether the user can be prompted again. `false` means permanently denied (e.g., browser "Block" or iOS settings). Useful for showing "go to settings" instructions. |
| `status` | [`PermissionStatus`](types#permissionstatus) | Current permission status. |
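The two fields work together when deciding what to show the user. As an illustration (this helper is not part of the SDK; the type shapes mirror the definitions on this page), a function that maps a `PermissionResult` to the next UI step might look like:

```typescript
type PermissionStatus = "granted" | "denied" | "prompt" | "unavailable";

type PermissionResult = {
  can_request: boolean;
  status: PermissionStatus;
};

type NextAction =
  | "start_recording"
  | "show_prompt"
  | "show_settings_help"
  | "show_unsupported";

function nextAction(result: PermissionResult): NextAction {
  switch (result.status) {
    case "granted":
      return "start_recording";
    case "prompt":
      // Safe to call resolver.request() to show the permission prompt.
      return "show_prompt";
    case "denied":
      // can_request === false means permanently denied (e.g. browser "Block"),
      // so prompting again is pointless; point the user to settings instead.
      return result.can_request ? "show_prompt" : "show_settings_help";
    case "unavailable":
      return "show_unsupported";
  }
}
```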
***
## PermissionStatus
```ts
type PermissionStatus = "granted" | "denied" | "prompt" | "unavailable";
```
Unified permission status across all platforms.
***
## PermissionType
```ts
type PermissionType = "microphone";
```
Permission types supported by the resolver.
***
## RecordOptions
```ts
type RecordOptions = SttSessionConfig & {
buffer_queue_size?: number;
session_options?: SttSessionOptions;
signal?: AbortSignal;
source?: AudioSource;
};
```
Options for creating a recording.
**Type Declaration**
| Name | Type | Description |
| -------------------- | ---------------------------------- | -------------------------------------------------------------------------------------------- |
| `buffer_queue_size?` | `number` | Maximum number of audio chunks to buffer while waiting for key/connection **Default** `1000` |
| `session_options?` | `SttSessionOptions` | SDK-level session options (signal, etc.) |
| `signal?` | `AbortSignal` | AbortSignal for cancellation |
| `source?` | [`AudioSource`](types#audiosource) | Audio source to use. Defaults to MicrophoneSource if not provided. |
***
## RecordingEvents
```ts
type RecordingEvents = {
connected: () => void;
endpoint: () => void;
error: (error) => void;
finalized: () => void;
finished: () => void;
result: (result) => void;
source_muted: () => void;
source_unmuted: () => void;
state_change: (update) => void;
token: (token) => void;
};
```
Events emitted by a Recording instance.
**Properties**
| Property | Type | Description |
| ---------------- | -------------------- | ------------------------------------------------------------------- |
| `connected` | () => `void` | WebSocket connected and ready. |
| `endpoint` | () => `void` | Endpoint detected (speaker finished talking). |
| `error` | (`error`) => `void` | Error occurred during recording. |
| `finalized` | () => `void` | Finalization complete. |
| `finished` | () => `void` | Recording finished (server acknowledged end of stream). |
| `result` | (`result`) => `void` | Parsed result received from the server. |
| `source_muted` | () => `void` | Audio source was muted externally (e.g. OS-level or hardware mute). |
| `source_unmuted` | () => `void` | Audio source was unmuted after an external mute. |
| `state_change` | (`update`) => `void` | Recording state transition. |
| `token` | (`token`) => `void` | Individual token received. |
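The `on`/`once`/`off` methods follow the conventional typed-emitter pattern. The sketch below is illustrative only (a minimal emitter, not the SDK's internal implementation) and shows why a `once` handler fires for the first matching event and is then removed:

```typescript
// Two example event shapes, loosely modeled on RecordingEvents.
type Events = {
  token: (token: string) => void;
  finished: () => void;
};

class MiniEmitter {
  private handlers = new Map<keyof Events, Set<(...args: any[]) => void>>();

  on<E extends keyof Events>(event: E, handler: Events[E]): this {
    if (!this.handlers.has(event)) this.handlers.set(event, new Set());
    this.handlers.get(event)!.add(handler);
    return this;
  }

  once<E extends keyof Events>(event: E, handler: Events[E]): this {
    // Wrap the handler so it unregisters itself before running.
    const wrapper = ((...args: any[]) => {
      this.off(event, wrapper);
      (handler as (...a: any[]) => void)(...args);
    }) as Events[E];
    return this.on(event, wrapper);
  }

  off<E extends keyof Events>(event: E, handler: Events[E]): this {
    this.handlers.get(event)?.delete(handler);
    return this;
  }

  emit<E extends keyof Events>(event: E, ...args: Parameters<Events[E]>): void {
    // Iterate over a copy so handlers removed mid-emit don't break iteration.
    for (const h of [...(this.handlers.get(event) ?? [])]) h(...args);
  }
}

const emitter = new MiniEmitter();
const tokens: string[] = [];
emitter.on("token", (t) => tokens.push(t));
emitter.once("token", (t) => tokens.push(t.toUpperCase()));
emitter.emit("token", "hello"); // both handlers run
emitter.emit("token", "world"); // only the persistent handler runs
```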
***
## RecordingState
```ts
type RecordingState =
| "idle"
| "starting"
| "connecting"
| "recording"
| "paused"
| "stopping"
| "stopped"
| "error"
| "canceled";
```
Unified recording lifecycle states.
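When driving UI from `state_change` events, it is often useful to distinguish states a recording can leave from ones it cannot. Assuming `"stopped"`, `"error"`, and `"canceled"` are the terminal states (an interpretation of the list above, not an SDK guarantee), a small guard could look like:

```typescript
type RecordingState =
  | "idle"
  | "starting"
  | "connecting"
  | "recording"
  | "paused"
  | "stopping"
  | "stopped"
  | "error"
  | "canceled";

// Assumption: these three states end the recording lifecycle.
const TERMINAL_STATES: ReadonlySet<RecordingState> = new Set([
  "stopped",
  "error",
  "canceled",
]);

function isTerminal(state: RecordingState): boolean {
  return TERMINAL_STATES.has(state);
}
```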
***
## SonioxClientOptions
```ts
type SonioxClientOptions = {
api_key: ApiKeyConfig;
buffer_queue_size?: number;
default_session_options?: SttSessionOptions;
permissions?: PermissionResolver;
ws_base_url?: string;
};
```
Options for creating a SonioxClient instance.
**Properties**
| Property | Type | Description |
| -------------------------- | ------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `api_key` | [`ApiKeyConfig`](types#apikeyconfig) | API key configuration. - `string` - A pre-fetched temporary API key (e.g., injected from SSR) - `() => Promise<string>` - Async function that fetches a fresh key from your backend |
| `buffer_queue_size?` | `number` | Default maximum number of audio chunks to buffer while waiting for key/connection. Can be overridden per-recording. **Default** `1000` |
| `default_session_options?` | `SttSessionOptions` | Default session options applied to all sessions. Can be overridden per-recording. |
| `permissions?` | [`PermissionResolver`](types#permissionresolver) | Optional permission resolver for pre-flight microphone permission checks. Not set by default (SSR-safe, RN-safe). **Example** `import { BrowserPermissionResolver } from '@soniox/client'; const client = new SonioxClient({ api_key: fetchKey, permissions: new BrowserPermissionResolver(), });` |
| `ws_base_url?` | `string` | WebSocket URL for real-time connections. **Default** `'wss://stt-rt.soniox.com/transcribe-websocket'` |
***
## SttOptions
```ts
type SttOptions = {
api_key: string;
session_options?: SttSessionOptions;
};
```
Options for creating a low-level STT session.
**Properties**
| Property | Type | Description |
| ------------------ | ------------------- | ---------------------------------------- |
| `api_key` | `string` | Resolved API key string (temporary key). |
| `session_options?` | `SttSessionOptions` | Session options (signal, etc.). |
***
## AudioSource
Platform-agnostic audio source interface.
Implementations must:
* Begin capturing audio in `start()` and deliver chunks via `handlers.onData`
* Stop all capture and release resources in `stop()`
* Throw typed errors from `start()` if capture cannot begin (e.g., permission denied)
**Example**
```typescript
// Built-in browser source
const source = new MicrophoneSource();
// Custom source (e.g., React Native)
class MyAudioSource implements AudioSource {
async start(handlers: AudioSourceHandlers) { ... }
stop() { ... }
}
```
**Methods**
**pause()?**
```ts
optional pause(): void;
```
Pause audio capture (optional).
When paused, no data should be delivered via onData.
**Returns**
`void`
***
**resume()?**
```ts
optional resume(): void;
```
Resume audio capture after pause (optional).
**Returns**
`void`
***
**start()**
```ts
start(handlers): Promise<void>;
```
Start capturing audio.
**Parameters**
| Parameter | Type | Description |
| ---------- | -------------------------------------------------- | ----------------------------------- |
| `handlers` | [`AudioSourceHandlers`](types#audiosourcehandlers) | Callbacks for audio data and errors |
**Returns**
`Promise`\<`void`>
**Throws**
AudioPermissionError if microphone access is denied
**Throws**
AudioDeviceError if no audio device is found
**Throws**
AudioUnavailableError if audio capture is not supported
***
**stop()**
```ts
stop(): void;
```
Stop capturing audio and release all resources.
Safe to call multiple times.
**Returns**
`void`
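To make the contract concrete, here is a hedged sketch of a custom source that satisfies the required `start()`/`stop()` pair by delivering a fixed number of silent chunks synchronously. The class name, chunk size, and chunk count are illustrative; a real source would capture audio asynchronously and throw the typed errors described above when capture cannot begin:

```typescript
type AudioSourceHandlers = {
  onData: (chunk: Uint8Array) => void;
  onError: (error: unknown) => void;
};

class SilentSource {
  private stopped = false;

  async start(handlers: AudioSourceHandlers): Promise<void> {
    // A real implementation would throw AudioUnavailableError or
    // AudioPermissionError here if capture cannot begin.
    for (let i = 0; i < 3 && !this.stopped; i++) {
      handlers.onData(new Uint8Array(320)); // 320 bytes of "silence" per chunk
    }
  }

  stop(): void {
    this.stopped = true; // safe to call multiple times
  }
}

const chunks: Uint8Array[] = [];
const source = new SilentSource();
// No await inside start(), so the chunks are delivered synchronously here.
void source.start({ onData: (c) => chunks.push(c), onError: () => {} });
```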
***
## PermissionResolver
Platform-agnostic permission resolver.
Implementations handle platform-specific permission APIs:
* Browser: `navigator.permissions.query` + `getUserMedia`
* React Native: `expo-av` or `react-native-permissions`
**Example**
```typescript
// Check before recording
const mic = await resolver.check('microphone');
if (mic.status === 'denied' && !mic.can_request) {
showGoToSettingsMessage();
}
```
**Methods**
**check()**
```ts
check(permission): Promise<PermissionResult>;
```
Check current permission status without prompting the user.
**Parameters**
| Parameter | Type |
| ------------ | -------------- |
| `permission` | `"microphone"` |
**Returns**
`Promise`\<[`PermissionResult`](types#permissionresult)>
***
**request()**
```ts
request(permission): Promise<PermissionResult>;
```
Request permission from the user (may show a system prompt).
On platforms where status is already 'granted', this is a no-op.
**Parameters**
| Parameter | Type |
| ------------ | -------------- |
| `permission` | `"microphone"` |
**Returns**
`Promise`\<[`PermissionResult`](types#permissionresult)>
***
## resolveApiKey()
```ts
function resolveApiKey(config): Promise<string>;
```
Resolves an ApiKeyConfig to a plain API key string.
**Parameters**
| Parameter | Type | Description |
| --------- | ------------------------------------ | ------------------------- |
| `config` | [`ApiKeyConfig`](types#apikeyconfig) | The API key configuration |
**Returns**
`Promise`\<`string`>
The resolved API key string
**Throws**
If the function rejects or returns a non-string value
# Create temporary API key
URL: /stt/api-reference/auth/create_temporary_api_key
Creates a short-lived API key for specific temporary use cases. The key will automatically expire after the specified duration.
## Create temporary API key
**Endpoint:** `POST /v1/auth/temporary-api-key`
Creates a short-lived API key for specific temporary use cases. The key will automatically expire after the specified duration.
### Request Body
Content-Type: `application/json` (Required)
Example (JSON):
```json
{
"client_reference_id": "reference_id",
"expires_in_seconds": 1800,
"usage_type": "transcribe_websocket"
}
```
Schema (YAML Structural Definition):
```yaml
properties:
usage_type:
description: Intended usage of the temporary API key.
enum:
- transcribe_websocket
type: string
expires_in_seconds:
description: Duration in seconds until the temporary API key expires.
maximum: 3600
minimum: 1
type: integer
client_reference_id:
anyOf:
- maxLength: 256
type: string
- type: 'null'
description: Optional tracking identifier string. Does not need to be unique.
required:
- usage_type
- expires_in_seconds
type: object
```
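The constraints above (the `usage_type` enum, the 1-3600 range, the 256-character limit) can be pre-checked client-side before sending the request. The helper below is a sketch of such a validator, not an official SDK function:

```typescript
type TemporaryKeyRequest = {
  usage_type: string;
  expires_in_seconds: number;
  client_reference_id?: string | null;
};

// Returns a list of violations; an empty list means the body passes the
// schema constraints stated above.
function validateTemporaryKeyRequest(body: TemporaryKeyRequest): string[] {
  const errors: string[] = [];
  if (body.usage_type !== "transcribe_websocket") {
    errors.push("usage_type must be 'transcribe_websocket'");
  }
  if (
    !Number.isInteger(body.expires_in_seconds) ||
    body.expires_in_seconds < 1 ||
    body.expires_in_seconds > 3600
  ) {
    errors.push("expires_in_seconds must be an integer between 1 and 3600");
  }
  if (body.client_reference_id != null && body.client_reference_id.length > 256) {
    errors.push("client_reference_id must be at most 256 characters");
  }
  return errors;
}
```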
### Responses
* **201**: Created temporary API key.
Example (JSON):
```json
{
"api_key": "temp:WYJ67RBEFUWQXXPKYPD2UGXKWB",
"expires_at": "2025-02-22T22:47:37.150Z"
}
```
Schema (YAML Structural Definition):
```yaml
properties:
api_key:
description: Created temporary API key.
type: string
expires_at:
description: UTC timestamp indicating when generated temporary API key will expire.
format: date-time
type: string
required:
- api_key
- expires_at
type: object
```
* **400**: Invalid request.
Error types:
* `invalid_request`: Invalid request.
Schema (YAML Structural Definition):
```yaml
properties:
status_code:
type: integer
error_type:
type: string
message:
type: string
validation_errors:
items:
properties:
error_type:
type: string
location:
type: string
message:
type: string
required:
- error_type
- location
- message
type: object
type: array
request_id:
type: string
required:
- status_code
- error_type
- message
- validation_errors
- request_id
type: object
```
* **401**: Authentication error.
Schema (YAML Structural Definition):
```yaml
properties:
status_code:
type: integer
error_type:
type: string
message:
type: string
validation_errors:
items:
properties:
error_type:
type: string
location:
type: string
message:
type: string
required:
- error_type
- location
- message
type: object
type: array
request_id:
type: string
required:
- status_code
- error_type
- message
- validation_errors
- request_id
type: object
```
* **500**: Internal server error.
Schema (YAML Structural Definition):
```yaml
properties:
status_code:
type: integer
error_type:
type: string
message:
type: string
validation_errors:
items:
properties:
error_type:
type: string
location:
type: string
message:
type: string
required:
- error_type
- location
- message
type: object
type: array
request_id:
type: string
required:
- status_code
- error_type
- message
- validation_errors
- request_id
type: object
```
# Delete file
URL: /stt/api-reference/files/delete_file
Permanently deletes specified file.
## Delete file
**Endpoint:** `DELETE /v1/files/{file_id}`
Permanently deletes specified file.
### Parameters
* `file_id` (path) (Required)
### Responses
* **204**: File deleted.
* **401**: Authentication error.
Schema (YAML Structural Definition):
```yaml
properties:
status_code:
type: integer
error_type:
type: string
message:
type: string
validation_errors:
items:
properties:
error_type:
type: string
location:
type: string
message:
type: string
required:
- error_type
- location
- message
type: object
type: array
request_id:
type: string
required:
- status_code
- error_type
- message
- validation_errors
- request_id
type: object
```
* **404**: File not found.
Error types:
* `file_not_found`: File could not be found.
Schema (YAML Structural Definition):
```yaml
properties:
status_code:
type: integer
error_type:
type: string
message:
type: string
validation_errors:
items:
properties:
error_type:
type: string
location:
type: string
message:
type: string
required:
- error_type
- location
- message
type: object
type: array
request_id:
type: string
required:
- status_code
- error_type
- message
- validation_errors
- request_id
type: object
```
* **500**: Internal server error.
Schema (YAML Structural Definition):
```yaml
properties:
status_code:
type: integer
error_type:
type: string
message:
type: string
validation_errors:
items:
properties:
error_type:
type: string
location:
type: string
message:
type: string
required:
- error_type
- location
- message
type: object
type: array
request_id:
type: string
required:
- status_code
- error_type
- message
- validation_errors
- request_id
type: object
```
# Get file
URL: /stt/api-reference/files/get_file
Retrieve metadata for an uploaded file.
## Get file
**Endpoint:** `GET /v1/files/{file_id}`
Retrieve metadata for an uploaded file.
### Parameters
* `file_id` (path) (Required)
### Responses
* **200**: File metadata.
Example (JSON):
```json
{
"client_reference_id": "some_internal_id",
"created_at": "2024-11-26T00:00:00Z",
"filename": "example.mp3",
"id": "84c32fc6-4fb5-4e7a-b656-b5ec70493753",
"size": 123456
}
```
Schema (YAML Structural Definition):
```yaml
description: File metadata.
properties:
id:
description: Unique identifier of the file.
format: uuid
type: string
filename:
description: Name of the file.
type: string
size:
description: Size of the file in bytes.
type: integer
created_at:
description: UTC timestamp indicating when the file was uploaded.
format: date-time
type: string
client_reference_id:
anyOf:
- type: string
- type: 'null'
description: Tracking identifier string.
required:
- id
- filename
- size
- created_at
type: object
```
* **401**: Authentication error.
Schema (YAML Structural Definition):
```yaml
properties:
status_code:
type: integer
error_type:
type: string
message:
type: string
validation_errors:
items:
properties:
error_type:
type: string
location:
type: string
message:
type: string
required:
- error_type
- location
- message
type: object
type: array
request_id:
type: string
required:
- status_code
- error_type
- message
- validation_errors
- request_id
type: object
```
* **404**: File not found.
Error types:
* `file_not_found`: File could not be found.
Schema (YAML Structural Definition):
```yaml
properties:
status_code:
type: integer
error_type:
type: string
message:
type: string
validation_errors:
items:
properties:
error_type:
type: string
location:
type: string
message:
type: string
required:
- error_type
- location
- message
type: object
type: array
request_id:
type: string
required:
- status_code
- error_type
- message
- validation_errors
- request_id
type: object
```
* **500**: Internal server error.
Schema (YAML Structural Definition):
```yaml
properties:
status_code:
type: integer
error_type:
type: string
message:
type: string
validation_errors:
items:
properties:
error_type:
type: string
location:
type: string
message:
type: string
required:
- error_type
- location
- message
type: object
type: array
request_id:
type: string
required:
- status_code
- error_type
- message
- validation_errors
- request_id
type: object
```
# Get files
URL: /stt/api-reference/files/get_files
Retrieves list of uploaded files.
## Get files
**Endpoint:** `GET /v1/files`
Retrieves list of uploaded files.
### Parameters
* `limit` (query): Maximum number of files to return.
* `cursor` (query): Pagination cursor for the next page of results.
### Responses
* **200**: List of files.
Example (JSON):
```json
{
"files": [
{
"created_at": "2024-11-26T00:00:00Z",
"filename": "example.mp3",
"id": "84c32fc6-4fb5-4e7a-b656-b5ec70493753",
"size": 123456
}
],
"next_page_cursor": "cursor_or_null"
}
```
Schema (YAML Structural Definition):
```yaml
description: A list of files.
properties:
files:
description: List of uploaded files.
items:
description: File metadata.
example:
client_reference_id: some_internal_id
created_at: '2024-11-26T00:00:00Z'
filename: example.mp3
id: 84c32fc6-4fb5-4e7a-b656-b5ec70493753
size: 123456
properties:
id:
description: Unique identifier of the file.
format: uuid
type: string
filename:
description: Name of the file.
type: string
size:
description: Size of the file in bytes.
type: integer
created_at:
description: UTC timestamp indicating when the file was uploaded.
format: date-time
type: string
client_reference_id:
anyOf:
- type: string
- type: 'null'
description: Tracking identifier string.
required:
- id
- filename
- size
- created_at
type: object
type: array
next_page_cursor:
anyOf:
- type: string
- type: 'null'
description: >-
A pagination token that references the next page of results. When more
data is available, this field contains a value to pass in the cursor
parameter of a subsequent request. When null, no additional results are
available.
required:
- files
type: object
```
* **400**: Invalid request.
Error types:
* `invalid_cursor`: Invalid cursor parameter.
Schema (YAML Structural Definition):
```yaml
properties:
status_code:
type: integer
error_type:
type: string
message:
type: string
validation_errors:
items:
properties:
error_type:
type: string
location:
type: string
message:
type: string
required:
- error_type
- location
- message
type: object
type: array
request_id:
type: string
required:
- status_code
- error_type
- message
- validation_errors
- request_id
type: object
```
* **401**: Authentication error.
Schema (YAML Structural Definition):
```yaml
properties:
status_code:
type: integer
error_type:
type: string
message:
type: string
validation_errors:
items:
properties:
error_type:
type: string
location:
type: string
message:
type: string
required:
- error_type
- location
- message
type: object
type: array
request_id:
type: string
required:
- status_code
- error_type
- message
- validation_errors
- request_id
type: object
```
* **500**: Internal server error.
Schema (YAML Structural Definition):
```yaml
properties:
status_code:
type: integer
error_type:
type: string
message:
type: string
validation_errors:
items:
properties:
error_type:
type: string
location:
type: string
message:
type: string
required:
- error_type
- location
- message
type: object
type: array
request_id:
type: string
required:
- status_code
- error_type
- message
- validation_errors
- request_id
type: object
```
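The cursor-following logic implied by `next_page_cursor` can be sketched as below. The page fetcher is injected and synchronous for illustration (a real one would `await` a `GET /v1/files?cursor=...` call); a `null` or absent `next_page_cursor` ends the loop:

```typescript
type FileMetadata = {
  id: string;
  filename: string;
  size: number;
  created_at: string;
};

type FilesPage = {
  files: FileMetadata[];
  next_page_cursor?: string | null;
};

function listAllFiles(fetchPage: (cursor?: string) => FilesPage): FileMetadata[] {
  const all: FileMetadata[] = [];
  let cursor: string | undefined;
  do {
    const page = fetchPage(cursor);
    all.push(...page.files);
    cursor = page.next_page_cursor ?? undefined; // null means no more results
  } while (cursor !== undefined);
  return all;
}

// Fake two-page response for demonstration.
const pages: { [key: string]: FilesPage } = {
  first: {
    files: [{ id: "1", filename: "a.mp3", size: 10, created_at: "2024-11-26T00:00:00Z" }],
    next_page_cursor: "c2",
  },
  c2: {
    files: [{ id: "2", filename: "b.mp3", size: 20, created_at: "2024-11-26T00:00:00Z" }],
    next_page_cursor: null,
  },
};
const demoFiles = listAllFiles((cursor) => pages[cursor ?? "first"]);
```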
# Upload file
URL: /stt/api-reference/files/upload_file
Uploads a new file.
## Upload file
**Endpoint:** `POST /v1/files`
Uploads a new file.
### Request Body
Content-Type: `multipart/form-data` (Required)
Schema (YAML Structural Definition):
```yaml
type: object
properties:
client_reference_id:
anyOf:
- maxLength: 256
type: string
- type: 'null'
description: Optional tracking identifier string. Does not need to be unique.
file:
description: >-
The file to upload. Original file name will be used unless a custom
filename is provided.
format: binary
type: string
required:
- file
```
### Responses
* **201**: Uploaded file.
Example (JSON):
```json
{
"client_reference_id": "some_internal_id",
"created_at": "2024-11-26T00:00:00Z",
"filename": "example.mp3",
"id": "84c32fc6-4fb5-4e7a-b656-b5ec70493753",
"size": 123456
}
```
Schema (YAML Structural Definition):
```yaml
description: File metadata.
properties:
id:
description: Unique identifier of the file.
format: uuid
type: string
filename:
description: Name of the file.
type: string
size:
description: Size of the file in bytes.
type: integer
created_at:
description: UTC timestamp indicating when the file was uploaded.
format: date-time
type: string
client_reference_id:
anyOf:
- type: string
- type: 'null'
description: Tracking identifier string.
required:
- id
- filename
- size
- created_at
type: object
```
* **400**: Invalid request.
Error types:
* `invalid_request`:
* Invalid request.
* Exceeded maximum file size (maximum is 1073741824 bytes).
Schema (YAML Structural Definition):
```yaml
properties:
status_code:
type: integer
error_type:
type: string
message:
type: string
validation_errors:
items:
properties:
error_type:
type: string
location:
type: string
message:
type: string
required:
- error_type
- location
- message
type: object
type: array
request_id:
type: string
required:
- status_code
- error_type
- message
- validation_errors
- request_id
type: object
```
* **401**: Authentication error.
Schema (YAML Structural Definition):
```yaml
properties:
status_code:
type: integer
error_type:
type: string
message:
type: string
validation_errors:
items:
properties:
error_type:
type: string
location:
type: string
message:
type: string
required:
- error_type
- location
- message
type: object
type: array
request_id:
type: string
required:
- status_code
- error_type
- message
- validation_errors
- request_id
type: object
```
* **500**: Internal server error.
Schema (YAML Structural Definition):
```yaml
properties:
status_code:
type: integer
error_type:
type: string
message:
type: string
validation_errors:
items:
properties:
error_type:
type: string
location:
type: string
message:
type: string
required:
- error_type
- location
- message
type: object
type: array
request_id:
type: string
required:
- status_code
- error_type
- message
- validation_errors
- request_id
type: object
```
# Get models
URL: /stt/api-reference/models/get_models
Retrieves a list of available models and their attributes.
## Get models
**Endpoint:** `GET /v1/models`
Retrieves a list of available models and their attributes.
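For illustration, here is a minimal Python sketch that calls this endpoint and selects a model by capability. The base URL and bearer-token auth header are assumptions, and `pick_model` is our own helper; the response fields it reads (`transcription_mode`, `aliased_model_id`, `languages`) match the 200 response schema below.

```python
# Sketch: fetching the model list and choosing one by capability.
# API_BASE and the bearer auth header are assumptions; the response fields
# (transcription_mode, aliased_model_id, languages) match the schema below.
import json
import urllib.request

API_BASE = "https://api.soniox.com"  # assumed base URL

def get_models(api_key):
    req = urllib.request.Request(
        f"{API_BASE}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["models"]

def pick_model(models, mode, language_code):
    """Return the id of the first non-alias model matching mode and language."""
    for m in models:
        if m["transcription_mode"] != mode:
            continue
        if m["aliased_model_id"] is not None:  # skip aliases such as stt-rt-preview
            continue
        if any(lang["code"] == language_code for lang in m["languages"]):
            return m["id"]
    return None
```

Skipping entries whose `aliased_model_id` is set avoids selecting a preview alias when the concrete model it points to is also in the list.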
### Responses
* **200**: List of available models and their attributes.
Example (JSON):
```json
{
"models": [
{
"aliased_model_id": null,
"context_version": 2,
"id": "stt-rt-v4",
"languages": [
{
"code": "af",
"name": "Afrikaans"
},
{
"code": "sq",
"name": "Albanian"
},
{
"code": "ar",
"name": "Arabic"
},
{
"code": "az",
"name": "Azerbaijani"
},
{
"code": "eu",
"name": "Basque"
},
{
"code": "be",
"name": "Belarusian"
},
{
"code": "bn",
"name": "Bengali"
},
{
"code": "bs",
"name": "Bosnian"
},
{
"code": "bg",
"name": "Bulgarian"
},
{
"code": "ca",
"name": "Catalan"
},
{
"code": "zh",
"name": "Chinese"
},
{
"code": "hr",
"name": "Croatian"
},
{
"code": "cs",
"name": "Czech"
},
{
"code": "da",
"name": "Danish"
},
{
"code": "nl",
"name": "Dutch"
},
{
"code": "en",
"name": "English"
},
{
"code": "et",
"name": "Estonian"
},
{
"code": "fi",
"name": "Finnish"
},
{
"code": "fr",
"name": "French"
},
{
"code": "gl",
"name": "Galician"
},
{
"code": "de",
"name": "German"
},
{
"code": "el",
"name": "Greek"
},
{
"code": "gu",
"name": "Gujarati"
},
{
"code": "he",
"name": "Hebrew"
},
{
"code": "hi",
"name": "Hindi"
},
{
"code": "hu",
"name": "Hungarian"
},
{
"code": "id",
"name": "Indonesian"
},
{
"code": "it",
"name": "Italian"
},
{
"code": "ja",
"name": "Japanese"
},
{
"code": "kn",
"name": "Kannada"
},
{
"code": "kk",
"name": "Kazakh"
},
{
"code": "ko",
"name": "Korean"
},
{
"code": "lv",
"name": "Latvian"
},
{
"code": "lt",
"name": "Lithuanian"
},
{
"code": "mk",
"name": "Macedonian"
},
{
"code": "ms",
"name": "Malay"
},
{
"code": "ml",
"name": "Malayalam"
},
{
"code": "mr",
"name": "Marathi"
},
{
"code": "no",
"name": "Norwegian"
},
{
"code": "fa",
"name": "Persian"
},
{
"code": "pl",
"name": "Polish"
},
{
"code": "pt",
"name": "Portuguese"
},
{
"code": "pa",
"name": "Punjabi"
},
{
"code": "ro",
"name": "Romanian"
},
{
"code": "ru",
"name": "Russian"
},
{
"code": "sr",
"name": "Serbian"
},
{
"code": "sk",
"name": "Slovak"
},
{
"code": "sl",
"name": "Slovenian"
},
{
"code": "es",
"name": "Spanish"
},
{
"code": "sw",
"name": "Swahili"
},
{
"code": "sv",
"name": "Swedish"
},
{
"code": "tl",
"name": "Tagalog"
},
{
"code": "ta",
"name": "Tamil"
},
{
"code": "te",
"name": "Telugu"
},
{
"code": "th",
"name": "Thai"
},
{
"code": "tr",
"name": "Turkish"
},
{
"code": "uk",
"name": "Ukrainian"
},
{
"code": "ur",
"name": "Urdu"
},
{
"code": "vi",
"name": "Vietnamese"
},
{
"code": "cy",
"name": "Welsh"
}
],
"name": "Speech-to-Text Real-time v4",
"one_way_translation": "all_languages",
"supports_language_hints_strict": true,
"supports_max_endpoint_delay": true,
"transcription_mode": "real_time",
"translation_targets": [],
"two_way_translation": "all_languages",
"two_way_translation_pairs": []
},
{
"aliased_model_id": null,
"context_version": 2,
"id": "stt-rt-v3",
"languages": [
{
"code": "af",
"name": "Afrikaans"
},
{
"code": "sq",
"name": "Albanian"
},
{
"code": "ar",
"name": "Arabic"
},
{
"code": "az",
"name": "Azerbaijani"
},
{
"code": "eu",
"name": "Basque"
},
{
"code": "be",
"name": "Belarusian"
},
{
"code": "bn",
"name": "Bengali"
},
{
"code": "bs",
"name": "Bosnian"
},
{
"code": "bg",
"name": "Bulgarian"
},
{
"code": "ca",
"name": "Catalan"
},
{
"code": "zh",
"name": "Chinese"
},
{
"code": "hr",
"name": "Croatian"
},
{
"code": "cs",
"name": "Czech"
},
{
"code": "da",
"name": "Danish"
},
{
"code": "nl",
"name": "Dutch"
},
{
"code": "en",
"name": "English"
},
{
"code": "et",
"name": "Estonian"
},
{
"code": "fi",
"name": "Finnish"
},
{
"code": "fr",
"name": "French"
},
{
"code": "gl",
"name": "Galician"
},
{
"code": "de",
"name": "German"
},
{
"code": "el",
"name": "Greek"
},
{
"code": "gu",
"name": "Gujarati"
},
{
"code": "he",
"name": "Hebrew"
},
{
"code": "hi",
"name": "Hindi"
},
{
"code": "hu",
"name": "Hungarian"
},
{
"code": "id",
"name": "Indonesian"
},
{
"code": "it",
"name": "Italian"
},
{
"code": "ja",
"name": "Japanese"
},
{
"code": "kn",
"name": "Kannada"
},
{
"code": "kk",
"name": "Kazakh"
},
{
"code": "ko",
"name": "Korean"
},
{
"code": "lv",
"name": "Latvian"
},
{
"code": "lt",
"name": "Lithuanian"
},
{
"code": "mk",
"name": "Macedonian"
},
{
"code": "ms",
"name": "Malay"
},
{
"code": "ml",
"name": "Malayalam"
},
{
"code": "mr",
"name": "Marathi"
},
{
"code": "no",
"name": "Norwegian"
},
{
"code": "fa",
"name": "Persian"
},
{
"code": "pl",
"name": "Polish"
},
{
"code": "pt",
"name": "Portuguese"
},
{
"code": "pa",
"name": "Punjabi"
},
{
"code": "ro",
"name": "Romanian"
},
{
"code": "ru",
"name": "Russian"
},
{
"code": "sr",
"name": "Serbian"
},
{
"code": "sk",
"name": "Slovak"
},
{
"code": "sl",
"name": "Slovenian"
},
{
"code": "es",
"name": "Spanish"
},
{
"code": "sw",
"name": "Swahili"
},
{
"code": "sv",
"name": "Swedish"
},
{
"code": "tl",
"name": "Tagalog"
},
{
"code": "ta",
"name": "Tamil"
},
{
"code": "te",
"name": "Telugu"
},
{
"code": "th",
"name": "Thai"
},
{
"code": "tr",
"name": "Turkish"
},
{
"code": "uk",
"name": "Ukrainian"
},
{
"code": "ur",
"name": "Urdu"
},
{
"code": "vi",
"name": "Vietnamese"
},
{
"code": "cy",
"name": "Welsh"
}
],
"name": "Speech-to-Text Real-time v3",
"one_way_translation": "all_languages",
"supports_language_hints_strict": true,
"supports_max_endpoint_delay": false,
"transcription_mode": "real_time",
"translation_targets": [],
"two_way_translation": "all_languages",
"two_way_translation_pairs": []
},
{
"aliased_model_id": null,
"context_version": 2,
"id": "stt-async-v4",
"languages": [
{
"code": "af",
"name": "Afrikaans"
},
{
"code": "sq",
"name": "Albanian"
},
{
"code": "ar",
"name": "Arabic"
},
{
"code": "az",
"name": "Azerbaijani"
},
{
"code": "eu",
"name": "Basque"
},
{
"code": "be",
"name": "Belarusian"
},
{
"code": "bn",
"name": "Bengali"
},
{
"code": "bs",
"name": "Bosnian"
},
{
"code": "bg",
"name": "Bulgarian"
},
{
"code": "ca",
"name": "Catalan"
},
{
"code": "zh",
"name": "Chinese"
},
{
"code": "hr",
"name": "Croatian"
},
{
"code": "cs",
"name": "Czech"
},
{
"code": "da",
"name": "Danish"
},
{
"code": "nl",
"name": "Dutch"
},
{
"code": "en",
"name": "English"
},
{
"code": "et",
"name": "Estonian"
},
{
"code": "fi",
"name": "Finnish"
},
{
"code": "fr",
"name": "French"
},
{
"code": "gl",
"name": "Galician"
},
{
"code": "de",
"name": "German"
},
{
"code": "el",
"name": "Greek"
},
{
"code": "gu",
"name": "Gujarati"
},
{
"code": "he",
"name": "Hebrew"
},
{
"code": "hi",
"name": "Hindi"
},
{
"code": "hu",
"name": "Hungarian"
},
{
"code": "id",
"name": "Indonesian"
},
{
"code": "it",
"name": "Italian"
},
{
"code": "ja",
"name": "Japanese"
},
{
"code": "kn",
"name": "Kannada"
},
{
"code": "kk",
"name": "Kazakh"
},
{
"code": "ko",
"name": "Korean"
},
{
"code": "lv",
"name": "Latvian"
},
{
"code": "lt",
"name": "Lithuanian"
},
{
"code": "mk",
"name": "Macedonian"
},
{
"code": "ms",
"name": "Malay"
},
{
"code": "ml",
"name": "Malayalam"
},
{
"code": "mr",
"name": "Marathi"
},
{
"code": "no",
"name": "Norwegian"
},
{
"code": "fa",
"name": "Persian"
},
{
"code": "pl",
"name": "Polish"
},
{
"code": "pt",
"name": "Portuguese"
},
{
"code": "pa",
"name": "Punjabi"
},
{
"code": "ro",
"name": "Romanian"
},
{
"code": "ru",
"name": "Russian"
},
{
"code": "sr",
"name": "Serbian"
},
{
"code": "sk",
"name": "Slovak"
},
{
"code": "sl",
"name": "Slovenian"
},
{
"code": "es",
"name": "Spanish"
},
{
"code": "sw",
"name": "Swahili"
},
{
"code": "sv",
"name": "Swedish"
},
{
"code": "tl",
"name": "Tagalog"
},
{
"code": "ta",
"name": "Tamil"
},
{
"code": "te",
"name": "Telugu"
},
{
"code": "th",
"name": "Thai"
},
{
"code": "tr",
"name": "Turkish"
},
{
"code": "uk",
"name": "Ukrainian"
},
{
"code": "ur",
"name": "Urdu"
},
{
"code": "vi",
"name": "Vietnamese"
},
{
"code": "cy",
"name": "Welsh"
}
],
"name": "Speech-to-Text Async v4",
"one_way_translation": "all_languages",
"supports_language_hints_strict": true,
"supports_max_endpoint_delay": false,
"transcription_mode": "async",
"translation_targets": [],
"two_way_translation": "all_languages",
"two_way_translation_pairs": []
},
{
"aliased_model_id": null,
"context_version": 2,
"id": "stt-async-v3",
"languages": [
{
"code": "af",
"name": "Afrikaans"
},
{
"code": "sq",
"name": "Albanian"
},
{
"code": "ar",
"name": "Arabic"
},
{
"code": "az",
"name": "Azerbaijani"
},
{
"code": "eu",
"name": "Basque"
},
{
"code": "be",
"name": "Belarusian"
},
{
"code": "bn",
"name": "Bengali"
},
{
"code": "bs",
"name": "Bosnian"
},
{
"code": "bg",
"name": "Bulgarian"
},
{
"code": "ca",
"name": "Catalan"
},
{
"code": "zh",
"name": "Chinese"
},
{
"code": "hr",
"name": "Croatian"
},
{
"code": "cs",
"name": "Czech"
},
{
"code": "da",
"name": "Danish"
},
{
"code": "nl",
"name": "Dutch"
},
{
"code": "en",
"name": "English"
},
{
"code": "et",
"name": "Estonian"
},
{
"code": "fi",
"name": "Finnish"
},
{
"code": "fr",
"name": "French"
},
{
"code": "gl",
"name": "Galician"
},
{
"code": "de",
"name": "German"
},
{
"code": "el",
"name": "Greek"
},
{
"code": "gu",
"name": "Gujarati"
},
{
"code": "he",
"name": "Hebrew"
},
{
"code": "hi",
"name": "Hindi"
},
{
"code": "hu",
"name": "Hungarian"
},
{
"code": "id",
"name": "Indonesian"
},
{
"code": "it",
"name": "Italian"
},
{
"code": "ja",
"name": "Japanese"
},
{
"code": "kn",
"name": "Kannada"
},
{
"code": "kk",
"name": "Kazakh"
},
{
"code": "ko",
"name": "Korean"
},
{
"code": "lv",
"name": "Latvian"
},
{
"code": "lt",
"name": "Lithuanian"
},
{
"code": "mk",
"name": "Macedonian"
},
{
"code": "ms",
"name": "Malay"
},
{
"code": "ml",
"name": "Malayalam"
},
{
"code": "mr",
"name": "Marathi"
},
{
"code": "no",
"name": "Norwegian"
},
{
"code": "fa",
"name": "Persian"
},
{
"code": "pl",
"name": "Polish"
},
{
"code": "pt",
"name": "Portuguese"
},
{
"code": "pa",
"name": "Punjabi"
},
{
"code": "ro",
"name": "Romanian"
},
{
"code": "ru",
"name": "Russian"
},
{
"code": "sr",
"name": "Serbian"
},
{
"code": "sk",
"name": "Slovak"
},
{
"code": "sl",
"name": "Slovenian"
},
{
"code": "es",
"name": "Spanish"
},
{
"code": "sw",
"name": "Swahili"
},
{
"code": "sv",
"name": "Swedish"
},
{
"code": "tl",
"name": "Tagalog"
},
{
"code": "ta",
"name": "Tamil"
},
{
"code": "te",
"name": "Telugu"
},
{
"code": "th",
"name": "Thai"
},
{
"code": "tr",
"name": "Turkish"
},
{
"code": "uk",
"name": "Ukrainian"
},
{
"code": "ur",
"name": "Urdu"
},
{
"code": "vi",
"name": "Vietnamese"
},
{
"code": "cy",
"name": "Welsh"
}
],
"name": "Speech-to-Text Async v3",
"one_way_translation": "all_languages",
"supports_language_hints_strict": false,
"supports_max_endpoint_delay": false,
"transcription_mode": "async",
"translation_targets": [],
"two_way_translation": "all_languages",
"two_way_translation_pairs": []
},
{
"aliased_model_id": "stt-rt-v3",
"context_version": 2,
"id": "stt-rt-preview",
"languages": [
{
"code": "af",
"name": "Afrikaans"
},
{
"code": "sq",
"name": "Albanian"
},
{
"code": "ar",
"name": "Arabic"
},
{
"code": "az",
"name": "Azerbaijani"
},
{
"code": "eu",
"name": "Basque"
},
{
"code": "be",
"name": "Belarusian"
},
{
"code": "bn",
"name": "Bengali"
},
{
"code": "bs",
"name": "Bosnian"
},
{
"code": "bg",
"name": "Bulgarian"
},
{
"code": "ca",
"name": "Catalan"
},
{
"code": "zh",
"name": "Chinese"
},
{
"code": "hr",
"name": "Croatian"
},
{
"code": "cs",
"name": "Czech"
},
{
"code": "da",
"name": "Danish"
},
{
"code": "nl",
"name": "Dutch"
},
{
"code": "en",
"name": "English"
},
{
"code": "et",
"name": "Estonian"
},
{
"code": "fi",
"name": "Finnish"
},
{
"code": "fr",
"name": "French"
},
{
"code": "gl",
"name": "Galician"
},
{
"code": "de",
"name": "German"
},
{
"code": "el",
"name": "Greek"
},
{
"code": "gu",
"name": "Gujarati"
},
{
"code": "he",
"name": "Hebrew"
},
{
"code": "hi",
"name": "Hindi"
},
{
"code": "hu",
"name": "Hungarian"
},
{
"code": "id",
"name": "Indonesian"
},
{
"code": "it",
"name": "Italian"
},
{
"code": "ja",
"name": "Japanese"
},
{
"code": "kn",
"name": "Kannada"
},
{
"code": "kk",
"name": "Kazakh"
},
{
"code": "ko",
"name": "Korean"
},
{
"code": "lv",
"name": "Latvian"
},
{
"code": "lt",
"name": "Lithuanian"
},
{
"code": "mk",
"name": "Macedonian"
},
{
"code": "ms",
"name": "Malay"
},
{
"code": "ml",
"name": "Malayalam"
},
{
"code": "mr",
"name": "Marathi"
},
{
"code": "no",
"name": "Norwegian"
},
{
"code": "fa",
"name": "Persian"
},
{
"code": "pl",
"name": "Polish"
},
{
"code": "pt",
"name": "Portuguese"
},
{
"code": "pa",
"name": "Punjabi"
},
{
"code": "ro",
"name": "Romanian"
},
{
"code": "ru",
"name": "Russian"
},
{
"code": "sr",
"name": "Serbian"
},
{
"code": "sk",
"name": "Slovak"
},
{
"code": "sl",
"name": "Slovenian"
},
{
"code": "es",
"name": "Spanish"
},
{
"code": "sw",
"name": "Swahili"
},
{
"code": "sv",
"name": "Swedish"
},
{
"code": "tl",
"name": "Tagalog"
},
{
"code": "ta",
"name": "Tamil"
},
{
"code": "te",
"name": "Telugu"
},
{
"code": "th",
"name": "Thai"
},
{
"code": "tr",
"name": "Turkish"
},
{
"code": "uk",
"name": "Ukrainian"
},
{
"code": "ur",
"name": "Urdu"
},
{
"code": "vi",
"name": "Vietnamese"
},
{
"code": "cy",
"name": "Welsh"
}
],
"name": "Speech-to-Text Real-time Preview",
"one_way_translation": "all_languages",
"supports_language_hints_strict": true,
"supports_max_endpoint_delay": false,
"transcription_mode": "real_time",
"translation_targets": [],
"two_way_translation": "all_languages",
"two_way_translation_pairs": []
},
{
"aliased_model_id": "stt-async-v3",
"context_version": 2,
"id": "stt-async-preview",
"languages": [
{
"code": "af",
"name": "Afrikaans"
},
{
"code": "sq",
"name": "Albanian"
},
{
"code": "ar",
"name": "Arabic"
},
{
"code": "az",
"name": "Azerbaijani"
},
{
"code": "eu",
"name": "Basque"
},
{
"code": "be",
"name": "Belarusian"
},
{
"code": "bn",
"name": "Bengali"
},
{
"code": "bs",
"name": "Bosnian"
},
{
"code": "bg",
"name": "Bulgarian"
},
{
"code": "ca",
"name": "Catalan"
},
{
"code": "zh",
"name": "Chinese"
},
{
"code": "hr",
"name": "Croatian"
},
{
"code": "cs",
"name": "Czech"
},
{
"code": "da",
"name": "Danish"
},
{
"code": "nl",
"name": "Dutch"
},
{
"code": "en",
"name": "English"
},
{
"code": "et",
"name": "Estonian"
},
{
"code": "fi",
"name": "Finnish"
},
{
"code": "fr",
"name": "French"
},
{
"code": "gl",
"name": "Galician"
},
{
"code": "de",
"name": "German"
},
{
"code": "el",
"name": "Greek"
},
{
"code": "gu",
"name": "Gujarati"
},
{
"code": "he",
"name": "Hebrew"
},
{
"code": "hi",
"name": "Hindi"
},
{
"code": "hu",
"name": "Hungarian"
},
{
"code": "id",
"name": "Indonesian"
},
{
"code": "it",
"name": "Italian"
},
{
"code": "ja",
"name": "Japanese"
},
{
"code": "kn",
"name": "Kannada"
},
{
"code": "kk",
"name": "Kazakh"
},
{
"code": "ko",
"name": "Korean"
},
{
"code": "lv",
"name": "Latvian"
},
{
"code": "lt",
"name": "Lithuanian"
},
{
"code": "mk",
"name": "Macedonian"
},
{
"code": "ms",
"name": "Malay"
},
{
"code": "ml",
"name": "Malayalam"
},
{
"code": "mr",
"name": "Marathi"
},
{
"code": "no",
"name": "Norwegian"
},
{
"code": "fa",
"name": "Persian"
},
{
"code": "pl",
"name": "Polish"
},
{
"code": "pt",
"name": "Portuguese"
},
{
"code": "pa",
"name": "Punjabi"
},
{
"code": "ro",
"name": "Romanian"
},
{
"code": "ru",
"name": "Russian"
},
{
"code": "sr",
"name": "Serbian"
},
{
"code": "sk",
"name": "Slovak"
},
{
"code": "sl",
"name": "Slovenian"
},
{
"code": "es",
"name": "Spanish"
},
{
"code": "sw",
"name": "Swahili"
},
{
"code": "sv",
"name": "Swedish"
},
{
"code": "tl",
"name": "Tagalog"
},
{
"code": "ta",
"name": "Tamil"
},
{
"code": "te",
"name": "Telugu"
},
{
"code": "th",
"name": "Thai"
},
{
"code": "tr",
"name": "Turkish"
},
{
"code": "uk",
"name": "Ukrainian"
},
{
"code": "ur",
"name": "Urdu"
},
{
"code": "vi",
"name": "Vietnamese"
},
{
"code": "cy",
"name": "Welsh"
}
],
"name": "Speech-to-Text Async Preview",
"one_way_translation": "all_languages",
"supports_language_hints_strict": false,
"supports_max_endpoint_delay": false,
"transcription_mode": "async",
"translation_targets": [],
"two_way_translation": "all_languages",
"two_way_translation_pairs": []
},
{
"aliased_model_id": "stt-rt-v3",
"context_version": 2,
"id": "stt-rt-v3-preview",
"languages": [
{
"code": "af",
"name": "Afrikaans"
},
{
"code": "sq",
"name": "Albanian"
},
{
"code": "ar",
"name": "Arabic"
},
{
"code": "az",
"name": "Azerbaijani"
},
{
"code": "eu",
"name": "Basque"
},
{
"code": "be",
"name": "Belarusian"
},
{
"code": "bn",
"name": "Bengali"
},
{
"code": "bs",
"name": "Bosnian"
},
{
"code": "bg",
"name": "Bulgarian"
},
{
"code": "ca",
"name": "Catalan"
},
{
"code": "zh",
"name": "Chinese"
},
{
"code": "hr",
"name": "Croatian"
},
{
"code": "cs",
"name": "Czech"
},
{
"code": "da",
"name": "Danish"
},
{
"code": "nl",
"name": "Dutch"
},
{
"code": "en",
"name": "English"
},
{
"code": "et",
"name": "Estonian"
},
{
"code": "fi",
"name": "Finnish"
},
{
"code": "fr",
"name": "French"
},
{
"code": "gl",
"name": "Galician"
},
{
"code": "de",
"name": "German"
},
{
"code": "el",
"name": "Greek"
},
{
"code": "gu",
"name": "Gujarati"
},
{
"code": "he",
"name": "Hebrew"
},
{
"code": "hi",
"name": "Hindi"
},
{
"code": "hu",
"name": "Hungarian"
},
{
"code": "id",
"name": "Indonesian"
},
{
"code": "it",
"name": "Italian"
},
{
"code": "ja",
"name": "Japanese"
},
{
"code": "kn",
"name": "Kannada"
},
{
"code": "kk",
"name": "Kazakh"
},
{
"code": "ko",
"name": "Korean"
},
{
"code": "lv",
"name": "Latvian"
},
{
"code": "lt",
"name": "Lithuanian"
},
{
"code": "mk",
"name": "Macedonian"
},
{
"code": "ms",
"name": "Malay"
},
{
"code": "ml",
"name": "Malayalam"
},
{
"code": "mr",
"name": "Marathi"
},
{
"code": "no",
"name": "Norwegian"
},
{
"code": "fa",
"name": "Persian"
},
{
"code": "pl",
"name": "Polish"
},
{
"code": "pt",
"name": "Portuguese"
},
{
"code": "pa",
"name": "Punjabi"
},
{
"code": "ro",
"name": "Romanian"
},
{
"code": "ru",
"name": "Russian"
},
{
"code": "sr",
"name": "Serbian"
},
{
"code": "sk",
"name": "Slovak"
},
{
"code": "sl",
"name": "Slovenian"
},
{
"code": "es",
"name": "Spanish"
},
{
"code": "sw",
"name": "Swahili"
},
{
"code": "sv",
"name": "Swedish"
},
{
"code": "tl",
"name": "Tagalog"
},
{
"code": "ta",
"name": "Tamil"
},
{
"code": "te",
"name": "Telugu"
},
{
"code": "th",
"name": "Thai"
},
{
"code": "tr",
"name": "Turkish"
},
{
"code": "uk",
"name": "Ukrainian"
},
{
"code": "ur",
"name": "Urdu"
},
{
"code": "vi",
"name": "Vietnamese"
},
{
"code": "cy",
"name": "Welsh"
}
],
"name": "Speech-to-Text Real-time v3 Preview",
"one_way_translation": "all_languages",
"supports_language_hints_strict": true,
"supports_max_endpoint_delay": false,
"transcription_mode": "real_time",
"translation_targets": [],
"two_way_translation": "all_languages",
"two_way_translation_pairs": []
},
{
"aliased_model_id": "stt-rt-v3",
"context_version": 2,
"id": "stt-rt-preview-v2",
"languages": [
{
"code": "af",
"name": "Afrikaans"
},
{
"code": "sq",
"name": "Albanian"
},
{
"code": "ar",
"name": "Arabic"
},
{
"code": "az",
"name": "Azerbaijani"
},
{
"code": "eu",
"name": "Basque"
},
{
"code": "be",
"name": "Belarusian"
},
{
"code": "bn",
"name": "Bengali"
},
{
"code": "bs",
"name": "Bosnian"
},
{
"code": "bg",
"name": "Bulgarian"
},
{
"code": "ca",
"name": "Catalan"
},
{
"code": "zh",
"name": "Chinese"
},
{
"code": "hr",
"name": "Croatian"
},
{
"code": "cs",
"name": "Czech"
},
{
"code": "da",
"name": "Danish"
},
{
"code": "nl",
"name": "Dutch"
},
{
"code": "en",
"name": "English"
},
{
"code": "et",
"name": "Estonian"
},
{
"code": "fi",
"name": "Finnish"
},
{
"code": "fr",
"name": "French"
},
{
"code": "gl",
"name": "Galician"
},
{
"code": "de",
"name": "German"
},
{
"code": "el",
"name": "Greek"
},
{
"code": "gu",
"name": "Gujarati"
},
{
"code": "he",
"name": "Hebrew"
},
{
"code": "hi",
"name": "Hindi"
},
{
"code": "hu",
"name": "Hungarian"
},
{
"code": "id",
"name": "Indonesian"
},
{
"code": "it",
"name": "Italian"
},
{
"code": "ja",
"name": "Japanese"
},
{
"code": "kn",
"name": "Kannada"
},
{
"code": "kk",
"name": "Kazakh"
},
{
"code": "ko",
"name": "Korean"
},
{
"code": "lv",
"name": "Latvian"
},
{
"code": "lt",
"name": "Lithuanian"
},
{
"code": "mk",
"name": "Macedonian"
},
{
"code": "ms",
"name": "Malay"
},
{
"code": "ml",
"name": "Malayalam"
},
{
"code": "mr",
"name": "Marathi"
},
{
"code": "no",
"name": "Norwegian"
},
{
"code": "fa",
"name": "Persian"
},
{
"code": "pl",
"name": "Polish"
},
{
"code": "pt",
"name": "Portuguese"
},
{
"code": "pa",
"name": "Punjabi"
},
{
"code": "ro",
"name": "Romanian"
},
{
"code": "ru",
"name": "Russian"
},
{
"code": "sr",
"name": "Serbian"
},
{
"code": "sk",
"name": "Slovak"
},
{
"code": "sl",
"name": "Slovenian"
},
{
"code": "es",
"name": "Spanish"
},
{
"code": "sw",
"name": "Swahili"
},
{
"code": "sv",
"name": "Swedish"
},
{
"code": "tl",
"name": "Tagalog"
},
{
"code": "ta",
"name": "Tamil"
},
{
"code": "te",
"name": "Telugu"
},
{
"code": "th",
"name": "Thai"
},
{
"code": "tr",
"name": "Turkish"
},
{
"code": "uk",
"name": "Ukrainian"
},
{
"code": "ur",
"name": "Urdu"
},
{
"code": "vi",
"name": "Vietnamese"
},
{
"code": "cy",
"name": "Welsh"
}
],
"name": "Speech-to-Text Real-time Preview v2",
"one_way_translation": "all_languages",
"supports_language_hints_strict": true,
"supports_max_endpoint_delay": false,
"transcription_mode": "real_time",
"translation_targets": [],
"two_way_translation": "all_languages",
"two_way_translation_pairs": []
},
{
"aliased_model_id": "stt-async-v3",
"context_version": 2,
"id": "stt-async-preview-v1",
"languages": [
{
"code": "af",
"name": "Afrikaans"
},
{
"code": "sq",
"name": "Albanian"
},
{
"code": "ar",
"name": "Arabic"
},
{
"code": "az",
"name": "Azerbaijani"
},
{
"code": "eu",
"name": "Basque"
},
{
"code": "be",
"name": "Belarusian"
},
{
"code": "bn",
"name": "Bengali"
},
{
"code": "bs",
"name": "Bosnian"
},
{
"code": "bg",
"name": "Bulgarian"
},
{
"code": "ca",
"name": "Catalan"
},
{
"code": "zh",
"name": "Chinese"
},
{
"code": "hr",
"name": "Croatian"
},
{
"code": "cs",
"name": "Czech"
},
{
"code": "da",
"name": "Danish"
},
{
"code": "nl",
"name": "Dutch"
},
{
"code": "en",
"name": "English"
},
{
"code": "et",
"name": "Estonian"
},
{
"code": "fi",
"name": "Finnish"
},
{
"code": "fr",
"name": "French"
},
{
"code": "gl",
"name": "Galician"
},
{
"code": "de",
"name": "German"
},
{
"code": "el",
"name": "Greek"
},
{
"code": "gu",
"name": "Gujarati"
},
{
"code": "he",
"name": "Hebrew"
},
{
"code": "hi",
"name": "Hindi"
},
{
"code": "hu",
"name": "Hungarian"
},
{
"code": "id",
"name": "Indonesian"
},
{
"code": "it",
"name": "Italian"
},
{
"code": "ja",
"name": "Japanese"
},
{
"code": "kn",
"name": "Kannada"
},
{
"code": "kk",
"name": "Kazakh"
},
{
"code": "ko",
"name": "Korean"
},
{
"code": "lv",
"name": "Latvian"
},
{
"code": "lt",
"name": "Lithuanian"
},
{
"code": "mk",
"name": "Macedonian"
},
{
"code": "ms",
"name": "Malay"
},
{
"code": "ml",
"name": "Malayalam"
},
{
"code": "mr",
"name": "Marathi"
},
{
"code": "no",
"name": "Norwegian"
},
{
"code": "fa",
"name": "Persian"
},
{
"code": "pl",
"name": "Polish"
},
{
"code": "pt",
"name": "Portuguese"
},
{
"code": "pa",
"name": "Punjabi"
},
{
"code": "ro",
"name": "Romanian"
},
{
"code": "ru",
"name": "Russian"
},
{
"code": "sr",
"name": "Serbian"
},
{
"code": "sk",
"name": "Slovak"
},
{
"code": "sl",
"name": "Slovenian"
},
{
"code": "es",
"name": "Spanish"
},
{
"code": "sw",
"name": "Swahili"
},
{
"code": "sv",
"name": "Swedish"
},
{
"code": "tl",
"name": "Tagalog"
},
{
"code": "ta",
"name": "Tamil"
},
{
"code": "te",
"name": "Telugu"
},
{
"code": "th",
"name": "Thai"
},
{
"code": "tr",
"name": "Turkish"
},
{
"code": "uk",
"name": "Ukrainian"
},
{
"code": "ur",
"name": "Urdu"
},
{
"code": "vi",
"name": "Vietnamese"
},
{
"code": "cy",
"name": "Welsh"
}
],
"name": "Speech-to-Text Async Preview v1",
"one_way_translation": "all_languages",
"supports_language_hints_strict": false,
"supports_max_endpoint_delay": false,
"transcription_mode": "async",
"translation_targets": [],
"two_way_translation": "all_languages",
"two_way_translation_pairs": []
}
]
}
```
Schema (YAML Structural Definition):
```yaml
properties:
models:
description: List of available models and their attributes.
items:
properties:
id:
description: Unique identifier of the model.
type: string
aliased_model_id:
anyOf:
- type: string
- type: 'null'
description: If this is an alias, the id of the aliased model.
name:
description: Name of the model.
type: string
context_version:
anyOf:
- type: integer
- type: 'null'
description: Version of context supported.
transcription_mode:
description: Transcription mode of the model.
enum:
- real_time
- async
type: string
languages:
description: List of languages supported by the model.
items:
properties:
code:
description: 2-letter language code.
type: string
name:
description: Language name.
type: string
required:
- code
- name
type: object
type: array
supports_language_hints_strict:
type: boolean
supports_max_endpoint_delay:
type: boolean
translation_targets:
description: >-
List of supported one-way translation targets. If the list is empty,
check the one_way_translation field.
items:
properties:
target_language:
type: string
source_languages:
items:
type: string
type: array
exclude_source_languages:
items:
type: string
type: array
required:
- target_language
- source_languages
- exclude_source_languages
type: object
type: array
two_way_translation_pairs:
description: >-
List of supported two-way translation pairs. If the list is empty,
check the two_way_translation field.
items:
type: string
type: array
one_way_translation:
anyOf:
- type: string
- type: 'null'
description: >-
When this contains the string 'all_languages', any language from
languages can be used.
two_way_translation:
anyOf:
- type: string
- type: 'null'
description: >-
When this contains the string 'all_languages', any language pair from
languages can be used.
required:
- id
- aliased_model_id
- name
- context_version
- transcription_mode
- languages
- supports_language_hints_strict
- supports_max_endpoint_delay
- translation_targets
- two_way_translation_pairs
- one_way_translation
- two_way_translation
type: object
type: array
required:
- models
type: object
```
* **401**: Authentication error.
Schema (YAML Structural Definition):
```yaml
properties:
status_code:
type: integer
error_type:
type: string
message:
type: string
validation_errors:
items:
properties:
error_type:
type: string
location:
type: string
message:
type: string
required:
- error_type
- location
- message
type: object
type: array
request_id:
type: string
required:
- status_code
- error_type
- message
- validation_errors
- request_id
type: object
```
* **500**: Internal server error.
Schema (YAML Structural Definition):
```yaml
properties:
status_code:
type: integer
error_type:
type: string
message:
type: string
validation_errors:
items:
properties:
error_type:
type: string
location:
type: string
message:
type: string
required:
- error_type
- location
- message
type: object
type: array
request_id:
type: string
required:
- status_code
- error_type
- message
- validation_errors
- request_id
type: object
```
# Create transcription
URL: /stt/api-reference/transcriptions/create_transcription
Creates a new transcription.
## Create transcription
**Endpoint:** `POST /v1/transcriptions`
Creates a new transcription.
### Request Body
Content-Type: `application/json` (Required)
Schema (YAML Structural Definition):
```yaml
properties:
model:
description: Speech-to-text model to use for the transcription.
maxLength: 32
type: string
audio_url:
anyOf:
- maxLength: 4096
pattern: ^https?://[^\s]+$
type: string
- type: 'null'
description: >-
URL of the audio file to transcribe. Cannot be specified if `file_id` is
specified.
file_id:
anyOf:
- format: uuid
type: string
- type: 'null'
description: >-
ID of the uploaded file to transcribe. Cannot be specified if `audio_url`
is specified.
language_hints:
anyOf:
- items:
maxLength: 10
type: string
maxItems: 100
type: array
- type: 'null'
description: >-
Expected languages in the audio. If not specified, languages are
automatically detected.
language_hints_strict:
anyOf:
- type: boolean
- type: 'null'
description: When `true`, the model will rely more on language hints.
enable_speaker_diarization:
anyOf:
- type: boolean
- type: 'null'
description: >-
When `true`, speakers are identified and separated in the transcription
output.
enable_language_identification:
anyOf:
- type: boolean
- type: 'null'
description: When `true`, language is detected for each part of the transcription.
translation:
anyOf:
- properties:
type:
enum:
- one_way
- two_way
type: string
target_language:
anyOf:
- type: string
- type: 'null'
language_a:
anyOf:
- type: string
- type: 'null'
language_b:
anyOf:
- type: string
- type: 'null'
required:
- type
type: object
- type: 'null'
description: Translation configuration.
context:
anyOf:
- properties:
general:
anyOf:
- items:
properties:
key:
description: Item key (e.g. "Domain").
type: string
value:
description: Item value (e.g. "medicine").
type: string
required:
- key
- value
type: object
type: array
- type: 'null'
description: General context items.
text:
anyOf:
- type: string
- type: 'null'
description: Text context.
terms:
anyOf:
- items:
type: string
type: array
- type: 'null'
description: Terms that might occur in speech.
translation_terms:
anyOf:
- items:
properties:
source:
description: Source term.
type: string
target:
description: Target term to translate to.
type: string
required:
- source
- target
type: object
type: array
- type: 'null'
description: >-
Hints how to translate specific terms. Ignored if translation is
not enabled.
type: object
- type: string
- type: 'null'
description: >-
Additional context to improve transcription accuracy and formatting of
specialized terms.
webhook_url:
anyOf:
- maxLength: 256
pattern: ^https?://[^\s]+$
type: string
- type: 'null'
description: >-
URL to receive webhook notifications when transcription is completed or
fails.
webhook_auth_header_name:
anyOf:
- maxLength: 256
type: string
- type: 'null'
description: Name of the authentication header sent with webhook notifications.
webhook_auth_header_value:
anyOf:
- maxLength: 256
type: string
- type: 'null'
description: Authentication header value sent with webhook notifications.
client_reference_id:
anyOf:
- maxLength: 256
type: string
- type: 'null'
description: Optional tracking identifier string. Does not need to be unique.
required:
- model
type: object
```
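Putting the schema above to use, here is a minimal Python sketch that builds a request body and posts it. The base URL and bearer-token auth header are assumptions, and `build_transcription_payload` is our own helper; the field names and the `audio_url`/`file_id` exclusivity rule come from the schema above.

```python
# Sketch: building and sending POST /v1/transcriptions. API_BASE and the
# bearer auth header are assumptions; field names follow the schema above.
import json
import urllib.request

API_BASE = "https://api.soniox.com"  # assumed base URL

def build_transcription_payload(model, audio_url=None, file_id=None, **options):
    """audio_url and file_id are mutually exclusive per the schema above."""
    if (audio_url is None) == (file_id is None):
        raise ValueError("specify exactly one of audio_url or file_id")
    payload = {"model": model, **options}
    if audio_url is not None:
        payload["audio_url"] = audio_url
    else:
        payload["file_id"] = file_id
    return payload

def create_transcription(api_key, payload):
    req = urllib.request.Request(
        f"{API_BASE}/v1/transcriptions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:  # expect 201, status "queued"
        return json.load(resp)
```

For example, `build_transcription_payload("stt-async-v4", audio_url="https://soniox.com/media/examples/coffee_shop.mp3", language_hints=["en", "fr"], enable_speaker_diarization=True)` yields a body like the one behind the 201 example below.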
### Responses
* **201**: Created transcription.
Example (JSON):
```json
{
"audio_duration_ms": 0,
"audio_url": "https://soniox.com/media/examples/coffee_shop.mp3",
"client_reference_id": "some_internal_id",
"created_at": "2024-11-26T00:00:00Z",
"error_message": null,
"error_type": null,
"file_id": null,
"filename": "coffee_shop.mp3",
"id": "73d4357d-cad2-4338-a60d-ec6f2044f721",
"language_hints": [
"en",
"fr"
],
"model": "stt-async-preview",
"status": "queued",
"webhook_auth_header_name": "Authorization",
"webhook_auth_header_value": "******************",
"webhook_status_code": null,
"webhook_url": "https://example.com/webhook"
}
```
Schema (YAML Structural Definition):
```yaml
description: A transcription.
properties:
id:
description: Unique identifier for the transcription request.
format: uuid
type: string
status:
description: Transcription status.
enum:
- queued
- processing
- completed
- error
type: string
created_at:
description: UTC timestamp indicating when the transcription was created.
format: date-time
type: string
model:
description: Speech-to-text model used for the transcription.
type: string
audio_url:
anyOf:
- type: string
- type: 'null'
description: URL of the file being transcribed.
file_id:
anyOf:
- format: uuid
type: string
- type: 'null'
description: ID of the file being transcribed.
filename:
description: Name of the file being transcribed.
type: string
language_hints:
anyOf:
- items:
type: string
type: array
- type: 'null'
description: >-
Expected languages in the audio. If not specified, languages are
automatically detected.
enable_speaker_diarization:
description: >-
When `true`, speakers are identified and separated in the transcription
output.
type: boolean
enable_language_identification:
description: When `true`, language is detected for each part of the transcription.
type: boolean
audio_duration_ms:
anyOf:
- type: integer
- type: 'null'
description: >-
Duration of the audio in milliseconds. Only available after processing
begins.
error_type:
anyOf:
- type: string
- type: 'null'
description: >-
Error type if transcription failed. `null` for successful or in-progress
transcriptions.
error_message:
anyOf:
- type: string
- type: 'null'
description: >-
Error message if transcription failed. `null` for successful or
in-progress transcriptions.
webhook_url:
anyOf:
- type: string
- type: 'null'
description: >-
URL to receive webhook notifications when transcription is completed or
fails.
webhook_auth_header_name:
anyOf:
- type: string
- type: 'null'
description: Name of the authentication header sent with webhook notifications.
webhook_auth_header_value:
anyOf:
- type: string
- type: 'null'
description: >-
Authentication header value. Always returned masked as
`******************`.
webhook_status_code:
anyOf:
- type: integer
- type: 'null'
description: >-
HTTP status code received from your server when webhook was delivered.
`null` if not yet sent.
client_reference_id:
anyOf:
- type: string
- type: 'null'
description: Tracking identifier string.
required:
- id
- status
- created_at
- model
- filename
- enable_speaker_diarization
- enable_language_identification
type: object
```
* **400**: Invalid request.
Error types:
* `invalid_request`: Invalid request.
Schema (YAML Structural Definition):
```yaml
properties:
status_code:
type: integer
error_type:
type: string
message:
type: string
validation_errors:
items:
properties:
error_type:
type: string
location:
type: string
message:
type: string
required:
- error_type
- location
- message
type: object
type: array
request_id:
type: string
required:
- status_code
- error_type
- message
- validation_errors
- request_id
type: object
```
* **401**: Authentication error.
Schema (YAML Structural Definition):
```yaml
properties:
status_code:
type: integer
error_type:
type: string
message:
type: string
validation_errors:
items:
properties:
error_type:
type: string
location:
type: string
message:
type: string
required:
- error_type
- location
- message
type: object
type: array
request_id:
type: string
required:
- status_code
- error_type
- message
- validation_errors
- request_id
type: object
```
* **500**: Internal server error.
Schema (YAML Structural Definition):
```yaml
properties:
status_code:
type: integer
error_type:
type: string
message:
type: string
validation_errors:
items:
properties:
error_type:
type: string
location:
type: string
message:
type: string
required:
- error_type
- location
- message
type: object
type: array
request_id:
type: string
required:
- status_code
- error_type
- message
- validation_errors
- request_id
type: object
```
# Delete transcription
URL: /stt/api-reference/transcriptions/delete_transcription
Permanently deletes a transcription and its associated files. Cannot delete transcriptions that are currently processing.
## Delete transcription
**Endpoint:** `DELETE /v1/transcriptions/{transcription_id}`
Permanently deletes a transcription and its associated files. Cannot delete transcriptions that are currently processing.
### Parameters
* `transcription_id` (path) (Required): The ID of the transcription to delete.
### Responses
* **204**: Transcription deleted.
* **401**: Authentication error.
Schema (YAML Structural Definition):
```yaml
properties:
status_code:
type: integer
error_type:
type: string
message:
type: string
validation_errors:
items:
properties:
error_type:
type: string
location:
type: string
message:
type: string
required:
- error_type
- location
- message
type: object
type: array
request_id:
type: string
required:
- status_code
- error_type
- message
- validation_errors
- request_id
type: object
```
* **404**: Transcription not found.
Error types:
* `transcription_not_found`: Transcription could not be found.
Schema (YAML Structural Definition):
```yaml
properties:
status_code:
type: integer
error_type:
type: string
message:
type: string
validation_errors:
items:
properties:
error_type:
type: string
location:
type: string
message:
type: string
required:
- error_type
- location
- message
type: object
type: array
request_id:
type: string
required:
- status_code
- error_type
- message
- validation_errors
- request_id
type: object
```
* **409**: Invalid transcription state.
Error types:
* `transcription_invalid_state`:
* Cannot delete transcription with processing status.
Schema (YAML Structural Definition):
```yaml
properties:
status_code:
type: integer
error_type:
type: string
message:
type: string
validation_errors:
items:
properties:
error_type:
type: string
location:
type: string
message:
type: string
required:
- error_type
- location
- message
type: object
type: array
request_id:
type: string
required:
- status_code
- error_type
- message
- validation_errors
- request_id
type: object
```
* **500**: Internal server error.
Schema (YAML Structural Definition):
```yaml
properties:
status_code:
type: integer
error_type:
type: string
message:
type: string
validation_errors:
items:
properties:
error_type:
type: string
location:
type: string
message:
type: string
required:
- error_type
- location
- message
type: object
type: array
request_id:
type: string
required:
- status_code
- error_type
- message
- validation_errors
- request_id
type: object
```
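Because a `processing` transcription cannot be deleted (the 409 case above), it helps to map each documented status code to an outcome before acting on it. A minimal sketch, assuming the same `https://api.soniox.com` base URL and Bearer-token auth as elsewhere in these examples:

```python
import urllib.error
import urllib.request

API_BASE = "https://api.soniox.com"  # assumed base URL


def describe_delete_status(status_code: int) -> str:
    """Map the documented DELETE response codes to a short outcome label."""
    return {
        204: "deleted",
        401: "authentication error",
        404: "transcription not found",
        409: "still processing; retry once it finishes",
        500: "internal server error",
    }.get(status_code, "unexpected status")


def delete_transcription(api_key: str, transcription_id: str) -> str:
    """Issue DELETE /v1/transcriptions/{id} and report the outcome."""
    req = urllib.request.Request(
        f"{API_BASE}/v1/transcriptions/{transcription_id}",
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
        method="DELETE",
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return describe_delete_status(resp.status)
    except urllib.error.HTTPError as err:
        return describe_delete_status(err.code)
```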
# Get transcription
URL: /stt/api-reference/transcriptions/get_transcription
Retrieves detailed information about a specific transcription.
## Get transcription
**Endpoint:** `GET /v1/transcriptions/{transcription_id}`
Retrieves detailed information about a specific transcription.
### Parameters
* `transcription_id` (path) (Required): The ID of the transcription to retrieve.
### Responses
* **200**: Transcription details.
Example (JSON):
```json
{
"audio_duration_ms": 0,
"audio_url": "https://soniox.com/media/examples/coffee_shop.mp3",
"client_reference_id": "some_internal_id",
"created_at": "2024-11-26T00:00:00Z",
"error_message": null,
"error_type": null,
"file_id": null,
"filename": "coffee_shop.mp3",
"id": "73d4357d-cad2-4338-a60d-ec6f2044f721",
"language_hints": [
"en",
"fr"
],
"model": "stt-async-preview",
"status": "queued",
"webhook_auth_header_name": "Authorization",
"webhook_auth_header_value": "******************",
"webhook_status_code": null,
"webhook_url": "https://example.com/webhook"
}
```
Schema (YAML Structural Definition):
```yaml
description: A transcription.
properties:
id:
description: Unique identifier for the transcription request.
format: uuid
type: string
status:
description: Transcription status.
enum:
- queued
- processing
- completed
- error
type: string
created_at:
description: UTC timestamp indicating when the transcription was created.
format: date-time
type: string
model:
description: Speech-to-text model used for the transcription.
type: string
audio_url:
anyOf:
- type: string
- type: 'null'
description: URL of the file being transcribed.
file_id:
anyOf:
- format: uuid
type: string
- type: 'null'
description: ID of the file being transcribed.
filename:
description: Name of the file being transcribed.
type: string
language_hints:
anyOf:
- items:
type: string
type: array
- type: 'null'
description: >-
Expected languages in the audio. If not specified, languages are
automatically detected.
enable_speaker_diarization:
description: >-
When `true`, speakers are identified and separated in the transcription
output.
type: boolean
enable_language_identification:
description: When `true`, language is detected for each part of the transcription.
type: boolean
audio_duration_ms:
anyOf:
- type: integer
- type: 'null'
description: >-
Duration of the audio in milliseconds. Only available after processing
begins.
error_type:
anyOf:
- type: string
- type: 'null'
description: >-
Error type if transcription failed. `null` for successful or in-progress
transcriptions.
error_message:
anyOf:
- type: string
- type: 'null'
description: >-
Error message if transcription failed. `null` for successful or
in-progress transcriptions.
webhook_url:
anyOf:
- type: string
- type: 'null'
description: >-
URL to receive webhook notifications when transcription is completed or
fails.
webhook_auth_header_name:
anyOf:
- type: string
- type: 'null'
description: Name of the authentication header sent with webhook notifications.
webhook_auth_header_value:
anyOf:
- type: string
- type: 'null'
description: >-
Authentication header value. Always returned masked as
`******************`.
webhook_status_code:
anyOf:
- type: integer
- type: 'null'
description: >-
HTTP status code received from your server when webhook was delivered.
`null` if not yet sent.
client_reference_id:
anyOf:
- type: string
- type: 'null'
description: Tracking identifier string.
required:
- id
- status
- created_at
- model
- filename
- enable_speaker_diarization
- enable_language_identification
type: object
```
* **401**: Authentication error.
Schema (YAML Structural Definition):
```yaml
properties:
status_code:
type: integer
error_type:
type: string
message:
type: string
validation_errors:
items:
properties:
error_type:
type: string
location:
type: string
message:
type: string
required:
- error_type
- location
- message
type: object
type: array
request_id:
type: string
required:
- status_code
- error_type
- message
- validation_errors
- request_id
type: object
```
* **404**: Transcription not found.
Error types:
* `transcription_not_found`: Transcription could not be found.
Schema (YAML Structural Definition):
```yaml
properties:
status_code:
type: integer
error_type:
type: string
message:
type: string
validation_errors:
items:
properties:
error_type:
type: string
location:
type: string
message:
type: string
required:
- error_type
- location
- message
type: object
type: array
request_id:
type: string
required:
- status_code
- error_type
- message
- validation_errors
- request_id
type: object
```
* **500**: Internal server error.
Schema (YAML Structural Definition):
```yaml
properties:
status_code:
type: integer
error_type:
type: string
message:
type: string
validation_errors:
items:
properties:
error_type:
type: string
location:
type: string
message:
type: string
required:
- error_type
- location
- message
type: object
type: array
request_id:
type: string
required:
- status_code
- error_type
- message
- validation_errors
- request_id
type: object
```
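Since `queued` and `processing` are transient and only `completed` and `error` are final, a typical client polls this endpoint until the status becomes terminal. The loop below is a sketch that takes the fetch call as an injected callable, so it carries no assumptions about transport or authentication:

```python
import time


def is_terminal(status: str) -> bool:
    """`queued` and `processing` are transient; `completed` and `error` are final."""
    return status in ("completed", "error")


def wait_for_transcription(fetch, poll_interval_s: float = 2.0, max_polls: int = 300) -> dict:
    """Poll until the transcription reaches a terminal status.

    `fetch` is any callable returning the GET /v1/transcriptions/{id}
    response as a dict (injected so the loop stays transport-agnostic).
    """
    for _ in range(max_polls):
        transcription = fetch()
        if is_terminal(transcription["status"]):
            return transcription
        time.sleep(poll_interval_s)
    raise TimeoutError("transcription did not reach a terminal status in time")
```

On a `completed` result you would then fetch the transcript; on `error`, the `error_type` and `error_message` fields explain the failure.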
# Get transcription transcript
URL: /stt/api-reference/transcriptions/get_transcription_transcript
Retrieves the full transcript text and detailed tokens for a completed transcription. Only available for successfully completed transcriptions.
## Get transcription transcript
**Endpoint:** `GET /v1/transcriptions/{transcription_id}/transcript`
Retrieves the full transcript text and detailed tokens for a completed transcription. Only available for successfully completed transcriptions.
### Parameters
* `transcription_id` (path) (Required): The ID of the transcription whose transcript to retrieve.
### Responses
* **200**: Transcription transcript.
Example (JSON):
```json
{
"id": "19b6d61d-02db-4c25-bc71-b4094dc310c8",
"text": "Hello",
"tokens": [
{
"confidence": 0.95,
"end_ms": 90,
"start_ms": 10,
"text": "Hel"
},
{
"confidence": 0.98,
"end_ms": 160,
"start_ms": 110,
"text": "lo"
}
]
}
```
Schema (YAML Structural Definition):
```yaml
description: The transcription text.
properties:
id:
description: Unique identifier of the transcription this transcript belongs to.
format: uuid
type: string
text:
description: Complete transcribed text content.
type: string
tokens:
description: List of detailed token information with timestamps and metadata.
items:
description: The transcript token.
example:
confidence: 0.95
end_ms: 90
start_ms: 10
text: Hel
properties:
text:
description: Token text content.
type: string
start_ms:
description: Start time of the token in milliseconds.
type: integer
end_ms:
description: End time of the token in milliseconds.
type: integer
confidence:
description: Confidence score of the token, between 0.0 and 1.0.
type: number
speaker:
anyOf:
- type: string
- type: 'null'
description: >-
Speaker identifier. Only present when speaker diarization is
enabled.
language:
anyOf:
- type: string
- type: 'null'
description: >-
Detected language code for this token. Only present when language
identification is enabled.
is_audio_event:
anyOf:
- type: boolean
- type: 'null'
description: >-
Boolean indicating if this token represents an audio event. Only
present when audio event detection is enabled.
translation_status:
anyOf:
- type: string
- type: 'null'
description: >-
Translation status ("none", "original" or "translation"). Only present
when translation is enabled.
required:
- text
- start_ms
- end_ms
- confidence
type: object
type: array
required:
- id
- text
- tokens
type: object
```
* **401**: Authentication error.
Schema (YAML Structural Definition):
```yaml
properties:
status_code:
type: integer
error_type:
type: string
message:
type: string
validation_errors:
items:
properties:
error_type:
type: string
location:
type: string
message:
type: string
required:
- error_type
- location
- message
type: object
type: array
request_id:
type: string
required:
- status_code
- error_type
- message
- validation_errors
- request_id
type: object
```
* **404**: Transcription not found.
Error types:
* `transcription_not_found`: Transcription could not be found.
Schema (YAML Structural Definition):
```yaml
properties:
status_code:
type: integer
error_type:
type: string
message:
type: string
validation_errors:
items:
properties:
error_type:
type: string
location:
type: string
message:
type: string
required:
- error_type
- location
- message
type: object
type: array
request_id:
type: string
required:
- status_code
- error_type
- message
- validation_errors
- request_id
type: object
```
* **409**: Invalid transcription state.
Error types:
* `transcription_invalid_state`:
* Can only get transcript with completed status.
* File transcription has failed.
* Transcript no longer available.
Schema (YAML Structural Definition):
```yaml
properties:
status_code:
type: integer
error_type:
type: string
message:
type: string
validation_errors:
items:
properties:
error_type:
type: string
location:
type: string
message:
type: string
required:
- error_type
- location
- message
type: object
type: array
request_id:
type: string
required:
- status_code
- error_type
- message
- validation_errors
- request_id
type: object
```
* **500**: Internal server error.
Schema (YAML Structural Definition):
```yaml
properties:
status_code:
type: integer
error_type:
type: string
message:
type: string
validation_errors:
items:
properties:
error_type:
type: string
location:
type: string
message:
type: string
required:
- error_type
- location
- message
type: object
type: array
request_id:
type: string
required:
- status_code
- error_type
- message
- validation_errors
- request_id
type: object
```
# Get transcriptions
URL: /stt/api-reference/transcriptions/get_transcriptions
Retrieves a list of transcriptions.
## Get transcriptions
**Endpoint:** `GET /v1/transcriptions`
Retrieves a list of transcriptions.
### Parameters
* `limit` (query): Maximum number of transcriptions to return.
* `cursor` (query): Pagination cursor for the next page of results.
### Responses
* **200**: A list of transcriptions.
Schema (YAML Structural Definition):
```yaml
properties:
transcriptions:
description: List of transcriptions.
items:
description: A transcription.
example:
audio_duration_ms: 0
audio_url: https://soniox.com/media/examples/coffee_shop.mp3
client_reference_id: some_internal_id
created_at: '2024-11-26T00:00:00Z'
error_message: null
error_type: null
file_id: null
filename: coffee_shop.mp3
id: 73d4357d-cad2-4338-a60d-ec6f2044f721
language_hints:
- en
- fr
model: stt-async-preview
status: queued
webhook_auth_header_name: Authorization
webhook_auth_header_value: '******************'
webhook_status_code: null
webhook_url: https://example.com/webhook
properties:
id:
description: Unique identifier for the transcription request.
format: uuid
type: string
status:
description: Transcription status.
enum:
- queued
- processing
- completed
- error
type: string
created_at:
description: UTC timestamp indicating when the transcription was created.
format: date-time
type: string
model:
description: Speech-to-text model used for the transcription.
type: string
audio_url:
anyOf:
- type: string
- type: 'null'
description: URL of the file being transcribed.
file_id:
anyOf:
- format: uuid
type: string
- type: 'null'
description: ID of the file being transcribed.
filename:
description: Name of the file being transcribed.
type: string
language_hints:
anyOf:
- items:
type: string
type: array
- type: 'null'
description: >-
Expected languages in the audio. If not specified, languages are
automatically detected.
enable_speaker_diarization:
description: >-
When `true`, speakers are identified and separated in the
transcription output.
type: boolean
enable_language_identification:
description: >-
When `true`, language is detected for each part of the
transcription.
type: boolean
audio_duration_ms:
anyOf:
- type: integer
- type: 'null'
description: >-
Duration of the audio in milliseconds. Only available after
processing begins.
error_type:
anyOf:
- type: string
- type: 'null'
description: >-
Error type if transcription failed. `null` for successful or
in-progress transcriptions.
error_message:
anyOf:
- type: string
- type: 'null'
description: >-
Error message if transcription failed. `null` for successful or
in-progress transcriptions.
webhook_url:
anyOf:
- type: string
- type: 'null'
description: >-
URL to receive webhook notifications when transcription is completed
or fails.
webhook_auth_header_name:
anyOf:
- type: string
- type: 'null'
description: Name of the authentication header sent with webhook notifications.
webhook_auth_header_value:
anyOf:
- type: string
- type: 'null'
description: >-
Authentication header value. Always returned masked as
`******************`.
webhook_status_code:
anyOf:
- type: integer
- type: 'null'
description: >-
HTTP status code received from your server when webhook was
delivered. `null` if not yet sent.
client_reference_id:
anyOf:
- type: string
- type: 'null'
description: Tracking identifier string.
required:
- id
- status
- created_at
- model
- filename
- enable_speaker_diarization
- enable_language_identification
type: object
type: array
next_page_cursor:
anyOf:
- type: string
- type: 'null'
description: >-
A pagination token that references the next page of results. When more
data is available, this field contains a value to pass in the cursor
parameter of a subsequent request. When null, no additional results are
available.
required:
- transcriptions
type: object
```
* **400**: Invalid request.
Error types:
* `invalid_cursor`: Invalid cursor parameter.
Schema (YAML Structural Definition):
```yaml
properties:
status_code:
type: integer
error_type:
type: string
message:
type: string
validation_errors:
items:
properties:
error_type:
type: string
location:
type: string
message:
type: string
required:
- error_type
- location
- message
type: object
type: array
request_id:
type: string
required:
- status_code
- error_type
- message
- validation_errors
- request_id
type: object
```
* **401**: Authentication error.
Schema (YAML Structural Definition):
```yaml
properties:
status_code:
type: integer
error_type:
type: string
message:
type: string
validation_errors:
items:
properties:
error_type:
type: string
location:
type: string
message:
type: string
required:
- error_type
- location
- message
type: object
type: array
request_id:
type: string
required:
- status_code
- error_type
- message
- validation_errors
- request_id
type: object
```
* **500**: Internal server error.
Schema (YAML Structural Definition):
```yaml
properties:
status_code:
type: integer
error_type:
type: string
message:
type: string
validation_errors:
items:
properties:
error_type:
type: string
location:
type: string
message:
type: string
required:
- error_type
- location
- message
type: object
type: array
request_id:
type: string
required:
- status_code
- error_type
- message
- validation_errors
- request_id
type: object
```
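The `cursor` parameter and `next_page_cursor` field together form a standard cursor-pagination loop: pass each response's `next_page_cursor` back as `cursor` until it comes back null. A sketch with the page fetch injected as a callable, so it makes no assumptions beyond the schema above:

```python
from typing import Callable, Iterator, Optional


def iter_transcriptions(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every transcription, following next_page_cursor until it is null.

    `fetch_page(cursor)` performs GET /v1/transcriptions (with `cursor` as
    a query parameter when not None) and returns the parsed JSON dict.
    """
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page["transcriptions"]
        cursor = page.get("next_page_cursor")
        if cursor is None:
            # Null cursor means no further results are available.
            return
```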