# Soniox docs Documentation from the [soniox.com/docs](https://soniox.com/docs) website. ## Links to docs content pages: - [Community and support](https://soniox.com/docs/community-and-support) - [FAQ](https://soniox.com/docs/faq) - [Introduction](https://soniox.com/docs/) - [AI engineering](https://soniox.com/docs/stt/ai-engineering) - [Data residency](https://soniox.com/docs/stt/data-residency) - [Get started](https://soniox.com/docs/stt/get-started) - [Models](https://soniox.com/docs/stt/models) - [Security and privacy](https://soniox.com/docs/stt/security-and-privacy) - [React Native SDK](https://soniox.com/docs/stt/SDKs/react-native-SDK) - [Async transcription](https://soniox.com/docs/stt/async/async-transcription) - [Async translation](https://soniox.com/docs/stt/async/async-translation) - [Error handling](https://soniox.com/docs/stt/async/error-handling) - [Limits & quotas](https://soniox.com/docs/stt/async/limits-and-quotas) - [Webhooks](https://soniox.com/docs/stt/async/webhooks) - [Confidence scores](https://soniox.com/docs/stt/concepts/confidence-scores) - [Context](https://soniox.com/docs/stt/concepts/context) - [Language hints](https://soniox.com/docs/stt/concepts/language-hints) - [Language identification](https://soniox.com/docs/stt/concepts/language-identification) - [Language restrictions](https://soniox.com/docs/stt/concepts/language-restrictions) - [Speaker diarization](https://soniox.com/docs/stt/concepts/speaker-diarization) - [Supported languages](https://soniox.com/docs/stt/concepts/supported-languages) - [Timestamps](https://soniox.com/docs/stt/concepts/timestamps) - [Soniox Live](https://soniox.com/docs/stt/demo-apps/soniox-live) - [API reference](https://soniox.com/docs/stt/api-reference) - [WebSocket API](https://soniox.com/docs/stt/api-reference/websocket-api) - [Connection keepalive](https://soniox.com/docs/stt/rt/connection-keepalive) - [Endpoint detection](https://soniox.com/docs/stt/rt/endpoint-detection) - [Error handling](https://soniox.com/docs/stt/rt/error-handling) - [Limits & quotas](https://soniox.com/docs/stt/rt/limits-and-quotas) - [Manual finalization](https://soniox.com/docs/stt/rt/manual-finalization) - [Real-time transcription](https://soniox.com/docs/stt/rt/real-time-transcription) - [Real-time translation](https://soniox.com/docs/stt/rt/real-time-translation) - [Community integrations](https://soniox.com/docs/stt/integrations/community-integrations) - [Integrations](https://soniox.com/docs/stt/integrations) - [LiveKit](https://soniox.com/docs/stt/integrations/livekit) - [n8n](https://soniox.com/docs/stt/integrations/n8n) - [Pipecat](https://soniox.com/docs/stt/integrations/pipecat) - [TanStack AI SDK](https://soniox.com/docs/stt/integrations/tanstack-ai-sdk) - [Twilio](https://soniox.com/docs/stt/integrations/twilio) - [Vercel AI SDK](https://soniox.com/docs/stt/integrations/vercel-ai-sdk) - [Direct stream](https://soniox.com/docs/stt/guides/direct-stream) - [Proxy stream](https://soniox.com/docs/stt/guides/proxy-stream) - [Async transcription with Node SDK](https://soniox.com/docs/stt/SDKs/node-SDK/async-transcription) - [Handling files with Node SDK](https://soniox.com/docs/stt/SDKs/node-SDK/files) - [Node SDK](https://soniox.com/docs/stt/SDKs/node-SDK) - [Real-time transcription with Node SDK](https://soniox.com/docs/stt/SDKs/node-SDK/realtime-transcription) - [Handling webhooks with Node SDK](https://soniox.com/docs/stt/SDKs/node-SDK/webhooks) - [Async transcription with Python 
SDK](https://soniox.com/docs/stt/SDKs/python-SDK/async-transcription) - [Handling files with Python SDK](https://soniox.com/docs/stt/SDKs/python-SDK/files) - [Python SDK](https://soniox.com/docs/stt/SDKs/python-SDK) - [Real-time transcription with Python SDK](https://soniox.com/docs/stt/SDKs/python-SDK/realtime-transcription) - [Sync vs async clients](https://soniox.com/docs/stt/SDKs/python-SDK/sync-vs-async-clients) - [Handling webhooks with Python SDK](https://soniox.com/docs/stt/SDKs/python-SDK/webhooks) - [React SDK](https://soniox.com/docs/stt/SDKs/react-SDK) - [Real-time transcription with React SDK](https://soniox.com/docs/stt/SDKs/react-SDK/realtime-transcription) - [Web SDK](https://soniox.com/docs/stt/SDKs/web-SDK) - [Real-time transcription with Web SDK](https://soniox.com/docs/stt/SDKs/web-SDK/realtime-transcription) - [LangChain.js (JavaScript)](https://soniox.com/docs/stt/integrations/langchain/langchain-js) - [LangChain (Python)](https://soniox.com/docs/stt/integrations/langchain/langchain) - [Classes](https://soniox.com/docs/stt/SDKs/node-SDK/reference/classes) - [Full Node SDK reference](https://soniox.com/docs/stt/SDKs/node-SDK/reference) - [Types](https://soniox.com/docs/stt/SDKs/node-SDK/reference/types) - [Async Client](https://soniox.com/docs/stt/SDKs/python-SDK/Full-SDK-reference/async_client) - [Realtime Client](https://soniox.com/docs/stt/SDKs/python-SDK/Full-SDK-reference/realtime_client) - [Types](https://soniox.com/docs/stt/SDKs/python-SDK/Full-SDK-reference/types) - [Full React SDK reference](https://soniox.com/docs/stt/SDKs/react-SDK/reference) - [Types](https://soniox.com/docs/stt/SDKs/react-SDK/reference/types) - [Classes](https://soniox.com/docs/stt/SDKs/web-SDK/reference/classes) - [Full Web SDK reference](https://soniox.com/docs/stt/SDKs/web-SDK/reference) - [Types](https://soniox.com/docs/stt/SDKs/web-SDK/reference/types) - [Create temporary API key](https://soniox.com/docs/stt/api-reference/auth/create_temporary_api_key) - [Delete file](https://soniox.com/docs/stt/api-reference/files/delete_file) - [Get file](https://soniox.com/docs/stt/api-reference/files/get_file) - [Get files](https://soniox.com/docs/stt/api-reference/files/get_files) - [Upload file](https://soniox.com/docs/stt/api-reference/files/upload_file) - [Get models](https://soniox.com/docs/stt/api-reference/models/get_models) - [Create transcription](https://soniox.com/docs/stt/api-reference/transcriptions/create_transcription) - [Delete transcription](https://soniox.com/docs/stt/api-reference/transcriptions/delete_transcription) - [Get transcription](https://soniox.com/docs/stt/api-reference/transcriptions/get_transcription) - [Get transcription transcript](https://soniox.com/docs/stt/api-reference/transcriptions/get_transcription_transcript) - [Get transcriptions](https://soniox.com/docs/stt/api-reference/transcriptions/get_transcriptions) # Community and support URL: /community-and-support Engage with our community to explore new updates, participate in discussions, contribute to our projects, and report any issues you encounter. import { LinkCard } from "@/components/link-card"; import { GitHubIcon } from "@/components/github-icon"; ## Support We offer three levels of support depending on your plan: **Free**
Community-driven support through our [Discord server](https://discord.gg/rWfnk9uM5j). **Business**
Priority email support and onboarding assistance.
Please contact [sales@soniox.com](mailto:sales@soniox.com) for more information. **Enterprise**
Dedicated support channels, defined response-time SLAs, and escalation paths for production deployments.
Please contact [sales@soniox.com](mailto:sales@soniox.com) for more information.

***

## GitHub

We use GitHub to track issues related to official Soniox SDKs and integrations. Check our [Soniox GitHub](https://github.com/soniox) profile for all available code.

***

## Website

For more information about our products, pricing, or Soniox in general, visit our [website](https://soniox.com/).

# FAQ

URL: /faq

Common troubleshooting guidance and answers for integrating with the Soniox API.

This page answers common questions related to integrating with the Soniox API.

**Why does the WebSocket connection take a long time to start?**

High WebSocket connection startup time is usually caused by one or more of the following:

* **Network latency:** High round-trip time between your client and Soniox increases the duration of the TLS handshake and WebSocket upgrade.
* **Region selection:** Using an endpoint located far from your compute environment adds unnecessary cross-region latency during connection establishment. See [Data residency](/stt/data-residency) for more info.
* **Large initial context payload:** Sending a large [context](/stt/concepts/context) during initialization delays readiness because the server must fully receive and process the payload before the session becomes active.

To minimize perceived startup delay, you should always **buffer audio locally before the WebSocket connection is established** and immediately stream all buffered audio chunks after sending the initial configuration message.

**How do I request a limit increase?**

You can request a limit increase from the [Soniox Console organization limits](https://console.soniox.com/org/limits) page. Requests are reviewed within 1-3 business days.

**Does Soniox provide legal and compliance documentation?**

Yes. Soniox provides standard legal and compliance documentation for companies integrating the Soniox API into their products or services. This may include an MSA, DPA, and security or compliance documentation required for procurement or security review processes. Documentation is available for Business and Enterprise customers. Please contact [sales@soniox.com](mailto:sales@soniox.com) to request access or begin the review process.

# Introduction

URL: /

Soniox provides powerful, production-ready APIs for transcribing, translating, and understanding audio content.

import { LinkCards, SpeechToTextIcon } from "@/components/link-card";
import { Step, Steps } from "fumadocs-ui/components/steps";

## Get started with Soniox APIs

Welcome to Soniox — the fastest, most accurate platform for audio and speech intelligence. Soniox provides powerful, production-ready APIs for transcribing, translating, and understanding audio content. Whether you are building real-time voice interfaces, analyzing large volumes of audio, or extracting structured insights from speech, Soniox gives you the tools to do it efficiently and at scale.

You can integrate Soniox into your product, workflow, or pipeline using simple REST or WebSocket APIs, with support for multiple SDKs and real-time streaming.

## Products

**Speech-to-Text** — Transcribe and translate speech in 60+ languages with world-leading accuracy and real-time performance. Supports file and real-time modes, high-quality translation with super-low latency, speaker diarization, and advanced customization.

## Before you begin

To start using Soniox, create a [Soniox account](https://console.soniox.com/signup/). Visit the [Soniox Console](https://console.soniox.com/) to generate and manage API keys, view usage, logs, and billing. Soniox Console is your self-service control center for everything Soniox.
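Once you have generated a key, you can sanity-check it from the terminal. A minimal sketch, assuming Bearer authentication and a `v1/models` route implied by the [Get models](/stt/api-reference/models/get_models) reference page; verify the exact path there before relying on it:

```sh title="Terminal"
# List available models to confirm the key works (route assumed from the API reference).
curl -H "Authorization: Bearer $SONIOX_API_KEY" https://api.soniox.com/v1/models
```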
# AI engineering

URL: /stt/ai-engineering

Using MCP, the AI assistant, and LLMs with Soniox for AI-powered development.

import Image from "next/image";

Soniox provides easy-to-use AI tools that help you explore documentation, generate code, and get guidance, even if you're new to programming. These tools work directly with your coding environment, so you can focus on building instead of searching for answers.

With Soniox AI engineering, you can:

* Browse documentation via the **MCP server** without leaving your coding tools
* Ask the **AI assistant** for explanations, examples, or code help
* Use **LLM context files** so AI models understand Soniox APIs and examples
* Copy page content or open it directly in your preferred AI tool

These features reduce friction, help you learn faster, and make working with Soniox APIs simple and efficient.

***

## MCP server

The **MCP server** lets you access Soniox documentation right from tools like Cursor, Windsurf, or Claude Code. You can search guides, view examples, and explore APIs without switching windows.

### How to set it up

Add the following configuration to your coding tool:

```json
"soniox-docs": {
  "command": "npx",
  "args": ["-y", "mcp-remote", "https://soniox.com/docs/api/mcp/mcp"]
}
```
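For example, in Cursor this entry typically lives under the `mcpServers` key in `.cursor/mcp.json`; other tools use a similar wrapper (the surrounding file shown here is an assumption, so check your tool's MCP documentation):

```json
{
  "mcpServers": {
    "soniox-docs": {
      "command": "npx",
      "args": ["-y", "mcp-remote", "https://soniox.com/docs/api/mcp/mcp"]
    }
  }
}
```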
[![Install MCP Server](https://cursor.com/deeplink/mcp-install-light.svg)](https://cursor.com/en/install-mcp?name=soniox-docs\&config=eyJjb21tYW5kIjoibnB4IC15IG1jcC1yZW1vdGUgaHR0cHM6Ly9zb25pb3guY29tL2RvY3MvYXBpL21jcC9tY3AifQ%3D%3D)
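In Claude Code, the same server can usually be registered from the CLI. A sketch, assuming the current `claude mcp add` syntax:

```sh title="Terminal"
claude mcp add soniox-docs -- npx -y mcp-remote https://soniox.com/docs/api/mcp/mcp
```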
Follow your tool's instructions for adding a remote server. Once set up, you can quickly explore Soniox docs and code samples from within your coding environment.

***

## AI assistant

The **Soniox AI assistant** is available directly from the docs. It can:

* Answer questions about Soniox APIs
* Explain example code or suggest modifications
* Provide guidance in context, so you don't need to guess

Even if you're new to programming, the AI assistant can help you understand code and API workflows quickly.

***

## LLM context files

Soniox provides two files that give AI models context about our APIs and examples:

* [llms.txt](/llms.txt) – core context for general tasks
* [llms-full.txt](/llms-full.txt) – extended context for advanced workflows

Adding these files to your AI tool ensures the model can provide accurate, context-aware help.

***

## Copy and open buttons

At the top of each documentation page, the **Copy page** button makes it easy to bring content into your workflow:

* **Copy Markdown** – copy the full page content instantly
* **Open in ChatGPT or Claude** – send the page context for live AI interaction

These features help you experiment and learn by bringing examples and documentation directly into your coding environment.

***

For more information about Soniox products, pricing, or general resources, visit our [website](https://soniox.com/).

# Data residency

URL: /stt/data-residency

Learn about data residency.

Soniox keeps your data yours. Any content you send to the Soniox API — audio, transcripts, or metadata — is **never used to train or improve our models.** For more information, see our [Security and privacy](/stt/security-and-privacy) page.

***

## What is data residency

Data residency lets you choose **where** Soniox processes and stores your content. When you select a region for a project, **all audio and transcript data for that project stays in that region** — for both processing and storage.

To get access to regional deployments, contact us: [sales@soniox.com](mailto:sales@soniox.com).

***

## How data residency works

When data residency is enabled for your account:

* You choose a **region** when creating a new project.
* Any API requests made using that project's API key are handled **fully within the selected region.**
* All **content data** (audio + transcripts) remains within that region for processing and storage.

### System data

Data residency **does not apply to system data** such as account and project metadata, usage statistics, and billing data. This system data may be processed outside the selected region.

**Your content (audio + transcripts) never leaves the region you choose.**

***

## Using data residency

Data residency is set **per project** within your Soniox organization.

### 1. Create a project with a region

When creating a new project:

* Select the region from the **region** dropdown.
* Each project receives region-specific API keys.

### 2. Use the region-specific API domain

To ensure processing stays in the region, use:

* The **API key** from the regional project.
* The **correct API domain** for that region (see below).

***

## Regional endpoints

| Region             | Regional storage | Regional processing | Capabilities       | API domains                                        |
| ------------------ | ---------------- | ------------------- | ------------------ | -------------------------------------------------- |
| **United States**  | ✅ Yes           | ✅ Yes              | Full API supported | `api.soniox.com` <br /> `stt-rt.soniox.com`        |
| **European Union** | ✅ Yes           | ✅ Yes              | Full API supported | `api.eu.soniox.com` <br /> `stt-rt.eu.soniox.com`  |
| **Japan**          | ✅ Yes           | ✅ Yes              | Full API supported | `api.jp.soniox.com` <br /> `stt-rt.jp.soniox.com`  |
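For example, a project pinned to the European Union would open its real-time connection against the EU domain using that project's key. A sketch, assuming the regional domains expose the same `/transcribe-websocket` path used by the default endpoint elsewhere in these docs:

```ts
import WebSocket from "ws";

// Use the EU WebSocket domain together with an API key from the EU project.
const ws = new WebSocket("wss://stt-rt.eu.soniox.com/transcribe-websocket");

ws.on("open", () => {
  // Same configuration message as on the default endpoint.
  ws.send(
    JSON.stringify({
      api_key: process.env.SONIOX_EU_API_KEY, // hypothetical env var holding the EU project key
      audio_format: "auto",
      model: "stt-rt-v4",
    })
  );
});
```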
If you'd like help enabling data residency or need a custom region, reach out: [sales@soniox.com](mailto:sales@soniox.com).

# Get started

URL: /stt/get-started

Learn how to use the Soniox Speech-to-Text API.

## Learn how to use the Soniox API in minutes

Soniox Speech-to-Text is a **universal speech AI** that lets you transcribe and translate speech in 60+ languages — from recorded files (async) or live audio streams (real-time). Languages can be freely mixed within the same conversation, and Soniox will handle them seamlessly with high accuracy and low latency.

In just a few steps, you can run your first transcription or translation. The examples also cover advanced features such as speaker diarization, real-time translation, context customization, and automatic language identification — all through the same simple API.

### Get API key

Create a [Soniox account](https://console.soniox.com/signup) and log in to the [Console](https://console.soniox.com) to get your API key. API keys are created per project. In the Console, go to **My First Project** and click **API Keys** to generate one.

Export it as an environment variable (replace the placeholder with your key):

```sh title="Terminal"
export SONIOX_API_KEY=<your_api_key>
```

### Get examples

Clone the official examples repo:

```sh title="Terminal"
git clone https://github.com/soniox/soniox_examples
cd soniox_examples/speech_to_text
```

### Run examples

Choose your language and run the ready-to-use examples below.

{/* TABLE START */}
| Example | What it does | Output |
| --------------------------------- | ------------------------------------------------------------------------------------------------------------------------ | ----------------------------------------------------------- |
| **Real-time transcription** | Transcribes speech in any language in real-time. | Transcript streamed to console. |
| **Real-time one-way translation** | Transcribes speech in any language and translates it into Spanish in real-time. | Transcript + Spanish translation streamed together. |
| **Real-time two-way translation** | Transcribes speech in any language and translates English ↔ Spanish in real-time. Spanish → English, English → Spanish. | Transcript + bidirectional translations streamed together. |
| **Transcribe file from URL** | Transcribes an audio file directly from a public URL. | Transcript printed to console. |
| **Transcribe local file** | Uploads and transcribes an audio file from your computer. | Transcript printed to console. |
{/* TABLE END */} {/* NOTE: Empty tag is needed so code block renders correctly */}
```sh title="Terminal"
cd python_sdk

# Set up environment
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Real-time examples
python soniox_sdk_realtime.py --audio_path ../assets/coffee_shop.mp3
python soniox_sdk_realtime.py --audio_path ../assets/coffee_shop.mp3 --translation one_way
python soniox_sdk_realtime.py --audio_path ../assets/two_way_translation.mp3 --translation two_way

# Async examples
python soniox_sdk_async.py --audio_url "https://soniox.com/media/examples/coffee_shop.mp3"
python soniox_sdk_async.py --audio_path ../assets/coffee_shop.mp3
```

{/* NOTE: Empty tag is needed so code block renders correctly */}
```sh title="Terminal"
cd nodejs_sdk

# Install dependencies
npm install

# Real-time examples
node soniox_sdk_realtime.js --audio_path ../assets/coffee_shop.mp3
node soniox_sdk_realtime.js --audio_path ../assets/coffee_shop.mp3 --translation one_way
node soniox_sdk_realtime.js --audio_path ../assets/two_way_translation.mp3 --translation two_way

# Async examples
node soniox_sdk_async.js --audio_url "https://soniox.com/media/examples/coffee_shop.mp3"
node soniox_sdk_async.js --audio_path ../assets/coffee_shop.mp3
```

{/* NOTE: Empty tag is needed so code block renders correctly */}
```sh title="Terminal"
cd python

# Set up environment
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Real-time examples
python soniox_realtime.py --audio_path ../assets/coffee_shop.mp3
python soniox_realtime.py --audio_path ../assets/coffee_shop.mp3 --translation one_way
python soniox_realtime.py --audio_path ../assets/two_way_translation.mp3 --translation two_way

# Async examples
python soniox_async.py --audio_url "https://soniox.com/media/examples/coffee_shop.mp3"
python soniox_async.py --audio_path ../assets/coffee_shop.mp3
```

{/* NOTE: Empty tag is needed so code block renders correctly */}
```sh title="Terminal"
cd nodejs

# Install dependencies
npm install

# Real-time examples
node soniox_realtime.js --audio_path ../assets/coffee_shop.mp3
node soniox_realtime.js --audio_path ../assets/coffee_shop.mp3 --translation one_way
node soniox_realtime.js --audio_path ../assets/two_way_translation.mp3 --translation two_way

# Async examples
node soniox_async.js --audio_url "https://soniox.com/media/examples/coffee_shop.mp3"
node soniox_async.js --audio_path ../assets/coffee_shop.mp3
```

## Next steps

* **Dive into the [Real-time API](/stt/rt/real-time-transcription)** → Run live transcription, translations, and endpoint detection.
* **Explore the [Async API](/stt/async/async-transcription)** → Transcribe and translate (recorded) files at scale and integrate with webhooks.

# Models

URL: /stt/models

Learn about the latest models, changelog, and deprecations.

Soniox Speech-to-Text **AI** provides multiple models for real-time and asynchronous transcription and translation. This page lists the currently available models, their capabilities, and important updates.

***

## Current models

{/* TABLE START */}
| Model | Type | Status |
| ---------------- | --------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
| **stt-rt-v4** | Real-time | **Active** |
| **stt-async-v4** | Async | **Active** |
| **stt-rt-v3** | Real-time | **Active** (After 2026-02-28, requests will automatically route to `stt-rt-v4` with no service interruption. No API changes required.) |
| **stt-async-v3** | Async | **Active** (After 2026-02-28, requests will automatically route to `stt-async-v4` with no service interruption. No API changes required.) |
{/* TABLE END */} *** ## Aliases Aliases provide a stable reference so you don’t need to change your code when newer versions are released. | Alias | Points to | Notes | | ------------------------ | -------------- | -------------------------------------------------- | | **stt-rt-v3-preview** | `stt-rt-v3` | Always points to the latest real-time active model | | **stt-rt-preview-v2** | `stt-rt-v3` | | | **stt-async-preview-v1** | `stt-async-v3` | | *** ## Changelog ### February 5, 2026 **New models:** stt-rt-v4 **Replaces:** stt-rt-v3 #### Overview **Soniox v4 Real-Time** is a next-generation real-time speech recognition model built for low-latency voice interactions. It delivers speaker-native accuracy across 60+ languages with improved latency, reliability, and conversational behavior. The model is production-ready and fully backward-compatible with v3 Real-Time. #### Key improvements * Higher accuracy across all supported languages * Better multilingual detection and mid-sentence language switching * Lower endpoint latency with faster final transcription * Improved semantic endpointing for more natural turn-taking * Lower manual finalization latency with faster final transcription * More stable, higher-quality transcription on long and multi-hour recordings * Stronger use of provided context for domain-specific accuracy * More fluent, accurate, and consistent translation across all supported languages * Added `max_endpoint_delay_ms` for controlling end-of-speech endpoint delay #### API compatibility * The stt-rt-v4 model is fully compatible with the existing stt-rt-v3 model and Soniox API * To upgrade, simply replace the model name in your API request: * `{ "model": "stt-rt-v4" }` for real-time #### Deprecation notice * The stt-rt-v3 model will be removed on February 28, 2026 * After February 28, 2026, requests will automatically route to stt-rt-v4 with no service interruption. No API changes required ### January 29, 2026 **New models:** stt-async-v4 **Replaces:** stt-async-v3 #### Overview **Soniox v4 Async** is the latest generation of Soniox’s asynchronous speech recognition and translation model. This release delivers a significant improvement in accuracy, robustness, and multilingual performance across more than 60 languages. v4 Async reaches human-parity transcription quality in real-world scenarios, while also introducing stronger long-form processing, improved speaker diarization, richer context handling, and higher-quality translation output. The model is designed for production-scale workloads and consistent, high-fidelity results across diverse acoustic environments and language mixes. 
#### Key improvements * Higher transcription accuracy across all languages, reaching speaker-native quality in many domains * More robust performance in noise, accents, overlapping speech, and poor audio * Better language identification and smoother mid-sentence language switching * Improved speaker separation and more consistent labeling in multi-speaker audio * Better normalization of dates, numbers, phone/email addresses, and other structured content * More stable, higher-quality transcription on long and multi-hour recordings * Stronger use of provided context for domain-specific accuracy * More fluent, accurate, and consistent translation across all supported languages #### API compatibility * The stt-async-v4 model is fully compatible with the existing stt-async-v3 model and Soniox API * To upgrade, simply replace the model name in your API request: * `{ "model": "stt-async-v4" }` for async #### Deprecation notice * The stt-async-v3 model will be removed on February 28, 2026 * After February 28, 2026, requests will automatically route to stt-async-v4 with no service interruption. No API changes required ### October 31, 2025 #### Model retirement and upgrade We have accelerated the retirement of older models following the overwhelmingly positive response to the new v3 models. The following models have been retired: * stt-async-preview-v1 * stt-rt-preview-v2 Both models have been **aliased to the new Soniox v3 models.** This means all existing requests using the old model names are now automatically served with v3, giving every user our most accurate, capable, and intelligent voice AI experience, without any code changes required. #### Context compatibility The context feature is now backward compatible with v3 models, ensuring smooth migration from older versions. However, we **strongly recommend updating to the new context** structure for best results and future flexibility. Learn more about [context](/stt/concepts/context). ### October 29, 2025 **Model update:** v3 enhancements **Applies to:** stt-rt-v3, stt-async-v3 #### New features * **Extended audio duration support:** both real-time (stt-rt-v3) and asynchronous (stt-async-v3) models now support **audio up to 5 hours** in a single request. #### Quality improvements * **Higher transcription accuracy** across challenging audio conditions and diverse languages. #### Notes * No API changes are required; existing integrations continue to work seamlessly. * For asynchronous processing, large files up to 5 hours can now be uploaded directly without chunking. * For real-time streaming, sessions up to 5 hours are supported under the same WebSocket connection. ### October 21, 2025 **New models:** stt-rt-v3, stt-async-v3 **Replaces:** stt-rt-preview-v2, stt-async-preview-v1 #### Overview The **v3 models** introduce major improvements across recognition, translation, and reasoning — making Soniox faster, more accurate, and more capable than ever before. These models power real-time and asynchronous speech processing in 60+ languages, with enhanced accuracy, robustness, and context understanding. 
#### Key improvements

* Higher transcription accuracy across 60+ languages
* Improved multilingual switching — seamless recognition when speakers change language mid-sentence
* Significantly higher translation quality, especially for languages such as German and Korean
* The async model now also supports translation
* Support for new advanced structured context, enabling richer domain- and task-specific adaptation
* Enhanced alphanumeric accuracy (addresses, IDs, codes, serials)
* More accurate speaker diarization, even in overlapping speech
* Extended maximum audio duration to 5 hours for both async and real-time models

#### API compatibility

* The v3 models are fully compatible with the existing Soniox API, if you are not using the context feature.
* To upgrade, simply replace the model name in your API request:
  * `{ "model": "stt-rt-v3" }` for real-time
  * `{ "model": "stt-async-v3" }` for async
* If you are using the context feature, update to the new structured [context](/stt/concepts/context) for improved accuracy.

#### Deprecation notice

The following preview models are **deprecated** and will be retired on **November 30, 2025:**

* stt-async-preview-v1
* stt-rt-preview-v2

Please migrate to the v3 models before that date to ensure uninterrupted service.

### August 15, 2025

* Deprecated `stt-rt-preview-v1`

### August 5, 2025

* Released `stt-rt-preview-v2`
* Higher transcription accuracy
* Improved translation quality
* Expanded to support all translation pairs
* More reliable automatic language switching
* **Replaces:** `stt-rt-preview-v1`

# Security and privacy

URL: /stt/security-and-privacy

Learn about security and privacy policies.

At Soniox, we take security and privacy seriously. Our platform is designed to keep your data protected while reducing compliance burdens for your business. This page outlines how Soniox handles data, meets compliance requirements, and ensures secure communication.

***

## Compliance

Soniox meets industry-leading certification standards:

* **SOC 2 Type 2** – auditing standard that evaluates an organization's controls for security, availability, processing integrity, confidentiality, and privacy over an extended period of time.
* **ISO/IEC 27001:2022** – internationally recognized standard for Information Security Management Systems (ISMS).
* **GDPR** – European Union regulation that governs the collection, processing, and protection of personal data and privacy rights.
* **HIPAA** – U.S. regulatory framework that establishes requirements for protecting sensitive healthcare data, including Protected Health Information (PHI).

To request compliance documentation, contact us at [support@soniox.com](mailto:support@soniox.com).

***

## Data handling

* **No model training** – your audio and transcripts are never used to improve Soniox models or services.
* **No retention** – Soniox does not store your audio or transcript data unless explicitly requested through a service that supports storage, such as the async API.
* **Storage** – when you choose to store data, it is securely isolated within your Soniox account.
* **Data deletion** – you can delete all stored audio and transcripts at any time via the Soniox Console or API.

***

## Logging

* Minimal logging is performed for service reliability, debugging, and billing.
* Logs **never** contain raw audio or transcript content.
* Diagnostic metadata (such as request IDs or error traces) may be retained temporarily for operational purposes.
***

## Encryption & security

* **In transit** – all communication between your application and Soniox services is encrypted using **TLS 1.2+**.
* **Access control** – stored data is restricted to your account namespace, accessible only by your API keys.

# React Native SDK

URL: /stt/SDKs/react-native-SDK

Build speech-to-text workflows in React Native with the real-time API.

The Soniox [React SDK](/stt/SDKs/react-SDK) works with React Native and Expo out of the box, providing the same hooks for real-time speech-to-text. It lets you:

* Capture audio from the device microphone with a single hook
* Stream audio to Soniox in real time
* Receive transcription and translation results as reactive state

## Quickstart

### Install

Install via your preferred package manager:

```bash tab
npm install @soniox/react @soniox/client
```

```bash tab
yarn add @soniox/react @soniox/client
```

```bash tab
pnpm add @soniox/react @soniox/client
```

```bash tab
bun add @soniox/react @soniox/client
```

### Set up your temporary API key endpoint

In a client environment (browser, mobile app, React Native, etc.), you must not expose your API key to the client. For this reason, you can create a temporary API key endpoint on your server and use it to issue temporary API keys for the client.

Read more about using temporary API keys with the [React SDK](/stt/SDKs/react-SDK#setup-you-temporary-api-key-endpoint).

### Create a custom audio source

Wrap any React Native audio streaming library (e.g. `@siteed/expo-audio-studio`) with the `AudioSource` interface to stream PCM audio chunks to Soniox:

```ts
import type { AudioSource, AudioSourceHandlers } from "@soniox/client";

class MyAudioSource implements AudioSource {
  private handlers: AudioSourceHandlers | null = null;

  async start(handlers: AudioSourceHandlers): Promise<void> {
    this.handlers = handlers;
    // Start your audio capture here.
    // Call handlers.onData(chunk) with each audio chunk as an ArrayBuffer.
    // Call handlers.onError(error) if something goes wrong.
    // Call handlers.onMuted?.() / handlers.onUnmuted?.() when the mic is
    // muted or unmuted externally (e.g. OS-level, hardware switch).
  }

  stop(): void {
    // Stop audio capture and release resources.
    this.handlers = null;
  }
}
```

### Create your first real-time session

The core hooks (e.g. [`useRecording`](/stt/SDKs/react-SDK/realtime-transcription#userecording)) are platform-agnostic. To use them in React Native, provide a custom [`AudioSource`](/stt/SDKs/web-SDK/reference/types#audiosource) that streams PCM audio chunks:

```ts
import { useRef } from "react";
import { Button, Text, View } from "react-native";
import { SonioxProvider, useRecording } from "@soniox/react";
import { MyAudioSource } from "./MyAudioSource";

// Create a temporary API key endpoint on your server and use it to issue
// temporary API keys for the client.
async function fetchApiKey() {
  const res = await fetch("/api/soniox-temporary-key", { method: "POST" });
  const { api_key } = await res.json();
  return api_key;
}

function App() {
  // Wrap your app with a SonioxProvider and pass the temporary API key getter
  // function. NOTE: this JSX is a minimal sketch; check the React SDK
  // reference for the exact provider props.
  return (
    <SonioxProvider fetchApiKey={fetchApiKey}>
      <Transcription />
    </SonioxProvider>
  );
}

function Transcription() {
  // Create a custom audio source
  const source = useRef(new MyAudioSource()).current;

  // Create a recording session
  const { state, isActive, finalText, partialText, start, stop } = useRecording({
    model: "stt-rt-v4",
    audio_format: "pcm_s16le",
    sample_rate: 16000,
    num_channels: 1,
    source,
  });

  // Minimal render sketch: show results and toggle recording.
  return (
    <View>
      <Text>{finalText}</Text>
      <Text>{partialText}</Text>
      {isActive ? (
        <Button title="Stop" onPress={stop} />
      ) : (
        <Button title="Start" onPress={start} />
      )}
    </View>
  );
}
```

[View example on GitHub](https://github.com/soniox/soniox_examples/tree/master/speech_to_text/python/real_time/browser_direct_stream)

# Proxy stream

URL: /stt/guides/proxy-stream

How to stream audio from a client app to the Soniox Speech-to-Text WebSocket API through a proxy server.

import Image from "next/image";

## Overview

This guide explains how to stream microphone audio from a client to the Soniox [WebSocket API](/stt/api-reference/websocket-api) through a proxy server.

In this architecture, the client captures audio and sends it over WebSocket to a proxy server. The proxy server establishes a connection to the Soniox WebSocket API, authenticates the session, streams the audio for transcription, and relays the transcribed results back to the client in real time.

This setup is useful when you want to **inspect, transform, or store audio and transcription data on the server side** before passing it to the client. If your goal is simply to transcribe audio and return results with the lowest possible latency, consider using the [direct stream](/stt/guides/direct-stream) approach instead.

*(Figure: Soniox STT stream with proxy flowchart)*

## Example

In the following example, we create a proxy HTTP server that:

1. Listens for incoming WebSocket connections from the client.
2. Forwards audio data from the client to the [WebSocket API](/stt/api-reference/websocket-api).
3. Relays transcription results back to the client.

Authentication with the [WebSocket API](/stt/api-reference/websocket-api) is handled by the proxy server using the `SONIOX_API_KEY`.

The following Python server acts as a proxy between our client and the [WebSocket API](/stt/api-reference/websocket-api):

```python
import os
import json
import asyncio

from dotenv import load_dotenv
import websockets

load_dotenv()


async def handle_client(websocket):
    print("Browser client connected")

    # create a message queue to store client messages received before the
    # Soniox WebSocket API connection is ready, so we don't lose any
    message_queue = []
    soniox_ws = None
    soniox_ws_ready = False

    async def init_soniox_connection():
        nonlocal soniox_ws, soniox_ws_ready
        try:
            soniox_ws = await websockets.connect(
                "wss://stt-rt.soniox.com/transcribe-websocket"
            )
            print("Connected to Soniox STT WebSocket API")

            # Send initial configuration message
            start_message = json.dumps(
                {
                    "api_key": os.getenv("SONIOX_API_KEY"),
                    "audio_format": "auto",
                    "model": "stt-rt-preview-v2",
                    "language_hints": ["en"],
                }
            )
            await soniox_ws.send(start_message)
            print("Sent start message to Soniox")

            # mark connection as ready
            soniox_ws_ready = True

            # process any queued messages
            while len(message_queue) > 0 and soniox_ws_ready:
                data = message_queue.pop(0)
                await forward_data(data)

            # receive messages from Soniox STT WebSocket API
            async for message in soniox_ws:
                try:
                    await websocket.send(message)
                except Exception as e:
                    print(f"Error forwarding Soniox response: {e}")
                    break
        except Exception as e:
            print(f"Soniox WebSocket error: {e}")
            soniox_ws_ready = False
        finally:
            if soniox_ws:
                await soniox_ws.close()
                soniox_ws_ready = False
                print("Soniox WebSocket closed")

    async def forward_data(data):
        try:
            if soniox_ws:
                await soniox_ws.send(data)
        except Exception as e:
            print(f"Error forwarding data to Soniox: {e}")

    # initialize Soniox connection
    soniox_task = asyncio.create_task(init_soniox_connection())

    try:
        # receive messages from browser client
        async for data in websocket:
            if soniox_ws_ready:
                # forward messages instantly
                await forward_data(data)
            else:
                # queue the message to be processed as soon as the
                # connection to the Soniox STT WebSocket API is ready
                message_queue.append(data)
    except Exception as e:
        print(f"Error with browser client: {e}")
    finally:
        print("Browser client disconnected")
        soniox_task.cancel()
        try:
            await soniox_task
        except asyncio.CancelledError:
            pass


async def main():
    port = int(os.getenv("PORT", 3001))
    server = await websockets.serve(handle_client, "0.0.0.0", port)
    print(f"WebSocket proxy server listening on http://0.0.0.0:{port}")
    await server.wait_closed()


if __name__ == "__main__":
    asyncio.run(main())
```

[View example on GitHub](https://github.com/soniox/soniox_examples/tree/master/speech_to_text/python/real_time/browser_proxy_stream)

The following Node.js server acts as a proxy between our client and the [WebSocket API](/stt/api-reference/websocket-api):

```js
require("dotenv").config();
const WebSocket = require("ws");
const http = require("http");

const server = http.createServer();
const wss = new WebSocket.Server({ server });

wss.on("connection", (ws) => {
  console.log("Browser client connected");

  // create a message queue to store client messages received before the
  // Soniox WebSocket API connection is ready, so we don't lose any
  const messageQueue = [];
  let sonioxWs = null;
  let sonioxWsReady = false;

  function initSonioxConnection() {
    sonioxWs = new WebSocket("wss://stt-rt.soniox.com/transcribe-websocket");

    sonioxWs.on("open", () => {
      console.log("Connected to Soniox STT WebSocket API");

      // send initial configuration message
      const startMessage = JSON.stringify({
        api_key: process.env.SONIOX_API_KEY,
        audio_format: "auto",
        model: "stt-rt-preview-v2",
        language_hints: ["en"],
      });
      sonioxWs.send(startMessage);
      console.log("Sent start message to Soniox");

      // mark connection as ready
      sonioxWsReady = true;

      // process any queued messages
      while (messageQueue.length > 0 && sonioxWsReady) {
        const data = messageQueue.shift();
        forwardData(data);
      }
    });

    // receive messages from Soniox STT WebSocket API
    sonioxWs.on("message", (data) => {
      // note:
      // at this point we could manipulate and enhance the transcribed data
      try {
        ws.send(data.toString());
      } catch (err) {
        console.error("Error forwarding Soniox response:", err);
      }
    });

    sonioxWs.on("error", (error) => {
      console.log("Soniox WebSocket error:", error);
      sonioxWsReady = false;
    });

    sonioxWs.on("close", (code, reason) => {
      console.log("Soniox WebSocket closed:", code, reason);
      sonioxWsReady = false;
      ws.close();
    });
  }

  // forward message data to Soniox STT WebSocket API
  function forwardData(data) {
    try {
      sonioxWs.send(data);
    } catch (err) {
      console.error("Error forwarding data to Soniox:", err);
    }
  }

  // initialize Soniox connection
  initSonioxConnection();

  // receive messages from browser client
  ws.on("message", (data) => {
    if (sonioxWsReady) {
      // forward messages instantly
      forwardData(data);
    } else {
      // queue the message to be processed as soon as the
      // connection to the Soniox STT WebSocket API is ready
      messageQueue.push(data);
    }
  });

  ws.on("close", () => {
    console.log("Browser client disconnected");
    if (sonioxWs) {
      try {
        sonioxWs.close();
      } catch (err) {
        console.error("Error closing Soniox connection:", err);
      }
    }
  });
});

server.listen(process.env.PORT, () => {
  console.log(
    `WebSocket proxy server listening on http://0.0.0.0:${process.env.PORT}`
  );
});
```

[View example on GitHub](https://github.com/soniox/soniox_examples/tree/master/speech_to_text/nodejs/real_time/browser_proxy_stream)

Next, we create a basic HTML page as the client (the same concept works for any other app framework). The HTML client:

1. Connects to the proxy server via WebSocket.
2. Captures the audio stream from the microphone through the [`MediaRecorder`](https://developer.mozilla.org/en-US/docs/Web/API/MediaRecorder) API.
3. Streams audio data to the proxy server.
4. Receives messages from the proxy server and renders transcribed text into a `div`.

A minimal sketch of the client is shown below; the complete page is in the example linked underneath:

```html
<!DOCTYPE html>
<html>
  <head>
    <title>Browser proxy stream example</title>
  </head>
  <body>
    <!-- Minimal client sketch; see the full example linked below. -->
    <button id="start">Start</button>
    <div id="transcript"></div>
    <script>
      const transcriptDiv = document.getElementById("transcript");

      document.getElementById("start").onclick = () => {
        // 1. Connect to the proxy server via WebSocket (port 3001 matches the server above).
        const ws = new WebSocket("ws://localhost:3001");

        ws.onopen = async () => {
          // 2. Capture the audio stream from the microphone through MediaRecorder.
          const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
          const recorder = new MediaRecorder(stream);

          // 3. Stream audio data to the proxy server.
          recorder.ondataavailable = (event) => ws.send(event.data);
          recorder.start(250); // emit a chunk every 250 ms
        };

        // 4. Render transcribed text returned by the proxy into the div.
        //    (Token fields follow the WebSocket API response format.)
        ws.onmessage = (event) => {
          const res = JSON.parse(event.data);
          for (const token of res.tokens || []) {
            transcriptDiv.textContent += token.text;
          }
        };
      };
    </script>
  </body>
</html>
```
[View example on GitHub](https://github.com/soniox/soniox_examples/tree/master/speech_to_text/python/real_time/browser_proxy_stream)
# Async transcription with Node SDK

URL: /stt/SDKs/node-SDK/async-transcription

Transcribe audio files asynchronously with the Soniox Node SDK

The Soniox Node SDK supports asynchronous transcription for audio files. This allows you to transcribe recordings without maintaining a live connection or streaming pipeline. You can either wait for completion, or create a job and retrieve the results when the webhook event arrives.

## Quickstart

The SDK provides a convenient method to transcribe audio from a local file, a public URL, or a previously uploaded file. The **[`transcribe`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-transcribe) method will:**

1. Upload the file to Soniox if it's not already uploaded (if `file` is provided)
2. Transcribe the audio
3. Wait for the transcription to complete (if `wait: true` is provided)
4. Return the transcription object and final transcript (you can disable this by setting `fetch_transcript: false` and fetch the transcript later using the [`getTranscript`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-gettranscript) method)
5. Delete the file from Soniox if it was uploaded (configurable using the `cleanup` option)

Don't forget to remove files and transcriptions from Soniox after you're done with them if the `cleanup` option is not set.

**Transcribe from a local file and delete everything after transcription is complete**

```ts
const transcription = await client.stt.transcribe({
  model: 'stt-async-v4',
  file: audio, // Buffer, Uint8Array, Blob, ReadableStream
  filename: 'audio.mp3',
  wait: true,
  cleanup: ['file', 'transcription'],
});
```

**Transcribe from a public URL and fetch the transcript later using the [`getTranscript`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-gettranscript) method**

```ts
const transcription = await client.stt.transcribe({
  model: 'stt-async-v4',
  audio_url: 'https://soniox.com/media/examples/coffee_shop.mp3',
  wait: true,
});

const transcript = await transcription.getTranscript();
```

**Transcribe from a previously uploaded file and set up a [webhook](/stt/SDKs/node-SDK/webhooks) to get the transcription when it's complete**

```ts
const transcription = await client.stt.transcribe({
  model: 'stt-async-v4',
  file_id: file.id,
  wait: false,
  webhook_url: 'https://example.com/webhook',
});
```

Learn more about [testing webhooks locally](/stt/SDKs/node-SDK/webhooks#testing-webhooks-locally).

## Retrieve a list of transcriptions

You can retrieve a list of transcriptions using the [`list`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-list) method.

```ts
const transcriptions = await client.stt.list({
  limit: 100,
});
```

The returned result is an async iterable – use `for await...of` to iterate through all pages.

```ts
for await (const transcription of transcriptions) {
  console.log(transcription.id, transcription.status);
}
```

## Get transcription

You can get a transcription by ID using the [`get`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-get) method.

```ts
const transcription = await client.stt.get(transcriptionId);
console.log(transcription.id, transcription.status);
```

## Get transcription transcript

You can get a transcription transcript using the [`getTranscript`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-gettranscript) method.

```ts
const transcript = await transcription.getTranscript();
console.log(transcript.text);
```

Or get the transcript by transcription ID.
```ts
const transcript = await client.stt.getTranscript(transcription.id);
console.log(transcript.text);
```

## Segmenting transcripts

Group tokens by speaker and language changes:

```ts
const transcript = await transcription.getTranscript();

for (const segment of transcript?.segments() ?? []) {
  console.log(`[${segment.speaker}][${segment.language}] ${segment.text}`);
}
```

## Delete or destroy transcription

You can delete or destroy a transcription using the [`delete`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-delete) or [`destroy`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-destroy) method.

**Delete transcription only**

```ts
await client.stt.delete(transcription.id);
```

**Delete transcription and its file if it was uploaded**

```ts
await client.stt.destroy(transcription.id);
```

## Delete all transcriptions and files from your account

### Delete all transcriptions

You can delete all transcriptions using the [`stt.delete_all`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-delete_all) method.

```ts
await client.stt.delete_all();
```

### Delete all transcriptions and their files

You can delete all transcriptions and their files using the [`stt.destroy_all`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-destroy_all) method.

```ts
await client.stt.destroy_all();
```

The `delete_all` and `destroy_all` operations are irreversible and cannot be undone.

# Handling files with Node SDK

URL: /stt/SDKs/node-SDK/files

Upload audio files and manage them with the Soniox Node SDK

The Node SDK provides helpers to work with the [Files API](/stt/api-reference/files/upload_file) to upload audio for async transcription or to reuse files across multiple jobs.

## Upload

[`upload()`](/stt/SDKs/node-SDK/reference/classes#sonioxfilesapi-upload) accepts `Buffer`, `Uint8Array`, `Blob`, or `ReadableStream`:

```ts
import { readFile } from 'node:fs/promises';

const audio = await readFile('audio.mp3');

const file = await client.files.upload(audio, {
  filename: 'audio.mp3',
  client_reference_id: 'meeting-42',
});

console.log(file.id, file.filename, file.size);
```

Read more about [Supported audio formats](/stt/async/async-transcription#audio-formats).

## List files

[`list()`](/stt/SDKs/node-SDK/reference/classes#sonioxfilesapi-list) returns a paginated list of all uploaded files. Use `for await...of` to iterate through all pages.

```ts
const result = await client.files.list({ limit: 100 });

// Automatic pagination
for await (const file of result) {
  console.log(file.id, file.filename);
}
```

## Get file

Get a file by ID using the [`get()`](/stt/SDKs/node-SDK/reference/classes#sonioxfilesapi-get) method:

```ts
const file = await client.files.get('file-id');
```

## Delete file

Delete a file via its instance using the [`file.delete()`](/stt/SDKs/node-SDK/reference/classes#sonioxfilesapi-delete) method:

```ts
const file = await client.files.get('file-id');
if (file) {
  await file.delete();
}
```

Or delete by ID using the [`delete()`](/stt/SDKs/node-SDK/reference/classes#sonioxfilesapi-delete) method:

```ts
await client.files.delete('file-id');
```

## Delete all files from your account

You can delete all files using the `files.delete_all` method.

```ts
await client.files.delete_all();
```

The `delete_all` operation is irreversible and cannot be undone.

# Node SDK

URL: /stt/SDKs/node-SDK

Build speech-to-text workflows in Node with async and real-time APIs.

The Soniox [Node SDK](https://github.com/soniox/soniox-js) gives you fully typed access to our Async and Real-time Speech-to-Text APIs.
## Quickstart

### Install

Install via your preferred package manager:

```bash tab
npm install @soniox/node
```

```bash tab
yarn add @soniox/node
```

```bash tab
pnpm add @soniox/node
```

```bash tab
bun add @soniox/node
```

### Set your API key

```sh title="Terminal"
export SONIOX_API_KEY=<your_api_key>
```

Create a [Soniox account](https://console.soniox.com/signup) and log in to the [Console](https://console.soniox.com) to get your API key. See all available environment variables in the [SDK reference](/stt/SDKs/node-SDK/reference#environment-variables).

### Create your first real-time session

```ts
import { SonioxNodeClient } from "@soniox/node";

const stream = await createFakeAudioStream();

// Create a Soniox client
// The API key is read from the SONIOX_API_KEY environment variable.
const client = new SonioxNodeClient();

// Create a real-time session
const session = client.realtime.stt({
  model: "stt-rt-v4",
});

// Listen for transcription results
session.on("result", (result) => {
  const text = result.tokens.map((t) => t.text).join("");
  if (text) console.log(text);
});

// Listen for errors
session.on("error", (err) => console.error("Error:", err));

// Connect to Soniox and stream audio
await session.connect();

// Stream audio in chunks
session.sendStream(stream, {
  pace_ms: 60, // Send audio in chunks of 60ms to simulate real-time transcription
  finish: true, // Gracefully end the session after the stream is finished
});

// Fake streaming: fetch a sample audio file and send it in 60ms chunks to
// simulate real-time transcription. In a real use case, you would stream
// audio from a microphone or another live source.
async function createFakeAudioStream() {
  const res = await fetch("https://soniox.com/media/examples/coffee_shop.mp3");
  if (!res.ok) throw new Error(`Request failed: ${res.status} ${res.statusText}`);
  if (!res.body) throw new Error("No response body");
  return res.body;
}
```

Learn more about [Real-time transcription](/stt/SDKs/node-SDK/realtime-transcription).

### Create your first async transcription

```ts
import { SonioxNodeClient } from '@soniox/node';
import { readFile } from 'node:fs/promises';

const audio = await readFile('audio.mp3');

const client = new SonioxNodeClient();

const transcription = await client.stt.transcribe({
  model: 'stt-async-v4',
  file: audio,
  filename: 'audio.mp3',
  wait: true,
});

console.log(transcription.transcript?.text);
```

Learn more about [Async transcription](/stt/SDKs/node-SDK/async-transcription).

## Next steps

* [Real-time transcription](/stt/SDKs/node-SDK/realtime-transcription)
* [Async transcription](/stt/SDKs/node-SDK/async-transcription)
* [Webhooks](/stt/SDKs/node-SDK/webhooks)
* [Files and models](/stt/SDKs/node-SDK/files)
* [Full SDK reference](/stt/SDKs/node-SDK/reference)

## Package links

* [GitHub repository](https://github.com/soniox/soniox-js)
* [NPM package](https://www.npmjs.com/package/@soniox/node)

# Real-time transcription with Node SDK

URL: /stt/SDKs/node-SDK/realtime-transcription

Create and manage real-time speech-to-text sessions with the Soniox Node SDK

The Soniox Node SDK supports real-time streaming transcription over WebSocket. This allows you to transcribe live audio with low latency — ideal for voice agents, live captions, and interactive experiences. You can consume results via events, async iteration, or buffers that group tokens into utterances.

The SDK provides helper methods for both direct and proxy streaming.
## Direct stream and temporary API keys

Read more about [Direct stream](/stt/guides/direct-stream).

The Node SDK provides a helper method to issue [temporary API keys](/stt/api-reference/auth/create_temporary_api_key) to use with [Direct stream](/stt/guides/direct-stream) from the client's browser.

```ts
const { api_key, expires_at } = await client.auth.createTemporaryKey({
  usage_type: 'transcribe_websocket',
  expires_in_seconds: 3600,
  client_reference_id: 'support-call-123',
});

console.log(api_key, expires_at);
```

Soniox's [Web SDK](/stt/SDKs/web-SDK) handles everything client-side — capturing microphone input, managing the WebSocket connection, and authenticating using temporary API keys.

## Proxy stream helpers

Read more about [Proxy stream](/stt/guides/proxy-stream).

Use the SDK's real-time session for low-latency transcription, live captions, and voice agent experiences.

## Create a real-time session

```ts
const session = client.realtime.stt({
  model: 'stt-rt-v4',
  audio_format: 'pcm_s16le',
  sample_rate: 16000,
  num_channels: 1,
  enable_endpoint_detection: true,
  enable_speaker_diarization: true,
  language_hints: ['en'],
  context: {
    text: 'Support call about billing',
    terms: ['invoice', 'refund'],
  },
});
```

## Connect and stream

Use [`sendAudio`](/stt/SDKs/node-SDK/reference/classes#realtimesttsession-sendaudio) to send audio chunks to the session.

```ts
await session.connect();

session.on('result', (result) => {
  process.stdout.write(result.tokens.map(t => t.text).join(''));
});

for await (const chunk of audioStream) {
  session.sendAudio(chunk);
}

await session.finish();
```

See the full example with a demo stream in the quickstart: [Create your first real-time session](/stt/SDKs/node-SDK#create-your-first-real-time-session)

## Handle session events

```ts
session.on('connected', () => console.log('connected'));
session.on('disconnected', (reason) => console.log('disconnected:', reason));
session.on('error', (error) => console.error('error:', error));
session.on('result', (result) => console.log(result.tokens.map(t => t.text).join('')));
session.on('endpoint', () => console.log('endpoint'));
session.on('finalized', () => console.log('finalized'));
session.on('finished', () => console.log('finished'));
```

## Session lifecycle

```ts
// Connect to the session
await session.connect(); // idle -> connected

// Send audio chunks to the session
for await (const chunk of audioStream) {
  session.sendAudio(chunk);
}

// Gracefully end the session (signal end of audio and wait for remaining results from the server)
await session.finish();

// Or cancel immediately:
session.close(); // connected -> closed
```

## Endpoint detection and manual finalization

Endpoint detection lets you know when a speaker has finished speaking. This is critical for real-time voice AI assistants, command-and-response systems, and conversational apps where you want to respond immediately without waiting for long silences.

Read more about [Endpoint detection](/stt/rt/endpoint-detection).

Enable endpoint detection by setting `enable_endpoint_detection: true` in the session configuration.

```ts
const session = client.realtime.stt({
  model: 'stt-rt-v4',
  enable_endpoint_detection: true,
});
```

Manual finalization gives you precise control over when audio should be finalized — useful for push-to-talk systems and client-side voice activity detection (VAD).

Read more about [Manual finalization](/stt/rt/manual-finalization).

```ts
session.finalize();
```
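For example, in a push-to-talk flow you can finalize as soon as the user releases the talk button instead of waiting for an endpoint. A minimal sketch; the `onTalkButtonDown`/`onTalkButtonUp` hooks and `micSource` are hypothetical placeholders for your own UI and capture code:

```ts
// Hypothetical UI hooks; wire these to your real push-to-talk button events.
onTalkButtonDown(() => {
  micSource.start(); // hypothetical: begin capturing and sending audio
});

onTalkButtonUp(() => {
  micSource.stop(); // hypothetical: stop capturing
  session.finalize(); // force pending tokens to be finalized immediately
});
```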
## Pause and resume

```ts
session.pause(); // keeps connection alive, drops audio while paused
session.resume(); // resume sending audio
```

You are billed for the full stream duration even when the session is paused.

In a typical voice agent loop, you pause the STT session while the agent is responding to avoid transcribing the agent's own audio or processing overlapping speech:

```ts
session.on("endpoint", async () => {
  const utterance = utteranceBuffer.markEndpoint(); // Read more about the utterance buffer below
  if (!utterance) return;

  // Pause STT while the agent processes and responds
  session.pause();

  const response = await myAgent.respond(utterance.text);
  // ... send response audio to the client ...

  // Resume listening for the next utterance
  session.resume();
});
```

The SDK will `finalize` audio on pause. Make sure to adjust your VAD sensitivity so there is enough silence before pausing. Learn more about [Manual finalization](/stt/rt/manual-finalization#key-points).

## Keepalive

Read more about [Connection keepalive](/stt/rt/connection-keepalive).

The Node SDK **automatically sends** keepalive messages when the session is paused via [`session.pause()`](/stt/SDKs/node-SDK/reference/classes#realtimesttsession-pause). You can also send keepalive messages manually:

```ts
session.sendKeepalive();
```

## Detecting utterances for voice agents

When building voice AI agents, you need to know when the user has finished speaking so you can process their input. The SDK provides [`RealtimeUtteranceBuffer`](/stt/SDKs/node-SDK/reference/classes#realtimeutterancebuffer) to collect streaming tokens into complete utterances, driven by the server's endpoint detection.

### How it works

1. Set `enable_endpoint_detection: true` in the session config – the server detects when the user stops speaking and emits an endpoint event.
2. Feed every result event into the buffer with [`addResult()`](/stt/SDKs/node-SDK/reference/classes#realtimeutterancebuffer-addresult).
3. When an endpoint fires, call [`markEndpoint()`](/stt/SDKs/node-SDK/reference/classes#realtimeutterancebuffer-markendpoint) to flush the buffer and get the complete utterance.

### Example

```ts
import { SonioxNodeClient, RealtimeUtteranceBuffer } from "@soniox/node";

const client = new SonioxNodeClient();

// Call this for each new user/connection - each session needs its own buffer
function createAgentSession(onUtterance: (text: string) => void) {
  const session = client.realtime.stt({
    model: "stt-rt-v4",
    enable_endpoint_detection: true,
  });

  // Each session gets its own buffer
  const utteranceBuffer = new RealtimeUtteranceBuffer({ final_only: true });

  session.on("result", (result) => {
    utteranceBuffer.addResult(result);
  });

  session.on("endpoint", () => {
    const utterance = utteranceBuffer.markEndpoint();
    if (utterance) {
      onUtterance(utterance.text);
    }
  });

  return session;
}

// Usage: create a session per user connection
const session = createAgentSession((text) => {
  console.log("User said:", text);
  // Pass to your LLM / agent pipeline
});

await session.connect();
session.sendAudio(audioChunk);
```

## Streaming audio from a file

Use [`sendStream()`](/stt/SDKs/node-SDK/reference/classes#realtimesttsession-sendstream) to pipe audio directly from a file (or any async source) into a real-time session. It accepts any `AsyncIterable` – Node.js file streams, Web `ReadableStream`, Bun file streams, fetch response bodies, or custom async generators.
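For example, streaming a local file with Node's `fs.createReadStream` (a sketch; `pace_ms` and `finish` are the same options shown in the quickstart):

```ts
import { createReadStream } from "node:fs";

// Node file streams are async iterable, so they can be passed directly.
const stream = createReadStream("audio.mp3");

await session.connect();
session.sendStream(stream, {
  pace_ms: 60, // throttle chunks to simulate a live source
  finish: true, // gracefully end the session when the file ends
});
```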
### Simulating real-time pace

When streaming pre-recorded files, you can throttle sending with `pace_ms` to simulate how audio would arrive from a live source (e.g. a microphone). This isn't needed for live audio – it naturally arrives at real-time pace.

Use [`sendAudio`](/stt/SDKs/node-SDK/realtime-transcription#connect-and-stream) if you need more control.

# Handling webhooks with Node SDK

URL: /stt/SDKs/node-SDK/webhooks

Use webhooks to receive transcription results with the Soniox Node SDK

The SDK provides a helper method to handle [Webhooks](/stt/async/webhooks) from the Soniox API and transform them into a typed object.

## Configure webhook delivery

If a webhook is configured during [transcription creation](/stt/SDKs/node-SDK/async-transcription#quickstart), Soniox will send a POST request to your webhook URL with the transcription result.

```ts
await client.stt.transcribe({
  model: 'stt-async-v4',
  audio_url: 'https://soniox.com/media/examples/coffee_shop.mp3',
  webhook_url: 'https://your-server.com/webhooks/soniox',
  webhook_auth_header_name: 'X-Webhook-Secret',
  webhook_auth_header_value: process.env.SONIOX_API_WEBHOOK_SECRET,
});
```

You can also append metadata as query parameters:

```ts
await client.stt.transcribe({
  model: 'stt-async-v4',
  audio_url: 'https://soniox.com/media/examples/coffee_shop.mp3',
  webhook_url: 'https://your-server.com/webhooks/soniox',
  webhook_query: { request_id: 'abc-123' },
});
```

Learn more about [testing webhooks locally](/stt/SDKs/node-SDK/webhooks#testing-webhooks-locally).

## Handling webhooks

The SDK provides both framework-agnostic and framework-specific handlers that parse the request body, verify authentication, and return a typed [`WebhookHandlerResultWithFetch`](/stt/SDKs/node-SDK/reference/types#webhookhandlerresultwithfetch).

All handlers return:

* `ok` — whether the webhook was handled successfully
* `status` — HTTP status code to return to Soniox
* `event` — the parsed [`WebhookEvent`](/stt/SDKs/node-SDK/reference/types#webhookevent) (when `ok=true`)
* `error` — error message (when `ok=false`)
* `fetchTranscript()` — lazily fetch the full transcript (when `event.status === 'completed'`)
* `fetchTranscription()` — lazily fetch the transcription object

```ts
import express from 'express';

const app = express();
app.use(express.json());

app.post('/webhooks/soniox', async (req, res) => {
  const result = client.webhooks.handleExpress(req);
  if (result.ok && result.event.status === 'completed') {
    const transcript = await result.fetchTranscript();
    console.log(transcript?.text);
  }
  res.status(result.status).json({ received: true });
});
```

```ts
import Fastify from 'fastify';

const app = Fastify();

app.post('/webhooks/soniox', async (request, reply) => {
  const result = client.webhooks.handleFastify(request);
  if (result.ok && result.event.status === 'completed') {
    const transcript = await result.fetchTranscript();
    console.log(transcript?.text);
  }
  return reply.status(result.status).send({ received: true });
});
```

```ts
import { Hono } from 'hono';

const app = new Hono();

app.post('/webhooks/soniox', async (c) => {
  const result = await client.webhooks.handleHono(c);
  if (result.ok && result.event.status === 'completed') {
    const transcript = await result.fetchTranscript();
    console.log(transcript?.text);
  }
  return c.json({ received: true }, result.status);
});
```

`handleHono` is async because it reads the request body from the Hono context.
```ts
import { Controller, Post, Req, Res } from '@nestjs/common';
import { Request, Response } from 'express';

@Controller('webhooks')
export class WebhooksController {
  @Post('soniox')
  async handleSoniox(@Req() req: Request, @Res() res: Response) {
    const result = client.webhooks.handleNestJS(req);
    if (result.ok && result.event.status === 'completed') {
      const transcript = await result.fetchTranscript();
      console.log(transcript?.text);
    }
    res.status(result.status).json({ received: true });
  }
}
```

Use `handleRequest` with any framework that provides a standard Fetch API `Request` object:

```ts
export default {
  async fetch(request: Request) {
    if (new URL(request.url).pathname === '/webhooks/soniox') {
      const result = await client.webhooks.handleRequest(request);
      if (result.ok && result.event.status === 'completed') {
        const transcript = await result.fetchTranscript();
        console.log(transcript?.text);
      }
      return Response.json({ received: true }, { status: result.status });
    }
    return new Response('Not found', { status: 404 });
  },
};
```

`handleRequest` is async because it reads the request body from the `Request` object.

The `handle` method is a framework-agnostic handler. You provide the method, headers, and parsed body directly:

```ts
const result = client.webhooks.handle({
  method: req.method,
  headers: req.headers,
  body: req.body,
});

if (result.ok && result.event.status === 'completed') {
  const transcript = await result.fetchTranscript();
  console.log(transcript?.text);
}

if (result.ok && result.event.status === 'error') {
  const transcription = await result.fetchTranscription();
  console.log(transcription?.error_message);
}
```

See [`HandleWebhookOptions`](/stt/SDKs/node-SDK/reference/types#handlewebhookoptions) for all available options.

## Webhook auth helpers

By default, webhook handlers read auth from `SONIOX_API_WEBHOOK_HEADER` and `SONIOX_API_WEBHOOK_SECRET`. You can override auth explicitly:

```ts
const result = client.webhooks.handleExpress(req, {
  name: 'X-Webhook-Secret',
  value: process.env.SONIOX_API_WEBHOOK_SECRET,
});
```

Learn more about [Environment Variables](/stt/SDKs/node-SDK/reference#environment-variables).

You can also verify the auth manually:

```ts
const auth = client.webhooks.getAuthFromEnv();
if (!auth) {
  throw new Error('Missing webhook auth');
}
const isValid = client.webhooks.verifyAuth(req.headers, auth);
```

## Webhook event helpers

```ts
const event = client.webhooks.parseEvent(req.body);
const isEvent = client.webhooks.isEvent(req.body);
```

## Testing webhooks locally

Since Soniox needs to reach your server over the internet, you'll need a tunnel to expose your local development server. You can use [Cloudflare Tunnel](https://developers.cloudflare.com/pages/how-to/preview-with-cloudflare-tunnel/) or [ngrok](https://ngrok.com/).

[Cloudflare Tunnel](https://developers.cloudflare.com/pages/how-to/preview-with-cloudflare-tunnel/) provides a quick way to expose your local server — no account required. Install `cloudflared` and start a tunnel pointing to your local server:

```bash
# macOS
brew install cloudflared

# Start a tunnel to your local server on port 3000
cloudflared tunnel --url http://localhost:3000
```

The command will output a public URL like `https://random-name.trycloudflare.com`.

[ngrok](https://ngrok.com/) creates a secure tunnel to your local server and provides a stable public URL.
Install ngrok, authenticate, and start a tunnel:

```bash
# macOS
brew install ngrok

# Authenticate (one-time setup)
ngrok config add-authtoken <your-authtoken>

# Start a tunnel to your local server on port 3000
ngrok http 3000
```

The command will output a public URL like `https://abcd-1234.ngrok-free.app`.

Once you have your public tunnel URL, use it as the `webhook_url` when creating a transcription:

```ts
import express from 'express';
import { SonioxNodeClient } from '@soniox/node';

const client = new SonioxNodeClient();
const app = express();
app.use(express.json());

// Handle incoming webhook events
app.post('/webhooks/soniox', async (req, res) => {
  const result = client.webhooks.handleExpress(req);
  // You will receive the webhook event when the transcription is completed
  if (result.ok && result.event.status === 'completed') {
    const transcript = await result.fetchTranscript(); // Lazily fetch the transcript
    console.log(transcript?.text);
  }
  res.status(result.status).json({ received: true });
});

app.listen(3000, () => console.log('Listening on port 3000'));

// Start a transcription with the tunnel URL as webhook
await client.stt.transcribe({
  model: 'stt-async-v4',
  audio_url: 'https://soniox.com/media/examples/coffee_shop.mp3',
  webhook_url: 'https://<your-tunnel-url>/webhooks/soniox',
});
```

# Async transcription with Python SDK

URL: /stt/SDKs/python-SDK/async-transcription

Transcribe audio files asynchronously with the Soniox Python SDK

The Soniox Python SDK supports asynchronous transcription for audio files. This allows you to transcribe recordings without maintaining a live connection or streaming pipeline. You can either wait for completion, or create a job and retrieve the results based on the webhook event.

## Quickstart

The SDK provides a convenient `transcribe` method that accepts a local file, public URL, or previously uploaded file ID. It will upload the file (if provided) and create the transcription job.

```python
from soniox import SonioxClient

client = SonioxClient()

# Transcribe from a local file
transcription = client.stt.transcribe(
    model="stt-async-v4",
    file="audio.mp3",
)

# Transcribe from a public URL and fetch the transcript later
transcription = client.stt.transcribe(
    model="stt-async-v4",
    audio_url="https://soniox.com/media/examples/coffee_shop.mp3",
)

# Transcribe from a previously uploaded file
transcription = client.stt.transcribe(
    model="stt-async-v4",
    file_id="uploaded-file-id",
)
```

After creating the job, you can poll the status with `client.stt.get` or wait for completion with `client.stt.wait`. To get the final transcript, call `client.stt.get_transcript`.

```python
# Check status
transcription = client.stt.get("transcription-id")
print(transcription.status)

# Wait for completion
client.stt.wait("transcription-id")

# Fetch transcript
transcript = client.stt.get_transcript("transcription-id")
print(transcript.text)
```

Don't forget to delete files and transcriptions from Soniox after you're done with them.

## Get transcription

You can get a transcription by ID using the `get` method.
```python
transcription = client.stt.get("transcription-id")
print(transcription.id, transcription.status)
```

Get a transcription, or `None` if it doesn't exist:

```python
transcription = client.stt.get_or_none("transcription-id")
if transcription is None:
    print("Transcription not found")
```

## Get transcription transcript

If you want to receive text or tokens from a transcription, fetch the transcription transcript with `get_transcript`.

```python
transcript = client.stt.get_transcript("transcription-id")
print(transcript.text)
```

## Retrieve list of transcriptions

You can retrieve a list of transcriptions using the `list` method.

```python
from soniox import SonioxClient

client = SonioxClient()

response = client.stt.list(limit=100)
for transcription in response.transcriptions:
    print(transcription.id, transcription.status)

# Use pagination to list more transcriptions
while response.next_page_cursor:
    response = client.stt.list(
        limit=100,
        cursor=response.next_page_cursor,
    )
    for transcription in response.transcriptions:
        print(transcription.id, transcription.status)
```

## Delete or destroy transcription

You can delete or destroy a transcription using the `delete` or `destroy` method.

**Delete transcription only:**

```python
client.stt.delete("transcription-id")
```

**Delete a transcription only if it exists:**

```python
client.stt.delete_if_exists("transcription-id")
```

**Delete transcription and its file if it was uploaded:**

```python
client.stt.destroy("transcription-id")
```

## Delete all transcriptions and files from your account

You have limited space for files and transcriptions, see: [Limits and quotas](https://soniox.com/docs/stt/async/limits-and-quotas).

These operations are irreversible and cannot be undone.

### Delete all transcriptions

You can delete all transcriptions using the `delete_all` method.

```python
client.stt.delete_all()
```

### Delete all files

You can delete all files using `files.delete_all`.

```python
client.files.delete_all()
```

### Delete all transcriptions and their files

You can delete all transcriptions and their files (if they exist) using `files.destroy_all`.

```python
client.files.destroy_all()
```

# Handling files with Python SDK

URL: /stt/SDKs/python-SDK/files

Upload audio files and manage them with the Soniox Python SDK

Use the Files API to upload audio for async transcription or to reuse files across multiple jobs.

## Upload

`upload()` accepts `bytes`, file paths (`str` or `Path`), or a file-like object (`BinaryIO`).

```python
from soniox import SonioxClient

client = SonioxClient()

file = client.files.upload("audio.mp3")
print(file.id, file.filename, file.size)
```

Read more about [Supported audio formats](/stt/async/async-transcription#audio-formats).

## Get file

Get a file by ID; raises `SonioxNotFoundError` if the file does not exist:

```python
file = client.files.get("file-id")
print(file.id, file.filename)
```

Get a file, or `None` if it doesn't exist:

```python
file = client.files.get_or_none("file-id")
if file is None:
    print("File not found")
```

## List files

List files returns a paginated response. Use `next_page_cursor` to fetch additional pages until it is `None`.
```python
from soniox import SonioxClient

client = SonioxClient()

response = client.files.list(limit=100)
for file in response.files:
    print(file.id, file.filename)

# Pagination
while response.next_page_cursor:
    response = client.files.list(limit=100, cursor=response.next_page_cursor)
    for file in response.files:
        print(file.id, file.filename)
```

## Delete file

Delete a file by ID; raises `SonioxNotFoundError` if the file does not exist:

```python
client.files.delete("file-id")
```

Delete a file only if it exists:

```python
client.files.delete_if_exists("file-id")
```

## Delete all files

`delete_all` iterates through every page and removes each file.

```python
client.files.delete_all()
```

# Python SDK

URL: /stt/SDKs/python-SDK

Python SDK for Soniox REST and realtime APIs.

The Soniox [Python SDK](https://github.com/soniox/soniox-python) gives you fully typed access to our Async and Real-time Speech-to-Text APIs.

## Quickstart

### Install

```bash
pip install soniox
```

### Set your API key

```bash
export SONIOX_API_KEY=<your-api-key>
```

Create a [Soniox account](https://console.soniox.com/signup) and log in to the [Console](https://console.soniox.com) to get your API key.

### Create your first real-time session

```python
from soniox import SonioxClient
from soniox.types import RealtimeSTTConfig, Token
from soniox.utils import render_tokens, throttle_audio, start_audio_thread

# Grab the demo file from:
# https://github.com/soniox/soniox_examples/raw/refs/heads/master/speech_to_text/assets/coffee_shop.mp3
AUDIO_FILE = "coffee_shop.mp3"

client = SonioxClient()
config = RealtimeSTTConfig(model="stt-rt-v4", audio_format="mp3")

final_tokens: list[Token] = []
non_final_tokens: list[Token] = []


def realtime():
    # Create new real-time websocket session
    with client.realtime.stt.connect(config=config) as session:
        # Stream audio to websocket
        start_audio_thread(session, throttle_audio(AUDIO_FILE, delay_seconds=0.1))

        # Receive events from Soniox Real-time STT
        for event in session.receive_events():
            for token in event.tokens:
                if token.is_final:
                    final_tokens.append(token)
                else:
                    non_final_tokens.append(token)
            print(render_tokens(final_tokens, non_final_tokens))
            non_final_tokens.clear()


realtime()
```

Learn more about [Real-time transcription](/stt/SDKs/python-SDK/realtime-transcription).

### Create your first async transcription

```python
from soniox import SonioxClient

client = SonioxClient()

# Create new transcription from `audio_url`
transcription = client.stt.transcribe(
    audio_url="https://soniox.com/media/examples/coffee_shop.mp3",
)

# Wait until transcription processing is finished
client.stt.wait(transcription.id)

# Get transcription transcript and print it
transcript = client.stt.get_transcript(transcription.id)
print(transcript.text)
```

Learn more about [Async transcription](/stt/SDKs/python-SDK/async-transcription).
## Next steps

* [Real-time streaming](/stt/SDKs/python-SDK/realtime-transcription)
* [Async transcription](/stt/SDKs/python-SDK/async-transcription)
* [Webhooks](/stt/SDKs/python-SDK/webhooks)
* [Files and models](/stt/SDKs/python-SDK/files)
* [Sync vs async clients](/stt/SDKs/python-SDK/sync-vs-async-clients)
* [Full SDK reference](/stt/SDKs/python-SDK/Full-SDK-reference/__init__)

## Package links

* [GitHub repository](https://github.com/soniox/soniox-python)
* [PyPI package](https://pypi.org/project/soniox/)

# Real-time transcription with Python SDK

URL: /stt/SDKs/python-SDK/realtime-transcription

Create and connect to Soniox real-time speech-to-text sessions with the Python SDK

The Soniox Python SDK supports transcribing audio in real time with **low latency** and **high accuracy**. This makes it ideal for voice assistants, live captions, and conversational AI.

## Connect to a real-time session

The example below streams audio from live radio to the Soniox real-time API. If you want to stream from a file instead, see: [Create your first real-time session](/stt/SDKs/python-SDK#create-your-first-real-time-session).

```python
from typing import Iterator

import httpx

from soniox import SonioxClient
from soniox.types import (
    RealtimeSTTConfig,
    Token,
    StructuredContext,
    StructuredContextGeneralItem,
)
from soniox.utils import render_tokens, start_audio_thread

AUDIO_URL = "https://npr-ice.streamguys1.com/live.mp3?ck=1742897559135"


# Fetch audio from a live radio stream and yield it in chunks.
def stream_audio_from_url(audio_url) -> Iterator[bytes]:
    with httpx.Client() as client:
        with client.stream("GET", audio_url) as response:
            response.raise_for_status()
            for chunk in response.iter_bytes(4096):
                if chunk:
                    yield chunk


client = SonioxClient()

# Create config, see below for all parameters
config = RealtimeSTTConfig(
    model="stt-rt-v4",
    audio_format="mp3",
    enable_endpoint_detection=True,
    enable_speaker_diarization=True,
    language_hints=["en"],
    context=StructuredContext(
        general=[StructuredContextGeneralItem(key="domain", value="live radio / news broadcast")],
        text="Live NPR news and talk radio stream, including interviews, music, and commentary.",
        terms=["NPR", "news", "interview", "music", "commentary", "report", "broadcast", "anchor"],
    ),
)

final_tokens: list[Token] = []
non_final_tokens: list[Token] = []


def realtime():
    # Create new real-time websocket session
    with client.realtime.stt.connect(config=config) as session:
        # Stream audio from live radio to websocket
        start_audio_thread(session, stream_audio_from_url(AUDIO_URL))

        # Receive events from Soniox Real-time STT
        for event in session.receive_events():
            for token in event.tokens:
                if token.is_final:
                    final_tokens.append(token)
                else:
                    non_final_tokens.append(token)
            print(render_tokens(final_tokens, non_final_tokens))
            non_final_tokens.clear()


realtime()
```

For config options see: [WebSocket API](/stt/api-reference/websocket-api#configuration) or [RealtimeSTTConfig reference](/stt/SDKs/python-SDK/Full-SDK-reference/types/types_realtime#class-realtimesttconfig).

## Endpoint detection

Endpoint detection lets you know when a speaker has finished speaking. This is critical for real-time voice AI assistants, command-and-response systems, and conversational apps where you want to respond immediately without waiting for long silences.

Read more about [Endpoint detection](/stt/rt/endpoint-detection)

Enable endpoint detection by setting `enable_endpoint_detection=True` in the session config. You will receive the special token `<end>` when speech ends.
```python
# Enable endpoint detection
config = RealtimeSTTConfig(
    enable_endpoint_detection=True,
    ...
)

# When receiving events, check for the special token
for event in session.receive_events():
    for token in event.tokens:
        if token.text == "<end>":
            print("Endpoint detected")
```

## Manual finalization

Manual finalization gives you precise control over when audio should be finalized. When you know the user stopped talking (push-to-talk or client-side VAD), call `finalize` to mark all outstanding tokens as final.

Read more about [Manual finalization](/stt/rt/manual-finalization)

```python
# Finalize current buffered audio without closing the session.
session.finalize()
```

## Pause and resume

```python
session.pause()   # keeps connection alive, drops audio while paused
session.resume()  # resume sending audio
```

You are billed for the full stream duration even when the session is paused.

## Keepalive

Soniox terminates your session if no audio arrives for ~20 seconds. To keep the connection alive, send a keepalive control message or run a background keepalive loop. The Python SDK **automatically sends** keepalive messages when the session is paused via `session.pause()`.

```python
# Send a keepalive message manually
session.keep_alive()
```

Read more about [Connection keepalive](/stt/rt/connection-keepalive)

## Streaming audio from a file

Use `stream_audio()` with `start_audio_thread()` to stream from a file while receiving events. If you are streaming live audio (microphone, client stream, etc.), you can feed raw chunks without throttling. If you are streaming a prerecorded file, throttle chunks to simulate real-time delivery.

```python
from soniox.utils import stream_audio, start_audio_thread, throttle_audio

...

with client.realtime.stt.connect(config=config) as session:
    # Start streaming audio on a background thread.
    start_audio_thread(session, stream_audio("audio.wav"))

    # Or throttle a local audio file to simulate streaming (sends a chunk every 100 ms)
    start_audio_thread(session, throttle_audio("audio.wav", delay_seconds=0.1))

...
```

Use [`send_bytes`](/stt/SDKs/python-SDK/Full-SDK-reference/realtime/__init__#send_bytes) if you need more control.

## Direct stream and proxy stream

Read more about [Direct stream](/stt/guides/direct-stream) and [Proxy stream](/stt/guides/proxy-stream).

For direct streaming from a client, issue a temporary API key and pass it to the browser or device that will open the WebSocket connection:

```python
from soniox import SonioxClient

client = SonioxClient()

key = client.auth.create_temporary_api_key(
    expires_in_seconds=3600,
    client_reference_id="support-call-123",
)
print(key.api_key, key.expires_at)
```

For proxy streaming, keep the WebSocket connection on your server and stream audio through your backend.

# Sync vs async clients

URL: /stt/SDKs/python-SDK/sync-vs-async-clients

Choose between SonioxClient and AsyncSonioxClient based on your app.

The Soniox Python SDK provides two clients:

* `SonioxClient` for synchronous code (scripts, CLIs, simple services)
* `AsyncSonioxClient` for asyncio apps (FastAPI, aiohttp, background workers)

The `files`, `models`, `auth`, `webhooks`, `transcriptions`, and `realtime` APIs accept the same parameters and return the same types in both clients.
## Sync client (`SonioxClient`)

```python
from soniox import SonioxClient

client = SonioxClient()

file = client.files.upload("audio.mp3")
transcription = client.stt.transcribe(
    audio_url="https://soniox.com/media/examples/coffee_shop.mp3"
)
print(file.id, transcription.id)
```

## Async client (`AsyncSonioxClient`)

```python
import asyncio

from soniox import AsyncSonioxClient


async def main():
    client = AsyncSonioxClient()

    file = await client.files.upload("audio.mp3")
    transcription = await client.stt.transcribe(
        audio_url="https://soniox.com/media/examples/coffee_shop.mp3"
    )
    print(file.id, transcription.id)


asyncio.run(main())
```

# Handling webhooks with Python SDK

URL: /stt/SDKs/python-SDK/webhooks

Use webhooks to receive transcription results with the Soniox Python SDK

The Python SDK provides helper functions to work with [Webhooks](/stt/async/webhooks).

## Configure webhook delivery

If a webhook is configured during transcription creation, Soniox will send a POST request to your webhook URL with the [transcription status result](/stt/async/webhooks#example).

```python
from soniox import SonioxClient
from soniox.types import CreateTranscriptionConfig

client = SonioxClient()

config = CreateTranscriptionConfig(
    webhook_url="https://your-server.com/webhooks/soniox",
    webhook_auth_header_name="X-Webhook-Secret",
    webhook_auth_header_value="your-secret",
)

transcription = client.stt.transcribe(
    audio_url="https://soniox.com/media/examples/coffee_shop.mp3",
    config=config,
)
```

For `transcribe`, you must pass `webhook_auth_header_name` and `webhook_auth_header_value` explicitly in the config. Environment variables (`SONIOX_API_WEBHOOK_HEADER` and `SONIOX_API_WEBHOOK_SECRET`) are only used by webhook helpers and verification (see below).

You can also append additional metadata as query parameters:

```python
config = CreateTranscriptionConfig(
    webhook_url="https://your-server.com/webhooks/soniox?request_id=abc-123",
)
```

If you are uploading a local file, you can also use the convenience helper (it reads the webhook secret and header automatically from the environment if present):

```python
from soniox import SonioxClient

client = SonioxClient()

transcription = client.stt.transcribe_file_with_webhook(
    model="stt-async-v4",
    file="audio.mp3",
    webhook_url="https://your-server.com/webhooks/soniox",
)
```

## Example (FastAPI + ngrok)

Expose your local server (for example with [ngrok](https://ngrok.com/)), then create a transcription that points to the public ngrok URL and verify the webhook payload on your FastAPI server:

```python
from fastapi import FastAPI, Request

from soniox import SonioxClient
from soniox.errors import InvalidWebhookSignatureError
from soniox.types import CreateTranscriptionConfig, WebhookAuthConfig

app = FastAPI()
client = SonioxClient()

# Replace with your public ngrok URL.
NGROK_URL = "https://your-subdomain.ngrok-free.app"
WEBHOOK_SECRET_NAME = "X-Webhook-Secret"
WEBHOOK_SECRET_VALUE = "your-secret"

# When creating the transcription, you must provide the correct webhook secret name and value:
# config = CreateTranscriptionConfig(
#     webhook_url=f"{NGROK_URL}/webhooks/soniox",
#     webhook_auth_header_name=WEBHOOK_SECRET_NAME,
#     webhook_auth_header_value=WEBHOOK_SECRET_VALUE,
# )
# client.stt.transcribe(
#     model="stt-async-v4",
#     audio_url="https://soniox.com/media/examples/coffee_shop.mp3",
#     config=config,
# )


@app.post("/webhooks/soniox")
async def soniox_webhook(request: Request):
    payload = await request.body()
    headers = dict(request.headers)
    try:
        event = client.webhooks.unwrap(
            payload,
            headers,
            # This can be omitted if you have set the env variables
            # SONIOX_API_WEBHOOK_SECRET and SONIOX_API_WEBHOOK_HEADER
            auth=WebhookAuthConfig(
                name=WEBHOOK_SECRET_NAME,
                value=WEBHOOK_SECRET_VALUE,
            ),
        )
    except InvalidWebhookSignatureError:
        print("InvalidWebhookSignatureError")
        return

    if event.status == "completed":
        transcript = client.stt.get_transcript(event.id)
        print(transcript.text)


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8080)
```

## Webhook verification

Verify webhook signatures to ensure the request really came from Soniox (and not a third party posting to your endpoint). You can verify signatures manually:

```python
from soniox import SonioxClient

client = SonioxClient()

client.webhooks.verify_signature(
    headers={"X-Webhook-Secret": "your-secret"},
)
```

Or rely on `unwrap` to validate and parse in one step:

```python
from soniox import SonioxClient

client = SonioxClient()

event = client.webhooks.unwrap(
    payload=request_body,
    headers={"X-Webhook-Secret": "your-secret"},
)
print(event.id, event.status)
```

If you prefer, you can also use `client.stt.transcribe_file_with_webhook` and `client.webhooks` with `SONIOX_API_WEBHOOK_HEADER` and `SONIOX_API_WEBHOOK_SECRET` set in your environment.

# React SDK

URL: /stt/SDKs/react-SDK

Build speech-to-text workflows in React with real-time API.

import { LinkCards } from "@/components/link-card";

The Soniox [React SDK](https://www.npmjs.com/package/@soniox/react) provides React hooks and components for real-time speech-to-text, built on top of the [Web SDK](/stt/SDKs/web-SDK). It lets you:

* Capture audio from the user's microphone with a single hook
* Stream audio to Soniox in real time
* Receive transcription and translation results as reactive state

## Quickstart

### Install

Install via your preferred package manager:

```bash tab
npm install @soniox/react
```

```bash tab
yarn add @soniox/react
```

```bash tab
pnpm add @soniox/react
```

```bash tab
bun add @soniox/react
```

### Set up your temporary API key endpoint

In a client environment (browser, mobile app, React Native, etc.), you don't want to expose your API key to the client. For this reason, you can create a temporary API key endpoint on your server and use it to issue temporary API keys for the client. For example, you can use our [Node SDK](/stt/SDKs/node-SDK) to create a temporary API key endpoint.
```ts
import express from 'express';
import { SonioxNodeClient } from '@soniox/node';

const app = express();
const client = new SonioxNodeClient(); // reads SONIOX_API_KEY from env

// Create a temporary API key endpoint
app.post('/tmp-key', async (_req, res) => {
  try {
    const { api_key, expires_at } = await client.auth.createTemporaryKey({
      usage_type: 'transcribe_websocket',
      expires_in_seconds: 300, // 1..3600
    });
    res.json({ api_key, expires_at });
  } catch (err) {
    res.status(500).json({ error: err instanceof Error ? err.message : 'Failed to create temporary key' });
  }
});

app.listen(3000, () => {
  console.log('Server listening on http://localhost:3000');
});
```

Read more about our [Node SDK](/stt/SDKs/node-SDK) and [Temporary API keys](/stt/api-reference/auth/create_temporary_api_key)

### Create your first real-time session

```tsx
import { SonioxProvider, useRecording } from "@soniox/react";

// Fetch a temporary API key from the endpoint on your server
async function getAPIKey() {
  const res = await fetch("/tmp-key", { method: "POST" });
  const { api_key } = await res.json();
  return api_key;
}

function App() {
  return (
    // Wrap your app with a SonioxProvider and pass the temporary API key getter function
    <SonioxProvider apiKey={getAPIKey}>
      <Transcription />
    </SonioxProvider>
  );
}

function Transcription() {
  // Create a recording session
  const { state, finalText, partialText, start, stop } = useRecording({
    model: "stt-rt-v4",
  });

  return (
    <div>
      <p>
        {finalText} <span>{partialText}</span>
      </p>
      <p>State: {state}</p>
      {state === "recording" || state === "connecting" || state === "starting" ? (
        <button onClick={() => stop()}>Stop</button>
      ) : (
        <button onClick={() => start()}>Start</button>
      )}
    </div>
  );
}
```

Learn more about [Real-time transcription](/stt/SDKs/react-SDK/realtime-transcription)

## Next steps

* [Real-time transcription](/stt/SDKs/react-SDK/realtime-transcription)
* [Full SDK reference](/stt/SDKs/react-SDK/reference)

## Package links

* [GitHub repository](https://github.com/soniox/soniox-js)
* [NPM package](https://www.npmjs.com/package/@soniox/react)

# Real-time transcription with React SDK

URL: /stt/SDKs/react-SDK/realtime-transcription

Create and manage real-time speech-to-text sessions with the Soniox React SDK

import { LinkCards } from "@/components/link-card";

The Soniox React SDK supports real-time transcription via React hooks, built on top of the [@soniox/client](/stt/SDKs/web-SDK) Web SDK. This allows you to transcribe live audio with low latency — ideal for live captions, voice input, and interactive experiences. You can capture audio from the user's microphone, receive transcription results as reactive state, and control sessions with simple `start`/`stop` calls.

## Soniox Provider

[`SonioxProvider`](/stt/SDKs/react-SDK/reference/types#sonioxprovider) creates and shares a single [`SonioxClient`](/stt/SDKs/web-SDK/reference/classes#sonioxclient) instance via React context. Place it near the root of your component tree.

### With configuration props

```tsx
import { SonioxProvider } from "@soniox/react";

function App({ children }) {
  return (
    <SonioxProvider
      apiKey={async () => {
        const res = await fetch("/api/get-temporary-key", { method: "POST" });
        return (await res.json()).api_key;
      }}
    >
      {children}
    </SonioxProvider>
  );
}
```

### With a pre-built client

```tsx
import { SonioxClient } from "@soniox/client";
import { SonioxProvider } from "@soniox/react";

const client = new SonioxClient({
  api_key: async () => fetchKey(),
});

function App({ children }) {
  return <SonioxProvider client={client}>{children}</SonioxProvider>;
}
```

## `useRecording`

`useRecording` is the primary hook for real-time speech-to-text. It returns [`UseRecordingReturn`](/stt/SDKs/react-SDK/reference/types#userecordingreturn), which contains reactive transcript state and control methods.

```tsx
function Transcriber() {
  const recording = useRecording({
    model: "stt-rt-v4",
    language_hints: ["en", "es"],
    enable_endpoint_detection: true,
  });

  return (
    <div>
      <p>State: {recording.state}</p>
      <p>{recording.text}</p>
    </div>
  );
}
```

### Handle session events

| Callback | Signature | Description |
| ----------------- | -------------------------------------------- | ------------------------------------------------- |
| `onResult` | `(result: RealtimeResult) => void` | Called on each result from the server. |
| `onEndpoint` | `() => void` | Called when an endpoint is detected. |
| `onError` | `(error: Error) => void` | Called when an error occurs. |
| `onStateChange` | `(update: { old_state, new_state }) => void` | Called on each state transition. |
| `onFinished` | `() => void` | Called when the recording session finishes. |
| `onConnected` | `() => void` | Called when the WebSocket connects. |
| `onSourceMuted` | `() => void` | Called when the audio source is muted externally. |
| `onSourceUnmuted` | `() => void` | Called when the audio source is unmuted. |

### Session lifecycle

#### Recording state

| Field | Type | Description |
| --------------- | ---------------- | --------------------------------------------------------------------- |
| `state` | `RecordingState` | Current lifecycle state (`'idle'`, `'recording'`, `'paused'`, etc.). |
| `isActive` | `boolean` | `true` when state is not `idle`/`stopped`/`canceled`/`error`. |
| `isRecording` | `boolean` | `true` when `state === 'recording'`. |
| `isPaused` | `boolean` | `true` when `state === 'paused'`. |
| `isSourceMuted` | `boolean` | `true` when the audio source is muted externally. |

#### Available methods

| Method | Signature | Description |
| ----------------- | --------------------- | ------------------------------------------------------------------------ |
| `start` | `() => void` | Start a new recording. Aborts any in-flight recording first. |
| `stop` | `() => Promise` | Gracefully stop — waits for final results from the server. |
| `cancel` | `() => void` | Immediately cancel — does not wait for final results. |
| `pause` | `() => void` | Pause audio capture (keepalive keeps connection open). |
| `resume` | `() => void` | Resume after pause. |
| `finalize` | `(options?) => void` | Request the server to finalize current non-final tokens. |
| `clearTranscript` | `() => void` | Clear transcript state (`finalText`, `partialText`, `utterances`, etc.). |

### Endpoint detection and manual finalization

Endpoint detection lets you know when a speaker has finished speaking. This is critical for real-time voice AI assistants, command-and-response systems, and conversational apps where you want to respond immediately without waiting for long silences.

Read more about [Endpoint detection](/stt/rt/endpoint-detection)

Enable endpoint detection by setting `enable_endpoint_detection: true` in the hook configuration. Use the `onEndpoint` callback to know when a speaker has finished speaking.

```tsx
const { start, stop, text } = useRecording({
  apiKey: "<your-api-key>",
  model: "stt-rt-v4",
  enable_endpoint_detection: true,
  onEndpoint: () => {
    console.log("--- speaker finished ---");
  },
});
```

Manual finalization gives you precise control over when audio should be finalized — useful for push-to-talk systems and client-side voice activity detection (VAD).

Read more about [Manual finalization](/stt/rt/manual-finalization)

The `finalize` function is returned by `useRecording` and can be called at any time during an active recording:

```tsx
const { start, stop, finalize } = useRecording({});

// Later, when you want to force finalization:
finalize();
```

### Pause, resume and muting audio source

The pause and resume functions are returned by `useRecording`.
The `isPaused` flag reflects the current pause state reactively.

```tsx
const { start, stop, pause, resume, isPaused } = useRecording({});

pause();  // keeps connection alive, drops audio while paused
resume(); // resume sending audio
```

The SDK will `finalize` audio on pause. Make sure to adjust your VAD sensitivity to leave enough silence before pausing. Learn more about [Manual finalization](/stt/rt/manual-finalization#key-points)

The hook also tracks system-level mute events via `isSourceMuted`. When the audio source is muted externally (e.g. OS-level or hardware mute), keepalive messages are sent automatically to keep the session alive. You can listen for mute state changes with the `onSourceMuted` and `onSourceUnmuted` callbacks.

```tsx
const { isSourceMuted } = useRecording({
  onSourceMuted: () => {
    console.log("Microphone muted externally");
  },
  onSourceUnmuted: () => {
    console.log("Microphone unmuted");
  },
});
```

You are billed for the full stream duration even when the session is paused.

### Handling translation

The React SDK supports one-way and two-way real-time translation. Configure translation in the `useRecording` hook config. The hook automatically groups tokens by translation status or language via the `groups` snapshot field, so you can render original and translated text separately without manual filtering.

#### One-way translation

Translates all spoken audio into a single target language. When translation is provided with type `one_way`, the hook automatically sets `groupBy: 'translation'`, splitting tokens into `original` and `translation` groups.

```tsx
const { groups } = useRecording({
  apiKey: "<your-api-key>",
  model: "stt-rt-preview",
  translation: {
    type: "one_way",
    target_language: "es", // Translate everything to Spanish
  },
});

// Render grouped text
return (
  <div>
    <p>Original: {groups.original?.text}</p>
    <p>Translated: {groups.translation?.text}</p>
  </div>
);
```

#### Two-way translation

Translates between two languages — each speaker's speech is translated into the other language. When translation is provided with type `two_way`, the hook automatically sets `groupBy: 'language'`, splitting tokens by language code (e.g. `en`, `fr`).

```tsx
const { groups } = useRecording({
  apiKey: "<your-api-key>",
  model: "stt-rt-preview",
  translation: {
    type: "two_way",
    language_a: "en",
    language_b: "fr",
  },
});

// Render grouped text by language
return (
  <div>
    <p>English: {groups.en?.text}</p>
    <p>French: {groups.fr?.text}</p>
  </div>
);
```

Learn more about [Real-time translation](/stt/rt/real-time-translation)

### Utterances

When `enable_endpoint_detection` is enabled, the `utterances` array accumulates utterances separated by natural pauses:

```tsx
function TranscriptWithUtterances() {
  const { utterances, partialText, start, stop, isActive } = useRecording({
    model: "stt-rt-v4",
    enable_endpoint_detection: true,
  });

  return (
    <div>
      {utterances.map((utterance, i) => (
        <p key={i}>{utterance.text}</p>
      ))}
      {partialText && <p>{partialText}</p>}
    </div>
  );
}
```

Learn more about [Endpoint detection](/stt/rt/endpoint-detection)

### Token grouping

The `groupBy` option splits tokens into named groups, accessible via `recording.groups`. This is particularly useful for translation and multi-speaker scenarios.

#### `groupBy` strategies

| Value | Keys | Description |
| ------------------- | ------------------------------------ | -------------------------------- |
| `'translation'` | `"original"`, `"translation"` | Group by `translation_status`. |
| `'language'` | Language codes (e.g. `"en"`, `"fr"`) | Group by token `language` field. |
| `'speaker'` | Speaker IDs (e.g. `"1"`) | Group by token `speaker` field. |
| `(token) => string` | Custom keys | Custom grouping function. |

Learn more about [Speaker diarization](/stt/concepts/speaker-diarization)

#### TokenGroup fields

Each group in `recording.groups` contains:

| Field | Type | Description |
| --------------- | ----------------- | ------------------------------------------------------- |
| `text` | `string` | Full text: `finalText + partialText`. |
| `finalText` | `string` | Accumulated finalized text in this group. |
| `partialText` | `string` | Text from current non-final tokens. |
| `partialTokens` | `RealtimeToken[]` | Current non-final tokens (from the latest result only). |

#### Automatic grouping for translation

When a `translation` config is provided, `groupBy` is set automatically:

* `one_way` translation → groups by `'translation'` (keys: `"original"`, `"translation"`)
* `two_way` translation → groups by `'language'` (keys: language codes like `"en"`, `"es"`)

```tsx
function TranslatedTranscript() {
  const { groups, start, stop, isActive } = useRecording({
    model: "stt-rt-v4",
    translation: { type: "one_way", target_language: "es" },
  });

  return (
    <div>
      <h3>Original</h3>
      <p>{groups.original?.text}</p>
      <h3>Translation</h3>
      <p>{groups.translation?.text}</p>
    </div>
  );
}
```

## `useSoniox`

Returns the `SonioxClient` instance from the nearest `SonioxProvider`. Useful for low-level session access.

```tsx
import { useSoniox } from "@soniox/react";

function MyComponent() {
  const client = useSoniox();
  // Use client.realtime.stt() for low-level session access
  // Use client.permissions for permission checks
}
```

## `useMicrophonePermission`

Hook for checking and requesting microphone permission before recording. Requires a `SonioxProvider` with a permission resolver configured (default in browsers).

```tsx
import { useMicrophonePermission } from "@soniox/react";

function PermissionGate({ children }) {
  const mic = useMicrophonePermission({ autoCheck: true });

  if (!mic.isSupported) {
    return <p>Microphone permissions are not available.</p>;
  }

  if (mic.status === "unknown") {
    return <p>Checking permission...</p>;
  }

  if (mic.isDenied) {
    return (
      <div>
        <p>Microphone access denied.</p>
        {!mic.canRequest && (
          <p>Please enable microphone access in your browser settings.</p>
        )}
      </div>
    );
  }

  if (mic.status === "prompt") {
    return <button onClick={() => mic.check()}>Allow microphone access</button>;
  }

  return children;
}
```

### Options

| Option | Type | Default | Description |
| ----------- | --------- | ------- | ---------------------------------------- |
| `autoCheck` | `boolean` | `false` | Automatically check permission on mount. |

### Return value

| Field | Type | Description |
| ------------- | --------------------- | ------------------------------------------------------------------------------------------------------ |
| `status` | `MicPermissionStatus` | Current status: `'granted'`, `'denied'`, `'prompt'`, `'unavailable'`, `'unsupported'`, or `'unknown'`. |
| `canRequest` | `boolean` | Whether the user can be prompted again. `false` when permanently denied. |
| `isGranted` | `boolean` | `status === 'granted'`. |
| `isDenied` | `boolean` | `status === 'denied'`. |
| `isSupported` | `boolean` | Whether permission checking is available. |
| `check` | `() => Promise` | Check (or re-check) the microphone permission. No-op when unsupported. |

### Status values

| Status | Description |
| --------------- | --------------------------------------------------- |
| `'granted'` | Microphone access is granted. |
| `'denied'` | Microphone access is denied. |
| `'prompt'` | User hasn't been asked yet. |
| `'unavailable'` | Permissions API not available in this browser. |
| `'unsupported'` | No `PermissionResolver` configured in the provider. |
| `'unknown'` | Initial state before the first `check()` call. |

## `useAudioLevel`

Hook for real-time audio volume metering. Useful for building recording indicators and animations.

```tsx
import { useAudioLevel } from "@soniox/react";

function VolumeIndicator({ isActive }) {
  const { volume } = useAudioLevel({ active: isActive }); // float value between 0 and 1

  return (
    <div style={{ width: `${volume * 100}%`, height: 8, background: "#4ade80" }} />
  );
}
```

## Next.js (App Router)

The package declares `'use client'` at the entry point. All hooks must be used inside Client Components. Server Components cannot use `useRecording` or other hooks directly.

# Web SDK

URL: /stt/SDKs/web-SDK

Build speech-to-text workflows in browser with real-time API.

import { LinkCards } from "@/components/link-card";

The Soniox [Web SDK](https://www.npmjs.com/package/@soniox/client) is the official JavaScript/TypeScript SDK for using the Soniox [Real-time API](/stt/api-reference/websocket-api) directly in the browser. It lets you:

* Capture audio from the user's microphone
* Stream audio to Soniox in real time
* Receive transcription and translation results instantly

## Quickstart

### Install

Install via your preferred package manager:

```bash tab
npm install @soniox/client
```

```bash tab
yarn add @soniox/client
```

```bash tab
pnpm add @soniox/client
```

```bash tab
bun add @soniox/client
```

### Set up your temporary API key endpoint

In a client environment (browser, mobile app, React Native, etc.), you don't want to expose your API key to the client. For this reason, you can create a temporary API key endpoint on your server and use it to issue temporary API keys for the client. For example, you can use our [Node SDK](/stt/SDKs/node-SDK) to create a temporary API key endpoint.

```ts
import express from 'express';
import { SonioxNodeClient } from '@soniox/node';

const app = express();
const client = new SonioxNodeClient(); // reads SONIOX_API_KEY from env

// Create a temporary API key endpoint
app.post('/tmp-key', async (_req, res) => {
  try {
    const { api_key, expires_at } = await client.auth.createTemporaryKey({
      usage_type: 'transcribe_websocket',
      expires_in_seconds: 300, // 1..3600
    });
    res.json({ api_key, expires_at });
  } catch (err) {
    res.status(500).json({ error: err instanceof Error ? err.message : 'Failed to create temporary key' });
  }
});

app.listen(3000, () => {
  console.log('Server listening on http://localhost:3000');
});
```

Read more about our [Node SDK](/stt/SDKs/node-SDK) and [Temporary API keys](/stt/api-reference/auth/create_temporary_api_key)

### Create your first real-time session

```ts
import { SonioxClient } from "@soniox/client";

// Create a Soniox client
const client = new SonioxClient({
  // Pass a function that fetches a temporary API key from your server
  api_key: async () => {
    const res = await fetch("/tmp-key", { method: "POST" });
    const { api_key } = await res.json();
    return api_key;
  },
});

// Create a recording session
const recording = client.realtime.record({ model: "stt-rt-v4" });

// Listen for transcription results
recording.on("result", (result) => {
  const text = result.tokens.map((t) => t.text).join("");
  if (text) console.log(text);
});

// Listen for errors
recording.on("error", (err) => console.error("Error:", err));

// Later, stop gracefully (waits for final results):
// await recording.stop();
```

Learn more about [Real-time transcription](/stt/SDKs/web-SDK/realtime-transcription)

## Next steps

* [Real-time transcription](/stt/SDKs/web-SDK/realtime-transcription)
* [Full SDK reference](/stt/SDKs/web-SDK/reference)

## Package links

* [GitHub repository](https://github.com/soniox/soniox-js)
* [NPM package](https://www.npmjs.com/package/@soniox/client)

# Real-time transcription with Web SDK

URL: /stt/SDKs/web-SDK/realtime-transcription

Create and manage real-time speech-to-text sessions with the Soniox Web SDK

The Soniox Web SDK supports real-time transcription over WebSocket directly in the browser.
This allows you to transcribe live audio with low latency — ideal for live captions, voice input, and interactive experiences. You can capture audio from the user's microphone, consume results via events or buffers that group tokens into utterances, and manage sessions with built-in connection handling.

## Create a real-time recording session

`client.realtime.record()` is the high-level API for capturing audio and streaming it to Soniox for real-time transcription. It returns a [`Recording`](/stt/SDKs/web-SDK/reference/classes#recording) instance synchronously so you can attach event listeners before any async work (microphone access, API key fetch, WebSocket connection) begins.

```typescript
const recording = client.realtime.record({
  // Speech-to-text model to use
  model: "stt-rt-v4",

  // Optional: hint expected languages
  language_hints: ["en", "es"],

  // Optional: enable speaker identification
  enable_speaker_diarization: true,

  // Optional: detect utterance boundaries (useful for voice agents)
  enable_endpoint_detection: true,

  // Optional: provide domain context to improve accuracy
  context: {
    terms: ["Soniox", "WebSocket"],
    general: [{ key: "domain", value: "technology" }],
  },

  // ... other options ...
});
```

### Listen for results

The `result` event fires every time the server returns a transcription update. Each `RealtimeResult` contains an array of `RealtimeToken` objects — both finalized and in-progress tokens.

```typescript
recording.on("result", (result) => {
  const text = result.tokens.map((t) => t.text).join("");
  if (text) console.log(text);
});
```

## Handle session events

| Event | Payload | Description |
| ---------------- | -------------------------- | ------------------------------------------------------------------- |
| `result` | `RealtimeResult` | Transcription result received from the server. |
| `error` | `Error` | An error occurred during recording. |
| `endpoint` | — | Endpoint detected (speaker finished talking). |
| `finalized` | — | Server completed finalization of current tokens. |
| `finished` | — | Server acknowledged end of stream. Fires before `stopped` state. |
| `connected` | — | WebSocket connected and streaming. |
| `state_change` | `{ old_state, new_state }` | Recording state transition. |
| `source_muted` | — | Audio source was muted externally (e.g. OS-level or hardware mute). |
| `source_unmuted` | — | Audio source was unmuted after an external mute. |

## Session lifecycle

A `Recording` transitions through a set of states. The lifecycle is fully managed — audio buffering during connection, keepalive during pause, and cleanup on stop or error are all handled automatically.

### States

| State | Description |
| ------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
| `idle` | Initial state before any work begins. |
| `starting` | Audio source is starting, API key is being fetched. Audio is buffered. |
| `connecting` | WebSocket connection is being established. |
| `recording` | Actively capturing and streaming audio. |
| `paused` | Audio capture and streaming paused. Keepalive messages maintain the connection. **You are still charged for the open session even when it is paused.** |
| `stopping` | `stop()` called. Waiting for the server to finish processing remaining audio. |
| `stopped` | Gracefully stopped. All final results have been received. |
| `error` | An error occurred. Resources have been cleaned up. |
| `canceled` | Canceled via `cancel()` or `AbortSignal`. |

### Methods

#### `stop(): Promise`

Gracefully stops the recording. Stops the audio source and waits for the server to process all remaining audio and return final results.

```typescript
await recording.stop();
// All final results have been received at this point
```

#### `cancel(): void`

Immediately cancels the recording without waiting for final results. Closes the WebSocket connection and releases all resources.

```typescript
recording.cancel();
```

#### `pause(): void`

Pauses audio capture and streaming. The WebSocket connection stays open with automatic keepalive messages.

```typescript
recording.pause();
console.log(recording.state); // 'paused'
```

You are charged for the full stream duration even when the session is paused.

#### `resume(): void`

Resumes audio capture and streaming after a pause.

```typescript
recording.resume();
console.log(recording.state); // 'recording'
```

#### `finalize(options?): void`

Requests the server to finalize current non-final tokens. Useful for forcing finalization at a specific point (e.g. before displaying a completed sentence).

```typescript
recording.finalize();

// With trailing silence trimming:
recording.finalize({ trailing_silence_ms: 500 });
```

### Tracking state changes

```typescript
recording.on("state_change", ({ old_state, new_state }) => {
  console.log(`${old_state} → ${new_state}`);
});
```

## Endpoint detection and manual finalization

Endpoint detection lets you know when a speaker has finished speaking. This is critical for real-time voice AI assistants, command-and-response systems, and conversational apps where you want to respond immediately without waiting for long silences.

Read more about [Endpoint detection](/stt/rt/endpoint-detection)

Enable endpoint detection by setting `enable_endpoint_detection: true` in the session configuration. Listen for the `endpoint` event to know when a speaker has finished speaking.

```typescript
recording.on("endpoint", () => {
  console.log("--- speaker finished ---");
});
```

Manual finalization gives you precise control over when audio should be finalized — useful for push-to-talk systems and client-side voice activity detection (VAD).

Read more about [Manual finalization](/stt/rt/manual-finalization)

```ts
recording.finalize();
```

## Pause, resume and muting audio source

```ts
recording.pause();  // keeps connection alive, drops audio while paused
recording.resume(); // resume sending audio
```

The SDK will `finalize` audio on pause. Make sure to adjust your VAD sensitivity to leave enough silence before pausing. Learn more about [Manual finalization](/stt/rt/manual-finalization#key-points)

The recording will also react to system-level mute events and will start sending keepalive messages to keep the session alive.

You are billed for the full stream duration even when the session is paused.

## Handling translation

The SDK supports one-way and two-way real-time translation. Configure translation in the session config, then filter tokens by `translation_status` to separate original and translated text.

### One-way translation

Translates all spoken audio into a single target language.
```typescript
const recording = client.realtime.record({
  model: "stt-rt-v4",
  translation: {
    type: "one_way",
    target_language: "es", // Translate everything to Spanish
  },
});

recording.on("result", (result) => {
  for (const token of result.tokens) {
    if (token.translation_status === "original") {
      console.log("[Original]", token.text);
    } else if (token.translation_status === "translation") {
      console.log("[Translated]", token.text);
    }
  }
});
```

### Two-way translation

Translates between two languages — each speaker's speech is translated into the other language.

```typescript
const recording = client.realtime.record({
  model: "stt-rt-v4",
  translation: {
    type: "two_way",
    language_a: "en",
    language_b: "fr",
  },
});
```

### Translation token fields

When translation is enabled, each `RealtimeToken` includes:

| Field | Type | Description |
| -------------------- | --------------------------------------- | ------------------------------------------------------- |
| `translation_status` | `'none' \| 'original' \| 'translation'` | Whether this token is original speech or a translation. |
| `source_language` | `string` | The source language code for translated tokens. |
| `language` | `string` | The language of this token's text. |

Learn more about [Real-time translation](/stt/rt/real-time-translation)

You can provide [custom translation terms](/stt/concepts/context#translation-terms) in the context to improve translation accuracy.

## Handle permissions

The SDK provides a platform-agnostic permission system for checking and requesting microphone access before starting a recording. This is optional but recommended for a good user experience — you can show appropriate UI based on the permission state rather than waiting for the recording to fail.

### Setup

Pass a [`BrowserPermissionResolver`](/stt/SDKs/web-SDK/reference/classes#browserpermissionresolver) when creating the client:

```typescript
import { SonioxClient, BrowserPermissionResolver } from "@soniox/client";

const client = new SonioxClient({
  api_key: fetchKey,
  permissions: new BrowserPermissionResolver(),
});
```

### Check permission status

`check()` queries the current microphone permission without prompting the user:

```typescript
const result = await client.permissions?.check("microphone");

switch (result?.status) {
  case "granted":
    // Microphone access already granted — safe to record
    break;
  case "prompt":
    // User hasn't been asked yet — show a "start recording" button
    break;
  case "denied":
    if (!result.can_request) {
      // Permanently denied — show "go to browser settings" instructions
    }
    break;
  case "unavailable":
    // No microphone or getUserMedia not supported
    break;
}
```

### Request permission

`request()` triggers the browser permission prompt. On platforms where permission is already granted, this is a no-op.

```typescript
const result = await client.permissions?.request("microphone");

if (result?.status === "granted") {
  startRecording();
} else if (result?.status === "denied") {
  showPermissionDeniedMessage();
}
```

Only create `BrowserPermissionResolver` in browser environments.

## Use custom audio source

By default, `client.realtime.record()` uses the built-in [`MicrophoneSource`](/stt/SDKs/web-SDK/reference/classes#microphonesource) which captures audio via `getUserMedia` and [`MediaRecorder`](https://developer.mozilla.org/en-US/docs/Web/API/MediaRecorder). You can replace it with any object that implements the [`AudioSource`](/stt/SDKs/web-SDK/reference/types#audiosource) interface.
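As a rough illustration only — the actual `AudioSource` interface is defined in the [reference](/stt/SDKs/web-SDK/reference/types#audiosource), and the member names below (`start`, `stop`, the chunk callback, and the `source` option) are assumptions made for this sketch, not the SDK's confirmed API:

```ts
// A hypothetical custom source wrapping MediaRecorder over a specific input device.
// The AudioSource member names (start/stop and the chunk callback) are assumed here.
const customSource = {
  recorder: undefined as MediaRecorder | undefined,
  async start(onChunk: (chunk: Blob) => void) {
    const stream = await navigator.mediaDevices.getUserMedia({
      audio: { deviceId: { exact: "my-usb-interface-device-id" } }, // hypothetical device ID
    });
    this.recorder = new MediaRecorder(stream);
    this.recorder.ondataavailable = (e) => onChunk(e.data);
    this.recorder.start(120); // emit a chunk roughly every 120 ms
  },
  stop() {
    this.recorder?.stop();
    this.recorder = undefined;
  },
};

// Assumed option name for wiring in the custom source:
const recording = client.realtime.record({ model: "stt-rt-v4", source: customSource });
```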
# LangChain.js (JavaScript) URL: /stt/integrations/langchain/langchain-js Soniox document loader for LangChain.js import Image from "next/image";
Soniox x Langchain
## Overview [LangChain](https://www.langchain.com/) is a popular framework for building applications powered by large language models (LLMs). The `@soniox/langchain` package provides a document loader that transcribes audio files using Soniox's speech-to-text API, making it easy to incorporate audio transcription into your LangChain pipelines. ## Setup Install the package: ```bash npm2yarn npm install @soniox/langchain ``` ### Credentials Get your Soniox API key from the [Soniox Console](https://console.soniox.com) and set it as an environment variable: ```bash export SONIOX_API_KEY=your_api_key ``` ## Usage ### Basic transcription Transcribe audio files using the `SonioxAudioTranscriptLoader`: ```typescript import { SonioxAudioTranscriptLoader } from "@soniox/langchain"; // Fetch the file const response = await fetch( "https://github.com/soniox/soniox_examples/raw/refs/heads/master/speech_to_text/assets/coffee_shop.mp3", ); const audioBuffer = await response.bytes(); // Uint8Array const loader = new SonioxAudioTranscriptLoader({ audio: audioBuffer, // Or you can pass in a URL string }); const docs = await loader.load(); console.log(docs[0].pageContent); // Transcribed text ``` ### Two-way translation Transcribe and translate between two languages simultaneously: ```typescript const loader = new SonioxAudioTranscriptLoader( { audio: audioBuffer, }, { translation: { type: "two_way", language_a: "en", language_b: "es", }, language_hints: ["en", "es"], }, ); const docs = await loader.load(); ``` ### One-way translation Translate from any detected language to a target language: ```typescript const loader = new SonioxAudioTranscriptLoader( { audio: audioBuffer, }, { translation: { type: "one_way", target_language: "fr", }, language_hints: ["en", "es"], }, ); const docs = await loader.load(); ``` ## Advanced usage ### Language hints Provide [language hints](/stt/concepts/language-hints) to improve transcription accuracy: ```typescript const loader = new SonioxAudioTranscriptLoader( { audio: audioBuffer, }, { language_hints: ["en", "es"], }, ); ``` ### Context for improved accuracy Provide domain-specific [context](/stt/concepts/context) to improve transcription accuracy: ```typescript const loader = new SonioxAudioTranscriptLoader( { audio: audioBuffer, }, { context: { general: [ { key: "industry", value: "healthcare" }, { key: "meeting_type", value: "consultation" }, ], terms: ["hypertension", "cardiology", "metformin"], translation_terms: [ { source: "blood pressure", target: "presión arterial" }, { source: "medication", target: "medicamento" }, ], }, }, ); ``` ## API reference ### Constructor parameters #### SonioxLoaderParams (required) | Parameter | Type | Required | Description | | ------------------- | ---------------------- | -------- | ------------------------------------------------------------------------------------------------------------------------- | | `audio` | `Uint8Array \| string` | Yes | Audio file as buffer or URL | | `audioFormat` | `SonioxAudioFormat` | No | Audio file format | | `apiKey` | `string` | No | Soniox API key (defaults to `SONIOX_API_KEY` env var) | | `apiBaseUrl` | `string` | No | API base URL (defaults to `https://api.soniox.com/v1`). See [regional endpoints](/stt/data-residency#regional-endpoints). 
| | `pollingIntervalMs` | `number` | No | Polling interval in ms (min: 1000, default: 1000) | | `pollingTimeoutMs` | `number` | No | Polling timeout in ms (default: 180000) | #### SonioxLoaderOptions (optional) | Parameter | Type | Description | | -------------------------------- | ---------------------------- | ---------------------------------------- | | `model` | `SonioxTranscriptionModelId` | Model to use (default: `"stt-async-v4"`) | | `translation` | `object` | Translation configuration | | `language_hints` | `string[]` | Language hints for transcription | | `language_hints_strict` | `boolean` | Enforce strict language hints | | `enable_speaker_diarization` | `boolean` | Enable speaker identification | | `enable_language_identification` | `boolean` | Enable language detection | | `context` | `object` | Context for improved accuracy | Browse the [API reference](/stt/api-reference/transcriptions/create_transcription) for a full list of supported options. ### Supported audio formats * `aac` - Advanced Audio Coding * `aiff` - Audio Interchange File Format * `amr` - Adaptive Multi-Rate * `asf` - Advanced Systems Format * `flac` - Free Lossless Audio Codec * `mp3` - MPEG Audio Layer III * `ogg` - Ogg Vorbis * `wav` - Waveform Audio File Format * `webm` - WebM Audio ### Return value The `load()` method returns an array containing a single `Document` object: ```typescript Document { pageContent: string, // The transcribed text metadata: SonioxTranscriptResponse // Full transcript with metadata } ``` The metadata includes transcribed text, speaker information (if diarization enabled), language information (if identification enabled), translation data (if translation enabled), and timing information. ## Related * [LangChain documentation](https://docs.langchain.com/oss/javascript/integrations/document_loaders/web_loaders/soniox) * [Package on NPM](https://www.npmjs.com/package/@soniox/langchain) # LangChain (Python) URL: /stt/integrations/langchain/langchain Soniox document loader for LangChain import Image from "next/image";
Soniox x Langchain
## Overview [LangChain](https://www.langchain.com/) is a popular framework for building applications powered by large language models (LLMs). The `langchain-soniox` package provides a document loader that transcribes audio files using Soniox's speech-to-text API, making it easy to incorporate audio transcription into your LangChain pipelines. ## Setup Install the package: ```bash pip install langchain-soniox ``` ### Credentials Get your Soniox API key from the [Soniox Console](https://console.soniox.com) and set it as an environment variable: ```bash export SONIOX_API_KEY=your_api_key ``` ## Usage ### Basic transcription Transcribe audio files using the `SonioxDocumentLoader`: ```python from langchain_soniox import SonioxDocumentLoader # Using a URL loader = SonioxDocumentLoader( file_url="https://soniox.com/media/examples/coffee_shop.mp3" ) docs = list(loader.lazy_load()) print(docs[0].page_content) # Transcribed text ``` You can also load audio from a local file or from bytes: ```python # Using a local file path loader = SonioxDocumentLoader(file_path="/path/to/audio.mp3") # Using binary data with open("/path/to/audio.mp3", "rb") as f: audio_bytes = f.read() loader = SonioxDocumentLoader(file_data=audio_bytes) ``` ### Async loading For async operations, use `alazy_load()`: ```python import asyncio from langchain_soniox import SonioxDocumentLoader async def transcribe_async(): loader = SonioxDocumentLoader( file_url="https://soniox.com/media/examples/coffee_shop.mp3" ) docs = [doc async for doc in loader.alazy_load()] print(docs[0].page_content) asyncio.run(transcribe_async()) ``` ## Advanced usage ### Language hints Soniox automatically detects and transcribes speech in [**60+ languages**](https://soniox.com/docs/stt/concepts/supported-languages). When you know which languages are likely to appear in your audio, provide `language_hints` to improve accuracy by biasing recognition toward those languages. Language hints **do not restrict** recognition — they only **bias** the model toward the specified languages, while still allowing other languages to be detected if present. ```python from langchain_soniox import ( SonioxDocumentLoader, SonioxTranscriptionOptions, ) loader = SonioxDocumentLoader( file_url="https://soniox.com/media/examples/coffee_shop.mp3", options=SonioxTranscriptionOptions( language_hints=["en", "es"], ), ) docs = list(loader.lazy_load()) ``` For more details, see the [Soniox language hints documentation](https://soniox.com/docs/stt/concepts/language-hints). 
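If you need to go further and actually restrict recognition, the options reference later on this page also lists `language_hints_strict`. A minimal sketch, assuming the flag enforces the hinted languages rather than only biasing recognition toward them:

```python
from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
)

# language_hints_strict appears in the transcription options table below;
# this sketch assumes it restricts recognition to the hinted languages.
loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        language_hints=["en", "es"],
        language_hints_strict=True,
    ),
)
docs = list(loader.lazy_load())
```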
### Speaker diarization Enable speaker identification to distinguish between different speakers: ```python from langchain_soniox import ( SonioxDocumentLoader, SonioxTranscriptionOptions, ) loader = SonioxDocumentLoader( file_url="https://soniox.com/media/examples/coffee_shop.mp3", options=SonioxTranscriptionOptions( enable_speaker_diarization=True, ), ) docs = list(loader.lazy_load()) # Access speaker information in the metadata current_speaker = None output = "" for token in docs[0].metadata["tokens"]: if current_speaker != token["speaker"]: current_speaker = token["speaker"] output += f"\nSpeaker {current_speaker}: {token['text'].lstrip()}" else: output += token["text"] print(output) ``` ### Language identification Enable automatic language detection and identification: ```python from langchain_soniox import ( SonioxDocumentLoader, SonioxTranscriptionOptions, ) loader = SonioxDocumentLoader( file_url="https://soniox.com/media/examples/coffee_shop.mp3", options=SonioxTranscriptionOptions( enable_language_identification=True, ), ) docs = list(loader.lazy_load()) # Access language information in the metadata current_language = None output = "" for token in docs[0].metadata["tokens"]: if current_language != token["language"]: current_language = token["language"] output += f"\n[{current_language}] {token['text'].lstrip()}" else: output += token["text"] print(output) ``` ### Context for improved accuracy Provide domain-specific [context](https://soniox.com/docs/stt/concepts/context) to improve transcription accuracy. Context helps the model understand your domain, recognize important terms, and apply custom vocabulary. The `context` object supports four optional sections: ```python from langchain_soniox import ( SonioxDocumentLoader, SonioxTranscriptionOptions, StructuredContext, StructuredContextGeneralItem, StructuredContextTranslationTerm, ) loader = SonioxDocumentLoader( file_url="https://soniox.com/media/examples/coffee_shop.mp3", options=SonioxTranscriptionOptions( context=StructuredContext( # Structured key-value information (domain, topic, intent, etc.) general=[ StructuredContextGeneralItem(key="domain", value="Healthcare"), StructuredContextGeneralItem( key="topic", value="Diabetes management consultation" ), StructuredContextGeneralItem(key="doctor", value="Dr. Martha Smith"), ], # Longer free-form background text or related documents text="The patient has a history of...", # Domain-specific or uncommon words terms=["Celebrex", "Zyrtec", "Xanax"], # Custom translations for ambiguous terms translation_terms=[ StructuredContextTranslationTerm( source="Mr. Smith", target="Sr. Smith" ), StructuredContextTranslationTerm(source="MRI", target="RM"), ], ), ), ) docs = list(loader.lazy_load()) ``` For more details, see the [Soniox context documentation](https://soniox.com/docs/stt/concepts/context). 
### Translation Translate from any detected language to a target language: ```python from langchain_soniox import ( SonioxDocumentLoader, SonioxTranscriptionOptions, TranslationConfig, ) loader = SonioxDocumentLoader( file_url="https://soniox.com/media/examples/coffee_shop.mp3", options=SonioxTranscriptionOptions( translation=TranslationConfig( type="one_way", target_language="fr", ), language_hints=["en"], ), ) docs = list(loader.lazy_load()) original_text = "" translated_text = "" for token in docs[0].metadata["tokens"]: if token["translation_status"] == "translation": translated_text += token["text"] else: original_text += token["text"] print(original_text) print(translated_text) ``` You can also transcribe and translate between two languages simultaneously using the `two_way` translation type (see the sketch after the tables below). Learn more about translation [here](https://soniox.com/docs/stt/async/async-translation). ## API reference ### Constructor parameters | Parameter | Type | Required | Default | Description | | ------------------------------ | ---------------------------- | -------- | ------------------------------ | -------------------------------------------------- | | `file_path` | `str` | No\* | `None` | Path to local audio file to transcribe | | `file_data` | `bytes` | No\* | `None` | Binary data of audio file to transcribe | | `file_url` | `str` | No\* | `None` | URL of audio file to transcribe | | `api_key` | `str` | No | `SONIOX_API_KEY` env var | Soniox API key | | `base_url` | `str` | No | `https://api.soniox.com/v1` | API base URL (see [regional endpoints][endpoints]) | | `options` | `SonioxTranscriptionOptions` | No | `SonioxTranscriptionOptions()` | Transcription options | | `polling_interval_seconds` | `float` | No | `1.0` | Time between status polls (seconds) | | `timeout_seconds` | `float` | No | `300.0` (5 minutes) | Maximum time to wait for transcription | | `http_request_timeout_seconds` | `float` | No | `60.0` | Timeout for individual HTTP requests | \* You must specify **exactly one** of: `file_path`, `file_data`, or `file_url`. [endpoints]: https://soniox.com/docs/stt/data-residency#regional-endpoints ### Transcription options The `SonioxTranscriptionOptions` class supports these parameters: | Parameter | Type | Description | | -------------------------------- | ------------------- | ----------------------------------------------------- | | `model` | `str` | Async model to use (see [available models][models]) | | `language_hints` | `list[str]` | Language hints for transcription (ISO language codes) | | `language_hints_strict` | `bool` | Enforce strict language hints | | `enable_speaker_diarization` | `bool` | Enable speaker identification | | `enable_language_identification` | `bool` | Enable language detection | | `translation` | `TranslationConfig` | Translation configuration | | `context` | `StructuredContext` | Context for improved accuracy | | `client_reference_id` | `str` | Custom reference ID for your records | | `webhook_url` | `str` | Webhook URL for completion notifications | | `webhook_auth_header_name` | `str` | Custom auth header name for webhook | | `webhook_auth_header_value` | `str` | Custom auth header value for webhook | Browse the [API documentation](https://soniox.com/docs/stt/api-reference/transcriptions/create_transcription) for a full list of supported options.
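As referenced above, here is a minimal sketch of `two_way` translation with this loader. The `language_a`/`language_b` fields match the Soniox translation config used elsewhere in these docs; passing them through `TranslationConfig` here is an assumption:

```python
from langchain_soniox import (
    SonioxDocumentLoader,
    SonioxTranscriptionOptions,
    TranslationConfig,
)

# Assumed two_way config: each speaker's speech is translated into the
# other language (field names follow the Soniox translation API).
loader = SonioxDocumentLoader(
    file_url="https://soniox.com/media/examples/coffee_shop.mp3",
    options=SonioxTranscriptionOptions(
        translation=TranslationConfig(
            type="two_way",
            language_a="en",
            language_b="es",
        ),
        language_hints=["en", "es"],
    ),
)
docs = list(loader.lazy_load())
```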
[models]: https://soniox.com/docs/stt/models ### Return value The `lazy_load()` and `alazy_load()` methods yield a single `Document` object: ```python Document( page_content=str, # The transcribed text metadata={ "source": str, # File URL, path, or "file_upload" "transcription_id": str, # Unique transcription ID "audio_duration_ms": int, # Audio duration in milliseconds "model": str, # Model used for transcription "created_at": str, # ISO 8601 timestamp "tokens": list[dict], # Detailed token-level information } ) ``` The `tokens` array in metadata includes detailed information for each transcribed word: * `text`: The transcribed text * `start_ms`: Start time in milliseconds * `end_ms`: End time in milliseconds * `speaker`: Speaker ID (if diarization enabled), for example `"1"`, `"2"`, etc. * `language`: Detected language (if identification enabled), for example `"en"`, `"fr"`, etc. * `translation_status`: Translation status (`"original"`, `"translation"` or `"none"`) Learn more about the [Soniox API reference](https://soniox.com/docs/stt/api-reference/transcriptions/get_transcription_transcript). ## Related * [LangChain documentation](https://docs.langchain.com/oss/python/integrations/document_loaders/soniox) * [Package on PyPI](https://pypi.org/project/langchain-soniox/) # Classes URL: /stt/SDKs/node-SDK/reference/classes Soniox Node SDK — Class Reference ## SonioxNodeClient Soniox Node Client ### Example ```typescript import { SonioxNodeClient } from '@soniox/node'; const client = new SonioxNodeClient({ api_key: 'your-api-key', }); ``` ### Constructor ```ts new SonioxNodeClient(options): SonioxNodeClient; ``` **Parameters** | Parameter | Type | | --------- | ---------------------------------------------------------- | | `options` | [`SonioxNodeClientOptions`](types#sonioxnodeclientoptions) | **Returns** `SonioxNodeClient` ### Properties | Property | Type | | ---------- | ------------------------------------------------ | | `auth` | [`SonioxAuthAPI`](classes#sonioxauthapi) | | `files` | [`SonioxFilesAPI`](classes#sonioxfilesapi) | | `models` | [`SonioxModelsAPI`](classes#sonioxmodelsapi) | | `realtime` | [`SonioxRealtimeApi`](classes#sonioxrealtimeapi) | | `stt` | [`SonioxSttApi`](classes#sonioxsttapi) | | `webhooks` | [`SonioxWebhooksAPI`](classes#sonioxwebhooksapi) | *** ## SonioxFilesAPI ### delete() ```ts delete(file, signal?): Promise; ``` Permanently deletes a file. This operation is idempotent - succeeds even if the file doesn't exist. **Parameters** | Parameter | Type | Description | | --------- | ---------------------------------------- | --------------------------------------------- | | `file` | [`FileIdentifier`](types#fileidentifier) | The UUID of the file or a SonioxFile instance | | `signal?` | `AbortSignal` | Optional AbortSignal for cancellation | **Returns** `Promise`\<`void`> **Throws** [SonioxHttpError](classes#sonioxhttperror) On API errors (except 404) **Example** ```typescript // Delete by ID await client.files.delete('550e8400-e29b-41d4-a716-446655440000'); // Or delete a file instance const file = await client.files.get('550e8400-e29b-41d4-a716-446655440000'); if (file) { await client.files.delete(file); } // Or just use the instance method await file.delete(); ``` *** ### delete\_all() ```ts delete_all(options): Promise; ``` Permanently deletes all uploaded files. Iterates through all pages of files and deletes each one.
**Parameters** | Parameter | Type | Description | | --------- | ------------------------------------------------------ | -------------------------------------- | | `options` | [`DeleteAllFilesOptions`](types#deleteallfilesoptions) | Optional signal and progress callback. | **Returns** `Promise`\<`void`> **Throws** [SonioxHttpError](classes#sonioxhttperror) On API errors. **Throws** `Error` If the operation is aborted via signal. **Example** ```typescript // Delete all files await client.files.delete_all(); console.log(`Deleted all files.`); // With cancellation const controller = new AbortController(); await client.files.delete_all({ signal: controller.signal }); ``` *** ### get() ```ts get(file, signal?): Promise; ``` Retrieve metadata for an uploaded file. **Parameters** | Parameter | Type | Description | | --------- | ---------------------------------------- | --------------------------------------------- | | `file` | [`FileIdentifier`](types#fileidentifier) | The UUID of the file or a SonioxFile instance | | `signal?` | `AbortSignal` | Optional AbortSignal for cancellation | **Returns** `Promise`\<[`SonioxFile`](classes#sonioxfile) | `null`> The file instance, or null if not found **Throws** [SonioxHttpError](classes#sonioxhttperror) On API errors (except 404) **Example** ```typescript const file = await client.files.get('550e8400-e29b-41d4-a716-446655440000'); if (file) { console.log(file.filename, file.size); } ``` *** ### list() ```ts list(options): Promise; ``` Retrieves the list of uploaded files. The returned result is async iterable - use `for await...of`. **Parameters** | Parameter | Type | Description | | --------- | -------------------------------------------- | ----------------------------------------------- | | `options` | [`ListFilesOptions`](types#listfilesoptions) | Optional pagination and cancellation parameters | **Returns** `Promise`\<[`FileListResult`](classes#filelistresult)> FileListResult **Throws** [SonioxHttpError](classes#sonioxhttperror) **Example** ```typescript const result = await client.files.list(); // Automatic paging - iterates through ALL files across all pages for await (const file of result) { console.log(file.filename, file.size); } // Or access just the first page for (const file of result.files) { console.log(file.filename); } // Check if there are more pages if (result.isPaged()) { console.log('More pages available'); } // Manual paging using cursor const page1 = await client.files.list({ limit: 10 }); if (page1.next_page_cursor) { const page2 = await client.files.list({ cursor: page1.next_page_cursor }); } // With cancellation const controller = new AbortController(); const result = await client.files.list({ signal: controller.signal }); ``` *** ### upload() ```ts upload(file, options): Promise; ``` Uploads a file to Soniox for transcription **Parameters** | Parameter | Type | Description | | --------- | ---------------------------------------------- | ------------------------------------------- | | `file` | [`UploadFileInput`](types#uploadfileinput) | Buffer, Uint8Array, Blob, or ReadableStream | | `options` | [`UploadFileOptions`](types#uploadfileoptions) | Upload options | **Returns** `Promise`\<[`SonioxFile`](classes#sonioxfile)> The uploaded file metadata **Throws** [SonioxHttpError](classes#sonioxhttperror) On API errors **Throws** `Error` On validation errors (file too large, invalid input) **Examples** ```typescript import * as fs from 'node:fs'; const buffer = await fs.promises.readFile('/path/to/audio.mp3');
const file = await client.files.upload(buffer, { filename: 'audio.mp3' }); ``` ```typescript const file = await client.files.upload(Bun.file('/path/to/audio.mp3')); ``` ```typescript const file = await client.files.upload(buffer, { filename: 'audio.mp3', client_reference_id: 'order-12345', }); ``` ```typescript const controller = new AbortController(); setTimeout(() => controller.abort(), 30000); const file = await client.files.upload(buffer, { filename: 'audio.mp3', signal: controller.signal, }); ``` *** ## SonioxSttApi ### create() ```ts create(options, signal?): Promise; ``` Creates a new transcription from audio\_url or file\_id **Parameters** | Parameter | Type | Description | | --------- | ---------------------------------------------------------------- | ------------------------------------------------------- | | `options` | [`CreateTranscriptionOptions`](types#createtranscriptionoptions) | Transcription options including model and audio source. | | `signal?` | `AbortSignal` | - | **Returns** `Promise`\<[`SonioxTranscription`](classes#sonioxtranscription)> The created transcription. **Throws** [SonioxHttpError](classes#sonioxhttperror) On API errors. **Example** ```typescript // Transcribe from URL const transcription = await client.stt.create({ model: 'stt-async-v4', audio_url: 'https://soniox.com/media/examples/coffee_shop.mp3', }); // Transcribe from uploaded file const file = await client.files.upload(buffer); const transcription = await client.stt.create({ model: 'stt-async-v4', file_id: file.id, }); // With speaker diarization const transcription = await client.stt.create({ model: 'stt-async-v4', audio_url: 'https://soniox.com/media/examples/coffee_shop.mp3', enable_speaker_diarization: true, }); ``` *** ### delete() ```ts delete(id, signal?): Promise; ``` Permanently deletes a transcription. This operation is idempotent - succeeds even if the transcription doesn't exist. **Parameters** | Parameter | Type | Description | | --------- | ---------------------------------------------------------- | --------------------------------------------------------------- | | `id` | [`TranscriptionIdentifier`](types#transcriptionidentifier) | The UUID of the transcription or a SonioxTranscription instance | | `signal?` | `AbortSignal` | - | **Returns** `Promise`\<`void`> **Throws** [SonioxHttpError](classes#sonioxhttperror) On API errors (except 404) **Example** ```typescript // Delete by ID await client.stt.delete('550e8400-e29b-41d4-a716-446655440000'); // Or delete a transcription instance const transcription = await client.stt.get('550e8400-e29b-41d4-a716-446655440000'); if (transcription) { await client.stt.delete(transcription); } ``` *** ### delete\_all() ```ts delete_all(options): Promise; ``` Permanently deletes all transcriptions. Iterates through all pages of transcriptions and deletes each one. **Parameters** | Parameter | Type | Description | | --------- | ------------------------------------------------------------------------ | ---------------- | | `options` | [`DeleteAllTranscriptionsOptions`](types#deletealltranscriptionsoptions) | Optional signal. | **Returns** `Promise`\<`void`> **Throws** [SonioxHttpError](classes#sonioxhttperror) On API errors. **Throws** `Error` If the operation is aborted via signal. 
**Example** ```typescript // Delete all transcriptions await client.stt.delete_all(); console.log(`Deleted all transcriptions.`); // With cancellation const controller = new AbortController(); await client.stt.delete_all({ signal: controller.signal }); ``` *** ### destroy() ```ts destroy(id): Promise; ``` Permanently deletes a transcription and its associated file (if any). This operation is idempotent - succeeds even if resources don't exist. **Parameters** | Parameter | Type | Description | | --------- | ---------------------------------------------------------- | --------------------------------------------------------------- | | `id` | [`TranscriptionIdentifier`](types#transcriptionidentifier) | The UUID of the transcription or a SonioxTranscription instance | **Returns** `Promise`\<`void`> **Throws** [SonioxHttpError](classes#sonioxhttperror) On API errors (except 404) **Example** ```typescript // Clean up both transcription and uploaded file const transcription = await client.stt.transcribe({ model: 'stt-async-v4', file: buffer, wait: true, }); // ... use transcription ... await client.stt.destroy(transcription); // Deletes both // Or by ID await client.stt.destroy('550e8400-e29b-41d4-a716-446655440000'); ``` *** ### destroy\_all() ```ts destroy_all(options): Promise; ``` Permanently deletes all transcriptions and their associated files. Iterates through all pages of transcriptions and calls [destroy](#destroy) on each one, removing both the transcription and its uploaded file. **Parameters** | Parameter | Type | Description | | --------- | ------------------------------------------------------------------------ | -------------------------------------- | | `options` | [`DeleteAllTranscriptionsOptions`](types#deletealltranscriptionsoptions) | Optional signal and progress callback. | **Returns** `Promise`\<`void`> **Throws** [SonioxHttpError](classes#sonioxhttperror) On API errors. **Throws** `Error` If the operation is aborted via signal. **Example** ```typescript // Destroy all transcriptions and their files await client.stt.destroy_all(); console.log(`Destroyed all transcriptions and their files.`); // With cancellation const controller = new AbortController(); await client.stt.destroy_all({ signal: controller.signal }); ``` *** ### get() ```ts get(id, signal?): Promise; ``` Retrieves a transcription by ID **Parameters** | Parameter | Type | Description | | --------- | ---------------------------------------------------------- | ---------------------------------------------------------------- | | `id` | [`TranscriptionIdentifier`](types#transcriptionidentifier) | The UUID of the transcription or a SonioxTranscription instance. | | `signal?` | `AbortSignal` | - | **Returns** `Promise`\<[`SonioxTranscription`](classes#sonioxtranscription) | `null`> The transcription, or null if not found. **Throws** [SonioxHttpError](classes#sonioxhttperror) On API errors (except 404). **Example** ```typescript const transcription = await client.stt.get('550e8400-e29b-41d4-a716-446655440000'); if (transcription) { console.log(transcription.status, transcription.model); } ``` *** ### getTranscript() ```ts getTranscript(id, signal?): Promise; ``` Retrieves the full transcript text and tokens for a completed transcription. Only available for successfully completed transcriptions.
**Parameters** | Parameter | Type | Description | | --------- | ---------------------------------------------------------- | --------------------------------------------------------------- | | `id` | [`TranscriptionIdentifier`](types#transcriptionidentifier) | The UUID of the transcription or a SonioxTranscription instance | | `signal?` | `AbortSignal` | - | **Returns** `Promise`\<[`SonioxTranscript`](classes#sonioxtranscript) | `null`> The transcript with text and detailed tokens, or null if not found **Throws** [SonioxHttpError](classes#sonioxhttperror) On API errors (except 404) **Example** ```typescript const transcript = await client.stt.getTranscript('550e8400-e29b-41d4-a716-446655440000'); if (transcript) { console.log(transcript.text); for (const token of transcript.tokens) { console.log(token.text, token.start_ms, token.end_ms, token.confidence); } } ``` *** ### list() ```ts list(options, signal?): Promise; ``` Retrieves the list of transcriptions. The returned result is async iterable - use `for await...of` to iterate through all pages. **Parameters** | Parameter | Type | Description | | --------- | -------------------------------------------------------------- | ------------------------------------------ | | `options` | [`ListTranscriptionsOptions`](types#listtranscriptionsoptions) | Optional pagination and filter parameters. | | `signal?` | `AbortSignal` | - | **Returns** `Promise`\<[`TranscriptionListResult`](classes#transcriptionlistresult)> TranscriptionListResult with async iteration support. **Throws** [SonioxHttpError](classes#sonioxhttperror) On API errors. **Example** ```typescript const result = await client.stt.list(); // Automatic paging - iterates through ALL transcriptions across all pages for await (const transcription of result) { console.log(transcription.id, transcription.status); } // Or access just the first page for (const transcription of result.transcriptions) { console.log(transcription.id); } // Check if there are more pages if (result.isPaged()) { console.log('More pages available'); } ``` *** ### transcribe() ```ts transcribe(options): Promise; ``` Unified transcribe method - supports direct file upload. When `file` is provided, uploads it first, then creates a transcription. When `wait: true`, waits for completion before returning. When `cleanup` is specified (requires `wait: true`), cleans up resources after completion or on error/timeout. **Parameters** | Parameter | Type | Description | | --------- | ---------------------------------------------- | -------------------------------------------------------------------- | | `options` | [`TranscribeOptions`](types#transcribeoptions) | Transcribe options including model, audio source, and wait settings. | **Returns** `Promise`\<[`SonioxTranscription`](classes#sonioxtranscription)> The transcription (completed if wait=true, otherwise in queued/processing state). **Throws** [SonioxHttpError](classes#sonioxhttperror) On API errors. **Throws** `Error` On validation errors or wait timeout.
**Example** ```typescript // Transcribe from URL and wait for completion const result = await client.stt.transcribe({ model: 'stt-async-v4', audio_url: 'https://soniox.com/media/examples/coffee_shop.mp3', wait: true, }); // Upload file and transcribe in one call const result = await client.stt.transcribe({ model: 'stt-async-v4', file: buffer, // or Blob, ReadableStream filename: 'meeting.mp3', enable_speaker_diarization: true, wait: true, }); // With wait progress callback const result = await client.stt.transcribe({ model: 'stt-async-v4', file: buffer, wait: true, wait_options: { interval_ms: 2000, on_status_change: (status) => console.log(`Status: ${status}`), }, }); // Auto-cleanup uploaded file after transcription const result = await client.stt.transcribe({ model: 'stt-async-v4', file: buffer, wait: true, cleanup: ['file'], // Deletes uploaded file, keeps transcription record }); // Auto-cleanup everything after transcription const result = await client.stt.transcribe({ model: 'stt-async-v4', file: buffer, wait: true, cleanup: ['file', 'transcription'], // Deletes both file and transcription record }); ``` *** ### transcribeFromFile() ```ts transcribeFromFile(file, options): Promise; ``` Wrapper to transcribe from raw file data. **Parameters** | Parameter | Type | Description | | --------- | -------------------------------------------------------------- | ------------------------------------------- | | `file` | [`UploadFileInput`](types#uploadfileinput) | Buffer, Uint8Array, Blob, or ReadableStream | | `options` | [`TranscribeFromFileOptions`](types#transcribefromfileoptions) | Transcription options (excluding file) | **Returns** `Promise`\<[`SonioxTranscription`](classes#sonioxtranscription)> The transcription (completed if wait=true, otherwise in queued/processing state). *** ### transcribeFromFileId() ```ts transcribeFromFileId(file_id, options): Promise; ``` Wrapper to transcribe from an uploaded file ID. **Parameters** | Parameter | Type | Description | | --------- | ------------------------------------------------------------------ | ------------------------------------------ | | `file_id` | `string` | ID of a previously uploaded file | | `options` | [`TranscribeFromFileIdOptions`](types#transcribefromfileidoptions) | Transcription options (excluding file\_id) | **Returns** `Promise`\<[`SonioxTranscription`](classes#sonioxtranscription)> The transcription (completed if wait=true, otherwise in queued/processing state). *** ### transcribeFromUrl() ```ts transcribeFromUrl(audio_url, options): Promise; ``` Wrapper to transcribe from a URL. **Parameters** | Parameter | Type | Description | | ----------- | ------------------------------------------------------------ | -------------------------------------------- | | `audio_url` | `string` | Publicly accessible audio URL | | `options` | [`TranscribeFromUrlOptions`](types#transcribefromurloptions) | Transcription options (excluding audio\_url) | **Returns** `Promise`\<[`SonioxTranscription`](classes#sonioxtranscription)> The transcription (completed if wait=true, otherwise in queued/processing state). *** ### wait() ```ts wait(id, options?): Promise; ``` Waits for a transcription to complete **Parameters** | Parameter | Type | Description | | ---------- | ---------------------------------------------------------- | ---------------------------------------------------------------- | | `id` | [`TranscriptionIdentifier`](types#transcriptionidentifier) | The UUID of the transcription or a SonioxTranscription instance. 
| | `options?` | [`WaitOptions`](types#waitoptions) | Wait options including polling interval, timeout, and callbacks. | **Returns** `Promise`\<[`SonioxTranscription`](classes#sonioxtranscription)> The completed or errored transcription. **Throws** `Error` If the wait times out or is aborted. **Throws** [SonioxHttpError](classes#sonioxhttperror) On API errors. **Example** ```typescript const completed = await client.stt.wait('550e8400-e29b-41d4-a716-446655440000'); // With progress callback const completed = await client.stt.wait('id', { interval_ms: 2000, on_status_change: (status) => console.log(`Status: ${status}`), }); ``` *** ## SonioxModelsAPI ### list() ```ts list(signal?): Promise; ``` List of available models and their attributes. **Parameters** | Parameter | Type | Description | | --------- | ------------- | ------------------------------------- | | `signal?` | `AbortSignal` | Optional AbortSignal for cancellation | **Returns** `Promise`\<[`SonioxModel`](types#sonioxmodel)\[]> List of available models and their attributes. **See** [https://soniox.com/docs/stt/api-reference/models/get\_models](https://soniox.com/docs/stt/api-reference/models/get_models) *** ## SonioxWebhooksAPI Webhook utilities API accessible via client.webhooks Provides methods for handling incoming Soniox webhook requests. When used via the client, results include lazy fetch helpers for transcripts. ### getAuthFromEnv() ```ts getAuthFromEnv(): WebhookAuthConfig | undefined; ``` Get webhook authentication configuration from environment variables. Reads `SONIOX_API_WEBHOOK_HEADER` and `SONIOX_API_WEBHOOK_SECRET` environment variables. Returns undefined if either variable is not set (both are required for authentication). **Returns** [`WebhookAuthConfig`](types#webhookauthconfig) | `undefined` *** ### handle() ```ts handle(options): WebhookHandlerResultWithFetch; ``` Framework-agnostic webhook handler **Parameters** | Parameter | Type | | --------- | ---------------------------------------------------- | | `options` | [`HandleWebhookOptions`](types#handlewebhookoptions) | **Returns** [`WebhookHandlerResultWithFetch`](types#webhookhandlerresultwithfetch) *** ### handleExpress() ```ts handleExpress(req, auth?): WebhookHandlerResultWithFetch; ``` Handle a webhook from an Express-like request **Parameters** | Parameter | Type | | --------- | ------------------------------------------------ | | `req` | [`ExpressLikeRequest`](types#expresslikerequest) | | `auth?` | [`WebhookAuthConfig`](types#webhookauthconfig) | **Returns** [`WebhookHandlerResultWithFetch`](types#webhookhandlerresultwithfetch) **Example** ```typescript app.post('/webhook', async (req, res) => { const result = soniox.webhooks.handleExpress(req); if (result.ok && result.event.status === 'completed') { const transcript = await result.fetchTranscript(); console.log(transcript?.text); } res.status(result.status).json({ received: true }); }); ``` *** ### handleFastify() ```ts handleFastify(req, auth?): WebhookHandlerResultWithFetch; ``` Handle a webhook from a Fastify request **Parameters** | Parameter | Type | | --------- | ------------------------------------------------ | | `req` | [`FastifyLikeRequest`](types#fastifylikerequest) | | `auth?` | [`WebhookAuthConfig`](types#webhookauthconfig) | **Returns** [`WebhookHandlerResultWithFetch`](types#webhookhandlerresultwithfetch) *** ### handleHono() ```ts handleHono(c, auth?): Promise; ``` Handle a webhook from a Hono context **Parameters** | Parameter | Type | | --------- | 
---------------------------------------------- | | `c` | [`HonoLikeContext`](types#honolikecontext) | | `auth?` | [`WebhookAuthConfig`](types#webhookauthconfig) | **Returns** `Promise`\<[`WebhookHandlerResultWithFetch`](types#webhookhandlerresultwithfetch)> *** ### handleNestJS() ```ts handleNestJS(req, auth?): WebhookHandlerResultWithFetch; ``` Handle a webhook from a NestJS request **Parameters** | Parameter | Type | | --------- | ---------------------------------------------- | | `req` | [`NestJSLikeRequest`](types#nestjslikerequest) | | `auth?` | [`WebhookAuthConfig`](types#webhookauthconfig) | **Returns** [`WebhookHandlerResultWithFetch`](types#webhookhandlerresultwithfetch) *** ### handleRequest() ```ts handleRequest(request, auth?): Promise; ``` Handle a webhook from a Fetch API Request **Parameters** | Parameter | Type | | --------- | ---------------------------------------------- | | `request` | `Request` | | `auth?` | [`WebhookAuthConfig`](types#webhookauthconfig) | **Returns** `Promise`\<[`WebhookHandlerResultWithFetch`](types#webhookhandlerresultwithfetch)> *** ### isEvent() ```ts isEvent(payload): payload is WebhookEvent; ``` Type guard to check if a value is a valid WebhookEvent **Parameters** | Parameter | Type | | --------- | --------- | | `payload` | `unknown` | **Returns** `payload is WebhookEvent` *** ### parseEvent() ```ts parseEvent(payload): WebhookEvent; ``` Parse and validate a webhook event payload **Parameters** | Parameter | Type | | --------- | --------- | | `payload` | `unknown` | **Returns** [`WebhookEvent`](types#webhookevent) *** ### verifyAuth() ```ts verifyAuth(headers, auth): boolean; ``` Verify webhook authentication header **Parameters** | Parameter | Type | | --------- | ---------------------------------------------- | | `headers` | [`WebhookHeaders`](types#webhookheaders) | | `auth` | [`WebhookAuthConfig`](types#webhookauthconfig) | **Returns** `boolean` *** ## SonioxAuthAPI ### createTemporaryKey() ```ts createTemporaryKey(request, signal?): Promise; ``` Creates a temporary API key for client-side use. **Parameters** | Parameter | Type | Description | | --------- | -------------------------------------------------------- | ---------------------------------------- | | `request` | [`TemporaryApiKeyRequest`](types#temporaryapikeyrequest) | Request parameters for the temporary key | | `signal?` | `AbortSignal` | Optional AbortSignal for cancellation | **Returns** `Promise`\<[`TemporaryApiKeyResponse`](types#temporaryapikeyresponse)> The temporary API key response *** ## SonioxRealtimeApi Real-time API factory for creating STT sessions. ### Example ```typescript const session = client.realtime.stt({ model: 'stt-rt-v4', enable_endpoint_detection: true, }); await session.connect(); ``` ### stt() ```ts stt(config, options?): RealtimeSttSession; ``` Create a new Speech-to-Text session. 
**Parameters** | Parameter | Type | Description | | ---------- | ---------------------------------------------- | -------------------------------------- | | `config` | [`SttSessionConfig`](types#sttsessionconfig) | Session configuration (sent to server) | | `options?` | [`SttSessionOptions`](types#sttsessionoptions) | Session options (SDK-level settings) | **Returns** [`RealtimeSttSession`](classes#realtimesttsession) New STT session instance *** ## SonioxFile Uploaded file ### Constructor ```ts new SonioxFile(data, _http): SonioxFile; ``` **Parameters** | Parameter | Type | | --------- | ---------------------------------------- | | `data` | [`SonioxFileData`](types#sonioxfiledata) | | `_http` | [`HttpClient`](types#httpclient) | **Returns** `SonioxFile` ### delete() ```ts delete(signal?): Promise; ``` Permanently deletes this file. This operation is idempotent - succeeds even if the file doesn't exist. **Parameters** | Parameter | Type | Description | | --------- | ------------- | ------------------------------------- | | `signal?` | `AbortSignal` | Optional AbortSignal for cancellation | **Returns** `Promise`\<`void`> **Throws** [SonioxHttpError](classes#sonioxhttperror) On API errors (except 404) **Example** ```typescript const file = await client.files.get('550e8400-e29b-41d4-a716-446655440000'); if (file) { await file.delete(); } ``` *** ### toJSON() ```ts toJSON(): SonioxFileData; ``` Returns the raw data for this file. **Returns** [`SonioxFileData`](types#sonioxfiledata) ### Properties | Property | Type | | --------------------- | ----------------------- | | `client_reference_id` | `string` \| `undefined` | | `created_at` | `string` | | `filename` | `string` | | `id` | `string` | | `size` | `number` | *** ## SonioxTranscription A Transcription instance ### Constructor ```ts new SonioxTranscription( data, _http, transcript?): SonioxTranscription; ``` **Parameters** | Parameter | Type | | ------------- | ---------------------------------------------------------- | | `data` | [`SonioxTranscriptionData`](types#sonioxtranscriptiondata) | | `_http` | [`HttpClient`](types#httpclient) | | `transcript?` | [`SonioxTranscript`](classes#sonioxtranscript) \| `null` | **Returns** `SonioxTranscription` ### delete() ```ts delete(): Promise; ``` Permanently deletes this transcription. This operation is idempotent - succeeds even if the transcription doesn't exist. **Returns** `Promise`\<`void`> **Throws** [SonioxHttpError](classes#sonioxhttperror) On API errors (except 404) **Example** ```typescript const transcription = await client.stt.get('550e8400-e29b-41d4-a716-446655440000'); await transcription.delete(); ``` *** ### destroy() ```ts destroy(): Promise; ``` Permanently deletes this transcription and its associated file (if any). This operation is idempotent - succeeds even if resources don't exist. **Returns** `Promise`\<`void`> **Throws** [SonioxHttpError](classes#sonioxhttperror) On API errors (except 404) **Example** ```typescript // Clean up both transcription and uploaded file const transcription = await client.stt.transcribe({ model: 'stt-async-v4', file: buffer, wait: true, }); // ... use transcription ... await transcription.destroy(); // Deletes both transcription and file ``` *** ### getTranscript() ```ts getTranscript(options?): Promise; ``` Retrieves the full transcript text and tokens for this transcription. Only available for successfully completed transcriptions. Returns cached transcript if available (when using `transcribe()` with `wait: true`). 
Use `force: true` to bypass the cache and fetch fresh data from the API. **Parameters** | Parameter | Type | Description | | ----------------- | --------------------------------------------------- | -------------------------------------------------------- | | `options?` | \{ `force?`: `boolean`; `signal?`: `AbortSignal`; } | Optional settings | | `options.force?` | `boolean` | If true, bypasses cached transcript and fetches from API | | `options.signal?` | `AbortSignal` | Optional AbortSignal for request cancellation | **Returns** `Promise`\<[`SonioxTranscript`](classes#sonioxtranscript) | `null`> The transcript with text and detailed tokens, or null if not found. **Throws** [SonioxHttpError](classes#sonioxhttperror) On API errors (except 404). **Example** ```typescript const transcription = await client.stt.get('550e8400-e29b-41d4-a716-446655440000'); if (transcription) { const transcript = await transcription.getTranscript(); if (transcript) { console.log(transcript.text); } } // Force re-fetch from API const freshTranscript = await transcription.getTranscript({ force: true }); ``` *** ### refresh() ```ts refresh(signal?): Promise; ``` Re-fetches this transcription to get the latest status. **Parameters** | Parameter | Type | Description | | --------- | ------------- | ---------------------------------------------- | | `signal?` | `AbortSignal` | Optional AbortSignal for request cancellation. | **Returns** `Promise`\<`SonioxTranscription`> A new SonioxTranscription instance with updated data. **Throws** [SonioxHttpError](classes#sonioxhttperror) **Example** ```typescript let transcription = await client.stt.get('550e8400-e29b-41d4-a716-446655440000'); transcription = await transcription.refresh(); console.log(transcription.status); ``` *** ### toJSON() ```ts toJSON(): SonioxTranscriptionData; ``` Returns the raw data for this transcription. **Returns** [`SonioxTranscriptionData`](types#sonioxtranscriptiondata) *** ### wait() ```ts wait(options): Promise; ``` Waits for the transcription to complete or fail. Polls the API at the specified interval until the status is 'completed' or 'error'. **Parameters** | Parameter | Type | Description | | --------- | ---------------------------------- | ---------------------------------------------------------------- | | `options` | [`WaitOptions`](types#waitoptions) | Wait options including polling interval, timeout, and callbacks. | **Returns** `Promise`\<`SonioxTranscription`> The completed or errored transcription. **Throws** `Error` If the wait times out or is aborted. **Throws** [SonioxHttpError](classes#sonioxhttperror) On API errors. **Example** ```typescript const transcription = await client.stt.create({ model: 'stt-async-v4', audio_url: 'https://soniox.com/media/examples/coffee_shop.mp3', }); // Simple wait const completed = await transcription.wait(); // Wait with progress callback const completed = await transcription.wait({ interval_ms: 2000, on_status_change: (status) => console.log(`Status: ${status}`), }); ``` ### Properties | Property | Type | Description | | -------------------------------- | -------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `audio_duration_ms` | `number` \| `null` \| `undefined` | Duration of the audio in milliseconds. Only available after processing begins. 
| | `audio_url` | `string` \| `null` \| `undefined` | URL of the audio file being transcribed. | | `client_reference_id` | `string` \| `null` \| `undefined` | Optional tracking identifier. | | `context` | \| [`TranscriptionContext`](types#transcriptioncontext) \| `null` \| `undefined` | Additional context provided for the transcription. | | `created_at` | `string` | UTC timestamp when the transcription was created. | | `enable_language_identification` | `boolean` | When true, language is detected for each part of the transcription. | | `enable_speaker_diarization` | `boolean` | When true, speakers are identified and separated in the transcription output. | | `error_message` | `string` \| `null` \| `undefined` | Error message if transcription failed. | | `error_type` | `string` \| `null` \| `undefined` | Error type if transcription failed. | | `file_id` | `string` \| `null` \| `undefined` | ID of the uploaded file being transcribed. | | `filename` | `string` | Name of the file being transcribed. | | `id` | `string` | Unique identifier of the transcription. | | `language_hints` | `string`\[] \| `undefined` | Expected languages in the audio. | | `model` | `string` | Speech-to-text model used. | | `status` | [`TranscriptionStatus`](types#transcriptionstatus) | Current status of the transcription. | | `transcript` | [`SonioxTranscript`](classes#sonioxtranscript) \| `null` \| `undefined` | Pre-fetched transcript. Only available when using `transcribe()` with `wait: true`, `fetch_transcript !== false`, and the transcription completed successfully | | `webhook_auth_header_name` | `string` \| `null` \| `undefined` | Name of the authentication header sent with webhook notifications. | | `webhook_auth_header_value` | `string` \| `null` \| `undefined` | Authentication header value (masked). | | `webhook_status_code` | `number` \| `null` \| `undefined` | HTTP status code received when webhook was delivered. | | `webhook_url` | `string` \| `null` \| `undefined` | URL to receive webhook notifications. | *** ## SonioxTranscript A Transcript result containing the transcribed text and tokens. ### Constructor ```ts new SonioxTranscript(data): SonioxTranscript; ``` **Parameters** | Parameter | Type | | --------- | ------------------------------------------------ | | `data` | [`TranscriptResponse`](types#transcriptresponse) | **Returns** `SonioxTranscript` ### segments() ```ts segments(options?): TranscriptSegment[]; ``` Groups tokens into segments based on specified grouping keys. A new segment starts when any of the `group_by` fields changes. **Parameters** | Parameter | Type | Description | | ---------- | ------------------------------------------------------------ | -------------------- | | `options?` | [`SegmentTranscriptOptions`](types#segmenttranscriptoptions) | Segmentation options | **Returns** [`TranscriptSegment`](types#transcriptsegment)\[] Array of segments with combined text and timing **Example** ```typescript const transcript = await transcription.getTranscript(); // Group by both speaker and language (default) const segments = transcript.segments(); // Group by speaker only const bySpeaker = transcript.segments({ group_by: ['speaker'] }); for (const s of segments) { console.log(`[Speaker ${s.speaker}] ${s.text}`); } ``` ### Properties | Property | Type | Description | | -------- | --------------------------------------------- | ------------------------------------------------------------------ | | `id` | `string` | Unique identifier of the transcription this transcript belongs to. 
| | `text` | `string` | Complete transcribed text content. | | `tokens` | [`TranscriptToken`](types#transcripttoken)\[] | List of detailed token information with timestamps and metadata. | *** ## FileListResult Result set for file listing ### Constructor ```ts new FileListResult( initialResponse, _http, _limit, _signal): FileListResult; ``` **Parameters** | Parameter | Type | Default value | | ----------------- | ------------------------------------------------------------------------------------------ | ------------- | | `initialResponse` | [`ListFilesResponse`](types#listfilesresponset)\<[`SonioxFileData`](types#sonioxfiledata)> | `undefined` | | `_http` | [`HttpClient`](types#httpclient) | `undefined` | | `_limit` | `number` \| `undefined` | `undefined` | | `_signal` | `AbortSignal` \| `undefined` | `undefined` | **Returns** `FileListResult` ### \[asyncIterator]\() ```ts asyncIterator: AsyncIterator; ``` Async iterator that automatically fetches all pages Use with `for await...of` to iterate through all files **Returns** `AsyncIterator`\<[`SonioxFile`](classes#sonioxfile)> *** ### isPaged() ```ts isPaged(): boolean; ``` Returns true if there are more pages of results beyond the first page **Returns** `boolean` *** ### toJSON() ```ts toJSON(): ListFilesResponse; ``` Returns the raw data for this list result. Also used by JSON.stringify() to prevent serialization of internal HTTP client. **Returns** [`ListFilesResponse`](types#listfilesresponset)\<[`SonioxFileData`](types#sonioxfiledata)> ### Properties | Property | Type | Description | | ------------------ | ------------------------------------- | ---------------------------------------------------------- | | `files` | [`SonioxFile`](classes#sonioxfile)\[] | Files from the first page of results | | `next_page_cursor` | `string` \| `null` | Pagination cursor for the next page. Null if no more pages | *** ## TranscriptionListResult Result set for transcription listing. ### Constructor ```ts new TranscriptionListResult( initialResponse, _http, _options, _signal?): TranscriptionListResult; ``` **Parameters** | Parameter | Type | | ----------------- | ------------------------------------------------------------------------------------------------------------------------------ | | `initialResponse` | [`ListTranscriptionsResponse`](types#listtranscriptionsresponset)\<[`SonioxTranscriptionData`](types#sonioxtranscriptiondata)> | | `_http` | [`HttpClient`](types#httpclient) | | `_options` | [`ListTranscriptionsOptions`](types#listtranscriptionsoptions) | | `_signal?` | `AbortSignal` | **Returns** `TranscriptionListResult` ### \[asyncIterator]\() ```ts asyncIterator: AsyncIterator; ``` Async iterator that automatically fetches all pages. Use with `for await...of` to iterate through all transcriptions. **Returns** `AsyncIterator`\<[`SonioxTranscription`](classes#sonioxtranscription)> *** ### isPaged() ```ts isPaged(): boolean; ``` Returns true if there are more pages of results beyond the first page. **Returns** `boolean` *** ### toJSON() ```ts toJSON(): ListTranscriptionsResponse; ``` Returns the raw data for this list result **Returns** [`ListTranscriptionsResponse`](types#listtranscriptionsresponset)\<[`SonioxTranscriptionData`](types#sonioxtranscriptiondata)> ### Properties | Property | Type | Description | | ------------------ | ------------------------------------------------------- | ----------------------------------------------------------- | | `next_page_cursor` | `string` \| `null` | Pagination cursor for the next page. 
Null if no more pages. | | `transcriptions` | [`SonioxTranscription`](classes#sonioxtranscription)\[] | Transcriptions from the first page of results. | *** ## RealtimeSttSession Real-time Speech-to-Text session Provides WebSocket-based streaming transcription with support for: * Event-based and async iterator consumption * Pause/resume with automatic keepalive while paused * AbortSignal cancellation ### Example ```typescript const session = new RealtimeSttSession(apiKey, wsUrl, { model: 'stt-rt-v4' }); session.on('result', (result) => { console.log(result.tokens.map(t => t.text).join('')); }); await session.connect(); session.sendAudio(audioChunk); await session.finish(); ``` ### paused ```ts get paused(): boolean; ``` Whether the session is currently paused. **Returns** `boolean` *** ### state ```ts get state(): SttSessionState; ``` Current session state. **Returns** [`SttSessionState`](types#sttsessionstate) ### Constructor ```ts new RealtimeSttSession( apiKey, wsBaseUrl, config, options?): RealtimeSttSession; ``` **Parameters** | Parameter | Type | | ----------- | ---------------------------------------------- | | `apiKey` | `string` | | `wsBaseUrl` | `string` | | `config` | [`SttSessionConfig`](types#sttsessionconfig) | | `options?` | [`SttSessionOptions`](types#sttsessionoptions) | **Returns** `RealtimeSttSession` ### \[asyncIterator]\() ```ts asyncIterator: AsyncIterator; ``` Async iterator for consuming events. **Returns** `AsyncIterator`\<[`RealtimeEvent`](types#realtimeevent)> *** ### close() ```ts close(): void; ``` Close (cancel) the session immediately without waiting **Returns** `void` *** ### connect() ```ts connect(): Promise; ``` Connect to the Soniox WebSocket API. **Returns** `Promise`\<`void`> **Throws** [AbortError](classes#aborterror) If aborted **Throws** [ConnectionError](classes#connectionerror) If connection fails **Throws** [StateError](classes#stateerror) If already connected *** ### finalize() ```ts finalize(options?): void; ``` Requests the server to finalize current transcription **Parameters** | Parameter | Type | | ------------------------------ | -------------------------------------- | | `options?` | \{ `trailing_silence_ms?`: `number`; } | | `options.trailing_silence_ms?` | `number` | **Returns** `void` *** ### finish() ```ts finish(): Promise; ``` Gracefully finish the session **Returns** `Promise`\<`void`> *** ### keepAlive() ```ts keepAlive(): void; ``` Send a keepalive message **Returns** `void` *** ### off() ```ts off(event, handler): this; ``` Remove an event handler **Type Parameters** | Type Parameter | | ---------------------------------------------------------------- | | `E` *extends* keyof [`SttSessionEvents`](types#sttsessionevents) | **Parameters** | Parameter | Type | | --------- | -------------------------------------------------- | | `event` | `E` | | `handler` | [`SttSessionEvents`](types#sttsessionevents)\[`E`] | **Returns** `this` *** ### on() ```ts on(event, handler): this; ``` Register an event handler **Type Parameters** | Type Parameter | | ---------------------------------------------------------------- | | `E` *extends* keyof [`SttSessionEvents`](types#sttsessionevents) | **Parameters** | Parameter | Type | | --------- | -------------------------------------------------- | | `event` | `E` | | `handler` | [`SttSessionEvents`](types#sttsessionevents)\[`E`] | **Returns** `this` *** ### once() ```ts once(event, handler): this; ``` Register a one-time event handler **Type Parameters** | Type Parameter | | 
---------------------------------------------------------------- | | `E` *extends* keyof [`SttSessionEvents`](types#sttsessionevents) | **Parameters** | Parameter | Type | | --------- | -------------------------------------------------- | | `event` | `E` | | `handler` | [`SttSessionEvents`](types#sttsessionevents)\[`E`] | **Returns** `this` *** ### pause() ```ts pause(): void; ``` Pause audio transmission and start automatic keepalive messages. **Returns** `void` *** ### resume() ```ts resume(): void; ``` Resume audio transmission. **Returns** `void` *** ### sendAudio() ```ts sendAudio(data): void; ``` Send audio data to the server. **Parameters** | Parameter | Type | Description | | --------- | ----------- | --------------------------------------- | | `data` | `AudioData` | Audio data as Uint8Array or ArrayBuffer | **Returns** `void` **Throws** [AbortError](classes#aborterror) If aborted **Throws** [StateError](classes#stateerror) If not connected *** ### sendStream() ```ts sendStream(stream, options?): Promise; ``` Stream audio data from an async iterable source. **Parameters** | Parameter | Type | Description | | ---------- | ---------------------------------------------- | ---------------------------------------- | | `stream` | `AsyncIterable`\<`AudioData`> | Async iterable yielding audio chunks | | `options?` | [`SendStreamOptions`](types#sendstreamoptions) | Optional pacing and auto-finish settings | **Returns** `Promise`\<`void`> **Throws** [AbortError](classes#aborterror) If aborted during streaming **Throws** [StateError](classes#stateerror) If not connected *** ## RealtimeSegmentBuffer Rolling buffer for turning real-time results into stable segments. ### size ```ts get size(): number; ``` Number of tokens currently buffered. **Returns** `number` ### Constructor ```ts new RealtimeSegmentBuffer(options?): RealtimeSegmentBuffer; ``` **Parameters** | Parameter | Type | | ---------- | -------------------------------------------------------------------- | | `options?` | [`RealtimeSegmentBufferOptions`](types#realtimesegmentbufferoptions) | **Returns** `RealtimeSegmentBuffer` ### add() ```ts add(result): RealtimeSegment[]; ``` Add a real-time result and return stable segments. **Parameters** | Parameter | Type | | --------- | ---------------------------------------- | | `result` | [`RealtimeResult`](types#realtimeresult) | **Returns** [`RealtimeSegment`](types#realtimesegment)\[] *** ### flushAll() ```ts flushAll(): RealtimeSegment[]; ``` Flush all buffered tokens into segments and clear the buffer. Includes tokens that are not yet stable by final\_audio\_proc\_ms. **Returns** [`RealtimeSegment`](types#realtimesegment)\[] *** ### reset() ```ts reset(): void; ``` Clear all buffered tokens. **Returns** `void` *** ## RealtimeUtteranceBuffer Collects real-time results into utterances for endpoint-driven workflows. ### Constructor ```ts new RealtimeUtteranceBuffer(options?): RealtimeUtteranceBuffer; ``` **Parameters** | Parameter | Type | | ---------- | ------------------------------------------------------------------------ | | `options?` | [`RealtimeUtteranceBufferOptions`](types#realtimeutterancebufferoptions) | **Returns** `RealtimeUtteranceBuffer` ### addResult() ```ts addResult(result): RealtimeSegment[]; ``` Add a real-time result and collect stable segments.
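To show where `addResult()` and `markEndpoint()` fit in an endpoint-driven loop, here is a minimal, hedged sketch; `session` (a connected `RealtimeSttSession`) and the `handleUtterance` callback are assumptions for illustration, not part of the SDK:

```typescript
// Sketch only: RealtimeUtteranceBuffer and RealtimeSttSession are the classes
// documented on this page; `session` is assumed to be an already-connected
// RealtimeSttSession and `handleUtterance` an application-defined callback.
const utterances = new RealtimeUtteranceBuffer({ final_only: true });

// Feed every real-time result into the buffer.
session.on('result', (result) => {
  utterances.addResult(result);
});

// When the server detects an endpoint, flush the finished utterance.
session.on('endpoint', () => {
  const utterance = utterances.markEndpoint();
  if (utterance) {
    handleUtterance(utterance.text);
  }
});
```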
**Parameters** | Parameter | Type | | --------- | ---------------------------------------- | | `result` | [`RealtimeResult`](types#realtimeresult) | **Returns** [`RealtimeSegment`](types#realtimesegment)\[] *** ### markEndpoint() ```ts markEndpoint(): RealtimeUtterance | undefined; ``` Mark an endpoint and flush the current utterance. **Returns** [`RealtimeUtterance`](types#realtimeutterance) | `undefined` *** ### reset() ```ts reset(): void; ``` Clear buffered segments and tokens. **Returns** `void` *** ## SonioxError ### Extends * `Error` ### Extended by * [`SonioxHttpError`](classes#sonioxhttperror) * [`RealtimeError`](classes#realtimeerror) ### toJSON() ```ts toJSON(): Record; ``` Converts to a plain object for logging/serialization **Returns** `Record`\<`string`, `unknown`> *** ### toString() ```ts toString(): string; ``` Creates a human-readable string representation **Returns** `string` ### Properties | Property | Type | Description | | ------------ | --------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- | | `cause` | `unknown` | The underlying error that caused this error, if any. | | `code` | \| `SonioxErrorCode` \| `string` & \{ } | Error code describing the type of error. Typed as `string` at the base level to allow subclasses (e.g. HTTP errors) to use their own error code unions. | | `statusCode` | `number` \| `undefined` | HTTP status code when applicable (e.g., 401 for auth errors, 500 for server errors). | *** ## SonioxHttpError HTTP error class for all HTTP-related failures (REST API). Thrown when HTTP requests fail due to network issues, timeouts, server errors, or response parsing failures. ### Extends * [`SonioxError`](classes#sonioxerror) ### toJSON() ```ts toJSON(): Record; ``` Converts to a plain object for logging/serialization **Returns** `Record`\<`string`, `unknown`> **Overrides** [`SonioxError`](classes#sonioxerror).[`toJSON`](classes#sonioxerror-tojson) *** ### toString() ```ts toString(): string; ``` Creates a human-readable string representation **Returns** `string` **Overrides** [`SonioxError`](classes#sonioxerror).[`toString`](classes#sonioxerror-tostring) ### Properties | Property | Type | Description | | ------------ | -------------------------------------------- | ------------------------------------------------------------------------------------ | | `bodyText` | `string` \| `undefined` | Response body text, capped at 4KB (only for http\_error/parse\_error) | | `cause` | `unknown` | The underlying error that caused this error, if any. | | `code` | [`HttpErrorCode`](types#httperrorcode) | Categorized HTTP error code | | `headers` | `Record`\<`string`, `string`> \| `undefined` | Response headers (only for http\_error) | | `method` | [`HttpMethod`](types#httpmethod) | HTTP method | | `statusCode` | `number` \| `undefined` | HTTP status code when applicable (e.g., 401 for auth errors, 500 for server errors). 
| | `url` | `string` | Request URL | *** ## RealtimeError Base error class for all real-time (WebSocket) SDK errors ### Extends * [`SonioxError`](classes#sonioxerror) ### Extended by * [`AuthError`](classes#autherror) * [`BadRequestError`](classes#badrequesterror) * [`QuotaError`](classes#quotaerror) * [`ConnectionError`](classes#connectionerror) * [`NetworkError`](classes#networkerror) * [`AbortError`](classes#aborterror) * [`StateError`](classes#stateerror) ### toJSON() ```ts toJSON(): Record; ``` Converts to a plain object for logging/serialization **Returns** `Record`\<`string`, `unknown`> **Overrides** [`SonioxError`](classes#sonioxerror).[`toJSON`](classes#sonioxerror-tojson) *** ### toString() ```ts toString(): string; ``` Creates a human-readable string representation **Returns** `string` **Overrides** [`SonioxError`](classes#sonioxerror).[`toString`](classes#sonioxerror-tostring) ### Properties | Property | Type | Description | | ------------ | ---------------------------------------------- | -------------------------------------------------------------------------------------------------- | | `cause` | `unknown` | The underlying error that caused this error, if any. | | `code` | [`RealtimeErrorCode`](types#realtimeerrorcode) | Real-time error code | | `raw` | `unknown` | Original response payload for debugging. Contains the raw WebSocket message that caused the error. | | `statusCode` | `number` \| `undefined` | HTTP status code when applicable (e.g., 401 for auth errors, 500 for server errors). | *** ## AuthError Authentication error (401). Thrown when the API key is invalid or expired. ### Extends * [`RealtimeError`](classes#realtimeerror) ### toJSON() ```ts toJSON(): Record; ``` Converts to a plain object for logging/serialization **Returns** `Record`\<`string`, `unknown`> **Inherited from** [`RealtimeError`](classes#realtimeerror).[`toJSON`](classes#realtimeerror-tojson) *** ### toString() ```ts toString(): string; ``` Creates a human-readable string representation **Returns** `string` **Inherited from** [`RealtimeError`](classes#realtimeerror).[`toString`](classes#realtimeerror-tostring) ### Properties | Property | Type | Description | | ------------ | ---------------------------------------------- | -------------------------------------------------------------------------------------------------- | | `cause` | `unknown` | The underlying error that caused this error, if any. | | `code` | [`RealtimeErrorCode`](types#realtimeerrorcode) | Real-time error code | | `raw` | `unknown` | Original response payload for debugging. Contains the raw WebSocket message that caused the error. | | `statusCode` | `number` \| `undefined` | HTTP status code when applicable (e.g., 401 for auth errors, 500 for server errors). | *** ## BadRequestError Bad request error (400). Thrown for invalid configuration or parameters. 
### Extends * [`RealtimeError`](classes#realtimeerror) ### toJSON() ```ts toJSON(): Record; ``` Converts to a plain object for logging/serialization **Returns** `Record`\<`string`, `unknown`> **Inherited from** [`RealtimeError`](classes#realtimeerror).[`toJSON`](classes#realtimeerror-tojson) *** ### toString() ```ts toString(): string; ``` Creates a human-readable string representation **Returns** `string` **Inherited from** [`RealtimeError`](classes#realtimeerror).[`toString`](classes#realtimeerror-tostring) ### Properties | Property | Type | Description | | ------------ | ---------------------------------------------- | -------------------------------------------------------------------------------------------------- | | `cause` | `unknown` | The underlying error that caused this error, if any. | | `code` | [`RealtimeErrorCode`](types#realtimeerrorcode) | Real-time error code | | `raw` | `unknown` | Original response payload for debugging. Contains the raw WebSocket message that caused the error. | | `statusCode` | `number` \| `undefined` | HTTP status code when applicable (e.g., 401 for auth errors, 500 for server errors). | *** ## QuotaError Quota error (402, 429). Thrown when rate limits are exceeded or quota is exhausted. ### Extends * [`RealtimeError`](classes#realtimeerror) ### toJSON() ```ts toJSON(): Record; ``` Converts to a plain object for logging/serialization **Returns** `Record`\<`string`, `unknown`> **Inherited from** [`RealtimeError`](classes#realtimeerror).[`toJSON`](classes#realtimeerror-tojson) *** ### toString() ```ts toString(): string; ``` Creates a human-readable string representation **Returns** `string` **Inherited from** [`RealtimeError`](classes#realtimeerror).[`toString`](classes#realtimeerror-tostring) ### Properties | Property | Type | Description | | ------------ | ---------------------------------------------- | -------------------------------------------------------------------------------------------------- | | `cause` | `unknown` | The underlying error that caused this error, if any. | | `code` | [`RealtimeErrorCode`](types#realtimeerrorcode) | Real-time error code | | `raw` | `unknown` | Original response payload for debugging. Contains the raw WebSocket message that caused the error. | | `statusCode` | `number` \| `undefined` | HTTP status code when applicable (e.g., 401 for auth errors, 500 for server errors). | *** ## ConnectionError Connection error. Thrown for WebSocket connection failures and transport errors. ### Extends * [`RealtimeError`](classes#realtimeerror) ### toJSON() ```ts toJSON(): Record; ``` Converts to a plain object for logging/serialization **Returns** `Record`\<`string`, `unknown`> **Inherited from** [`RealtimeError`](classes#realtimeerror).[`toJSON`](classes#realtimeerror-tojson) *** ### toString() ```ts toString(): string; ``` Creates a human-readable string representation **Returns** `string` **Inherited from** [`RealtimeError`](classes#realtimeerror).[`toString`](classes#realtimeerror-tostring) ### Properties | Property | Type | Description | | ------------ | ---------------------------------------------- | -------------------------------------------------------------------------------------------------- | | `cause` | `unknown` | The underlying error that caused this error, if any. | | `code` | [`RealtimeErrorCode`](types#realtimeerrorcode) | Real-time error code | | `raw` | `unknown` | Original response payload for debugging. Contains the raw WebSocket message that caused the error. 
| | `statusCode` | `number` \| `undefined` | HTTP status code when applicable (e.g., 401 for auth errors, 500 for server errors). | *** ## NetworkError Network error. Thrown for server-side network issues (408, 500, 503). ### Extends * [`RealtimeError`](classes#realtimeerror) ### toJSON() ```ts toJSON(): Record; ``` Converts to a plain object for logging/serialization **Returns** `Record`\<`string`, `unknown`> **Inherited from** [`RealtimeError`](classes#realtimeerror).[`toJSON`](classes#realtimeerror-tojson) *** ### toString() ```ts toString(): string; ``` Creates a human-readable string representation **Returns** `string` **Inherited from** [`RealtimeError`](classes#realtimeerror).[`toString`](classes#realtimeerror-tostring) ### Properties | Property | Type | Description | | ------------ | ---------------------------------------------- | -------------------------------------------------------------------------------------------------- | | `cause` | `unknown` | The underlying error that caused this error, if any. | | `code` | [`RealtimeErrorCode`](types#realtimeerrorcode) | Real-time error code | | `raw` | `unknown` | Original response payload for debugging. Contains the raw WebSocket message that caused the error. | | `statusCode` | `number` \| `undefined` | HTTP status code when applicable (e.g., 401 for auth errors, 500 for server errors). | *** ## AbortError Abort error. Thrown when an operation is cancelled via AbortSignal. ### Extends * [`RealtimeError`](classes#realtimeerror) ### toJSON() ```ts toJSON(): Record; ``` Converts to a plain object for logging/serialization **Returns** `Record`\<`string`, `unknown`> **Inherited from** [`RealtimeError`](classes#realtimeerror).[`toJSON`](classes#realtimeerror-tojson) *** ### toString() ```ts toString(): string; ``` Creates a human-readable string representation **Returns** `string` **Inherited from** [`RealtimeError`](classes#realtimeerror).[`toString`](classes#realtimeerror-tostring) ### Properties | Property | Type | Description | | ------------ | ---------------------------------------------- | -------------------------------------------------------------------------------------------------- | | `cause` | `unknown` | The underlying error that caused this error, if any. | | `code` | [`RealtimeErrorCode`](types#realtimeerrorcode) | Real-time error code | | `raw` | `unknown` | Original response payload for debugging. Contains the raw WebSocket message that caused the error. | | `statusCode` | `number` \| `undefined` | HTTP status code when applicable (e.g., 401 for auth errors, 500 for server errors). | *** ## StateError State error. Thrown when an operation is attempted in an invalid state. ### Extends * [`RealtimeError`](classes#realtimeerror) ### toJSON() ```ts toJSON(): Record; ``` Converts to a plain object for logging/serialization **Returns** `Record`\<`string`, `unknown`> **Inherited from** [`RealtimeError`](classes#realtimeerror).[`toJSON`](classes#realtimeerror-tojson) *** ### toString() ```ts toString(): string; ``` Creates a human-readable string representation **Returns** `string` **Inherited from** [`RealtimeError`](classes#realtimeerror).[`toString`](classes#realtimeerror-tostring) ### Properties | Property | Type | Description | | ------------ | ---------------------------------------------- | -------------------------------------------------------------------------------------------------- | | `cause` | `unknown` | The underlying error that caused this error, if any. 
| | `code` | [`RealtimeErrorCode`](types#realtimeerrorcode) | Real-time error code | | `raw` | `unknown` | Original response payload for debugging. Contains the raw WebSocket message that caused the error. | | `statusCode` | `number` \| `undefined` | HTTP status code when applicable (e.g., 401 for auth errors, 500 for server errors). | # Full Node SDK reference URL: /stt/SDKs/node-SDK/reference Full SDK reference for the Node SDK ## Environment variables Environment variables are used to configure the client. You can set them in your environment or pass them explicitly to the client. | Variable | Description | Default | | --------------------------- | ---------------------------- | ---------------------------------------------- | | `SONIOX_API_KEY` | API key for REST requests | - | | `SONIOX_API_BASE_URL` | REST base URL | `https://api.soniox.com` | | `SONIOX_WS_URL` | Real-time WebSocket base URL | `wss://stt-rt.soniox.com/transcribe-websocket` | | `SONIOX_API_WEBHOOK_HEADER` | Webhook auth header name | - | | `SONIOX_API_WEBHOOK_SECRET` | Webhook auth header value | - | ## Client ### Available client methods | Method | Description | | ------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------- | | [`client.files.upload(file, options)`](/stt/SDKs/node-SDK/reference/classes#sonioxfilesapi-upload) | Upload file | | [`client.files.list(options)`](/stt/SDKs/node-SDK/reference/classes#sonioxfilesapi-list) | List files | | [`client.files.get(id)`](/stt/SDKs/node-SDK/reference/classes#sonioxfilesapi-get) | Get file | | [`client.files.delete(id)`](/stt/SDKs/node-SDK/reference/classes#sonioxfilesapi-delete) | Delete file | | [`client.files.delete_all()`](/stt/SDKs/node-SDK/reference/classes#sonioxfilesapi-delete_all) | Delete all files | | --- | --- | | [`client.stt.create(options)`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-create) | Create transcription | | [`client.stt.list(options)`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-list) | List transcriptions | | [`client.stt.get(id)`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-get) | Get transcription | | [`client.stt.getTranscript(id)`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-gettranscript) | Get transcription transcript | | [`client.stt.delete(id)`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-delete) | Delete transcription | | [`client.stt.destroy(id)`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-destroy) | Delete transcription and its file | | [`client.stt.wait(id)`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-wait) | Wait for transcription to complete | | [`client.stt.transcribe(options)`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-transcribe) | Transcribe audio | | [`client.stt.transcribeFromUrl(url, options)`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-transcribefromurl) | Transcribe audio from URL | | [`client.stt.transcribeFromFile(file, options)`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-transcribefromfile) | Transcribe audio from file | | [`client.stt.transcribeFromFileId(id, options)`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-transcribefromfileid) | Transcribe audio from file ID | | [`client.stt.delete_all()`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-delete_all) | Delete all transcriptions | | [`client.stt.destroy_all()`](/stt/SDKs/node-SDK/reference/classes#sonioxsttapi-destroy_all) | Delete all transcriptions and their files | | --- | --- | 
| [`client.models.list()`](/stt/SDKs/node-SDK/reference/classes#sonioxmodelsapi-list) | List available models | | --- | --- | | [`client.webhooks.handle(options)`](/stt/SDKs/node-SDK/reference/classes#sonioxwebhooksapi-handle) | Handle webhook | | [`client.webhooks.handleRequest(request, options)`](/stt/SDKs/node-SDK/reference/classes#sonioxwebhooksapi-handlerequest) | Handle webhook with Fetch API | | [`client.webhooks.handleExpress(req, options)`](/stt/SDKs/node-SDK/reference/classes#sonioxwebhooksapi-handleexpress) | Handle webhook with Express | | [`client.webhooks.handleFastify(req, options)`](/stt/SDKs/node-SDK/reference/classes#sonioxwebhooksapi-handlefastify) | Handle webhook with Fastify | | [`client.webhooks.handleNestJS(req, options)`](/stt/SDKs/node-SDK/reference/classes#sonioxwebhooksapi-handlenestjs) | Handle webhook with NestJS | | [`client.webhooks.handleHono(c, options)`](/stt/SDKs/node-SDK/reference/classes#sonioxwebhooksapi-handlehono) | Handle webhook with Hono | | [`client.webhooks.getAuthFromEnv()`](/stt/SDKs/node-SDK/reference/classes#sonioxwebhooksapi-getauthfromenv) | Get webhook auth from environment variables | | [`client.webhooks.verifyAuth(headers, auth)`](/stt/SDKs/node-SDK/reference/classes#sonioxwebhooksapi-verifyauth) | Verify webhook auth | | [`client.webhooks.parseEvent(body)`](/stt/SDKs/node-SDK/reference/classes#sonioxwebhooksapi-parseevent) | Parse webhook event | | [`client.webhooks.isEvent(body)`](/stt/SDKs/node-SDK/reference/classes#sonioxwebhooksapi-isevent) | Check if body is a webhook event | | --- | --- | | [`client.auth.createTemporaryKey(options)`](/stt/SDKs/node-SDK/reference/classes#sonioxauthapi-createtemporarykey) | Create temporary API key | | --- | --- | | [`client.realtime.stt(options)`](/stt/SDKs/node-SDK/reference/classes#sonioxrealtimeapi-stt) | Create real-time session | ## File ### Available file instance methods | Method | Description | | ------------------------------------------------------------------------- | ----------- | | [`file.delete()`](/stt/SDKs/node-SDK/reference/classes#sonioxfile-delete) | Delete file | ## Transcription ### Available transcription instance methods | Method | Description | | --------------------------------------------------------------------------------------------------------- | ---------------------------------- | | [`transcription.getTranscript()`](/stt/SDKs/node-SDK/reference/classes#sonioxtranscription-gettranscript) | Get transcription transcript | | [`transcription.delete()`](/stt/SDKs/node-SDK/reference/classes#sonioxtranscription-delete) | Delete transcription | | [`transcription.destroy()`](/stt/SDKs/node-SDK/reference/classes#sonioxtranscription-destroy) | Delete transcription and its file | | [`transcription.wait()`](/stt/SDKs/node-SDK/reference/classes#sonioxtranscription-wait) | Wait for transcription to complete | | [`transcription.refresh()`](/stt/SDKs/node-SDK/reference/classes#sonioxtranscription-refresh) | Refresh transcription | ## Transcript | Method | Description | | ----------------------------------------------------------------------------------------- | ----------------------- | | [`transcript.segments()`](/stt/SDKs/node-SDK/reference/classes#sonioxtranscript-segments) | Get transcript segments | | [`transcript.text`](/stt/SDKs/node-SDK/reference/classes#sonioxtranscript-text) | Transcript text | | [`transcript.tokens`](/stt/SDKs/node-SDK/reference/classes#sonioxtranscript-tokens) | Transcript tokens | ## Real-time STT Session | Method | Description | | 
---------------------------------------------------------------------------------------------------------------- | ------------------------------------------------ | | [`realtime.stt.connect()`](/stt/SDKs/node-SDK/reference/classes#realtimesttsession-connect) | Establish websocket connection | | [`realtime.stt.close()`](/stt/SDKs/node-SDK/reference/classes#realtimesttsession-close) | Close websocket connection | | [`realtime.stt.finalize()`](/stt/SDKs/node-SDK/reference/classes#realtimesttsession-finalize) | Request server to finalize current transcription | | [`realtime.stt.finish()`](/stt/SDKs/node-SDK/reference/classes#realtimesttsession-finish) | Gracefully finish the session | | [`realtime.stt.keepAlive()`](/stt/SDKs/node-SDK/reference/classes#realtimesttsession-keepalive) | Send keepalive message | | [`realtime.stt.off()`](/stt/SDKs/node-SDK/reference/classes#realtimesttsession-off) | Remove event handler | | [`realtime.stt.on()`](/stt/SDKs/node-SDK/reference/classes#realtimesttsession-on) | Register event handler | | [`realtime.stt.once()`](/stt/SDKs/node-SDK/reference/classes#realtimesttsession-once) | Register one-time event handler | | [`realtime.stt.sendAudio(audio)`](/stt/SDKs/node-SDK/reference/classes#realtimesttsession-sendaudio) | Send audio chunk | | [`realtime.stt.sendStream(stream, options)`](/stt/SDKs/node-SDK/reference/classes#realtimesttsession-sendstream) | Send audio stream | | [`realtime.stt.pause()`](/stt/SDKs/node-SDK/reference/classes#realtimesttsession-pause) | Pause audio transmission | | [`realtime.stt.resume()`](/stt/SDKs/node-SDK/reference/classes#realtimesttsession-resume) | Resume audio transmission | | [`realtime.stt.paused`](/stt/SDKs/node-SDK/reference/classes#realtimesttsession-paused) | Whether the session is currently paused | | [`realtime.stt.state`](/stt/SDKs/node-SDK/reference/classes#realtimesttsession-state) | Current session state | # Types URL: /stt/SDKs/node-SDK/reference/types Soniox Node SDK — Types Reference ## AudioData ```ts type AudioData = Buffer | Uint8Array | ArrayBuffer; ``` Audio data types accepted by sendAudio. In Node.js, Buffer is also accepted since Buffer extends Uint8Array. *** ## AudioFormat ```ts type AudioFormat = | "pcm_s8" | "pcm_s8le" | "pcm_s8be" | "pcm_s16le" | "pcm_s16be" | "pcm_s24le" | "pcm_s24be" | "pcm_s32le" | "pcm_s32be" | "pcm_u8" | "pcm_u8le" | "pcm_u8be" | "pcm_u16le" | "pcm_u16be" | "pcm_u24le" | "pcm_u24be" | "pcm_u32le" | "pcm_u32be" | "pcm_f32le" | "pcm_f32be" | "pcm_f64le" | "pcm_f64be" | "mulaw" | "alaw" | "aac" | "aiff" | "amr" | "asf" | "wav" | "mp3" | "flac" | "ogg" | "webm"; ``` Supported audio formats for real-time transcription. *** ## CleanupTarget ```ts type CleanupTarget = "file" | "transcription"; ``` Resource types that can be cleaned up after transcription completes. * `'file'` - The uploaded file * `'transcription'` - The transcription record *** ## ContextGeneralEntry ```ts type ContextGeneralEntry = { key: string; value: string; }; ``` Key-value pair for general context information. **Properties** | Property | Type | Description | | -------- | -------- | ------------------------------------------------------------------------ | | `key` | `string` | The key describing the context type (e.g., "domain", "topic", "doctor"). | | `value` | `string` | The value for the context key. | *** ## ContextTranslationTerm ```ts type ContextTranslationTerm = { source: string; target: string; }; ``` Custom translation term mapping. 
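As a small, hedged illustration (the medical and Spanish values are invented for the example, loosely following the "domain"/"doctor" examples in the property descriptions), literals of the two context entry shapes look like this:

```typescript
// Sketch only; both types are documented on this page.
const entry: ContextGeneralEntry = { key: 'domain', value: 'cardiology' };

const term: ContextTranslationTerm = {
  source: 'myocardial infarction', // term as spoken in the source audio
  target: 'infarto de miocardio', // preferred rendering in the target language
};
```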
**Properties** | Property | Type | Description | | -------- | -------- | ------------------------------------ | | `source` | `string` | The source term to translate. | | `target` | `string` | The target translation for the term. | *** ## CreateTranscriptionOptions ```ts type CreateTranscriptionOptions = { audio_url?: string; client_reference_id?: string; context?: TranscriptionContext; enable_language_identification?: boolean; enable_speaker_diarization?: boolean; file_id?: string; language_hints?: string[]; language_hints_strict?: boolean; model: string; translation?: TranslationConfig; webhook_auth_header_name?: string; webhook_auth_header_value?: string; webhook_url?: string; }; ``` Options for creating a transcription. **Properties** | Property | Type | Description | | --------------------------------- | ---------------------------------------------------- | ------------------------------------------------------------------------------------------------- | | `audio_url?` | `string` | URL of a publicly accessible audio file. **Max Length** 4096 | | `client_reference_id?` | `string` | Optional tracking identifier. **Max Length** 256 | | `context?` | [`TranscriptionContext`](types#transcriptioncontext) | Additional context to improve transcription accuracy and formatting of specialized terms. | | `enable_language_identification?` | `boolean` | Enable automatic language identification. | | `enable_speaker_diarization?` | `boolean` | Enable speaker diarization to identify different speakers. | | `file_id?` | `string` | ID of a previously uploaded file. **Format** uuid | | `language_hints?` | `string`\[] | Array of expected ISO language codes to bias recognition. | | `language_hints_strict?` | `boolean` | When true, model relies more heavily on language hints. | | `model` | `string` | Speech-to-text model to use. **Max Length** 32 | | `translation?` | [`TranslationConfig`](types#translationconfig) | Translation configuration. | | `webhook_auth_header_name?` | `string` | Name of the authentication header sent with webhook notifications. **Max Length** 256 | | `webhook_auth_header_value?` | `string` | Authentication header value sent with webhook notifications. **Max Length** 256 | | `webhook_url?` | `string` | URL to receive webhook notifications when transcription is completed or fails. **Max Length** 256 | *** ## DeleteAllFilesOptions ```ts type DeleteAllFilesOptions = { signal?: AbortSignal; }; ``` Options for purging all files. **Properties** | Property | Type | Description | | --------- | ------------- | ----------------------------------------------------- | | `signal?` | `AbortSignal` | AbortSignal for cancelling the delete\_all operation. | *** ## DeleteAllTranscriptionsOptions ```ts type DeleteAllTranscriptionsOptions = { on_progress?: (transcription, index) => void; signal?: AbortSignal; }; ``` Options for deleting all transcriptions. **Properties** | Property | Type | Description | | -------------- | ------------------------------------ | ------------------------------------------------------------------------------------------------------------- | | `on_progress?` | (`transcription`, `index`) => `void` | Callback invoked before each transcription is deleted. Receives the transcription data and its 0-based index. | | `signal?` | `AbortSignal` | AbortSignal for cancelling the delete\_all operation. 
| *** ## ExpressLikeRequest ```ts type ExpressLikeRequest = { body?: unknown; headers: Record; method: string; }; ``` Express/Connect-style request object **Properties** | Property | Type | | --------- | ----------------------------------------------------------- | | `body?` | `unknown` | | `headers` | `Record`\<`string`, `string` \| `string`\[] \| `undefined`> | | `method` | `string` | *** ## FastifyLikeRequest ```ts type FastifyLikeRequest = { body?: unknown; headers: Record; method: string; }; ``` Fastify-style request object **Properties** | Property | Type | | --------- | ----------------------------------------------------------- | | `body?` | `unknown` | | `headers` | `Record`\<`string`, `string` \| `string`\[] \| `undefined`> | | `method` | `string` | *** ## FileIdentifier ```ts type FileIdentifier = | string | { id: string; }; ``` File identifier - either a string ID or an object with an id property. *** ## HandleWebhookOptions ```ts type HandleWebhookOptions = { auth?: WebhookAuthConfig; body: unknown; headers: WebhookHeaders; method: string; }; ``` Options for the handleWebhook function **Properties** | Property | Type | Description | | --------- | ---------------------------------------------- | ---------------------------------------- | | `auth?` | [`WebhookAuthConfig`](types#webhookauthconfig) | Optional authentication configuration | | `body` | `unknown` | Request body (parsed JSON or raw string) | | `headers` | [`WebhookHeaders`](types#webhookheaders) | Request headers | | `method` | `string` | HTTP method of the request | *** ## HonoLikeContext ```ts type HonoLikeContext = { req: { method: string; header: string | undefined; json: Promise; }; }; ``` Hono context object **Properties** | Property | Type | | ------------ | ------------------------------------------------------------------------------------------ | | `req` | \{ `method`: `string`; `header`: `string` \| `undefined`; `json`: `Promise`\<`unknown`>; } | | `req.method` | `string` | | `req.header` | `string` \| `undefined` | | `req.json` | `Promise`\<`unknown`> | *** ## HttpErrorCode ```ts type HttpErrorCode = "network_error" | "timeout" | "aborted" | "http_error" | "parse_error"; ``` Error codes for HTTP client errors *** ## HttpMethod ```ts type HttpMethod = "GET" | "POST" | "PUT" | "PATCH" | "DELETE" | "HEAD"; ``` HTTP methods supported by the client *** ## HttpRequestBody ```ts type HttpRequestBody = | string | Record | ArrayBuffer | Uint8Array | FormData | null; ``` Request body types *** ## HttpResponseType ```ts type HttpResponseType = "json" | "text" | "arrayBuffer"; ``` Response types *** ## ListFilesOptions ```ts type ListFilesOptions = { cursor?: string; limit?: number; signal?: AbortSignal; }; ``` Options for listing files. **Properties** | Property | Type | Description | | --------- | ------------- | ------------------------------------------------------------------------------------ | | `cursor?` | `string` | Pagination cursor for the next page of results. | | `limit?` | `number` | Maximum number of files to return. **Default** `1000` **Minimum** 1 **Maximum** 1000 | | `signal?` | `AbortSignal` | AbortSignal for cancelling the request | *** ## ListFilesResponse\ ```ts type ListFilesResponse = { files: T[]; next_page_cursor: string | null; }; ``` Response from listing files. 
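A hedged sketch of how this response shape is consumed in practice: `client.files.list()` returns the `FileListResult` documented earlier, whose iterator walks every page for you (`client` is assumed to be an initialized Soniox Node SDK client, and field access on `SonioxFile` is assumed to mirror the raw metadata):

```typescript
// Sketch only: `client` is an initialized Soniox Node SDK client.
const result = await client.files.list({ limit: 100 });

// `result.files` holds the first page; `for await` iterates every file,
// following `next_page_cursor` to fetch further pages automatically.
for await (const file of result) {
  console.log(file.id, file.filename, file.size);
}
```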
**Type Parameters** | Type Parameter | | -------------- | | `T` | **Properties** | Property | Type | Description | | ------------------ | ------------------ | ------------------------------------------------------------------------------------------------------------ | | `files` | `T`\[] | List of uploaded files. | | `next_page_cursor` | `string` \| `null` | A pagination token that references the next page of results. When null, no additional results are available. | *** ## ListTranscriptionsOptions ```ts type ListTranscriptionsOptions = { cursor?: string; limit?: number; }; ``` Options for listing transcriptions. **Properties** | Property | Type | Description | | --------- | -------- | --------------------------------------------------------------------------------------------- | | `cursor?` | `string` | Pagination cursor for the next page of results. | | `limit?` | `number` | Maximum number of transcriptions to return. **Default** `1000` **Minimum** 1 **Maximum** 1000 | *** ## ListTranscriptionsResponse\ ```ts type ListTranscriptionsResponse = { next_page_cursor: string | null; transcriptions: T[]; }; ``` Response from listing transcriptions. **Type Parameters** | Type Parameter | | -------------- | | `T` | **Properties** | Property | Type | Description | | ------------------ | ------------------ | ------------------------------------------------------------------------------------------------------------ | | `next_page_cursor` | `string` \| `null` | A pagination token that references the next page of results. When null, no additional results are available. | | `transcriptions` | `T`\[] | List of transcriptions. | *** ## NestJSLikeRequest ```ts type NestJSLikeRequest = { body?: unknown; headers: Record; method: string; }; ``` NestJS-style request object (uses Express under the hood by default) **Properties** | Property | Type | | --------- | ----------------------------------------------------------- | | `body?` | `unknown` | | `headers` | `Record`\<`string`, `string` \| `string`\[] \| `undefined`> | | `method` | `string` | *** ## OneWayTranslationConfig ```ts type OneWayTranslationConfig = { target_language: string; type: "one_way"; }; ``` One-way translation configuration. Translates all spoken languages into a single target language. **Properties** | Property | Type | Description | | ----------------- | ----------- | -------------------------------------------------------------- | | `target_language` | `string` | Target language code for translation (e.g., "fr", "es", "de"). | | `type` | `"one_way"` | Translation type. | *** ## QueryParams ```ts type QueryParams = Record; ``` Query parameters *** ## RealtimeClientOptions ```ts type RealtimeClientOptions = { api_key: string; default_session_options?: SttSessionOptions; ws_base_url: string; }; ``` Real-time API configuration options for the client. **Properties** | Property | Type | Description | | -------------------------- | ---------------------------------------------- | ---------------------------------------------------------------------------------------------------------- | | `api_key` | `string` | API key for real-time sessions. | | `default_session_options?` | [`SttSessionOptions`](types#sttsessionoptions) | Default session options applied to all real-time sessions. Can be overridden per-session. | | `ws_base_url` | `string` | WebSocket base URL for real-time connections.
**Default** `'wss://stt-rt.soniox.com/transcribe-websocket'` | *** ## RealtimeErrorCode ```ts type RealtimeErrorCode = | "auth_error" | "bad_request" | "quota_exceeded" | "connection_error" | "network_error" | "aborted" | "state_error" | "realtime_error"; ``` Error codes for Real-time (WebSocket) API errors *** ## RealtimeEvent ```ts type RealtimeEvent = | { data: RealtimeResult; kind: "result"; } | { kind: "endpoint"; } | { kind: "finalized"; } | { kind: "finished"; }; ``` Typed event for async iterator consumption. *** ## RealtimeOptions ```ts type RealtimeOptions = { default_session_options?: SttSessionOptions; ws_base_url?: string; }; ``` Real-time configuration options for the main client. **Properties** | Property | Type | Description | | -------------------------- | ---------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------- | | `default_session_options?` | [`SttSessionOptions`](types#sttsessionoptions) | Default session options applied to all real-time sessions. Can be overridden per-session. | | `ws_base_url?` | `string` | WebSocket base URL for real-time connections. Falls back to SONIOX\_WS\_URL environment variable, then to 'wss\://stt-rt.soniox.com/transcribe-websocket'. | *** ## RealtimeResult ```ts type RealtimeResult = { final_audio_proc_ms: number; finished?: boolean; tokens: RealtimeToken[]; total_audio_proc_ms: number; }; ``` A result message from the real-time WebSocket. **Properties** | Property | Type | Description | | --------------------- | ----------------------------------------- | -------------------------------------------------- | | `final_audio_proc_ms` | `number` | Milliseconds of audio that have been finalized. | | `finished?` | `boolean` | Whether this is the final result (session ending). | | `tokens` | [`RealtimeToken`](types#realtimetoken)\[] | Tokens in this result. | | `total_audio_proc_ms` | `number` | Total milliseconds of audio processed. | *** ## RealtimeSegment ```ts type RealtimeSegment = { end_ms?: number; language?: string; speaker?: string; start_ms?: number; text: string; tokens: RealtimeToken[]; }; ``` A segment of contiguous real-time tokens grouped by speaker/language. **Properties** | Property | Type | Description | | ----------- | ----------------------------------------- | ------------------------------------------------------------- | | `end_ms?` | `number` | End time of the segment in milliseconds (from last token). | | `language?` | `string` | Detected language code (if language identification enabled). | | `speaker?` | `string` | Speaker identifier (if diarization enabled). | | `start_ms?` | `number` | Start time of the segment in milliseconds (from first token). | | `text` | `string` | Concatenated text of all tokens in this segment. | | `tokens` | [`RealtimeToken`](types#realtimetoken)\[] | Original tokens in this segment. | *** ## RealtimeSegmentBufferOptions ```ts type RealtimeSegmentBufferOptions = { final_only?: boolean; group_by?: SegmentGroupKey[]; max_ms?: number; max_tokens?: number; }; ``` Options for rolling real-time segmentation buffers. **Properties** | Property | Type | Description | | ------------- | --------------------------------------------- | --------------------------------------------------------------------------------------------------------------- | | `final_only?` | `boolean` | When true, only tokens marked as final are buffered. 
**Default** `true` | | `group_by?` | [`SegmentGroupKey`](types#segmentgroupkey)\[] | Fields to group by. A new segment starts when any of these fields changes **Default** `['speaker', 'language']` | | `max_ms?` | `number` | Maximum time window to keep in milliseconds (requires token timings). | | `max_tokens?` | `number` | Maximum number of tokens to keep in the buffer. **Default** `2000` | *** ## RealtimeSegmentOptions ```ts type RealtimeSegmentOptions = { final_only?: boolean; group_by?: SegmentGroupKey[]; }; ``` Options for segmenting real-time tokens. **Properties** | Property | Type | Description | | ------------- | --------------------------------------------- | --------------------------------------------------------------------------------------------------------------- | | `final_only?` | `boolean` | When true, only tokens marked as final are included. **Default** `false` | | `group_by?` | [`SegmentGroupKey`](types#segmentgroupkey)\[] | Fields to group by. A new segment starts when any of these fields changes **Default** `['speaker', 'language']` | *** ## RealtimeToken ```ts type RealtimeToken = { confidence: number; end_ms?: number; is_final: boolean; language?: string; source_language?: string; speaker?: string; start_ms?: number; text: string; translation_status?: "none" | "original" | "translation"; }; ``` A single token from the real-time transcription. **Properties** | Property | Type | Description | | --------------------- | ------------------------------------------- | ------------------------------------------------------------ | | `confidence` | `number` | Confidence score (0.0 to 1.0). | | `end_ms?` | `number` | End time in milliseconds relative to audio start. | | `is_final` | `boolean` | Whether this is a finalized token. | | `language?` | `string` | Detected language code (if language identification enabled). | | `source_language?` | `string` | Source language for translated tokens. | | `speaker?` | `string` | Speaker identifier (if diarization enabled). | | `start_ms?` | `number` | Start time in milliseconds relative to audio start. | | `text` | `string` | The transcribed text. | | `translation_status?` | `"none"` \| `"original"` \| `"translation"` | Translation status of this token. | *** ## RealtimeUtterance ```ts type RealtimeUtterance = { end_ms?: number; final_audio_proc_ms?: number; language?: string; segments: RealtimeSegment[]; speaker?: string; start_ms?: number; text: string; tokens: RealtimeToken[]; total_audio_proc_ms?: number; }; ``` A single utterance built from real-time segments. **Properties** | Property | Type | Description | | ---------------------- | --------------------------------------------- | ----------------------------------------------------------------- | | `end_ms?` | `number` | End time of the utterance in milliseconds (from last segment). | | `final_audio_proc_ms?` | `number` | Milliseconds of audio that have been finalized at flush time. | | `language?` | `string` | Detected language code when consistent across segments. | | `segments` | [`RealtimeSegment`](types#realtimesegment)\[] | Segments included in this utterance. | | `speaker?` | `string` | Speaker identifier when consistent across segments. | | `start_ms?` | `number` | Start time of the utterance in milliseconds (from first segment). | | `text` | `string` | Concatenated text of all segments in this utterance. | | `tokens` | [`RealtimeToken`](types#realtimetoken)\[] | Tokens included in this utterance. 
| | `total_audio_proc_ms?` | `number` | Total milliseconds of audio processed at flush time. | *** ## RealtimeUtteranceBufferOptions ```ts type RealtimeUtteranceBufferOptions = { final_only?: boolean; group_by?: SegmentGroupKey[]; max_ms?: number; max_tokens?: number; }; ``` Options for buffering real-time utterances. **Properties** | Property | Type | Description | | ------------- | --------------------------------------------- | --------------------------------------------------------------------------------------------------------------- | | `final_only?` | `boolean` | When true, only tokens marked as final are buffered. **Default** `true` | | `group_by?` | [`SegmentGroupKey`](types#segmentgroupkey)\[] | Fields to group by. A new segment starts when any of these fields changes **Default** `['speaker', 'language']` | | `max_ms?` | `number` | Maximum time window to keep in milliseconds (requires token timings). | | `max_tokens?` | `number` | Maximum number of tokens to keep in the buffer. **Default** `2000` | *** ## SegmentGroupKey ```ts type SegmentGroupKey = "speaker" | "language"; ``` Fields that can be used to group tokens into segments *** ## SegmentTranscriptOptions ```ts type SegmentTranscriptOptions = { group_by?: SegmentGroupKey[]; }; ``` Options for segmenting a transcript **Properties** | Property | Type | Description | | ----------- | --------------------------------------------- | --------------------------------------------------------------------------------------------------------------- | | `group_by?` | [`SegmentGroupKey`](types#segmentgroupkey)\[] | Fields to group by. A new segment starts when any of these fields changes **Default** `['speaker', 'language']` | *** ## SendStreamOptions ```ts type SendStreamOptions = { finish?: boolean; pace_ms?: number; }; ``` Options for streaming audio from an async iterable source. **Properties** | Property | Type | Description | | ---------- | --------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- | | `finish?` | `boolean` | When true, calls finish() automatically after the stream ends. **Default** `false` | | `pace_ms?` | `number` | Delay in milliseconds between sending chunks. Useful for simulating real-time pace when streaming pre-recorded files. Not needed for live audio sources. | *** ## SonioxErrorCode ```ts type SonioxErrorCode = | RealtimeErrorCode | "soniox_error" | HttpErrorCode; ``` All possible SDK error codes (core real-time + HTTP-specific codes) *** ## SonioxFileData ```ts type SonioxFileData = { client_reference_id?: string | null; created_at: string; filename: string; id: string; size: number; }; ``` Raw file metadata from the API. **Properties** | Property | Type | Description | | ---------------------- | ------------------ | ------------------------------------------------------------------------- | | `client_reference_id?` | `string` \| `null` | Optional tracking identifier string. | | `created_at` | `string` | UTC timestamp indicating when the file was uploaded. **Format** date-time | | `filename` | `string` | Name of the file. | | `id` | `string` | Unique identifier of the file. **Format** uuid | | `size` | `number` | Size of the file in bytes. | *** ## SonioxLanguage ```ts type SonioxLanguage = { code: string; name: string; }; ``` **Properties** | Property | Type | Description | | -------- | -------- | ----------------------- | | `code` | `string` | 2-letter language code. 
| | `name` | `string` | Language name. | *** ## SonioxModel ```ts type SonioxModel = { aliased_model_id: string | null; context_version: number | null; id: string; languages: SonioxLanguage[]; name: string; one_way_translation: string | null; supports_language_hints_strict: boolean; supports_max_endpoint_delay: boolean; transcription_mode: SonioxTranscriptionMode; translation_targets: SonioxTranslationTarget[]; two_way_translation: string | null; two_way_translation_pairs: string[]; }; ``` **Properties** | Property | Type | Description | | -------------------------------- | ------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------ | | `aliased_model_id` | `string` \| `null` | If this is an alias, the id of the aliased model. Null for non-alias models. | | `context_version` | `number` \| `null` | Version of context supported. | | `id` | `string` | Unique identifier of the model. | | `languages` | [`SonioxLanguage`](types#sonioxlanguage)\[] | List of languages supported by the model. | | `name` | `string` | Name of the model. | | `one_way_translation` | `string` \| `null` | When this contains the string 'all\_languages', any language from `languages` can be used. | | `supports_language_hints_strict` | `boolean` | Whether the model supports the `language_hints_strict` option. | | `supports_max_endpoint_delay` | `boolean` | Whether the model supports the `max_endpoint_delay_ms` option. | | `transcription_mode` | [`SonioxTranscriptionMode`](types#sonioxtranscriptionmode) | Transcription mode of the model. | | `translation_targets` | [`SonioxTranslationTarget`](types#sonioxtranslationtarget)\[] | List of supported one-way translation targets. If the list is empty, check the one\_way\_translation field. | | `two_way_translation` | `string` \| `null` | When this contains the string 'all\_languages', any language pair from `languages` can be used. | | `two_way_translation_pairs` | `string`\[] | List of supported two-way translation pairs. If the list is empty, check the two\_way\_translation field. | *** ## SonioxNodeClientOptions ```ts type SonioxNodeClientOptions = { api_key?: string; base_url?: string; http_client?: HttpClient; realtime?: RealtimeOptions; }; ``` **Properties** | Property | Type | Description | | -------------- | ------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------- | | `api_key?` | `string` | API key for authentication. Falls back to SONIOX\_API\_KEY environment variable if not provided. | | `base_url?` | `string` | Base URL for the REST API. Falls back to SONIOX\_API\_BASE\_URL environment variable, then to '[https://api.soniox.com](https://api.soniox.com)'. | | `http_client?` | [`HttpClient`](types#httpclient) | Custom HTTP client implementation. | | `realtime?` | [`RealtimeOptions`](types#realtimeoptions) | Real-time API configuration options.
| *** ## SonioxTranscriptionData ```ts type SonioxTranscriptionData = { audio_duration_ms?: number | null; audio_url?: string | null; client_reference_id?: string | null; context?: TranscriptionContext | null; created_at: string; enable_language_identification: boolean; enable_speaker_diarization: boolean; error_message?: string | null; error_type?: string | null; file_id?: string | null; filename: string; id: string; language_hints?: string[] | null; model: string; status: TranscriptionStatus; webhook_auth_header_name?: string | null; webhook_auth_header_value?: string | null; webhook_status_code?: number | null; webhook_url?: string | null; }; ``` Raw transcription metadata from the API. **Properties** | Property | Type | Description | | -------------------------------- | -------------------------------------------------------------- | -------------------------------------------------------------------------------------------- | | `audio_duration_ms?` | `number` \| `null` | Duration of the audio in milliseconds. Only available after processing begins. | | `audio_url?` | `string` \| `null` | URL of the audio file being transcribed. | | `client_reference_id?` | `string` \| `null` | Optional tracking identifier. **Max Length** 256 | | `context?` | [`TranscriptionContext`](types#transcriptioncontext) \| `null` | Additional context provided for the transcription. | | `created_at` | `string` | UTC timestamp when the transcription was created. **Format** date-time | | `enable_language_identification` | `boolean` | When true, language is detected for each part of the transcription. | | `enable_speaker_diarization` | `boolean` | When true, speakers are identified and separated in the transcription output. | | `error_message?` | `string` \| `null` | Error message if transcription failed. Null for successful or in-progress transcriptions. | | `error_type?` | `string` \| `null` | Error type if transcription failed. Null for successful or in-progress transcriptions. | | `file_id?` | `string` \| `null` | ID of the uploaded file being transcribed. **Format** uuid | | `filename` | `string` | Name of the file being transcribed. | | `id` | `string` | Unique identifier of the transcription. **Format** uuid | | `language_hints?` | `string`\[] \| `null` | Expected languages in the audio. If not specified, languages are automatically detected. | | `model` | `string` | Speech-to-text model used. | | `status` | [`TranscriptionStatus`](types#transcriptionstatus) | Current status of the transcription. | | `webhook_auth_header_name?` | `string` \| `null` | Name of the authentication header sent with webhook notifications. | | `webhook_auth_header_value?` | `string` \| `null` | Authentication header value. Always returned masked. | | `webhook_status_code?` | `number` \| `null` | HTTP status code received from your server when webhook was delivered. Null if not yet sent. | | `webhook_url?` | `string` \| `null` | URL to receive webhook notifications when transcription is completed or fails. | *** ## SonioxTranscriptionMode ```ts type SonioxTranscriptionMode = "real_time" | "async"; ``` Transcription mode of the model. 
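To make the mode field concrete, a hedged sketch that partitions the model list by `transcription_mode` (assuming `client` is an initialized Soniox Node SDK client and that `client.models.list()` resolves to an array of `SonioxModel`):

```typescript
// Sketch only: `client` is an initialized Soniox Node SDK client.
const models = await client.models.list();

// Partition models by how they can be used.
const realtimeModels = models.filter((m) => m.transcription_mode === 'real_time');
const asyncModels = models.filter((m) => m.transcription_mode === 'async');

console.log('real-time:', realtimeModels.map((m) => m.id));
console.log('async:', asyncModels.map((m) => m.id));
```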
*** ## SonioxTranslationTarget ```ts type SonioxTranslationTarget = { exclude_source_languages: string[]; source_languages: string[]; target_language: string; }; ``` **Properties** | Property | Type | | -------------------------- | ----------- | | `exclude_source_languages` | `string`\[] | | `source_languages` | `string`\[] | | `target_language` | `string` | *** ## SttSessionConfig ```ts type SttSessionConfig = { audio_format?: "auto" | AudioFormat; client_reference_id?: string; context?: TranscriptionContext; enable_endpoint_detection?: boolean; enable_language_identification?: boolean; enable_speaker_diarization?: boolean; language_hints?: string[]; language_hints_strict?: boolean; max_endpoint_delay_ms?: number; model: string; num_channels?: number; sample_rate?: number; translation?: TranslationConfig; }; ``` Configuration sent to the Soniox WebSocket API when starting a session. **Properties** | Property | Type | Description | | --------------------------------- | ---------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- | | `audio_format?` | `"auto"` \| [`AudioFormat`](types#audioformat) | Audio format. Use 'auto' for automatic detection of container formats. For raw PCM formats, also set sample\_rate and num\_channels. **Default** `'auto'` | | `client_reference_id?` | `string` | Optional tracking identifier (max 256 chars). | | `context?` | [`TranscriptionContext`](types#transcriptioncontext) | Additional context to improve transcription accuracy. | | `enable_endpoint_detection?` | `boolean` | Enable endpoint detection for utterance boundaries. Useful for voice AI agents. | | `enable_language_identification?` | `boolean` | Enable automatic language detection. | | `enable_speaker_diarization?` | `boolean` | Enable speaker identification. | | `language_hints?` | `string`\[] | Expected languages in the audio (ISO language codes). | | `language_hints_strict?` | `boolean` | When true, recognition is strongly biased toward language hints. Best-effort only, not a hard guarantee. | | `max_endpoint_delay_ms?` | `number` | Maximum delay between the end of speech and the returned endpoint. Allowed values are between 500ms and 3000ms. The default value is 2000ms. | | `model` | `string` | Speech-to-text model to use. | | `num_channels?` | `number` | Number of audio channels (required for raw audio formats). | | `sample_rate?` | `number` | Sample rate in Hz (required for PCM formats). | | `translation?` | [`TranslationConfig`](types#translationconfig) | Translation configuration. | *** ## SttSessionEvents ```ts type SttSessionEvents = { connected: () => void; disconnected: (reason?) => void; endpoint: () => void; error: (error) => void; finalized: () => void; finished: () => void; result: (result) => void; state_change: (update) => void; token: (token) => void; }; ``` Event handlers for the STT session. **Properties** | Property | Type | Description | | -------------- | --------------------- | ------------------------------------------------- | | `connected` | () => `void` | Session connected and ready. | | `disconnected` | (`reason?`) => `void` | Session disconnected. | | `endpoint` | () => `void` | Endpoint detected (`<end>` token). | | `error` | (`error`) => `void` | Error occurred. | | `finalized` | () => `void` | Finalization complete (`<fin>` token). | | `finished` | () => `void` | Session finished (server signaled end of stream).
| `result` | (`result`) => `void` | Parsed result received. | | `state_change` | (`update`) => `void` | Session state transition. | | `token` | (`token`) => `void` | Individual token received. | *** ## SttSessionOptions ```ts type SttSessionOptions = { keepalive_interval_ms?: number; signal?: AbortSignal; }; ``` SDK-level session options (not sent to the server). **Properties** | Property | Type | Description | | --- | --- | --- | | `keepalive_interval_ms?` | `number` | Interval for sending keepalive messages while paused (milliseconds). **Default** `5000` | | `signal?` | `AbortSignal` | AbortSignal for cancellation. | *** ## SttSessionState ```ts type SttSessionState = | "idle" | "connecting" | "connected" | "finishing" | "finished" | "canceled" | "closed" | "error"; ``` Session lifecycle states. *** ## TemporaryApiKeyRequest ```ts type TemporaryApiKeyRequest = { client_reference_id?: string; expires_in_seconds: number; usage_type: TemporaryApiKeyUsageType; }; ``` **Properties** | Property | Type | Description | | --- | --- | --- | | `client_reference_id?` | `string` | Optional tracking identifier string. Does not need to be unique. **Max Length** 256 | | `expires_in_seconds` | `number` | Duration in seconds until the temporary API key expires. **Minimum** 1 **Maximum** 3600 | | `usage_type` | [`TemporaryApiKeyUsageType`](types#temporaryapikeyusagetype) | Intended usage of the temporary API key. | *** ## TemporaryApiKeyResponse ```ts type TemporaryApiKeyResponse = { api_key: string; expires_at: string; }; ``` **Properties** | Property | Type | Description | | --- | --- | --- | | `api_key` | `string` | Created temporary API key. | | `expires_at` | `string` | UTC timestamp indicating when the generated temporary API key will expire. **Format** date-time | *** ## TemporaryApiKeyUsageType ```ts type TemporaryApiKeyUsageType = "transcribe_websocket"; ``` *** ## TranscribeBaseOptions ```ts type TranscribeBaseOptions = { cleanup?: CleanupTarget[]; client_reference_id?: string; context?: TranscriptionContext; enable_language_identification?: boolean; enable_speaker_diarization?: boolean; fetch_transcript?: boolean; language_hints?: string[]; language_hints_strict?: boolean; model: string; signal?: AbortSignal; timeout_ms?: number; translation?: TranslationConfig; wait?: boolean; wait_options?: WaitOptions; webhook_auth_header_name?: string; webhook_auth_header_value?: string; webhook_query?: string | URLSearchParams | Record<string, string>; webhook_url?: string; }; ``` Base options shared by all audio source variants.
**Properties** | Property | Type | Description | | --- | --- | --- | | `cleanup?` | [`CleanupTarget`](types#cleanuptarget)\[] | Resources to clean up after transcription completes or on error/timeout. Only applies when `wait: true`. Cleanup runs in all cases when `wait: true`: - After successful completion - After transcription errors (status: 'error') - On timeout or abort This ensures no orphaned resources are left behind. **Example** `// Delete only the uploaded file cleanup: ['file'] // Delete only the transcription record cleanup: ['transcription'] // Delete both file and transcription cleanup: ['file', 'transcription']` | | `client_reference_id?` | `string` | Optional tracking identifier. **Max Length** 256 | | `context?` | [`TranscriptionContext`](types#transcriptioncontext) | Additional context to improve transcription accuracy and formatting of specialized terms. | | `enable_language_identification?` | `boolean` | Enable automatic language identification. | | `enable_speaker_diarization?` | `boolean` | Enable speaker diarization to identify different speakers. | | `fetch_transcript?` | `boolean` | When true (default), fetches the transcript and attaches it to the result when wait=true and the transcription completes successfully. Set to false to skip fetching the full transcript payload. **Default** `true` | | `language_hints?` | `string`\[] | Array of expected ISO language codes to bias recognition. | | `language_hints_strict?` | `boolean` | When true, the model relies more heavily on language hints. | | `model` | `string` | Speech-to-text model to use. **Max Length** 32 | | `signal?` | `AbortSignal` | AbortSignal to cancel the operation. | | `timeout_ms?` | `number` | Timeout in milliseconds. | | `translation?` | [`TranslationConfig`](types#translationconfig) | Translation configuration. | | `wait?` | `boolean` | When true, waits for transcription to complete before returning. **Default** `false` | | `wait_options?` | [`WaitOptions`](types#waitoptions) | Options for waiting (only used when wait=true). | | `webhook_auth_header_name?` | `string` | Name of the authentication header sent with webhook notifications. **Max Length** 256 | | `webhook_auth_header_value?` | `string` | Authentication header value sent with webhook notifications. **Max Length** 256 | | `webhook_query?` | `string` \| `URLSearchParams` \| `Record`\<`string`, `string`> | Query parameters to append to the webhook URL. Useful for encoding metadata like transcription ID in the webhook callback. Can be a string, URLSearchParams, or Record\<`string`, `string`>. | | `webhook_url?` | `string` | URL to receive webhook notifications when transcription is completed or fails. **Max Length** 256 |
*** ## TranscribeFromFile ```ts type TranscribeFromFile = TranscribeBaseOptions & { audio_url?: never; file: UploadFileInput; file_id?: never; filename?: string; }; ``` Transcribe from a direct file upload (Buffer, Uint8Array, Blob, or ReadableStream). **Type Declaration** | Name | Type | Description | | --- | --- | --- | | `audio_url?` | `never` | - | | `file` | [`UploadFileInput`](types#uploadfileinput) | File data to upload and transcribe. | | `file_id?` | `never` | - | | `filename?` | `string` | - | *** ## TranscribeFromFileId ```ts type TranscribeFromFileId = TranscribeBaseOptions & { audio_url?: never; file?: never; file_id: string; filename?: never; }; ``` Transcribe from a previously uploaded file. **Type Declaration** | Name | Type | Description | | --- | --- | --- | | `audio_url?` | `never` | - | | `file?` | `never` | - | | `file_id` | `string` | ID of a previously uploaded file. **Format** uuid | | `filename?` | `never` | - | *** ## TranscribeFromFileIdOptions ```ts type TranscribeFromFileIdOptions = Omit<TranscribeFromFileId, "file_id">; ``` Options for transcribing from an uploaded file ID via `transcribeFromFileId`. *** ## TranscribeFromFileOptions ```ts type TranscribeFromFileOptions = Omit<TranscribeFromFile, "file">; ``` Options for transcribing from a file via `transcribeFromFile`. *** ## TranscribeFromUrl ```ts type TranscribeFromUrl = TranscribeBaseOptions & { audio_url: string; file?: never; file_id?: never; filename?: never; }; ``` Transcribe from a publicly accessible audio URL. **Type Declaration** | Name | Type | Description | | --- | --- | --- | | `audio_url` | `string` | URL of a publicly accessible audio file. **Max Length** 4096 | | `file?` | `never` | - | | `file_id?` | `never` | - | | `filename?` | `never` | - | *** ## TranscribeFromUrlOptions ```ts type TranscribeFromUrlOptions = Omit<TranscribeFromUrl, "audio_url">; ``` Options for transcribing from a URL via `transcribeFromUrl`. *** ## TranscribeOptions ```ts type TranscribeOptions = | TranscribeFromFile | TranscribeFromFileId | TranscribeFromUrl; ``` Options for the unified transcribe method. Exactly one audio source must be provided: `file`, `file_id`, or `audio_url`. *** ## TranscriptResponse ```ts type TranscriptResponse = { id: string; text: string; tokens: TranscriptToken[]; }; ``` Response from getting a transcription transcript. **Properties** | Property | Type | Description | | --- | --- | --- | | `id` | `string` | Unique identifier of the transcription this transcript belongs to. **Format** uuid | | `text` | `string` | Complete transcribed text content. | | `tokens` | [`TranscriptToken`](types#transcripttoken)\[] | List of detailed token information with timestamps and metadata. | *** ## TranscriptSegment ```ts type TranscriptSegment = { end_ms: number; language?: string; speaker?: string; start_ms: number; text: string; tokens: TranscriptToken[]; }; ``` A segment of contiguous tokens grouped by speaker and language. **Properties** | Property | Type | Description | | --- | --- | --- | | `end_ms` | `number` | End time of the segment in milliseconds (from last token).
| | `language?` | `string` | Detected language code (if language identification was enabled). | | `speaker?` | `string` | Speaker identifier (if speaker diarization was enabled). | | `start_ms` | `number` | Start time of the segment in milliseconds (from first token). | | `text` | `string` | Concatenated text of all tokens in this segment. | | `tokens` | [`TranscriptToken`](types#transcripttoken)\[] | Original tokens in this segment. | *** ## TranscriptToken ```ts type TranscriptToken = { confidence: number; end_ms: number; is_audio_event?: boolean | null; language?: string | null; speaker?: string | null; start_ms: number; text: string; translation_status?: "none" | "original" | "translation" | null; }; ``` A single token from the transcript with timing and confidence information. **Properties** | Property | Type | Description | | --------------------- | ----------------------------------------------------- | ---------------------------------------------------------------- | | `confidence` | `number` | Confidence score for this token (0.0 to 1.0). | | `end_ms` | `number` | End time of the token in milliseconds. | | `is_audio_event?` | `boolean` \| `null` | Whether this token represents an audio event. | | `language?` | `string` \| `null` | Detected language code (if language identification was enabled). | | `speaker?` | `string` \| `null` | Speaker identifier (if speaker diarization was enabled). | | `start_ms` | `number` | Start time of the token in milliseconds. | | `text` | `string` | The text content of this token. | | `translation_status?` | `"none"` \| `"original"` \| `"translation"` \| `null` | Translation status for this token. | *** ## TranscriptionContext ```ts type TranscriptionContext = { general?: ContextGeneralEntry[]; terms?: string[]; text?: string; translation_terms?: ContextTranslationTerm[]; }; ``` Additional context to improve transcription and translation accuracy. All sections are optional - include only what's relevant for your use case. **Properties** | Property | Type | Description | | -------------------- | ----------------------------------------------------------- | --------------------------------------------------------------------------------------------------- | | `general?` | [`ContextGeneralEntry`](types#contextgeneralentry)\[] | Structured key-value pairs describing domain, topic, intent, participant names, etc. | | `terms?` | `string`\[] | Domain-specific or uncommon words to recognize. | | `text?` | `string` | Longer free-form background text, prior interaction history, reference documents, or meeting notes. | | `translation_terms?` | [`ContextTranslationTerm`](types#contexttranslationterm)\[] | Custom translations for ambiguous terms. | *** ## TranscriptionIdentifier ```ts type TranscriptionIdentifier = | string | { id: string; }; ``` Transcription identifier - either a string ID or an object with an id property. *** ## TranscriptionStatus ```ts type TranscriptionStatus = "queued" | "processing" | "completed" | "error"; ``` Status of a transcription request. *** ## TranslationConfig ```ts type TranslationConfig = | OneWayTranslationConfig | TwoWayTranslationConfig; ``` Translation configuration. *** ## TwoWayTranslationConfig ```ts type TwoWayTranslationConfig = { language_a: string; language_b: string; type: "two_way"; }; ``` Two-way translation configuration. Translates between two specified languages. 
**Properties** | Property | Type | Description | | --- | --- | --- | | `language_a` | `string` | First language code. | | `language_b` | `string` | Second language code. | | `type` | `"two_way"` | Translation type. | *** ## UploadFileInput ```ts type UploadFileInput = | Buffer | Uint8Array | Blob | ReadableStream | NodeJS.ReadableStream; ``` Supported input types for file upload. *** ## UploadFileOptions ```ts type UploadFileOptions = { client_reference_id?: string; filename?: string; signal?: AbortSignal; timeout_ms?: number; }; ``` Options for uploading a file. **Properties** | Property | Type | Description | | --- | --- | --- | | `client_reference_id?` | `string` | Optional tracking identifier string. Does not need to be unique. **Max Length** 256 | | `filename?` | `string` | Custom filename for the uploaded file. | | `signal?` | `AbortSignal` | AbortSignal for cancelling the upload. | | `timeout_ms?` | `number` | Request timeout in milliseconds. | *** ## WaitOptions ```ts type WaitOptions = { interval_ms?: number; on_status_change?: (status, transcription) => void; signal?: AbortSignal; timeout_ms?: number; }; ``` Options for polling/waiting for transcription completion. **Properties** | Property | Type | Description | | --- | --- | --- | | `interval_ms?` | `number` | Polling interval in milliseconds. **Default** `1000` **Minimum** 1000 | | `on_status_change?` | (`status`, `transcription`) => `void` | Callback invoked when status changes. | | `signal?` | `AbortSignal` | AbortSignal to cancel waiting. | | `timeout_ms?` | `number` | Maximum time to wait in milliseconds. **Default** `300000` (5 minutes) | *** ## WebhookAuthConfig ```ts type WebhookAuthConfig = { name: string; value: string; }; ``` Authentication configuration for webhook verification. **Properties** | Property | Type | Description | | --- | --- | --- | | `name` | `string` | Expected header name (case-insensitive comparison). | | `value` | `string` | Expected header value (exact match). | *** ## WebhookEvent ```ts type WebhookEvent = { id: string; status: WebhookEventStatus; }; ``` Webhook event payload sent by Soniox when a transcription completes or fails.
**Properties** | Property | Type | Description | | --- | --- | --- | | `id` | `string` | Transcription ID. **Format** uuid | | `status` | [`WebhookEventStatus`](types#webhookeventstatus) | Transcription result status. | *** ## WebhookEventStatus ```ts type WebhookEventStatus = "completed" | "error"; ``` Webhook event status values. *** ## WebhookHandlerResult ```ts type WebhookHandlerResult = { error?: string; event?: WebhookEvent; ok: boolean; status: number; }; ``` Result of webhook handling. **Properties** | Property | Type | Description | | --- | --- | --- | | `error?` | `string` | Error message (only present when ok=false). | | `event?` | [`WebhookEvent`](types#webhookevent) | Parsed webhook event (only present when ok=true). | | `ok` | `boolean` | Whether the webhook was handled successfully. | | `status` | `number` | HTTP status code to return. | *** ## WebhookHandlerResultWithFetch ```ts type WebhookHandlerResultWithFetch = WebhookHandlerResult & { fetchTranscript: | (() => Promise<ISonioxTranscript | null>) | undefined; fetchTranscription: | (() => Promise<ISonioxTranscription | null>) | undefined; }; ``` Result of webhook handling with lazy fetch capabilities. When using `client.webhooks.handleExpress()` (or other framework handlers), the result includes helper methods to fetch the transcript or transcription. **Type Declaration** | Name | Type | Description | | --- | --- | --- | | `fetchTranscript` | \| () => `Promise`\<[`ISonioxTranscript`](types#isonioxtranscript) \| `null`> \| `undefined` | Fetch the transcript for a completed transcription. Only available when `ok=true` and `event.status='completed'`. **Example** `const result = soniox.webhooks.handleExpress(req); if (result.ok && result.event.status === 'completed') { const transcript = await result.fetchTranscript(); console.log(transcript?.text); }` | | `fetchTranscription` | \| () => `Promise`\<[`ISonioxTranscription`](types#isonioxtranscription) \| `null`> \| `undefined` | Fetch the full transcription object. Useful for both completed (metadata) and error (error details) statuses.
**Example** `const result = soniox.webhooks.handleExpress(req); if (result.ok && result.event.status === 'error') { const transcription = await result.fetchTranscription(); console.log(transcription?.error_message); }` | *** ## WebhookHeaders ```ts type WebhookHeaders = | Headers | Record<string, string> | { get: (name: string) => string | null; }; ``` Headers object type - supports standard Headers instances, plain record objects, and Headers-like objects exposing a `get` method. *** ## HttpClient Pluggable HTTP client interface. **Methods** **request()** ```ts request<T>(request: HttpRequest): Promise<HttpResponse<T>>; ``` Perform an HTTP request. **Type Parameters** | Type Parameter | | --- | | `T` | **Parameters** | Parameter | Type | Description | | --- | --- | --- | | `request` | [`HttpRequest`](types#httprequest) | Request configuration | **Returns** `Promise`\<[`HttpResponse`](types#httpresponset)\<`T`>> Promise resolving to the response **Throws** [SonioxHttpError](classes#sonioxhttperror) On network errors, timeouts, HTTP errors, or parse errors *** ## HttpErrorDetails Error details for SonioxHttpError. **Properties** | Property | Type | Description | | --- | --- | --- | | `bodyText?` | `string` | Response body text (capped at 4KB) | | `cause?` | `unknown` | - | | `code` | [`HttpErrorCode`](types#httperrorcode) | - | | `headers?` | `Record`\<`string`, `string`> | - | | `message` | `string` | - | | `method` | [`HttpMethod`](types#httpmethod) | - | | `statusCode?` | `number` | - | | `url` | `string` | - | *** ## HttpRequest HTTP request configuration. **Properties** | Property | Type | Description | | --- | --- | --- | | `body?` | [`HttpRequestBody`](types#httprequestbody) | Request body | | `headers?` | `Record`\<`string`, `string`> | Request headers | | `method` | [`HttpMethod`](types#httpmethod) | HTTP method | | `path` | `string` | URL path (relative to baseUrl) or absolute URL | | `query?` | [`QueryParams`](types#queryparams) | Query parameters (will be URL-encoded) | | `responseType?` | [`HttpResponseType`](types#httpresponsetype) | Expected response type **Default** `'json'` | | `signal?` | `AbortSignal` | Optional AbortSignal for request cancellation. If provided along with timeoutMs, both will be respected | | `timeoutMs?` | `number` | Request timeout in milliseconds. If not specified, uses the client's default timeout | *** ## HttpResponse\<T> HTTP response from the client. **Type Parameters** | Type Parameter | | --- | | `T` | **Properties** | Property | Type | Description | | --- | --- | --- | | `data` | `T` | Parsed response data | | `headers` | `Record`\<`string`, `string`> | Response headers (normalized to lowercase keys) | | `status` | `number` | HTTP status code | *** ## ISonioxTranscript Type contract for SonioxTranscript class. **See** SonioxTranscript for full documentation.
**Methods** **segments()** ```ts segments(options?): TranscriptSegment[]; ``` **Parameters** | Parameter | Type | | --- | --- | | `options?` | [`SegmentTranscriptOptions`](types#segmenttranscriptoptions) | **Returns** [`TranscriptSegment`](types#transcriptsegment)\[] **Properties** | Property | Type | | --- | --- | | `id` | `string` | | `text` | `string` | | `tokens` | [`TranscriptToken`](types#transcripttoken)\[] | *** ## ISonioxTranscription Type contract for SonioxTranscription class. **See** SonioxTranscription for full documentation. **Methods** **delete()** ```ts delete(): Promise<void>; ``` **Returns** `Promise`\<`void`> *** **destroy()** ```ts destroy(): Promise<void>; ``` **Returns** `Promise`\<`void`> *** **getTranscript()** ```ts getTranscript(options?): Promise<ISonioxTranscript | null>; ``` **Parameters** | Parameter | Type | | --- | --- | | `options?` | \{ `force?`: `boolean`; `signal?`: `AbortSignal`; } | | `options.force?` | `boolean` | | `options.signal?` | `AbortSignal` | **Returns** `Promise`\<[`ISonioxTranscript`](types#isonioxtranscript) | `null`> *** **refresh()** ```ts refresh(signal?): Promise<ISonioxTranscription>; ``` **Parameters** | Parameter | Type | | --- | --- | | `signal?` | `AbortSignal` | **Returns** `Promise`\<`ISonioxTranscription`> *** **toJSON()** ```ts toJSON(): SonioxTranscriptionData; ``` **Returns** [`SonioxTranscriptionData`](types#sonioxtranscriptiondata) *** **wait()** ```ts wait(options?): Promise<ISonioxTranscription>; ``` **Parameters** | Parameter | Type | | --- | --- | | `options?` | [`WaitOptions`](types#waitoptions) | **Returns** `Promise`\<`ISonioxTranscription`> **Properties** | Property | Type | | --- | --- | | `audio_duration_ms` | `number` \| `null` \| `undefined` | | `audio_url` | `string` \| `null` \| `undefined` | | `client_reference_id` | `string` \| `null` \| `undefined` | | `context` | \| [`TranscriptionContext`](types#transcriptioncontext) \| `null` \| `undefined` | | `created_at` | `string` | | `enable_language_identification` | `boolean` | | `enable_speaker_diarization` | `boolean` | | `error_message` | `string` \| `null` \| `undefined` | | `error_type` | `string` \| `null` \| `undefined` | | `file_id` | `string` \| `null` \| `undefined` | | `filename` | `string` | | `id` | `string` | | `language_hints` | `string`\[] \| `undefined` | | `model` | `string` | | `status` | [`TranscriptionStatus`](types#transcriptionstatus) | | `transcript` | [`ISonioxTranscript`](types#isonioxtranscript) \| `null` \| `undefined` | | `webhook_auth_header_name` | `string` \| `null` \| `undefined` | | `webhook_auth_header_value` | `string` \| `null` \| `undefined` | | `webhook_status_code` | `number` \| `null` \| `undefined` | | `webhook_url` | `string` \| `null` \| `undefined` | # Async Client URL: /stt/SDKs/python-SDK/Full-SDK-reference/async_client Soniox Python SDK - Async Client Reference *** ## AsyncSonioxClient Asynchronous Soniox REST client exposing HTTP and realtime helpers.
### Constructor ```python AsyncSonioxClient(api_key: str | None = None, api_base_url: str | None = None, websocket_base_url: str | None = None, timeout_sec: float | None = None, webhook_secret: str | None = None, webhook_signature_header: str | None = None, **client_kwargs: Any) ``` **Parameters** | Parameter | Type | Description | | --- | --- | --- | | `api_key` | `str \| None` | API key used for authentication. | | `api_base_url` | `str \| None` | Base URL for Soniox REST API requests. | | `websocket_base_url` | `str \| None` | Base URL for Soniox realtime WebSocket endpoint. | | `timeout_sec` | `float \| None` | Default request timeout in seconds. | | `webhook_secret` | `str \| None` | Webhook secret used for signature verification. | | `webhook_signature_header` | `str \| None` | Webhook signature header name. | | `client_kwargs` | `Any` | Additional HTTP client keyword arguments. | **Returns** `None` ### Properties | Property | Type | Description | | --- | --- | --- | | `files` | `AsyncFilesAPI` | Files API namespace. | | `stt` | `AsyncSttAPI` | Speech-to-text API namespace. | | `models` | `AsyncModelsAPI` | Models API namespace. | | `auth` | `AsyncAuthAPI` | Authentication API namespace. | | `webhooks` | `AsyncSonioxWebhooksAPI` | Webhook utilities API namespace. | | `realtime` | `AsyncRealtimeAPI` | Entrypoint for async realtime helpers on AsyncSonioxClient. | ### request() ```python request(method: str, path: str, *, params: Mapping[str, Any] | None = None, json: Any | None = None, data: Mapping[str, Any] | None = None, files: Mapping[str, Any] | None = None) -> httpx.Response ``` Perform a request against the configured Soniox REST endpoint. **Parameters** | Parameter | Type | Description | | --- | --- | --- | | `method` | `str` | HTTP method to use for the request. | | `path` | `str` | Relative API path for the request. | | `params` | `Mapping[str, Any] \| None` | Query parameters for the request. | | `json` | `Any \| None` | JSON request payload. | | `data` | `Mapping[str, Any] \| None` | Form-encoded request payload. | | `files` | `Mapping[str, Any] \| None` | Multipart file payload mapping. | **Returns** `httpx.Response` *** ### aclose() ```python aclose() -> None ``` Close any outstanding async HTTP connections. **Returns** `None` *** ## AsyncFilesAPI ### Constructor ```python AsyncFilesAPI(client: AsyncSonioxClient) ``` **Parameters** | Parameter | Type | Description | | --- | --- | --- | | `client` | `AsyncSonioxClient` | Soniox client instance. | **Returns** `None` ### list() ```python list(limit: int = 100, cursor: str | None = None) -> GetFilesResponse ``` List uploaded files. Performs a GET request to `/files` with optional pagination. **Parameters** | Parameter | Type | Description | | --- | --- | --- | | `limit` | `int` | Maximum number of files to return. | | `cursor` | `str \| None` | Pagination cursor for the next page of results. | **Returns** `GetFilesResponse` **Raises** * `SonioxAPIError` When the API returns an error. *** ### list\_all() ```python list_all(limit: int = 100) -> AsyncGenerator[File, None] ``` Iterate through all uploaded files across all pages.
**Parameters** | Parameter | Type | Description | | --- | --- | --- | | `limit` | `int` | Maximum number of files to fetch per page. | **Yields** `AsyncGenerator[File, None]` File: The next file object from the API. **Raises** * `SonioxAPIError` When the API returns an error. *** ### get() ```python get(file_id: str) -> File ``` Retrieve a file by ID. Performs a GET request to `/files/{file_id}`. **Parameters** | Parameter | Type | Description | | --- | --- | --- | | `file_id` | `str` | ID of a previously uploaded file. | **Returns** `File` **Raises** * `SonioxAPIError` When the API returns an error. *** ### get\_or\_none() ```python get_or_none(file_id: str) -> File | None ``` Retrieve a file by ID. Returns `None` if the file does not exist. **Parameters** | Parameter | Type | Description | | --- | --- | --- | | `file_id` | `str` | ID of a previously uploaded file. | **Returns** `File | None` **Raises** * `SonioxAPIError` When the API returns an error. *** ### delete() ```python delete(file_id: str) -> None ``` Delete a file by ID. Performs a DELETE request to `/files/{file_id}`. **Parameters** | Parameter | Type | Description | | --- | --- | --- | | `file_id` | `str` | ID of a previously uploaded file. | **Returns** `None` **Raises** * `SonioxAPIError` When the API returns an error. *** ### delete\_if\_exists() ```python delete_if_exists(file_id: str) -> None ``` Delete a file by ID if it exists. Ignores missing files. **Parameters** | Parameter | Type | Description | | --- | --- | --- | | `file_id` | `str` | ID of a previously uploaded file. | **Returns** `None` **Raises** * `SonioxAPIError` When the API returns an error. *** ### upload() ```python upload(file: BinaryIO | bytes | Path | str, *, filename: str | None = None, client_reference_id: str | None = None) -> File ``` Upload a file. Performs a multipart POST request to `/files`. **Parameters** | Parameter | Type | Description | | --- | --- | --- | | `file` | `BinaryIO \| bytes \| Path \| str` | File input to upload. | | `filename` | `str \| None` | Filename associated with uploaded file data. | | `client_reference_id` | `str \| None` | Optional tracking identifier string. Does not need to be unique. | **Returns** `File` **Raises** * `SonioxAPIError` When the API returns an error. *** ### delete\_all() ```python delete_all(limit: int = 100) -> None ``` Delete all files. Iterates through all pages and deletes each file. Stops and raises on the first failed deletion. **Parameters** | Parameter | Type | Description | | --- | --- | --- | | `limit` | `int` | Page size used when listing files. | **Returns** `None` **Raises** * `SonioxAPIError` When the API returns an error. *** ## AsyncSttAPI ### Constructor ```python AsyncSttAPI(client: AsyncSonioxClient) ``` **Parameters** | Parameter | Type | Description | | --- | --- | --- | | `client` | `AsyncSonioxClient` | Soniox client instance. | **Returns** `None` ### list() ```python list(limit: int = 100, cursor: str | None = None) -> GetTranscriptionsResponse ``` List transcriptions. Performs a GET request to `/transcriptions` with optional pagination.
**Parameters** | Parameter | Type | Description | | --- | --- | --- | | `limit` | `int` | Maximum number of transcriptions to return. | | `cursor` | `str \| None` | Pagination cursor for the next page of results. | **Returns** `GetTranscriptionsResponse` **Raises** * `SonioxAPIError` When the API returns an error. *** ### list\_all() ```python list_all(limit: int = 100) -> AsyncGenerator[Transcription, None] ``` Iterate through all transcriptions across all pages. **Parameters** | Parameter | Type | Description | | --- | --- | --- | | `limit` | `int` | Maximum number of transcriptions to fetch per page. | **Yields** `AsyncGenerator[Transcription, None]` Transcription: The next transcription object from the API. **Raises** * `SonioxAPIError` When the API returns an error. *** ### delete\_all() ```python delete_all(limit: int = 100) -> None ``` Delete all transcriptions. Iterates through all pages and deletes each transcription. Stops and raises on the first failed deletion. **Parameters** | Parameter | Type | Description | | --- | --- | --- | | `limit` | `int` | Page size used when listing transcriptions. | **Returns** `None` **Raises** * `SonioxAPIError` When the API returns an error. *** ### create() ```python create(*, model: str = DEFAULT_MODEL, file_id: str | None = None, audio_url: str | None = None, client_reference_id: str | None = None, config: CreateTranscriptionConfig | None = None) -> Transcription ``` Create a transcription. Performs a POST request to `/transcriptions`. **Parameters** | Parameter | Type | Description | | --- | --- | --- | | `model` | `str` | Speech-to-text model to use. | | `file_id` | `str \| None` | ID of a previously uploaded file. | | `audio_url` | `str \| None` | Publicly accessible audio URL. | | `client_reference_id` | `str \| None` | Optional tracking identifier. | | `config` | `CreateTranscriptionConfig \| None` | Configuration options for this operation. | **Returns** `Transcription` **Raises** * `SonioxAPIError` When the API returns an error. *** ### get() ```python get(transcription_id: str) -> Transcription ``` Retrieve a transcription by ID. Performs a GET request to `/transcriptions/{transcription_id}`. **Parameters** | Parameter | Type | Description | | --- | --- | --- | | `transcription_id` | `str` | Transcription identifier. | **Returns** `Transcription` **Raises** * `SonioxAPIError` When the API returns an error. *** ### get\_or\_none() ```python get_or_none(transcription_id: str) -> Transcription | None ``` Retrieve a transcription by ID. Returns `None` if the transcription does not exist. **Parameters** | Parameter | Type | Description | | --- | --- | --- | | `transcription_id` | `str` | Transcription identifier. | **Returns** `Transcription | None` **Raises** * `SonioxAPIError` When the API returns an error. *** ### delete() ```python delete(transcription_id: str) -> None ``` Delete a transcription by ID. Performs a DELETE request to `/transcriptions/{transcription_id}`. **Parameters** | Parameter | Type | Description | | --- | --- | --- | | `transcription_id` | `str` | Transcription identifier. | **Returns** `None` **Raises** * `SonioxAPIError` When the API returns an error.
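Before the deletion helpers below, it may help to see how the core AsyncSttAPI methods compose. The following is a minimal, non-authoritative sketch: the import path, the API key placeholder, the audio URL, and the `text` attribute on the returned transcript are assumptions, while `transcribe_from_url()`, `wait()`, and `get_transcript()` are documented in this reference.

```python
import asyncio

from soniox import AsyncSonioxClient  # import path assumed


async def main() -> None:
    client = AsyncSonioxClient(api_key="<YOUR_API_KEY>")  # placeholder key
    try:
        # Submit a transcription from a publicly accessible audio URL.
        transcription = await client.stt.transcribe_from_url(
            audio_url="https://example.com/audio.mp3",  # placeholder URL
        )
        # Poll until the transcription leaves the queued/processing state.
        transcription = await client.stt.wait(transcription.id, timeout_sec=300)
        if transcription.status == "completed":
            transcript = await client.stt.get_transcript(transcription.id)
            print(transcript.text)  # `text` attribute assumed on the transcript
    finally:
        await client.aclose()


asyncio.run(main())
```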
*** ### delete\_if\_exists() ```python delete_if_exists(transcription_id: str) -> None ``` Delete a transcription by ID if it exists. Ignores missing transcriptions. **Parameters** | Parameter | Type | Description | | --- | --- | --- | | `transcription_id` | `str` | Transcription identifier. | **Returns** `None` **Raises** * `SonioxAPIError` When the API returns an error. *** ### destroy() ```python destroy(transcription_id: str) -> None ``` Delete a transcription and its associated uploaded file. **Parameters** | Parameter | Type | Description | | --- | --- | --- | | `transcription_id` | `str` | Transcription identifier. | **Returns** `None` **Raises** * `SonioxAPIError` When the API returns an error. *** ### destroy\_all() ```python destroy_all(limit: int = 100) -> None ``` Delete all transcriptions and their associated files. Stops and raises on the first failed deletion. **Parameters** | Parameter | Type | Description | | --- | --- | --- | | `limit` | `int` | Page size used when listing transcriptions. | **Returns** `None` **Raises** * `SonioxAPIError` When the API returns an error during listing. *** ### get\_transcript() ```python get_transcript(transcription_id: str) -> TranscriptionTranscript ``` Retrieve the transcript for a transcription. Performs a GET request to `/transcriptions/{transcription_id}/transcript`. **Parameters** | Parameter | Type | Description | | --- | --- | --- | | `transcription_id` | `str` | Transcription identifier. | **Returns** `TranscriptionTranscript` **Raises** * `SonioxAPIError` When the API returns an error. *** ### wait() ```python wait(transcription_id: str, *, interval_sec: float = 5.0, timeout_sec: float | None = None) -> Transcription ``` Poll a transcription until it leaves the queued or processing state. **Parameters** | Parameter | Type | Description | | --- | --- | --- | | `transcription_id` | `str` | Transcription identifier. | | `interval_sec` | `float` | Polling interval in seconds. | | `timeout_sec` | `float \| None` | Maximum wait time in seconds. | **Returns** `Transcription` **Raises** * `SonioxAPIError` When the API returns an error. * `TimeoutError` Waiting for the transcription to finish exceeded `timeout_sec`. *** ### transcribe\_from\_url() ```python transcribe_from_url(*, model: str = DEFAULT_MODEL, audio_url: str, client_reference_id: str | None = None, config: CreateTranscriptionConfig | None = None) -> Transcription ``` Create a transcription from an audio URL. **Parameters** | Parameter | Type | Description | | --- | --- | --- | | `model` | `str` | Speech-to-text model to use. | | `audio_url` | `str` | Publicly accessible audio URL. | | `client_reference_id` | `str \| None` | Optional tracking identifier. | | `config` | `CreateTranscriptionConfig \| None` | Configuration options for this operation. | **Returns** `Transcription` **Raises** * `SonioxAPIError` When the API returns an error. *** ### transcribe\_from\_file\_id() ```python transcribe_from_file_id(*, model: str = DEFAULT_MODEL, file_id: str, client_reference_id: str | None = None, config: CreateTranscriptionConfig | None = None) -> Transcription ``` Create a transcription from an existing uploaded file.
**Parameters** | Parameter | Type | Description | | --------------------- | ----------------------------------- | ----------------------------------------- | | `model` | `str` | Speech-to-text model to use. | | `file_id` | `str` | ID of a previously uploaded file. | | `client_reference_id` | `str \| None` | Optional tracking identifier. | | `config` | `CreateTranscriptionConfig \| None` | Configuration options for this operation. | **Returns** `Transcription` **Raises** * `SonioxAPIError` When the API returns an error. *** ### transcribe\_from\_file() ```python transcribe_from_file(*, model: str = DEFAULT_MODEL, file: BinaryIO | bytes | Path | str, filename: str | None = None, client_reference_id: str | None = None, config: CreateTranscriptionConfig | None = None) -> Transcription ``` Upload a file and create a transcription from it. **Parameters** | Parameter | Type | Description | | --------------------- | ----------------------------------- | -------------------------------------------- | | `model` | `str` | Speech-to-text model to use. | | `file` | `BinaryIO \| bytes \| Path \| str` | File input to upload or transcribe. | | `filename` | `str \| None` | Filename associated with uploaded file data. | | `client_reference_id` | `str \| None` | Optional tracking identifier. | | `config` | `CreateTranscriptionConfig \| None` | Configuration options for this operation. | **Returns** `Transcription` **Raises** * `SonioxAPIError` When the API returns an error. *** ### transcribe() ```python transcribe(*, model: str = DEFAULT_MODEL, audio_url: str | None = None, file_id: str | None = None, file: BinaryIO | bytes | Path | str | None = None, filename: str | None = None, client_reference_id: str | None = None, config: CreateTranscriptionConfig | None = None) -> Transcription ``` Create a transcription from a file, file ID, or audio URL. Validates mutually exclusive inputs before submission. **Parameters** | Parameter | Type | Description | | --------------------- | ------------------------------------------ | -------------------------------------------- | | `model` | `str` | Speech-to-text model to use. | | `audio_url` | `str \| None` | Publicly accessible audio URL. | | `file_id` | `str \| None` | ID of a previously uploaded file. | | `file` | `BinaryIO \| bytes \| Path \| str \| None` | File input to upload or transcribe. | | `filename` | `str \| None` | Filename associated with uploaded file data. | | `client_reference_id` | `str \| None` | Optional tracking identifier. | | `config` | `CreateTranscriptionConfig \| None` | Configuration options for this operation. | **Returns** `Transcription` **Raises** * `SonioxAPIError` When the API returns an error. * `SonioxValidationError` When the payload fails validation. *** ### transcribe\_file\_with\_webhook() ```python transcribe_file_with_webhook(*, model: str = DEFAULT_MODEL, file: BinaryIO | bytes | Path | str, webhook_url: str, filename: str | None = None, client_reference_id: str | None = None, webhook_auth: WebhookAuthConfig | None = None, config: CreateTranscriptionConfig | None = None) -> Transcription ``` Upload a file, configure a webhook, and start transcription. **Parameters** | Parameter | Type | Description | | --------------------- | ----------------------------------- | -------------------------------------------- | | `model` | `str` | Speech-to-text model to use. | | `file` | `BinaryIO \| bytes \| Path \| str` | File input to upload or transcribe. | | `webhook_url` | `str` | URL to receive webhook notifications. 
| | `filename` | `str \| None` | Filename associated with uploaded file data. | | `client_reference_id` | `str \| None` | Optional tracking identifier. | | `webhook_auth` | `WebhookAuthConfig \| None` | Webhook authentication configuration. | | `config` | `CreateTranscriptionConfig \| None` | Configuration options for this operation. | **Returns** `Transcription` **Raises** * `SonioxAPIError` When the API returns an error. *** ### transcribe\_and\_wait() ```python transcribe_and_wait(*, model: str = DEFAULT_MODEL, audio_url: str | None = None, file_id: str | None = None, file: BinaryIO | bytes | Path | str | None = None, filename: str | None = None, client_reference_id: str | None = None, delete_after: bool = False, wait_interval_sec: float = 5.0, wait_timeout_sec: float | None = None, config: CreateTranscriptionConfig | None = None) -> Transcription ``` Create a transcription and wait for completion. Returns the Transcription object once it has finished. Optionally deletes the transcription and the uploaded file after completion. **Parameters** | Parameter | Type | Description | | --- | --- | --- | | `model` | `str` | Speech-to-text model to use. | | `audio_url` | `str \| None` | Publicly accessible audio URL. | | `file_id` | `str \| None` | ID of a previously uploaded file. | | `file` | `BinaryIO \| bytes \| Path \| str \| None` | File input to upload or transcribe. | | `filename` | `str \| None` | Filename associated with uploaded file data. | | `client_reference_id` | `str \| None` | Optional tracking identifier. | | `delete_after` | `bool` | Whether to delete created resources after completion. | | `wait_interval_sec` | `float` | Polling interval in seconds while waiting. | | `wait_timeout_sec` | `float \| None` | Maximum wait time in seconds while polling. | | `config` | `CreateTranscriptionConfig \| None` | Configuration options for this operation. | **Returns** `Transcription` **Raises** * `SonioxAPIError` When the API returns an error. * `SonioxValidationError` When the payload fails validation. * `TimeoutError` Waiting for the transcription to finish exceeded `wait_timeout_sec`. *** ### transcribe\_and\_wait\_with\_tokens() ```python transcribe_and_wait_with_tokens(*, model: str = DEFAULT_MODEL, audio_url: str | None = None, file_id: str | None = None, file: BinaryIO | bytes | Path | str | None = None, filename: str | None = None, client_reference_id: str | None = None, delete_after: bool = False, wait_interval_sec: float = 5.0, wait_timeout_sec: float | None = None, config: CreateTranscriptionConfig | None = None) -> TranscriptionTranscript ``` Create a transcription, wait for completion, and return the transcript. Optionally deletes the transcription and uploaded file after completion. **Parameters** | Parameter | Type | Description | | --- | --- | --- | | `model` | `str` | Speech-to-text model to use. | | `audio_url` | `str \| None` | Publicly accessible audio URL. | | `file_id` | `str \| None` | ID of a previously uploaded file. | | `file` | `BinaryIO \| bytes \| Path \| str \| None` | File input to upload or transcribe. | | `filename` | `str \| None` | Filename associated with uploaded file data. | | `client_reference_id` | `str \| None` | Optional tracking identifier. | | `delete_after` | `bool` | Whether to delete created resources after completion. |
| `wait_interval_sec` | `float` | Polling interval in seconds while waiting. | | `wait_timeout_sec` | `float \| None` | Maximum wait time in seconds while polling. | | `config` | `CreateTranscriptionConfig \| None` | Configuration options for this operation. | **Returns** `TranscriptionTranscript` **Raises** * `SonioxAPIError` When the API returns an error. * `SonioxValidationError` When the payload fails validation. * `TimeoutError` Waiting for the transcription to finish exceeded `wait_timeout_sec`. *** ## AsyncModelsAPI ### Constructor ```python AsyncModelsAPI(client: AsyncSonioxClient) ``` **Parameters** | Parameter | Type | Description | | --- | --- | --- | | `client` | `AsyncSonioxClient` | Soniox client instance. | **Returns** `None` ### list() ```python list() -> GetModelsResponse ``` List available models. Performs a GET request to `/models`. **Returns** `GetModelsResponse` **Raises** * `SonioxAPIError` When the API returns an error. *** ## AsyncAuthAPI ### Constructor ```python AsyncAuthAPI(client: AsyncSonioxClient) ``` **Parameters** | Parameter | Type | Description | | --- | --- | --- | | `client` | `AsyncSonioxClient` | Soniox client instance. | **Returns** `None` ### create\_temporary\_api\_key() ```python create_temporary_api_key(*, usage_type: TemporaryApiKeyUsageType = 'transcribe_websocket', expires_in_seconds: int = 5 * 60, client_reference_id: str | None = None) -> CreateTemporaryApiKeyResponse ``` Create a temporary API key. Performs a POST request to `/auth/temporary-api-key`. **Parameters** | Parameter | Type | Description | | --- | --- | --- | | `usage_type` | `TemporaryApiKeyUsageType` | Intended usage of the temporary API key. | | `expires_in_seconds` | `int` | Duration in seconds until the temporary API key expires. | | `client_reference_id` | `str \| None` | Optional tracking identifier string. Does not need to be unique. | **Returns** `CreateTemporaryApiKeyResponse` **Raises** * `SonioxAPIError` When the API returns an error. *** ## AsyncSonioxWebhooksAPI # Realtime Client URL: /stt/SDKs/python-SDK/Full-SDK-reference/realtime_client Soniox Python SDK - Realtime Client Reference *** ## RealtimeAPI Entrypoint for realtime helpers on SonioxClient. ### Constructor ```python RealtimeAPI(client: SonioxClient) ``` **Parameters** | Parameter | Type | Description | | --- | --- | --- | | `client` | `SonioxClient` | Soniox client instance. | **Returns** `None` ### Properties | Property | Type | Description | | --- | --- | --- | | `stt` | `RealtimeSTTClient` | Speech-to-text API namespace. | *** ## AsyncRealtimeAPI Entrypoint for async realtime helpers on AsyncSonioxClient. ### Constructor ```python AsyncRealtimeAPI(client: AsyncSonioxClient) ``` **Parameters** | Parameter | Type | Description | | --- | --- | --- | | `client` | `AsyncSonioxClient` | Soniox client instance. | **Returns** `None` ### Properties | Property | Type | Description | | --- | --- | --- | | `stt` | `AsyncRealtimeSTTClient` | Speech-to-text API namespace. | *** ## RealtimeSTTClient Factory for creating synchronous realtime speech-to-text sessions.
This class validates credentials and prepares session configuration, but does not itself manage WebSocket connections. ### Constructor ```python RealtimeSTTClient(client: SonioxClient) ``` Create a realtime STT client bound to an existing API client. **Parameters** | Parameter | Type | Description | | --------- | -------------- | ------------------------------------------------------------- | | `client` | `SonioxClient` | Parent Soniox client providing configuration and credentials. | **Returns** `None` ### connect() ```python connect(*, config: RealtimeSTTConfig, api_key: str | None = None) -> RealtimeSTTSession ``` Create a new realtime STT session. The returned session is not connected until entered as a context manager. **Parameters** | Parameter | Type | Description | | --------- | ------------------- | --------------------------------------------------------------------------------- | | `config` | `RealtimeSTTConfig` | Realtime transcription configuration. | | `api_key` | `str \| None` | Optional API key override. If not provided, the client's default API key is used. | **Returns** `RealtimeSTTSession` A new RealtimeSTTSession instance. **Raises** * `SonioxValidationError` If no API key is available. *** ## RealtimeSTTSession Synchronous WebSocket session for a single real-time speech-to-text stream. This class manages the full lifecycle of a real-time transcription session: connecting to the WebSocket endpoint, streaming audio data, receiving events, and gracefully closing the stream. A session is stateful and represents exactly one streaming interaction with the Soniox realtime API. Instances are designed to be used as context managers. ### Constructor ```python RealtimeSTTSession(url: str, config: RealtimeSTTConfig) ``` Create a new realtime STT session. This does not open a network connection. The WebSocket connection is established when entering the context manager. **Parameters** | Parameter | Type | Description | | --------- | ------------------- | -------------------------------------------------------------------------------------- | | `url` | `str` | WebSocket URL for the realtime transcription endpoint. | | `config` | `RealtimeSTTConfig` | Configuration describing the audio format and transcription behavior for this session. | **Returns** `None` ### Properties | Property | Type | Description | | -------------- | ----------------------- | --------------------------------------------------------- | | `config` | `RealtimeSTTConfig` | Return the configuration used to initialize this session. | | `paused` | `bool` | Return True if the session is currently paused. | | `last_message` | `RealtimeEvent \| None` | Return the most recently received realtime event, if any. | ### close() ```python close() -> None ``` Gracefully close the realtime session. Sends a final empty message to signal end-of-stream, then closes the WebSocket connection. Calling this method multiple times is safe. **Returns** `None` *** ### send\_byte\_chunk() ```python send_byte_chunk(chunk: bytes) -> None ``` Send a single chunk of raw audio bytes to the realtime stream. The audio data must match the format declared in the session configuration (sample rate, channels, encoding). **Parameters** | Parameter | Type | Description | | --------- | ------- | ------------------------ | | `chunk` | `bytes` | Raw audio bytes to send. | **Returns** `None` **Raises** * `SonioxRealtimeError` If the session is not connected or the send operation fails. 
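To make the streaming flow concrete, here is a minimal sketch of a synchronous session. It is not authoritative: the import paths, model name, and `RealtimeSTTConfig` field names (mirroring the realtime WebSocket configuration) are assumptions; `send_byte_chunk()` above and `finish()` and `receive_events()` below are the documented calls.

```python
from soniox import SonioxClient  # import path assumed
from soniox.types import RealtimeSTTConfig  # import path assumed

client = SonioxClient(api_key="<YOUR_API_KEY>")  # placeholder key

# Raw PCM input must declare its sample rate and channel count up front;
# the field names below follow the realtime WebSocket config and are assumptions.
config = RealtimeSTTConfig(
    model="stt-rt-preview",    # placeholder model name
    audio_format="pcm_s16le",  # example raw format
    sample_rate=16000,
    num_channels=1,
)

# The session connects only when entered as a context manager.
with client.realtime.stt.connect(config=config) as session:
    with open("audio.raw", "rb") as f:
        # ~100 ms of 16 kHz, 16-bit mono audio per chunk.
        while chunk := f.read(3200):
            session.send_byte_chunk(chunk)
    session.finish()  # signal end of audio
    for event in session.receive_events():  # stops when the server closes
        print(event)
```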
*** ### send\_bytes() ```python send_bytes(chunks: bytes | Iterator[bytes], *, finish: bool = True) -> None ``` Send audio data to the realtime stream. This method accepts either a single bytes object or an iterator yielding audio chunks. When an iterator is provided, a FINISH control message is sent automatically after all chunks have been transmitted. **Parameters** | Parameter | Type | Description | | --------- | -------------------------- | ---------------------------------------------------------- | | `chunks` | `bytes \| Iterator[bytes]` | Audio data as raw bytes or an iterator of byte chunks. | | `finish` | `bool` | Whether to send a finish signal after streaming completes. | **Returns** `None` *** ### send\_control\_message() ```python send_control_message(control_type: RealtimeControlType) -> None ``` Send a control message to the realtime session. Control messages modify the state of the stream, such as signaling end-of-audio or requesting finalization. **Parameters** | Parameter | Type | Description | | -------------- | --------------------- | ------------------------------------ | | `control_type` | `RealtimeControlType` | The type of control message to send. | **Returns** `None` **Raises** * `SonioxRealtimeError` If the session is not connected or the message cannot be sent. *** ### finish() ```python finish() -> None ``` Signal that no more audio will be sent for this session. **Returns** `None` *** ### keep\_alive() ```python keep_alive() -> None ``` Send a keep-alive message to prevent the session from timing out. **Returns** `None` *** ### finalize() ```python finalize() -> None ``` Finalize all outstanding non-final tokens while keeping the session open. Subsequent tokens will be delivered with `is_final=True`. **Returns** `None` *** ### recv\_bytes() ```python recv_bytes() -> bytes ``` Receive a raw message from the WebSocket connection. **Returns** `bytes` The received message as bytes. An empty bytes object indicates that the connection has been closed. *** ### parse\_event() ```python parse_event(raw: str | bytes) -> RealtimeEvent ``` Parse a raw WebSocket message into a structured realtime event. **Parameters** | Parameter | Type | Description | | --------- | -------------- | --------------------------------------------- | | `raw` | `str \| bytes` | Raw message payload received from the server. | **Returns** `RealtimeEvent` A validated RealtimeEvent instance. *** ### receive\_event() ```python receive_event() -> RealtimeEvent | None ``` Receive and parse the next realtime event from the server. **Returns** `RealtimeEvent | None` The next RealtimeEvent, or None if the connection has closed. **Raises** * `SonioxRealtimeError` If the session is not connected. *** ### receive\_events() ```python receive_events() -> Iterator[RealtimeEvent] ``` Yield realtime events as they are received from the server. Iteration stops automatically when the connection is closed. **Returns** `Iterator[RealtimeEvent]` *** ### handle\_events() ```python handle_events(handler: Callable[[RealtimeEvent], None]) -> None ``` Receive realtime events and dispatch them to a handler callback. **Parameters** | Parameter | Type | Description | | --------- | --------------------------------- | ------------------------------------------------- | | `handler` | `Callable[[RealtimeEvent], None]` | Callable invoked for each received RealtimeEvent. | **Returns** `None` *** ### pause() ```python pause() -> None ``` Pause the session, suppressing outgoing audio and starting a background keepalive thread. 
While paused, calls to `send_byte_chunk()` are silently dropped. A background thread sends a keepalive message every `KEEP_ALIVE_INTERVAL_SEC` seconds to prevent the server from timing out the session. Calling `pause` on an already-paused session is a no-op. **Returns** `None` **Raises** * `SonioxRealtimeError` If the session is not connected. *** ### resume() ```python resume() -> None ``` Resume a paused session, stopping the keepalive thread and allowing audio to be sent again. Calling `resume` on a session that is not paused is a no-op. **Returns** `None` **Raises** * `SonioxRealtimeError` If the session is not connected. *** ## AsyncRealtimeSTTClient Factory for creating asynchronous realtime speech-to-text sessions. This class validates credentials and prepares session configuration, but does not itself manage WebSocket connections. ### Constructor ```python AsyncRealtimeSTTClient(client: AsyncSonioxClient) ``` Create a realtime STT client bound to an existing API client. **Parameters** | Parameter | Type | Description | | --- | --- | --- | | `client` | `AsyncSonioxClient` | Parent Soniox client providing configuration and credentials. | **Returns** `None` ### connect() ```python connect(*, config: RealtimeSTTConfig, api_key: str | None = None) -> AsyncRealtimeSTTSession ``` Create a new realtime STT session. The returned session is not connected until entered as an async context manager. **Parameters** | Parameter | Type | Description | | --- | --- | --- | | `config` | `RealtimeSTTConfig` | Realtime transcription configuration. | | `api_key` | `str \| None` | Optional API key override. If not provided, the client's default API key is used. | **Returns** `AsyncRealtimeSTTSession` A new AsyncRealtimeSTTSession instance. **Raises** * `SonioxValidationError` If no API key is available. *** ## AsyncRealtimeSTTSession Asynchronous WebSocket session for a single real-time speech-to-text stream. This class manages the full lifecycle of a real-time transcription session: connecting to the WebSocket endpoint, streaming audio data, receiving events, and gracefully closing the stream. A session is stateful and represents exactly one streaming interaction with the Soniox realtime API. Instances are designed to be used as async context managers. ### Constructor ```python AsyncRealtimeSTTSession(url: str, config: RealtimeSTTConfig) ``` Create a new realtime STT session. This does not open a network connection. The WebSocket connection is established when entering the async context manager. **Parameters** | Parameter | Type | Description | | --- | --- | --- | | `url` | `str` | WebSocket URL for the realtime transcription endpoint. | | `config` | `RealtimeSTTConfig` | Configuration describing the audio format and transcription behavior for this session. | **Returns** `None` ### Properties | Property | Type | Description | | --- | --- | --- | | `config` | `RealtimeSTTConfig` | Return the configuration used to initialize this session. | | `paused` | `bool` | Return True if the session is currently paused. | | `last_message` | `RealtimeEvent \| None` | Return the most recently received realtime event, if any.
### close() ```python close() -> None ``` Gracefully close the realtime session. Sends a final empty message to signal end-of-stream, then closes the WebSocket connection. Calling this method multiple times is safe. **Returns** `None` *** ### send\_byte\_chunk() ```python send_byte_chunk(chunk: bytes) -> None ``` Send a single chunk of raw audio bytes to the realtime stream. The audio data must match the format declared in the session configuration (sample rate, channels, encoding). **Parameters** | Parameter | Type | Description | | --------- | ------- | ------------------------ | | `chunk` | `bytes` | Raw audio bytes to send. | **Returns** `None` **Raises** * `SonioxRealtimeError` If the session is not connected or the send operation fails. *** ### send\_bytes() ```python send_bytes(chunks: bytes | AsyncIterator[bytes], *, finish: bool = True) -> None ``` Send audio data to the realtime stream. This method accepts either a single bytes object or an async iterator yielding audio chunks. When an iterator is provided, a FINISH control message is sent automatically after all chunks have been transmitted. **Parameters** | Parameter | Type | Description | | --------- | ------------------------------- | ------------------------------------------------------------ | | `chunks` | `bytes \| AsyncIterator[bytes]` | Audio data as raw bytes or an async iterator of byte chunks. | | `finish` | `bool` | Whether to send a finish signal after streaming completes. | **Returns** `None` *** ### send\_control\_message() ```python send_control_message(control_type: RealtimeControlType) -> None ``` Send a control message to the realtime session. Control messages modify the state of the stream, such as signaling end-of-audio or requesting finalization. **Parameters** | Parameter | Type | Description | | -------------- | --------------------- | ------------------------------------ | | `control_type` | `RealtimeControlType` | The type of control message to send. | **Returns** `None` **Raises** * `SonioxRealtimeError` If the session is not connected or the message cannot be sent. *** ### finish() ```python finish() -> None ``` Signal that no more audio will be sent for this session. **Returns** `None` *** ### keep\_alive() ```python keep_alive() -> None ``` Send a keep-alive message to prevent the session from timing out. **Returns** `None` *** ### finalize() ```python finalize() -> None ``` Finalize all outstanding non-final tokens while keeping the session open. Subsequent tokens will be delivered with `is_final=True`. **Returns** `None` *** ### recv\_bytes() ```python recv_bytes() -> bytes ``` Receive a raw message from the WebSocket connection. **Returns** `bytes` The received message as bytes. An empty bytes object indicates that the connection has been closed. *** ### parse\_event() ```python parse_event(raw: str | bytes) -> RealtimeEvent ``` Parse a raw WebSocket message into a structured realtime event. **Parameters** | Parameter | Type | Description | | --------- | -------------- | --------------------------------------------- | | `raw` | `str \| bytes` | Raw message payload received from the server. | **Returns** `RealtimeEvent` A validated RealtimeEvent instance. *** ### receive\_event() ```python receive_event() -> RealtimeEvent | None ``` Receive and parse the next realtime event from the server. **Returns** `RealtimeEvent | None` The next RealtimeEvent, or None if the connection has closed. **Raises** * `SonioxRealtimeError` If the session is not connected.
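If you need finer control than the iterator described next, the same loop can be written against `receive_event()` directly. A minimal sketch, assuming a connected `session` as in the example above:

```python
# Pull events one at a time; None means the connection has closed.
while True:
    event = await session.receive_event()
    if event is None:
        break
    if event.error_code is not None:
        # Error details are carried on the event itself.
        raise RuntimeError(f"{event.error_code}: {event.error_message}")
    for token in event.tokens:
        print(token.text)
    if event.finished:
        # The server has acknowledged the end of the session.
        break
```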
*** ### receive\_events() ```python receive_events() -> AsyncIterator[RealtimeEvent] ``` Yield realtime events as they are received from the server. Iteration stops automatically when the connection is closed. **Returns** `AsyncIterator[RealtimeEvent]` *** ### handle\_events() ```python handle_events(handler: Callable[[RealtimeEvent], Awaitable[None]]) -> None ``` Receive realtime events and dispatch them to a handler callback. **Parameters** | Parameter | Type | Description | | --------- | -------------------------------------------- | ------------------------------------------------- | | `handler` | `Callable[[RealtimeEvent], Awaitable[None]]` | Callable invoked for each received RealtimeEvent. | **Returns** `None` *** ### pause() ```python pause() -> None ``` Pause the session, suppressing outgoing audio and starting a background keepalive task. While paused, calls to `send_byte_chunk()` are silently dropped. A background task sends a keepalive message every `KEEP_ALIVE_INTERVAL_SEC` seconds to prevent the server from timing out the session. Calling `pause` on an already-paused session is a no-op. **Returns** `None` **Raises** * `SonioxRealtimeError` If the session is not connected. *** ### resume() ```python resume() -> None ``` Resume a paused session, stopping the keepalive task and allowing audio to be sent again. Calling `resume` on a session that is not paused is a no-op. **Returns** `None` **Raises** * `SonioxRealtimeError` If the session is not connected. # Types URL: /stt/SDKs/python-SDK/Full-SDK-reference/types Soniox Python SDK - Types Reference *** ## Token Token metadata emitted during realtime streaming transcriptions. ### Properties | Property | Type | Description | | -------------------- | --------------- | ------------------------------------------------------------ | | `text` | `str` | The transcribed text. | | `start_ms` | `int \| None` | Start time in milliseconds relative to audio start. | | `end_ms` | `int \| None` | End time in milliseconds relative to audio start. | | `confidence` | `float \| None` | Confidence score (0.0 to 1.0). | | `is_final` | `bool \| None` | Whether this is a finalized token. | | `speaker` | `str \| None` | Speaker identifier (if diarization enabled). | | `translation_status` | `str \| None` | Translation status of this token. | | `language` | `str \| None` | Detected language code (if language identification enabled). | | `source_language` | `str \| None` | Source language for translated tokens. | *** ## ApiError Structured representation of a non-2xx API response payload. ### Properties | Property | Type | Description | | ------------------- | ------------------------------- | ------------------------------------------------------------------------------------------ | | `status_code` | `int` | HTTP status code. | | `error_type` | `str` | High-level error code (e.g., 'bad\_request', 'quota\_exceeded') for programmatic handling. | | `message` | `str` | Detailed error message describing the failure. | | `validation_errors` | `list[ApiErrorValidationError]` | List of specific field validation failures, if applicable. | | `request_id` | `str \| None` | Unique identifier for the request, useful for troubleshooting. | *** ## ApiErrorValidationError Details a single validation error reported by the Soniox API. ### Properties | Property | Type | Description | | ------------ | ----- | -------------------------------------------------------- | | `error_type` | `str` | The category of validation error.
| | `location` | `str` | The location of the error, e.g. \['body', 'audio\_url']. | | `message` | `str` | A human-readable description of the validation failure. | *** ## CreateTemporaryApiKeyPayload Payload for requesting a temporary API key (e.g., websocket). ### Properties | Property | Type | Description | | --------------------- | -------------------------- | --------------------------------------------------------------- | | `usage_type` | `TemporaryApiKeyUsageType` | Intended usage of the temporary API key. | | `expires_in_seconds` | `int` | Duration in seconds until the temporary API key expires | | `client_reference_id` | `str \| None` | Optional tracking identifier string. Does not need to be unique | *** ## CreateTemporaryApiKeyResponse Response data for a temp API key request. ### Properties | Property | Type | Description | | ------------ | ---------- | --------------------------------------------------------------------- | | `api_key` | `str` | Created temporary API key. | | `expires_at` | `datetime` | UTC timestamp indicating when generated temporary API key will expire | *** ## CreateTranscriptionPayload Payload sent to create an asynchronous transcription job. ### Properties | Property | Type | Description | | -------------------------------- | --------------------------- | ------------------------------------------------------------------------------------------------- | | `model` | `str` | Speech-to-text model to use. | | `audio_url` | `str \| None` | URL of a publicly accessible audio file. | | `file_id` | `str \| None` | ID of a previously uploaded file (UUID). | | `language_hints` | `list[str] \| None` | Array of expected ISO language codes to bias recognition. | | `language_hints_strict` | `bool \| None` | When true, model relies more heavily on language hints (best results with one language hint set). | | `enable_speaker_diarization` | `bool \| None` | Enable speaker diarization to identify different speakers. | | `enable_language_identification` | `bool \| None` | Enable automatic language identification. | | `translation` | `TranslationConfig \| None` | Translation configuration. | | `context` | `StructuredContext \| None` | Additional context to improve transcription accuracy and formatting of specialized terms. | | `webhook_url` | `str \| None` | URL to receive webhook notifications when transcription is completed or fails. | | `webhook_auth_header_name` | `str \| None` | Name of the authentication header sent with webhook notifications | | `webhook_auth_header_value` | `str \| None` | Authentication header value sent with webhook notifications. | | `client_reference_id` | `str \| None` | Optional tracking identifier. | *** ## CreateTranscriptionConfig Helper config used when building transcription payloads. ### Properties | Property | Type | Description | | -------------------------------- | --------------------------- | ----------------------------------------------------------------------------------------- | | `model` | `str \| None` | Speech-to-text model to use. | | `language_hints` | `list[str] \| None` | Array of expected ISO language codes to bias recognition. | | `language_hints_strict` | `bool \| None` | When true, model relies more heavily on language hints. | | `enable_speaker_diarization` | `bool \| None` | Enable speaker diarization to identify different speakers. 
| | `enable_language_identification` | `bool \| None` | Enable automatic language identification | | `translation` | `TranslationConfig \| None` | Translation configuration | | `context` | `StructuredContext \| None` | Additional context to improve transcription accuracy and formatting of specialized terms. | | `webhook_url` | `str \| None` | URL to receive webhook notifications when transcription is completed or fails. | | `webhook_auth_header_name` | `str \| None` | Name of the authentication header sent with webhook notifications | | `webhook_auth_header_value` | `str \| None` | Authentication header value sent with webhook notifications | | `client_reference_id` | `str \| None` | Optional tracking identifier | *** ## File Metadata describing an uploaded file in the Soniox API. ### Properties | Property | Type | Description | | --------------------- | ------------- | ---------------------------------------------------- | | `id` | `str` | Unique identifier of the file (UUID). | | `filename` | `str` | Name of the file. | | `size` | `int` | Size of the file in bytes. | | `created_at` | `datetime` | UTC timestamp indicating when the file was uploaded. | | `client_reference_id` | `str \| None` | Optional tracking identifier string. | *** ## GetFilesPayload Parameters accepted by the file listing endpoint. ### Properties | Property | Type | Description | | -------- | ------------- | ----------------------------------------------- | | `limit` | `int` | Maximum number of files to return. | | `cursor` | `str \| None` | Pagination cursor for the next page of results. | *** ## GetFilesResponse Paginated response returned when listing uploaded files. ### Properties | Property | Type | Description | | ------------------ | ------------- | ------------------------------------------------------------------------------------------------------------ | | `files` | `list[File]` | List of uploaded files. | | `next_page_cursor` | `str \| None` | A pagination token that references the next page of results. When None, no additional results are available. | *** ## GetModelsResponse Response returned when listing available models. ### Properties | Property | Type | Description | | -------- | ------------- | ----------------------------- | | `models` | `list[Model]` | List of all available models. | *** ## GetTranscriptionsPayload Parameters for listing transcription jobs. ### Properties | Property | Type | Description | | -------- | ------------- | ----------------------------------------------- | | `limit` | `int` | Maximum number of transcriptions to return. | | `cursor` | `str \| None` | Pagination cursor for the next page of results. | *** ## GetTranscriptionsResponse Paginated response for transcription listings. ### Properties | Property | Type | Description | | ------------------ | --------------------- | ------------------------------------------------------------------------------------------------------------ | | `transcriptions` | `list[Transcription]` | List of transcriptions. | | `next_page_cursor` | `str \| None` | A pagination token that references the next page of results. When None, no additional results are available. | *** ## Model Describes a Soniox transcription model. ### Properties | Property | Type | Description | | -------------------------------- | ------------------------- | ------------------------------------------------------------------------------------------------------- | | `id` | `str` | Unique identifier of the model. 
| | `aliased_model_id` | `str \| None` | If this is an alias, the id of the aliased model. None for non-alias models. | | `name` | `str` | Name of the model. | | `context_version` | `int \| None` | Version of context supported. | | `transcription_mode` | `TranscriptionMode` | Transcription mode of the model. | | `languages` | `list[Language]` | List of languages supported by the model. | | `supports_language_hints_strict` | `bool` | Whether the model supports the 'language\_hints\_strict' option. | | `translation_targets` | `list[TranslationTarget]` | List of supported one-way translation targets. If the list is empty, check the one\_way\_translation field. | | `two_way_translation_pairs` | `list[str]` | List of supported two-way translation pairs. If the list is empty, check the two\_way\_translation field. | | `one_way_translation` | `str \| None` | When set to 'all\_languages', any language from languages can be used. | | `two_way_translation` | `str \| None` | When set to 'all\_languages', any language pair from languages can be used. | *** ## StructuredContext Optional structured context provided to the transcription engine. ### Properties | Property | Type | Description | | ------------------- | ------------------------------------------------ | --------------------------------------------------------------------------------------------------- | | `general` | `list[StructuredContextGeneralItem] \| None` | Structured key-value pairs describing domain, topic, intent, participant names, etc. | | `text` | `str \| None` | Longer free-form background text, prior interaction history, reference documents, or meeting notes. | | `terms` | `list[str] \| None` | Domain-specific or uncommon words to recognize. | | `translation_terms` | `list[StructuredContextTranslationTerm] \| None` | Custom translations for ambiguous terms. | *** ## StructuredContextGeneralItem Single general context key/value pair for transcription context. ### Properties | Property | Type | Description | | -------- | ----- | ------------------------------------------------------------------------ | | `key` | `str` | The key describing the context type (e.g., "domain", "topic", "doctor"). | | `value` | `str` | The value for the context key. | *** ## StructuredContextTranslationTerm Defines a translation term mapping used in structured context. ### Properties | Property | Type | Description | | -------- | ----- | ------------------------------------ | | `source` | `str` | The source term to translate. | | `target` | `str` | The target translation for the term. | *** ## Transcription Represents a transcription job tracked by Soniox. ### Properties | Property | Type | Description | | -------------------------------- | --------------------- | -------------------------------------------------------------------------------------------- | | `id` | `str` | Unique identifier of the transcription (UUID). | | `status` | `TranscriptionStatus` | Current status of the transcription. | | `created_at` | `datetime` | UTC timestamp when the transcription was created. | | `model` | `str` | Speech-to-text model used. | | `audio_url` | `str \| None` | URL of the audio file being transcribed. | | `file_id` | `str \| None` | ID of the uploaded file being transcribed (UUID). | | `filename` | `str` | Name of the file being transcribed. | | `language_hints` | `list[str] \| None` | Expected languages in the audio. If not specified, languages are automatically detected.
| | `enable_speaker_diarization` | `bool` | When true, speakers are identified and separated in the transcription output. | | `enable_language_identification` | `bool` | When true, language is detected for each part of the transcription. | | `audio_duration_ms` | `int \| None` | Duration of the audio in milliseconds. Only available after processing begins. | | `error_type` | `str \| None` | Error type if transcription failed. None for successful or in-progress transcriptions. | | `error_message` | `str \| None` | Error message if transcription failed. None for successful or in-progress transcriptions. | | `webhook_url` | `str \| None` | URL to receive webhook notifications when transcription is completed or fails. | | `webhook_auth_header_name` | `str \| None` | Name of the authentication header sent with webhook notifications. | | `webhook_auth_header_value` | `str \| None` | Authentication header value. Always returned masked. | | `webhook_status_code` | `int \| None` | HTTP status code received from your server when webhook was delivered. None if not yet sent. | | `client_reference_id` | `str \| None` | Optional tracking identifier. | *** ## TranscriptionStatus ```python TranscriptionStatus = Literal["queued", "processing", "completed", "error"] ``` Current status of the transcription job. *** ## TranscriptionTranscript Transcript data including the full text and tokens. ### Properties | Property | Type | Description | | -------- | ------------- | ------------------------------------------------------------------------- | | `id` | `str` | Unique identifier of the transcription this transcript belongs to (UUID). | | `text` | `str` | Complete transcribed text content. | | `tokens` | `list[Token]` | List of detailed token information with timestamps and metadata. | *** ## TranslationConfig Configuration describing how translation should be performed. ### Properties | Property | Type | Description | | ----------------- | ----------------- | ------------------------------------------------------------------------- | | `type` | `TranslationType` | Translation type. | | `target_language` | `str \| None` | Target language code for translation (e.g., "fr", "es", "de") (one\_way). | | `language_a` | `str \| None` | First language code (two\_way). | | `language_b` | `str \| None` | Second language code (two\_way). | ### validate\_logic() ```python validate_logic() -> TranslationConfig ``` **Returns** `TranslationConfig` *** ## TranslationTarget Describes translation targets offered by a model. ### Properties | Property | Type | Description | | -------------------------- | ----------- | ------------------------------------------------------------------------- | | `target_language` | `str` | Target language code for translation (e.g., "fr", "es", "de") (one\_way). | | `source_languages` | `list[str]` | List of source language codes. | | `exclude_source_languages` | `list[str]` | Source language codes excluded for this target. | *** ## TranslationType ```python TranslationType = Literal["one_way", "two_way"] ``` Supported translation configuration types. *** ## TemporaryApiKeyUsageType ```python TemporaryApiKeyUsageType = Literal["transcribe_websocket"] ``` Intended usage for temporary API keys. *** ## UploadFilePayload Optional metadata supplied at upload time. ### Properties | Property | Type | Description | | --------------------- | ------------- | --------------------------------------------------------------- | | `client_reference_id` | `str \| None` | Optional tracking identifier string. 
Does not need to be unique | *** ## RealtimeEvent Event payload received from the realtime STT websocket. ### Properties | Property | Type | Description | | --------------------- | ------------- | -------------------------------------------------- | | `tokens` | `list[Token]` | Tokens in this result. | | `final_audio_proc_ms` | `int \| None` | Milliseconds of audio that have been finalized. | | `total_audio_proc_ms` | `int \| None` | Total milliseconds of audio processed. | | `finished` | `bool` | Whether this is the final result (session ending). | | `error_code` | `int \| None` | Error code if the realtime operation failed. | | `error_message` | `str \| None` | Human-readable description of the error. | ### validate\_event() ```python validate_event(raw: str | bytes) -> RealtimeEvent ``` **Parameters** | Parameter | Type | Description | | --------- | -------------- | ---------------------------------------- | | `raw` | `str \| bytes` | Raw event payload from the realtime API. | **Returns** `RealtimeEvent` *** ## RealtimeSTTConfig Configuration for initiating a realtime transcription session. ### Properties | Property | Type | Description | | -------------------------------- | --------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- | | `api_key` | `str \| None` | API key for real-time sessions. | | `model` | `str` | Speech-to-text model to use. | | `audio_format` | `str` | Audio format. Use 'auto' for automatic detection of container formats. | | `num_channels` | `int \| None` | Number of audio channels (required for raw audio formats). | | `sample_rate` | `int \| None` | Sample rate in Hz (required for PCM formats). | | `language_hints` | `list[str] \| None` | Expected languages in the audio (ISO language codes). | | `language_hints_strict` | `bool \| None` | When true, recognition is strongly biased toward language hints (best results when using one language in language\_hints). | | `context` | `StructuredContext \| None` | Additional context to improve transcription accuracy. | | `enable_speaker_diarization` | `bool \| None` | Enable speaker identification. | | `enable_language_identification` | `bool \| None` | Enable automatic language detection. | | `enable_endpoint_detection` | `bool \| None` | Enable endpoint detection for utterance boundaries. | | `max_endpoint_delay_ms` | `int \| None` | Maximum delay between the end of speech and returned endpoint. Allowed values for maximum delay are between 500ms and 3000ms. The default value is 2000ms | | `translation` | `TranslationConfig \| None` | Translation configuration. | | `client_reference_id` | `str \| None` | Optional tracking identifier (max 256 chars). | ### build\_payload() ```python build_payload(api_key: str) -> RealtimeSTTConfig ``` **Parameters** | Parameter | Type | Description | | --------- | ----- | -------------------------------- | | `api_key` | `str` | API key used for authentication. | **Returns** `RealtimeSTTConfig` *** ## Headers ```python Headers = Mapping[str, str] ``` *** ## WebhookAuthConfig Configuration for webhook authentication headers. ### Properties | Property | Type | Description | | -------- | ----- | --------------------------------------------------- | | `name` | `str` | Expected header name (case-insensitive comparison). | | `value` | `str` | Expected header value (exact match). | *** ## WebhookEvent Basic webhook event metadata. 
### Properties | Property | Type | Description | | -------- | ------------------------------- | ---------------------------- | | `id` | `str` | Transcription ID (UUID). | | `status` | `Literal['completed', 'error']` | Transcription result status. | # Full React SDK reference URL: /stt/SDKs/react-SDK/reference Full SDK reference for the React SDK ## Components | Component | Description | | ---------------------------------------------------------------------- | --------------------------------------------------------------------- | | [`SonioxProvider`](/stt/SDKs/react-SDK/reference/types#sonioxprovider) | Provider component for the Soniox client | | [`AudioLevel`](/stt/SDKs/react-SDK/reference/types#audiolevel) | Component to display the audio level (not available for React Native) | ## Hooks | Hook | Description | | ---------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------- | | [`useRecording`](/stt/SDKs/react-SDK/reference/types#userecording) | Main hook for real-time speech-to-text session | | [`useMicrophonePermission`](/stt/SDKs/react-SDK/reference/types#usemicrophonepermission) | Hook for checking microphone permission | | [`useAudioLevel`](/stt/SDKs/react-SDK/reference/types#useaudiolevel) | Hook for real-time audio volume and spectrum data (not available for React Native) | | [`useSoniox`](/stt/SDKs/react-SDK/reference/types#usesoniox) | Hook to access the SonioxClient from context | # Types URL: /stt/SDKs/react-SDK/reference/types Soniox React SDK — Types Reference ## SonioxProviderProps ```ts type SonioxProviderProps = { children: ReactNode; } & SonioxProviderConfigProps | SonioxProviderClientProps; ``` Props for SonioxProvider. Supply either a pre-built `client` instance or configuration props. **Type Declaration** | Name | Type | | ---------- | ----------- | | `children` | `ReactNode` | *** ## UnsupportedReason ```ts type UnsupportedReason = "ssr" | "no-mediadevices" | "no-getusermedia" | "insecure-context"; ``` Reason why the built-in browser `MicrophoneSource` is unavailable: * `'ssr'` — `navigator` is undefined (SSR, React Native, or other non-browser JS runtimes). * `'no-mediadevices'` — `navigator` exists but `navigator.mediaDevices` is missing. * `'no-getusermedia'` — `navigator.mediaDevices` exists but `getUserMedia` is not a function. * `'insecure-context'` — the page is not served over HTTPS. This only reflects whether the **default** `MicrophoneSource` can work. Custom `AudioSource` implementations (e.g. for React Native) bypass this check entirely and can record regardless of this value. *** ## AudioLevelProps **Extends** * [`UseAudioLevelOptions`](types#useaudioleveloptions) **Properties** | Property | Type | Description | | ------------ | ------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `active?` | `boolean` | Whether volume metering is active. When false, resources are released. | | `bands?` | `number` | Number of frequency bands to return. When set, the `bands` array is populated with per-band levels (0-1). Useful for spectrum/equalizer visualizations. | | `children` | (`state`) => `ReactNode` | - | | `fftSize?` | `number` | FFT size for the AnalyserNode. Must be a power of 2. Higher values give more frequency resolution (more bins per band) but update less frequently.
**Default** `256` | | `smoothing?` | `number` | Exponential smoothing factor (0-1). Higher = smoother/slower decay. **Default** `0.85` | *** ## AudioSupportResult **Properties** | Property | Type | | ------------- | ---------------------------------------------- | | `isSupported` | `boolean` | | `reason?` | [`UnsupportedReason`](types#unsupportedreason) | *** ## MicrophonePermissionState **Properties** | Property | Type | Description | | ------------- | ------------------------ | ---------------------------------------------------------------------- | | `canRequest` | `boolean` | Whether the permission can be requested (e.g., via a prompt). | | `check` | () => `Promise`\<`void`> | Check (or re-check) the microphone permission. No-op when unsupported. | | `isDenied` | `boolean` | `status === 'denied'`. | | `isGranted` | `boolean` | `status === 'granted'`. | | `isSupported` | `boolean` | Whether permission checking is available. | | `status` | `MicPermissionStatus` | Current permission status. | *** ## RecordingSnapshot Immutable snapshot of the recording state exposed to React. **Extended by** * [`UseRecordingReturn`](types#userecordingreturn) **Properties** | Property | Type | Description | | --------------- | ---------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `error` | `Error` \| `null` | Latest error, if any. | | `finalText` | `string` | Accumulated finalized text. | | `finalTokens` | readonly `RealtimeToken`\[] | All finalized tokens in chronological order. Useful for rendering per-token metadata (language, speaker, etc.) in the order tokens were spoken. Pair with `partialTokens` for the complete ordered stream. | | `groups` | `Readonly`\<`Record`\<`string`, `TokenGroup`>> | Tokens grouped by the active `groupBy` strategy. Auto-populated when `translation` config is provided: - `one_way` → keys: `"original"`, `"translation"` - `two_way` → keys: language codes (e.g. `"en"`, `"es"`) Empty `{}` when no grouping is active. | | `isActive` | `boolean` | `true` when state is not idle/stopped/canceled/error. | | `isPaused` | `boolean` | `true` when `state === 'paused'`. | | `isRecording` | `boolean` | `true` when `state === 'recording'`. | | `isSourceMuted` | `boolean` | `true` when the audio source is muted externally (e.g. OS-level or hardware mute). | | `partialText` | `string` | Text from current non-final tokens. | | `partialTokens` | readonly `RealtimeToken`\[] | Non-final tokens from the latest result. | | `result` | `RealtimeResult` \| `null` | Latest raw result from the server. | | `segments` | readonly `RealtimeSegment`\[] | Accumulated final segments. | | `state` | `RecordingState` | Current recording lifecycle state. | | `text` | `string` | Full transcript: `finalText + partialText`. | | `tokens` | readonly `RealtimeToken`\[] | Tokens from the latest result message. | | `utterances` | readonly `RealtimeUtterance`\[] | Accumulated utterances (one per endpoint). 
| *** ## UseAudioLevelOptions **Extended by** * [`AudioLevelProps`](types#audiolevelprops) **Properties** | Property | Type | Description | | ------------ | --------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `active?` | `boolean` | Whether volume metering is active. When false, resources are released. | | `bands?` | `number` | Number of frequency bands to return. When set, the `bands` array is populated with per-band levels (0-1). Useful for spectrum/equalizer visualizations. | | `fftSize?` | `number` | FFT size for the AnalyserNode. Must be a power of 2. Higher values give more frequency resolution (more bins per band) but update less frequently. **Default** `256` | | `smoothing?` | `number` | Exponential smoothing factor (0-1). Higher = smoother/slower decay. **Default** `0.85` | *** ## UseAudioLevelReturn **Properties** | Property | Type | Description | | -------- | -------------------- | ------------------------------------------------------------------------------------ | | `bands` | readonly `number`\[] | Per-band frequency levels, each 0-1. Empty array when the `bands` option is not set. | | `volume` | `number` | Current volume level, 0 to 1. Updated every animation frame. | *** ## UseMicrophonePermissionOptions **Properties** | Property | Type | Description | | ------------ | --------- | ---------------------------------------- | | `autoCheck?` | `boolean` | Automatically check permission on mount. | *** ## UseRecordingConfig Configuration for useRecording. Extends the STT session config (model, language\_hints, etc.) with recording-specific and React-specific options. Can be used **with or without** a `SonioxProvider`: * **With Provider:** omit `apiKey` — the client is read from context. * **Without Provider:** pass `apiKey` directly — a client is created internally. **Extends** * `SttSessionConfig` **Properties** | Property | Type | Description | | --------------------------------- | ----------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `apiKey?` | `ApiKeyConfig` | API key — string or async function that fetches a temporary key. Required when not using `SonioxProvider`. | | `audio_format?` | `"auto"` \| `AudioFormat` | Audio format. Use 'auto' for automatic detection of container formats. For raw PCM formats, also set sample\_rate and num\_channels. **Default** `'auto'` | | `buffer_queue_size?` | `number` | Maximum audio chunks to buffer during connection setup. | | `client_reference_id?` | `string` | Optional tracking identifier (max 256 chars). | | `context?` | `TranscriptionContext` | Additional context to improve transcription accuracy. | | `enable_endpoint_detection?` | `boolean` | Enable endpoint detection for utterance boundaries. Useful for voice AI agents. | | `enable_language_identification?` | `boolean` | Enable automatic language detection. | | `enable_speaker_diarization?` | `boolean` | Enable speaker identification.
| | `groupBy?` | `"translation"` \| `"language"` \| `"speaker"` \| (`token`) => `string` | Group tokens by a key for easy splitting (e.g. translation, language, speaker). - `'translation'` — group by `translation_status`: keys `"original"` and `"translation"` - `'language'` — group by token `language` field: keys are language codes - `'speaker'` — group by token `speaker` field: keys are speaker identifiers - `(token) => string` — custom grouping function **Auto-defaults** when `translation` config is provided: - `one_way` → `'translation'` - `two_way` → `'language'` | | `language_hints?` | `string`\[] | Expected languages in the audio (ISO language codes). | | `language_hints_strict?` | `boolean` | When true, recognition is strongly biased toward language hints. Best-effort only, not a hard guarantee. | | `max_endpoint_delay_ms?` | `number` | Maximum delay between the end of speech and returned endpoint. Allowed values for maximum delay are between 500ms and 3000ms. The default value is 2000ms | | `model` | `string` | Speech-to-text model to use. | | `num_channels?` | `number` | Number of audio channels (required for raw audio formats). | | `onConnected?` | () => `void` | Called when the WebSocket connects. | | `onEndpoint?` | () => `void` | Called when an endpoint is detected. | | `onError?` | (`error`) => `void` | Called when an error occurs. | | `onFinished?` | () => `void` | Called when the recording session finishes. | | `onResult?` | (`result`) => `void` | Called on each result from the server. | | `onSourceMuted?` | () => `void` | Called when the audio source is muted externally (e.g. OS-level or hardware mute). | | `onSourceUnmuted?` | () => `void` | Called when the audio source is unmuted after an external mute. | | `onStateChange?` | (`update`) => `void` | Called on each state transition. | | `permissions?` | `PermissionResolver` \| `null` | Permission resolver override (only used when `apiKey` is provided). Pass `null` to explicitly disable. | | `resetOnStart?` | `boolean` | Reset transcript state when `start()` is called. **Default** `true` | | `sample_rate?` | `number` | Sample rate in Hz (required for PCM formats). | | `session_options?` | `SttSessionOptions` | SDK-level session options (signal, etc.). | | `source?` | `AudioSource` | Custom audio source (bypasses default MicrophoneSource). | | `translation?` | `TranslationConfig` | Translation configuration. | | `wsBaseUrl?` | `string` | WebSocket URL override (only used when `apiKey` is provided). | *** ## UseRecordingReturn Immutable snapshot of the recording state exposed to React. **Extends** * [`RecordingSnapshot`](types#recordingsnapshot) **Properties** | Property | Type | Description | | ------------------- | ------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `cancel` | () => `void` | Immediately cancel — does not wait for final results. | | `clearTranscript` | () => `void` | Clear transcript state (finalText, partialText, utterances, segments). | | `error` | `Error` \| `null` | Latest error, if any. | | `finalize` | (`options?`) => `void` | Request the server to finalize current non-final tokens. | | `finalText` | `string` | Accumulated finalized text. | | `finalTokens` | readonly `RealtimeToken`\[] | All finalized tokens in chronological order. 
Useful for rendering per-token metadata (language, speaker, etc.) in the order tokens were spoken. Pair with `partialTokens` for the complete ordered stream. | | `groups` | `Readonly`\<`Record`\<`string`, `TokenGroup`>> | Tokens grouped by the active `groupBy` strategy. Auto-populated when `translation` config is provided: - `one_way` → keys: `"original"`, `"translation"` - `two_way` → keys: language codes (e.g. `"en"`, `"es"`) Empty `{}` when no grouping is active. | | `isActive` | `boolean` | `true` when state is not idle/stopped/canceled/error. | | `isPaused` | `boolean` | `true` when `state === 'paused'`. | | `isRecording` | `boolean` | `true` when `state === 'recording'`. | | `isSourceMuted` | `boolean` | `true` when the audio source is muted externally (e.g. OS-level or hardware mute). | | `isSupported` | `boolean` | Whether the built-in browser `MicrophoneSource` is available. Custom `AudioSource` implementations work regardless of this value. | | `partialText` | `string` | Text from current non-final tokens. | | `partialTokens` | readonly `RealtimeToken`\[] | Non-final tokens from the latest result. | | `pause` | () => `void` | Pause recording — pauses audio capture and activates keepalive. | | `result` | `RealtimeResult` \| `null` | Latest raw result from the server. | | `resume` | () => `void` | Resume recording after pause. | | `segments` | readonly `RealtimeSegment`\[] | Accumulated final segments. | | `start` | () => `void` | Start a new recording. Aborts any in-flight recording first. | | `state` | `RecordingState` | Current recording lifecycle state. | | `stop` | () => `Promise`\<`void`> | Gracefully stop — waits for final results from the server. | | `text` | `string` | Full transcript: `finalText + partialText`. | | `tokens` | readonly `RealtimeToken`\[] | Tokens from the latest result message. | | `unsupportedReason` | [`UnsupportedReason`](types#unsupportedreason) \| `undefined` | Why the built-in `MicrophoneSource` is unavailable, if applicable. Custom `AudioSource` implementations bypass this check entirely. | | `utterances` | readonly `RealtimeUtterance`\[] | Accumulated utterances (one per endpoint). | *** ## AudioLevel() ```ts function AudioLevel(__namedParameters): ReactNode; ``` **Parameters** | Parameter | Type | | ------------------- | ------------------------------------------ | | `__namedParameters` | [`AudioLevelProps`](types#audiolevelprops) | **Returns** `ReactNode` *** ## SonioxProvider() ```ts function SonioxProvider(props): ReactNode; ``` **Parameters** | Parameter | Type | | --------- | -------------------------------------------------- | | `props` | [`SonioxProviderProps`](types#sonioxproviderprops) | **Returns** `ReactNode` *** ## checkAudioSupport() ```ts function checkAudioSupport(): AudioSupportResult; ``` Check whether the current environment supports the built-in browser `MicrophoneSource` (which uses `navigator.mediaDevices.getUserMedia`). This does **not** reflect general recording capability — custom `AudioSource` implementations (e.g. for React Native) bypass this check entirely and can record regardless of the result. 
**Returns** [`AudioSupportResult`](types#audiosupportresult) **Platform** browser *** ## useAudioLevel() ```ts function useAudioLevel(options?): UseAudioLevelReturn; ``` **Parameters** | Parameter | Type | | ---------- | ---------------------------------------------------- | | `options?` | [`UseAudioLevelOptions`](types#useaudioleveloptions) | **Returns** [`UseAudioLevelReturn`](types#useaudiolevelreturn) *** ## useMicrophonePermission() ```ts function useMicrophonePermission(options?): MicrophonePermissionState; ``` **Parameters** | Parameter | Type | | ---------- | ------------------------------------------------------------------------ | | `options?` | [`UseMicrophonePermissionOptions`](types#usemicrophonepermissionoptions) | **Returns** [`MicrophonePermissionState`](types#microphonepermissionstate) *** ## useRecording() ```ts function useRecording(config): UseRecordingReturn; ``` **Parameters** | Parameter | Type | | --------- | ------------------------------------------------ | | `config` | [`UseRecordingConfig`](types#userecordingconfig) | **Returns** [`UseRecordingReturn`](types#userecordingreturn) *** ## useSoniox() ```ts function useSoniox(): SonioxClient; ``` Returns the `SonioxClient` instance provided by the nearest `SonioxProvider` **Returns** `SonioxClient` **Throws** Error if called outside a `SonioxProvider` # Classes URL: /stt/SDKs/web-SDK/reference/classes Soniox Client SDK — Class Reference ## SonioxClient Main entry point for the Soniox client SDK. ### Example ```typescript const client = new SonioxClient({ api_key: async () => { const res = await fetch('/api/get-temporary-key', { method: 'POST' }); return (await res.json()).api_key; }, }); // High-level: record from microphone const recording = client.realtime.record({ model: 'stt-rt-v4' }); recording.on('result', (r) => console.log(r.tokens)); await recording.stop(); // Low-level: direct session access const session = client.realtime.stt({ model: 'stt-rt-v4' }, { api_key: key }); await session.connect(); ``` ### permissions ```ts get permissions(): PermissionResolver | undefined; ``` Permission resolver, if configured. Returns `undefined` if no resolver was provided (SSR-safe). **Example** ```typescript const mic = await client.permissions?.check('microphone'); if (mic?.status === 'denied') { showSettingsMessage(); } ``` **Returns** [`PermissionResolver`](types#permissionresolver) | `undefined` ### Constructor ```ts new SonioxClient(options): SonioxClient; ``` **Parameters** | Parameter | Type | | --------- | -------------------------------------------------- | | `options` | [`SonioxClientOptions`](types#sonioxclientoptions) | **Returns** `SonioxClient` ### Properties | Property | Type | Description | | ----------------- | --------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `realtime` | \{ `record`: (`options`) => [`Recording`](classes#recording); `stt`: (`config`, `options`) => `RealtimeSttSession`; } | Real-time API namespace | | `realtime.record` | (`options`) => [`Recording`](classes#recording) | Start a high-level recording session. Returns synchronously so callers can attach event listeners before any async work (key fetch, mic access, connection) begins. 
| | `realtime.stt` | (`config`, `options`) => `RealtimeSttSession` | Create a low-level STT session | *** ## Recording High-level recording orchestrator. Manages the lifecycle of audio capture and real-time transcription: 1. Starts audio source immediately (buffers chunks) 2. Resolves the API key (from string or async function) 3. Connects to the Soniox WebSocket API 4. Drains buffered audio, then pipes live audio to the session ### Example ```typescript const recording = client.realtime.record({ model: 'stt-rt-v4' }); recording.on('result', (r) => console.log(r.tokens)); recording.on('error', (e) => console.error(e)); // Later: await recording.stop(); ``` ### state ```ts get state(): RecordingState; ``` Current recording state **Returns** [`RecordingState`](types#recordingstate) ### cancel() ```ts cancel(): void; ``` Immediately cancel recording without waiting for final results **Returns** `void` *** ### finalize() ```ts finalize(options?): void; ``` Request the server to finalize current non-final tokens. **Parameters** | Parameter | Type | | ------------------------------ | -------------------------------------- | | `options?` | \{ `trailing_silence_ms?`: `number`; } | | `options.trailing_silence_ms?` | `number` | **Returns** `void` *** ### off() ```ts off(event, handler): this; ``` Remove an event handler **Type Parameters** | Type Parameter | | -------------------------------------------------------------- | | `E` *extends* keyof [`RecordingEvents`](types#recordingevents) | **Parameters** | Parameter | Type | | --------- | ------------------------------------------------ | | `event` | `E` | | `handler` | [`RecordingEvents`](types#recordingevents)\[`E`] | **Returns** `this` *** ### on() ```ts on(event, handler): this; ``` Register an event handler **Type Parameters** | Type Parameter | | -------------------------------------------------------------- | | `E` *extends* keyof [`RecordingEvents`](types#recordingevents) | **Parameters** | Parameter | Type | | --------- | ------------------------------------------------ | | `event` | `E` | | `handler` | [`RecordingEvents`](types#recordingevents)\[`E`] | **Returns** `this` *** ### once() ```ts once(event, handler): this; ``` Register a one-time event handler **Type Parameters** | Type Parameter | | -------------------------------------------------------------- | | `E` *extends* keyof [`RecordingEvents`](types#recordingevents) | **Parameters** | Parameter | Type | | --------- | ------------------------------------------------ | | `event` | `E` | | `handler` | [`RecordingEvents`](types#recordingevents)\[`E`] | **Returns** `this` *** ### pause() ```ts pause(): void; ``` Pause recording. Pauses the audio source (stops microphone capture) and pauses the session (activates automatic keepalive to prevent server disconnect). **Returns** `void` *** ### resume() ```ts resume(): void; ``` Resume recording after pause. Resumes the audio source and session. Audio capture and transmission continue from where they left off. **Returns** `void` *** ### stop() ```ts stop(): Promise<void>; ``` Gracefully stop recording. Stops the audio source and waits for the server to process all buffered audio and return final results. **Returns** `Promise`\<`void`> Promise that resolves when the server acknowledges completion *** ## MicrophoneSource Browser microphone audio source. Uses `navigator.mediaDevices.getUserMedia` to capture audio from the microphone and `MediaRecorder` to encode it into chunks.
### Example ```typescript const source = new MicrophoneSource(); await source.start({ onData: (chunk) => session.sendAudio(chunk), onError: (err) => console.error(err), }); // Later: source.stop(); ``` ### Constructor ```ts new MicrophoneSource(options): MicrophoneSource; ``` **Parameters** | Parameter | Type | | --------- | ---------------------------------------------------------- | | `options` | [`MicrophoneSourceOptions`](types#microphonesourceoptions) | **Returns** `MicrophoneSource` ### pause() ```ts pause(): void; ``` Pause audio capture **Returns** `void` *** ### resume() ```ts resume(): void; ``` Resume audio capture **Returns** `void` *** ### start() ```ts start(handlers): Promise<void>; ``` Request microphone access and start recording **Parameters** | Parameter | Type | | ---------- | -------------------------------------------------- | | `handlers` | [`AudioSourceHandlers`](types#audiosourcehandlers) | **Returns** `Promise`\<`void`> **Throws** AudioUnavailableError if getUserMedia or MediaRecorder is not supported **Throws** AudioPermissionError if microphone access is denied **Throws** AudioDeviceError if no microphone is found *** ### stop() ```ts stop(): void; ``` Stop recording and release all resources **Returns** `void` *** ## BrowserPermissionResolver Browser permission resolver for checking and requesting microphone access. ### Example ```typescript const resolver = new BrowserPermissionResolver(); const mic = await resolver.check('microphone'); if (mic.status === 'prompt') { const result = await resolver.request('microphone'); if (result.status === 'denied') { showDeniedMessage(); } } ``` ### Constructor ```ts new BrowserPermissionResolver(): BrowserPermissionResolver; ``` **Returns** `BrowserPermissionResolver` ### check() ```ts check(permission): Promise<PermissionResult>; ``` Check current microphone permission status without prompting the user. **Parameters** | Parameter | Type | | ------------ | -------------- | | `permission` | `"microphone"` | **Returns** `Promise`\<[`PermissionResult`](types#permissionresult)> *** ### request() ```ts request(permission): Promise<PermissionResult>; ``` Request microphone permission from the user. This may show a browser permission prompt. **Parameters** | Parameter | Type | | ------------ | -------------- | | `permission` | `"microphone"` | **Returns** `Promise`\<[`PermissionResult`](types#permissionresult)> *** ## AudioPermissionError Thrown when microphone access is denied by the user or blocked by the browser. Maps to `getUserMedia` `NotAllowedError` DOMException. ### Extends * `SonioxError` ### toJSON() ```ts toJSON(): Record<string, unknown>; ``` Converts to a plain object for logging/serialization **Returns** `Record`\<`string`, `unknown`> **Inherited from** ```ts SonioxError.toJSON ``` *** ### toString() ```ts toString(): string; ``` Creates a human-readable string representation **Returns** `string` **Inherited from** ```ts SonioxError.toString ``` ### Properties | Property | Type | Description | | ------------ | --------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- | | `cause` | `unknown` | The underlying error that caused this error, if any. | | `code` | \| `SonioxErrorCode` \| `string` & \{ } | Error code describing the type of error. Typed as `string` at the base level to allow subclasses (e.g. HTTP errors) to use their own error code unions.
| | `statusCode` | `number` \| `undefined` | HTTP status code when applicable (e.g., 401 for auth errors, 500 for server errors). | *** ## AudioDeviceError Thrown when no audio input device is found. Maps to `getUserMedia` `NotFoundError` DOMException. ### Extends * `SonioxError` ### toJSON() ```ts toJSON(): Record<string, unknown>; ``` Converts to a plain object for logging/serialization **Returns** `Record`\<`string`, `unknown`> **Inherited from** ```ts SonioxError.toJSON ``` *** ### toString() ```ts toString(): string; ``` Creates a human-readable string representation **Returns** `string` **Inherited from** ```ts SonioxError.toString ``` ### Properties | Property | Type | Description | | ------------ | --------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- | | `cause` | `unknown` | The underlying error that caused this error, if any. | | `code` | \| `SonioxErrorCode` \| `string` & \{ } | Error code describing the type of error. Typed as `string` at the base level to allow subclasses (e.g. HTTP errors) to use their own error code unions. | | `statusCode` | `number` \| `undefined` | HTTP status code when applicable (e.g., 401 for auth errors, 500 for server errors). | *** ## AudioUnavailableError Thrown when audio capture is not supported in the current environment. For example, when `getUserMedia` or `MediaRecorder` is not available. ### Extends * `SonioxError` ### toJSON() ```ts toJSON(): Record<string, unknown>; ``` Converts to a plain object for logging/serialization **Returns** `Record`\<`string`, `unknown`> **Inherited from** ```ts SonioxError.toJSON ``` *** ### toString() ```ts toString(): string; ``` Creates a human-readable string representation **Returns** `string` **Inherited from** ```ts SonioxError.toString ``` ### Properties | Property | Type | Description | | ------------ | --------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- | | `cause` | `unknown` | The underlying error that caused this error, if any. | | `code` | \| `SonioxErrorCode` \| `string` & \{ } | Error code describing the type of error. Typed as `string` at the base level to allow subclasses (e.g. HTTP errors) to use their own error code unions. | | `statusCode` | `number` \| `undefined` | HTTP status code when applicable (e.g., 401 for auth errors, 500 for server errors). |
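These three error classes can be distinguished with `instanceof` when starting a source. A minimal sketch; the import path is an assumption, and typing the chunk as `Blob` assumes the default `MediaRecorder` output:

```typescript
// Import path is an assumption; use your installed Soniox web SDK package.
import {
  MicrophoneSource,
  AudioPermissionError,
  AudioDeviceError,
  AudioUnavailableError,
} from '@soniox/web';

async function startMicrophone(onData: (chunk: Blob) => void) {
  const source = new MicrophoneSource();
  try {
    await source.start({
      onData,
      onError: (error) => console.error('capture error:', error),
    });
    return source;
  } catch (error) {
    // start() maps getUserMedia DOMExceptions to these typed errors.
    if (error instanceof AudioPermissionError) {
      console.warn('Microphone access was denied.');
    } else if (error instanceof AudioDeviceError) {
      console.warn('No microphone was found.');
    } else if (error instanceof AudioUnavailableError) {
      console.warn('Audio capture is not supported in this environment.');
    }
    return null;
  }
}
```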
# Full Web SDK reference URL: /stt/SDKs/web-SDK/reference Full SDK reference for the Web SDK ## Client ### Available client methods | Method | Description | | --------------------------------------------------------------------------- | --------------------------- | | [`client.realtime.record()`](/stt/SDKs/web-SDK/reference/classes#recording) | Create a recording instance | ## Recording ### Available recording methods | Method | Description | | ---------------------------------------------------------------------- | ------------------------------------------------------- | | [`recording.finalize()`](/stt/SDKs/web-SDK/reference/classes#finalize) | Request the server to finalize current non-final tokens | | [`recording.on()`](/stt/SDKs/web-SDK/reference/classes#on) | Register an event handler | | [`recording.once()`](/stt/SDKs/web-SDK/reference/classes#once) | Register a one-time event handler | | [`recording.off()`](/stt/SDKs/web-SDK/reference/classes#off) | Remove an event handler | | [`recording.pause()`](/stt/SDKs/web-SDK/reference/classes#pause) | Pause recording | | [`recording.resume()`](/stt/SDKs/web-SDK/reference/classes#resume) | Resume recording | | [`recording.stop()`](/stt/SDKs/web-SDK/reference/classes#stop) | Stop recording | | [`recording.cancel()`](/stt/SDKs/web-SDK/reference/classes#cancel) | Cancel recording | ## AudioSource ### Available audio source methods | Method | Description | | ------------------------------------------------------------------ | ---------------------- | | [`source.start()`](/stt/SDKs/web-SDK/reference/types#audiosource) | Start capturing audio | | [`source.stop()`](/stt/SDKs/web-SDK/reference/types#audiosource) | Stop capturing audio | | [`source.pause()`](/stt/SDKs/web-SDK/reference/types#audiosource) | Pause capturing audio | | [`source.resume()`](/stt/SDKs/web-SDK/reference/types#audiosource) | Resume capturing audio | ## PermissionResolver ### Available browser permission resolver methods | Method | Description | | ---------------------------------------------------------------------------- | -------------------------------- | | [`resolver.check()`](/stt/SDKs/web-SDK/reference/types#permissionresolver) | Check current permission status | | [`resolver.request()`](/stt/SDKs/web-SDK/reference/types#permissionresolver) | Request permission from the user | # Types URL: /stt/SDKs/web-SDK/reference/types Soniox Client SDK — Types Reference ## ApiKeyConfig ```ts type ApiKeyConfig = string | (() => Promise<string>); ``` API key configuration. * `string` - A pre-fetched temporary API key (e.g., injected from SSR) * `() => Promise<string>` - An async function that fetches a fresh temporary key from your backend. Called once per recording session. **Example** ```typescript // Static key (for demos or SSR-injected keys) const client = new SonioxClient({ api_key: 'temp:...' }); // Async function (recommended for production) const client = new SonioxClient({ api_key: async () => { const res = await fetch('/api/get-temporary-key', { method: 'POST' }); const { api_key } = await res.json(); return api_key; }, }); ``` Note: If you use Node.js, you can use the `SonioxNodeClient` to fetch a temporary API key via `client.auth.createTemporaryKey()`.
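A hypothetical Express route backing the `/api/get-temporary-key` endpoint used above; the SDK import path and the exact `createTemporaryKey()` argument and response shapes are assumptions modeled on the REST payload (`usage_type`, `expires_in_seconds`):

```typescript
import express from 'express';
// Import path and constructor options are assumptions.
import { SonioxNodeClient } from '@soniox/node';

const app = express();
const soniox = new SonioxNodeClient({ api_key: process.env.SONIOX_API_KEY! });

app.post('/api/get-temporary-key', async (_req, res) => {
  // Mint a short-lived key scoped to WebSocket transcription.
  const key = await soniox.auth.createTemporaryKey({
    usage_type: 'transcribe_websocket',
    expires_in_seconds: 60,
  });
  // The browser example above reads { api_key } from this response.
  res.json({ api_key: key.api_key });
});

app.listen(3000);
```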
*** ## AudioErrorCode ```ts type AudioErrorCode = "permission_denied" | "device_not_found" | "audio_unavailable"; ``` Error codes for audio-related errors. *** ## AudioSourceHandlers ```ts type AudioSourceHandlers = { onData: (chunk) => void; onError: (error) => void; onMuted?: () => void; onUnmuted?: () => void; }; ``` Callbacks for receiving audio data and errors from an AudioSource. **Properties** | Property | Type | Description | | ------------ | ------------------- | ---------------------------------------------------------------------------------- | | `onData` | (`chunk`) => `void` | Called when an audio chunk is available. | | `onError` | (`error`) => `void` | Called when a runtime error occurs during audio capture (after start). | | `onMuted?` | () => `void` | Called when the audio source is muted externally (e.g. OS-level or hardware mute). | | `onUnmuted?` | () => `void` | Called when the audio source is unmuted after an external mute. | *** ## MicrophoneSourceOptions ```ts type MicrophoneSourceOptions = { constraints?: MediaTrackConstraints; recorderOptions?: MediaRecorderOptions; timesliceMs?: number; }; ``` Options for MicrophoneSource. **Properties** | Property | Type | Description | | ------------------ | ----------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `constraints?` | `MediaTrackConstraints` | MediaTrackConstraints for the audio track. **Default** `{ echoCancellation: false, noiseSuppression: false, autoGainControl: false, channelCount: 1, sampleRate: 44100 }` | | `recorderOptions?` | `MediaRecorderOptions` | MediaRecorder options. **See** [https://developer.mozilla.org/en-US/docs/Web/API/MediaRecorder/MediaRecorder](https://developer.mozilla.org/en-US/docs/Web/API/MediaRecorder/MediaRecorder) | | `timesliceMs?` | `number` | Time interval in milliseconds between audio data chunks. **Default** `60` | *** ## PermissionResult ```ts type PermissionResult = { can_request: boolean; status: PermissionStatus; }; ``` Result of a permission check or request. **Properties** | Property | Type | Description | | ------------- | -------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `can_request` | `boolean` | Whether the user can be prompted again. `false` means permanently denied (e.g., browser "Block" or iOS settings). Useful for showing "go to settings" instructions. | | `status` | [`PermissionStatus`](types#permissionstatus) | Current permission status. | *** ## PermissionStatus ```ts type PermissionStatus = "granted" | "denied" | "prompt" | "unavailable"; ``` Unified permission status across all platforms. *** ## PermissionType ```ts type PermissionType = "microphone"; ``` Permission types supported by the resolver.
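To see how `MicrophoneSourceOptions` fits together with the rest of the SDK, here is a sketch that overrides the capture defaults listed above. It assumes `MicrophoneSource` is exported from `@soniox/client` and accepts a `MicrophoneSourceOptions` object in its constructor; `fetchTemporaryKey` is a stand-in for your own key fetcher, the custom source is passed via the `source` field of `RecordOptions` (documented next), and any required `SttSessionConfig` fields are omitted for brevity.

```typescript
import { SonioxClient, MicrophoneSource } from '@soniox/client';

// Stand-in for your own temporary-key fetcher.
async function fetchTemporaryKey(): Promise<string> {
  const res = await fetch('/api/get-temporary-key', { method: 'POST' });
  return (await res.json()).api_key;
}

// Override the documented defaults: enable browser-side audio cleanup
// and deliver audio chunks every 100 ms instead of every 60 ms.
const source = new MicrophoneSource({
  constraints: {
    echoCancellation: true,
    noiseSuppression: true,
    channelCount: 1,
  },
  timesliceMs: 100,
});

const client = new SonioxClient({ api_key: fetchTemporaryKey });
const recording = await client.realtime.record({ source });
```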
*** ## RecordOptions ```ts type RecordOptions = SttSessionConfig & { buffer_queue_size?: number; session_options?: SttSessionOptions; signal?: AbortSignal; source?: AudioSource; }; ``` Options for creating a recording. **Type Declaration** | Name | Type | Description | | -------------------- | ---------------------------------- | ---------------------------------------------------------------------------------------------- | | `buffer_queue_size?` | `number` | Maximum number of audio chunks to buffer while waiting for key/connection. **Default** `1000` | | `session_options?` | `SttSessionOptions` | SDK-level session options (signal, etc.) | | `signal?` | `AbortSignal` | AbortSignal for cancellation | | `source?` | [`AudioSource`](types#audiosource) | Audio source to use. Defaults to MicrophoneSource if not provided. | *** ## RecordingEvents ```ts type RecordingEvents = { connected: () => void; endpoint: () => void; error: (error) => void; finalized: () => void; finished: () => void; result: (result) => void; source_muted: () => void; source_unmuted: () => void; state_change: (update) => void; token: (token) => void; }; ``` Events emitted by a Recording instance. **Properties** | Property | Type | Description | | ---------------- | -------------------- | ------------------------------------------------------------------- | | `connected` | () => `void` | WebSocket connected and ready. | | `endpoint` | () => `void` | Endpoint detected (speaker finished talking). | | `error` | (`error`) => `void` | Error occurred during recording. | | `finalized` | () => `void` | Finalization complete. | | `finished` | () => `void` | Recording finished (server acknowledged end of stream). | | `result` | (`result`) => `void` | Parsed result received from the server. | | `source_muted` | () => `void` | Audio source was muted externally (e.g. OS-level or hardware mute). | | `source_unmuted` | () => `void` | Audio source was unmuted after an external mute. | | `state_change` | (`update`) => `void` | Recording state transition. | | `token` | (`token`) => `void` | Individual token received. | *** ## RecordingState ```ts type RecordingState = | "idle" | "starting" | "connecting" | "recording" | "paused" | "stopping" | "stopped" | "error" | "canceled"; ``` Unified recording lifecycle states. *** ## SonioxClientOptions ```ts type SonioxClientOptions = { api_key: ApiKeyConfig; buffer_queue_size?: number; default_session_options?: SttSessionOptions; permissions?: PermissionResolver; ws_base_url?: string; }; ``` Options for creating a SonioxClient instance. **Properties** | Property | Type | Description | | -------------------------- | ------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `api_key` | [`ApiKeyConfig`](types#apikeyconfig) | API key configuration. - `string` - A pre-fetched temporary API key (e.g., injected from SSR) - `() => Promise<string>` - Async function that fetches a fresh key from your backend | | `buffer_queue_size?` | `number` | Default maximum number of audio chunks to buffer while waiting for key/connection. Can be overridden per-recording. **Default** `1000` | | `default_session_options?` | `SttSessionOptions` | Default session options applied to all sessions. Can be overridden per-recording. |
| `permissions?` | [`PermissionResolver`](types#permissionresolver) | Optional permission resolver for pre-flight microphone permission checks. Not set by default (SSR-safe, RN-safe). **Example** `import { BrowserPermissionResolver } from '@soniox/client'; const client = new SonioxClient({ api_key: fetchKey, permissions: new BrowserPermissionResolver(), });` | | `ws_base_url?` | `string` | WebSocket URL for real-time connections. **Default** `'wss://stt-rt.soniox.com/transcribe-websocket'` | *** ## SttOptions ```ts type SttOptions = { api_key: string; session_options?: SttSessionOptions; }; ``` Options for creating a low-level STT session. **Properties** | Property | Type | Description | | ------------------ | ------------------- | ---------------------------------------- | | `api_key` | `string` | Resolved API key string (temporary key). | | `session_options?` | `SttSessionOptions` | Session options (signal, etc.). | *** ## AudioSource Platform-agnostic audio source interface. Implementations must: * Begin capturing audio in `start()` and deliver chunks via `handlers.onData` * Stop all capture and release resources in `stop()` * Throw typed errors from `start()` if capture cannot begin (e.g., permission denied) **Example** ```typescript // Built-in browser source const source = new MicrophoneSource(); // Custom source (e.g., React Native) class MyAudioSource implements AudioSource { async start(handlers: AudioSourceHandlers) { ... } stop() { ... } } ``` **Methods** **pause()?** ```ts optional pause(): void; ``` Pause audio capture (optional). When paused, no data should be delivered via onData. **Returns** `void` *** **resume()?** ```ts optional resume(): void; ``` Resume audio capture after pause (optional). **Returns** `void` *** **start()** ```ts start(handlers): Promise<void>; ``` Start capturing audio. **Parameters** | Parameter | Type | Description | | ---------- | -------------------------------------------------- | ----------------------------------- | | `handlers` | [`AudioSourceHandlers`](types#audiosourcehandlers) | Callbacks for audio data and errors | **Returns** `Promise`\<`void`> **Throws** AudioPermissionError if microphone access is denied **Throws** AudioDeviceError if no audio device is found **Throws** AudioUnavailableError if audio capture is not supported *** **stop()** ```ts stop(): void; ``` Stop capturing audio and release all resources. Safe to call multiple times. **Returns** `void` *** ## PermissionResolver Platform-agnostic permission resolver. Implementations handle platform-specific permission APIs: * Browser: `navigator.permissions.query` + `getUserMedia` * React Native: `expo-av` or `react-native-permissions` **Example** ```typescript // Check before recording const mic = await resolver.check('microphone'); if (mic.status === 'denied' && !mic.can_request) { showGoToSettingsMessage(); } ``` **Methods** **check()** ```ts check(permission): Promise<PermissionResult>; ``` Check current permission status WITHOUT prompting the user. **Parameters** | Parameter | Type | | ------------ | -------------- | | `permission` | `"microphone"` | **Returns** `Promise`\<[`PermissionResult`](types#permissionresult)> *** **request()** ```ts request(permission): Promise<PermissionResult>; ``` Request permission from the user (may show a system prompt). On platforms where status is already 'granted', this is a no-op.
**Parameters** | Parameter | Type | | ------------ | -------------- | | `permission` | `"microphone"` | **Returns** `Promise`\<[`PermissionResult`](types#permissionresult)> *** ## resolveApiKey() ```ts function resolveApiKey(config): Promise<string>; ``` Resolves an ApiKeyConfig to a plain API key string. **Parameters** | Parameter | Type | Description | | --------- | ------------------------------------ | ------------------------- | | `config` | [`ApiKeyConfig`](types#apikeyconfig) | The API key configuration | **Returns** `Promise`\<`string`> The resolved API key string. **Throws** If the function rejects or returns a non-string value. # Create temporary API key URL: /stt/api-reference/auth/create_temporary_api_key Creates a short-lived API key for specific temporary use cases. The key will automatically expire after the specified duration. ## Create temporary API key **Endpoint:** `POST /v1/auth/temporary-api-key` Creates a short-lived API key for specific temporary use cases. The key will automatically expire after the specified duration. ### Request Body Content-Type: `application/json` (Required) Example (JSON): ```json { "client_reference_id": "reference_id", "expires_in_seconds": 1800, "usage_type": "transcribe_websocket" } ``` Schema (YAML Structural Definition): ```yaml properties: usage_type: description: Intended usage of the temporary API key. enum: - transcribe_websocket type: string expires_in_seconds: description: Duration in seconds until the temporary API key expires. maximum: 3600 minimum: 1 type: integer client_reference_id: anyOf: - maxLength: 256 type: string - type: 'null' description: Optional tracking identifier string. Does not need to be unique. required: - usage_type - expires_in_seconds type: object ``` ### Responses * **201**: Created temporary API key. Example (JSON): ```json { "api_key": "temp:WYJ67RBEFUWQXXPKYPD2UGXKWB", "expires_at": "2025-02-22T22:47:37.150Z" } ``` Schema (YAML Structural Definition): ```yaml properties: api_key: description: Created temporary API key. type: string expires_at: description: UTC timestamp indicating when the generated temporary API key will expire. format: date-time type: string required: - api_key - expires_at type: object ``` * **400**: Invalid request. Error types: * `invalid_request`: Invalid request. Schema (YAML Structural Definition): ```yaml properties: status_code: type: integer error_type: type: string message: type: string validation_errors: items: properties: error_type: type: string location: type: string message: type: string required: - error_type - location - message type: object type: array request_id: type: string required: - status_code - error_type - message - validation_errors - request_id type: object ``` * **401**: Authentication error. Schema (YAML Structural Definition): ```yaml properties: status_code: type: integer error_type: type: string message: type: string validation_errors: items: properties: error_type: type: string location: type: string message: type: string required: - error_type - location - message type: object type: array request_id: type: string required: - status_code - error_type - message - validation_errors - request_id type: object ``` * **500**: Internal server error.
Schema (YAML Structural Definition): ```yaml properties: status_code: type: integer error_type: type: string message: type: string validation_errors: items: properties: error_type: type: string location: type: string message: type: string required: - error_type - location - message type: object type: array request_id: type: string required: - status_code - error_type - message - validation_errors - request_id type: object ``` # Delete file URL: /stt/api-reference/files/delete_file Permanently deletes specified file. ## Delete file **Endpoint:** `DELETE /v1/files/{file_id}` Permanently deletes specified file. ### Parameters * `file_id` (path) (Required): ### Responses * **204**: File deleted. * **401**: Authentication error. Schema (YAML Structural Definition): ```yaml properties: status_code: type: integer error_type: type: string message: type: string validation_errors: items: properties: error_type: type: string location: type: string message: type: string required: - error_type - location - message type: object type: array request_id: type: string required: - status_code - error_type - message - validation_errors - request_id type: object ``` * **404**: File not found. Error types: * `file_not_found`: File could not be found. Schema (YAML Structural Definition): ```yaml properties: status_code: type: integer error_type: type: string message: type: string validation_errors: items: properties: error_type: type: string location: type: string message: type: string required: - error_type - location - message type: object type: array request_id: type: string required: - status_code - error_type - message - validation_errors - request_id type: object ``` * **500**: Internal server error. Schema (YAML Structural Definition): ```yaml properties: status_code: type: integer error_type: type: string message: type: string validation_errors: items: properties: error_type: type: string location: type: string message: type: string required: - error_type - location - message type: object type: array request_id: type: string required: - status_code - error_type - message - validation_errors - request_id type: object ``` # Get file URL: /stt/api-reference/files/get_file Retrieve metadata for an uploaded file. ## Get file **Endpoint:** `GET /v1/files/{file_id}` Retrieve metadata for an uploaded file. ### Parameters * `file_id` (path) (Required): ### Responses * **200**: File metadata. Example (JSON): ```json { "client_reference_id": "some_internal_id", "created_at": "2024-11-26T00:00:00Z", "filename": "example.mp3", "id": "84c32fc6-4fb5-4e7a-b656-b5ec70493753", "size": 123456 } ``` Schema (YAML Structural Definition): ```yaml description: File metadata. properties: id: description: Unique identifier of the file. format: uuid type: string filename: description: Name of the file. type: string size: description: Size of the file in bytes. type: integer created_at: description: UTC timestamp indicating when the file was uploaded. format: date-time type: string client_reference_id: anyOf: - type: string - type: 'null' description: Tracking identifier string. required: - id - filename - size - created_at type: object ``` * **401**: Authentication error. 
Schema (YAML Structural Definition): ```yaml properties: status_code: type: integer error_type: type: string message: type: string validation_errors: items: properties: error_type: type: string location: type: string message: type: string required: - error_type - location - message type: object type: array request_id: type: string required: - status_code - error_type - message - validation_errors - request_id type: object ``` * **404**: File not found. Error types: * `file_not_found`: File could not be found. Schema (YAML Structural Definition): ```yaml properties: status_code: type: integer error_type: type: string message: type: string validation_errors: items: properties: error_type: type: string location: type: string message: type: string required: - error_type - location - message type: object type: array request_id: type: string required: - status_code - error_type - message - validation_errors - request_id type: object ``` * **500**: Internal server error. Schema (YAML Structural Definition): ```yaml properties: status_code: type: integer error_type: type: string message: type: string validation_errors: items: properties: error_type: type: string location: type: string message: type: string required: - error_type - location - message type: object type: array request_id: type: string required: - status_code - error_type - message - validation_errors - request_id type: object ``` # Get files URL: /stt/api-reference/files/get_files Retrieves list of uploaded files. ## Get files **Endpoint:** `GET /v1/files` Retrieves list of uploaded files. ### Parameters * `limit` (query): Maximum number of files to return. * `cursor` (query): Pagination cursor for the next page of results. ### Responses * **200**: List of files. Example (JSON): ```json { "files": [ { "created_at": "2024-11-26T00:00:00Z", "filename": "example.mp3", "id": "84c32fc6-4fb5-4e7a-b656-b5ec70493753", "size": 123456 } ], "next_page_cursor": "cursor_or_null" } ``` Schema (YAML Structural Definition): ```yaml description: A list of files. properties: files: description: List of uploaded files. items: description: File metadata. example: client_reference_id: some_internal_id created_at: '2024-11-26T00:00:00Z' filename: example.mp3 id: 84c32fc6-4fb5-4e7a-b656-b5ec70493753 size: 123456 properties: id: description: Unique identifier of the file. format: uuid type: string filename: description: Name of the file. type: string size: description: Size of the file in bytes. type: integer created_at: description: UTC timestamp indicating when the file was uploaded. format: date-time type: string client_reference_id: anyOf: - type: string - type: 'null' description: Tracking identifier string. required: - id - filename - size - created_at type: object type: array next_page_cursor: anyOf: - type: string - type: 'null' description: >- A pagination token that references the next page of results. When more data is available, this field contains a value to pass in the cursor parameter of a subsequent request. When null, no additional results are available. required: - files type: object ``` * **400**: Invalid request. Error types: * `invalid_cursor`: Invalid cursor parameter. 
Schema (YAML Structural Definition): ```yaml properties: status_code: type: integer error_type: type: string message: type: string validation_errors: items: properties: error_type: type: string location: type: string message: type: string required: - error_type - location - message type: object type: array request_id: type: string required: - status_code - error_type - message - validation_errors - request_id type: object ``` * **401**: Authentication error. Schema (YAML Structural Definition): ```yaml properties: status_code: type: integer error_type: type: string message: type: string validation_errors: items: properties: error_type: type: string location: type: string message: type: string required: - error_type - location - message type: object type: array request_id: type: string required: - status_code - error_type - message - validation_errors - request_id type: object ``` * **500**: Internal server error. Schema (YAML Structural Definition): ```yaml properties: status_code: type: integer error_type: type: string message: type: string validation_errors: items: properties: error_type: type: string location: type: string message: type: string required: - error_type - location - message type: object type: array request_id: type: string required: - status_code - error_type - message - validation_errors - request_id type: object ``` # Upload file URL: /stt/api-reference/files/upload_file Uploads a new file. ## Upload file **Endpoint:** `POST /v1/files` Uploads a new file. ### Request Body Content-Type: `multipart/form-data` (Required) Schema (YAML Structural Definition): ```yaml type: object properties: client_reference_id: anyOf: - maxLength: 256 type: string - type: 'null' description: Optional tracking identifier string. Does not need to be unique. file: description: >- The file to upload. Original file name will be used unless a custom filename is provided. format: binary type: string required: - file ``` ### Responses * **201**: Uploaded file. Example (JSON): ```json { "client_reference_id": "some_internal_id", "created_at": "2024-11-26T00:00:00Z", "filename": "example.mp3", "id": "84c32fc6-4fb5-4e7a-b656-b5ec70493753", "size": 123456 } ``` Schema (YAML Structural Definition): ```yaml description: File metadata. properties: id: description: Unique identifier of the file. format: uuid type: string filename: description: Name of the file. type: string size: description: Size of the file in bytes. type: integer created_at: description: UTC timestamp indicating when the file was uploaded. format: date-time type: string client_reference_id: anyOf: - type: string - type: 'null' description: Tracking identifier string. required: - id - filename - size - created_at type: object ``` * **400**: Invalid request. Error types: * `invalid_request`: * Invalid request. * Exceeded maximum file size (maximum is 1073741824 bytes). Schema (YAML Structural Definition): ```yaml properties: status_code: type: integer error_type: type: string message: type: string validation_errors: items: properties: error_type: type: string location: type: string message: type: string required: - error_type - location - message type: object type: array request_id: type: string required: - status_code - error_type - message - validation_errors - request_id type: object ``` * **401**: Authentication error. 
Schema (YAML Structural Definition): ```yaml properties: status_code: type: integer error_type: type: string message: type: string validation_errors: items: properties: error_type: type: string location: type: string message: type: string required: - error_type - location - message type: object type: array request_id: type: string required: - status_code - error_type - message - validation_errors - request_id type: object ``` * **500**: Internal server error. Schema (YAML Structural Definition): ```yaml properties: status_code: type: integer error_type: type: string message: type: string validation_errors: items: properties: error_type: type: string location: type: string message: type: string required: - error_type - location - message type: object type: array request_id: type: string required: - status_code - error_type - message - validation_errors - request_id type: object ``` # Get models URL: /stt/api-reference/models/get_models Retrieves list of available models and their attributes. ## Get models **Endpoint:** `GET /v1/models` Retrieves list of available models and their attributes. ### Responses * **200**: List of available models and their attributes. Example (JSON): ```json { "models": [ { "aliased_model_id": null, "context_version": 2, "id": "stt-rt-v4", "languages": [ { "code": "af", "name": "Afrikaans" }, { "code": "sq", "name": "Albanian" }, { "code": "ar", "name": "Arabic" }, { "code": "az", "name": "Azerbaijani" }, { "code": "eu", "name": "Basque" }, { "code": "be", "name": "Belarusian" }, { "code": "bn", "name": "Bengali" }, { "code": "bs", "name": "Bosnian" }, { "code": "bg", "name": "Bulgarian" }, { "code": "ca", "name": "Catalan" }, { "code": "zh", "name": "Chinese" }, { "code": "hr", "name": "Croatian" }, { "code": "cs", "name": "Czech" }, { "code": "da", "name": "Danish" }, { "code": "nl", "name": "Dutch" }, { "code": "en", "name": "English" }, { "code": "et", "name": "Estonian" }, { "code": "fi", "name": "Finnish" }, { "code": "fr", "name": "French" }, { "code": "gl", "name": "Galician" }, { "code": "de", "name": "German" }, { "code": "el", "name": "Greek" }, { "code": "gu", "name": "Gujarati" }, { "code": "he", "name": "Hebrew" }, { "code": "hi", "name": "Hindi" }, { "code": "hu", "name": "Hungarian" }, { "code": "id", "name": "Indonesian" }, { "code": "it", "name": "Italian" }, { "code": "ja", "name": "Japanese" }, { "code": "kn", "name": "Kannada" }, { "code": "kk", "name": "Kazakh" }, { "code": "ko", "name": "Korean" }, { "code": "lv", "name": "Latvian" }, { "code": "lt", "name": "Lithuanian" }, { "code": "mk", "name": "Macedonian" }, { "code": "ms", "name": "Malay" }, { "code": "ml", "name": "Malayalam" }, { "code": "mr", "name": "Marathi" }, { "code": "no", "name": "Norwegian" }, { "code": "fa", "name": "Persian" }, { "code": "pl", "name": "Polish" }, { "code": "pt", "name": "Portuguese" }, { "code": "pa", "name": "Punjabi" }, { "code": "ro", "name": "Romanian" }, { "code": "ru", "name": "Russian" }, { "code": "sr", "name": "Serbian" }, { "code": "sk", "name": "Slovak" }, { "code": "sl", "name": "Slovenian" }, { "code": "es", "name": "Spanish" }, { "code": "sw", "name": "Swahili" }, { "code": "sv", "name": "Swedish" }, { "code": "tl", "name": "Tagalog" }, { "code": "ta", "name": "Tamil" }, { "code": "te", "name": "Telugu" }, { "code": "th", "name": "Thai" }, { "code": "tr", "name": "Turkish" }, { "code": "uk", "name": "Ukrainian" }, { "code": "ur", "name": "Urdu" }, { "code": "vi", "name": "Vietnamese" }, { "code": "cy", "name": "Welsh" } ], "name": 
"Speech-to-Text Real-time v4", "one_way_translation": "all_languages", "supports_language_hints_strict": true, "supports_max_endpoint_delay": true, "transcription_mode": "real_time", "translation_targets": [], "two_way_translation": "all_languages", "two_way_translation_pairs": [] }, { "aliased_model_id": null, "context_version": 2, "id": "stt-rt-v3", "languages": [ { "code": "af", "name": "Afrikaans" }, { "code": "sq", "name": "Albanian" }, { "code": "ar", "name": "Arabic" }, { "code": "az", "name": "Azerbaijani" }, { "code": "eu", "name": "Basque" }, { "code": "be", "name": "Belarusian" }, { "code": "bn", "name": "Bengali" }, { "code": "bs", "name": "Bosnian" }, { "code": "bg", "name": "Bulgarian" }, { "code": "ca", "name": "Catalan" }, { "code": "zh", "name": "Chinese" }, { "code": "hr", "name": "Croatian" }, { "code": "cs", "name": "Czech" }, { "code": "da", "name": "Danish" }, { "code": "nl", "name": "Dutch" }, { "code": "en", "name": "English" }, { "code": "et", "name": "Estonian" }, { "code": "fi", "name": "Finnish" }, { "code": "fr", "name": "French" }, { "code": "gl", "name": "Galician" }, { "code": "de", "name": "German" }, { "code": "el", "name": "Greek" }, { "code": "gu", "name": "Gujarati" }, { "code": "he", "name": "Hebrew" }, { "code": "hi", "name": "Hindi" }, { "code": "hu", "name": "Hungarian" }, { "code": "id", "name": "Indonesian" }, { "code": "it", "name": "Italian" }, { "code": "ja", "name": "Japanese" }, { "code": "kn", "name": "Kannada" }, { "code": "kk", "name": "Kazakh" }, { "code": "ko", "name": "Korean" }, { "code": "lv", "name": "Latvian" }, { "code": "lt", "name": "Lithuanian" }, { "code": "mk", "name": "Macedonian" }, { "code": "ms", "name": "Malay" }, { "code": "ml", "name": "Malayalam" }, { "code": "mr", "name": "Marathi" }, { "code": "no", "name": "Norwegian" }, { "code": "fa", "name": "Persian" }, { "code": "pl", "name": "Polish" }, { "code": "pt", "name": "Portuguese" }, { "code": "pa", "name": "Punjabi" }, { "code": "ro", "name": "Romanian" }, { "code": "ru", "name": "Russian" }, { "code": "sr", "name": "Serbian" }, { "code": "sk", "name": "Slovak" }, { "code": "sl", "name": "Slovenian" }, { "code": "es", "name": "Spanish" }, { "code": "sw", "name": "Swahili" }, { "code": "sv", "name": "Swedish" }, { "code": "tl", "name": "Tagalog" }, { "code": "ta", "name": "Tamil" }, { "code": "te", "name": "Telugu" }, { "code": "th", "name": "Thai" }, { "code": "tr", "name": "Turkish" }, { "code": "uk", "name": "Ukrainian" }, { "code": "ur", "name": "Urdu" }, { "code": "vi", "name": "Vietnamese" }, { "code": "cy", "name": "Welsh" } ], "name": "Speech-to-Text Real-time v3", "one_way_translation": "all_languages", "supports_language_hints_strict": true, "supports_max_endpoint_delay": false, "transcription_mode": "real_time", "translation_targets": [], "two_way_translation": "all_languages", "two_way_translation_pairs": [] }, { "aliased_model_id": null, "context_version": 2, "id": "stt-async-v4", "languages": [ { "code": "af", "name": "Afrikaans" }, { "code": "sq", "name": "Albanian" }, { "code": "ar", "name": "Arabic" }, { "code": "az", "name": "Azerbaijani" }, { "code": "eu", "name": "Basque" }, { "code": "be", "name": "Belarusian" }, { "code": "bn", "name": "Bengali" }, { "code": "bs", "name": "Bosnian" }, { "code": "bg", "name": "Bulgarian" }, { "code": "ca", "name": "Catalan" }, { "code": "zh", "name": "Chinese" }, { "code": "hr", "name": "Croatian" }, { "code": "cs", "name": "Czech" }, { "code": "da", "name": "Danish" }, { "code": "nl", "name": "Dutch" }, { "code": 
"en", "name": "English" }, { "code": "et", "name": "Estonian" }, { "code": "fi", "name": "Finnish" }, { "code": "fr", "name": "French" }, { "code": "gl", "name": "Galician" }, { "code": "de", "name": "German" }, { "code": "el", "name": "Greek" }, { "code": "gu", "name": "Gujarati" }, { "code": "he", "name": "Hebrew" }, { "code": "hi", "name": "Hindi" }, { "code": "hu", "name": "Hungarian" }, { "code": "id", "name": "Indonesian" }, { "code": "it", "name": "Italian" }, { "code": "ja", "name": "Japanese" }, { "code": "kn", "name": "Kannada" }, { "code": "kk", "name": "Kazakh" }, { "code": "ko", "name": "Korean" }, { "code": "lv", "name": "Latvian" }, { "code": "lt", "name": "Lithuanian" }, { "code": "mk", "name": "Macedonian" }, { "code": "ms", "name": "Malay" }, { "code": "ml", "name": "Malayalam" }, { "code": "mr", "name": "Marathi" }, { "code": "no", "name": "Norwegian" }, { "code": "fa", "name": "Persian" }, { "code": "pl", "name": "Polish" }, { "code": "pt", "name": "Portuguese" }, { "code": "pa", "name": "Punjabi" }, { "code": "ro", "name": "Romanian" }, { "code": "ru", "name": "Russian" }, { "code": "sr", "name": "Serbian" }, { "code": "sk", "name": "Slovak" }, { "code": "sl", "name": "Slovenian" }, { "code": "es", "name": "Spanish" }, { "code": "sw", "name": "Swahili" }, { "code": "sv", "name": "Swedish" }, { "code": "tl", "name": "Tagalog" }, { "code": "ta", "name": "Tamil" }, { "code": "te", "name": "Telugu" }, { "code": "th", "name": "Thai" }, { "code": "tr", "name": "Turkish" }, { "code": "uk", "name": "Ukrainian" }, { "code": "ur", "name": "Urdu" }, { "code": "vi", "name": "Vietnamese" }, { "code": "cy", "name": "Welsh" } ], "name": "Speech-to-Text Async v4", "one_way_translation": "all_languages", "supports_language_hints_strict": true, "supports_max_endpoint_delay": false, "transcription_mode": "async", "translation_targets": [], "two_way_translation": "all_languages", "two_way_translation_pairs": [] }, { "aliased_model_id": null, "context_version": 2, "id": "stt-async-v3", "languages": [ { "code": "af", "name": "Afrikaans" }, { "code": "sq", "name": "Albanian" }, { "code": "ar", "name": "Arabic" }, { "code": "az", "name": "Azerbaijani" }, { "code": "eu", "name": "Basque" }, { "code": "be", "name": "Belarusian" }, { "code": "bn", "name": "Bengali" }, { "code": "bs", "name": "Bosnian" }, { "code": "bg", "name": "Bulgarian" }, { "code": "ca", "name": "Catalan" }, { "code": "zh", "name": "Chinese" }, { "code": "hr", "name": "Croatian" }, { "code": "cs", "name": "Czech" }, { "code": "da", "name": "Danish" }, { "code": "nl", "name": "Dutch" }, { "code": "en", "name": "English" }, { "code": "et", "name": "Estonian" }, { "code": "fi", "name": "Finnish" }, { "code": "fr", "name": "French" }, { "code": "gl", "name": "Galician" }, { "code": "de", "name": "German" }, { "code": "el", "name": "Greek" }, { "code": "gu", "name": "Gujarati" }, { "code": "he", "name": "Hebrew" }, { "code": "hi", "name": "Hindi" }, { "code": "hu", "name": "Hungarian" }, { "code": "id", "name": "Indonesian" }, { "code": "it", "name": "Italian" }, { "code": "ja", "name": "Japanese" }, { "code": "kn", "name": "Kannada" }, { "code": "kk", "name": "Kazakh" }, { "code": "ko", "name": "Korean" }, { "code": "lv", "name": "Latvian" }, { "code": "lt", "name": "Lithuanian" }, { "code": "mk", "name": "Macedonian" }, { "code": "ms", "name": "Malay" }, { "code": "ml", "name": "Malayalam" }, { "code": "mr", "name": "Marathi" }, { "code": "no", "name": "Norwegian" }, { "code": "fa", "name": "Persian" }, { "code": "pl", "name": 
"Polish" }, { "code": "pt", "name": "Portuguese" }, { "code": "pa", "name": "Punjabi" }, { "code": "ro", "name": "Romanian" }, { "code": "ru", "name": "Russian" }, { "code": "sr", "name": "Serbian" }, { "code": "sk", "name": "Slovak" }, { "code": "sl", "name": "Slovenian" }, { "code": "es", "name": "Spanish" }, { "code": "sw", "name": "Swahili" }, { "code": "sv", "name": "Swedish" }, { "code": "tl", "name": "Tagalog" }, { "code": "ta", "name": "Tamil" }, { "code": "te", "name": "Telugu" }, { "code": "th", "name": "Thai" }, { "code": "tr", "name": "Turkish" }, { "code": "uk", "name": "Ukrainian" }, { "code": "ur", "name": "Urdu" }, { "code": "vi", "name": "Vietnamese" }, { "code": "cy", "name": "Welsh" } ], "name": "Speech-to-Text Async v3", "one_way_translation": "all_languages", "supports_language_hints_strict": false, "supports_max_endpoint_delay": false, "transcription_mode": "async", "translation_targets": [], "two_way_translation": "all_languages", "two_way_translation_pairs": [] }, { "aliased_model_id": "stt-rt-v3", "context_version": 2, "id": "stt-rt-preview", "languages": [ { "code": "af", "name": "Afrikaans" }, { "code": "sq", "name": "Albanian" }, { "code": "ar", "name": "Arabic" }, { "code": "az", "name": "Azerbaijani" }, { "code": "eu", "name": "Basque" }, { "code": "be", "name": "Belarusian" }, { "code": "bn", "name": "Bengali" }, { "code": "bs", "name": "Bosnian" }, { "code": "bg", "name": "Bulgarian" }, { "code": "ca", "name": "Catalan" }, { "code": "zh", "name": "Chinese" }, { "code": "hr", "name": "Croatian" }, { "code": "cs", "name": "Czech" }, { "code": "da", "name": "Danish" }, { "code": "nl", "name": "Dutch" }, { "code": "en", "name": "English" }, { "code": "et", "name": "Estonian" }, { "code": "fi", "name": "Finnish" }, { "code": "fr", "name": "French" }, { "code": "gl", "name": "Galician" }, { "code": "de", "name": "German" }, { "code": "el", "name": "Greek" }, { "code": "gu", "name": "Gujarati" }, { "code": "he", "name": "Hebrew" }, { "code": "hi", "name": "Hindi" }, { "code": "hu", "name": "Hungarian" }, { "code": "id", "name": "Indonesian" }, { "code": "it", "name": "Italian" }, { "code": "ja", "name": "Japanese" }, { "code": "kn", "name": "Kannada" }, { "code": "kk", "name": "Kazakh" }, { "code": "ko", "name": "Korean" }, { "code": "lv", "name": "Latvian" }, { "code": "lt", "name": "Lithuanian" }, { "code": "mk", "name": "Macedonian" }, { "code": "ms", "name": "Malay" }, { "code": "ml", "name": "Malayalam" }, { "code": "mr", "name": "Marathi" }, { "code": "no", "name": "Norwegian" }, { "code": "fa", "name": "Persian" }, { "code": "pl", "name": "Polish" }, { "code": "pt", "name": "Portuguese" }, { "code": "pa", "name": "Punjabi" }, { "code": "ro", "name": "Romanian" }, { "code": "ru", "name": "Russian" }, { "code": "sr", "name": "Serbian" }, { "code": "sk", "name": "Slovak" }, { "code": "sl", "name": "Slovenian" }, { "code": "es", "name": "Spanish" }, { "code": "sw", "name": "Swahili" }, { "code": "sv", "name": "Swedish" }, { "code": "tl", "name": "Tagalog" }, { "code": "ta", "name": "Tamil" }, { "code": "te", "name": "Telugu" }, { "code": "th", "name": "Thai" }, { "code": "tr", "name": "Turkish" }, { "code": "uk", "name": "Ukrainian" }, { "code": "ur", "name": "Urdu" }, { "code": "vi", "name": "Vietnamese" }, { "code": "cy", "name": "Welsh" } ], "name": "Speech-to-Text Real-time Preview", "one_way_translation": "all_languages", "supports_language_hints_strict": true, "supports_max_endpoint_delay": false, "transcription_mode": "real_time", "translation_targets": 
[], "two_way_translation": "all_languages", "two_way_translation_pairs": [] }, { "aliased_model_id": "stt-async-v3", "context_version": 2, "id": "stt-async-preview", "languages": [ { "code": "af", "name": "Afrikaans" }, { "code": "sq", "name": "Albanian" }, { "code": "ar", "name": "Arabic" }, { "code": "az", "name": "Azerbaijani" }, { "code": "eu", "name": "Basque" }, { "code": "be", "name": "Belarusian" }, { "code": "bn", "name": "Bengali" }, { "code": "bs", "name": "Bosnian" }, { "code": "bg", "name": "Bulgarian" }, { "code": "ca", "name": "Catalan" }, { "code": "zh", "name": "Chinese" }, { "code": "hr", "name": "Croatian" }, { "code": "cs", "name": "Czech" }, { "code": "da", "name": "Danish" }, { "code": "nl", "name": "Dutch" }, { "code": "en", "name": "English" }, { "code": "et", "name": "Estonian" }, { "code": "fi", "name": "Finnish" }, { "code": "fr", "name": "French" }, { "code": "gl", "name": "Galician" }, { "code": "de", "name": "German" }, { "code": "el", "name": "Greek" }, { "code": "gu", "name": "Gujarati" }, { "code": "he", "name": "Hebrew" }, { "code": "hi", "name": "Hindi" }, { "code": "hu", "name": "Hungarian" }, { "code": "id", "name": "Indonesian" }, { "code": "it", "name": "Italian" }, { "code": "ja", "name": "Japanese" }, { "code": "kn", "name": "Kannada" }, { "code": "kk", "name": "Kazakh" }, { "code": "ko", "name": "Korean" }, { "code": "lv", "name": "Latvian" }, { "code": "lt", "name": "Lithuanian" }, { "code": "mk", "name": "Macedonian" }, { "code": "ms", "name": "Malay" }, { "code": "ml", "name": "Malayalam" }, { "code": "mr", "name": "Marathi" }, { "code": "no", "name": "Norwegian" }, { "code": "fa", "name": "Persian" }, { "code": "pl", "name": "Polish" }, { "code": "pt", "name": "Portuguese" }, { "code": "pa", "name": "Punjabi" }, { "code": "ro", "name": "Romanian" }, { "code": "ru", "name": "Russian" }, { "code": "sr", "name": "Serbian" }, { "code": "sk", "name": "Slovak" }, { "code": "sl", "name": "Slovenian" }, { "code": "es", "name": "Spanish" }, { "code": "sw", "name": "Swahili" }, { "code": "sv", "name": "Swedish" }, { "code": "tl", "name": "Tagalog" }, { "code": "ta", "name": "Tamil" }, { "code": "te", "name": "Telugu" }, { "code": "th", "name": "Thai" }, { "code": "tr", "name": "Turkish" }, { "code": "uk", "name": "Ukrainian" }, { "code": "ur", "name": "Urdu" }, { "code": "vi", "name": "Vietnamese" }, { "code": "cy", "name": "Welsh" } ], "name": "Speech-to-Text Async Preview", "one_way_translation": "all_languages", "supports_language_hints_strict": false, "supports_max_endpoint_delay": false, "transcription_mode": "async", "translation_targets": [], "two_way_translation": "all_languages", "two_way_translation_pairs": [] }, { "aliased_model_id": "stt-rt-v3", "context_version": 2, "id": "stt-rt-v3-preview", "languages": [ { "code": "af", "name": "Afrikaans" }, { "code": "sq", "name": "Albanian" }, { "code": "ar", "name": "Arabic" }, { "code": "az", "name": "Azerbaijani" }, { "code": "eu", "name": "Basque" }, { "code": "be", "name": "Belarusian" }, { "code": "bn", "name": "Bengali" }, { "code": "bs", "name": "Bosnian" }, { "code": "bg", "name": "Bulgarian" }, { "code": "ca", "name": "Catalan" }, { "code": "zh", "name": "Chinese" }, { "code": "hr", "name": "Croatian" }, { "code": "cs", "name": "Czech" }, { "code": "da", "name": "Danish" }, { "code": "nl", "name": "Dutch" }, { "code": "en", "name": "English" }, { "code": "et", "name": "Estonian" }, { "code": "fi", "name": "Finnish" }, { "code": "fr", "name": "French" }, { "code": "gl", "name": "Galician" }, { 
"code": "de", "name": "German" }, { "code": "el", "name": "Greek" }, { "code": "gu", "name": "Gujarati" }, { "code": "he", "name": "Hebrew" }, { "code": "hi", "name": "Hindi" }, { "code": "hu", "name": "Hungarian" }, { "code": "id", "name": "Indonesian" }, { "code": "it", "name": "Italian" }, { "code": "ja", "name": "Japanese" }, { "code": "kn", "name": "Kannada" }, { "code": "kk", "name": "Kazakh" }, { "code": "ko", "name": "Korean" }, { "code": "lv", "name": "Latvian" }, { "code": "lt", "name": "Lithuanian" }, { "code": "mk", "name": "Macedonian" }, { "code": "ms", "name": "Malay" }, { "code": "ml", "name": "Malayalam" }, { "code": "mr", "name": "Marathi" }, { "code": "no", "name": "Norwegian" }, { "code": "fa", "name": "Persian" }, { "code": "pl", "name": "Polish" }, { "code": "pt", "name": "Portuguese" }, { "code": "pa", "name": "Punjabi" }, { "code": "ro", "name": "Romanian" }, { "code": "ru", "name": "Russian" }, { "code": "sr", "name": "Serbian" }, { "code": "sk", "name": "Slovak" }, { "code": "sl", "name": "Slovenian" }, { "code": "es", "name": "Spanish" }, { "code": "sw", "name": "Swahili" }, { "code": "sv", "name": "Swedish" }, { "code": "tl", "name": "Tagalog" }, { "code": "ta", "name": "Tamil" }, { "code": "te", "name": "Telugu" }, { "code": "th", "name": "Thai" }, { "code": "tr", "name": "Turkish" }, { "code": "uk", "name": "Ukrainian" }, { "code": "ur", "name": "Urdu" }, { "code": "vi", "name": "Vietnamese" }, { "code": "cy", "name": "Welsh" } ], "name": "Speech-to-Text Real-time v3 Preview", "one_way_translation": "all_languages", "supports_language_hints_strict": true, "supports_max_endpoint_delay": false, "transcription_mode": "real_time", "translation_targets": [], "two_way_translation": "all_languages", "two_way_translation_pairs": [] }, { "aliased_model_id": "stt-rt-v3", "context_version": 2, "id": "stt-rt-preview-v2", "languages": [ { "code": "af", "name": "Afrikaans" }, { "code": "sq", "name": "Albanian" }, { "code": "ar", "name": "Arabic" }, { "code": "az", "name": "Azerbaijani" }, { "code": "eu", "name": "Basque" }, { "code": "be", "name": "Belarusian" }, { "code": "bn", "name": "Bengali" }, { "code": "bs", "name": "Bosnian" }, { "code": "bg", "name": "Bulgarian" }, { "code": "ca", "name": "Catalan" }, { "code": "zh", "name": "Chinese" }, { "code": "hr", "name": "Croatian" }, { "code": "cs", "name": "Czech" }, { "code": "da", "name": "Danish" }, { "code": "nl", "name": "Dutch" }, { "code": "en", "name": "English" }, { "code": "et", "name": "Estonian" }, { "code": "fi", "name": "Finnish" }, { "code": "fr", "name": "French" }, { "code": "gl", "name": "Galician" }, { "code": "de", "name": "German" }, { "code": "el", "name": "Greek" }, { "code": "gu", "name": "Gujarati" }, { "code": "he", "name": "Hebrew" }, { "code": "hi", "name": "Hindi" }, { "code": "hu", "name": "Hungarian" }, { "code": "id", "name": "Indonesian" }, { "code": "it", "name": "Italian" }, { "code": "ja", "name": "Japanese" }, { "code": "kn", "name": "Kannada" }, { "code": "kk", "name": "Kazakh" }, { "code": "ko", "name": "Korean" }, { "code": "lv", "name": "Latvian" }, { "code": "lt", "name": "Lithuanian" }, { "code": "mk", "name": "Macedonian" }, { "code": "ms", "name": "Malay" }, { "code": "ml", "name": "Malayalam" }, { "code": "mr", "name": "Marathi" }, { "code": "no", "name": "Norwegian" }, { "code": "fa", "name": "Persian" }, { "code": "pl", "name": "Polish" }, { "code": "pt", "name": "Portuguese" }, { "code": "pa", "name": "Punjabi" }, { "code": "ro", "name": "Romanian" }, { "code": "ru", "name": 
"Russian" }, { "code": "sr", "name": "Serbian" }, { "code": "sk", "name": "Slovak" }, { "code": "sl", "name": "Slovenian" }, { "code": "es", "name": "Spanish" }, { "code": "sw", "name": "Swahili" }, { "code": "sv", "name": "Swedish" }, { "code": "tl", "name": "Tagalog" }, { "code": "ta", "name": "Tamil" }, { "code": "te", "name": "Telugu" }, { "code": "th", "name": "Thai" }, { "code": "tr", "name": "Turkish" }, { "code": "uk", "name": "Ukrainian" }, { "code": "ur", "name": "Urdu" }, { "code": "vi", "name": "Vietnamese" }, { "code": "cy", "name": "Welsh" } ], "name": "Speech-to-Text Real-time Preview v2", "one_way_translation": "all_languages", "supports_language_hints_strict": true, "supports_max_endpoint_delay": false, "transcription_mode": "real_time", "translation_targets": [], "two_way_translation": "all_languages", "two_way_translation_pairs": [] }, { "aliased_model_id": "stt-async-v3", "context_version": 2, "id": "stt-async-preview-v1", "languages": [ { "code": "af", "name": "Afrikaans" }, { "code": "sq", "name": "Albanian" }, { "code": "ar", "name": "Arabic" }, { "code": "az", "name": "Azerbaijani" }, { "code": "eu", "name": "Basque" }, { "code": "be", "name": "Belarusian" }, { "code": "bn", "name": "Bengali" }, { "code": "bs", "name": "Bosnian" }, { "code": "bg", "name": "Bulgarian" }, { "code": "ca", "name": "Catalan" }, { "code": "zh", "name": "Chinese" }, { "code": "hr", "name": "Croatian" }, { "code": "cs", "name": "Czech" }, { "code": "da", "name": "Danish" }, { "code": "nl", "name": "Dutch" }, { "code": "en", "name": "English" }, { "code": "et", "name": "Estonian" }, { "code": "fi", "name": "Finnish" }, { "code": "fr", "name": "French" }, { "code": "gl", "name": "Galician" }, { "code": "de", "name": "German" }, { "code": "el", "name": "Greek" }, { "code": "gu", "name": "Gujarati" }, { "code": "he", "name": "Hebrew" }, { "code": "hi", "name": "Hindi" }, { "code": "hu", "name": "Hungarian" }, { "code": "id", "name": "Indonesian" }, { "code": "it", "name": "Italian" }, { "code": "ja", "name": "Japanese" }, { "code": "kn", "name": "Kannada" }, { "code": "kk", "name": "Kazakh" }, { "code": "ko", "name": "Korean" }, { "code": "lv", "name": "Latvian" }, { "code": "lt", "name": "Lithuanian" }, { "code": "mk", "name": "Macedonian" }, { "code": "ms", "name": "Malay" }, { "code": "ml", "name": "Malayalam" }, { "code": "mr", "name": "Marathi" }, { "code": "no", "name": "Norwegian" }, { "code": "fa", "name": "Persian" }, { "code": "pl", "name": "Polish" }, { "code": "pt", "name": "Portuguese" }, { "code": "pa", "name": "Punjabi" }, { "code": "ro", "name": "Romanian" }, { "code": "ru", "name": "Russian" }, { "code": "sr", "name": "Serbian" }, { "code": "sk", "name": "Slovak" }, { "code": "sl", "name": "Slovenian" }, { "code": "es", "name": "Spanish" }, { "code": "sw", "name": "Swahili" }, { "code": "sv", "name": "Swedish" }, { "code": "tl", "name": "Tagalog" }, { "code": "ta", "name": "Tamil" }, { "code": "te", "name": "Telugu" }, { "code": "th", "name": "Thai" }, { "code": "tr", "name": "Turkish" }, { "code": "uk", "name": "Ukrainian" }, { "code": "ur", "name": "Urdu" }, { "code": "vi", "name": "Vietnamese" }, { "code": "cy", "name": "Welsh" } ], "name": "Speech-to-Text Async Preview v1", "one_way_translation": "all_languages", "supports_language_hints_strict": false, "supports_max_endpoint_delay": false, "transcription_mode": "async", "translation_targets": [], "two_way_translation": "all_languages", "two_way_translation_pairs": [] } ] } ``` Schema (YAML Structural Definition): ```yaml 
properties: models: description: List of available models and their attributes. items: properties: id: description: Unique identifier of the model. type: string aliased_model_id: anyOf: - type: string - type: 'null' description: If this is an alias, the id of the aliased model. name: description: Name of the model. type: string context_version: anyOf: - type: integer - type: 'null' description: Version of context supported. transcription_mode: description: Transcription mode of the model. enum: - real_time - async type: string languages: description: List of languages supported by the model. items: properties: code: description: 2-letter language code. type: string name: description: Language name. type: string required: - code - name type: object type: array supports_language_hints_strict: type: boolean supports_max_endpoint_delay: type: boolean translation_targets: description: >- List of supported one-way translation targets. If the list is empty, check the one_way_translation field. items: properties: target_language: type: string source_languages: items: type: string type: array exclude_source_languages: items: type: string type: array required: - target_language - source_languages - exclude_source_languages type: object type: array two_way_translation_pairs: description: >- List of supported two-way translation pairs. If the list is empty, check the two_way_translation field. items: type: string type: array one_way_translation: anyOf: - type: string - type: 'null' description: >- When it contains the string 'all_languages', any language from languages can be used. two_way_translation: anyOf: - type: string - type: 'null' description: >- When it contains the string 'all_languages', any language pair from languages can be used. required: - id - aliased_model_id - name - context_version - transcription_mode - languages - supports_language_hints_strict - supports_max_endpoint_delay - translation_targets - two_way_translation_pairs - one_way_translation - two_way_translation type: object type: array required: - models type: object ``` * **401**: Authentication error. Schema (YAML Structural Definition): ```yaml properties: status_code: type: integer error_type: type: string message: type: string validation_errors: items: properties: error_type: type: string location: type: string message: type: string required: - error_type - location - message type: object type: array request_id: type: string required: - status_code - error_type - message - validation_errors - request_id type: object ``` * **500**: Internal server error. Schema (YAML Structural Definition): ```yaml properties: status_code: type: integer error_type: type: string message: type: string validation_errors: items: properties: error_type: type: string location: type: string message: type: string required: - error_type - location - message type: object type: array request_id: type: string required: - status_code - error_type - message - validation_errors - request_id type: object ``` # Create transcription URL: /stt/api-reference/transcriptions/create_transcription Creates a new transcription. ## Create transcription **Endpoint:** `POST /v1/transcriptions` Creates a new transcription. ### Request Body Content-Type: `application/json` (Required) Schema (YAML Structural Definition): ```yaml properties: model: description: Speech-to-text model to use for the transcription. maxLength: 32 type: string audio_url: anyOf: - maxLength: 4096 pattern: ^https?://[^\s]+$ type: string - type: 'null' description: >- URL of the audio file to transcribe.
Cannot be specified if `file_id` is specified. file_id: anyOf: - format: uuid type: string - type: 'null' description: >- ID of the uploaded file to transcribe. Cannot be specified if `audio_url` is specified. language_hints: anyOf: - items: maxLength: 10 type: string maxItems: 100 type: array - type: 'null' description: >- Expected languages in the audio. If not specified, languages are automatically detected. language_hints_strict: anyOf: - type: boolean - type: 'null' description: When `true`, the model will rely more on language hints. enable_speaker_diarization: anyOf: - type: boolean - type: 'null' description: >- When `true`, speakers are identified and separated in the transcription output. enable_language_identification: anyOf: - type: boolean - type: 'null' description: When `true`, language is detected for each part of the transcription. translation: anyOf: - properties: type: enum: - one_way - two_way type: string target_language: anyOf: - type: string - type: 'null' language_a: anyOf: - type: string - type: 'null' language_b: anyOf: - type: string - type: 'null' required: - type type: object - type: 'null' description: Translation configuration. context: anyOf: - properties: general: anyOf: - items: properties: key: description: Item key (e.g. "Domain"). type: string value: description: Item value (e.g. "medicine"). type: string required: - key - value type: object type: array - type: 'null' description: General context items. text: anyOf: - type: string - type: 'null' description: Text context. terms: anyOf: - items: type: string type: array - type: 'null' description: Terms that might occur in speech. translation_terms: anyOf: - items: properties: source: description: Source term. type: string target: description: Target term to translate to. type: string required: - source - target type: object type: array - type: 'null' description: >- Hints how to translate specific terms. Ignored if translation is not enabled. type: object - type: string - type: 'null' description: >- Additional context to improve transcription accuracy and formatting of specialized terms. webhook_url: anyOf: - maxLength: 256 pattern: ^https?://[^\s]+$ type: string - type: 'null' description: >- URL to receive webhook notifications when transcription is completed or fails. webhook_auth_header_name: anyOf: - maxLength: 256 type: string - type: 'null' description: Name of the authentication header sent with webhook notifications. webhook_auth_header_value: anyOf: - maxLength: 256 type: string - type: 'null' description: Authentication header value sent with webhook notifications. client_reference_id: anyOf: - maxLength: 256 type: string - type: 'null' description: Optional tracking identifier string. Does not need to be unique. required: - model type: object ``` ### Responses * **201**: Created transcription. Example (JSON): ```json { "audio_duration_ms": 0, "audio_url": "https://soniox.com/media/examples/coffee_shop.mp3", "client_reference_id": "some_internal_id", "created_at": "2024-11-26T00:00:00Z", "error_message": null, "error_type": null, "file_id": null, "filename": "coffee_shop.mp3", "id": "73d4357d-cad2-4338-a60d-ec6f2044f721", "language_hints": [ "en", "fr" ], "model": "stt-async-preview", "status": "queued", "webhook_auth_header_name": "Authorization", "webhook_auth_header_value": "******************", "webhook_status_code": null, "webhook_url": "https://example.com/webhook" } ``` Schema (YAML Structural Definition): ```yaml description: A transcription. 
properties: id: description: Unique identifier for the transcription request. format: uuid type: string status: description: Transcription status. enum: - queued - processing - completed - error type: string created_at: description: UTC timestamp indicating when the transcription was created. format: date-time type: string model: description: Speech-to-text model used for the transcription. type: string audio_url: anyOf: - type: string - type: 'null' description: URL of the file being transcribed. file_id: anyOf: - format: uuid type: string - type: 'null' description: ID of the file being transcribed. filename: description: Name of the file being transcribed. type: string language_hints: anyOf: - items: type: string type: array - type: 'null' description: >- Expected languages in the audio. If not specified, languages are automatically detected. enable_speaker_diarization: description: >- When `true`, speakers are identified and separated in the transcription output. type: boolean enable_language_identification: description: When `true`, language is detected for each part of the transcription. type: boolean audio_duration_ms: anyOf: - type: integer - type: 'null' description: >- Duration of the audio in milliseconds. Only available after processing begins. error_type: anyOf: - type: string - type: 'null' description: >- Error type if transcription failed. `null` for successful or in-progress transcriptions. error_message: anyOf: - type: string - type: 'null' description: >- Error message if transcription failed. `null` for successful or in-progress transcriptions. webhook_url: anyOf: - type: string - type: 'null' description: >- URL to receive webhook notifications when transcription is completed or fails. webhook_auth_header_name: anyOf: - type: string - type: 'null' description: Name of the authentication header sent with webhook notifications. webhook_auth_header_value: anyOf: - type: string - type: 'null' description: >- Authentication header value. Always returned masked as `******************`. webhook_status_code: anyOf: - type: integer - type: 'null' description: >- HTTP status code received from your server when webhook was delivered. `null` if not yet sent. client_reference_id: anyOf: - type: string - type: 'null' description: Tracking identifier string. required: - id - status - created_at - model - filename - enable_speaker_diarization - enable_language_identification type: object ``` * **400**: Invalid request. Error types: * `invalid_request`: Invalid request. Schema (YAML Structural Definition): ```yaml properties: status_code: type: integer error_type: type: string message: type: string validation_errors: items: properties: error_type: type: string location: type: string message: type: string required: - error_type - location - message type: object type: array request_id: type: string required: - status_code - error_type - message - validation_errors - request_id type: object ``` * **401**: Authentication error. Schema (YAML Structural Definition): ```yaml properties: status_code: type: integer error_type: type: string message: type: string validation_errors: items: properties: error_type: type: string location: type: string message: type: string required: - error_type - location - message type: object type: array request_id: type: string required: - status_code - error_type - message - validation_errors - request_id type: object ``` * **500**: Internal server error. 
# Delete transcription

URL: /stt/api-reference/transcriptions/delete_transcription

Permanently deletes a transcription and its associated files. Cannot delete transcriptions that are currently processing.

## Delete transcription

**Endpoint:** `DELETE /v1/transcriptions/{transcription_id}`

Permanently deletes a transcription and its associated files. Cannot delete transcriptions that are currently processing.

### Parameters

* `transcription_id` (path) (Required): ID of the transcription to delete.

### Responses

* **204**: Transcription deleted.

* **401**: Authentication error.

Schema (YAML Structural Definition):

```yaml
properties:
  status_code:
    type: integer
  error_type:
    type: string
  message:
    type: string
  validation_errors:
    items:
      properties:
        error_type:
          type: string
        location:
          type: string
        message:
          type: string
      required:
        - error_type
        - location
        - message
      type: object
    type: array
  request_id:
    type: string
required:
  - status_code
  - error_type
  - message
  - validation_errors
  - request_id
type: object
```

* **404**: Transcription not found. Error types:
  * `transcription_not_found`: Transcription could not be found.

Schema (YAML Structural Definition):

```yaml
properties:
  status_code:
    type: integer
  error_type:
    type: string
  message:
    type: string
  validation_errors:
    items:
      properties:
        error_type:
          type: string
        location:
          type: string
        message:
          type: string
      required:
        - error_type
        - location
        - message
      type: object
    type: array
  request_id:
    type: string
required:
  - status_code
  - error_type
  - message
  - validation_errors
  - request_id
type: object
```

* **409**: Invalid transcription state. Error types:
  * `transcription_invalid_state`:
    * Cannot delete transcription with processing status.

Schema (YAML Structural Definition):

```yaml
properties:
  status_code:
    type: integer
  error_type:
    type: string
  message:
    type: string
  validation_errors:
    items:
      properties:
        error_type:
          type: string
        location:
          type: string
        message:
          type: string
      required:
        - error_type
        - location
        - message
      type: object
    type: array
  request_id:
    type: string
required:
  - status_code
  - error_type
  - message
  - validation_errors
  - request_id
type: object
```

* **500**: Internal server error.

Schema (YAML Structural Definition):

```yaml
properties:
  status_code:
    type: integer
  error_type:
    type: string
  message:
    type: string
  validation_errors:
    items:
      properties:
        error_type:
          type: string
        location:
          type: string
        message:
          type: string
      required:
        - error_type
        - location
        - message
      type: object
    type: array
  request_id:
    type: string
required:
  - status_code
  - error_type
  - message
  - validation_errors
  - request_id
type: object
```
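Under the same assumptions as the sketch above (base URL, bearer auth, `requests`), a deletion sketch; the ID is the example value from these docs. Because the 409 response forbids deleting a transcription that is still processing, the sketch handles that status code separately.

```python
import os

import requests

API_BASE = "https://api.soniox.com"  # assumed base URL for this sketch
headers = {"Authorization": f"Bearer {os.environ['SONIOX_API_KEY']}"}

transcription_id = "73d4357d-cad2-4338-a60d-ec6f2044f721"  # example ID from above

response = requests.delete(
    f"{API_BASE}/v1/transcriptions/{transcription_id}",
    headers=headers,
)
if response.status_code == 409:
    # Cannot delete a transcription with processing status; retry later.
    print("Still processing, not deleted:", response.json()["message"])
else:
    response.raise_for_status()  # 401/404/500 raise an HTTPError
    print("Deleted (204).")
```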
# Get transcription

URL: /stt/api-reference/transcriptions/get_transcription

Retrieves detailed information about a specific transcription.

## Get transcription

**Endpoint:** `GET /v1/transcriptions/{transcription_id}`

Retrieves detailed information about a specific transcription.

### Parameters

* `transcription_id` (path) (Required): ID of the transcription to retrieve.

### Responses

* **200**: Transcription details.

Example (JSON):

```json
{
  "audio_duration_ms": 0,
  "audio_url": "https://soniox.com/media/examples/coffee_shop.mp3",
  "client_reference_id": "some_internal_id",
  "created_at": "2024-11-26T00:00:00Z",
  "error_message": null,
  "error_type": null,
  "file_id": null,
  "filename": "coffee_shop.mp3",
  "id": "73d4357d-cad2-4338-a60d-ec6f2044f721",
  "language_hints": [
    "en",
    "fr"
  ],
  "model": "stt-async-preview",
  "status": "queued",
  "webhook_auth_header_name": "Authorization",
  "webhook_auth_header_value": "******************",
  "webhook_status_code": null,
  "webhook_url": "https://example.com/webhook"
}
```

Schema (YAML Structural Definition):

```yaml
description: A transcription.
properties:
  id:
    description: Unique identifier for the transcription request.
    format: uuid
    type: string
  status:
    description: Transcription status.
    enum:
      - queued
      - processing
      - completed
      - error
    type: string
  created_at:
    description: UTC timestamp indicating when the transcription was created.
    format: date-time
    type: string
  model:
    description: Speech-to-text model used for the transcription.
    type: string
  audio_url:
    anyOf:
      - type: string
      - type: 'null'
    description: URL of the file being transcribed.
  file_id:
    anyOf:
      - format: uuid
        type: string
      - type: 'null'
    description: ID of the file being transcribed.
  filename:
    description: Name of the file being transcribed.
    type: string
  language_hints:
    anyOf:
      - items:
          type: string
        type: array
      - type: 'null'
    description: >-
      Expected languages in the audio. If not specified, languages are
      automatically detected.
  enable_speaker_diarization:
    description: >-
      When `true`, speakers are identified and separated in the transcription
      output.
    type: boolean
  enable_language_identification:
    description: When `true`, language is detected for each part of the transcription.
    type: boolean
  audio_duration_ms:
    anyOf:
      - type: integer
      - type: 'null'
    description: >-
      Duration of the audio in milliseconds. Only available after processing
      begins.
  error_type:
    anyOf:
      - type: string
      - type: 'null'
    description: >-
      Error type if transcription failed. `null` for successful or in-progress
      transcriptions.
  error_message:
    anyOf:
      - type: string
      - type: 'null'
    description: >-
      Error message if transcription failed. `null` for successful or
      in-progress transcriptions.
  webhook_url:
    anyOf:
      - type: string
      - type: 'null'
    description: >-
      URL to receive webhook notifications when transcription is completed or
      fails.
  webhook_auth_header_name:
    anyOf:
      - type: string
      - type: 'null'
    description: Name of the authentication header sent with webhook notifications.
  webhook_auth_header_value:
    anyOf:
      - type: string
      - type: 'null'
    description: >-
      Authentication header value. Always returned masked as
      `******************`.
  webhook_status_code:
    anyOf:
      - type: integer
      - type: 'null'
    description: >-
      HTTP status code received from your server when webhook was delivered.
      `null` if not yet sent.
  client_reference_id:
    anyOf:
      - type: string
      - type: 'null'
    description: Tracking identifier string.
required:
  - id
  - status
  - created_at
  - model
  - filename
  - enable_speaker_diarization
  - enable_language_identification
type: object
```
* **401**: Authentication error.

Schema (YAML Structural Definition):

```yaml
properties:
  status_code:
    type: integer
  error_type:
    type: string
  message:
    type: string
  validation_errors:
    items:
      properties:
        error_type:
          type: string
        location:
          type: string
        message:
          type: string
      required:
        - error_type
        - location
        - message
      type: object
    type: array
  request_id:
    type: string
required:
  - status_code
  - error_type
  - message
  - validation_errors
  - request_id
type: object
```

* **404**: Transcription not found. Error types:
  * `transcription_not_found`: Transcription could not be found.

Schema (YAML Structural Definition):

```yaml
properties:
  status_code:
    type: integer
  error_type:
    type: string
  message:
    type: string
  validation_errors:
    items:
      properties:
        error_type:
          type: string
        location:
          type: string
        message:
          type: string
      required:
        - error_type
        - location
        - message
      type: object
    type: array
  request_id:
    type: string
required:
  - status_code
  - error_type
  - message
  - validation_errors
  - request_id
type: object
```

* **500**: Internal server error.

Schema (YAML Structural Definition):

```yaml
properties:
  status_code:
    type: integer
  error_type:
    type: string
  message:
    type: string
  validation_errors:
    items:
      properties:
        error_type:
          type: string
        location:
          type: string
        message:
          type: string
      required:
        - error_type
        - location
        - message
      type: object
    type: array
  request_id:
    type: string
required:
  - status_code
  - error_type
  - message
  - validation_errors
  - request_id
type: object
```
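Because `status` moves from `queued`/`processing` to a terminal `completed` or `error`, a common pattern is to poll this endpoint until processing finishes. A minimal polling sketch under the same assumptions as the earlier sketches (the two-second interval is an arbitrary choice):

```python
import os
import time

import requests

API_BASE = "https://api.soniox.com"  # assumed base URL for this sketch
headers = {"Authorization": f"Bearer {os.environ['SONIOX_API_KEY']}"}

transcription_id = "73d4357d-cad2-4338-a60d-ec6f2044f721"  # example ID from above

# Poll until the transcription reaches a terminal state.
while True:
    response = requests.get(
        f"{API_BASE}/v1/transcriptions/{transcription_id}",
        headers=headers,
    )
    response.raise_for_status()
    transcription = response.json()
    if transcription["status"] in ("completed", "error"):
        break
    time.sleep(2)  # arbitrary poll interval

if transcription["status"] == "error":
    print(transcription["error_type"], transcription["error_message"])
```

The `webhook_url` field on the create request is the push-based alternative to polling like this.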
# Get transcription transcript

URL: /stt/api-reference/transcriptions/get_transcription_transcript

Retrieves the full transcript text and detailed tokens for a completed transcription. Only available for successfully completed transcriptions.

## Get transcription transcript

**Endpoint:** `GET /v1/transcriptions/{transcription_id}/transcript`

Retrieves the full transcript text and detailed tokens for a completed transcription. Only available for successfully completed transcriptions.

### Parameters

* `transcription_id` (path) (Required): ID of the transcription whose transcript to retrieve.

### Responses

* **200**: Transcription transcript.

Example (JSON):

```json
{
  "id": "19b6d61d-02db-4c25-bc71-b4094dc310c8",
  "text": "Hello",
  "tokens": [
    {
      "confidence": 0.95,
      "end_ms": 90,
      "start_ms": 10,
      "text": "Hel"
    },
    {
      "confidence": 0.98,
      "end_ms": 160,
      "start_ms": 110,
      "text": "lo"
    }
  ]
}
```

Schema (YAML Structural Definition):

```yaml
description: The transcription text.
properties:
  id:
    description: Unique identifier of the transcription this transcript belongs to.
    format: uuid
    type: string
  text:
    description: Complete transcribed text content.
    type: string
  tokens:
    description: List of detailed token information with timestamps and metadata.
    items:
      description: The transcript token.
      example:
        confidence: 0.95
        end_ms: 90
        start_ms: 10
        text: Hel
      properties:
        text:
          description: Token text content.
          type: string
        start_ms:
          description: Start time of the token in milliseconds.
          type: integer
        end_ms:
          description: End time of the token in milliseconds.
          type: integer
        confidence:
          description: Confidence score of the token, between 0.0 and 1.0.
          type: number
        speaker:
          anyOf:
            - type: string
            - type: 'null'
          description: >-
            Speaker identifier. Only present when speaker diarization is
            enabled.
        language:
          anyOf:
            - type: string
            - type: 'null'
          description: >-
            Detected language code for this token. Only present when language
            identification is enabled.
        is_audio_event:
          anyOf:
            - type: boolean
            - type: 'null'
          description: >-
            Boolean indicating if this token represents an audio event. Only
            present when audio event detection is enabled.
        translation_status:
          anyOf:
            - type: string
            - type: 'null'
          description: >-
            Translation status ("none", "original" or "translation"). Only
            present when translation is enabled.
      required:
        - text
        - start_ms
        - end_ms
        - confidence
      type: object
    type: array
required:
  - id
  - text
  - tokens
type: object
```

* **401**: Authentication error.

Schema (YAML Structural Definition):

```yaml
properties:
  status_code:
    type: integer
  error_type:
    type: string
  message:
    type: string
  validation_errors:
    items:
      properties:
        error_type:
          type: string
        location:
          type: string
        message:
          type: string
      required:
        - error_type
        - location
        - message
      type: object
    type: array
  request_id:
    type: string
required:
  - status_code
  - error_type
  - message
  - validation_errors
  - request_id
type: object
```

* **404**: Transcription not found. Error types:
  * `transcription_not_found`: Transcription could not be found.

Schema (YAML Structural Definition):

```yaml
properties:
  status_code:
    type: integer
  error_type:
    type: string
  message:
    type: string
  validation_errors:
    items:
      properties:
        error_type:
          type: string
        location:
          type: string
        message:
          type: string
      required:
        - error_type
        - location
        - message
      type: object
    type: array
  request_id:
    type: string
required:
  - status_code
  - error_type
  - message
  - validation_errors
  - request_id
type: object
```

* **409**: Invalid transcription state. Error types:
  * `transcription_invalid_state`:
    * Can only get transcript with completed status.
    * File transcription has failed.
    * Transcript no longer available.

Schema (YAML Structural Definition):

```yaml
properties:
  status_code:
    type: integer
  error_type:
    type: string
  message:
    type: string
  validation_errors:
    items:
      properties:
        error_type:
          type: string
        location:
          type: string
        message:
          type: string
      required:
        - error_type
        - location
        - message
      type: object
    type: array
  request_id:
    type: string
required:
  - status_code
  - error_type
  - message
  - validation_errors
  - request_id
type: object
```

* **500**: Internal server error.

Schema (YAML Structural Definition):

```yaml
properties:
  status_code:
    type: integer
  error_type:
    type: string
  message:
    type: string
  validation_errors:
    items:
      properties:
        error_type:
          type: string
        location:
          type: string
        message:
          type: string
      required:
        - error_type
        - location
        - message
      type: object
    type: array
  request_id:
    type: string
required:
  - status_code
  - error_type
  - message
  - validation_errors
  - request_id
type: object
```
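A retrieval sketch for the transcript itself, under the same assumptions as the earlier sketches. It prints the full `text` and then walks the token list; `start_ms`, `end_ms`, and `confidence` are defined in the token schema above. Note that this call only succeeds once the transcription status is `completed`.

```python
import os

import requests

API_BASE = "https://api.soniox.com"  # assumed base URL for this sketch
headers = {"Authorization": f"Bearer {os.environ['SONIOX_API_KEY']}"}

transcription_id = "19b6d61d-02db-4c25-bc71-b4094dc310c8"  # example ID from above

response = requests.get(
    f"{API_BASE}/v1/transcriptions/{transcription_id}/transcript",
    headers=headers,
)
response.raise_for_status()  # 409 if the transcription is not yet completed
transcript = response.json()

print(transcript["text"])
for token in transcript["tokens"]:
    # Each token carries millisecond timestamps and a 0.0-1.0 confidence.
    print(token["start_ms"], token["end_ms"], token["text"], token["confidence"])
```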
# Get transcriptions

URL: /stt/api-reference/transcriptions/get_transcriptions

Retrieves a list of transcriptions.

## Get transcriptions

**Endpoint:** `GET /v1/transcriptions`

Retrieves a list of transcriptions.

### Parameters

* `limit` (query): Maximum number of transcriptions to return.
* `cursor` (query): Pagination cursor for the next page of results.

### Responses

* **200**: A list of transcriptions.

Schema (YAML Structural Definition):

```yaml
properties:
  transcriptions:
    description: List of transcriptions.
    items:
      description: A transcription.
      example:
        audio_duration_ms: 0
        audio_url: https://soniox.com/media/examples/coffee_shop.mp3
        client_reference_id: some_internal_id
        created_at: '2024-11-26T00:00:00Z'
        error_message: null
        error_type: null
        file_id: null
        filename: coffee_shop.mp3
        id: 73d4357d-cad2-4338-a60d-ec6f2044f721
        language_hints:
          - en
          - fr
        model: stt-async-preview
        status: queued
        webhook_auth_header_name: Authorization
        webhook_auth_header_value: '******************'
        webhook_status_code: null
        webhook_url: https://example.com/webhook
      properties:
        id:
          description: Unique identifier for the transcription request.
          format: uuid
          type: string
        status:
          description: Transcription status.
          enum:
            - queued
            - processing
            - completed
            - error
          type: string
        created_at:
          description: UTC timestamp indicating when the transcription was created.
          format: date-time
          type: string
        model:
          description: Speech-to-text model used for the transcription.
          type: string
        audio_url:
          anyOf:
            - type: string
            - type: 'null'
          description: URL of the file being transcribed.
        file_id:
          anyOf:
            - format: uuid
              type: string
            - type: 'null'
          description: ID of the file being transcribed.
        filename:
          description: Name of the file being transcribed.
          type: string
        language_hints:
          anyOf:
            - items:
                type: string
              type: array
            - type: 'null'
          description: >-
            Expected languages in the audio. If not specified, languages are
            automatically detected.
        enable_speaker_diarization:
          description: >-
            When `true`, speakers are identified and separated in the
            transcription output.
          type: boolean
        enable_language_identification:
          description: >-
            When `true`, language is detected for each part of the
            transcription.
          type: boolean
        audio_duration_ms:
          anyOf:
            - type: integer
            - type: 'null'
          description: >-
            Duration of the audio in milliseconds. Only available after
            processing begins.
        error_type:
          anyOf:
            - type: string
            - type: 'null'
          description: >-
            Error type if transcription failed. `null` for successful or
            in-progress transcriptions.
        error_message:
          anyOf:
            - type: string
            - type: 'null'
          description: >-
            Error message if transcription failed. `null` for successful or
            in-progress transcriptions.
        webhook_url:
          anyOf:
            - type: string
            - type: 'null'
          description: >-
            URL to receive webhook notifications when transcription is
            completed or fails.
        webhook_auth_header_name:
          anyOf:
            - type: string
            - type: 'null'
          description: Name of the authentication header sent with webhook notifications.
        webhook_auth_header_value:
          anyOf:
            - type: string
            - type: 'null'
          description: >-
            Authentication header value. Always returned masked as
            `******************`.
        webhook_status_code:
          anyOf:
            - type: integer
            - type: 'null'
          description: >-
            HTTP status code received from your server when webhook was
            delivered. `null` if not yet sent.
        client_reference_id:
          anyOf:
            - type: string
            - type: 'null'
          description: Tracking identifier string.
      required:
        - id
        - status
        - created_at
        - model
        - filename
        - enable_speaker_diarization
        - enable_language_identification
      type: object
    type: array
  next_page_cursor:
    anyOf:
      - type: string
      - type: 'null'
    description: >-
      A pagination token that references the next page of results. When more
      data is available, this field contains a value to pass in the cursor
      parameter of a subsequent request. When null, no additional results are
      available.
required:
  - transcriptions
type: object
```

* **400**: Invalid request. Error types:
  * `invalid_cursor`: Invalid cursor parameter.

Schema (YAML Structural Definition):

```yaml
properties:
  status_code:
    type: integer
  error_type:
    type: string
  message:
    type: string
  validation_errors:
    items:
      properties:
        error_type:
          type: string
        location:
          type: string
        message:
          type: string
      required:
        - error_type
        - location
        - message
      type: object
    type: array
  request_id:
    type: string
required:
  - status_code
  - error_type
  - message
  - validation_errors
  - request_id
type: object
```

* **401**: Authentication error.
Schema (YAML Structural Definition):

```yaml
properties:
  status_code:
    type: integer
  error_type:
    type: string
  message:
    type: string
  validation_errors:
    items:
      properties:
        error_type:
          type: string
        location:
          type: string
        message:
          type: string
      required:
        - error_type
        - location
        - message
      type: object
    type: array
  request_id:
    type: string
required:
  - status_code
  - error_type
  - message
  - validation_errors
  - request_id
type: object
```

* **500**: Internal server error.

Schema (YAML Structural Definition):

```yaml
properties:
  status_code:
    type: integer
  error_type:
    type: string
  message:
    type: string
  validation_errors:
    items:
      properties:
        error_type:
          type: string
        location:
          type: string
        message:
          type: string
      required:
        - error_type
        - location
        - message
      type: object
    type: array
  request_id:
    type: string
required:
  - status_code
  - error_type
  - message
  - validation_errors
  - request_id
type: object
```
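Finally, a pagination sketch under the same assumptions as the sketches above: it passes `next_page_cursor` back as the `cursor` query parameter until the field comes back `null`, as the 200 schema describes. The page size of 50 is an arbitrary choice for this sketch.

```python
import os

import requests

API_BASE = "https://api.soniox.com"  # assumed base URL for this sketch
headers = {"Authorization": f"Bearer {os.environ['SONIOX_API_KEY']}"}

cursor = None
while True:
    params = {"limit": 50}  # arbitrary page size
    if cursor:
        params["cursor"] = cursor
    response = requests.get(
        f"{API_BASE}/v1/transcriptions", headers=headers, params=params
    )
    response.raise_for_status()
    page = response.json()
    for transcription in page["transcriptions"]:
        print(transcription["id"], transcription["status"])
    cursor = page.get("next_page_cursor")
    if not cursor:  # null means no additional results
        break
```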