## Soniox docs summary:

Table of contents (links)
- Community and support: https://soniox.com/docs/community-and-support
- FAQ: https://soniox.com/docs/faq
- Introduction: https://soniox.com/docs/
- AI engineering: https://soniox.com/docs/stt/ai-engineering
- Data residency: https://soniox.com/docs/stt/data-residency
- Get started: https://soniox.com/docs/stt/get-started
- Models: https://soniox.com/docs/stt/models
- Security and privacy: https://soniox.com/docs/stt/security-and-privacy
- React Native SDK: https://soniox.com/docs/stt/SDKs/react-native-SDK
- API reference: https://soniox.com/docs/stt/api-reference
- WebSocket API: https://soniox.com/docs/stt/api-reference/websocket-api
- Soniox Live: https://soniox.com/docs/stt/demo-apps/soniox-live
- Async transcription: https://soniox.com/docs/stt/async/async-transcription
- Async translation: https://soniox.com/docs/stt/async/async-translation
- Error handling (async): https://soniox.com/docs/stt/async/error-handling
- Limits & quotas (async): https://soniox.com/docs/stt/async/limits-and-quotas
- Webhooks (async): https://soniox.com/docs/stt/async/webhooks
- Confidence scores: https://soniox.com/docs/stt/concepts/confidence-scores
- Context: https://soniox.com/docs/stt/concepts/context
- Language hints: https://soniox.com/docs/stt/concepts/language-hints
- Language identification: https://soniox.com/docs/stt/concepts/language-identification
- Language restrictions: https://soniox.com/docs/stt/concepts/language-restrictions
- Speaker diarization: https://soniox.com/docs/stt/concepts/speaker-diarization
- Supported languages: https://soniox.com/docs/stt/concepts/supported-languages
- Timestamps: https://soniox.com/docs/stt/concepts/timestamps
- Direct stream: https://soniox.com/docs/stt/guides/direct-stream
- Proxy stream: https://soniox.com/docs/stt/guides/proxy-stream
- Connection keepalive: https://soniox.com/docs/stt/rt/connection-keepalive
- Endpoint detection: https://soniox.com/docs/stt/rt/endpoint-detection
- Error handling (real-time): https://soniox.com/docs/stt/rt/error-handling
- Limits & quotas (real-time): https://soniox.com/docs/stt/rt/limits-and-quotas
- Manual finalization: https://soniox.com/docs/stt/rt/manual-finalization
- Real-time transcription: https://soniox.com/docs/stt/rt/real-time-transcription
- Real-time translation: https://soniox.com/docs/stt/rt/real-time-translation
- Community integrations: https://soniox.com/docs/stt/integrations/community-integrations
- Integrations overview: https://soniox.com/docs/stt/integrations
- LiveKit: https://soniox.com/docs/stt/integrations/livekit
- n8n: https://soniox.com/docs/stt/integrations/n8n
- Pipecat: https://soniox.com/docs/stt/integrations/pipecat
- TanStack AI SDK: https://soniox.com/docs/stt/integrations/tanstack-ai-sdk
- Twilio: https://soniox.com/docs/stt/integrations/twilio
- Vercel AI SDK: https://soniox.com/docs/stt/integrations/vercel-ai-sdk
- Node SDK async transcription: https://soniox.com/docs/stt/SDKs/node-SDK/async-transcription
- Node SDK files: https://soniox.com/docs/stt/SDKs/node-SDK/files
- Node SDK overview: https://soniox.com/docs/stt/SDKs/node-SDK
- Node SDK realtime transcription: https://soniox.com/docs/stt/SDKs/node-SDK/realtime-transcription
- Node SDK webhooks: https://soniox.com/docs/stt/SDKs/node-SDK/webhooks
- Python SDK async transcription: https://soniox.com/docs/stt/SDKs/python-SDK/async-transcription
- Python SDK files: https://soniox.com/docs/stt/SDKs/python-SDK/files
- Python SDK overview: https://soniox.com/docs/stt/SDKs/python-SDK
- Python SDK realtime transcription: https://soniox.com/docs/stt/SDKs/python-SDK/realtime-transcription
- Python SDK sync vs async: https://soniox.com/docs/stt/SDKs/python-SDK/sync-vs-async-clients
- Python SDK webhooks: https://soniox.com/docs/stt/SDKs/python-SDK/webhooks
- React SDK: https://soniox.com/docs/stt/SDKs/react-SDK
- React SDK realtime transcription: https://soniox.com/docs/stt/SDKs/react-SDK/realtime-transcription
- Web SDK: https://soniox.com/docs/stt/SDKs/web-SDK
- Web SDK realtime transcription: https://soniox.com/docs/stt/SDKs/web-SDK/realtime-transcription
- LangChain.js (JavaScript): https://soniox.com/docs/stt/integrations/langchain/langchain-js
- LangChain (Python): https://soniox.com/docs/stt/integrations/langchain/langchain
- Node SDK classes & reference: https://soniox.com/docs/stt/SDKs/node-SDK/reference
- Node SDK full reference: https://soniox.com/docs/stt/SDKs/node-SDK/reference
- Node SDK types: https://soniox.com/docs/stt/SDKs/node-SDK/reference/types
- Python SDK Async Client reference: https://soniox.com/docs/stt/SDKs/python-SDK/Full-SDK-reference/async_client
- Python SDK Realtime Client reference: https://soniox.com/docs/stt/SDKs/python-SDK/Full-SDK-reference/realtime_client
- Get models (REST API): https://soniox.com/docs/stt/api-reference/models/get_models
- Auth - create temporary API key: https://soniox.com/docs/stt/api-reference/auth/create_temporary_api_key
- Create transcription: https://soniox.com/docs/stt/api-reference/transcriptions/create_transcription
- Delete transcription: https://soniox.com/docs/stt/api-reference/transcriptions/delete_transcription
- Get transcription: https://soniox.com/docs/stt/api-reference/transcriptions/get_transcription
- Get transcription transcript: https://soniox.com/docs/stt/api-reference/transcriptions/get_transcription_transcript
- Get transcriptions: https://soniox.com/docs/stt/api-reference/transcriptions/get_transcriptions
- Delete file: https://soniox.com/docs/stt/api-reference/files/delete_file
- Get file: https://soniox.com/docs/stt/api-reference/files/get_file
- Get file URL: https://soniox.com/docs/stt/api-reference/files/get_file_url
- Get files: https://soniox.com/docs/stt/api-reference/files/get_files
- Upload file: https://soniox.com/docs/stt/api-reference/files/upload_file

------

### Community and support
URL: /community-and-support
- Support tiers:
  - Free: community Discord.
  - Business: priority email and onboarding (contact sales@soniox.com).
  - Enterprise: dedicated channels, SLAs (contact sales@soniox.com).
- GitHub: official SDKs and integrations tracked at https://github.com/soniox.
- Website for product/pricing: https://soniox.com/.

### FAQ
URL: /faq
- Common issues:
  - High WebSocket connection times: caused by network latency, wrong region, or a large initial context. Recommendation: buffer audio locally and send the buffered audio immediately after sending the start/config message.
  - Increase concurrent requests: request via Soniox Console Org Limits (review takes 1–3 business days).
  - Compliance docs (MSA, DPA): available for Business & Enterprise; contact sales@soniox.com.

### Introduction
URL: /
- Soniox: real-time & async speech-to-text and translation, 60+ languages, REST + WebSocket + SDKs.
- Before starting: create an account at the Console (https://console.soniox.com/) and generate API keys per project.

### AI engineering
URL: /stt/ai-engineering
- Tools: MCP server for editor integration, AI assistant embedded in docs, LLM context files.
- MCP server example config: `{ "soniox-docs": { "command": "npx", "args": ["-y", "mcp-remote", "https://soniox.com/docs/api/mcp/mcp"] } }`
- LLM context files: /llms.txt (core), /llms-full.txt (extended).
- Copy/Open buttons allow sending docs to ChatGPT/Claude.

### Data residency
URL: /stt/data-residency
- Content (audio + transcripts) is NOT used to train models.
- Per-project region selection: content is processed & stored inside the region.
- System data (account metadata, billing) may be processed outside the region.
- Regional endpoints:
  - US: api.soniox.com, stt-rt.soniox.com
  - EU: api.eu.soniox.com, stt-rt.eu.soniox.com
  - JP: api.jp.soniox.com, stt-rt.jp.soniox.com
- Contact sales@soniox.com to enable regional deployments.

### Get started
URL: /stt/get-started
- Quick path:
  1. Create a Soniox account and get an API key (project-level): `export SONIOX_API_KEY=<YOUR_API_KEY>`
  2. Clone the examples repo: github.com/soniox/soniox_examples
  3. Run sample scripts (Python/Node); examples are provided for real-time and async.
- Examples (CLI snippets): see quick commands for Python and Node folders (real-time and async).
- Next steps: real-time API and async API pages.

### Models
URL: /stt/models
- Current active models:
  - stt-rt-v4 (real-time)
  - stt-async-v4 (async)
  - stt-rt-v3, stt-async-v3 (active; routing to v4 after deprecation date)
- Alias stability: e.g., stt-rt-preview-v2 → stt-rt-v3
- Changelog: v4 real-time/async announcements, v3 release notes; v3/v4 compatibility claimed (replace model name in requests).
- Key capabilities: improved accuracy, longer duration (up to 5 hours), better multilingual switching.

### Security and privacy
URL: /stt/security-and-privacy
- Certifications: SOC 2 Type 2, ISO/IEC 27001:2022, GDPR, HIPAA.
- Data handling:
  - No model training on customer content.
  - No retention unless stored (async storage or when user opts in).
  - Deletion via Console or API.
- Logging: minimal, no raw audio or transcripts in logs.
- Encryption: TLS 1.2+; stored data access restricted by API keys.

### React Native SDK
URL: /stt/SDKs/react-native-SDK
- Uses @soniox/react + @soniox/client.
- Install: `npm install @soniox/react @soniox/client`
- Client must fetch temporary API keys from a server (avoid exposing the API key).
- Implement the AudioSource interface: start(handlers) → call handlers.onData(ArrayBuffer), onError, onMuted/onUnmuted; stop() releases resources.
- Example: wrap a custom MyAudioSource and use the useRecording hook: `useRecording({ model:"stt-rt-v4", audio_format:"pcm_s16le", sample_rate:16000, num_channels:1, source })`
- Provide SonioxProvider with an apiKey fetching function to obtain temporary keys from a server endpoint.

### API reference (overview)
URL: /stt/api-reference
- REST API base: https://api.soniox.com/v1 (OpenAPI: /v1/openapi.json).
- Auth API: create temporary API key
- Files API: upload, list, get, delete, get URL
- Models API: get models
- Transcriptions API: create/list/get/delete transcriptions and transcripts
- WebSocket API: real-time streaming at wss://stt-rt.soniox.com/transcribe-websocket (region endpoints available as above).

### WebSocket API
URL: /stt/api-reference/websocket-api
- Endpoint: wss://stt-rt.soniox.com/transcribe-websocket (for regional deployments use `stt-rt.<region>.soniox.com`).
- Start session: send the initial JSON config (text frame) before any binary audio frames. Example start message (key fields):

      {
        "api_key": "<SONIOX_API_KEY>",
        "model": "stt-rt-v4",
        "audio_format": "auto",
        "language_hints": ["en", "es"],
        "context": {
          "general": [{"key": "domain", "value": "Healthcare"}],
          "text": "...",
          "terms": ["Celebrex"],
          "translation_terms": [{"source": "Mr. Smith", "target": "Sr. Smith"}]
        },
        "enable_speaker_diarization": true,
        "enable_language_identification": true,
        "translation": { "type": "two_way", "language_a": "en", "language_b": "es" },
        "enable_endpoint_detection": true,
        "max_endpoint_delay_ms": 2000,
        "client_reference_id": "optional-id"
      }

- Audio streaming:
  - After start, stream binary frames for audio chunks.
  - For raw PCM formats, also set sample_rate and num_channels.
  - End-of-stream: send an empty WebSocket frame ("" or empty binary); the server sends a finished response, then closes.
- Responses: JSON objects with tokens[], final_audio_proc_ms, total_audio_proc_ms, and possibly finished or error fields. Token fields:
  - text (string)
  - start_ms, end_ms (ms): present for spoken tokens (not translation tokens)
  - confidence (0.0–1.0)
  - is_final (bool)
  - speaker (string) if diarization enabled
  - translation_status ("none"|"original"|"translation")
  - language, source_language (when applicable)
- Finished response: `{ "tokens": [], "final_audio_proc_ms": <number>, "total_audio_proc_ms": <number>, "finished": true }`
- Error response (server closes the connection afterwards): `{ "tokens": [], "error_code": <number>, "error_message": "..." }`
- Start-message parameters (summary):
  - api_key (string) required
  - model (string) required
  - audio_format (string) required (e.g., "auto" or pcm encoding)
  - num_channels, sample_rate required for raw audio
  - language_hints: array
  - language_hints_strict: bool
  - context: object (general/text/terms/translation_terms)
  - enable_speaker_diarization: bool
  - enable_language_identification: bool
  - enable_endpoint_detection: bool
  - max_endpoint_delay_ms: number (500..3000; default 2000)
  - client_reference_id: string
  - translation object:
    - one_way: `{ type:"one_way", target_language:"es" }`
    - two_way: `{ type:"two_way", language_a:"en", language_b:"es" }`
- Audio formats: auto-detected container formats (mp3/wav/ogg/etc) or raw PCM/float/μ-law. For raw formats provide sample_rate & num_channels.
- Error codes (common categories):
  - 400 Bad request: missing audio_format, invalid model, audio too long, malformed start message, etc.
  - 401 Unauthorized: invalid or missing API key
  - 402 Payment required: balance/budget exhausted
  - 408 Request timeout: no start message or no audio in time
  - 429 Too many requests: rate or concurrency limits exceeded
  - 500 Internal server error
  - 503 Service unavailable: cannot continue request, "Cannot continue request (code N). Please restart the request."
- Keepalive control message to stay connected with no audio: `{"type":"keepalive"}`; send at least once every 20 seconds when idle.

### Real-time token behavior and endpoint detection
- Tokens are delivered as non-final, then re-emitted as final; final tokens never change once emitted as final.
- Endpoint detection (enable_endpoint_detection=true):
  - When the model detects the end of an utterance, all preceding tokens are emitted with is_final=true and a "<end>" token is returned (final).
  - Use max_endpoint_delay_ms to reduce latency.
- Manual finalization:
  - Send the `{"type":"finalize"}` control message to force finalization. The server returns a "<fin>" token as a marker.
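
The start message, control messages, and token behavior described above can be sketched in plain Python without opening a socket. This is a minimal illustration under assumptions, not the SDK's implementation: `build_start_message` and `TranscriptState` are hypothetical helpers, the marker tokens are assumed to be the literal strings `<end>` and `<fin>`, and non-final tokens are assumed to be fully replaced by each new response.

```python
import json

# Assumption: endpoint detection and manual finalization are signalled by
# these literal marker tokens.
MARKER_TOKENS = {"<end>", "<fin>"}

# Control messages, sent as text frames on an open session.
KEEPALIVE_MESSAGE = json.dumps({"type": "keepalive"})
FINALIZE_MESSAGE = json.dumps({"type": "finalize"})


def build_start_message(api_key, model="stt-rt-v4", **options):
    """Return the JSON text frame that opens a session (hypothetical helper)."""
    config = {"api_key": api_key, "model": model, "audio_format": "auto"}
    config.update(options)
    # Raw PCM formats also need sample_rate and num_channels.
    if config["audio_format"].startswith("pcm"):
        missing = [k for k in ("sample_rate", "num_channels") if k not in config]
        if missing:
            raise ValueError(f"raw PCM audio also needs: {', '.join(missing)}")
    return json.dumps(config)


class TranscriptState:
    """Accumulates final tokens and tracks the latest non-final tokens."""

    def __init__(self):
        self.final_tokens = []
        self.non_final_tokens = []
        self.finished = False

    def apply(self, response):
        """Apply one parsed response; return True if a marker token was seen."""
        if "error_code" in response:
            raise RuntimeError(
                f"{response['error_code']}: {response.get('error_message', '')}"
            )
        # Assumption: each response supersedes earlier non-final tokens.
        self.non_final_tokens = []
        saw_marker = False
        for token in response.get("tokens", []):
            if token["text"] in MARKER_TOKENS:
                saw_marker = True
            elif token.get("is_final"):
                self.final_tokens.append(token)
            else:
                self.non_final_tokens.append(token)
        self.finished = bool(response.get("finished"))
        return saw_marker

    def text(self):
        """Current best transcript: final text followed by provisional text."""
        tokens = self.final_tokens + self.non_final_tokens
        return "".join(t["text"] for t in tokens)
```

A voice agent would typically call `apply` on every parsed response and treat a `True` return value as the signal to hand `text()` to downstream processing.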

### Soniox Live (demo app)
URL: /stt/demo-apps/soniox-live
- Demo: browser & React Native examples streaming mic audio to the WebSocket with server-issued temporary keys.
- Architecture: the server issues temporary API keys (keeps the secret key server-side); the client requests a key and connects directly to Soniox.

### Async transcription
URL: /stt/async/async-transcription
- Modes:
  - audio_url: public URL
  - file_id: uploaded via Files API (upload first)
- Supported file formats: aac, aiff, amr, asf, flac, mp3, ogg, wav, webm, m4a, mp4
- Create transcription: POST /v1/transcriptions (model + audio_url or file_id); optional language_hints, enable_language_identification, enable_speaker_diarization, context, translation, webhook_url, client_reference_id.
- Polling vs webhooks: poll GET /v1/transcriptions/{id}, or set webhook_url to get a POST callback when completed.
- SDK convenience: Node/Python SDKs provide combined upload + transcribe + wait helpers and cleanup options.

### Async translation
URL: /stt/async/async-translation
- Same as async transcription, but include translation in the CreateTranscription config:
  - one_way: `{ type:"one_way", target_language:"es" }`
  - two_way: `{ type:"two_way", language_a:"en", language_b:"es" }`
- SDK examples include options to wait and fetch the transcript or receive it via webhook.

### Async error handling
URL: /stt/async/error-handling
- File upload errors: file duration > 300 minutes fails; watch storage & file count quotas.
- Transcription request errors: max 100 pending; total <= 2,000.
- Webhook failures: Soniox retries automatically; if permanently failed, you can fetch the result using the transcription ID: `curl https://api.soniox.com/v1/transcriptions/<transcription_id> -H "Authorization: Bearer $SONIOX_API_KEY"`

### Limits & quotas (async)
URL: /stt/async/limits-and-quotas
- File limits:
  - Total storage: 10 GB (default)
  - Max uploaded files: 1,000
  - File duration: 300 minutes (fixed)
- Transcription limits:
  - Pending transcriptions: 100
  - Total transcriptions (pending+completed+failed): 2,000
- Request increases via Soniox Console.

### Webhooks (async)
URL: /stt/async/webhooks
- Set webhook_url in the CreateTranscription config. Soniox POSTs when the transcription is completed or errored: `{ "id": "<transcription_id>", "status": "completed" }` (or "error")
- Secure webhooks: set webhook_auth_header_name and webhook_auth_header_value in the request; Soniox includes that header on the callback.
- Add metadata via query params appended to webhook_url.
- Retry behavior: Soniox retries multiple times; if permanently failed, fetch the result manually. Log transcription IDs.

### Confidence scores
URL: /stt/concepts/confidence-scores
- Each token includes confidence (0.0–1.0).
- Use it to flag uncertain words or trigger verification flows.

### Context
URL: /stt/concepts/context
- Use structured context to improve transcription & translation:
  - general: [{key,value}, ...]: short structured metadata (domain, topic, names).
  - text: long free-form background text (documents, notes).
  - terms: list of domain-specific tokens (vocab).
  - translation_terms: [{source,target}, ...] to force/guide translations.
- Limit: approx 8,000 tokens (~10,000 chars). Contexts that exceed the limit may cause errors.
- Tips: use general first, small term lists, translation_terms only for translation.

### Language hints
URL: /stt/concepts/language-hints
- language_hints: array of ISO codes (e.g., ["en","es"]) that biases recognition (not strict unless language_hints_strict=true).
- When you know the likely languages, provide hints to improve accuracy.
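
The context, language-hint, and webhook options above all end up in one create-transcription request body. A minimal sketch of assembling and sanity-checking that body follows; `build_transcription_request` is a hypothetical helper (not part of any SDK), and measuring context size by the length of its serialized JSON is only a rough stand-in for the approximate 10,000-character limit.

```python
import json

# Rough character budget quoted for context (~8,000 tokens).
MAX_CONTEXT_CHARS = 10_000


def build_transcription_request(model, audio_url=None, file_id=None,
                                language_hints=None, strict_hints=False,
                                context=None, webhook_url=None,
                                webhook_auth=None):
    """Return a dict suitable as the JSON body of POST /v1/transcriptions."""
    # Exactly one audio source must be given.
    if (audio_url is None) == (file_id is None):
        raise ValueError("provide exactly one of audio_url or file_id")

    body = {"model": model}
    if audio_url is not None:
        body["audio_url"] = audio_url
    else:
        body["file_id"] = file_id

    if language_hints:
        body["language_hints"] = list(language_hints)
        if strict_hints:
            body["language_hints_strict"] = True

    if context:
        # Approximation: serialized length as a proxy for the context limit.
        size = len(json.dumps(context, ensure_ascii=False))
        if size > MAX_CONTEXT_CHARS:
            raise ValueError(f"context is ~{size} chars; limit is ~{MAX_CONTEXT_CHARS}")
        body["context"] = context

    if webhook_url:
        body["webhook_url"] = webhook_url
        if webhook_auth:
            name, value = webhook_auth
            body["webhook_auth_header_name"] = name
            body["webhook_auth_header_value"] = value
    return body
```

For example, `build_transcription_request("stt-async-v4", file_id="...", language_hints=["en", "es"])` yields a body that can be serialized with `json.dumps` and sent with any HTTP client.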

### Language identification
URL: /stt/concepts/language-identification
- enable_language_identification: tokens include a language field; tagging is token-level, but the model maintains sentence-level coherence.
- Real-time: labels may be revised as more context arrives.

### Language restrictions
URL: /stt/concepts/language-restrictions
- Use language_hints + language_hints_strict=true to strongly prefer the specified languages (works best with a single language). Not a hard guarantee.

### Speaker diarization
URL: /stt/concepts/speaker-diarization
- enable_speaker_diarization: tokens include a speaker field. Up to 15 speakers supported.
- Async mode provides higher diarization accuracy than real-time.

### Supported languages
URL: /stt/concepts/supported-languages
- 60+ languages supported; list provided (af, sq, ar, ..., cy). See the models GET endpoint for a programmatic list.

### Timestamps
URL: /stt/concepts/timestamps
- Tokens include start_ms and end_ms (milliseconds) by default for spoken tokens (not translation tokens).

### Guides: Direct stream & Proxy stream
- Direct stream (browser → Soniox):
  - The server issues a temporary API key via the REST endpoint: POST https://api.soniox.com/v1/auth/temporary-api-key with body `{ "usage_type":"transcribe_websocket", "expires_in_seconds":60 }`
  - The client obtains the temp key, constructs RecordTranscribe or a Web SDK recording, and connects directly to Soniox.
  - Example server: FastAPI or Node Express that POSTs to /v1/auth/temporary-api-key and returns the temporary api_key.
  - The client HTML example uses RecordTranscribe from @soniox/speech-to-text-web and starts/stops recording.
- Proxy stream:
  - Client → your proxy WebSocket server → your server connects to the Soniox WebSocket and relays audio & responses.
  - Example servers in Python (websockets) and Node (ws) included.
  - Use when you must inspect/transform/store audio or apply server-side logic before Soniox.

### Connection keepalive (real-time)
URL: /stt/rt/connection-keepalive
- Send `{"type":"keepalive"}` when no audio has been sent to keep the session active.
- Send at least every 20s when idle; 5–10s is common.
- Billing: charged for the full session duration (not just audio processed).

### Endpoint detection
URL: /stt/rt/endpoint-detection
- enable_endpoint_detection: the model emits an "<end>" token when it determines the utterance has ended; preceding tokens are finalized.
- Use for voice agents to trigger downstream processing.
- max_endpoint_delay_ms controls maximum latency (500–3000ms).

### Real-time error handling
URL: /stt/rt/error-handling
- All errors are returned as a JSON error response before the connection closes; log error_code & error_message.
- If the server returns "Cannot continue request (code N)" (503), start a new session immediately.
- Audio must be streamed near real-time; long stalls may cause timeouts.

### Real-time limits & quotas
URL: /stt/rt/limits-and-quotas
- Requests per minute: 100
- Concurrent requests: 10
- Stream duration: 300 minutes per session (fixed)

### Manual finalization
URL: /stt/rt/manual-finalization
- Send `{"type":"finalize"}` to force finalization of the current audio; the server returns a "<fin>" token.
- Best practice: finalize after ~200 ms silence; don’t finalize too often.
- You may continue streaming after finalize.

### Real-time transcription (concept + examples)
URL: /stt/rt/real-time-transcription
- The model emits tokens as provisional (is_final:false), then final (is_final:true).
- Audio processing metrics:
  - final_audio_proc_ms: ms of audio finalized
  - total_audio_proc_ms: ms of audio processed (final + non-final)
- Raw PCM configuration example: `{ "audio_format":"pcm_s16le", "sample_rate":16000, "num_channels":1 }`
- Sample SDK examples provided for Python/Node (connect, stream audio, handle tokens and events). See SDK sections below.

### Real-time translation
URL: /stt/rt/real-time-translation
- Modes:
  - one_way: translate all spoken audio into a single target language
  - two_way: translate between language_a and language_b
- Tokens include translation_status ("original" or "translation") and source_language for translations.
- Spoken tokens include timestamps; translated tokens do not include timestamps.

### Integrations & community
- LiveKit: official plugin; configure STT with Soniox API key; optional base_url override for regional endpoints.
- n8n: Soniox node supports Create/Get/Delete operations, webhook support, auto-cleanup options.
- Pipecat: SonioxSTTService to add STT to Pipecat pipelines (pipecat-ai[soniox] package).
- TanStack AI SDK & Vercel AI SDK: Soniox adapter/provider examples and quick usage.
- Twilio: example repo for streaming Twilio calls to Soniox (use TwiML and proxy/server).
- LangChain (JS & Python): Soniox document loaders for transcription to feed LLM pipelines.

### SDKs: Node, Python, React, Web (quick condensed reference)

### Node SDK (npm @soniox/node)
- Install: `npm install @soniox/node`
- Env: `export SONIOX_API_KEY=<YOUR_API_KEY>`
- Key classes & methods:
  - `client = new SonioxNodeClient()`
  - `client.files.upload(file, { filename })`
  - `client.stt.transcribe({ model:'stt-async-v4', file, filename, wait:true, cleanup:['file','transcription'] })`
  - `client.stt.list()`, `client.stt.get(id)`, `client.stt.delete(id)`, `client.stt.getTranscript(id)`
  - `client.auth.createTemporaryKey({ usage_type:'transcribe_websocket', expires_in_seconds })`
- Real-time: `session = client.realtime.stt(config); session.connect(); session.sendStream(iterator, {pace_ms:120, finish:true}); session.on('result',...)`.
- Webhook helpers: client.webhooks.handleExpress(req) etc. return lazy fetch helpers fetchTranscript() / fetchTranscription().

### Python SDK (soniox)
- Install: `pip install soniox`
- Clients:
  - SonioxClient (sync) or AsyncSonioxClient (async)
- Basic usage:
  - `client = SonioxClient()`
  - `trans = client.stt.transcribe(audio_url='https://...', model='stt-async-v4'); client.stt.wait(trans.id); transcript = client.stt.get_transcript(trans.id)`
  - `client.files.upload('audio.mp3'); client.files.list(); client.files.delete(id)`
- Realtime: `with client.realtime.stt.connect(config) as session:` followed by WebSocket streaming logic (examples provided)
- Temporary API keys: `client.auth.create_temporary_api_key(expires_in_seconds=3600,...)`
- Webhook helpers exist (unwrap, verify_signature).

### Web SDK (@soniox/client)
- Browser-first: create `SonioxClient({ api_key: async ()=>fetch('/tmp-key') })`
- `recording = client.realtime.record({ model:'stt-rt-v4' })`: high-level recording with events (result, endpoint, connected, error, finalized, finished).
- Use SonioxProvider & useRecording in React (@soniox/react) for hooks-based integration.

### React SDK (@soniox/react)
- Install: `npm install @soniox/react`
- SonioxProvider wraps the app (apiKey fetcher or pre-built client).
- The useRecording hook returns reactive state and control methods (start/stop/pause/resume/finalize).
- Grouping: groupBy is auto-set for translations: one_way → groupBy 'translation' (original + translation), two_way → groupBy 'language' (en/es).
- Permission helpers: useMicrophonePermission; useAudioLevel for VU meter.

### LangChain loaders
- @soniox/langchain (JS) and langchain-soniox (Python) provide document loaders that transcribe audio and return Document objects with pageContent and metadata (tokens, speakers, languages).
- Options: model, translation, language_hints, context, pollingIntervalMs, timeout.
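
The grouping idea described for the React SDK can be illustrated outside React. The sketch below is not the SDK's implementation: `group_tokens` and `default_group_key` are hypothetical helpers, and mapping the 'translation' and 'language' groupings to the token fields `translation_status` and `language` is an assumption.

```python
def default_group_key(translation_type):
    """Pick a token field to group by (assumed mapping, see lead-in)."""
    return {
        "one_way": "translation_status",  # original vs. translation
        "two_way": "language",            # e.g. en vs. es
    }.get(translation_type)


def group_tokens(tokens, key):
    """Concatenate token text per distinct value of `key`, in arrival order."""
    groups = {}
    for token in tokens:
        groups.setdefault(token.get(key, "unknown"), []).append(token["text"])
    return {name: "".join(parts) for name, parts in groups.items()}
```

With a one-way translation session, `group_tokens(tokens, default_group_key("one_way"))` would return one string for the original speech and one for its translation.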

### API endpoints (concise reference: required inputs & outputs)

1) Get models
- GET /v1/models
- Response 200: `{ models: [ { id, name, context_version, transcription_mode, languages:[{code,name}], one_way_translation, two_way_translation, translation_targets, two_way_translation_pairs, aliased_model_id } ] }`

2) Create temporary API key
- POST /v1/auth/temporary-api-key
- Body JSON: `{ "usage_type":"transcribe_websocket", "expires_in_seconds": <1..3600>, "client_reference_id": "<string>" }`
- Response 201: `{ "api_key":"temp:...", "expires_at":"ISO timestamp" }`

3) Create transcription (async)
- POST /v1/transcriptions
- Required body fields: model (string)
- Exactly one audio source: audio_url (public URL) OR file_id (uploaded file uuid)
- Optional fields:
  - language_hints: array of ISO codes
  - language_hints_strict: bool
  - enable_speaker_diarization: bool
  - enable_language_identification: bool
  - translation: `{ type:"one_way", target_language }` OR `{ type:"two_way", language_a, language_b }`
  - context: general/text/terms/translation_terms
  - webhook_url, webhook_auth_header_name, webhook_auth_header_value
  - client_reference_id
- Response 201: transcription metadata: `{ id, status: queued|processing|completed|error, created_at, model, filename, file_id, audio_url, language_hints, enable_speaker_diarization, enable_language_identification, audio_duration_ms, error_type, error_message, webhook_url, webhook_auth_header_name, webhook_auth_header_value (masked), webhook_status_code, client_reference_id }`

4) Delete transcription
- DELETE /v1/transcriptions/{transcription_id}
- Responses:
  - 204 No Content (deleted)
  - 404 transcription_not_found
  - 409 transcription_invalid_state (e.g., processing)
  - 401 unauthorized

5) Get transcription
- GET /v1/transcriptions/{transcription_id}
- Response 200: same transcription metadata as create.

6) Get transcription transcript
- GET /v1/transcriptions/{transcription_id}/transcript
- Only for completed transcriptions
- Response 200: `{ id, text, tokens: [ { text, start_ms, end_ms, confidence, speaker?, language?, is_audio_event?, translation_status? } ] }`
- Errors: 404 transcription_not_found, 409 invalid_state if not completed, 401/500 as usual.

7) List transcriptions
- GET /v1/transcriptions?limit=&cursor=
- Response 200: `{ transcriptions: [ transcription objects ], next_page_cursor: string|null }`

Files API:

8) Upload file
- POST /v1/files (multipart/form-data)
- Fields: file (binary) required; client_reference_id optional (max 256)
- Response 201: `{ id (uuid), filename, size, created_at, client_reference_id? }`
- Errors: 400 invalid_request (file too large > 524,288,000 bytes), 401, 500

9) Get file metadata
- GET /v1/files/{file_id}
- Response 200: `{ id, filename, size, created_at, client_reference_id? }`
- Errors: 404 file_not_found, 401, 500

10) Get file temporary URL
- GET /v1/files/{file_id}/url
- Response 200: `{ url: "temporary presigned url" }` (valid for ~1 hour)
- Errors: 404 file_not_found, 401, 500

11) List files
- GET /v1/files?limit=&cursor=
- Response 200: `{ files: [file objects], next_page_cursor: string|null }`

12) Delete file
- DELETE /v1/files/{file_id}
- Response 204 No Content
- Errors: 404 file_not_found, 401, 500

### SDK examples (key snippets)
- Create temporary API key (Node):

      const { api_key, expires_at } = await client.auth.createTemporaryKey({ usage_type: 'transcribe_websocket', expires_in_seconds: 300 });

- Node SDK async transcription (upload + transcribe + wait + get transcript):

      const client = new SonioxNodeClient();
      const file = await client.files.upload(audioBuffer, { filename: 'audio.mp3' });
      const transcription = await client.stt.transcribe({ model: 'stt-async-v4', file_id: file.id, wait: true });
      const transcript = await client.stt.getTranscript(transcription.id);

- Python SDK async transcription (sync):

      client = SonioxClient()
      transcription = client.stt.transcribe(audio_url='https://.../coffee_shop.mp3', model='stt-async-v4')
      client.stt.wait(transcription.id)
      transcript = client.stt.get_transcript(transcription.id)

- WebSocket start message example (JSON shown earlier). Server responses deliver tokens with fields text, start_ms, end_ms, confidence, is_final, speaker, language, translation_status.
- Browser direct-stream HTML example: the client fetches a temporary API key from /temporary-api-key and uses RecordTranscribe (or @soniox/client) to start/stop and render final + non-final tokens. (See the direct stream index.html example in the repo.)
- Proxy stream example: the client sends MediaRecorder chunks via WebSocket to the proxy; the proxy forwards them to the Soniox WebSocket and relays responses back.

### Real-time best practices & tips
- Buffer audio locally and only open the WebSocket when you are ready; send the buffered audio immediately after sending the start message.
- Use language_hints when possible to improve accuracy.
- Use context to improve domain-specific recognition (terms, general, translation_terms).
- For agent workflows:
  - Use enable_endpoint_detection or manual finalize() to detect end-of-utterance.
  - Pause the session while the agent speaks; session.pause() sends keepalives automatically, but you are charged for the full open session.
- For low-latency translation: use the real-time translation config; translation tokens stream after transcription tokens; translations have no timestamps.

### Errors & recovery summary
- Real-time: the server returns structured error JSON and closes the connection; inspect error_code and error_message, then reconnect.
- Common error causes: invalid API key, invalid model, missing audio format for raw audio, rate/quota exceeded, timeout, too-long audio.
- When 503 "Cannot continue request (code N)" occurs, start a new session and resume streaming.
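
The recovery guidance above can be turned into a small reconnect policy. The following is a sketch under stated assumptions rather than documented behavior: only the "503 means start a new session immediately" rule comes from the summary; treating 408, 429, and 500 as retryable with exponential backoff, and all other codes as fatal, is an assumption. `SessionError`, `recovery_delay`, and `run_with_reconnect` are hypothetical names, and `run_session` stands in for whatever function opens and drives one streaming session.

```python
import time


class SessionError(Exception):
    """Raised by a session runner when the server sends an error response."""

    def __init__(self, code, message=""):
        super().__init__(f"{code}: {message}")
        self.code = code


RETRY_IMMEDIATELY = {503}            # "Cannot continue request": new session now
RETRY_WITH_BACKOFF = {408, 429, 500}  # assumption: transient, worth retrying


def recovery_delay(code, attempt, base_delay=1.0, max_delay=30.0):
    """Return seconds to wait before reconnecting, or None to give up."""
    if code in RETRY_IMMEDIATELY:
        return 0.0
    if code in RETRY_WITH_BACKOFF:
        return min(max_delay, base_delay * 2 ** attempt)
    return None  # e.g. 400/401/402: retrying the same request will not help


def run_with_reconnect(run_session, max_attempts=5, sleep=time.sleep):
    """Call run_session() until it succeeds or the error is not recoverable."""
    for attempt in range(max_attempts):
        try:
            return run_session()
        except SessionError as err:
            delay = recovery_delay(err.code, attempt)
            if delay is None or attempt == max_attempts - 1:
                raise
            sleep(delay)
```

Passing `sleep` as a parameter keeps the policy testable without real waiting; production code would simply use the default.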

### References & repos
- Examples: https://github.com/soniox/soniox_examples
- Node SDK: https://github.com/soniox/soniox-js, npm @soniox/node
- React + Web SDK: @soniox/react, @soniox/client
- Python SDK: https://github.com/soniox/soniox-python, pip soniox
- Integrations: LiveKit plugin, n8n node, TanStack adapter, Vercel provider, LangChain loaders.

## Soniox website summary:

### Understand every word, everywhere.
- Soniox is a real-time speech-to-text and speech-translation platform (App + API) built for applications, voice agents, live systems and end users.
- Core capabilities: streaming transcription, streaming any-to-any translation, speaker detection, semantic endpointing, domain-aware context, and support for mixed-language speech.
- Coverage: 60+ languages, 3,600+ translation pairs.

### One speech platform. Two ways to use it.
- Build with the Soniox API: real-time streaming, async (file) transcription/translation, speaker diarization, endpointing, domain context, regional deployments.
- Use the Soniox App: Smart Scribe (meetings/transcripts/summaries), Translator (live translation), Voice Typing (system-wide dictation); available on mobile, desktop, web.

### Build with the Soniox API
- Real-time + async models (stt-rt-v4, stt-async-v4). Low-latency streaming, token-level incremental output, manual finalization.
- Features: automatic language ID & mid-sentence code-switching, speaker separation, accurate alphanumerics, semantic endpoint detection, request-time context (no fine-tuning required).
- SDKs: Python, Node, Web, React, React Native. Docs: https://soniox.com/docs

### Use the Soniox App
- One unified voice workspace: Smart Scribe, Translator, Voice Typing; sync across devices; offline recording with later processing.
- Platforms & downloads:
  - iOS App Store: https://apps.apple.com/us/app/soniox/id1560199731
  - Android (Google Play): https://play.google.com/store/apps/details?id=com.soniox.sonioxmobileapp
  - Desktop (macOS & Windows) and Web available from the Soniox site.

### Production-ready speech, by design
- Accuracy: native-speaker quality in real-world conversations across languages, accents, noisy audio and overlapping speakers.
- Multilingual by default: one universal model for 60+ languages; automatic language switching.
- Real-time streaming: word-by-word processing and low finalization latency.
- Deploy anywhere: one global API with in-region processing (Sovereign Cloud) for latency and data-residency requirements.

### Built for live, real-world speech
- Primary use cases: voice agents & assistants, call centers/support, media & live captions, medical transcription, wearables & IoT, real-time speech analytics and multilingual meetings.
- Design goals: low-latency turn-taking, robust handling of interruptions and mixed language, and structured speaker-aware transcripts ready for downstream automation.

### Privacy and compliance, built right in
- Default behavior: audio processed in memory, not stored unless explicitly requested.
- Certifications & compliance: SOC 2 Type 2, ISO/IEC 27001:2022, HIPAA-ready, GDPR compliance.
- Regional data residency and Sovereign Cloud deployments available (US, EU, Japan; more regions coming).

### What’s new (high level)
- Soniox v4 Real-Time: model optimized for sub-250ms interactions, semantic endpointing, streaming translation.
- Soniox v4 Async: human-parity accuracy across 60+ languages for batch/file transcription.
- New SDKs (Python, Node, Web, React, React Native) and Soniox App unification and desktop release.
- Continuous benchmarking and public compare tools to test Soniox vs. other providers.

### About us
- Mission: make speech AI universal: accurate, multilingual, and production-ready.
- Founded: 2020; breakthrough unsupervised learning approaches for large-scale speech models.
- Founding team:
  - Klemen Simonic: Founder, CEO
  - Ambroz Bizjak: Co-founder, Chief Architect
- Hiring / careers: see site careers pages.

Benchmarks
- Soniox publishes multi-language benchmarks vs. major providers (OpenAI, Google, AWS, Azure, NVIDIA, Deepgram, AssemblyAI, Speechmatics, ElevenLabs).
- Reported results show Soniox achieving top accuracy across many evaluated languages and strong low-latency performance in independent voice-agent benchmarks.

Pricing (concise)
- Reference rates: async (file) ≈ $0.10 / hour; real-time (streaming) ≈ $0.12 / hour (token-based billing details available in docs and pricing pages).
- Pay-as-you-go and team/enterprise plans available.

SDKs & developer resources
- SDKs: Python, Node, Web, React, React Native (works with Next.js).
- Utilities: async transcribe(), real-time session API, speaker segmentation, language-detection helpers, unified error handling.
- Docs & API reference: https://soniox.com/docs

Compare
- Soniox Compare: live, side-by-side testing framework to evaluate real audio against other STT providers (open-source comparison framework available via Soniox resources).

Contact
- General: info@soniox.com
- Technical support: support@soniox.com (or join Discord)
- Feedback / product inquiries: hello@soniox.com
- Discord: https://discord.gg/rWfnk9uM5j
- Docs: https://soniox.com/docs
- Locations:
  - Soniox (US): 1045 Helm Ln, Foster City, CA 94404, United States
  - Soniox Europe: Cesta v Gorice 34B, 1000 Ljubljana, Slovenia

Product summaries
- Soniox API: production-ready speech-to-text + translation API for developers (real-time & async, speaker detection, translation, endpointing).
- Soniox App: consumer & team product for live transcription, translation, voice typing, summaries (mobile + desktop + web).
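
The reference rates in the Pricing section lend themselves to a quick back-of-the-envelope estimate. The sketch below is only an illustration: it treats the approximate per-hour figures quoted in this summary as flat rates, whereas actual billing is token-based, so real invoices may differ. The function name `estimate_cost` and the mode labels `"async"` and `"realtime"` are hypothetical and are not part of any Soniox SDK.

```python
# Approximate reference rates from this summary, in USD per hour of audio.
# Assumption: flat per-hour pricing; actual Soniox billing is token-based.
ASYNC_RATE_PER_HOUR = 0.10      # async (file) transcription
REALTIME_RATE_PER_HOUR = 0.12   # real-time (streaming) transcription


def estimate_cost(audio_hours: float, mode: str = "async") -> float:
    """Return a rough cost estimate in USD, rounded to cents.

    `mode` is a hypothetical label: "async" or "realtime".
    """
    rates = {"async": ASYNC_RATE_PER_HOUR, "realtime": REALTIME_RATE_PER_HOUR}
    if mode not in rates:
        raise ValueError(f"unknown mode: {mode!r}")
    if audio_hours < 0:
        raise ValueError("audio_hours must be non-negative")
    return round(audio_hours * rates[mode], 2)


if __name__ == "__main__":
    # Compare the two modes for the same monthly audio volume.
    for mode in ("async", "realtime"):
        print(f"{mode}: ~${estimate_cost(500, mode):.2f} for 500 hours")
```

At these rates, 500 hours of audio comes to roughly $50 in async mode and $60 in real-time mode. For an exact figure, consult the token-based billing details in the docs and pricing pages.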

Key links
- Docs: https://soniox.com/docs
- iOS App: https://apps.apple.com/us/app/soniox/id1560199731
- Android App: https://play.google.com/store/apps/details?id=com.soniox.sonioxmobileapp
- Discord: https://discord.gg/rWfnk9uM5j

Supportable claims / short facts
- Languages: 60+ supported in one universal model.
- Translation: streaming any-to-any translation (3,600+ language pairs).
- Compliance: SOC 2 Type 2, ISO/IEC 27001:2022, HIPAA-ready, GDPR.
- Core contacts and developer resources listed above.