Types
Soniox Node SDK — Types Reference
AudioData
Audio data types accepted by sendAudio. In Node.js, Buffer is also accepted since Buffer extends Uint8Array.
AudioFormat
Supported audio formats for real-time transcription.
CleanupTarget
Resource types that can be cleaned up after transcription completes.
- 'file' - The uploaded file
- 'transcription' - The transcription record
ConcurrencyCurrentValues
Live concurrency counts.
Properties
| Property | Type | Description |
|---|---|---|
transcribe_concurrent | number | Current number of concurrent transcription sessions. |
tts_concurrent | number | Current number of concurrent TTS sessions. |
ConcurrencyLimitValues
Configured concurrency limits.
Properties
ConcurrencyLimitsResponse
Current concurrent counts plus configured concurrency limits for the project and its organization. Values are region-scoped.
Properties
| Property | Type | Description |
|---|---|---|
organization | ConcurrencyScopeValues | Organization-level concurrency counts and limits. |
project | ConcurrencyScopeValues | Project-level concurrency counts and limits. |
ConcurrencyScopeValues
Current counts and configured limits for a concurrency scope.
Properties
| Property | Type | Description |
|---|---|---|
current | ConcurrencyCurrentValues | Current live concurrency counts. |
limits | ConcurrencyLimitValues | Configured concurrency limits. |
ContextGeneralEntry
Key-value pair for general context information.
Properties
| Property | Type | Description |
|---|---|---|
key | string | The key describing the context type (e.g., "domain", "topic", "doctor"). |
value | string | The value for the context key. |
ContextTranslationTerm
Custom translation term mapping.
Properties
| Property | Type | Description |
|---|---|---|
source | string | The source term to translate. |
target | string | The target translation for the term. |
CreateTranscriptionOptions
Options for creating a transcription.
Properties
| Property | Type | Description |
|---|---|---|
audio_url? | string | URL of a publicly accessible audio file. Max Length 4096 |
client_reference_id? | string | Optional tracking identifier. Max Length 256 |
context? | TranscriptionContext | Additional context to improve transcription accuracy and formatting of specialized terms. |
enable_language_identification? | boolean | Enable automatic language identification. |
enable_speaker_diarization? | boolean | Enable speaker diarization to identify different speakers. |
file_id? | string | ID of a previously uploaded file. Format uuid |
language_hints? | string[] | Array of expected ISO language codes to bias recognition. |
language_hints_strict? | boolean | When true, model relies more heavily on language hints. |
model | string | Speech-to-text model to use. Max Length 32 |
translation? | TranslationConfig | Translation configuration. |
webhook_auth_header_name? | string | Name of the authentication header sent with webhook notifications. Max Length 256 |
webhook_auth_header_value? | string | Authentication header value sent with webhook notifications. Max Length 256 |
webhook_url? | string | URL to receive webhook notifications when transcription is completed or fails. Max Length 256 |
DeleteAllFilesOptions
Options for purging all files.
Properties
DeleteAllTranscriptionsOptions
Options for deleting all transcriptions.
Properties
ExpressLikeRequest
Express/Connect-style request object
Properties
FastifyLikeRequest
Fastify-style request object
Properties
FileIdentifier
File identifier - either a string ID or an object with an id property.
FilesCountResponse
Total number of files, split by source.
Properties
| Property | Type | Description |
|---|---|---|
playground | number | Number of files uploaded via the Playground. |
public_api | number | Number of files uploaded via Public API. |
total | number | Total number of files across all sources. |
GenerateSpeechOptions
Options for REST TTS generation (generate / generateStream).
Properties
HandleWebhookOptions
Options for the handleWebhook function
Properties
| Property | Type | Description |
|---|---|---|
auth? | WebhookAuthConfig | Optional authentication configuration |
body | unknown | Request body (parsed JSON or raw string) |
headers | WebhookHeaders | Request headers |
method | string | HTTP method of the request |
HonoLikeContext
Hono context object
Properties
| Property | Type |
|---|---|
req | { method: string; header: string | undefined; json: Promise<unknown>; } |
req.method | string |
req.header | string | undefined |
req.json | Promise<unknown> |
HttpErrorCode
Error codes for HTTP client errors
HttpMethod
HTTP methods supported by the client
HttpRequestBody
Request body types
HttpResponseType
Response types
ListFilesOptions
Options for listing files.
Properties
ListFilesResponse<T>
Response from listing files.
Type Parameters
| Type Parameter |
|---|
T |
Properties
| Property | Type | Description |
|---|---|---|
files | T[] | List of uploaded files. |
next_page_cursor | string | null | A pagination token that references the next page of results. When null, no additional results are available. |
ListTranscriptionsOptions
Options for listing transcriptions
Properties
| Property | Type | Description |
|---|---|---|
cursor? | string | Pagination cursor for the next page of results |
limit? | number | Maximum number of transcriptions to return. Default 1000 Minimum 1 Maximum 1000 |
ListTranscriptionsResponse<T>
Response from listing transcriptions.
Type Parameters
| Type Parameter |
|---|
T |
Properties
ListUsageLogsOptions
Options for listing usage logs.
Properties
| Property | Type | Description |
|---|---|---|
cursor? | string | Pagination cursor for the next page of results. |
end_time | string | End of the time window (exclusive), filtering by request end time. Must be an ISO 8601 timestamp in UTC. Example '2026-04-29T09:00:00Z' |
limit? | number | Maximum number of usage log entries to return. Default 1000 Minimum 1 Maximum 1000 |
signal? | AbortSignal | AbortSignal for cancelling the request. |
sort? | UsageLogsSort | Sort order by end_time. Default 'end_time_asc' |
start_time | string | Start of the time window (inclusive), filtering by request end time. Must be an ISO 8601 timestamp in UTC. Example '2026-04-28T09:00:00Z' |
ListUsageLogsResponse
Response from listing usage logs.
Properties
| Property | Type | Description |
|---|---|---|
next_page_cursor | string | null | Pagination cursor for the next page of results. Null if no more pages. |
usage_logs | SonioxUsageLog[] | Per-request usage log entries ordered by end_time and UUID. |
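Cursor pagination over these response shapes can be sketched as follows. This is a minimal illustration, not the SDK's implementation: the interfaces are local mirrors of the tables above, and `FetchPage`, `collectAllPages`, and `fakeFetch` are hypothetical names standing in for whatever call returns a single page.

```typescript
// Local mirrors of the documented shapes; not imported from the SDK.
interface UsageLogsPage<T> {
  usage_logs: T[];
  next_page_cursor: string | null;
}

interface PageRequest {
  start_time: string; // inclusive, ISO 8601 UTC
  end_time: string; // exclusive, ISO 8601 UTC
  limit?: number; // 1..1000
  cursor?: string;
}

// Hypothetical fetcher type: stands in for the call that returns one page.
type FetchPage<T> = (request: PageRequest) => Promise<UsageLogsPage<T>>;

// Follow next_page_cursor until it is null, collecting every entry.
async function collectAllPages<T>(fetchPage: FetchPage<T>, request: PageRequest): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | undefined = request.cursor;
  for (;;) {
    const page = await fetchPage(cursor === undefined ? request : { ...request, cursor });
    all.push(...page.usage_logs);
    if (page.next_page_cursor === null) return all;
    cursor = page.next_page_cursor;
  }
}

// In-memory fake with two pages, used only to exercise the loop.
const fakeFetch: FetchPage<string> = async (req) =>
  req.cursor === "p2"
    ? { usage_logs: ["c"], next_page_cursor: null }
    : { usage_logs: ["a", "b"], next_page_cursor: "p2" };
```

The same loop applies to the file and transcription list responses, which also expose a nullable next_page_cursor.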
NestJSLikeRequest
NestJS-style request object (uses Express under the hood by default)
Properties
OneWayTranslation
Result of a one-way translation ({ to } or { to, from } mode).
original_text and translation_text flatten the per-segment content
across the whole audio, which is useful when the caller just wants two
parallel strings.
Properties
| Property | Type | Description |
|---|---|---|
duration_ms | number | Total audio duration in milliseconds. Equals the largest end_ms across all original tokens, or 0 when there are no original tokens. |
from? | string | Source language hint that was supplied via from. Undefined when only to was provided and the source language was auto-detected. |
mode | "one_way" | - |
original_text | string | Concatenated text of every original token across all segments. |
segments | TranslationSegment[] | Per-utterance segments in audio order. |
to | string | Target language code (the to value passed in). |
translation_text | string | Concatenated text of every translation token across all segments. |
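The duration_ms rule above can be illustrated with a short sketch. `TokenLike` is a local stand-in carrying only the token fields needed here, and treating every token that is not marked 'translation' as an original token is an assumption of this example.

```typescript
// Local stand-in for the documented token shape (only the fields used here).
interface TokenLike {
  text: string;
  end_ms?: number;
  translation_status?: "none" | "original" | "translation" | null;
}

// Largest end_ms across original tokens, or 0 when there are none.
function durationMs(tokens: TokenLike[]): number {
  let max = 0;
  for (const token of tokens) {
    // Translation tokens carry no timing, so they are skipped.
    if (token.translation_status === "translation") continue;
    if (typeof token.end_ms === "number" && token.end_ms > max) max = token.end_ms;
  }
  return max;
}

const sampleTokens: TokenLike[] = [
  { text: "Hello", end_ms: 400, translation_status: "original" },
  { text: " world", end_ms: 900, translation_status: "original" },
  { text: "Bonjour le monde", translation_status: "translation" },
];
```

With `sampleTokens`, `durationMs` returns 900; with an empty array it returns 0.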
OneWayTranslationConfig
One-way translation configuration. Translates all spoken languages into a single target language.
Properties
| Property | Type | Description |
|---|---|---|
target_language | string | Target language code for translation (e.g., "fr", "es", "de"). |
type | "one_way" | Translation type. |
QueryParams
Query parameters
RealtimeClientOptions
Real-time API configuration options for the client.
Properties
| Property | Type | Description |
|---|---|---|
api_key | string | API key for real-time sessions. |
default_session_options? | SttSessionOptions | Default session options applied to all real-time STT sessions. Can be overridden per-session. |
stt_defaults? | Partial<SttSessionConfig> | STT session config defaults. Merged as the base layer when opening STT sessions via realtime.stt(config); caller fields override. |
tts_connection_options? | TtsConnectionOptions | Default TTS connection options. |
tts_defaults? | Partial<TtsStreamConfig> | TTS stream config defaults. Merged as the base layer when opening TTS streams via realtime.tts(...); caller fields override. |
tts_ws_url | string | TTS WebSocket URL for real-time connections. Default 'wss://tts-rt.soniox.com/tts-websocket' |
ws_base_url | string | STT WebSocket base URL for real-time connections. Default 'wss://stt-rt.soniox.com/transcribe-websocket' |
RealtimeErrorCode
Error codes for Real-time (WebSocket) API errors
RealtimeEvent
Typed event for async iterator consumption.
RealtimeOptions
Real-time configuration options for the main client.
Properties
| Property | Type | Description |
|---|---|---|
default_session_options? | SttSessionOptions | Default session options applied to all real-time STT sessions. Can be overridden per-session. |
stt_defaults? | Partial<SttSessionConfig> | Default STT session config fields (model, language hints, context, etc.). Merged as the base layer when opening STT sessions via client.realtime.stt(config). Fields on the caller-provided config override these defaults. Equivalent to SonioxConnectionConfig.stt_defaults on the web/react clients. |
tts_connection_options? | TtsConnectionOptions | Default TTS connection options (keepalive interval, connect timeout). |
tts_defaults? | Partial<TtsStreamConfig> | Default TTS stream config fields (model, voice, language, audio_format, etc.). Merged as the base layer when opening TTS streams via client.realtime.tts(...). Fields on the caller-provided TtsStreamInput override these defaults. Equivalent to SonioxConnectionConfig.tts_defaults on the web/react clients. |
tts_ws_url? | string | TTS WebSocket URL for real-time connections. Falls back to SONIOX_TTS_WS_URL environment variable, then to 'wss://tts-rt.soniox.com/tts-websocket'. |
ws_base_url? | string | STT WebSocket base URL for real-time connections. Falls back to SONIOX_WS_URL environment variable, then to 'wss://stt-rt.soniox.com/transcribe-websocket'. |
RealtimeResult
A result message from the real-time WebSocket.
Properties
| Property | Type | Description |
|---|---|---|
final_audio_proc_ms | number | Milliseconds of audio that have been finalized. |
finished? | boolean | Whether this is the final result (session ending). |
tokens | RealtimeToken[] | Tokens in this result. |
total_audio_proc_ms | number | Total milliseconds of audio processed. |
RealtimeSegment
A segment of contiguous real-time tokens grouped by speaker/language.
Properties
| Property | Type | Description |
|---|---|---|
end_ms? | number | End time of the segment in milliseconds (from last token). |
language? | string | Detected language code (if language identification enabled). |
speaker? | string | Speaker identifier (if diarization enabled). |
start_ms? | number | Start time of the segment in milliseconds (from first token). |
text | string | Concatenated text of all tokens in this segment. |
tokens | RealtimeToken[] | Original tokens in this segment. |
RealtimeSegmentBufferOptions
Options for rolling real-time segmentation buffers.
Properties
| Property | Type | Description |
|---|---|---|
final_only? | boolean | When true, only tokens marked as final are buffered. Default true |
group_by? | SegmentGroupKey[] | Fields to group by. A new segment starts when any of these fields changes. Default ['speaker', 'language'] |
max_ms? | number | Maximum time window to keep in milliseconds (requires token timings). |
max_tokens? | number | Maximum number of tokens to keep in the buffer. Default 2000 |
RealtimeSegmentOptions
Options for segmenting real-time tokens.
Properties
| Property | Type | Description |
|---|---|---|
final_only? | boolean | When true, only tokens marked as final are included. Default false |
group_by? | SegmentGroupKey[] | Fields to group by. A new segment starts when any of these fields changes. Default ['speaker', 'language'] |
RealtimeToken
A single token from the real-time transcription.
Properties
RealtimeUtterance
A single utterance built from real-time segments.
Properties
| Property | Type | Description |
|---|---|---|
end_ms? | number | End time of the utterance in milliseconds (from last segment). |
final_audio_proc_ms? | number | Milliseconds of audio that have been finalized at flush time. |
language? | string | Detected language code when consistent across segments. |
segments | RealtimeSegment[] | Segments included in this utterance. |
speaker? | string | Speaker identifier when consistent across segments. |
start_ms? | number | Start time of the utterance in milliseconds (from first segment). |
text | string | Concatenated text of all segments in this utterance. |
tokens | RealtimeToken[] | Tokens included in this utterance. |
total_audio_proc_ms? | number | Total milliseconds of audio processed at flush time. |
RealtimeUtteranceBufferOptions
Options for buffering real-time utterances.
Properties
| Property | Type | Description |
|---|---|---|
final_only? | boolean | When true, only tokens marked as final are buffered. Default true |
group_by? | SegmentGroupKey[] | Fields to group by. A new segment starts when any of these fields changes. Default ['speaker', 'language'] |
max_ms? | number | Maximum time window to keep in milliseconds (requires token timings). |
max_tokens? | number | Maximum number of tokens to keep in the buffer. Default 2000 |
SegmentGroupKey
Fields that can be used to group tokens into segments
SegmentTranscriptOptions
Options for segmenting a transcript
Properties
| Property | Type | Description |
|---|---|---|
group_by? | SegmentGroupKey[] | Fields to group by. A new segment starts when any of these fields changes. Default ['speaker', 'language'] |
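The grouping rule (a new segment starts when any group_by field changes) can be sketched as below. This is an illustrative reimplementation, not the SDK's code: `Token` and `Segment` are local mirrors of the documented token and segment fields, and the `GroupKey` union is inferred from the documented default, so the real SegmentGroupKey type may differ.

```typescript
// Local mirror of the documented token fields used for grouping.
interface Token {
  text: string;
  start_ms?: number;
  end_ms?: number;
  speaker?: string | null;
  language?: string | null;
}

// Assumed from the documented default ['speaker', 'language'].
type GroupKey = "speaker" | "language";

interface Segment {
  text: string;
  start_ms?: number;
  end_ms?: number;
  speaker?: string;
  language?: string;
  tokens: Token[];
}

// Open a new, empty segment that inherits metadata from its first token.
function startSegment(token: Token): Segment {
  const segment: Segment = { text: "", tokens: [] };
  if (token.speaker != null) segment.speaker = token.speaker;
  if (token.language != null) segment.language = token.language;
  if (token.start_ms !== undefined) segment.start_ms = token.start_ms;
  return segment;
}

function segmentTokens(tokens: Token[], groupBy: GroupKey[] = ["speaker", "language"]): Segment[] {
  const segments: Segment[] = [];
  for (const token of tokens) {
    const last = segments.length > 0 ? segments[segments.length - 1] : undefined;
    // The token continues the last segment only if every group_by field matches.
    const continues =
      last !== undefined && groupBy.every((key) => (token[key] ?? undefined) === last[key]);
    const segment = continues && last !== undefined ? last : startSegment(token);
    if (segment !== last) segments.push(segment);
    segment.tokens.push(token);
    segment.text += token.text;
    if (token.end_ms !== undefined) segment.end_ms = token.end_ms; // end time comes from the last token
  }
  return segments;
}

const demoSegments = segmentTokens([
  { text: "Hi", speaker: "1", language: "en", start_ms: 0, end_ms: 200 },
  { text: " there", speaker: "1", language: "en", start_ms: 200, end_ms: 500 },
  { text: "Hola", speaker: "2", language: "es", start_ms: 600, end_ms: 900 },
]);
```

Here `demoSegments` holds two segments: "Hi there" for speaker "1" and "Hola" for speaker "2". Passing `["speaker"]` as the second argument would ignore language changes.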
SendStreamOptions
Options for streaming audio from an async iterable source.
Properties
SonioxErrorCode
All possible SDK error codes (real-time + HTTP-specific codes)
SonioxFileData
Raw file metadata from the API.
Properties
SonioxLanguage
Properties
SonioxModel
Properties
| Property | Type | Description |
|---|---|---|
aliased_model_id | string | null | If this is an alias, the id of the aliased model. Null for non-alias models. |
context_version | number | null | Version of context supported. |
id | string | Unique identifier of the model. |
languages | SonioxLanguage[] | List of languages supported by the model. |
name | string | Name of the model. |
one_way_translation | string | null | When this contains the string 'all_languages', any language from languages can be used. |
supports_language_hints_strict | boolean | Whether the model supports the language_hints_strict option. |
supports_max_endpoint_delay | boolean | - |
transcription_mode | SonioxTranscriptionMode | Transcription mode of the model. |
translation_targets | SonioxTranslationTarget[] | List of supported one-way translation targets. If the list is empty, check the one_way_translation field. |
two_way_translation | string | null | When this contains the string 'all_languages', any language pair from languages can be used. |
two_way_translation_pairs | string[] | List of supported two-way translation pairs. If the list is empty, check the two_way_translation field. |
SonioxNodeClientOptions
Properties
| Property | Type | Description |
|---|---|---|
api_key? | string | API key for authentication. Falls back to SONIOX_API_KEY environment variable if not provided. |
base_domain? | string | Base domain for all Soniox service URLs. A single override that derives all service endpoints from the pattern {service}.{base_domain}. Takes precedence over region. Falls back to SONIOX_BASE_DOMAIN environment variable. Individual URL fields (base_url, tts_api_url, realtime.ws_base_url, realtime.tts_ws_url) still take final precedence. Example 'eu.soniox.com' |
base_url? | string | Base URL for the REST API. Falls back to SONIOX_API_BASE_URL environment variable, then to the region-derived URL, then to 'https://api.soniox.com'. |
http_client? | HttpClient | Custom HTTP client implementation. |
realtime? | RealtimeOptions | Real-time API configuration options. |
region? | SonioxRegion | Deployment region. Determines which regional endpoints are used for both the REST API and real-time WebSocket connections. Leave undefined for the default (US) region. Shorthand for base_domain: '{region}.soniox.com'. base_domain takes precedence when both are provided. See https://soniox.com/docs/stt/data-residency |
stt_defaults? | Partial<SttSessionConfig> | Default STT session config fields applied to every real-time STT session opened via client.realtime.stt(config). Caller-provided fields override. Equivalent to SonioxConnectionConfig.stt_defaults on the web/react clients. Prefer this when you want the same defaults across your whole Node process. |
tts_api_url? | string | TTS REST API URL. Falls back to SONIOX_TTS_API_URL environment variable, then to the region-derived URL, then to 'https://tts-rt.soniox.com'. |
tts_defaults? | Partial<TtsStreamConfig> | Default TTS stream config fields applied to every real-time TTS stream opened via client.realtime.tts(...). Caller-provided fields override. Equivalent to SonioxConnectionConfig.tts_defaults on the web/react clients. |
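The REST base URL fallback chain described above can be sketched as follows. Several points are assumptions of this example rather than confirmed SDK behavior: the relative order of SONIOX_BASE_DOMAIN and region, the position of the derived domain after SONIOX_API_BASE_URL, and the "api" service label, which is inferred from the documented default URL. The environment is passed in as a plain object so the sketch has no Node.js dependency.

```typescript
interface UrlOptions {
  base_url?: string;
  base_domain?: string;
  region?: string; // SonioxRegion in the SDK; a plain string in this sketch
}

type Env = Record<string, string | undefined>;

// Explicit base_domain first, then the environment variable, then the region shorthand.
function resolveBaseDomain(options: UrlOptions, env: Env): string | undefined {
  if (options.base_domain) return options.base_domain;
  if (env["SONIOX_BASE_DOMAIN"]) return env["SONIOX_BASE_DOMAIN"];
  if (options.region) return `${options.region}.soniox.com`;
  return undefined;
}

function resolveRestBaseUrl(options: UrlOptions, env: Env = {}): string {
  // An explicit base_url takes final precedence.
  if (options.base_url) return options.base_url;
  const fromEnv = env["SONIOX_API_BASE_URL"];
  if (fromEnv) return fromEnv;
  // {service}.{base_domain}, with "api" assumed as the REST service label.
  const domain = resolveBaseDomain(options, env);
  if (domain) return `https://api.${domain}`;
  return "https://api.soniox.com";
}
```

Under these assumptions, `resolveRestBaseUrl({ region: "eu" })` yields `https://api.eu.soniox.com`, while an explicit `base_url` is returned unchanged regardless of region or base_domain.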
SonioxTranscriptionData
Raw transcription metadata from the API.
Properties
| Property | Type | Description |
|---|---|---|
audio_duration_ms? | number | null | Duration of the audio in milliseconds. Only available after processing begins. |
audio_url? | string | null | URL of the audio file being transcribed. |
client_reference_id? | string | null | Optional tracking identifier. Max Length 256 |
context? | TranscriptionContext | null | Additional context provided for the transcription. |
created_at | string | UTC timestamp when the transcription was created. Format date-time |
enable_language_identification | boolean | When true, language is detected for each part of the transcription. |
enable_speaker_diarization | boolean | When true, speakers are identified and separated in the transcription output. |
error_message? | string | null | Error message if transcription failed. Null for successful or in-progress transcriptions. |
error_type? | string | null | Error type if transcription failed. Null for successful or in-progress transcriptions. |
file_id? | string | null | ID of the uploaded file being transcribed. Format uuid |
filename | string | Name of the file being transcribed. |
id | string | Unique identifier of the transcription. Format uuid |
language_hints? | string[] | null | Expected languages in the audio. If not specified, languages are automatically detected. |
model | string | Speech-to-text model used. |
status | TranscriptionStatus | Current status of the transcription. |
webhook_auth_header_name? | string | null | Name of the authentication header sent with webhook notifications. |
webhook_auth_header_value? | string | null | Authentication header value. Always returned masked. |
webhook_status_code? | number | null | HTTP status code received from your server when webhook was delivered. Null if not yet sent. |
webhook_url? | string | null | URL to receive webhook notifications when transcription is completed or fails. |
SonioxTranscriptionMode
Transcription mode of the model.
SonioxTranslation
Discriminated translation result returned by SonioxTranslationJob.getTranslation, SonioxTranslationJob.fetchTranslation, and translateFromTranscript.
SonioxTranslationTarget
Properties
SonioxUsageLog
Per-request usage log entry.
Properties
SttSessionConfig
Configuration sent to the Soniox WebSocket API when starting a session.
Properties
| Property | Type | Description |
|---|---|---|
audio_format? | "auto" | AudioFormat | Audio format. Use 'auto' for automatic detection of container formats. For raw PCM formats, also set sample_rate and num_channels. Default 'auto' |
client_reference_id? | string | Optional tracking identifier (max 256 chars). |
context? | TranscriptionContext | Additional context to improve transcription accuracy. |
enable_endpoint_detection? | boolean | Enable endpoint detection for utterance boundaries. Useful for voice AI agents. |
enable_language_identification? | boolean | Enable automatic language detection. |
enable_speaker_diarization? | boolean | Enable speaker identification. |
endpoint_sensitivity? | number | Controls how aggressively endpoints are detected. Adjusts how likely the model is to emit an endpoint. Higher values make endpoints more likely, which can finalize segments sooner. Lower values make endpoints less likely, which can help the system wait longer before finalizing. Allowed values are between -1.0 and 1.0. The default value is 0.0. |
language_hints? | string[] | Expected languages in the audio (ISO language codes). |
language_hints_strict? | boolean | When true, recognition is strongly biased toward language hints. Best-effort only, not a hard guarantee. |
max_endpoint_delay_ms? | number | Maximum delay between the end of speech and the returned endpoint. Allowed values are between 500 ms and 3000 ms. The default value is 2000 ms. |
model | string | Speech-to-text model to use. |
num_channels? | number | Number of audio channels (required for raw audio formats). |
sample_rate? | number | Sample rate in Hz (required for PCM formats). |
translation? | TranslationConfig | Translation configuration. |
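The documented constraints on this config can be checked client-side before opening a session. The sketch below is a hypothetical pre-flight validator, not part of the SDK; `SessionConfigLike` mirrors only the fields it inspects, and the rule that raw PCM format names start with "pcm_" is an assumption, so consult the AudioFormat type for the actual values.

```typescript
// Local mirror of the SttSessionConfig fields checked below.
interface SessionConfigLike {
  model: string;
  audio_format?: string;
  sample_rate?: number;
  num_channels?: number;
  endpoint_sensitivity?: number;
  max_endpoint_delay_ms?: number;
  client_reference_id?: string;
}

// Assumption: raw PCM format names start with "pcm_".
const isRawPcm = (format: string | undefined): boolean =>
  format !== undefined && format.startsWith("pcm_");

// Returns a list of human-readable problems; an empty list means the config passed.
function validateSessionConfig(config: SessionConfigLike): string[] {
  const problems: string[] = [];
  const sensitivity = config.endpoint_sensitivity;
  if (sensitivity !== undefined && (sensitivity < -1 || sensitivity > 1)) {
    problems.push("endpoint_sensitivity must be between -1.0 and 1.0");
  }
  const delay = config.max_endpoint_delay_ms;
  if (delay !== undefined && (delay < 500 || delay > 3000)) {
    problems.push("max_endpoint_delay_ms must be between 500 and 3000");
  }
  if (config.client_reference_id !== undefined && config.client_reference_id.length > 256) {
    problems.push("client_reference_id must be at most 256 characters");
  }
  if (isRawPcm(config.audio_format) && (config.sample_rate === undefined || config.num_channels === undefined)) {
    problems.push("raw PCM formats require sample_rate and num_channels");
  }
  return problems;
}
```

Returning a list instead of throwing on the first problem lets a caller report every issue at once.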
SttSessionEvents
Event handlers for the STT session.
Properties
SttSessionOptions
SDK-level session options (not sent to the server).
Properties
| Property | Type | Description |
|---|---|---|
connect_timeout_ms? | number | Maximum time to wait for the WebSocket connection to open (milliseconds). If the connection is not established within this time, a ConnectionError with message "Connection timed out" is thrown. Default 20000 |
keepalive_interval_ms? | number | Interval for sending keepalive messages while paused (milliseconds). Default 5000 |
signal? | AbortSignal | AbortSignal for cancellation. |
SttSessionState
Session lifecycle states.
TemporaryApiKeyRequest
Properties
| Property | Type | Description |
|---|---|---|
client_reference_id? | string | Optional tracking identifier string. Does not need to be unique. Max Length 256 |
expires_in_seconds | number | Duration in seconds until the temporary API key expires. Minimum 1 Maximum 3600 |
max_session_duration_seconds? | number | Maximum connection duration in seconds for WebSocket and TTS HTTP streaming endpoints. Minimum 1 Maximum 18000 |
single_use? | boolean | When true, restricts the temporary API key to a single use. |
usage_type | TemporaryApiKeyUsageType | Intended usage of the temporary API key. |
TemporaryApiKeyResponse
Properties
| Property | Type | Description |
|---|---|---|
api_key | string | Created temporary API key. |
expires_at | string | UTC timestamp indicating when the generated temporary API key expires. Format date-time |
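A caller that caches temporary keys may want to know how long a key remains valid. The sketch below computes that from expires_at; `TemporaryKey` is a local mirror of the response shape, and `secondsUntilExpiry` is a hypothetical helper rather than an SDK function.

```typescript
// Local mirror of the documented response shape.
interface TemporaryKey {
  api_key: string;
  expires_at: string; // UTC date-time string
}

// Seconds until expiry, clamped at 0. `nowMs` is injectable so the result is testable.
function secondsUntilExpiry(key: TemporaryKey, nowMs: number = Date.now()): number {
  const expiresMs = Date.parse(key.expires_at);
  if (Number.isNaN(expiresMs)) {
    throw new Error(`Unparseable expires_at: ${key.expires_at}`);
  }
  return Math.max(0, Math.floor((expiresMs - nowMs) / 1000));
}

// Placeholder values for illustration only.
const exampleKey: TemporaryKey = {
  api_key: "placeholder-temporary-key",
  expires_at: "2026-04-29T09:00:00Z",
};
```

Refreshing the key some margin before this value reaches zero avoids handing a client a key that expires mid-handshake.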
TemporaryApiKeyUsageType
TranscribeBaseOptions
Base options shared by all audio source variants.
Properties
| Property | Type | Description |
|---|---|---|
cleanup? | CleanupTarget[] | Resources to clean up after transcription completes or on error/timeout. Only applies when wait: true, in which case cleanup always runs: after successful completion, after transcription errors (status: 'error'), and on timeout or abort. This ensures no orphaned resources are left behind. Examples: cleanup: ['file'] deletes only the uploaded file; cleanup: ['transcription'] deletes only the transcription record; cleanup: ['file', 'transcription'] deletes both. |
client_reference_id? | string | Optional tracking identifier. Max Length 256 |
context? | TranscriptionContext | Additional context to improve transcription accuracy and formatting of specialized terms. |
enable_language_identification? | boolean | Enable automatic language identification. |
enable_speaker_diarization? | boolean | Enable speaker diarization to identify different speakers. |
fetch_transcript? | boolean | When true (default), fetches the transcript and attaches it to the result when wait=true and the transcription completes successfully. Set to false to skip fetching the full transcript payload. Default true |
language_hints? | string[] | Array of expected ISO language codes to bias recognition. |
language_hints_strict? | boolean | When true, model relies more heavily on language hints. |
model | string | Speech-to-text model to use. Max Length 32 |
signal? | AbortSignal | AbortSignal to cancel the operation |
timeout_ms? | number | Timeout in milliseconds |
translation? | TranslationConfig | Translation configuration. |
wait? | boolean | When true, waits for transcription to complete before returning. Default false |
wait_options? | WaitOptions | Options for waiting (only used when wait=true). |
webhook_auth_header_name? | string | Name of the authentication header sent with webhook notifications. Max Length 256 |
webhook_auth_header_value? | string | Authentication header value sent with webhook notifications. Max Length 256 |
webhook_query? | string | URLSearchParams | Record<string, string> | Query parameters to append to the webhook URL. Useful for encoding metadata like transcription ID in the webhook callback. Can be a string, URLSearchParams, or Record<string, string>. |
webhook_url? | string | URL to receive webhook notifications when transcription is completed or fails. Max Length 256 |
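The three accepted webhook_query forms can all be normalized through URLSearchParams. The sketch below shows one way query parameters might be appended to a webhook URL; how the SDK actually merges them is not stated here, and `appendWebhookQuery` is a hypothetical helper.

```typescript
type WebhookQuery = string | URLSearchParams | Record<string, string>;

// Append the query parameters to the webhook URL, keeping any parameters already present.
function appendWebhookQuery(webhookUrl: string, query: WebhookQuery): string {
  const url = new URL(webhookUrl);
  const params = query instanceof URLSearchParams ? query : new URLSearchParams(query);
  params.forEach((value, key) => url.searchParams.append(key, value));
  return url.toString();
}
```

For example, `appendWebhookQuery("https://example.com/hook", { transcription_id: "abc" })` produces `https://example.com/hook?transcription_id=abc`. Note that webhook_url is limited to 256 characters, so long query strings may need to be kept short.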
TranscribeFromFile
Transcribe from a direct file upload (Buffer, Uint8Array, Blob, or ReadableStream)
Type Declaration
| Name | Type | Description |
|---|---|---|
audio_url? | never | - |
file | UploadFileInput | File data to upload and transcribe. |
file_id? | never | - |
filename? | string | - |
TranscribeFromFileId
Transcribe from a previously uploaded file
Type Declaration
| Name | Type | Description |
|---|---|---|
audio_url? | never | - |
file? | never | - |
file_id | string | ID of a previously uploaded file. Format uuid |
filename? | never | - |
TranscribeFromFileIdOptions
Options for transcribing from an uploaded file ID via transcribeFromFileId.
TranscribeFromFileOptions
Options for transcribing from a file via transcribeFromFile.
TranscribeFromUrl
Transcribe from a publicly accessible audio URL
Type Declaration
| Name | Type | Description |
|---|---|---|
audio_url | string | URL of a publicly accessible audio file. Max Length 4096 |
file? | never | - |
file_id? | never | - |
filename? | never | - |
TranscribeFromUrlOptions
Options for transcribing from a URL via transcribeFromUrl.
TranscribeOptions
Options for the unified transcribe method
Exactly one audio source must be provided: file, file_id, or audio_url
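The "exactly one source" rule is what the `never`-typed fields in the three source variants enforce at compile time. The sketch below mirrors that pattern with local types; `Uint8Array` stands in for UploadFileInput, and `describeSource` is a hypothetical helper.

```typescript
// Local mirrors of the three documented source variants. Marking the other
// fields as `never` makes the compiler reject objects that set more than one source.
type FromFile = { file: Uint8Array; filename?: string; file_id?: never; audio_url?: never };
type FromFileId = { file_id: string; file?: never; filename?: never; audio_url?: never };
type FromUrl = { audio_url: string; file?: never; file_id?: never; filename?: never };
type AudioSource = FromFile | FromFileId | FromUrl;

// Report which of the three sources was supplied.
function describeSource(source: AudioSource): string {
  if (source.file !== undefined) return `upload (${source.file.byteLength} bytes)`;
  if (source.file_id !== undefined) return `file_id ${source.file_id}`;
  return `url ${source.audio_url}`;
}

// Compiles: exactly one source is set.
const okSource: AudioSource = { audio_url: "https://example.com/audio.mp3" };

// Would not compile, because two sources are set:
// const badSource: AudioSource = { audio_url: "https://example.com/audio.mp3", file_id: "abc" };
```

The check is purely static, so plain JavaScript callers still need a runtime guard.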
TranscriptResponse
Response from getting a transcription transcript.
Properties
| Property | Type | Description |
|---|---|---|
id | string | Unique identifier of the transcription this transcript belongs to. Format uuid |
text | string | Complete transcribed text content. |
tokens | TranscriptToken[] | List of detailed token information with timestamps and metadata. |
TranscriptSegment
A segment of contiguous tokens grouped by speaker and language
Properties
| Property | Type | Description |
|---|---|---|
end_ms? | number | End time of the segment in milliseconds (from last token). Absent for translation-only segments where the underlying tokens carry no timestamps. |
language? | string | Detected language code (if language identification was enabled). |
speaker? | string | Speaker identifier (if speaker diarization was enabled). |
start_ms? | number | Start time of the segment in milliseconds (from first token). Absent for translation-only segments where the underlying tokens carry no timestamps. |
text | string | Concatenated text of all tokens in this segment. |
tokens | TranscriptToken[] | Original tokens in this segment. |
TranscriptToken
A single token from the transcript with timing and confidence information.
Properties
| Property | Type | Description |
|---|---|---|
confidence | number | Confidence score for this token (0.0 to 1.0). |
end_ms? | number | End time of the token in milliseconds. Present on original tokens (translation_status of 'original' or 'none') and absent on translation tokens (translation_status: 'translation'), which do not carry timing. |
is_audio_event? | boolean | null | Whether this token represents an audio event. |
language? | string | null | Language code for this token. For original tokens (translation_status of 'original' or 'none') this is the spoken language. For translation tokens (translation_status: 'translation') this is the target language. Present on every token whenever language identification or translation is configured. |
source_language? | string | null | Source language for translation tokens (translation_status: 'translation'). Identifies the language being translated from. Not set on original or 'none' tokens; their language is in TranscriptToken.language. |
speaker? | string | null | Speaker identifier (if speaker diarization was enabled). |
start_ms? | number | Start time of the token in milliseconds. Present on original tokens (translation_status of 'original' or 'none') and absent on translation tokens (translation_status: 'translation'), which do not carry timing. |
text | string | The text content of this token. |
translation_status? | "none" | "original" | "translation" | null | Translation status for this token. |
TranscriptionContext
Additional context to improve transcription and translation accuracy. All sections are optional - include only what's relevant for your use case.
Properties
| Property | Type | Description |
|---|---|---|
general? | ContextGeneralEntry[] | Structured key-value pairs describing domain, topic, intent, participant names, etc. |
terms? | string[] | Domain-specific or uncommon words to recognize. |
text? | string | Longer free-form background text, prior interaction history, reference documents, or meeting notes. |
translation_terms? | ContextTranslationTerm[] | Custom translations for ambiguous terms. |
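A context object combining these sections might look like the sketch below. The interfaces are local mirrors of the tables in this reference, the sample values are invented for illustration, and `glossaryToTerms` is a hypothetical convenience helper, not an SDK function.

```typescript
// Local mirrors of the documented shapes.
interface ContextGeneralEntry {
  key: string;
  value: string;
}

interface ContextTranslationTerm {
  source: string;
  target: string;
}

interface TranscriptionContext {
  general?: ContextGeneralEntry[];
  terms?: string[];
  text?: string;
  translation_terms?: ContextTranslationTerm[];
}

// Hypothetical helper: turn a plain glossary map into translation_terms entries.
function glossaryToTerms(glossary: Record<string, string>): ContextTranslationTerm[] {
  return Object.entries(glossary).map(([source, target]) => ({ source, target }));
}

// All sections are optional; this example leaves out `text`.
const exampleContext: TranscriptionContext = {
  general: [
    { key: "domain", value: "Healthcare" },
    { key: "topic", value: "Diabetes follow-up consultation" },
  ],
  terms: ["metformin", "HbA1c"],
  translation_terms: glossaryToTerms({ "blood sugar": "glycémie" }),
};
```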
TranscriptionIdentifier
Transcription identifier - either a string ID or an object with an id property.
TranscriptionStatus
Status of a transcription request.
TranscriptionsCountResponse
Total number of transcriptions, split by request scope.
Properties
TranslateAudioSource
Audio source for SonioxSttApi.translate. Exactly one of
file, file_id, or audio_url must be provided.
Type Declaration
| Name | Type | Description |
|---|---|---|
audio_url? | never | - |
file | UploadFileInput | File data to upload and translate. |
file_id? | never | - |
filename? | string | - |
| Name | Type | Description |
|---|---|---|
audio_url? | never | - |
file? | never | - |
file_id | string | ID of a previously uploaded file. Format uuid |
filename? | never | - |
| Name | Type | Description |
|---|---|---|
audio_url | string | URL of a publicly accessible audio file. Max Length 4096 |
file? | never | - |
file_id? | never | - |
filename? | never | - |
TranslateBaseOptions
Common (non-mode, non-source) options shared by every translate call.
Properties
| Property | Type | Description |
|---|---|---|
cleanup? | CleanupTarget[] | Resources to clean up after translation completes or on error/timeout. |
client_reference_id? | string | Optional tracking identifier. Max Length 256 |
context? | TranscriptionContext | Additional context to improve transcription and translation accuracy. |
enable_speaker_diarization? | boolean | Enable speaker diarization to identify different speakers. |
fetch_translation? | boolean | When true (default), fetches and reshapes the translation result when wait=true and the job completes successfully. Default true |
model? | string | Speech-to-text model to use. Default 'stt-async-v5' Max Length 32 |
signal? | AbortSignal | AbortSignal to cancel the operation. |
timeout_ms? | number | Timeout in milliseconds. |
wait? | boolean | When true, waits for translation to complete before returning. Default false |
wait_options? | WaitOptions | Options for waiting on completion. |
webhook_auth_header_name? | string | Name of the authentication header sent with webhook notifications. Max Length 256 |
webhook_auth_header_value? | string | Authentication header value sent with webhook notifications. Max Length 256 |
webhook_query? | string | URLSearchParams | Record<string, string> | Query parameters to append to the webhook URL. |
webhook_url? | string | URL to receive webhook notifications when translation is completed or fails. Max Length 256 |
TranslateFromTranscriptMode
Mode parameter accepted by translateFromTranscript.
The async translate() method stores this internally on the returned job;
webhook handlers (and other callers that already have a transcript in hand)
supply it directly.
TranslateMode
Shorthand specification of the translation direction(s) for SonioxSttApi.translate.
Three mutually exclusive shapes:
- { to } - one-way translation into the to language. Source language(s) are detected automatically.
- { to, from } - one-way translation from the from language into the to language. The source language is hinted to the model.
- { between: [a, b] } - two-way translation between a and b. Each side is translated into the other; speech in any third language is passed through as-is.
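The three shapes can be told apart by which keys are present. The sketch below models them with a local TranslateModeSketch type and a hypothetical describeMode helper; both are illustrative assumptions rather than SDK exports.

```typescript
// Local model of the three mutually exclusive shapes described above.
type TranslateModeSketch =
  | { to: string; from?: string; between?: never }
  | { between: [string, string]; to?: never; from?: never };

// Hypothetical helper: turn a mode into a human-readable description.
function describeMode(mode: TranslateModeSketch): string {
  if (mode.between !== undefined) {
    const [a, b] = mode.between;
    return `two-way between ${a} and ${b}`;
  }
  return mode.from !== undefined
    ? `one-way from ${mode.from} to ${mode.to}`
    : `one-way to ${mode.to} (source detected automatically)`;
}
```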
TranslateOptions
Options for SonioxSttApi.translate.
Combines a TranslateMode (the translation direction shorthand), a TranslateAudioSource (file, file_id, or audio_url), and TranslateBaseOptions.
TranslationConfig
Translation configuration.
TranslationSegment
A grouped pair of original speech and (optionally) its translation, derived from the underlying transcript tokens.
In one-way mode every segment that originated from speech in the source
language carries both original_* and translation_* fields. In two-way
mode the same is true for the two configured languages; speech in a third
language flows through with translation_status: 'none' and the
translation fields are omitted.
Properties
| Property | Type | Description |
|---|---|---|
end_ms? | number | End time of the segment in milliseconds, taken from the last original token. Absent when the segment has no original tokens. |
from | string | Source language code. Derived from original_tokens[0].language when originals are present, otherwise from translation_tokens[0].source_language. |
original_text | string | Concatenated text of original_tokens. |
original_tokens | TranscriptToken[] | Original tokens (translation_status of 'original' or 'none') for this segment, in order. |
speaker? | string | Speaker identifier (when speaker diarization is enabled). |
start_ms? | number | Start time of the segment in milliseconds, taken from the first original token. Absent when the segment has no original tokens. |
to? | string | Target language code. Omitted when there are no translation tokens (e.g. third-language pass-through under between). |
translation_text? | string | Concatenated text of translation_tokens. Omitted when there are no translation tokens. |
translation_tokens? | TranscriptToken[] | Translation tokens (translation_status: 'translation') for this segment, in order. Omitted when there are no translation tokens. |
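Because to and translation_text are omitted for pass-through speech, code that renders segments has to handle both cases. A minimal sketch, assuming a local SegmentSketch interface that mirrors a subset of the fields in the table above; formatSegment is a hypothetical helper.

```typescript
// Subset of the TranslationSegment fields listed above, redeclared locally.
interface SegmentSketch {
  from: string;
  to?: string;
  speaker?: string;
  original_text: string;
  translation_text?: string;
}

// Hypothetical helper: render one segment as a single display line.
function formatSegment(seg: SegmentSketch): string {
  const who = seg.speaker !== undefined ? `[${seg.speaker}] ` : "";
  const original = `${who}(${seg.from}) ${seg.original_text}`;
  // Pass-through segments carry no translation fields, so only the original is shown.
  if (seg.to === undefined || seg.translation_text === undefined) return original;
  return `${original} -> (${seg.to}) ${seg.translation_text}`;
}
```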
TtsAudioFormat
Supported audio formats for Text-to-Speech output.
TtsConnectionEvents
Events emitted by a TTS WebSocket connection.
Properties
| Property | Type | Description |
|---|---|---|
close | () => void | The WebSocket connection was closed. |
error | (error) => void | A connection-level error occurred. Always a RealtimeError subclass (e.g. ConnectionError, NetworkError, AuthError). |
TtsConnectionOptions
Options for creating a TTS connection.
Properties
TtsEvent
Raw JSON event received from the TTS WebSocket server.
Properties
| Property | Type |
|---|---|
audio? | string |
audio_end? | boolean |
error_code? | number |
error_message? | string |
stream_id? | string |
terminated? | boolean |
TtsLanguage
A language supported by a Text-to-Speech model.
Properties
TtsModel
A Text-to-Speech model.
Properties
| Property | Type | Description |
|---|---|---|
aliased_model_id? | string | null | If this is an alias, the id of the aliased model. |
id | string | Unique identifier of the model. |
languages | TtsLanguage[] | Languages supported by this model. |
name | string | Name of the model. |
voices | TtsVoice[] | Voices supported by this model. |
TtsStreamConfig
Fully resolved TTS stream config sent over the WebSocket. All required fields are present after merging input with defaults.
Properties
| Property | Type |
|---|---|
audio_format | string |
bitrate? | number |
language | string |
model | string |
sample_rate? | number |
stream_id | string |
voice | string |
TtsStreamEvents
Events emitted by a TTS stream.
Properties
| Property | Type | Description |
|---|---|---|
audio | (chunk) => void | Decoded audio chunk received. |
audioEnd | () => void | Server marked the final audio payload for this stream. |
error | (error) => void | A stream-level error occurred. Always a RealtimeError subclass mapped from the server error_code / error_message. |
terminated | () => void | Stream has been fully terminated by the server. |
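One way to picture the relationship between a raw TtsEvent and the stream events above is a small dispatcher. This is only a sketch of the idea, not the SDK's implementation: the interfaces are local copies, the decode function is injected because the wire encoding of audio is not stated in this section, and a plain Error stands in for the RealtimeError subclasses.

```typescript
// Local copy of the raw event fields listed under TtsEvent.
interface TtsEventSketch {
  audio?: string;
  audio_end?: boolean;
  error_code?: number;
  error_message?: string;
  stream_id?: string;
  terminated?: boolean;
}

// Local copy of the stream event handlers listed above.
interface TtsHandlersSketch {
  audio(chunk: Uint8Array): void;
  audioEnd(): void;
  error(error: Error): void;
  terminated(): void;
}

// Hypothetical dispatcher: forwards each populated field to its handler.
function dispatchTtsEvent(
  event: TtsEventSketch,
  handlers: TtsHandlersSketch,
  decode: (encoded: string) => Uint8Array, // assumption: caller supplies the decoder
): void {
  if (event.error_code !== undefined) {
    handlers.error(new Error(`${event.error_code}: ${event.error_message ?? "unknown error"}`));
  }
  if (event.audio !== undefined) handlers.audio(decode(event.audio));
  if (event.audio_end === true) handlers.audioEnd();
  if (event.terminated === true) handlers.terminated();
}
```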
TtsStreamInput
Input for creating a TTS stream. All fields are optional and are merged
with tts_defaults from the resolved connection config. After merging,
model, language, voice, and audio_format must be present.
Properties
| Property | Type | Description |
|---|---|---|
audio_format? | TtsAudioFormat | Output audio format Example 'wav' |
bitrate? | number | Codec bitrate in bps (for compressed formats). |
language? | string | Language code for speech generation. Example 'en' |
model? | string | Text-to-Speech model to use. Example 'tts-rt-v1' |
sample_rate? | number | Output sample rate in Hz. Required for raw PCM formats. |
stream_id? | string | Client-generated stream identifier. Must be unique among active streams on the same connection. Auto-generated if omitted. |
voice? | string | Voice identifier. Example 'Adrian' |
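The merge-then-validate step described above can be sketched as follows. The interfaces are local copies of TtsStreamInput and TtsStreamConfig, resolveStreamConfig is a hypothetical helper, and the counter-based stream_id is an illustrative assumption; the SDK's actual ID generation is not documented here.

```typescript
// Local copy of the TtsStreamInput fields (all optional).
interface StreamInputSketch {
  model?: string;
  language?: string;
  voice?: string;
  audio_format?: string;
  sample_rate?: number;
  bitrate?: number;
  stream_id?: string;
}

// Local copy of the resolved TtsStreamConfig shape.
interface StreamConfigSketch {
  model: string;
  language: string;
  voice: string;
  audio_format: string;
  stream_id: string;
  sample_rate?: number;
  bitrate?: number;
}

let streamCounter = 0; // assumption: a simple counter stands in for real ID generation

// Hypothetical helper: input fields win over defaults, then required fields are checked.
function resolveStreamConfig(input: StreamInputSketch, defaults: StreamInputSketch): StreamConfigSketch {
  const merged = { ...defaults, ...input };
  const { model, language, voice, audio_format } = merged;
  if (model === undefined || language === undefined || voice === undefined || audio_format === undefined) {
    throw new Error("model, language, voice, and audio_format must be present after merging");
  }
  return {
    ...merged,
    model,
    language,
    voice,
    audio_format,
    stream_id: merged.stream_id ?? `stream-${++streamCounter}`,
  };
}
```

Note that with object spread, a key explicitly set to undefined in the input still overrides the default, so callers of a helper like this should omit keys rather than pass undefined.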
TtsStreamState
Lifecycle states for a TTS stream.
TtsVoice
A Text-to-Speech voice.
Properties
| Property | Type | Description |
|---|---|---|
description | string | Human-readable voice description. |
gender | TtsVoiceGender | Voice gender metadata. |
id | string | Unique identifier of the voice. |
TtsVoiceGender
Voice gender metadata returned by the TTS models API.
TwoWayTranslation
Result of a two-way translation ({ between } mode).
No flat original_text / translation_text strings are exposed because
which side is "original" depends on the segment. Read segments and
filter / format per from / to as needed.
Properties
| Property | Type | Description |
|---|---|---|
duration_ms | number | Total audio duration in milliseconds. Equals the largest end_ms across all original tokens, or 0 when there are no original tokens. |
language_a | string | First configured language (the between[0] value). |
language_b | string | Second configured language (the between[1] value). |
mode | "two_way" | - |
segments | TranslationSegment[] | Per-utterance segments in audio order. |
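Since a two-way result exposes no flat text fields, a consumer typically splits segments by direction. A minimal sketch, assuming local interfaces that mirror a subset of the fields above; splitByDirection is a hypothetical helper.

```typescript
// Subset of TranslationSegment needed for direction filtering.
interface TwoWaySegmentSketch {
  from: string;
  to?: string;
  original_text: string;
  translation_text?: string;
}

// Subset of TwoWayTranslation, redeclared locally.
interface TwoWaySketch {
  mode: "two_way";
  language_a: string;
  language_b: string;
  segments: TwoWaySegmentSketch[];
}

// Hypothetical helper: bucket segments by translation direction.
function splitByDirection(result: TwoWaySketch) {
  const { language_a: a, language_b: b, segments } = result;
  return {
    aToB: segments.filter((s) => s.from === a && s.to === b),
    bToA: segments.filter((s) => s.from === b && s.to === a),
    // Third-language speech has no target language.
    passThrough: segments.filter((s) => s.to === undefined),
  };
}
```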
TwoWayTranslationConfig
Two-way translation configuration. Translates between two specified languages.
Properties
| Property | Type | Description |
|---|---|---|
language_a | string | First language code. |
language_b | string | Second language code. |
type | "two_way" | Translation type. |
UploadFileInput
Supported input types for file upload
UploadFileOptions
Options for uploading a file
Properties
UsageLogsSort
Sort order for usage logs.
WaitOptions
Options for polling/waiting for transcription completion.
Properties
WebhookAuthConfig
Authentication configuration for webhook verification
Properties
| Property | Type | Description |
|---|---|---|
name | string | Expected header name (case-insensitive comparison) |
value | string | Expected header value (exact match) |
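The comparison rules in the table (case-insensitive header name, exact header value) can be illustrated with a short check. This is a sketch under those stated rules only; isAuthorized and the header name used in the usage note are hypothetical, and the SDK's own verification may differ in detail.

```typescript
// Local copy of WebhookAuthConfig.
interface WebhookAuthConfigSketch {
  name: string;
  value: string;
}

// Hypothetical check: header name is matched case-insensitively, value must match exactly.
function isAuthorized(
  headers: Record<string, string | undefined>,
  auth: WebhookAuthConfigSketch,
): boolean {
  const wanted = auth.name.toLowerCase();
  for (const [name, value] of Object.entries(headers)) {
    if (name.toLowerCase() === wanted) return value === auth.value;
  }
  return false; // header not present
}
```

For example, a request carrying X-Webhook-Secret: s3cret passes against { name: 'x-webhook-secret', value: 's3cret' }, while the value S3CRET does not.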
WebhookEvent
Webhook event payload sent by Soniox when a transcription completes or fails.
Properties
| Property | Type | Description |
|---|---|---|
id | string | Transcription ID Format uuid |
status | WebhookEventStatus | Transcription result status |
WebhookEventStatus
Webhook event status values
WebhookHandlerResult
Result of webhook handling
Properties
| Property | Type | Description |
|---|---|---|
error? | string | Error message (only present when ok=false) |
event? | WebhookEvent | Parsed webhook event (only present when ok=true) |
ok | boolean | Whether the webhook was handled successfully |
status | number | HTTP status code to return |
WebhookHandlerResultWithFetch
Result of webhook handling with lazy fetch capabilities.
When using client.webhooks.handleExpress() (or other framework handlers),
the result includes helper methods to fetch the transcript or transcription.
Type Declaration
| Name | Type | Description |
|---|---|---|
fetchTranscript | (() => Promise<ISonioxTranscript | null>) | undefined | Fetch the transcript for a completed transcription. Only available when ok=true and event.status='completed'. Example: const result = soniox.webhooks.handleExpress(req); if (result.ok && result.event.status === 'completed') { const transcript = await result.fetchTranscript(); console.log(transcript?.text); } |
fetchTranscription | (() => Promise<ISonioxTranscription | null>) | undefined | Fetch the full transcription object. Useful for both completed (metadata) and error (error details) statuses. Example: const result = soniox.webhooks.handleExpress(req); if (result.ok && result.event.status === 'error') { const transcription = await result.fetchTranscription(); console.log(transcription?.error_message); } |
WebhookHeaders
Headers object type - supports both standard headers and record types
HttpClient
Pluggable HTTP client interface
Methods
request()
Perform an HTTP request
Type Parameters
| Type Parameter |
|---|
T |
Parameters
| Parameter | Type | Description |
|---|---|---|
request | HttpRequest | Request configuration |
Returns
Promise<HttpResponse<T>>
Promise resolving to the response
Throws
SonioxHttpError On network errors, timeouts, HTTP errors, or parse errors
HttpErrorDetails
Error details for SonioxHttpError
Properties
| Property | Type | Description |
|---|---|---|
bodyText? | string | Response body text (capped at 4KB) |
cause? | unknown | - |
code | HttpErrorCode | - |
headers? | Record<string, string> | - |
message | string | - |
method | HttpMethod | - |
statusCode? | number | - |
url | string | - |
HttpRequest
HTTP request configuration
Properties
| Property | Type | Description |
|---|---|---|
body? | HttpRequestBody | Request body |
headers? | Record<string, string> | Request headers |
method | HttpMethod | HTTP method |
path | string | URL path (relative to baseUrl) or absolute URL |
query? | QueryParams | Query parameters (will be URL-encoded) |
responseType? | HttpResponseType | Expected response type Default 'json' |
signal? | AbortSignal | Optional AbortSignal for request cancellation. If provided along with timeoutMs, both are respected. |
timeoutMs? | number | Request timeout in milliseconds. If not specified, the client's default timeout is used. |
HttpResponse<T>
HTTP response from the client
Type Parameters
| Type Parameter |
|---|
T |
Properties
| Property | Type | Description |
|---|---|---|
data | T | Parsed response data |
headers | Record<string, string> | Response headers (normalized to lowercase keys) |
status | number | HTTP status code |
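Because HttpClient is a pluggable interface, a test double only needs a request() method with the right shape. The sketch below is one possible stand-in: the interfaces are simplified local copies, the HTTP method list and the example path are assumptions, and a plain Error replaces SonioxHttpError.

```typescript
type HttpMethodSketch = "GET" | "POST" | "DELETE"; // assumption: the real HttpMethod values are not listed here

interface HttpRequestSketch {
  method: HttpMethodSketch;
  path: string;
  headers?: Record<string, string>;
  body?: unknown;
}

interface HttpResponseSketch<T> {
  status: number;
  headers: Record<string, string>;
  data: T;
}

interface HttpClientSketch {
  request<T>(request: HttpRequestSketch): Promise<HttpResponseSketch<T>>;
}

// Hypothetical test double: returns canned data keyed by "METHOD path".
class CannedHttpClient implements HttpClientSketch {
  readonly calls: HttpRequestSketch[] = [];
  private readonly routes: Record<string, unknown>;

  constructor(routes: Record<string, unknown>) {
    this.routes = routes;
  }

  async request<T>(request: HttpRequestSketch): Promise<HttpResponseSketch<T>> {
    this.calls.push(request); // record the call for later assertions
    const key = `${request.method} ${request.path}`;
    if (!(key in this.routes)) {
      // The real interface throws SonioxHttpError; a plain Error stands in here.
      throw new Error(`no canned response for ${key}`);
    }
    // Header keys are lowercase, matching the normalization described for HttpResponse.
    return { status: 200, headers: { "content-type": "application/json" }, data: this.routes[key] as T };
  }
}
```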
ISonioxTranscript
Type contract for SonioxTranscript class.
See
SonioxTranscript for full documentation.
Methods
segments()
Parameters
| Parameter | Type |
|---|---|
options? | SegmentTranscriptOptions |
Returns
Properties
| Property | Type |
|---|---|
id | string |
text | string |
tokens | TranscriptToken[] |
ISonioxTranscription
Type contract for SonioxTranscription class.
See
SonioxTranscription for full documentation.
Extended by
Methods
delete()
Returns
Promise<void>
destroy()
Returns
Promise<void>
getTranscript()
Parameters
| Parameter | Type |
|---|---|
options? | { force?: boolean; signal?: AbortSignal; } |
options.force? | boolean |
options.signal? | AbortSignal |
Returns
Promise<ISonioxTranscript | null>
refresh()
Parameters
| Parameter | Type |
|---|---|
signal? | AbortSignal |
Returns
Promise<ISonioxTranscription>
toJSON()
Returns
wait()
Parameters
| Parameter | Type |
|---|---|
options? | WaitOptions |
Returns
Promise<ISonioxTranscription>
Properties
| Property | Type |
|---|---|
audio_duration_ms | number | null | undefined |
audio_url | string | null | undefined |
client_reference_id | string | null | undefined |
context | TranscriptionContext | null | undefined |
created_at | string |
enable_language_identification | boolean |
enable_speaker_diarization | boolean |
error_message | string | null | undefined |
error_type | string | null | undefined |
file_id | string | null | undefined |
filename | string |
id | string |
language_hints | string[] | undefined |
model | string |
status | TranscriptionStatus |
transcript | ISonioxTranscript | null | undefined |
webhook_auth_header_name | string | null | undefined |
webhook_auth_header_value | string | null | undefined |
webhook_status_code | number | null | undefined |
webhook_url | string | null | undefined |
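A typical consumer of this contract waits for completion, checks for failure, then reads the transcript. The sketch below works against a reduced local interface rather than the SDK type. The 'error' status string is taken from the webhook examples in this reference and is assumed to apply to TranscriptionStatus as well; waitForText is a hypothetical helper.

```typescript
// Reduced local versions of ISonioxTranscript / ISonioxTranscription.
interface TranscriptSketch {
  text: string;
}

interface TranscriptionSketch {
  status: string;
  error_message?: string | null;
  wait(): Promise<TranscriptionSketch>;
  getTranscript(): Promise<TranscriptSketch | null>;
}

// Hypothetical helper: resolve to the transcript text or throw with the job's error.
async function waitForText(job: TranscriptionSketch): Promise<string> {
  const done = await job.wait();
  if (done.status === "error") {
    // assumption: 'error' is the failure status, as in the webhook examples
    throw new Error(done.error_message ?? "transcription failed");
  }
  const transcript = await done.getTranscript();
  if (transcript === null) throw new Error("transcript not available");
  return transcript.text;
}
```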
ISonioxTranslationJob
Type contract for SonioxTranslationJob class.
Extends
Methods
delete()
Returns
Promise<void>
Inherited from
destroy()
Returns
Promise<void>
Inherited from
fetchTranslation()
Parameters
| Parameter | Type |
|---|---|
options? | { force?: boolean; signal?: AbortSignal; } |
options.force? | boolean |
options.signal? | AbortSignal |
Returns
Promise<SonioxTranslation | null>
getTranscript()
Parameters
| Parameter | Type |
|---|---|
options? | { force?: boolean; signal?: AbortSignal; } |
options.force? | boolean |
options.signal? | AbortSignal |
Returns
Promise<ISonioxTranscript | null>
Inherited from
ISonioxTranscription.getTranscript
getTranslation()
Parameters
| Parameter | Type |
|---|---|
options? | { force?: boolean; signal?: AbortSignal; } |
options.force? | boolean |
options.signal? | AbortSignal |
Returns
Promise<SonioxTranslation | null>
refresh()
Parameters
| Parameter | Type |
|---|---|
signal? | AbortSignal |
Returns
Promise<ISonioxTranslationJob>
Overrides
toJSON()
Returns
Overrides
wait()
Parameters
| Parameter | Type |
|---|---|
options? | WaitOptions |
Returns
Promise<ISonioxTranslationJob>
Overrides
Properties
| Property | Type |
|---|---|
audio_duration_ms | number | null | undefined |
audio_url | string | null | undefined |
client_reference_id | string | null | undefined |
context | TranscriptionContext | null | undefined |
created_at | string |
enable_language_identification | boolean |
enable_speaker_diarization | boolean |
error_message | string | null | undefined |
error_type | string | null | undefined |
file_id | string | null | undefined |
filename | string |
id | string |
language_hints | string[] | undefined |
model | string |
status | TranscriptionStatus |
transcript | ISonioxTranscript | null | undefined |
translation | SonioxTranslation | null | undefined |
webhook_auth_header_name | string | null | undefined |
webhook_auth_header_value | string | null | undefined |
webhook_status_code | number | null | undefined |
webhook_url | string | null | undefined |
translateFromTranscript()
Reshape a transcript produced by a translation-enabled transcription into a structured SonioxTranslation result.
This is the same logic SonioxTranslationJob.getTranslation() applies.
Use it directly in webhook handlers or anywhere else you already have a
transcript in hand.
Parameters
| Parameter | Type | Description |
|---|---|---|
transcript | TranscriptLike | Transcript (or any object with a tokens array) emitted for a translation-enabled transcription. |
mode | TranslateFromTranscriptMode | Whether to reshape as one-way or two-way; the discriminator tells the helper which result shape to produce. |
Returns
A SonioxTranslation keyed on mode.
Example