Types

AudioData

type AudioData = Buffer | Uint8Array | ArrayBuffer;

Audio data types accepted by sendAudio. In Node.js, Buffer is also accepted since Buffer extends Uint8Array.
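
As a minimal sketch (not part of the SDK), the helper below normalizes either input shape to a Uint8Array before it is sent. The name `toUint8Array` is hypothetical, and `Buffer` is left out of the local type so the snippet also type-checks outside Node.js.

```typescript
// Local copy of the documented type, minus Buffer (a Uint8Array subclass in Node.js).
type AudioData = Uint8Array | ArrayBuffer;

// Hypothetical helper: view any accepted input as a Uint8Array without copying.
function toUint8Array(data: AudioData): Uint8Array {
  return data instanceof ArrayBuffer ? new Uint8Array(data) : data;
}

const fromBuffer = toUint8Array(new ArrayBuffer(4));
const fromView = toUint8Array(new Uint8Array([1, 2, 3]));
console.log(fromBuffer.byteLength, fromView.byteLength);
```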

AudioFormat

type AudioFormat = 
  | "pcm_s8"
  | "pcm_s8le"
  | "pcm_s8be"
  | "pcm_s16le"
  | "pcm_s16be"
  | "pcm_s24le"
  | "pcm_s24be"
  | "pcm_s32le"
  | "pcm_s32be"
  | "pcm_u8"
  | "pcm_u8le"
  | "pcm_u8be"
  | "pcm_u16le"
  | "pcm_u16be"
  | "pcm_u24le"
  | "pcm_u24be"
  | "pcm_u32le"
  | "pcm_u32be"
  | "pcm_f32le"
  | "pcm_f32be"
  | "pcm_f64le"
  | "pcm_f64be"
  | "mulaw"
  | "alaw"
  | "aac"
  | "aiff"
  | "amr"
  | "asf"
  | "wav"
  | "mp3"
  | "flac"
  | "ogg"
  | "webm";

Supported audio formats for real-time transcription.

CleanupTarget

type CleanupTarget = "file" | "transcription";

Resource types that can be cleaned up after transcription completes.

'file' - The uploaded file
'transcription' - The transcription record

ContextGeneralEntry

type ContextGeneralEntry = {
  key: string;
  value: string;
};

Key-value pair for general context information.

Properties

Property	Type	Description
`key`	`string`	The key describing the context type (e.g., "domain", "topic", "doctor").
`value`	`string`	The value for the context key.

ContextTranslationTerm

type ContextTranslationTerm = {
  source: string;
  target: string;
};

Custom translation term mapping.

Properties

Property	Type	Description
`source`	`string`	The source term to translate.
`target`	`string`	The target translation for the term.

CreateTranscriptionOptions

type CreateTranscriptionOptions = {
  audio_url?: string;
  client_reference_id?: string;
  context?: TranscriptionContext;
  enable_language_identification?: boolean;
  enable_speaker_diarization?: boolean;
  file_id?: string;
  language_hints?: string[];
  language_hints_strict?: boolean;
  model: string;
  translation?: TranslationConfig;
  webhook_auth_header_name?: string;
  webhook_auth_header_value?: string;
  webhook_url?: string;
};

Options for creating a transcription.

Properties

Property	Type	Description
`audio_url?`	`string`	URL of a publicly accessible audio file. Max Length 4096
`client_reference_id?`	`string`	Optional tracking identifier. Max Length 256
`context?`	`TranscriptionContext`	Additional context to improve transcription accuracy and formatting of specialized terms.
`enable_language_identification?`	`boolean`	Enable automatic language identification.
`enable_speaker_diarization?`	`boolean`	Enable speaker diarization to identify different speakers.
`file_id?`	`string`	ID of a previously uploaded file. Format uuid
`language_hints?`	`string`[]	Array of expected ISO language codes to bias recognition.
`language_hints_strict?`	`boolean`	When true, model relies more heavily on language hints.
`model`	`string`	Speech-to-text model to use. Max Length 32
`translation?`	`TranslationConfig`	Translation configuration.
`webhook_auth_header_name?`	`string`	Name of the authentication header sent with webhook notifications. Max Length 256
`webhook_auth_header_value?`	`string`	Authentication header value sent with webhook notifications. Max Length 256
`webhook_url?`	`string`	URL to receive webhook notifications when transcription is completed or fails. Max Length 256
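
The length limits in the table can be checked before a request is made. The sketch below is illustrative only: `checkOptions` and `MAX_LENGTHS` are hypothetical names, the local type redeclares just the fields that carry a limit, and `"example-model"` is a placeholder rather than a real model ID.

```typescript
// Subset of the documented fields, redeclared so the snippet is self-contained.
type CreateTranscriptionOptions = {
  audio_url?: string;
  client_reference_id?: string;
  file_id?: string;
  model: string;
  webhook_url?: string;
};

// Maximum lengths as listed in the properties table.
const MAX_LENGTHS = {
  audio_url: 4096,
  client_reference_id: 256,
  model: 32,
  webhook_url: 256,
} as const;

// Hypothetical pre-flight check: returns a list of problems, empty when none are found.
function checkOptions(options: CreateTranscriptionOptions): string[] {
  const problems: string[] = [];
  for (const key of Object.keys(MAX_LENGTHS) as (keyof typeof MAX_LENGTHS)[]) {
    const value = options[key];
    if (value !== undefined && value.length > MAX_LENGTHS[key]) {
      problems.push(`${key} exceeds the maximum length of ${MAX_LENGTHS[key]}`);
    }
  }
  return problems;
}

console.log(checkOptions({ model: "example-model", file_id: "placeholder-file-id" }));
```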

DeleteAllFilesOptions

type DeleteAllFilesOptions = {
  signal?: AbortSignal;
};

Options for purging all files.

Properties

Property	Type	Description
`signal?`	`AbortSignal`	AbortSignal for cancelling the delete_all operation.

DeleteAllTranscriptionsOptions

type DeleteAllTranscriptionsOptions = {
  on_progress?: (transcription, index) => void;
  signal?: AbortSignal;
};

Options for deleting all transcriptions.

Properties

Property	Type	Description
`on_progress?`	(`transcription`, `index`) => `void`	Callback invoked before each transcription is deleted. Receives the transcription data and its 0-based index.
`signal?`	`AbortSignal`	AbortSignal for cancelling the delete_all operation.

ExpressLikeRequest

type ExpressLikeRequest = {
  body?: unknown;
  headers: Record<string, string | string[] | undefined>;
  method: string;
};

Express/Connect-style request object

Properties

Property	Type
`body?`	`unknown`
`headers`	`Record`<`string`, `string` \| `string`[] \| `undefined`>
`method`	`string`
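
Because `headers` values may be a string, an array of strings, or undefined, code that reads a webhook authentication header usually needs a small lookup step. The following is a sketch under that assumption; `getHeader` is a hypothetical helper, not an SDK function.

```typescript
type ExpressLikeRequest = {
  body?: unknown;
  headers: Record<string, string | string[] | undefined>;
  method: string;
};

// Hypothetical helper: case-insensitive lookup that collapses repeated headers to the first value.
function getHeader(req: ExpressLikeRequest, name: string): string | undefined {
  const wanted = name.toLowerCase();
  for (const key of Object.keys(req.headers)) {
    if (key.toLowerCase() === wanted) {
      const value = req.headers[key];
      return Array.isArray(value) ? value[0] : value;
    }
  }
  return undefined;
}

const request: ExpressLikeRequest = {
  method: "POST",
  headers: { "X-Webhook-Auth": "secret-value" },
};
console.log(getHeader(request, "x-webhook-auth"));
```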

FastifyLikeRequest

type FastifyLikeRequest = {
  body?: unknown;
  headers: Record<string, string | string[] | undefined>;
  method: string;
};

Fastify-style request object

Properties

Property	Type
`body?`	`unknown`
`headers`	`Record`<`string`, `string` \| `string`[] \| `undefined`>
`method`	`string`

FileIdentifier

type FileIdentifier = 
  | string
  | {
  id: string;
};

File identifier - either a string ID or an object with an id property.
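
A function that accepts a `FileIdentifier` can reduce both shapes to a plain ID with a `typeof` check. This is a minimal sketch; `resolveFileId` is a hypothetical name and the ID values are placeholders.

```typescript
type FileIdentifier = string | { id: string };

// Hypothetical helper: return the ID regardless of which shape was passed.
function resolveFileId(file: FileIdentifier): string {
  return typeof file === "string" ? file : file.id;
}

console.log(resolveFileId("placeholder-id"));
console.log(resolveFileId({ id: "placeholder-id" }));
```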

HandleWebhookOptions

type HandleWebhookOptions = {
  auth?: WebhookAuthConfig;
  body: unknown;
  headers: WebhookHeaders;
  method: string;
};

Options for the handleWebhook function

Properties

Property	Type	Description
`auth?`	`WebhookAuthConfig`	Optional authentication configuration
`body`	`unknown`	Request body (parsed JSON or raw string)
`headers`	`WebhookHeaders`	Request headers
`method`	`string`	HTTP method of the request

HonoLikeContext

type HonoLikeContext = {
  req: {
     method: string;
     header: string | undefined;
     json: Promise<unknown>;
  };
};

Hono context object

Properties

Property	Type
`req`	{ `method`: `string`; `header`: `string` \| `undefined`; `json`: `Promise`<`unknown`>; }
`req.method`	`string`
`req.header`	`string` \| `undefined`
`req.json`	`Promise`<`unknown`>

HttpErrorCode

type HttpErrorCode = "network_error" | "timeout" | "aborted" | "http_error" | "parse_error";

Error codes for HTTP client errors

HttpMethod

type HttpMethod = "GET" | "POST" | "PUT" | "PATCH" | "DELETE" | "HEAD";

HTTP methods supported by the client

HttpRequestBody

type HttpRequestBody = 
  | string
  | Record<string, unknown>
  | ArrayBuffer
  | Uint8Array
  | FormData
  | null;

Request body types

HttpResponseType

type HttpResponseType = "json" | "text" | "arrayBuffer";

Response types

ListFilesOptions

type ListFilesOptions = {
  cursor?: string;
  limit?: number;
  signal?: AbortSignal;
};

Options for listing files.

Properties

Property	Type	Description
`cursor?`	`string`	Pagination cursor for the next page of results.
`limit?`	`number`	Maximum number of files to return. Default `1000` Minimum 1 Maximum 1000
`signal?`	`AbortSignal`	AbortSignal for cancelling the request

ListFilesResponse<T>

type ListFilesResponse<T> = {
  files: T[];
  next_page_cursor: string | null;
};

Response from listing files.

Type Parameters

Type Parameter
`T`

Properties

Property	Type	Description
`files`	`T`[]	List of uploaded files.
`next_page_cursor`	`string` \| `null`	A pagination token that references the next page of results. When null, no additional results are available.
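
Cursor pagination follows from `next_page_cursor`: keep requesting with the returned cursor until it is null. The sketch below uses an in-memory stand-in for the API so that it runs on its own; `nextPageOptions`, `fakePages`, and the file names are all hypothetical, and real code would await the SDK's list call where the lookup happens.

```typescript
type ListFilesOptions = { cursor?: string; limit?: number };
type ListFilesResponse<T> = { files: T[]; next_page_cursor: string | null };

// Hypothetical helper: options for the next page, or null when the listing is exhausted.
function nextPageOptions<T>(
  previous: ListFilesOptions,
  response: ListFilesResponse<T>,
): ListFilesOptions | null {
  if (response.next_page_cursor === null) return null;
  return { ...previous, cursor: response.next_page_cursor };
}

// In-memory stand-in for the API: two pages keyed by cursor ("" is the first page).
const fakePages: Record<string, ListFilesResponse<string>> = {
  "": { files: ["a.wav", "b.wav"], next_page_cursor: "page-2" },
  "page-2": { files: ["c.wav"], next_page_cursor: null },
};

const allFiles: string[] = [];
let pageOptions: ListFilesOptions | null = { limit: 2 };
while (pageOptions !== null) {
  const page = fakePages[pageOptions.cursor ?? ""]; // real code would await the list call here
  if (page === undefined) break;
  allFiles.push(...page.files);
  pageOptions = nextPageOptions(pageOptions, page);
}
console.log(allFiles);
```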

ListTranscriptionsOptions

type ListTranscriptionsOptions = {
  cursor?: string;
  limit?: number;
};

Options for listing transcriptions

Properties

Property	Type	Description
`cursor?`	`string`	Pagination cursor for the next page of results
`limit?`	`number`	Maximum number of transcriptions to return. Default `1000` Minimum 1 Maximum 1000

ListTranscriptionsResponse<T>

type ListTranscriptionsResponse<T> = {
  next_page_cursor: string | null;
  transcriptions: T[];
};

Response from listing transcriptions.

Type Parameters

Type Parameter
`T`

Properties

Property	Type	Description
`next_page_cursor`	`string` \| `null`	A pagination token that references the next page of results. When null, no additional results are available.
`transcriptions`	`T`[]	List of transcriptions.

NestJSLikeRequest

type NestJSLikeRequest = {
  body?: unknown;
  headers: Record<string, string | string[] | undefined>;
  method: string;
};

NestJS-style request object (uses Express under the hood by default)

Properties

Property	Type
`body?`	`unknown`
`headers`	`Record`<`string`, `string` \| `string`[] \| `undefined`>
`method`	`string`

OneWayTranslationConfig

type OneWayTranslationConfig = {
  target_language: string;
  type: "one_way";
};

One-way translation configuration. Translates all spoken languages into a single target language.

Properties

Property	Type	Description
`target_language`	`string`	Target language code for translation (e.g., "fr", "es", "de").
`type`	`"one_way"`	Translation type.

QueryParams

type QueryParams = Record<string, string | number | boolean | undefined>;

Query parameters

RealtimeClientOptions

type RealtimeClientOptions = {
  api_key: string;
  default_session_options?: SttSessionOptions;
  ws_base_url: string;
};

Real-time API configuration options for the client.

Properties

Property	Type	Description
`api_key`	`string`	API key for real-time sessions.
`default_session_options?`	`SttSessionOptions`	Default session options applied to all real-time sessions. Can be overridden per-session.
`ws_base_url`	`string`	WebSocket base URL for real-time connections. Default `'wss://stt-rt.soniox.com/transcribe-websocket'`

RealtimeErrorCode

type RealtimeErrorCode = 
  | "auth_error"
  | "bad_request"
  | "quota_exceeded"
  | "connection_error"
  | "network_error"
  | "aborted"
  | "state_error"
  | "realtime_error";

Error codes for Real-time (WebSocket) API errors

RealtimeEvent

type RealtimeEvent = 
  | {
  data: RealtimeResult;
  kind: "result";
}
  | {
  kind: "endpoint";
}
  | {
  kind: "finalized";
}
  | {
  kind: "finished";
};

Typed event for async iterator consumption.
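
Since `kind` is a discriminant, a `switch` narrows each event to its variant, and only the `"result"` variant exposes `data`. The sketch below redeclares trimmed copies of the types; `describeEvent` is a hypothetical function used only to show the narrowing.

```typescript
// Trimmed local copies of the documented types.
type RealtimeToken = { text: string; is_final: boolean; confidence: number };
type RealtimeResult = {
  final_audio_proc_ms: number;
  finished?: boolean;
  tokens: RealtimeToken[];
  total_audio_proc_ms: number;
};
type RealtimeEvent =
  | { kind: "result"; data: RealtimeResult }
  | { kind: "endpoint" }
  | { kind: "finalized" }
  | { kind: "finished" };

// Hypothetical handler: the compiler checks that every kind is covered.
function describeEvent(event: RealtimeEvent): string {
  switch (event.kind) {
    case "result":
      return `result: ${event.data.tokens.map((t) => t.text).join("")}`;
    case "endpoint":
      return "endpoint detected";
    case "finalized":
      return "finalization complete";
    case "finished":
      return "session finished";
    default: {
      // Fails to compile if a new kind is added and not handled above.
      const unreachable: never = event;
      return unreachable;
    }
  }
}

console.log(describeEvent({ kind: "endpoint" }));
```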

RealtimeOptions

type RealtimeOptions = {
  default_session_options?: SttSessionOptions;
  ws_base_url?: string;
};

Real-time configuration options for the main client.

Properties

Property	Type	Description
`default_session_options?`	`SttSessionOptions`	Default session options applied to all real-time sessions. Can be overridden per-session.
`ws_base_url?`	`string`	WebSocket base URL for real-time connections. Falls back to SONIOX_WS_URL environment variable, then to 'wss://stt-rt.soniox.com/transcribe-websocket'.

RealtimeResult

type RealtimeResult = {
  final_audio_proc_ms: number;
  finished?: boolean;
  tokens: RealtimeToken[];
  total_audio_proc_ms: number;
};

A result message from the real-time WebSocket.

Properties

Property	Type	Description
`final_audio_proc_ms`	`number`	Milliseconds of audio that have been finalized.
`finished?`	`boolean`	Whether this is the final result (session ending).
`tokens`	`RealtimeToken`[]	Tokens in this result.
`total_audio_proc_ms`	`number`	Total milliseconds of audio processed.

RealtimeSegment

type RealtimeSegment = {
  end_ms?: number;
  language?: string;
  speaker?: string;
  start_ms?: number;
  text: string;
  tokens: RealtimeToken[];
};

A segment of contiguous real-time tokens grouped by speaker/language.

Properties

Property	Type	Description
`end_ms?`	`number`	End time of the segment in milliseconds (from last token).
`language?`	`string`	Detected language code (if language identification enabled).
`speaker?`	`string`	Speaker identifier (if diarization enabled).
`start_ms?`	`number`	Start time of the segment in milliseconds (from first token).
`text`	`string`	Concatenated text of all tokens in this segment.
`tokens`	`RealtimeToken`[]	Original tokens in this segment.

RealtimeSegmentBufferOptions

type RealtimeSegmentBufferOptions = {
  final_only?: boolean;
  group_by?: SegmentGroupKey[];
  max_ms?: number;
  max_tokens?: number;
};

Options for rolling real-time segmentation buffers.

Properties

Property	Type	Description
`final_only?`	`boolean`	When true, only tokens marked as final are buffered. Default `true`
`group_by?`	`SegmentGroupKey`[]	Fields to group by. A new segment starts when any of these fields changes. Default `['speaker', 'language']`
`max_ms?`	`number`	Maximum time window to keep in milliseconds (requires token timings).
`max_tokens?`	`number`	Maximum number of tokens to keep in the buffer. Default `2000`

RealtimeSegmentOptions

type RealtimeSegmentOptions = {
  final_only?: boolean;
  group_by?: SegmentGroupKey[];
};

Options for segmenting real-time tokens.

Properties

Property	Type	Description
`final_only?`	`boolean`	When true, only tokens marked as final are included. Default `false`
`group_by?`	`SegmentGroupKey`[]	Fields to group by. A new segment starts when any of these fields changes. Default `['speaker', 'language']`
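
To make the grouping rule concrete, here is a rough sketch of how tokens could be folded into segments with these two options. It is not the SDK's implementation: `segmentTokens` is a hypothetical function written from the descriptions above, and the types are trimmed local copies.

```typescript
type SegmentGroupKey = "speaker" | "language";
type RealtimeToken = {
  text: string;
  is_final: boolean;
  speaker?: string;
  language?: string;
  start_ms?: number;
  end_ms?: number;
};
type RealtimeSegment = {
  text: string;
  tokens: RealtimeToken[];
  speaker?: string;
  language?: string;
  start_ms?: number;
  end_ms?: number;
};
type RealtimeSegmentOptions = { final_only?: boolean; group_by?: SegmentGroupKey[] };

// Hypothetical sketch: start a new segment whenever a group_by field changes.
function segmentTokens(
  tokens: RealtimeToken[],
  options: RealtimeSegmentOptions = {},
): RealtimeSegment[] {
  const finalOnly = options.final_only ?? false;
  const groupBy: SegmentGroupKey[] = options.group_by ?? ["speaker", "language"];
  const segments: RealtimeSegment[] = [];
  for (const token of tokens) {
    if (finalOnly && !token.is_final) continue;
    const last = segments[segments.length - 1];
    if (last !== undefined && groupBy.every((key) => last[key] === token[key])) {
      // Same speaker/language: extend the open segment.
      last.tokens.push(token);
      last.text += token.text;
      if (token.end_ms !== undefined) last.end_ms = token.end_ms;
    } else {
      segments.push({
        text: token.text,
        tokens: [token],
        speaker: token.speaker,
        language: token.language,
        start_ms: token.start_ms,
        end_ms: token.end_ms,
      });
    }
  }
  return segments;
}

const sampleTokens: RealtimeToken[] = [
  { text: "Hi ", is_final: true, speaker: "1" },
  { text: "there", is_final: false, speaker: "1" },
  { text: "Yo", is_final: true, speaker: "2" },
];
console.log(segmentTokens(sampleTokens).map((s) => s.text));
```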

RealtimeToken

type RealtimeToken = {
  confidence: number;
  end_ms?: number;
  is_final: boolean;
  language?: string;
  source_language?: string;
  speaker?: string;
  start_ms?: number;
  text: string;
  translation_status?: "none" | "original" | "translation";
};

A single token from the real-time transcription.

Properties

Property	Type	Description
`confidence`	`number`	Confidence score (0.0 to 1.0).
`end_ms?`	`number`	End time in milliseconds relative to audio start.
`is_final`	`boolean`	Whether this is a finalized token.
`language?`	`string`	Detected language code (if language identification enabled).
`source_language?`	`string`	Source language for translated tokens.
`speaker?`	`string`	Speaker identifier (if diarization enabled).
`start_ms?`	`number`	Start time in milliseconds relative to audio start.
`text`	`string`	The transcribed text.
`translation_status?`	`"none"` \| `"original"` \| `"translation"`	Translation status of this token.
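
A common way to use `is_final` is to render finalized text separately from text that is still provisional. The sketch below assumes non-final tokens may be replaced by later results; `splitText` is a hypothetical helper and the local type keeps only the fields it reads.

```typescript
type RealtimeToken = { text: string; is_final: boolean; confidence: number };

// Hypothetical helper: split one result's tokens into stable and provisional text.
function splitText(tokens: RealtimeToken[]): { finalText: string; interimText: string } {
  let finalText = "";
  let interimText = "";
  for (const token of tokens) {
    if (token.is_final) finalText += token.text;
    else interimText += token.text;
  }
  return { finalText, interimText };
}

console.log(
  splitText([
    { text: "Good ", is_final: true, confidence: 0.98 },
    { text: "morn", is_final: false, confidence: 0.61 },
  ]),
);
```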

RealtimeUtterance

type RealtimeUtterance = {
  end_ms?: number;
  final_audio_proc_ms?: number;
  language?: string;
  segments: RealtimeSegment[];
  speaker?: string;
  start_ms?: number;
  text: string;
  tokens: RealtimeToken[];
  total_audio_proc_ms?: number;
};

A single utterance built from real-time segments.

Properties

Property	Type	Description
`end_ms?`	`number`	End time of the utterance in milliseconds (from last segment).
`final_audio_proc_ms?`	`number`	Milliseconds of audio that have been finalized at flush time.
`language?`	`string`	Detected language code when consistent across segments.
`segments`	`RealtimeSegment`[]	Segments included in this utterance.
`speaker?`	`string`	Speaker identifier when consistent across segments.
`start_ms?`	`number`	Start time of the utterance in milliseconds (from first segment).
`text`	`string`	Concatenated text of all segments in this utterance.
`tokens`	`RealtimeToken`[]	Tokens included in this utterance.
`total_audio_proc_ms?`	`number`	Total milliseconds of audio processed at flush time.

RealtimeUtteranceBufferOptions

type RealtimeUtteranceBufferOptions = {
  final_only?: boolean;
  group_by?: SegmentGroupKey[];
  max_ms?: number;
  max_tokens?: number;
};

Options for buffering real-time utterances.

Properties

Property	Type	Description
`final_only?`	`boolean`	When true, only tokens marked as final are buffered. Default `true`
`group_by?`	`SegmentGroupKey`[]	Fields to group by. A new segment starts when any of these fields changes. Default `['speaker', 'language']`
`max_ms?`	`number`	Maximum time window to keep in milliseconds (requires token timings).
`max_tokens?`	`number`	Maximum number of tokens to keep in the buffer. Default `2000`

SegmentGroupKey

type SegmentGroupKey = "speaker" | "language";

Fields that can be used to group tokens into segments

SegmentTranscriptOptions

type SegmentTranscriptOptions = {
  group_by?: SegmentGroupKey[];
};

Options for segmenting a transcript

Properties

Property	Type	Description
`group_by?`	`SegmentGroupKey`[]	Fields to group by. A new segment starts when any of these fields changes. Default `['speaker', 'language']`

SendStreamOptions

type SendStreamOptions = {
  finish?: boolean;
  pace_ms?: number;
};

Options for streaming audio from an async iterable source.

Properties

Property	Type	Description
`finish?`	`boolean`	When true, calls finish() automatically after the stream ends. Default `false`
`pace_ms?`	`number`	Delay in milliseconds between sending chunks. Useful for simulating real-time pace when streaming pre-recorded files. Not needed for live audio sources.
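
When streaming a pre-recorded raw PCM file, one reasonable choice for `pace_ms` is the playback duration of a single chunk. The arithmetic is sketched below; `chunkDurationMs` is a hypothetical helper, and the 3200-byte chunk size is an arbitrary example value.

```typescript
// Duration in milliseconds of one chunk of raw PCM audio.
function chunkDurationMs(
  chunkBytes: number,
  sampleRate: number,
  numChannels: number,
  bytesPerSample: number,
): number {
  const bytesPerSecond = sampleRate * numChannels * bytesPerSample;
  return (chunkBytes * 1000) / bytesPerSecond;
}

// 3200 bytes of 16 kHz mono pcm_s16le (2 bytes per sample) is 100 ms of audio.
const sendOptions = { pace_ms: chunkDurationMs(3200, 16000, 1, 2), finish: true };
console.log(sendOptions);
```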

SonioxErrorCode

type SonioxErrorCode = 
  | RealtimeErrorCode
  | "soniox_error"
  | HttpErrorCode;

All possible SDK error codes (core real-time + HTTP-specific codes)

SonioxFileData

type SonioxFileData = {
  client_reference_id?: string | null;
  created_at: string;
  filename: string;
  id: string;
  size: number;
};

Raw file metadata from the API.

Properties

Property	Type	Description
`client_reference_id?`	`string` \| `null`	Optional tracking identifier string.
`created_at`	`string`	UTC timestamp indicating when the file was uploaded. Format date-time
`filename`	`string`	Name of the file.
`id`	`string`	Unique identifier of the file. Format uuid
`size`	`number`	Size of the file in bytes.

SonioxLanguage

type SonioxLanguage = {
  code: string;
  name: string;
};

Properties

Property	Type	Description
`code`	`string`	2-letter language code.
`name`	`string`	Language name.

SonioxModel

type SonioxModel = {
  aliased_model_id: string | null;
  context_version: number | null;
  id: string;
  languages: SonioxLanguage[];
  name: string;
  one_way_translation: string | null;
  supports_language_hints_strict: boolean;
  supports_max_endpoint_delay: boolean;
  transcription_mode: SonioxTranscriptionMode;
  translation_targets: SonioxTranslationTarget[];
  two_way_translation: string | null;
  two_way_translation_pairs: string[];
};

Properties

Property	Type	Description
`aliased_model_id`	`string` \| `null`	If this is an alias, the id of the aliased model. Null for non-alias models.
`context_version`	`number` \| `null`	Version of context supported.
`id`	`string`	Unique identifier of the model.
`languages`	`SonioxLanguage`[]	List of languages supported by the model.
`name`	`string`	Name of the model.
`one_way_translation`	`string` \| `null`	When this contains the string 'all_languages', any language from languages can be used.
`supports_language_hints_strict`	`boolean`	Whether the model supports the language_hints_strict option.
`supports_max_endpoint_delay`	`boolean`	Whether the model supports the max_endpoint_delay_ms option.
`transcription_mode`	`SonioxTranscriptionMode`	Transcription mode of the model.
`translation_targets`	`SonioxTranslationTarget`[]	List of supported one-way translation targets. If list is empty, check for one_way_translation field
`two_way_translation`	`string` \| `null`	When this contains the string 'all_languages', any language pair from languages can be used.
`two_way_translation_pairs`	`string`[]	List of supported two-way translation pairs. If list is empty, check for two_way_translation field
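
The interplay between `translation_targets` and `one_way_translation` can be expressed as a small check. This sketch follows the descriptions in the table literally and ignores the per-target source-language restrictions, whose semantics are not described here; `supportsOneWayTarget` and `ModelTranslationInfo` are hypothetical names.

```typescript
type SonioxLanguage = { code: string; name: string };
type SonioxTranslationTarget = {
  exclude_source_languages: string[];
  source_languages: string[];
  target_language: string;
};
// Only the SonioxModel fields this check reads.
type ModelTranslationInfo = {
  languages: SonioxLanguage[];
  one_way_translation: string | null;
  translation_targets: SonioxTranslationTarget[];
};

// Hypothetical check: can this model translate one-way into `target`?
function supportsOneWayTarget(model: ModelTranslationInfo, target: string): boolean {
  if (model.translation_targets.length > 0) {
    return model.translation_targets.some((t) => t.target_language === target);
  }
  // Empty list: fall back to the one_way_translation field.
  if (model.one_way_translation === "all_languages") {
    return model.languages.some((l) => l.code === target);
  }
  return false;
}

const anyLanguageModel: ModelTranslationInfo = {
  languages: [
    { code: "en", name: "English" },
    { code: "fr", name: "French" },
  ],
  one_way_translation: "all_languages",
  translation_targets: [],
};
console.log(supportsOneWayTarget(anyLanguageModel, "fr"));
```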

SonioxNodeClientOptions

type SonioxNodeClientOptions = {
  api_key?: string;
  base_url?: string;
  http_client?: HttpClient;
  realtime?: RealtimeOptions;
};

Properties

Property	Type	Description
`api_key?`	`string`	API key for authentication. Falls back to SONIOX_API_KEY environment variable if not provided.
`base_url?`	`string`	Base URL for the REST API. Falls back to SONIOX_API_BASE_URL environment variable, then to 'https://api.soniox.com'.
`http_client?`	`HttpClient`	Custom HTTP client implementation.
`realtime?`	`RealtimeOptions`	Real-time API configuration options.
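
The fallback order described for `api_key` and `base_url` can be sketched as follows. This is an illustration of the documented precedence, not the SDK's code: `resolveClientOptions` is a hypothetical function, the environment is passed in as a plain record so the snippet has no Node.js dependency, and throwing when no key is found is an assumption.

```typescript
type ResolvedOptions = { api_key: string; base_url: string };

// Hypothetical resolver: explicit option first, then environment variable, then default.
function resolveClientOptions(
  options: { api_key?: string; base_url?: string },
  env: Record<string, string | undefined>,
): ResolvedOptions {
  const api_key = options.api_key ?? env["SONIOX_API_KEY"];
  if (api_key === undefined) {
    // Assumption: a missing key is treated as an error.
    throw new Error("No API key: pass api_key or set SONIOX_API_KEY");
  }
  const base_url =
    options.base_url ?? env["SONIOX_API_BASE_URL"] ?? "https://api.soniox.com";
  return { api_key, base_url };
}

console.log(resolveClientOptions({}, { SONIOX_API_KEY: "placeholder-key" }));
```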

SonioxTranscriptionData

type SonioxTranscriptionData = {
  audio_duration_ms?: number | null;
  audio_url?: string | null;
  client_reference_id?: string | null;
  context?: TranscriptionContext | null;
  created_at: string;
  enable_language_identification: boolean;
  enable_speaker_diarization: boolean;
  error_message?: string | null;
  error_type?: string | null;
  file_id?: string | null;
  filename: string;
  id: string;
  language_hints?: string[] | null;
  model: string;
  status: TranscriptionStatus;
  webhook_auth_header_name?: string | null;
  webhook_auth_header_value?: string | null;
  webhook_status_code?: number | null;
  webhook_url?: string | null;
};

Raw transcription metadata from the API.

Properties

Property	Type	Description
`audio_duration_ms?`	`number` \| `null`	Duration of the audio in milliseconds. Only available after processing begins.
`audio_url?`	`string` \| `null`	URL of the audio file being transcribed.
`client_reference_id?`	`string` \| `null`	Optional tracking identifier. Max Length 256
`context?`	`TranscriptionContext` \| `null`	Additional context provided for the transcription.
`created_at`	`string`	UTC timestamp when the transcription was created. Format date-time
`enable_language_identification`	`boolean`	When true, language is detected for each part of the transcription.
`enable_speaker_diarization`	`boolean`	When true, speakers are identified and separated in the transcription output.
`error_message?`	`string` \| `null`	Error message if transcription failed. Null for successful or in-progress transcriptions.
`error_type?`	`string` \| `null`	Error type if transcription failed. Null for successful or in-progress transcriptions.
`file_id?`	`string` \| `null`	ID of the uploaded file being transcribed. Format uuid
`filename`	`string`	Name of the file being transcribed.
`id`	`string`	Unique identifier of the transcription. Format uuid
`language_hints?`	`string`[] \| `null`	Expected languages in the audio. If not specified, languages are automatically detected.
`model`	`string`	Speech-to-text model used.
`status`	`TranscriptionStatus`	Current status of the transcription.
`webhook_auth_header_name?`	`string` \| `null`	Name of the authentication header sent with webhook notifications.
`webhook_auth_header_value?`	`string` \| `null`	Authentication header value. Always returned masked.
`webhook_status_code?`	`number` \| `null`	HTTP status code received from your server when webhook was delivered. Null if not yet sent.
`webhook_url?`	`string` \| `null`	URL to receive webhook notifications when transcription is completed or fails.

SonioxTranscriptionMode

type SonioxTranscriptionMode = "real_time" | "async";

Transcription mode of the model.

SonioxTranslationTarget

type SonioxTranslationTarget = {
  exclude_source_languages: string[];
  source_languages: string[];
  target_language: string;
};

Properties

Property	Type
`exclude_source_languages`	`string`[]
`source_languages`	`string`[]
`target_language`	`string`

SttSessionConfig

type SttSessionConfig = {
  audio_format?: "auto" | AudioFormat;
  client_reference_id?: string;
  context?: TranscriptionContext;
  enable_endpoint_detection?: boolean;
  enable_language_identification?: boolean;
  enable_speaker_diarization?: boolean;
  language_hints?: string[];
  language_hints_strict?: boolean;
  max_endpoint_delay_ms?: number;
  model: string;
  num_channels?: number;
  sample_rate?: number;
  translation?: TranslationConfig;
};

Configuration sent to the Soniox WebSocket API when starting a session.

Properties

Property	Type	Description
`audio_format?`	`"auto"` \| `AudioFormat`	Audio format. Use 'auto' for automatic detection of container formats. For raw PCM formats, also set sample_rate and num_channels. Default `'auto'`
`client_reference_id?`	`string`	Optional tracking identifier (max 256 chars).
`context?`	`TranscriptionContext`	Additional context to improve transcription accuracy.
`enable_endpoint_detection?`	`boolean`	Enable endpoint detection for utterance boundaries. Useful for voice AI agents.
`enable_language_identification?`	`boolean`	Enable automatic language detection.
`enable_speaker_diarization?`	`boolean`	Enable speaker identification.
`language_hints?`	`string`[]	Expected languages in the audio (ISO language codes).
`language_hints_strict?`	`boolean`	When true, recognition is strongly biased toward language hints. Best-effort only, not a hard guarantee.
`max_endpoint_delay_ms?`	`number`	Maximum delay between the end of speech and the returned endpoint. Allowed values are between 500 ms and 3000 ms; the default is 2000 ms.
`model`	`string`	Speech-to-text model to use.
`num_channels?`	`number`	Number of audio channels (required for raw audio formats).
`sample_rate?`	`number`	Sample rate in Hz (required for PCM formats).
`translation?`	`TranslationConfig`	Translation configuration.
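
Two constraints in the table lend themselves to a client-side check: raw PCM formats need `sample_rate` and `num_channels`, and `max_endpoint_delay_ms` has an allowed range. The sketch below is hypothetical (`checkSessionConfig` is not an SDK function) and treats only `pcm_`-prefixed formats as raw; whether `mulaw` and `alaw` need the same fields is not stated here.

```typescript
// Only the SttSessionConfig fields this check reads.
type SessionAudioConfig = {
  audio_format?: string;
  max_endpoint_delay_ms?: number;
  num_channels?: number;
  sample_rate?: number;
};

// Hypothetical pre-flight check: returns a list of problems, empty when none are found.
function checkSessionConfig(config: SessionAudioConfig): string[] {
  const problems: string[] = [];
  const format = config.audio_format ?? "auto";
  if (format.indexOf("pcm_") === 0) {
    if (config.sample_rate === undefined) problems.push("sample_rate is required for raw PCM");
    if (config.num_channels === undefined) problems.push("num_channels is required for raw PCM");
  }
  const delay = config.max_endpoint_delay_ms;
  if (delay !== undefined && (delay < 500 || delay > 3000)) {
    problems.push("max_endpoint_delay_ms must be between 500 and 3000");
  }
  return problems;
}

console.log(checkSessionConfig({ audio_format: "pcm_s16le", sample_rate: 16000 }));
```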

SttSessionEvents

type SttSessionEvents = {
  connected: () => void;
  disconnected: (reason?) => void;
  endpoint: () => void;
  error: (error) => void;
  finalized: () => void;
  finished: () => void;
  result: (result) => void;
  state_change: (update) => void;
  token: (token) => void;
};

Event handlers for the STT session.

Properties

Property	Type	Description
`connected`	() => `void`	Session connected and ready.
`disconnected`	(`reason?`) => `void`	Session disconnected.
`endpoint`	() => `void`	Endpoint detected (<end> token).
`error`	(`error`) => `void`	Error occurred.
`finalized`	() => `void`	Finalization complete (<fin> token).
`finished`	() => `void`	Session finished (server signaled end of stream).
`result`	(`result`) => `void`	Parsed result received.
`state_change`	(`update`) => `void`	Session state transition.
`token`	(`token`) => `void`	Individual token received.

SttSessionOptions

type SttSessionOptions = {
  keepalive_interval_ms?: number;
  signal?: AbortSignal;
};

SDK-level session options (not sent to the server).

Properties

Property	Type	Description
`keepalive_interval_ms?`	`number`	Interval for sending keepalive messages while paused (milliseconds). Default `5000`
`signal?`	`AbortSignal`	AbortSignal for cancellation.

SttSessionState

type SttSessionState = 
  | "idle"
  | "connecting"
  | "connected"
  | "finishing"
  | "finished"
  | "canceled"
  | "closed"
  | "error";

Session lifecycle states.

TemporaryApiKeyRequest

type TemporaryApiKeyRequest = {
  client_reference_id?: string;
  expires_in_seconds: number;
  usage_type: TemporaryApiKeyUsageType;
};

Properties

Property	Type	Description
`client_reference_id?`	`string`	Optional tracking identifier string. Does not need to be unique. Max Length 256
`expires_in_seconds`	`number`	Duration in seconds until the temporary API key expires. Minimum 1 Maximum 3600
`usage_type`	`TemporaryApiKeyUsageType`	Intended usage of the temporary API key.

TemporaryApiKeyResponse

type TemporaryApiKeyResponse = {
  api_key: string;
  expires_at: string;
};

Properties

Property	Type	Description
`api_key`	`string`	Created temporary API key.
`expires_at`	`string`	UTC timestamp indicating when the generated temporary API key will expire. Format date-time

TemporaryApiKeyUsageType

type TemporaryApiKeyUsageType = "transcribe_websocket";

TranscribeBaseOptions

type TranscribeBaseOptions = {
  cleanup?: CleanupTarget[];
  client_reference_id?: string;
  context?: TranscriptionContext;
  enable_language_identification?: boolean;
  enable_speaker_diarization?: boolean;
  fetch_transcript?: boolean;
  language_hints?: string[];
  language_hints_strict?: boolean;
  model: string;
  signal?: AbortSignal;
  timeout_ms?: number;
  translation?: TranslationConfig;
  wait?: boolean;
  wait_options?: WaitOptions;
  webhook_auth_header_name?: string;
  webhook_auth_header_value?: string;
  webhook_query?: string | URLSearchParams | Record<string, string>;
  webhook_url?: string;
};

Base options shared by all audio source variants.

Properties

Property	Type	Description
`cleanup?`	`CleanupTarget`[]	Resources to clean up after transcription completes or on error/timeout. Only applies when `wait: true`, in which case cleanup always runs: after successful completion, after transcription errors (status: 'error'), and on timeout or abort. This ensures no orphaned resources are left behind. Example: `cleanup: ['file']` deletes only the uploaded file, `cleanup: ['transcription']` deletes only the transcription record, and `cleanup: ['file', 'transcription']` deletes both.
`client_reference_id?`	`string`	Optional tracking identifier. Max Length 256
`context?`	`TranscriptionContext`	Additional context to improve transcription accuracy and formatting of specialized terms.
`enable_language_identification?`	`boolean`	Enable automatic language identification.
`enable_speaker_diarization?`	`boolean`	Enable speaker diarization to identify different speakers.
`fetch_transcript?`	`boolean`	When true, fetches the transcript and attaches it to the result, provided wait=true and the transcription completes successfully. Set to false to skip fetching the full transcript payload. Default `true`
`language_hints?`	`string`[]	Array of expected ISO language codes to bias recognition.
`language_hints_strict?`	`boolean`	When true, model relies more heavily on language hints.
`model`	`string`	Speech-to-text model to use. Max Length 32
`signal?`	`AbortSignal`	AbortSignal to cancel the operation
`timeout_ms?`	`number`	Timeout in milliseconds
`translation?`	`TranslationConfig`	Translation configuration.
`wait?`	`boolean`	When true, waits for transcription to complete before returning. Default `false`
`wait_options?`	`WaitOptions`	Options for waiting (only used when wait=true).
`webhook_auth_header_name?`	`string`	Name of the authentication header sent with webhook notifications. Max Length 256
`webhook_auth_header_value?`	`string`	Authentication header value sent with webhook notifications. Max Length 256
`webhook_query?`	`string` \| `URLSearchParams` \| `Record`<`string`, `string`>	Query parameters to append to the webhook URL. Useful for encoding metadata like transcription ID in the webhook callback. Can be a string, URLSearchParams, or Record<string, string>.
`webhook_url?`	`string`	URL to receive webhook notifications when transcription is completed or fails. Max Length 256

TranscribeFromFile

type TranscribeFromFile = TranscribeBaseOptions & {
  audio_url?: never;
  file: UploadFileInput;
  file_id?: never;
  filename?: string;
};

Transcribe from a direct file upload (Buffer, Uint8Array, Blob, or ReadableStream)

Type Declaration

Name	Type	Description
`audio_url?`	`never`	-
`file`	`UploadFileInput`	File data to upload and transcribe.
`file_id?`	`never`	-
`filename?`	`string`	-

TranscribeFromFileId

type TranscribeFromFileId = TranscribeBaseOptions & {
  audio_url?: never;
  file?: never;
  file_id: string;
  filename?: never;
};

Transcribe from a previously uploaded file

Type Declaration

Name	Type	Description
`audio_url?`	`never`	-
`file?`	`never`	-
`file_id`	`string`	ID of a previously uploaded file. Format uuid
`filename?`	`never`	-

TranscribeFromFileIdOptions

type TranscribeFromFileIdOptions = Omit<TranscribeFromFileId, "file_id">;

Options for transcribing from an uploaded file ID via transcribeFromFileId.

TranscribeFromFileOptions

type TranscribeFromFileOptions = Omit<TranscribeFromFile, "file">;

Options for transcribing from a file via transcribeFromFile.

TranscribeFromUrl

type TranscribeFromUrl = TranscribeBaseOptions & {
  audio_url: string;
  file?: never;
  file_id?: never;
  filename?: never;
};

Transcribe from a publicly accessible audio URL

Type Declaration

Name	Type	Description
`audio_url`	`string`	URL of a publicly accessible audio file. Max Length 4096
`file?`	`never`	-
`file_id?`	`never`	-
`filename?`	`never`	-

TranscribeFromUrlOptions

type TranscribeFromUrlOptions = Omit<TranscribeFromUrl, "audio_url">;

Options for transcribing from a URL via transcribeFromUrl.

TranscribeOptions

type TranscribeOptions = 
  | TranscribeFromFile
  | TranscribeFromFileId
  | TranscribeFromUrl;

Options for the unified transcribe method. Exactly one audio source must be provided: file, file_id, or audio_url.
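
The `never` fields on each variant are what enforce the "exactly one source" rule at compile time. The sketch below uses trimmed local copies of the three variants to show this; `audioSourceKind` is a hypothetical function, `UploadFileInput` is narrowed to `Uint8Array` as an assumption, and `"example-model"` is a placeholder.

```typescript
type BaseOptions = { model: string }; // stands in for TranscribeBaseOptions
type UploadFileInput = Uint8Array; // assumption: the real type accepts more inputs

type TranscribeFromFile = BaseOptions & {
  audio_url?: never;
  file: UploadFileInput;
  file_id?: never;
  filename?: string;
};
type TranscribeFromFileId = BaseOptions & {
  audio_url?: never;
  file?: never;
  file_id: string;
  filename?: never;
};
type TranscribeFromUrl = BaseOptions & {
  audio_url: string;
  file?: never;
  file_id?: never;
  filename?: never;
};
type TranscribeOptions = TranscribeFromFile | TranscribeFromFileId | TranscribeFromUrl;

// Hypothetical helper: report which audio source a set of options carries.
function audioSourceKind(options: TranscribeOptions): "file" | "file_id" | "audio_url" {
  if (options.file !== undefined) return "file";
  if (options.file_id !== undefined) return "file_id";
  return "audio_url";
}

// Mixing sources does not type-check, for example:
// const bad: TranscribeOptions = { model: "example-model", file_id: "x", audio_url: "y" };

console.log(audioSourceKind({ model: "example-model", audio_url: "https://example.com/a.wav" }));
```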

TranscriptResponse

type TranscriptResponse = {
  id: string;
  text: string;
  tokens: TranscriptToken[];
};

Response from getting a transcription transcript.

Properties

Property	Type	Description
`id`	`string`	Unique identifier of the transcription this transcript belongs to. Format uuid
`text`	`string`	Complete transcribed text content.
`tokens`	`TranscriptToken`[]	List of detailed token information with timestamps and metadata.

TranscriptSegment

type TranscriptSegment = {
  end_ms: number;
  language?: string;
  speaker?: string;
  start_ms: number;
  text: string;
  tokens: TranscriptToken[];
};

A segment of contiguous tokens grouped by speaker and language

Properties

Property	Type	Description
`end_ms`	`number`	End time of the segment in milliseconds (from last token).
`language?`	`string`	Detected language code (if language identification was enabled).
`speaker?`	`string`	Speaker identifier (if speaker diarization was enabled).
`start_ms`	`number`	Start time of the segment in milliseconds (from first token).
`text`	`string`	Concatenated text of all tokens in this segment.
`tokens`	`TranscriptToken`[]	Original tokens in this segment.
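
To illustrate what "contiguous tokens grouped by speaker and language" means, here is a rough sketch of one way such grouping could be done. It is not the SDK's implementation of `segments()`; the `groupTokens` function and the trimmed-down local types are assumptions made for the example.

```typescript
// Trimmed local copies of the documented shapes (only the fields used here).
type TranscriptToken = {
  text: string;
  start_ms: number;
  end_ms: number;
  confidence: number;
  speaker?: string | null;
  language?: string | null;
};

type TranscriptSegment = {
  text: string;
  start_ms: number;
  end_ms: number;
  speaker?: string;
  language?: string;
  tokens: TranscriptToken[];
};

// Hypothetical helper: start a new segment whenever the speaker or the
// language differs from the previous token; otherwise extend the last one.
function groupTokens(tokens: TranscriptToken[]): TranscriptSegment[] {
  const segments: TranscriptSegment[] = [];
  for (const token of tokens) {
    const speaker = token.speaker ?? undefined;
    const language = token.language ?? undefined;
    const last = segments[segments.length - 1];
    if (last && last.speaker === speaker && last.language === language) {
      last.text += token.text;       // concatenated text of all tokens
      last.end_ms = token.end_ms;    // end time comes from the last token
      last.tokens.push(token);
    } else {
      segments.push({
        text: token.text,
        start_ms: token.start_ms,    // start time comes from the first token
        end_ms: token.end_ms,
        speaker,
        language,
        tokens: [token],
      });
    }
  }
  return segments;
}

const sampleTokens: TranscriptToken[] = [
  { text: "Hi", start_ms: 0, end_ms: 200, confidence: 0.98, speaker: "1", language: "en" },
  { text: " there", start_ms: 200, end_ms: 450, confidence: 0.95, speaker: "1", language: "en" },
  { text: "Hello", start_ms: 600, end_ms: 900, confidence: 0.97, speaker: "2", language: "en" },
];

const sampleSegments = groupTokens(sampleTokens);
console.log(sampleSegments.map((s) => `${s.speaker}: ${s.text}`));
```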

TranscriptToken

type TranscriptToken = {
  confidence: number;
  end_ms: number;
  is_audio_event?: boolean | null;
  language?: string | null;
  speaker?: string | null;
  start_ms: number;
  text: string;
  translation_status?: "none" | "original" | "translation" | null;
};

A single token from the transcript with timing and confidence information.

Properties

Property	Type	Description
`confidence`	`number`	Confidence score for this token (0.0 to 1.0).
`end_ms`	`number`	End time of the token in milliseconds.
`is_audio_event?`	`boolean` \| `null`	Whether this token represents an audio event.
`language?`	`string` \| `null`	Detected language code (if language identification was enabled).
`speaker?`	`string` \| `null`	Speaker identifier (if speaker diarization was enabled).
`start_ms`	`number`	Start time of the token in milliseconds.
`text`	`string`	The text content of this token.
`translation_status?`	`"none"` \| `"original"` \| `"translation"` \| `null`	Translation status for this token.

TranscriptionContext

type TranscriptionContext = {
  general?: ContextGeneralEntry[];
  terms?: string[];
  text?: string;
  translation_terms?: ContextTranslationTerm[];
};

Additional context to improve transcription and translation accuracy. All sections are optional - include only what's relevant for your use case.

Properties

Property	Type	Description
`general?`	`ContextGeneralEntry`[]	Structured key-value pairs describing domain, topic, intent, participant names, etc.
`terms?`	`string`[]	Domain-specific or uncommon words to recognize.
`text?`	`string`	Longer free-form background text, prior interaction history, reference documents, or meeting notes.
`translation_terms?`	`ContextTranslationTerm`[]	Custom translations for ambiguous terms.
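
As an illustration of how the four optional sections might be combined, the following sketch builds a context object against local copies of the documented types. All values are made-up placeholders chosen for the example, not recommended settings.

```typescript
// Local copies of the documented shapes.
type ContextGeneralEntry = { key: string; value: string };
type ContextTranslationTerm = { source: string; target: string };
type TranscriptionContext = {
  general?: ContextGeneralEntry[];
  terms?: string[];
  text?: string;
  translation_terms?: ContextTranslationTerm[];
};

// Illustrative values only; include just the sections relevant to your use case.
const context: TranscriptionContext = {
  general: [
    { key: "domain", value: "Healthcare" },
    { key: "topic", value: "Diabetes management consultation" },
  ],
  terms: ["metformin", "HbA1c"],
  text: "Follow-up visit to review recent lab results.",
  translation_terms: [{ source: "MRI", target: "RM" }],
};

console.log(Object.keys(context));
```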

TranscriptionIdentifier

type TranscriptionIdentifier = 
  | string
  | {
  id: string;
};

Transcription identifier - either a string ID or an object with an id property.
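
Code that accepts this union typically normalizes it to a plain string first. A minimal sketch, where `resolveTranscriptionId` is a hypothetical helper name and the ID values are placeholders:

```typescript
type TranscriptionIdentifier = string | { id: string };

// Hypothetical helper: accept either form and return the plain ID string.
function resolveTranscriptionId(identifier: TranscriptionIdentifier): string {
  return typeof identifier === "string" ? identifier : identifier.id;
}

console.log(resolveTranscriptionId("abc-123"));
console.log(resolveTranscriptionId({ id: "abc-123" }));
```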

TranscriptionStatus

type TranscriptionStatus = "queued" | "processing" | "completed" | "error";

Status of a transcription request.

TranslationConfig

type TranslationConfig = 
  | OneWayTranslationConfig
  | TwoWayTranslationConfig;

Translation configuration.

TwoWayTranslationConfig

type TwoWayTranslationConfig = {
  language_a: string;
  language_b: string;
  type: "two_way";
};

Two-way translation configuration. Translates between two specified languages.

Properties

Property	Type	Description
`language_a`	`string`	First language code.
`language_b`	`string`	Second language code.
`type`	`"two_way"`	Translation type.

UploadFileInput

type UploadFileInput = 
  | Buffer
  | Uint8Array
  | Blob
  | ReadableStream<Uint8Array>
  | NodeJS.ReadableStream;

Supported input types for file upload

UploadFileOptions

type UploadFileOptions = {
  client_reference_id?: string;
  filename?: string;
  signal?: AbortSignal;
  timeout_ms?: number;
};

Options for uploading a file

Properties

Property	Type	Description
`client_reference_id?`	`string`	Optional tracking identifier string. Does not need to be unique. Maximum length: 256.
`filename?`	`string`	Custom filename for the uploaded file
`signal?`	`AbortSignal`	AbortSignal for cancelling the upload
`timeout_ms?`	`number`	Request timeout in milliseconds

WaitOptions

type WaitOptions = {
  interval_ms?: number;
  on_status_change?: (status, transcription) => void;
  signal?: AbortSignal;
  timeout_ms?: number;
};

Options for polling/waiting for transcription completion.

Properties

Property	Type	Description
`interval_ms?`	`number`	Polling interval in milliseconds. Default: `1000`. Minimum: 1000.
`on_status_change?`	(`status`, `transcription`) => `void`	Callback invoked when status changes.
`signal?`	`AbortSignal`	AbortSignal to cancel waiting.
`timeout_ms?`	`number`	Maximum time to wait in milliseconds. Default: `300000` (5 minutes).
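
The defaults and the minimum interval listed above can be sketched as a small resolution step. This is an assumption about how such values might be applied, not the SDK's actual logic: `resolveWaitOptions` is a hypothetical helper, and clamping a too-small interval up to the minimum (rather than rejecting it) is a choice made for the example. The local `WaitOptions` type keeps only the two numeric fields.

```typescript
// Reduced local copy of WaitOptions (signal and on_status_change omitted).
type WaitOptions = {
  interval_ms?: number;
  timeout_ms?: number;
};

const DEFAULT_INTERVAL_MS = 1000;   // documented default
const MIN_INTERVAL_MS = 1000;       // documented minimum
const DEFAULT_TIMEOUT_MS = 300_000; // documented default (5 minutes)

// Hypothetical helper: fill in defaults and clamp the interval to the minimum.
function resolveWaitOptions(options: WaitOptions = {}): { interval_ms: number; timeout_ms: number } {
  const interval_ms = Math.max(options.interval_ms ?? DEFAULT_INTERVAL_MS, MIN_INTERVAL_MS);
  const timeout_ms = options.timeout_ms ?? DEFAULT_TIMEOUT_MS;
  return { interval_ms, timeout_ms };
}

console.log(resolveWaitOptions());
console.log(resolveWaitOptions({ interval_ms: 250, timeout_ms: 60_000 }));
```

With the documented defaults, a wait would poll at most about 300 times (300000 / 1000) before timing out.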

WebhookAuthConfig

type WebhookAuthConfig = {
  name: string;
  value: string;
};

Authentication configuration for webhook verification

Properties

Property	Type	Description
`name`	`string`	Expected header name (case-insensitive comparison)
`value`	`string`	Expected header value (exact match)
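
The two comparison rules (case-insensitive header name, exact header value) can be sketched as follows. `isAuthorized` is a hypothetical helper, not the SDK's verification routine, and the header name and secret are placeholders. Production code would likely also want a constant-time comparison for the value.

```typescript
type WebhookAuthConfig = { name: string; value: string };
type HeaderRecord = Record<string, string | string[] | undefined>;

// Hypothetical helper: true when some header whose name matches auth.name
// (ignoring case) carries exactly auth.value.
function isAuthorized(headers: HeaderRecord, auth: WebhookAuthConfig): boolean {
  const wanted = auth.name.toLowerCase();
  for (const [name, raw] of Object.entries(headers)) {
    if (name.toLowerCase() !== wanted) continue;
    const values = Array.isArray(raw) ? raw : [raw];
    if (values.some((v) => v === auth.value)) return true;
  }
  return false;
}

const auth: WebhookAuthConfig = { name: "x-webhook-secret", value: "s3cret" };

console.log(isAuthorized({ "X-Webhook-Secret": "s3cret" }, auth));
console.log(isAuthorized({ "X-Webhook-Secret": "S3CRET" }, auth));
```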

WebhookEvent

type WebhookEvent = {
  id: string;
  status: WebhookEventStatus;
};

Webhook event payload sent by Soniox when a transcription completes or fails.

Properties

Property	Type	Description
`id`	`string`	Transcription ID. Format: uuid.
`status`	`WebhookEventStatus`	Transcription result status

WebhookEventStatus

type WebhookEventStatus = "completed" | "error";

Webhook event status values

WebhookHandlerResult

type WebhookHandlerResult = {
  error?: string;
  event?: WebhookEvent;
  ok: boolean;
  status: number;
};

Result of webhook handling

Properties

Property	Type	Description
`error?`	`string`	Error message (only present when ok=false)
`event?`	`WebhookEvent`	Parsed webhook event (only present when ok=true)
`ok`	`boolean`	Whether the webhook was handled successfully
`status`	`number`	HTTP status code to return

WebhookHandlerResultWithFetch

type WebhookHandlerResultWithFetch = WebhookHandlerResult & {
  fetchTranscript: (() => Promise<ISonioxTranscript | null>) | undefined;
  fetchTranscription: (() => Promise<ISonioxTranscription | null>) | undefined;
};

Result of webhook handling with lazy fetch capabilities.

When using client.webhooks.handleExpress() (or other framework handlers), the result includes helper methods to fetch the transcript or transcription.

Type Declaration

Name	Type	Description
`fetchTranscript`	(() => `Promise`<`ISonioxTranscript` \| `null`>) \| `undefined`	Fetch the transcript for a completed transcription. Only available when `ok=true` and `event.status='completed'`. Example: `const result = soniox.webhooks.handleExpress(req); if (result.ok && result.event?.status === 'completed') { const transcript = await result.fetchTranscript?.(); console.log(transcript?.text); }`
`fetchTranscription`	(() => `Promise`<`ISonioxTranscription` \| `null`>) \| `undefined`	Fetch the full transcription object. Useful for both completed (metadata) and error (error details) statuses. Example: `const result = soniox.webhooks.handleExpress(req); if (result.ok && result.event?.status === 'error') { const transcription = await result.fetchTranscription?.(); console.log(transcription?.error_message); }`

WebhookHeaders

type WebhookHeaders = 
  | Headers
  | Record<string, string | string[] | undefined>
  | {
  get: string | null;
};

Headers object type - supports both standard headers and record types

HttpClient

Pluggable HTTP client interface

Methods

request()

request<T>(request): Promise<HttpResponse<T>>;

Perform an HTTP request

Type Parameters

Type Parameter
`T`

Parameters

Parameter	Type	Description
`request`	`HttpRequest`	Request configuration

Returns

Promise<HttpResponse<T>>

Promise resolving to the response

Throws

SonioxHttpError on network errors, timeouts, HTTP errors, or parse errors.

HttpErrorDetails

Error details for SonioxHttpError

Properties

Property	Type	Description
`bodyText?`	`string`	Response body text (capped at 4KB)
`cause?`	`unknown`	-
`code`	`HttpErrorCode`	-
`headers?`	`Record`<`string`, `string`>	-
`message`	`string`	-
`method`	`HttpMethod`	-
`statusCode?`	`number`	-
`url`	`string`	-

HttpRequest

HTTP request configuration

Properties

Property	Type	Description
`body?`	`HttpRequestBody`	Request body
`headers?`	`Record`<`string`, `string`>	Request headers
`method`	`HttpMethod`	HTTP method
`path`	`string`	URL path (relative to baseUrl) or absolute URL
`query?`	`QueryParams`	Query parameters (will be URL-encoded)
`responseType?`	`HttpResponseType`	Expected response type. Default: `'json'`.
`signal?`	`AbortSignal`	Optional AbortSignal for request cancellation. If provided along with timeoutMs, both will be respected.
`timeoutMs?`	`number`	Request timeout in milliseconds. If not specified, uses the client's default timeout.

HttpResponse<T>

HTTP response from the client

Type Parameters

Type Parameter
`T`

Properties

Property	Type	Description
`data`	`T`	Parsed response data
`headers`	`Record`<`string`, `string`>	Response headers (normalized to lowercase keys)
`status`	`number`	HTTP status code

ISonioxTranscript

Type contract for SonioxTranscript class.

See

SonioxTranscript for full documentation.

Methods

segments()

segments(options?): TranscriptSegment[];

Parameters

Parameter	Type
`options?`	`SegmentTranscriptOptions`

Returns

TranscriptSegment[]

Properties

Property	Type
`id`	`string`
`text`	`string`
`tokens`	`TranscriptToken`[]

ISonioxTranscription

Type contract for SonioxTranscription class.

See

SonioxTranscription for full documentation.

Methods

delete()

delete(): Promise<void>;

Returns

Promise<void>

destroy()

destroy(): Promise<void>;

Returns

Promise<void>

getTranscript()

getTranscript(options?): Promise<ISonioxTranscript | null>;

Parameters

Parameter	Type
`options?`	{ `force?`: `boolean`; `signal?`: `AbortSignal`; }
`options.force?`	`boolean`
`options.signal?`	`AbortSignal`

Returns

Promise<ISonioxTranscript | null>

refresh()

refresh(signal?): Promise<ISonioxTranscription>;

Parameters

Parameter	Type
`signal?`	`AbortSignal`

Returns

Promise<ISonioxTranscription>

toJSON()

toJSON(): SonioxTranscriptionData;

Returns

SonioxTranscriptionData

wait()

wait(options?): Promise<ISonioxTranscription>;

Parameters

Parameter	Type
`options?`	`WaitOptions`

Returns

Promise<ISonioxTranscription>

Properties

Property	Type
`audio_duration_ms`	`number` \| `null` \| `undefined`
`audio_url`	`string` \| `null` \| `undefined`
`client_reference_id`	`string` \| `null` \| `undefined`
`context`	`TranscriptionContext` \| `null` \| `undefined`
`created_at`	`string`
`enable_language_identification`	`boolean`
`enable_speaker_diarization`	`boolean`
`error_message`	`string` \| `null` \| `undefined`
`error_type`	`string` \| `null` \| `undefined`
`file_id`	`string` \| `null` \| `undefined`
`filename`	`string`
`id`	`string`
`language_hints`	`string`[] \| `undefined`
`model`	`string`
`status`	`TranscriptionStatus`
`transcript`	`ISonioxTranscript` \| `null` \| `undefined`
`webhook_auth_header_name`	`string` \| `null` \| `undefined`
`webhook_auth_header_value`	`string` \| `null` \| `undefined`
`webhook_status_code`	`number` \| `null` \| `undefined`
`webhook_url`	`string` \| `null` \| `undefined`
