Soniox Node SDK — Types Reference

AudioData

type AudioData = Buffer | Uint8Array | ArrayBuffer;

Audio data types accepted by sendAudio. In Node.js, Buffer is also accepted since Buffer extends Uint8Array.
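
A minimal sketch of normalizing either input shape to a Uint8Array before passing it to sendAudio. The helper name toUint8Array is hypothetical and not part of the SDK. The local AudioData alias leaves out Buffer only because a Node.js Buffer is already a Uint8Array.

```typescript
// Local mirror of the documented type (Buffer omitted: it extends Uint8Array).
type AudioData = Uint8Array | ArrayBuffer;

// Hypothetical helper: returns a Uint8Array over the same bytes, without copying.
function toUint8Array(data: AudioData): Uint8Array {
  if (data instanceof Uint8Array) {
    return data;
  }
  // An ArrayBuffer is wrapped in a view that shares its memory.
  return new Uint8Array(data);
}

const fromArrayBuffer = toUint8Array(new ArrayBuffer(4));
const fromTypedArray = toUint8Array(new Uint8Array([1, 2, 3]));
```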


AudioFormat

type AudioFormat = 
  | "pcm_s8"
  | "pcm_s8le"
  | "pcm_s8be"
  | "pcm_s16le"
  | "pcm_s16be"
  | "pcm_s24le"
  | "pcm_s24be"
  | "pcm_s32le"
  | "pcm_s32be"
  | "pcm_u8"
  | "pcm_u8le"
  | "pcm_u8be"
  | "pcm_u16le"
  | "pcm_u16be"
  | "pcm_u24le"
  | "pcm_u24be"
  | "pcm_u32le"
  | "pcm_u32be"
  | "pcm_f32le"
  | "pcm_f32be"
  | "pcm_f64le"
  | "pcm_f64be"
  | "mulaw"
  | "alaw"
  | "aac"
  | "aiff"
  | "amr"
  | "asf"
  | "wav"
  | "mp3"
  | "flac"
  | "ogg"
  | "webm";

Supported audio formats for real-time transcription.


CleanupTarget

type CleanupTarget = "file" | "transcription";

Resource types that can be cleaned up after transcription completes.

  • 'file' - The uploaded file
  • 'transcription' - The transcription record

ContextGeneralEntry

type ContextGeneralEntry = {
  key: string;
  value: string;
};

Key-value pair for general context information.

Properties

  • key: string - The key describing the context type (e.g., "domain", "topic", "doctor").
  • value: string - The value for the context key.

ContextTranslationTerm

type ContextTranslationTerm = {
  source: string;
  target: string;
};

Custom translation term mapping.

Properties

  • source: string - The source term to translate.
  • target: string - The target translation for the term.

CreateTranscriptionOptions

type CreateTranscriptionOptions = {
  audio_url?: string;
  client_reference_id?: string;
  context?: TranscriptionContext;
  enable_language_identification?: boolean;
  enable_speaker_diarization?: boolean;
  file_id?: string;
  language_hints?: string[];
  language_hints_strict?: boolean;
  model: string;
  translation?: TranslationConfig;
  webhook_auth_header_name?: string;
  webhook_auth_header_value?: string;
  webhook_url?: string;
};

Options for creating a transcription.

Properties

  • audio_url?: string - URL of a publicly accessible audio file. Max length: 4096.
  • client_reference_id?: string - Optional tracking identifier. Max length: 256.
  • context?: TranscriptionContext - Additional context to improve transcription accuracy and formatting of specialized terms.
  • enable_language_identification?: boolean - Enable automatic language identification.
  • enable_speaker_diarization?: boolean - Enable speaker diarization to identify different speakers.
  • file_id?: string - ID of a previously uploaded file. Format: uuid.
  • language_hints?: string[] - Array of expected ISO language codes to bias recognition.
  • language_hints_strict?: boolean - When true, model relies more heavily on language hints.
  • model: string - Speech-to-text model to use. Max length: 32.
  • translation?: TranslationConfig - Translation configuration.
  • webhook_auth_header_name?: string - Name of the authentication header sent with webhook notifications. Max length: 256.
  • webhook_auth_header_value?: string - Authentication header value sent with webhook notifications. Max length: 256.
  • webhook_url?: string - URL to receive webhook notifications when transcription is completed or fails. Max length: 256.
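
The length limits listed for these properties can be checked locally before a request is sent. The following is a sketch only: checkLengths is a hypothetical helper, the type is a reduced local copy of CreateTranscriptionOptions, and it assumes the limits count string characters, which the reference does not state.

```typescript
// Reduced local copy of CreateTranscriptionOptions (only the length-limited fields).
type CreateTranscriptionOptions = {
  audio_url?: string;
  client_reference_id?: string;
  model: string;
  webhook_url?: string;
};

// Maximum lengths as listed in the property descriptions.
const MAX_LENGTHS = {
  audio_url: 4096,
  client_reference_id: 256,
  model: 32,
  webhook_url: 256,
} as const;

// Hypothetical helper: returns one message per field that is too long.
function checkLengths(options: CreateTranscriptionOptions): string[] {
  const problems: string[] = [];
  for (const key of Object.keys(MAX_LENGTHS) as Array<keyof typeof MAX_LENGTHS>) {
    const value = options[key];
    if (value !== undefined && value.length > MAX_LENGTHS[key]) {
      problems.push(`${key} exceeds the documented max length of ${MAX_LENGTHS[key]}`);
    }
  }
  return problems;
}

const okOptions: CreateTranscriptionOptions = {
  model: "example-model", // placeholder, not a real model name
  audio_url: "https://example.com/audio.mp3",
};
const badOptions: CreateTranscriptionOptions = {
  model: "a-model-name-that-is-definitely-longer-than-32-chars",
};

const okProblems = checkLengths(okOptions);
const badProblems = checkLengths(badOptions);
```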

DeleteAllFilesOptions

type DeleteAllFilesOptions = {
  signal?: AbortSignal;
};

Options for purging all files.

Properties

  • signal?: AbortSignal - AbortSignal for cancelling the delete_all operation.

DeleteAllTranscriptionsOptions

type DeleteAllTranscriptionsOptions = {
  on_progress?: (transcription, index) => void;
  signal?: AbortSignal;
};

Options for deleting all transcriptions.

Properties

  • on_progress?: (transcription, index) => void - Callback invoked before each transcription is deleted. Receives the transcription data and its 0-based index.
  • signal?: AbortSignal - AbortSignal for cancelling the delete_all operation.

ExpressLikeRequest

type ExpressLikeRequest = {
  body?: unknown;
  headers: Record<string, string | string[] | undefined>;
  method: string;
};

Express/Connect-style request object

Properties

  • body?: unknown
  • headers: Record<string, string | string[] | undefined>
  • method: string

FastifyLikeRequest

type FastifyLikeRequest = {
  body?: unknown;
  headers: Record<string, string | string[] | undefined>;
  method: string;
};

Fastify-style request object

Properties

  • body?: unknown
  • headers: Record<string, string | string[] | undefined>
  • method: string

FileIdentifier

type FileIdentifier = 
  | string
  | {
  id: string;
};

File identifier - either a string ID or an object with an id property.
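
A short sketch of how a function might accept either form and reduce it to a plain ID string. The helper name resolveFileId is hypothetical and the IDs are placeholders.

```typescript
// Local mirror of the documented type.
type FileIdentifier = string | { id: string };

// Hypothetical helper: accepts a raw ID or any object carrying an id property.
function resolveFileId(file: FileIdentifier): string {
  return typeof file === "string" ? file : file.id;
}

const idFromString = resolveFileId("file-123");
const idFromObject = resolveFileId({ id: "file-456" });
```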


HandleWebhookOptions

type HandleWebhookOptions = {
  auth?: WebhookAuthConfig;
  body: unknown;
  headers: WebhookHeaders;
  method: string;
};

Options for the handleWebhook function

Properties

  • auth?: WebhookAuthConfig - Optional authentication configuration
  • body: unknown - Request body (parsed JSON or raw string)
  • headers: WebhookHeaders - Request headers
  • method: string - HTTP method of the request

HonoLikeContext

type HonoLikeContext = {
  req: {
     method: string;
     header: string | undefined;
     json: Promise<unknown>;
  };
};

Hono context object

Properties

  • req: { method: string; header: string | undefined; json: Promise<unknown>; }
  • req.method: string
  • req.header: string | undefined
  • req.json: Promise<unknown>

HttpErrorCode

type HttpErrorCode = "network_error" | "timeout" | "aborted" | "http_error" | "parse_error";

Error codes for HTTP client errors


HttpMethod

type HttpMethod = "GET" | "POST" | "PUT" | "PATCH" | "DELETE" | "HEAD";

HTTP methods supported by the client


HttpRequestBody

type HttpRequestBody = 
  | string
  | Record<string, unknown>
  | ArrayBuffer
  | Uint8Array
  | FormData
  | null;

Request body types


HttpResponseType

type HttpResponseType = "json" | "text" | "arrayBuffer";

Response types


ListFilesOptions

type ListFilesOptions = {
  cursor?: string;
  limit?: number;
  signal?: AbortSignal;
};

Options for listing files.

Properties

  • cursor?: string - Pagination cursor for the next page of results.
  • limit?: number - Maximum number of files to return. Default: 1000. Minimum: 1. Maximum: 1000.
  • signal?: AbortSignal - AbortSignal for cancelling the request

ListFilesResponse<T>

type ListFilesResponse<T> = {
  files: T[];
  next_page_cursor: string | null;
};

Response from listing files.

Type Parameters

  • T

Properties

  • files: T[] - List of uploaded files.
  • next_page_cursor: string | null - A pagination token that references the next page of results. When null, no additional results are available.
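
The cursor contract (pass next_page_cursor back as cursor until it is null) can be sketched as a simple loop. The PageFetcher type and collectAllFiles helper are hypothetical, and the page source is a synchronous in-memory stand-in so the example stays self-contained. A real listing call would be asynchronous.

```typescript
// Local mirror of the documented response type.
type ListFilesResponse<T> = {
  files: T[];
  next_page_cursor: string | null;
};

// Hypothetical page source standing in for a real API call.
type PageFetcher<T> = (cursor?: string) => ListFilesResponse<T>;

// Follows next_page_cursor until it is null and gathers every item.
function collectAllFiles<T>(fetchPage: PageFetcher<T>): T[] {
  const all: T[] = [];
  let cursor: string | undefined = undefined;
  do {
    const page: ListFilesResponse<T> = fetchPage(cursor);
    all.push(...page.files);
    cursor = page.next_page_cursor ?? undefined;
  } while (cursor !== undefined);
  return all;
}

// Two fake pages: the first points at "page-2", the second ends the listing.
const fakeFetch: PageFetcher<string> = (cursor) =>
  cursor === undefined
    ? { files: ["a", "b"], next_page_cursor: "page-2" }
    : { files: ["c"], next_page_cursor: null };

const allFiles = collectAllFiles(fakeFetch);
```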

ListTranscriptionsOptions

type ListTranscriptionsOptions = {
  cursor?: string;
  limit?: number;
};

Options for listing transcriptions

Properties

  • cursor?: string - Pagination cursor for the next page of results
  • limit?: number - Maximum number of transcriptions to return. Default: 1000. Minimum: 1. Maximum: 1000.

ListTranscriptionsResponse<T>

type ListTranscriptionsResponse<T> = {
  next_page_cursor: string | null;
  transcriptions: T[];
};

Response from listing transcriptions.

Type Parameters

  • T

Properties

  • next_page_cursor: string | null - A pagination token that references the next page of results. When null, no additional results are available.
  • transcriptions: T[] - List of transcriptions.

NestJSLikeRequest

type NestJSLikeRequest = {
  body?: unknown;
  headers: Record<string, string | string[] | undefined>;
  method: string;
};

NestJS-style request object (uses Express under the hood by default)

Properties

  • body?: unknown
  • headers: Record<string, string | string[] | undefined>
  • method: string

OneWayTranslationConfig

type OneWayTranslationConfig = {
  target_language: string;
  type: "one_way";
};

One-way translation configuration. Translates all spoken languages into a single target language.

Properties

  • target_language: string - Target language code for translation (e.g., "fr", "es", "de").
  • type: "one_way" - Translation type.

QueryParams

type QueryParams = Record<string, string | number | boolean | undefined>;

Query parameters


RealtimeClientOptions

type RealtimeClientOptions = {
  api_key: string;
  default_session_options?: SttSessionOptions;
  ws_base_url: string;
};

Real-time API configuration options for the client.

Properties

  • api_key: string - API key for real-time sessions.
  • default_session_options?: SttSessionOptions - Default session options applied to all real-time sessions. Can be overridden per-session.
  • ws_base_url: string - WebSocket base URL for real-time connections. Default: 'wss://stt-rt.soniox.com/transcribe-websocket'.

RealtimeErrorCode

type RealtimeErrorCode = 
  | "auth_error"
  | "bad_request"
  | "quota_exceeded"
  | "connection_error"
  | "network_error"
  | "aborted"
  | "state_error"
  | "realtime_error";

Error codes for Real-time (WebSocket) API errors


RealtimeEvent

type RealtimeEvent = 
  | {
  data: RealtimeResult;
  kind: "result";
}
  | {
  kind: "endpoint";
}
  | {
  kind: "finalized";
}
  | {
  kind: "finished";
};

Typed event for async iterator consumption.
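
Because every variant carries a literal kind field, a switch on kind narrows the event type in each branch. The sketch below uses reduced local copies of the types, and describeEvent is a hypothetical helper rather than an SDK function.

```typescript
// Reduced local copies of the documented types.
type RealtimeToken = { text: string; is_final: boolean; confidence: number };
type RealtimeResult = {
  final_audio_proc_ms: number;
  finished?: boolean;
  tokens: RealtimeToken[];
  total_audio_proc_ms: number;
};
type RealtimeEvent =
  | { data: RealtimeResult; kind: "result" }
  | { kind: "endpoint" }
  | { kind: "finalized" }
  | { kind: "finished" };

// Hypothetical helper: `data` is only accessible in the "result" branch.
function describeEvent(ev: RealtimeEvent): string {
  switch (ev.kind) {
    case "result":
      return `result with ${ev.data.tokens.length} token(s)`;
    case "endpoint":
      return "endpoint detected";
    case "finalized":
      return "finalization complete";
    case "finished":
      return "session finished";
    default: {
      // Compile-time exhaustiveness check: fails if a new kind is added.
      const unreachable: never = ev;
      return unreachable;
    }
  }
}

const resultSummary = describeEvent({
  kind: "result",
  data: {
    final_audio_proc_ms: 0,
    total_audio_proc_ms: 120,
    tokens: [{ text: "Hi", is_final: false, confidence: 0.8 }],
  },
});
const endpointSummary = describeEvent({ kind: "endpoint" });
```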


RealtimeOptions

type RealtimeOptions = {
  default_session_options?: SttSessionOptions;
  ws_base_url?: string;
};

Real-time configuration options for the main client.

Properties

  • default_session_options?: SttSessionOptions - Default session options applied to all real-time sessions. Can be overridden per-session.
  • ws_base_url?: string - WebSocket base URL for real-time connections. Falls back to SONIOX_WS_URL environment variable, then to 'wss://stt-rt.soniox.com/transcribe-websocket'.

RealtimeResult

type RealtimeResult = {
  final_audio_proc_ms: number;
  finished?: boolean;
  tokens: RealtimeToken[];
  total_audio_proc_ms: number;
};

A result message from the real-time WebSocket.

Properties

  • final_audio_proc_ms: number - Milliseconds of audio that have been finalized.
  • finished?: boolean - Whether this is the final result (session ending).
  • tokens: RealtimeToken[] - Tokens in this result.
  • total_audio_proc_ms: number - Total milliseconds of audio processed.

RealtimeSegment

type RealtimeSegment = {
  end_ms?: number;
  language?: string;
  speaker?: string;
  start_ms?: number;
  text: string;
  tokens: RealtimeToken[];
};

A segment of contiguous real-time tokens grouped by speaker/language.

Properties

  • end_ms?: number - End time of the segment in milliseconds (from last token).
  • language?: string - Detected language code (if language identification enabled).
  • speaker?: string - Speaker identifier (if diarization enabled).
  • start_ms?: number - Start time of the segment in milliseconds (from first token).
  • text: string - Concatenated text of all tokens in this segment.
  • tokens: RealtimeToken[] - Original tokens in this segment.

RealtimeSegmentBufferOptions

type RealtimeSegmentBufferOptions = {
  final_only?: boolean;
  group_by?: SegmentGroupKey[];
  max_ms?: number;
  max_tokens?: number;
};

Options for rolling real-time segmentation buffers.

Properties

  • final_only?: boolean - When true, only tokens marked as final are buffered. Default: true.
  • group_by?: SegmentGroupKey[] - Fields to group by. A new segment starts when any of these fields changes. Default: ['speaker', 'language'].
  • max_ms?: number - Maximum time window to keep in milliseconds (requires token timings).
  • max_tokens?: number - Maximum number of tokens to keep in the buffer. Default: 2000.

RealtimeSegmentOptions

type RealtimeSegmentOptions = {
  final_only?: boolean;
  group_by?: SegmentGroupKey[];
};

Options for segmenting real-time tokens.

Properties

  • final_only?: boolean - When true, only tokens marked as final are included. Default: false.
  • group_by?: SegmentGroupKey[] - Fields to group by. A new segment starts when any of these fields changes. Default: ['speaker', 'language'].
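
To illustrate how these two options interact, here is a rough sketch of the grouping rule described above: a new segment starts whenever any group_by field changes, and non-final tokens are skipped when final_only is true. The function segmentTokens is a hypothetical reimplementation for illustration, not the SDK's own code, and the types are reduced local copies.

```typescript
type SegmentGroupKey = "speaker" | "language";
type RealtimeToken = {
  text: string;
  is_final: boolean;
  confidence: number;
  speaker?: string;
  language?: string;
  start_ms?: number;
  end_ms?: number;
};
type RealtimeSegment = {
  text: string;
  tokens: RealtimeToken[];
  speaker?: string;
  language?: string;
  start_ms?: number;
  end_ms?: number;
};
type RealtimeSegmentOptions = { final_only?: boolean; group_by?: SegmentGroupKey[] };

// Hypothetical sketch of the grouping rule, using the documented defaults.
function segmentTokens(
  tokens: RealtimeToken[],
  options: RealtimeSegmentOptions = {},
): RealtimeSegment[] {
  const finalOnly = options.final_only ?? false;
  const groupBy: SegmentGroupKey[] = options.group_by ?? ["speaker", "language"];
  const segments: RealtimeSegment[] = [];
  for (const token of tokens) {
    if (finalOnly && !token.is_final) continue;
    const last = segments[segments.length - 1];
    const sameGroup =
      last !== undefined && groupBy.every((key) => last[key] === token[key]);
    if (last !== undefined && sameGroup) {
      // Same speaker/language: extend the open segment.
      last.tokens.push(token);
      last.text += token.text;
      last.end_ms = token.end_ms;
    } else {
      // A grouped field changed (or this is the first token): start a new segment.
      segments.push({
        text: token.text,
        tokens: [token],
        speaker: token.speaker,
        language: token.language,
        start_ms: token.start_ms,
        end_ms: token.end_ms,
      });
    }
  }
  return segments;
}

const sampleTokens: RealtimeToken[] = [
  { text: "Hi", is_final: true, confidence: 0.9, speaker: "1" },
  { text: " there", is_final: true, confidence: 0.9, speaker: "1" },
  { text: " Hello", is_final: false, confidence: 0.5, speaker: "2" },
];
const allSegments = segmentTokens(sampleTokens);
const finalSegments = segmentTokens(sampleTokens, { final_only: true });
```

With the sample tokens, the default options yield two segments (one per speaker), while final_only: true drops the non-final token and leaves a single segment.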

RealtimeToken

type RealtimeToken = {
  confidence: number;
  end_ms?: number;
  is_final: boolean;
  language?: string;
  source_language?: string;
  speaker?: string;
  start_ms?: number;
  text: string;
  translation_status?: "none" | "original" | "translation";
};

A single token from the real-time transcription.

Properties

  • confidence: number - Confidence score (0.0 to 1.0).
  • end_ms?: number - End time in milliseconds relative to audio start.
  • is_final: boolean - Whether this is a finalized token.
  • language?: string - Detected language code (if language identification enabled).
  • source_language?: string - Source language for translated tokens.
  • speaker?: string - Speaker identifier (if diarization enabled).
  • start_ms?: number - Start time in milliseconds relative to audio start.
  • text: string - The transcribed text.
  • translation_status?: "none" | "original" | "translation" - Translation status of this token.

RealtimeUtterance

type RealtimeUtterance = {
  end_ms?: number;
  final_audio_proc_ms?: number;
  language?: string;
  segments: RealtimeSegment[];
  speaker?: string;
  start_ms?: number;
  text: string;
  tokens: RealtimeToken[];
  total_audio_proc_ms?: number;
};

A single utterance built from real-time segments.

Properties

  • end_ms?: number - End time of the utterance in milliseconds (from last segment).
  • final_audio_proc_ms?: number - Milliseconds of audio that have been finalized at flush time.
  • language?: string - Detected language code when consistent across segments.
  • segments: RealtimeSegment[] - Segments included in this utterance.
  • speaker?: string - Speaker identifier when consistent across segments.
  • start_ms?: number - Start time of the utterance in milliseconds (from first segment).
  • text: string - Concatenated text of all segments in this utterance.
  • tokens: RealtimeToken[] - Tokens included in this utterance.
  • total_audio_proc_ms?: number - Total milliseconds of audio processed at flush time.

RealtimeUtteranceBufferOptions

type RealtimeUtteranceBufferOptions = {
  final_only?: boolean;
  group_by?: SegmentGroupKey[];
  max_ms?: number;
  max_tokens?: number;
};

Options for buffering real-time utterances.

Properties

  • final_only?: boolean - When true, only tokens marked as final are buffered. Default: true.
  • group_by?: SegmentGroupKey[] - Fields to group by. A new segment starts when any of these fields changes. Default: ['speaker', 'language'].
  • max_ms?: number - Maximum time window to keep in milliseconds (requires token timings).
  • max_tokens?: number - Maximum number of tokens to keep in the buffer. Default: 2000.

SegmentGroupKey

type SegmentGroupKey = "speaker" | "language";

Fields that can be used to group tokens into segments


SegmentTranscriptOptions

type SegmentTranscriptOptions = {
  group_by?: SegmentGroupKey[];
};

Options for segmenting a transcript

Properties

  • group_by?: SegmentGroupKey[] - Fields to group by. A new segment starts when any of these fields changes. Default: ['speaker', 'language'].

SendStreamOptions

type SendStreamOptions = {
  finish?: boolean;
  pace_ms?: number;
};

Options for streaming audio from an async iterable source.

Properties

  • finish?: boolean - When true, calls finish() automatically after the stream ends. Default: false.
  • pace_ms?: number - Delay in milliseconds between sending chunks. Useful for simulating real-time pace when streaming pre-recorded files. Not needed for live audio sources.
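
When simulating real-time pace for raw PCM audio, one reasonable choice for pace_ms is the playback duration of a single chunk. The arithmetic is sketched below. chunkDurationMs is a hypothetical helper, the bytes-per-sample table covers only a few formats, and the 3200-byte chunk size is an arbitrary example.

```typescript
// Bytes per sample for a few raw PCM formats from AudioFormat (subset).
const BYTES_PER_SAMPLE = {
  pcm_s16le: 2,
  pcm_s24le: 3,
  pcm_s32le: 4,
  pcm_f32le: 4,
} as const;
type RawFormat = keyof typeof BYTES_PER_SAMPLE;

type SendStreamOptions = { finish?: boolean; pace_ms?: number };

// Hypothetical helper: how long one chunk of raw PCM lasts when played back.
function chunkDurationMs(
  chunkBytes: number,
  format: RawFormat,
  sampleRate: number,
  numChannels: number,
): number {
  const bytesPerSecond = sampleRate * numChannels * BYTES_PER_SAMPLE[format];
  return (chunkBytes * 1000) / bytesPerSecond;
}

// 3200 bytes of 16 kHz mono 16-bit audio: 3200 * 1000 / 32000 = 100 ms.
const paceMs = chunkDurationMs(3200, "pcm_s16le", 16000, 1);
const streamOptions: SendStreamOptions = { finish: true, pace_ms: paceMs };
```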

SonioxErrorCode

type SonioxErrorCode = 
  | RealtimeErrorCode
  | "soniox_error"
  | HttpErrorCode;

All possible SDK error codes (core real-time + HTTP-specific codes)


SonioxFileData

type SonioxFileData = {
  client_reference_id?: string | null;
  created_at: string;
  filename: string;
  id: string;
  size: number;
};

Raw file metadata from the API.

Properties

  • client_reference_id?: string | null - Optional tracking identifier string.
  • created_at: string - UTC timestamp indicating when the file was uploaded. Format: date-time.
  • filename: string - Name of the file.
  • id: string - Unique identifier of the file. Format: uuid.
  • size: number - Size of the file in bytes.

SonioxLanguage

type SonioxLanguage = {
  code: string;
  name: string;
};

Properties

  • code: string - 2-letter language code.
  • name: string - Language name.

SonioxModel

type SonioxModel = {
  aliased_model_id: string | null;
  context_version: number | null;
  id: string;
  languages: SonioxLanguage[];
  name: string;
  one_way_translation: string | null;
  supports_language_hints_strict: boolean;
  supports_max_endpoint_delay: boolean;
  transcription_mode: SonioxTranscriptionMode;
  translation_targets: SonioxTranslationTarget[];
  two_way_translation: string | null;
  two_way_translation_pairs: string[];
};

Properties

  • aliased_model_id: string | null - If this is an alias, the id of the aliased model. Null for non-alias models.
  • context_version: number | null - Version of context supported.
  • id: string - Unique identifier of the model.
  • languages: SonioxLanguage[] - List of languages supported by the model.
  • name: string - Name of the model.
  • one_way_translation: string | null - When this contains the string 'all_languages', any language from languages can be used.
  • supports_language_hints_strict: boolean - Whether the model supports the language_hints_strict option.
  • supports_max_endpoint_delay: boolean
  • transcription_mode: SonioxTranscriptionMode - Transcription mode of the model.
  • translation_targets: SonioxTranslationTarget[] - List of supported one-way translation targets. If the list is empty, check the one_way_translation field.
  • two_way_translation: string | null - When this contains the string 'all_languages', any language pair from languages can be used.
  • two_way_translation_pairs: string[] - List of supported two-way translation pairs. If the list is empty, check the two_way_translation field.
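
The fallback described for translation_targets and one_way_translation can be sketched as follows. This is an interpretation, not the SDK's logic: supportsOneWayTarget is a hypothetical helper, it treats one_way_translation as an exact match on 'all_languages', and it ignores source_languages and exclude_source_languages because their semantics are not documented here.

```typescript
// Reduced local copies of the documented types.
type SonioxLanguage = { code: string; name: string };
type SonioxTranslationTarget = {
  exclude_source_languages: string[];
  source_languages: string[];
  target_language: string;
};
type ModelTranslationInfo = {
  languages: SonioxLanguage[];
  one_way_translation: string | null;
  translation_targets: SonioxTranslationTarget[];
};

// Hypothetical helper: explicit targets win; otherwise fall back to the
// 'all_languages' marker and the model's language list.
function supportsOneWayTarget(model: ModelTranslationInfo, target: string): boolean {
  if (model.translation_targets.length > 0) {
    return model.translation_targets.some((t) => t.target_language === target);
  }
  if (model.one_way_translation === "all_languages") {
    return model.languages.some((lang) => lang.code === target);
  }
  return false;
}

const anyLanguageModel: ModelTranslationInfo = {
  languages: [
    { code: "en", name: "English" },
    { code: "fr", name: "French" },
  ],
  one_way_translation: "all_languages",
  translation_targets: [],
};
const explicitTargetModel: ModelTranslationInfo = {
  languages: [{ code: "en", name: "English" }],
  one_way_translation: null,
  translation_targets: [
    { exclude_source_languages: [], source_languages: ["en"], target_language: "es" },
  ],
};

const frenchSupported = supportsOneWayTarget(anyLanguageModel, "fr");
const spanishSupported = supportsOneWayTarget(explicitTargetModel, "es");
const germanSupported = supportsOneWayTarget(explicitTargetModel, "de");
```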

SonioxNodeClientOptions

type SonioxNodeClientOptions = {
  api_key?: string;
  base_url?: string;
  http_client?: HttpClient;
  realtime?: RealtimeOptions;
};

Properties

  • api_key?: string - API key for authentication. Falls back to SONIOX_API_KEY environment variable if not provided.
  • base_url?: string - Base URL for the REST API. Falls back to SONIOX_API_BASE_URL environment variable, then to 'https://api.soniox.com'.
  • http_client?: HttpClient - Custom HTTP client implementation.
  • realtime?: RealtimeOptions - Real-time API configuration options.

SonioxTranscriptionData

type SonioxTranscriptionData = {
  audio_duration_ms?: number | null;
  audio_url?: string | null;
  client_reference_id?: string | null;
  context?: TranscriptionContext | null;
  created_at: string;
  enable_language_identification: boolean;
  enable_speaker_diarization: boolean;
  error_message?: string | null;
  error_type?: string | null;
  file_id?: string | null;
  filename: string;
  id: string;
  language_hints?: string[] | null;
  model: string;
  status: TranscriptionStatus;
  webhook_auth_header_name?: string | null;
  webhook_auth_header_value?: string | null;
  webhook_status_code?: number | null;
  webhook_url?: string | null;
};

Raw transcription metadata from the API.

Properties

  • audio_duration_ms?: number | null - Duration of the audio in milliseconds. Only available after processing begins.
  • audio_url?: string | null - URL of the audio file being transcribed.
  • client_reference_id?: string | null - Optional tracking identifier. Max length: 256.
  • context?: TranscriptionContext | null - Additional context provided for the transcription.
  • created_at: string - UTC timestamp when the transcription was created. Format: date-time.
  • enable_language_identification: boolean - When true, language is detected for each part of the transcription.
  • enable_speaker_diarization: boolean - When true, speakers are identified and separated in the transcription output.
  • error_message?: string | null - Error message if transcription failed. Null for successful or in-progress transcriptions.
  • error_type?: string | null - Error type if transcription failed. Null for successful or in-progress transcriptions.
  • file_id?: string | null - ID of the uploaded file being transcribed. Format: uuid.
  • filename: string - Name of the file being transcribed.
  • id: string - Unique identifier of the transcription. Format: uuid.
  • language_hints?: string[] | null - Expected languages in the audio. If not specified, languages are automatically detected.
  • model: string - Speech-to-text model used.
  • status: TranscriptionStatus - Current status of the transcription.
  • webhook_auth_header_name?: string | null - Name of the authentication header sent with webhook notifications.
  • webhook_auth_header_value?: string | null - Authentication header value. Always returned masked.
  • webhook_status_code?: number | null - HTTP status code received from your server when webhook was delivered. Null if not yet sent.
  • webhook_url?: string | null - URL to receive webhook notifications when transcription is completed or fails.

SonioxTranscriptionMode

type SonioxTranscriptionMode = "real_time" | "async";

Transcription mode of the model.


SonioxTranslationTarget

type SonioxTranslationTarget = {
  exclude_source_languages: string[];
  source_languages: string[];
  target_language: string;
};

Properties

  • exclude_source_languages: string[]
  • source_languages: string[]
  • target_language: string

SttSessionConfig

type SttSessionConfig = {
  audio_format?: "auto" | AudioFormat;
  client_reference_id?: string;
  context?: TranscriptionContext;
  enable_endpoint_detection?: boolean;
  enable_language_identification?: boolean;
  enable_speaker_diarization?: boolean;
  language_hints?: string[];
  language_hints_strict?: boolean;
  model: string;
  num_channels?: number;
  sample_rate?: number;
  translation?: TranslationConfig;
};

Configuration sent to the Soniox WebSocket API when starting a session.

Properties

  • audio_format?: "auto" | AudioFormat - Audio format. Use 'auto' for automatic detection of container formats. For raw PCM formats, also set sample_rate and num_channels. Default: 'auto'.
  • client_reference_id?: string - Optional tracking identifier (max 256 chars).
  • context?: TranscriptionContext - Additional context to improve transcription accuracy.
  • enable_endpoint_detection?: boolean - Enable endpoint detection for utterance boundaries. Useful for voice AI agents.
  • enable_language_identification?: boolean - Enable automatic language detection.
  • enable_speaker_diarization?: boolean - Enable speaker identification.
  • language_hints?: string[] - Expected languages in the audio (ISO language codes).
  • language_hints_strict?: boolean - When true, recognition is strongly biased toward language hints. Best-effort only, not a hard guarantee.
  • model: string - Speech-to-text model to use.
  • num_channels?: number - Number of audio channels (required for raw audio formats).
  • sample_rate?: number - Sample rate in Hz (required for PCM formats).
  • translation?: TranslationConfig - Translation configuration.
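
Since raw PCM formats need sample_rate and num_channels while container formats and 'auto' do not, a small pre-flight check can catch incomplete configs. The sketch below is hypothetical (missingRawAudioFields is not an SDK function) and treats only the pcm_ prefixed formats as raw. Whether mulaw and alaw also need these fields is not stated in this reference.

```typescript
// Reduced local copy of SttSessionConfig (audio-related fields only).
type SttSessionConfigSubset = {
  audio_format?: string;
  model: string;
  num_channels?: number;
  sample_rate?: number;
};

// Hypothetical helper: lists the raw-audio fields a config still needs.
function missingRawAudioFields(config: SttSessionConfigSubset): string[] {
  const format = config.audio_format ?? "auto"; // documented default
  if (format.slice(0, 4) !== "pcm_") {
    return []; // 'auto' and container formats carry their own parameters
  }
  const missing: string[] = [];
  if (config.sample_rate === undefined) missing.push("sample_rate");
  if (config.num_channels === undefined) missing.push("num_channels");
  return missing;
}

const autoConfig: SttSessionConfigSubset = { model: "example-model" }; // placeholder model
const incompletePcm: SttSessionConfigSubset = {
  model: "example-model",
  audio_format: "pcm_s16le",
  sample_rate: 16000,
};

const autoMissing = missingRawAudioFields(autoConfig);
const pcmMissing = missingRawAudioFields(incompletePcm);
```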

SttSessionEvents

type SttSessionEvents = {
  connected: () => void;
  disconnected: (reason?) => void;
  endpoint: () => void;
  error: (error) => void;
  finalized: () => void;
  finished: () => void;
  result: (result) => void;
  state_change: (update) => void;
  token: (token) => void;
};

Event handlers for the STT session.

Properties

  • connected: () => void - Session connected and ready.
  • disconnected: (reason?) => void - Session disconnected.
  • endpoint: () => void - Endpoint detected (<end> token).
  • error: (error) => void - Error occurred.
  • finalized: () => void - Finalization complete (<fin> token).
  • finished: () => void - Session finished (server signaled end of stream).
  • result: (result) => void - Parsed result received.
  • state_change: (update) => void - Session state transition.
  • token: (token) => void - Individual token received.

SttSessionOptions

type SttSessionOptions = {
  keepalive_interval_ms?: number;
  signal?: AbortSignal;
};

SDK-level session options (not sent to the server).

Properties

  • keepalive_interval_ms?: number - Interval for sending keepalive messages while paused (milliseconds). Default: 5000.
  • signal?: AbortSignal - AbortSignal for cancellation.

SttSessionState

type SttSessionState = 
  | "idle"
  | "connecting"
  | "connected"
  | "finishing"
  | "finished"
  | "canceled"
  | "closed"
  | "error";

Session lifecycle states.


TemporaryApiKeyRequest

type TemporaryApiKeyRequest = {
  client_reference_id?: string;
  expires_in_seconds: number;
  usage_type: TemporaryApiKeyUsageType;
};

Properties

  • client_reference_id?: string - Optional tracking identifier string. Does not need to be unique. Max length: 256.
  • expires_in_seconds: number - Duration in seconds until the temporary API key expires. Minimum: 1. Maximum: 3600.
  • usage_type: TemporaryApiKeyUsageType - Intended usage of the temporary API key.
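
A sketch of building a request that respects the documented bounds on expires_in_seconds (1 to 3600). buildTemporaryKeyRequest is a hypothetical helper, and "session-42" is a placeholder reference ID.

```typescript
// Local mirrors of the documented types.
type TemporaryApiKeyUsageType = "transcribe_websocket";
type TemporaryApiKeyRequest = {
  client_reference_id?: string;
  expires_in_seconds: number;
  usage_type: TemporaryApiKeyUsageType;
};

// Hypothetical helper: rejects lifetimes outside the documented range.
function buildTemporaryKeyRequest(
  expiresInSeconds: number,
  clientReferenceId?: string,
): TemporaryApiKeyRequest {
  // Written as a negated range check so NaN is rejected too.
  if (!(expiresInSeconds >= 1 && expiresInSeconds <= 3600)) {
    throw new RangeError("expires_in_seconds must be between 1 and 3600");
  }
  const request: TemporaryApiKeyRequest = {
    expires_in_seconds: expiresInSeconds,
    usage_type: "transcribe_websocket",
  };
  if (clientReferenceId !== undefined) {
    request.client_reference_id = clientReferenceId;
  }
  return request;
}

const keyRequest = buildTemporaryKeyRequest(600, "session-42");
```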

TemporaryApiKeyResponse

type TemporaryApiKeyResponse = {
  api_key: string;
  expires_at: string;
};

Properties

  • api_key: string - Created temporary API key.
  • expires_at: string - UTC timestamp indicating when generated temporary API key will expire. Format: date-time.

TemporaryApiKeyUsageType

type TemporaryApiKeyUsageType = "transcribe_websocket";

TranscribeBaseOptions

type TranscribeBaseOptions = {
  cleanup?: CleanupTarget[];
  client_reference_id?: string;
  context?: TranscriptionContext;
  enable_language_identification?: boolean;
  enable_speaker_diarization?: boolean;
  fetch_transcript?: boolean;
  language_hints?: string[];
  language_hints_strict?: boolean;
  model: string;
  signal?: AbortSignal;
  timeout_ms?: number;
  translation?: TranslationConfig;
  wait?: boolean;
  wait_options?: WaitOptions;
  webhook_auth_header_name?: string;
  webhook_auth_header_value?: string;
  webhook_query?: string | URLSearchParams | Record<string, string>;
  webhook_url?: string;
};

Base options shared by all audio source variants.

Properties

  • cleanup?: CleanupTarget[] - Resources to clean up after transcription completes or on error/timeout. Only applies when wait: true. Cleanup runs in all cases when wait: true:
      - After successful completion
      - After transcription errors (status: 'error')
      - On timeout or abort
    This ensures no orphaned resources are left behind. Example:
      // Delete only the uploaded file
      cleanup: ['file']
      // Delete only the transcription record
      cleanup: ['transcription']
      // Delete both file and transcription
      cleanup: ['file', 'transcription']
  • client_reference_id?: string - Optional tracking identifier. Max length: 256.
  • context?: TranscriptionContext - Additional context to improve transcription accuracy and formatting of specialized terms.
  • enable_language_identification?: boolean - Enable automatic language identification.
  • enable_speaker_diarization?: boolean - Enable speaker diarization to identify different speakers.
  • fetch_transcript?: boolean - When true, fetches the transcript and attaches it to the result when wait=true and the transcription completes successfully. Set to false to skip fetching the full transcript payload. Default: true.
  • language_hints?: string[] - Array of expected ISO language codes to bias recognition.
  • language_hints_strict?: boolean - When true, model relies more heavily on language hints.
  • model: string - Speech-to-text model to use. Max length: 32.
  • signal?: AbortSignal - AbortSignal to cancel the operation
  • timeout_ms?: number - Timeout in milliseconds
  • translation?: TranslationConfig - Translation configuration.
  • wait?: boolean - When true, waits for transcription to complete before returning. Default: false.
  • wait_options?: WaitOptions - Options for waiting (only used when wait=true).
  • webhook_auth_header_name?: string - Name of the authentication header sent with webhook notifications. Max length: 256.
  • webhook_auth_header_value?: string - Authentication header value sent with webhook notifications. Max length: 256.
  • webhook_query?: string | URLSearchParams | Record<string, string> - Query parameters to append to the webhook URL. Useful for encoding metadata like transcription ID in the webhook callback. Can be a string, URLSearchParams, or Record<string, string>.
  • webhook_url?: string - URL to receive webhook notifications when transcription is completed or fails. Max length: 256.

TranscribeFromFile

type TranscribeFromFile = TranscribeBaseOptions & {
  audio_url?: never;
  file: UploadFileInput;
  file_id?: never;
  filename?: string;
};

Transcribe from a direct file upload (Buffer, Uint8Array, Blob, or ReadableStream)

Type Declaration

  • audio_url?: never
  • file: UploadFileInput - File data to upload and transcribe.
  • file_id?: never
  • filename?: string

TranscribeFromFileId

type TranscribeFromFileId = TranscribeBaseOptions & {
  audio_url?: never;
  file?: never;
  file_id: string;
  filename?: never;
};

Transcribe from a previously uploaded file

Type Declaration

  • audio_url?: never
  • file?: never
  • file_id: string - ID of a previously uploaded file. Format: uuid.
  • filename?: never

TranscribeFromFileIdOptions

type TranscribeFromFileIdOptions = Omit<TranscribeFromFileId, "file_id">;

Options for transcribing from an uploaded file ID via transcribeFromFileId.


TranscribeFromFileOptions

type TranscribeFromFileOptions = Omit<TranscribeFromFile, "file">;

Options for transcribing from a file via transcribeFromFile.


TranscribeFromUrl

type TranscribeFromUrl = TranscribeBaseOptions & {
  audio_url: string;
  file?: never;
  file_id?: never;
  filename?: never;
};

Transcribe from a publicly accessible audio URL

Type Declaration

  • audio_url: string - URL of a publicly accessible audio file. Max length: 4096.
  • file?: never
  • file_id?: never
  • filename?: never

TranscribeFromUrlOptions

type TranscribeFromUrlOptions = Omit<TranscribeFromUrl, "audio_url">;

Options for transcribing from a URL via transcribeFromUrl.


TranscribeOptions

type TranscribeOptions = 
  | TranscribeFromFile
  | TranscribeFromFileId
  | TranscribeFromUrl;

Options for the unified transcribe method. Exactly one audio source must be provided: file, file_id, or audio_url.
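
The "exactly one source" rule is enforced at compile time by the optional never fields on each variant: an object that sets two sources matches none of the three shapes. The sketch below uses reduced local copies of the types (with Uint8Array standing in for UploadFileInput) and a hypothetical audioSourceOf helper to show how the union can be inspected at runtime.

```typescript
// Reduced local copies of the three documented variants.
type BaseOptions = { model: string };
type FromFile = BaseOptions & { file: Uint8Array; file_id?: never; audio_url?: never };
type FromFileId = BaseOptions & { file_id: string; file?: never; audio_url?: never };
type FromUrl = BaseOptions & { audio_url: string; file?: never; file_id?: never };
type TranscribeOptions = FromFile | FromFileId | FromUrl;

// Hypothetical helper: reports which of the three sources was supplied.
function audioSourceOf(options: TranscribeOptions): "file" | "file_id" | "audio_url" {
  if (options.file !== undefined) return "file";
  if (options.file_id !== undefined) return "file_id";
  return "audio_url";
}

const byUrl = audioSourceOf({
  model: "example-model", // placeholder model name
  audio_url: "https://example.com/audio.mp3",
});
const byFileId = audioSourceOf({ model: "example-model", file_id: "file-123" });
const byFile = audioSourceOf({ model: "example-model", file: new Uint8Array(8) });
```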


TranscriptResponse

type TranscriptResponse = {
  id: string;
  text: string;
  tokens: TranscriptToken[];
};

Response from getting a transcription transcript.

Properties

  • id: string - Unique identifier of the transcription this transcript belongs to. Format: uuid.
  • text: string - Complete transcribed text content.
  • tokens: TranscriptToken[] - List of detailed token information with timestamps and metadata.

TranscriptSegment

type TranscriptSegment = {
  end_ms: number;
  language?: string;
  speaker?: string;
  start_ms: number;
  text: string;
  tokens: TranscriptToken[];
};

A segment of contiguous tokens grouped by speaker and language

Properties

  • end_ms: number - End time of the segment in milliseconds (from last token).
  • language?: string - Detected language code (if language identification was enabled).
  • speaker?: string - Speaker identifier (if speaker diarization was enabled).
  • start_ms: number - Start time of the segment in milliseconds (from first token).
  • text: string - Concatenated text of all tokens in this segment.
  • tokens: TranscriptToken[] - Original tokens in this segment.

TranscriptToken

type TranscriptToken = {
  confidence: number;
  end_ms: number;
  is_audio_event?: boolean | null;
  language?: string | null;
  speaker?: string | null;
  start_ms: number;
  text: string;
  translation_status?: "none" | "original" | "translation" | null;
};

A single token from the transcript with timing and confidence information.

Properties

  • confidence: number - Confidence score for this token (0.0 to 1.0).
  • end_ms: number - End time of the token in milliseconds.
  • is_audio_event?: boolean | null - Whether this token represents an audio event.
  • language?: string | null - Detected language code (if language identification was enabled).
  • speaker?: string | null - Speaker identifier (if speaker diarization was enabled).
  • start_ms: number - Start time of the token in milliseconds.
  • text: string - The text content of this token.
  • translation_status?: "none" | "original" | "translation" | null - Translation status for this token.
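
One possible use of translation_status is to separate spoken text from translated text. The sketch below assumes that tokens marked "translation" hold translated output and that every other token ("none", "original", null, or missing) is spoken text. The reference does not spell this out, and splitByTranslationStatus is a hypothetical helper.

```typescript
// Reduced local copy of TranscriptToken.
type TranscriptToken = {
  confidence: number;
  start_ms: number;
  end_ms: number;
  text: string;
  translation_status?: "none" | "original" | "translation" | null;
};

// Hypothetical helper: concatenates token text into two separate strings.
function splitByTranslationStatus(
  tokens: TranscriptToken[],
): { spoken: string; translated: string } {
  let spoken = "";
  let translated = "";
  for (const token of tokens) {
    if (token.translation_status === "translation") {
      translated += token.text;
    } else {
      spoken += token.text;
    }
  }
  return { spoken, translated };
}

const splitText = splitByTranslationStatus([
  { text: "Hello", start_ms: 0, end_ms: 400, confidence: 0.95, translation_status: "original" },
  { text: "Bonjour", start_ms: 0, end_ms: 400, confidence: 0.9, translation_status: "translation" },
]);
```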

TranscriptionContext

type TranscriptionContext = {
  general?: ContextGeneralEntry[];
  terms?: string[];
  text?: string;
  translation_terms?: ContextTranslationTerm[];
};

Additional context to improve transcription and translation accuracy. All sections are optional - include only what's relevant for your use case.

Properties

| Property | Type | Description |
| --- | --- | --- |
| general? | `ContextGeneralEntry[]` | Structured key-value pairs describing domain, topic, intent, participant names, etc. |
| terms? | `string[]` | Domain-specific or uncommon words to recognize. |
| text? | `string` | Longer free-form background text, prior interaction history, reference documents, or meeting notes. |
| translation_terms? | `ContextTranslationTerm[]` | Custom translations for ambiguous terms. |
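
A context object might be assembled as in the sketch below. The types are copied locally so the example stands alone, and every value (domain, terms, translation pair) is invented purely for illustration.

```typescript
// Local copies of the documented types, so the example is self-contained.
type ContextGeneralEntry = { key: string; value: string };
type ContextTranslationTerm = { source: string; target: string };
type TranscriptionContext = {
  general?: ContextGeneralEntry[];
  terms?: string[];
  text?: string;
  translation_terms?: ContextTranslationTerm[];
};

// Illustrative values only. The `text` section is left out on purpose
// to show that every section is optional.
const exampleContext: TranscriptionContext = {
  general: [
    { key: "domain", value: "Healthcare" },
    { key: "topic", value: "Cardiology consultation" },
  ],
  terms: ["atorvastatin", "echocardiogram"],
  translation_terms: [{ source: "Mr. Smith", target: "Sr. Smith" }],
};
```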

TranscriptionIdentifier

type TranscriptionIdentifier = 
  | string
  | {
  id: string;
};

Transcription identifier - either a string ID or an object with an id property.
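
Code that accepts either form usually normalizes it to a plain ID first. A minimal sketch, where `toTranscriptionId` is a hypothetical helper name and not an SDK export:

```typescript
// Local copy of the documented union type.
type TranscriptionIdentifier = string | { id: string };

// Hypothetical helper: returns the string ID for either form of identifier.
function toTranscriptionId(identifier: TranscriptionIdentifier): string {
  return typeof identifier === "string" ? identifier : identifier.id;
}
```

Because any object with an `id` string satisfies the second branch, a richer transcription object can be passed where an identifier is expected.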


TranscriptionStatus

type TranscriptionStatus = "queued" | "processing" | "completed" | "error";

Status of a transcription request.


TranslationConfig

type TranslationConfig = 
  | OneWayTranslationConfig
  | TwoWayTranslationConfig;

Translation configuration.


TwoWayTranslationConfig

type TwoWayTranslationConfig = {
  language_a: string;
  language_b: string;
  type: "two_way";
};

Two-way translation configuration. Translates between two specified languages.

Properties

| Property | Type | Description |
| --- | --- | --- |
| language_a | `string` | First language code. |
| language_b | `string` | Second language code. |
| type | `"two_way"` | Translation type. |
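
To make the two-way idea concrete, the sketch below picks the translation target for a given spoken language. It assumes that speech in `language_a` is translated into `language_b` and vice versa; `translationTargetFor` is a hypothetical helper and the language codes are example values.

```typescript
// Local copy of the documented type.
type TwoWayTranslationConfig = {
  language_a: string;
  language_b: string;
  type: "two_way";
};

// Hypothetical helper: given the language that was spoken, return the
// language it would be translated into, or undefined if it is neither.
function translationTargetFor(
  config: TwoWayTranslationConfig,
  spoken: string,
): string | undefined {
  if (spoken === config.language_a) return config.language_b;
  if (spoken === config.language_b) return config.language_a;
  return undefined;
}

// Example configuration with illustrative language codes.
const englishSpanish: TwoWayTranslationConfig = {
  type: "two_way",
  language_a: "en",
  language_b: "es",
};
```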

UploadFileInput

type UploadFileInput = 
  | Buffer
  | Uint8Array
  | Blob
  | ReadableStream<Uint8Array>
  | NodeJS.ReadableStream;

Supported input types for file upload.


UploadFileOptions

type UploadFileOptions = {
  client_reference_id?: string;
  filename?: string;
  signal?: AbortSignal;
  timeout_ms?: number;
};

Options for uploading a file.

Properties

| Property | Type | Description |
| --- | --- | --- |
| client_reference_id? | `string` | Optional tracking identifier string. Does not need to be unique. Maximum length: 256. |
| filename? | `string` | Custom filename for the uploaded file. |
| signal? | `AbortSignal` | AbortSignal for cancelling the upload. |
| timeout_ms? | `number` | Request timeout in milliseconds. |

WaitOptions

type WaitOptions = {
  interval_ms?: number;
  on_status_change?: (status, transcription) => void;
  signal?: AbortSignal;
  timeout_ms?: number;
};

Options for polling/waiting for transcription completion.

Properties

| Property | Type | Description |
| --- | --- | --- |
| interval_ms? | `number` | Polling interval in milliseconds. Default: 1000. Minimum: 1000. |
| on_status_change? | `(status, transcription) => void` | Callback invoked when status changes. |
| signal? | `AbortSignal` | AbortSignal to cancel waiting. |
| timeout_ms? | `number` | Maximum time to wait in milliseconds. Default: 300000 (5 minutes). |
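
The sketch below shows one way a polling loop could honor these options. It is not the SDK's implementation: `waitForCompletion`, the injected `fetchStatus` and `sleep` functions, and the simplified single-argument `on_status_change` are all assumptions made for the example.

```typescript
type TranscriptionStatus = "queued" | "processing" | "completed" | "error";

// Simplified stand-in for WaitOptions (the callback takes only the status here).
type WaitOptionsSketch = {
  interval_ms?: number;
  on_status_change?: (status: TranscriptionStatus) => void;
  signal?: AbortSignal;
  timeout_ms?: number;
};

// Hypothetical polling loop. `fetchStatus` stands in for whatever call
// retrieves the current status; `sleep` is injectable so tests can skip delays.
async function waitForCompletion(
  fetchStatus: () => Promise<TranscriptionStatus>,
  options: WaitOptionsSketch = {},
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<TranscriptionStatus> {
  // Apply the documented default (1000) and minimum (1000).
  const interval = Math.max(options.interval_ms ?? 1000, 1000);
  // Apply the documented default timeout of 300000 ms.
  const deadline = Date.now() + (options.timeout_ms ?? 300000);
  let previous: TranscriptionStatus | undefined;

  while (true) {
    // Cancellation is checked between polls; a sleep in progress is not interrupted.
    if (options.signal?.aborted) throw new Error("Wait aborted");

    const status = await fetchStatus();
    if (status !== previous) {
      options.on_status_change?.(status);
      previous = status;
    }
    if (status === "completed" || status === "error") return status;
    if (Date.now() >= deadline) throw new Error("Wait timed out");

    await sleep(interval);
  }
}
```

Injecting `sleep` is a design choice for testability only; a real caller would rely on the default timer.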

WebhookAuthConfig

type WebhookAuthConfig = {
  name: string;
  value: string;
};

Authentication configuration for webhook verification.

Properties

| Property | Type | Description |
| --- | --- | --- |
| name | `string` | Expected header name (case-insensitive comparison). |
| value | `string` | Expected header value (exact match). |
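
The comparison rules above (case-insensitive name, exact value) can be sketched as follows. `isAuthorized` and `HeaderRecord` are hypothetical names for this example, and the check is a plain equality test rather than whatever the SDK does internally.

```typescript
type WebhookAuthConfig = { name: string; value: string };
type HeaderRecord = Record<string, string | string[] | undefined>;

// Hypothetical check: header names are compared case-insensitively,
// header values must match exactly.
function isAuthorized(headers: HeaderRecord, auth: WebhookAuthConfig): boolean {
  const wanted = auth.name.toLowerCase();
  return Object.keys(headers).some((name) => {
    if (name.toLowerCase() !== wanted) return false;
    const raw = headers[name];
    // A header may carry one value or several; accept a match on any of them.
    const values: Array<string | undefined> = Array.isArray(raw) ? raw : [raw];
    return values.some((value) => value === auth.value);
  });
}
```

A production check might prefer a constant-time comparison for the secret value; that is left out here to keep the sketch short.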

WebhookEvent

type WebhookEvent = {
  id: string;
  status: WebhookEventStatus;
};

Webhook event payload sent by Soniox when a transcription completes or fails.

Properties

| Property | Type | Description |
| --- | --- | --- |
| id | `string` | Transcription ID. Format: uuid. |
| status | `WebhookEventStatus` | Transcription result status. |

WebhookEventStatus

type WebhookEventStatus = "completed" | "error";

Webhook event status values.


WebhookHandlerResult

type WebhookHandlerResult = {
  error?: string;
  event?: WebhookEvent;
  ok: boolean;
  status: number;
};

Result of webhook handling.

Properties

| Property | Type | Description |
| --- | --- | --- |
| error? | `string` | Error message (only present when ok=false). |
| event? | `WebhookEvent` | Parsed webhook event (only present when ok=true). |
| ok | `boolean` | Whether the webhook was handled successfully. |
| status | `number` | HTTP status code to return. |
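
A handler result can be turned into an HTTP reply roughly as in the sketch below. `toReply` and the JSON body shapes are assumptions for illustration; only the `ok`, `status`, `event`, and `error` fields come from the type above.

```typescript
// Local copies of the documented types.
type WebhookEventStatus = "completed" | "error";
type WebhookEvent = { id: string; status: WebhookEventStatus };
type WebhookHandlerResult = {
  error?: string;
  event?: WebhookEvent;
  ok: boolean;
  status: number;
};

// Hypothetical helper: always replies with result.status, and picks the
// body from event (ok=true) or error (ok=false).
function toReply(result: WebhookHandlerResult): { status: number; body: string } {
  if (result.ok && result.event) {
    return {
      status: result.status,
      body: JSON.stringify({ received: result.event.id }),
    };
  }
  return {
    status: result.status,
    body: JSON.stringify({ error: result.error ?? "Unknown webhook error" }),
  };
}
```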

WebhookHandlerResultWithFetch

type WebhookHandlerResultWithFetch = WebhookHandlerResult & {
  fetchTranscript: (() => Promise<ISonioxTranscript | null>) | undefined;
  fetchTranscription: (() => Promise<ISonioxTranscription | null>) | undefined;
};

Result of webhook handling with lazy fetch capabilities.

When using client.webhooks.handleExpress() (or other framework handlers), the result includes helper methods to fetch the transcript or transcription.

Type Declaration

| Name | Type | Description |
| --- | --- | --- |
| fetchTranscript | `(() => Promise<ISonioxTranscript \| null>) \| undefined` | Fetch the transcript for a completed transcription. Only available when ok=true and event.status='completed'. |
| fetchTranscription | `(() => Promise<ISonioxTranscription \| null>) \| undefined` | Fetch the full transcription object. Useful for both completed (metadata) and error (error details) statuses. |

Example (fetchTranscript)

const result = soniox.webhooks.handleExpress(req);
if (result.ok && result.event.status === 'completed') {
  const transcript = await result.fetchTranscript();
  console.log(transcript?.text);
}

Example (fetchTranscription)

const result = soniox.webhooks.handleExpress(req);
if (result.ok && result.event.status === 'error') {
  const transcription = await result.fetchTranscription();
  console.log(transcription?.error_message);
}

WebhookHeaders

type WebhookHeaders = 
  | Headers
  | Record<string, string | string[] | undefined>
  | {
  get: string | null;
};

Headers object type. Supports both standard Headers objects and plain record types.


HttpClient

Pluggable HTTP client interface

Methods

request()

request<T>(request): Promise<HttpResponse<T>>;

Perform an HTTP request

Type Parameters

Type Parameter
T

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| request | `HttpRequest` | Request configuration. |

Returns

Promise<HttpResponse<T>>

Promise resolving to the response

Throws

SonioxHttpError on network errors, timeouts, HTTP errors, or parse errors.


HttpErrorDetails

Error details for SonioxHttpError

Properties

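| Property | Type | Description |
| --- | --- | --- |
| bodyText? | `string` | Response body text (capped at 4 KB). |
| cause? | `unknown` | - |
| code | `HttpErrorCode` | - |
| headers? | `Record<string, string>` | - |
| message | `string` | - |
| method | `HttpMethod` | - |
| statusCode? | `number` | - |
| url | `string` | - |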

HttpRequest

HTTP request configuration

Properties

| Property | Type | Description |
| --- | --- | --- |
| body? | `HttpRequestBody` | Request body. |
| headers? | `Record<string, string>` | Request headers. |
| method | `HttpMethod` | HTTP method. |
| path | `string` | URL path (relative to baseUrl) or absolute URL. |
| query? | `QueryParams` | Query parameters (will be URL-encoded). |
| responseType? | `HttpResponseType` | Expected response type. Default: 'json'. |
| signal? | `AbortSignal` | Optional AbortSignal for request cancellation. If provided along with timeoutMs, both will be respected. |
| timeoutMs? | `number` | Request timeout in milliseconds. If not specified, uses the client's default timeout. |

HttpResponse<T>

HTTP response from the client

Type Parameters

Type Parameter
T

Properties

| Property | Type | Description |
| --- | --- | --- |
| data | `T` | Parsed response data. |
| headers | `Record<string, string>` | Response headers (normalized to lowercase keys). |
| status | `number` | HTTP status code. |
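
Because HttpClient is a pluggable interface, a test double can stand in for the real transport. The sketch below is one possible stub, under several assumptions: the local `HttpMethod` values, the trimmed-down `HttpRequest`, the `StubHttpClient` class, and the example path are all invented here, and the stub throws a plain `Error` rather than SonioxHttpError.

```typescript
// Simplified local stand-ins for the documented shapes (assumptions, not SDK exports).
type HttpMethod = "GET" | "POST" | "DELETE";
type HttpRequest = {
  method: HttpMethod;
  path: string;
  headers?: Record<string, string>;
  timeoutMs?: number;
};
type HttpResponse<T> = {
  data: T;
  headers: Record<string, string>;
  status: number;
};
interface HttpClient {
  request<T>(request: HttpRequest): Promise<HttpResponse<T>>;
}

// Hypothetical in-memory client: answers from a fixed route table and
// records every request it receives.
class StubHttpClient implements HttpClient {
  readonly calls: HttpRequest[] = [];
  private readonly routes: Record<string, unknown>;

  constructor(routes: Record<string, unknown>) {
    this.routes = routes;
  }

  async request<T>(request: HttpRequest): Promise<HttpResponse<T>> {
    this.calls.push(request);
    const key = `${request.method} ${request.path}`;
    if (!Object.prototype.hasOwnProperty.call(this.routes, key)) {
      throw new Error(`No stub registered for ${key}`);
    }
    return {
      data: this.routes[key] as T,
      // Header keys are lowercase, mirroring the normalization described above.
      headers: { "content-type": "application/json" },
      status: 200,
    };
  }
}
```

How a custom client is handed to the SDK is not covered in this section, so the sketch stops at the interface boundary.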

ISonioxTranscript

Type contract for SonioxTranscript class.

See SonioxTranscript for full documentation.

Methods

segments()

segments(options?): TranscriptSegment[];

Parameters

| Parameter | Type |
| --- | --- |
| options? | `SegmentTranscriptOptions` |

Returns

TranscriptSegment[]

Properties

| Property | Type |
| --- | --- |
| id | `string` |
| text | `string` |
| tokens | `TranscriptToken[]` |

ISonioxTranscription

Type contract for SonioxTranscription class.

See SonioxTranscription for full documentation.

Methods

delete()

delete(): Promise<void>;

Returns

Promise<void>


destroy()

destroy(): Promise<void>;

Returns

Promise<void>


getTranscript()

getTranscript(options?): Promise<ISonioxTranscript | null>;

Parameters

| Parameter | Type |
| --- | --- |
| options? | `{ force?: boolean; signal?: AbortSignal; }` |
| options.force? | `boolean` |
| options.signal? | `AbortSignal` |

Returns

Promise<ISonioxTranscript | null>


refresh()

refresh(signal?): Promise<ISonioxTranscription>;

Parameters

| Parameter | Type |
| --- | --- |
| signal? | `AbortSignal` |

Returns

Promise<ISonioxTranscription>


toJSON()

toJSON(): SonioxTranscriptionData;

Returns

SonioxTranscriptionData


wait()

wait(options?): Promise<ISonioxTranscription>;

Parameters

| Parameter | Type |
| --- | --- |
| options? | `WaitOptions` |

Returns

Promise<ISonioxTranscription>

Properties

| Property | Type |
| --- | --- |
| audio_duration_ms | `number \| null \| undefined` |
| audio_url | `string \| null \| undefined` |
| client_reference_id | `string \| null \| undefined` |
| context | `TranscriptionContext \| null \| undefined` |
| created_at | `string` |
| enable_language_identification | `boolean` |
| enable_speaker_diarization | `boolean` |
| error_message | `string \| null \| undefined` |
| error_type | `string \| null \| undefined` |
| file_id | `string \| null \| undefined` |
| filename | `string` |
| id | `string` |
| language_hints | `string[] \| undefined` |
| model | `string` |
| status | `TranscriptionStatus` |
| transcript | `ISonioxTranscript \| null \| undefined` |
| webhook_auth_header_name | `string \| null \| undefined` |
| webhook_auth_header_value | `string \| null \| undefined` |
| webhook_status_code | `number \| null \| undefined` |
| webhook_url | `string \| null \| undefined` |
