Types

AudioData

type AudioData = Buffer | Uint8Array | ArrayBuffer;

Audio data types accepted by sendAudio. In Node.js, Buffer is also accepted since Buffer extends Uint8Array.
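
As an illustration, a caller might normalize these inputs into a single `Uint8Array` view before chunking. The helper name `toUint8Array` is hypothetical and not part of the SDK. `Buffer` needs no separate branch because it is a `Uint8Array` subclass.

```typescript
// Local alias for illustration; in Node.js, Buffer is covered by Uint8Array.
type AudioDataLike = Uint8Array | ArrayBuffer;

// Hypothetical helper: returns a Uint8Array view over the same bytes (no copy).
function toUint8Array(data: AudioDataLike): Uint8Array {
  if (data instanceof Uint8Array) {
    return data;
  }
  return new Uint8Array(data);
}

const fromArrayBuffer = toUint8Array(new ArrayBuffer(4));
const fromTypedArray = toUint8Array(new Uint8Array([1, 2, 3]));
console.log(fromArrayBuffer.byteLength, fromTypedArray.byteLength);
```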

AudioFormat

type AudioFormat = 
  | "pcm_s8"
  | "pcm_s8le"
  | "pcm_s8be"
  | "pcm_s16le"
  | "pcm_s16be"
  | "pcm_s24le"
  | "pcm_s24be"
  | "pcm_s32le"
  | "pcm_s32be"
  | "pcm_u8"
  | "pcm_u8le"
  | "pcm_u8be"
  | "pcm_u16le"
  | "pcm_u16be"
  | "pcm_u24le"
  | "pcm_u24be"
  | "pcm_u32le"
  | "pcm_u32be"
  | "pcm_f32le"
  | "pcm_f32be"
  | "pcm_f64le"
  | "pcm_f64be"
  | "mulaw"
  | "alaw"
  | "aac"
  | "aiff"
  | "amr"
  | "asf"
  | "wav"
  | "mp3"
  | "flac"
  | "ogg"
  | "webm";

Supported audio formats for real-time transcription.
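
The raw PCM names in this list follow the pattern `pcm_<s|u|f><bits>[le|be]` (sample type, bit depth, optional endianness). The sketch below reads the sample width from that pattern, which can help when sizing audio chunks. `pcmBytesPerSample` is a hypothetical helper, not an SDK function.

```typescript
// Hypothetical helper: derives the sample width from the naming pattern
// pcm_<s|u|f><bits>[le|be] used by the raw PCM entries of AudioFormat.
function pcmBytesPerSample(format: string): number | null {
  const match = /^pcm_[suf](\d+)(?:le|be)?$/.exec(format);
  if (match === null) {
    // e.g. "mulaw", "wav", "mp3": the width is not encoded in the name.
    return null;
  }
  return Number(match[1]) / 8;
}

console.log(pcmBytesPerSample("pcm_s16le")); // 2
console.log(pcmBytesPerSample("pcm_f32be")); // 4
console.log(pcmBytesPerSample("flac")); // null
```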

CleanupTarget

type CleanupTarget = "file" | "transcription";

Resource types that can be cleaned up after transcription completes.

'file' - The uploaded file
'transcription' - The transcription record

ConcurrencyCurrentValues

type ConcurrencyCurrentValues = {
  transcribe_concurrent: number;
  tts_concurrent: number;
};

Live concurrency counts.

Properties

Property	Type	Description
`transcribe_concurrent`	`number`	Current number of concurrent transcription sessions.
`tts_concurrent`	`number`	Current number of concurrent TTS sessions.

ConcurrencyLimitValues

type ConcurrencyLimitValues = {
  transcribe_concurrent: number | null;
  tts_concurrent: number | null;
};

Configured concurrency limits.

Properties

Property	Type	Description
`transcribe_concurrent`	`number` \| `null`	Configured transcription concurrency limit. Null means no configured limit.
`tts_concurrent`	`number` \| `null`	Configured TTS concurrency limit. Null means no configured limit.

ConcurrencyLimitsResponse

type ConcurrencyLimitsResponse = {
  organization: ConcurrencyScopeValues;
  project: ConcurrencyScopeValues;
};

Current concurrent counts plus configured concurrency limits for the project and its organization. Values are region-scoped.

Properties

Property	Type	Description
`organization`	`ConcurrencyScopeValues`	Organization-level concurrency counts and limits.
`project`	`ConcurrencyScopeValues`	Project-level concurrency counts and limits.

ConcurrencyScopeValues

type ConcurrencyScopeValues = {
  current: ConcurrencyCurrentValues;
  limits: ConcurrencyLimitValues;
};

Current counts and configured limits for a concurrency scope.

Properties

Property	Type	Description
`current`	`ConcurrencyCurrentValues`	Current live concurrency counts.
`limits`	`ConcurrencyLimitValues`	Configured concurrency limits.
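
A small sketch of how the `current` and `limits` values might be combined to estimate remaining headroom. `remainingCapacity` is a hypothetical helper and the sample numbers are made up. A `null` limit is passed through as `null`, since it means no limit is configured.

```typescript
type ConcurrencyCurrentValues = { transcribe_concurrent: number; tts_concurrent: number };
type ConcurrencyLimitValues = { transcribe_concurrent: number | null; tts_concurrent: number | null };
type ConcurrencyScopeValues = { current: ConcurrencyCurrentValues; limits: ConcurrencyLimitValues };

// Hypothetical helper: sessions still available for one counter,
// or null when no limit is configured for it.
function remainingCapacity(
  scope: ConcurrencyScopeValues,
  key: keyof ConcurrencyCurrentValues,
): number | null {
  const limit = scope.limits[key];
  if (limit === null) {
    return null;
  }
  return Math.max(0, limit - scope.current[key]);
}

// Illustrative values only.
const projectScope: ConcurrencyScopeValues = {
  current: { transcribe_concurrent: 3, tts_concurrent: 0 },
  limits: { transcribe_concurrent: 10, tts_concurrent: null },
};
console.log(remainingCapacity(projectScope, "transcribe_concurrent")); // 7
console.log(remainingCapacity(projectScope, "tts_concurrent")); // null
```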

ContextGeneralEntry

type ContextGeneralEntry = {
  key: string;
  value: string;
};

Key-value pair for general context information.

Properties

Property	Type	Description
`key`	`string`	The key describing the context type (e.g., "domain", "topic", "doctor").
`value`	`string`	The value for the context key.

ContextTranslationTerm

type ContextTranslationTerm = {
  source: string;
  target: string;
};

Custom translation term mapping.

Properties

Property	Type	Description
`source`	`string`	The source term to translate.
`target`	`string`	The target translation for the term.

CreateTranscriptionOptions

type CreateTranscriptionOptions = {
  audio_url?: string;
  client_reference_id?: string;
  context?: TranscriptionContext;
  enable_language_identification?: boolean;
  enable_speaker_diarization?: boolean;
  file_id?: string;
  language_hints?: string[];
  language_hints_strict?: boolean;
  model: string;
  translation?: TranslationConfig;
  webhook_auth_header_name?: string;
  webhook_auth_header_value?: string;
  webhook_url?: string;
};

Options for creating a transcription.

Properties

Property	Type	Description
`audio_url?`	`string`	URL of a publicly accessible audio file. Max Length 4096
`client_reference_id?`	`string`	Optional tracking identifier. Max Length 256
`context?`	`TranscriptionContext`	Additional context to improve transcription accuracy and formatting of specialized terms.
`enable_language_identification?`	`boolean`	Enable automatic language identification.
`enable_speaker_diarization?`	`boolean`	Enable speaker diarization to identify different speakers.
`file_id?`	`string`	ID of a previously uploaded file. Format uuid
`language_hints?`	`string`[]	Array of expected ISO language codes to bias recognition.
`language_hints_strict?`	`boolean`	When true, model relies more heavily on language hints.
`model`	`string`	Speech-to-text model to use. Max Length 32
`translation?`	`TranslationConfig`	Translation configuration.
`webhook_auth_header_name?`	`string`	Name of the authentication header sent with webhook notifications. Max Length 256
`webhook_auth_header_value?`	`string`	Authentication header value sent with webhook notifications. Max Length 256
`webhook_url?`	`string`	URL to receive webhook notifications when transcription is completed or fails. Max Length 256
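
The Max Length values in the table can be checked on the client before a request is sent. The following is a minimal sketch under that assumption. `findTooLongFields` is a hypothetical helper, and the model and URL strings are placeholders rather than real identifiers.

```typescript
// Subset of CreateTranscriptionOptions: only the fields with a documented Max Length.
type LengthCheckedFields = {
  audio_url?: string;
  client_reference_id?: string;
  model: string;
  webhook_auth_header_name?: string;
  webhook_auth_header_value?: string;
  webhook_url?: string;
};

// Max Length values copied from the properties table.
const MAX_LENGTHS: Record<keyof LengthCheckedFields, number> = {
  audio_url: 4096,
  client_reference_id: 256,
  model: 32,
  webhook_auth_header_name: 256,
  webhook_auth_header_value: 256,
  webhook_url: 256,
};

// Hypothetical helper: returns the names of the fields that exceed their limit.
function findTooLongFields(options: LengthCheckedFields): string[] {
  const fields = Object.keys(MAX_LENGTHS) as (keyof LengthCheckedFields)[];
  return fields.filter((field) => {
    const value = options[field];
    return value !== undefined && value.length > MAX_LENGTHS[field];
  });
}

// Placeholder values for illustration.
console.log(findTooLongFields({ model: "example-model", audio_url: "https://example.com/a.wav" })); // []
console.log(findTooLongFields({ model: "x".repeat(40) })); // ["model"]
```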

CreateVoiceInput

type CreateVoiceInput = UploadFileInput;

Supported input types for the reference audio clip.

CreateVoiceOptions

type CreateVoiceOptions = {
  file: CreateVoiceInput;
  filename?: string;
  name: string;
  signal?: AbortSignal;
  timeout_ms?: number;
};

Options for creating a voice.

Properties

Property	Type	Description
`file`	`CreateVoiceInput`	The reference audio clip for the voice. Keep it short (up to 20 seconds) and within 10 MB.
`filename?`	`string`	Custom filename for the uploaded reference clip.
`name`	`string`	A name for the voice, unique within your project. Min Length 1 Max Length 128
`signal?`	`AbortSignal`	AbortSignal for cancelling the request.
`timeout_ms?`	`number`	Request timeout in milliseconds.

DeleteAllFilesOptions

type DeleteAllFilesOptions = {
  signal?: AbortSignal;
};

Options for purging all files.

Properties

Property	Type	Description
`signal?`	`AbortSignal`	AbortSignal for cancelling the delete_all operation.

DeleteAllTranscriptionsOptions

type DeleteAllTranscriptionsOptions = {
  on_progress?: (transcription, index) => void;
  signal?: AbortSignal;
};

Options for deleting all transcriptions.

Properties

Property	Type	Description
`on_progress?`	(`transcription`, `index`) => `void`	Callback invoked before each transcription is deleted. Receives the transcription data and its 0-based index.
`signal?`	`AbortSignal`	AbortSignal for cancelling the delete_all operation.

DeleteAllVoicesOptions

type DeleteAllVoicesOptions = {
  signal?: AbortSignal;
};

Options for deleting all voices.

Properties

Property	Type	Description
`signal?`	`AbortSignal`	AbortSignal for cancelling the delete_all operation.

ExpressLikeRequest

type ExpressLikeRequest = {
  body?: unknown;
  headers: Record<string, string | string[] | undefined>;
  method: string;
};

Express/Connect-style request object

Properties

Property	Type
`body?`	`unknown`
`headers`	`Record`<`string`, `string` \| `string`[] \| `undefined`>
`method`	`string`

FastifyLikeRequest

type FastifyLikeRequest = {
  body?: unknown;
  headers: Record<string, string | string[] | undefined>;
  method: string;
};

Fastify-style request object

Properties

Property	Type
`body?`	`unknown`
`headers`	`Record`<`string`, `string` \| `string`[] \| `undefined`>
`method`	`string`

FileIdentifier

type FileIdentifier = 
  | string
  | {
  id: string;
};

File identifier - either a string ID or an object with an id property.
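
Code that accepts a `FileIdentifier` usually needs the plain ID string. A minimal sketch of that normalization follows; `resolveFileId` is a hypothetical name, and the ID value is a placeholder.

```typescript
type FileIdentifier = string | { id: string };

// Hypothetical helper: returns the ID whether a string or an object was passed.
function resolveFileId(file: FileIdentifier): string {
  return typeof file === "string" ? file : file.id;
}

console.log(resolveFileId("example-file-id")); // "example-file-id"
console.log(resolveFileId({ id: "example-file-id" })); // "example-file-id"
```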

FilesCountResponse

type FilesCountResponse = {
  playground: number;
  public_api: number;
  total: number;
};

Total number of files, split by source.

Properties

Property	Type	Description
`playground`	`number`	Number of files uploaded via the Playground.
`public_api`	`number`	Number of files uploaded via the Public API.
`total`	`number`	Total number of files across all sources.

GenerateSpeechOptions

type GenerateSpeechOptions = {
  audio_format?: string;
  bitrate?: number;
  language?: string;
  model?: string;
  sample_rate?: number;
  signal?: AbortSignal;
  speed?: number;
  text: string;
  voice: string;
};

Options for REST TTS generation (generate / generateStream).

Properties

Property	Type	Description
`audio_format?`	`string`	Output audio format. Default `'wav'`
`bitrate?`	`number`	Codec bitrate in bps (for compressed formats).
`language?`	`string`	Language code. Default `'en'`
`model?`	`string`	Text-to-Speech model to use. Default `'tts-rt-v1'`
`sample_rate?`	`number`	Output sample rate in Hz. Required for raw PCM formats.
`signal?`	`AbortSignal`	Optional AbortSignal for cancellation.
`speed?`	`number`	Speaking rate. `1.0` is the normal rate; values below `1.0` slow speech down and values above `1.0` speed it up. Supported range is `0.7`-`1.3`. Defaults to `1.0` when omitted.
`text`	`string`	Input text to generate as speech.
`voice`	`string`	Voice identifier.
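
The documented defaults and the supported `speed` range can be mirrored in a small client-side check. This is a sketch only: `withSpeechDefaults` is a hypothetical helper, the voice value is a placeholder, and nothing here implies the SDK applies defaults this way.

```typescript
// Subset of GenerateSpeechOptions relevant to defaults and the speed range.
type SpeechInput = {
  audio_format?: string;
  language?: string;
  model?: string;
  speed?: number;
  text: string;
  voice: string;
};

const MIN_SPEED = 0.7;
const MAX_SPEED = 1.3;

// Hypothetical helper: fills in the documented defaults and rejects
// a speed outside the documented 0.7-1.3 range (including NaN).
function withSpeechDefaults(options: SpeechInput) {
  const speed = options.speed ?? 1.0;
  if (!(speed >= MIN_SPEED && speed <= MAX_SPEED)) {
    throw new RangeError(`speed must be between ${MIN_SPEED} and ${MAX_SPEED}`);
  }
  return {
    ...options,
    audio_format: options.audio_format ?? "wav",
    language: options.language ?? "en",
    model: options.model ?? "tts-rt-v1",
    speed,
  };
}

// "example-voice" is a placeholder, not a real voice identifier.
console.log(withSpeechDefaults({ text: "Hello", voice: "example-voice" }));
```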

HandleWebhookOptions

type HandleWebhookOptions = {
  auth?: WebhookAuthConfig;
  body: unknown;
  headers: WebhookHeaders;
  method: string;
};

Options for the handleWebhook function

Properties

Property	Type	Description
`auth?`	`WebhookAuthConfig`	Optional authentication configuration
`body`	`unknown`	Request body (parsed JSON or raw string)
`headers`	`WebhookHeaders`	Request headers
`method`	`string`	HTTP method of the request

HonoLikeContext

type HonoLikeContext = {
  req: {
     method: string;
     header: string | undefined;
     json: Promise<unknown>;
  };
};

Hono context object

Properties

Property	Type
`req`	{ `method`: `string`; `header`: `string` \| `undefined`; `json`: `Promise`<`unknown`>; }
`req.method`	`string`
`req.header`	`string` \| `undefined`
`req.json`	`Promise`<`unknown`>

ListFilesOptions

type ListFilesOptions = {
  cursor?: string;
  limit?: number;
  signal?: AbortSignal;
};

Options for listing files.

Properties

Property	Type	Description
`cursor?`	`string`	Pagination cursor for the next page of results.
`limit?`	`number`	Maximum number of files to return. Default `1000` Minimum 1 Maximum 1000
`signal?`	`AbortSignal`	AbortSignal for cancelling the request.

ListFilesResponse<T>

type ListFilesResponse<T> = {
  files: T[];
  next_page_cursor: string | null;
};

Response from listing files.

Type Parameters

Type Parameter
`T`

Properties

Property	Type	Description
`files`	`T`[]	List of uploaded files.
`next_page_cursor`	`string` \| `null`	A pagination token that references the next page of results. When null, no additional results are available.
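
Cursor pagination can be sketched as a loop that keeps requesting pages until `next_page_cursor` is `null`. The `FetchPage` signature and `collectAllFiles` are assumptions for illustration; in practice the fetcher would wrap whatever list call the SDK exposes. The in-memory pages only exercise the loop.

```typescript
type ListPage<T> = { files: T[]; next_page_cursor: string | null };

// Hypothetical page fetcher: takes an optional cursor and resolves to one page.
type FetchPage<T> = (cursor?: string) => Promise<ListPage<T>>;

// Follows next_page_cursor until it is null and returns every item.
async function collectAllFiles<T>(fetchPage: FetchPage<T>): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | undefined = undefined;
  do {
    const page: ListPage<T> = await fetchPage(cursor);
    all.push(...page.files);
    cursor = page.next_page_cursor ?? undefined;
  } while (cursor !== undefined);
  return all;
}

// In-memory stand-in for the API, used only to exercise the loop.
const fakePages: Record<string, ListPage<string>> = {
  start: { files: ["a", "b"], next_page_cursor: "p2" },
  p2: { files: ["c"], next_page_cursor: null },
};
const fakeFetch: FetchPage<string> = async (cursor) => fakePages[cursor ?? "start"];

collectAllFiles(fakeFetch).then((files) => console.log(files)); // ["a", "b", "c"]
```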

ListTranscriptionsOptions

type ListTranscriptionsOptions = {
  cursor?: string;
  limit?: number;
};

Options for listing transcriptions

Properties

Property	Type	Description
`cursor?`	`string`	Pagination cursor for the next page of results
`limit?`	`number`	Maximum number of transcriptions to return. Default `1000` Minimum 1 Maximum 1000

ListTranscriptionsResponse<T>

type ListTranscriptionsResponse<T> = {
  next_page_cursor: string | null;
  transcriptions: T[];
};

Response from listing transcriptions.

Type Parameters

Type Parameter
`T`

Properties

Property	Type	Description
`next_page_cursor`	`string` \| `null`	A pagination token that references the next page of results. When null, no additional results are available.
`transcriptions`	`T`[]	List of transcriptions.

ListUsageLogsOptions

type ListUsageLogsOptions = {
  cursor?: string;
  end_time: string;
  limit?: number;
  signal?: AbortSignal;
  sort?: UsageLogsSort;
  start_time: string;
};

Options for listing usage logs.

Properties

Property	Type	Description
`cursor?`	`string`	Pagination cursor for the next page of results.
`end_time`	`string`	End of the time window (exclusive), filtering by request end time. Must be an ISO 8601 timestamp in UTC. Example `'2026-04-29T09:00:00Z'`
`limit?`	`number`	Maximum number of usage log entries to return. Default `1000` Minimum 1 Maximum 1000
`signal?`	`AbortSignal`	AbortSignal for cancelling the request.
`sort?`	`UsageLogsSort`	Sort order by end_time. Default `'end_time_asc'`
`start_time`	`string`	Start of the time window (inclusive), filtering by request end time. Must be an ISO 8601 timestamp in UTC. Example `'2026-04-28T09:00:00Z'`
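
Because `start_time` is inclusive and `end_time` is exclusive, consecutive windows can share a boundary timestamp without counting a request twice. The sketch below builds such a window in the timestamp style used in the examples. `usageWindow` and `toUtcTimestamp` are hypothetical helpers.

```typescript
// Formats a Date as an ISO 8601 UTC timestamp without milliseconds.
function toUtcTimestamp(date: Date): string {
  return date.toISOString().replace(/\.\d{3}Z$/, "Z");
}

// Hypothetical helper: the window covering the `hours` before `end`.
function usageWindow(end: Date, hours: number): { start_time: string; end_time: string } {
  const start = new Date(end.getTime() - hours * 60 * 60 * 1000);
  return { start_time: toUtcTimestamp(start), end_time: toUtcTimestamp(end) };
}

const last24h = usageWindow(new Date("2026-04-29T09:00:00Z"), 24);
console.log(last24h.start_time); // 2026-04-28T09:00:00Z
console.log(last24h.end_time); // 2026-04-29T09:00:00Z
```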

ListUsageLogsResponse

type ListUsageLogsResponse = {
  next_page_cursor: string | null;
  usage_logs: SonioxUsageLog[];
};

Response from listing usage logs.

Properties

Property	Type	Description
`next_page_cursor`	`string` \| `null`	Pagination cursor for the next page of results. Null if no more pages.
`usage_logs`	`SonioxUsageLog`[]	Per-request usage log entries ordered by end_time and UUID.

ListVoicesOptions

type ListVoicesOptions = {
  cursor?: string;
  limit?: number;
  signal?: AbortSignal;
};

Options for listing voices.

Properties

Property	Type	Description
`cursor?`	`string`	Pagination cursor for the next page of results.
`limit?`	`number`	Maximum number of voices to return.
`signal?`	`AbortSignal`	AbortSignal for cancelling the request.

ListVoicesResponse<T>

type ListVoicesResponse<T> = {
  next_page_cursor: string | null;
  voices: T[];
};

Response from listing voices.

Type Parameters

Type Parameter
`T`

Properties

Property	Type	Description
`next_page_cursor`	`string` \| `null`	A pagination token that references the next page of results. When null, no additional results are available.
`voices`	`T`[]	List of voices.

NestJSLikeRequest

type NestJSLikeRequest = {
  body?: unknown;
  headers: Record<string, string | string[] | undefined>;
  method: string;
};

NestJS-style request object (uses Express under the hood by default)

Properties

Property	Type
`body?`	`unknown`
`headers`	`Record`<`string`, `string` \| `string`[] \| `undefined`>
`method`	`string`

OneWayTranslation

type OneWayTranslation = {
  duration_ms: number;
  from?: string;
  mode: "one_way";
  original_text: string;
  segments: TranslationSegment[];
  to: string;
  translation_text: string;
};

Result of a one-way translation ({ to } or { to, from } mode).

original_text and translation_text flatten the per-segment content across the whole audio, which is useful when the caller just wants two parallel strings.

Properties

Property	Type	Description
`duration_ms`	`number`	Total audio duration in milliseconds. Equals the largest `end_ms` across all original tokens, or `0` when there are no original tokens.
`from?`	`string`	Source language hint that was supplied via `from`. Undefined when only `to` was provided and the source language was auto-detected.
`mode`	`"one_way"`	-
`original_text`	`string`	Concatenated text of every original token across all segments.
`segments`	`TranslationSegment`[]	Per-utterance segments in audio order.
`to`	`string`	Target language code (the `to` value passed in).
`translation_text`	`string`	Concatenated text of every translation token across all segments.
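
The `duration_ms` and flattened text fields described above can be reproduced from a token list. The sketch uses a minimal token shape and hypothetical helper names; treating a missing `end_ms` as `0` is an assumption.

```typescript
// Minimal token shape for illustration; real tokens carry more fields.
type TimedToken = { text: string; end_ms?: number };

// Largest end_ms across the original tokens, or 0 when there are none.
function computeDurationMs(originalTokens: TimedToken[]): number {
  return originalTokens.reduce((max, token) => Math.max(max, token.end_ms ?? 0), 0);
}

// Concatenates token text in order, matching how the flattened fields are described.
function joinTokenText(tokens: TimedToken[]): string {
  return tokens.map((token) => token.text).join("");
}

const originalTokens: TimedToken[] = [
  { text: "Hello", end_ms: 400 },
  { text: " world", end_ms: 900 },
];
console.log(computeDurationMs(originalTokens)); // 900
console.log(joinTokenText(originalTokens)); // "Hello world"
console.log(computeDurationMs([])); // 0
```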

OneWayTranslationConfig

type OneWayTranslationConfig = {
  target_language: string;
  type: "one_way";
};

One-way translation configuration. Translates all spoken languages into a single target language.

Properties

Property	Type	Description
`target_language`	`string`	Target language code for translation (e.g., "fr", "es", "de").
`type`	`"one_way"`	Translation type.

RealtimeClientOptions

type RealtimeClientOptions = {
  api_key: string;
  default_session_options?: SttSessionOptions;
  stt_defaults?: Partial<SttSessionConfig>;
  tts_connection_options?: TtsConnectionOptions;
  tts_defaults?: Partial<TtsStreamConfig>;
  tts_ws_url: string;
  ws_base_url: string;
};

Real-time API configuration options for the client.

Properties

Property	Type	Description
`api_key`	`string`	API key for real-time sessions.
`default_session_options?`	`SttSessionOptions`	Default session options applied to all real-time STT sessions. Can be overridden per-session.
`stt_defaults?`	`Partial`<`SttSessionConfig`>	STT session config defaults. Merged as the base layer when opening STT sessions via `realtime.stt(config)`; caller fields override.
`tts_connection_options?`	`TtsConnectionOptions`	Default TTS connection options.
`tts_defaults?`	`Partial`<`TtsStreamConfig`>	TTS stream config defaults. Merged as the base layer when opening TTS streams via `realtime.tts(...)`; caller fields override.
`tts_ws_url`	`string`	TTS WebSocket URL for real-time connections. Default `'wss://tts-rt.soniox.com/tts-websocket'`
`ws_base_url`	`string`	STT WebSocket base URL for real-time connections. Default `'wss://stt-rt.soniox.com/transcribe-websocket'`

RealtimeErrorCode

type RealtimeErrorCode = 
  | "auth_error"
  | "bad_request"
  | "quota_exceeded"
  | "connection_error"
  | "network_error"
  | "aborted"
  | "state_error"
  | "realtime_error";

Error codes for Real-time (WebSocket) API errors

RealtimeEvent

type RealtimeEvent = 
  | {
  data: RealtimeResult;
  kind: "result";
}
  | {
  kind: "endpoint";
}
  | {
  kind: "finalized";
}
  | {
  kind: "finished";
};

Typed event for async iterator consumption.
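
Since `kind` is the discriminant, a `switch` narrows each branch to the right variant. The sketch below uses a trimmed-down result shape and a hypothetical `describeEvent` function to show the pattern.

```typescript
// Trimmed-down result shape; see RealtimeResult for the full definition.
type ResultLike = { tokens: { text: string }[] };

type RealtimeEventLike =
  | { kind: "result"; data: ResultLike }
  | { kind: "endpoint" }
  | { kind: "finalized" }
  | { kind: "finished" };

// Hypothetical handler: `data` is only accessible in the "result" branch.
function describeEvent(event: RealtimeEventLike): string {
  switch (event.kind) {
    case "result":
      return `result: ${event.data.tokens.map((token) => token.text).join("")}`;
    case "endpoint":
      return "endpoint";
    case "finalized":
      return "finalized";
    case "finished":
      return "finished";
  }
}

console.log(describeEvent({ kind: "result", data: { tokens: [{ text: "Hi" }, { text: " there" }] } }));
console.log(describeEvent({ kind: "finished" }));
```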

RealtimeOptions

type RealtimeOptions = {
  default_session_options?: SttSessionOptions;
  stt_defaults?: Partial<SttSessionConfig>;
  tts_connection_options?: TtsConnectionOptions;
  tts_defaults?: Partial<TtsStreamConfig>;
  tts_ws_url?: string;
  ws_base_url?: string;
};

Real-time configuration options for the main client.

Properties

Property	Type	Description
`default_session_options?`	`SttSessionOptions`	Default session options applied to all real-time STT sessions. Can be overridden per-session.
`stt_defaults?`	`Partial`<`SttSessionConfig`>	Default STT session config fields (model, language hints, context, etc.). Merged as the base layer when opening STT sessions via `client.realtime.stt(config)`. Fields on the caller-provided `config` override these defaults. Equivalent to SonioxConnectionConfig.stt_defaults on the web/react clients.
`tts_connection_options?`	`TtsConnectionOptions`	Default TTS connection options (keepalive interval, connect timeout).
`tts_defaults?`	`Partial`<`TtsStreamConfig`>	Default TTS stream config fields (model, voice, language, audio_format, etc.). Merged as the base layer when opening TTS streams via `client.realtime.tts(...)`. Fields on the caller-provided TtsStreamInput override these defaults. Equivalent to SonioxConnectionConfig.tts_defaults on the web/react clients.
`tts_ws_url?`	`string`	TTS WebSocket URL for real-time connections. Falls back to SONIOX_TTS_WS_URL environment variable, then to 'wss://tts-rt.soniox.com/tts-websocket'.
`ws_base_url?`	`string`	STT WebSocket base URL for real-time connections. Falls back to SONIOX_WS_URL environment variable, then to 'wss://stt-rt.soniox.com/transcribe-websocket'.
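
The fallback order described for `ws_base_url` (explicit option, then environment variable, then built-in default) can be sketched as follows. `resolveSttWsUrl` is a hypothetical helper, the environment is passed in as a plain object, and treating an empty string as a set value is an assumption.

```typescript
const DEFAULT_STT_WS_URL = "wss://stt-rt.soniox.com/transcribe-websocket";

type Env = Record<string, string | undefined>;

// Hypothetical helper: explicit option first, then SONIOX_WS_URL, then the default.
function resolveSttWsUrl(option: string | undefined, env: Env): string {
  return option ?? env["SONIOX_WS_URL"] ?? DEFAULT_STT_WS_URL;
}

// Placeholder URLs for illustration.
console.log(resolveSttWsUrl(undefined, {}));
console.log(resolveSttWsUrl(undefined, { SONIOX_WS_URL: "wss://env.example.test/ws" }));
console.log(resolveSttWsUrl("wss://option.example.test/ws", { SONIOX_WS_URL: "wss://env.example.test/ws" }));
```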

RealtimeResult

type RealtimeResult = {
  final_audio_proc_ms: number;
  finished?: boolean;
  tokens: RealtimeToken[];
  total_audio_proc_ms: number;
};

A result message from the real-time WebSocket.

Properties

Property	Type	Description
`final_audio_proc_ms`	`number`	Milliseconds of audio that have been finalized.
`finished?`	`boolean`	Whether this is the final result (session ending).
`tokens`	`RealtimeToken`[]	Tokens in this result.
`total_audio_proc_ms`	`number`	Total milliseconds of audio processed.
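
One common way to consume a result is to split its tokens by `is_final`: final text is appended to a stable transcript, while non-final text is treated as provisional. The sketch below shows that split with a hypothetical `splitByFinality` helper and a minimal token shape.

```typescript
// Minimal token shape; see RealtimeToken for the full definition.
type TokenLike = { text: string; is_final: boolean };

// Hypothetical helper: separates finalized text from provisional text.
function splitByFinality(tokens: TokenLike[]): { finalText: string; pendingText: string } {
  let finalText = "";
  let pendingText = "";
  for (const token of tokens) {
    if (token.is_final) {
      finalText += token.text;
    } else {
      pendingText += token.text;
    }
  }
  return { finalText, pendingText };
}

const split = splitByFinality([
  { text: "Good", is_final: true },
  { text: " morning", is_final: true },
  { text: " every", is_final: false },
]);
console.log(split.finalText); // "Good morning"
console.log(split.pendingText); // " every"
```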

RealtimeSegment

type RealtimeSegment = {
  end_ms?: number;
  language?: string;
  speaker?: string;
  start_ms?: number;
  text: string;
  tokens: RealtimeToken[];
};

A segment of contiguous real-time tokens grouped by speaker/language.

Properties

Property	Type	Description
`end_ms?`	`number`	End time of the segment in milliseconds (from last token).
`language?`	`string`	Detected language code (if language identification enabled).
`speaker?`	`string`	Speaker identifier (if diarization enabled).
`start_ms?`	`number`	Start time of the segment in milliseconds (from first token).
`text`	`string`	Concatenated text of all tokens in this segment.
`tokens`	`RealtimeToken`[]	Original tokens in this segment.

RealtimeSegmentBufferOptions

type RealtimeSegmentBufferOptions = {
  final_only?: boolean;
  group_by?: SegmentGroupKey[];
  max_ms?: number;
  max_tokens?: number;
};

Options for rolling real-time segmentation buffers.

Properties

Property	Type	Description
`final_only?`	`boolean`	When true, only tokens marked as final are buffered. Default `true`
`group_by?`	`SegmentGroupKey`[]	Fields to group by. A new segment starts when any of these fields changes. Default `['speaker', 'language']`
`max_ms?`	`number`	Maximum time window to keep in milliseconds (requires token timings).
`max_tokens?`	`number`	Maximum number of tokens to keep in the buffer. Default `2000`

RealtimeSegmentOptions

type RealtimeSegmentOptions = {
  final_only?: boolean;
  group_by?: SegmentGroupKey[];
};

Options for segmenting real-time tokens.

Properties

Property	Type	Description
`final_only?`	`boolean`	When true, only tokens marked as final are included. Default `false`
`group_by?`	`SegmentGroupKey`[]	Fields to group by. A new segment starts when any of these fields changes. Default `['speaker', 'language']`
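
The grouping rule can be sketched as a single pass over the tokens: a new segment starts whenever one of the `group_by` fields differs from the current segment. This is an illustrative reimplementation with hypothetical names and minimal types, not the SDK's own code.

```typescript
type Token = { text: string; is_final: boolean; speaker?: string; language?: string };
type GroupKey = "speaker" | "language";
type Segment = { text: string; speaker?: string; language?: string; tokens: Token[] };

// Hypothetical segmenter using the documented defaults:
// final_only = false, group_by = ['speaker', 'language'].
function segmentTokens(
  tokens: Token[],
  options: { final_only?: boolean; group_by?: GroupKey[] } = {},
): Segment[] {
  const finalOnly = options.final_only ?? false;
  const groupBy: GroupKey[] = options.group_by ?? ["speaker", "language"];
  const segments: Segment[] = [];
  for (const token of tokens) {
    if (finalOnly && !token.is_final) {
      continue; // skip provisional tokens when final_only is set
    }
    const last = segments[segments.length - 1];
    if (last !== undefined && groupBy.every((key) => last[key] === token[key])) {
      last.tokens.push(token);
      last.text += token.text;
    } else {
      segments.push({ text: token.text, speaker: token.speaker, language: token.language, tokens: [token] });
    }
  }
  return segments;
}

const sampleTokens: Token[] = [
  { text: "Hi", is_final: true, speaker: "1" },
  { text: " there", is_final: false, speaker: "1" },
  { text: "Hello", is_final: true, speaker: "2" },
];
console.log(segmentTokens(sampleTokens).map((segment) => segment.text)); // ["Hi there", "Hello"]
```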

RealtimeToken

type RealtimeToken = {
  confidence: number;
  end_ms?: number;
  is_final: boolean;
  language?: string;
  source_language?: string;
  speaker?: string;
  start_ms?: number;
  text: string;
  translation_status?: "none" | "original" | "translation";
};

A single token from the real-time transcription.

Properties

Property	Type	Description
`confidence`	`number`	Confidence score (0.0 to 1.0).
`end_ms?`	`number`	End time in milliseconds relative to audio start.
`is_final`	`boolean`	Whether this is a finalized token.
`language?`	`string`	Detected language code (if language identification enabled).
`source_language?`	`string`	Source language for translated tokens.
`speaker?`	`string`	Speaker identifier (if diarization enabled).
`start_ms?`	`number`	Start time in milliseconds relative to audio start.
`text`	`string`	The transcribed text.
`translation_status?`	`"none"` \| `"original"` \| `"translation"`	Translation status of this token.

RealtimeUtterance

type RealtimeUtterance = {
  end_ms?: number;
  final_audio_proc_ms?: number;
  language?: string;
  segments: RealtimeSegment[];
  speaker?: string;
  start_ms?: number;
  text: string;
  tokens: RealtimeToken[];
  total_audio_proc_ms?: number;
};

A single utterance built from real-time segments.

Properties

Property	Type	Description
`end_ms?`	`number`	End time of the utterance in milliseconds (from last segment).
`final_audio_proc_ms?`	`number`	Milliseconds of audio that have been finalized at flush time.
`language?`	`string`	Detected language code when consistent across segments.
`segments`	`RealtimeSegment`[]	Segments included in this utterance.
`speaker?`	`string`	Speaker identifier when consistent across segments.
`start_ms?`	`number`	Start time of the utterance in milliseconds (from first segment).
`text`	`string`	Concatenated text of all segments in this utterance.
`tokens`	`RealtimeToken`[]	Tokens included in this utterance.
`total_audio_proc_ms?`	`number`	Total milliseconds of audio processed at flush time.

RealtimeUtteranceBufferOptions

type RealtimeUtteranceBufferOptions = {
  final_only?: boolean;
  group_by?: SegmentGroupKey[];
  max_ms?: number;
  max_tokens?: number;
};

Options for buffering real-time utterances.

Properties

Property	Type	Description
`final_only?`	`boolean`	When true, only tokens marked as final are buffered. Default `true`
`group_by?`	`SegmentGroupKey`[]	Fields to group by. A new segment starts when any of these fields changes. Default `['speaker', 'language']`
`max_ms?`	`number`	Maximum time window to keep in milliseconds (requires token timings).
`max_tokens?`	`number`	Maximum number of tokens to keep in the buffer. Default `2000`

RecomputeVoiceOptions

type RecomputeVoiceOptions = {
  model?: string | null;
  signal?: AbortSignal;
};

Options for recomputing a voice.

Properties

Property	Type	Description
`model?`	`string` \| `null`	The model to prepare this voice for. If omitted, the voice is prepared for every available model it is not ready for yet.
`signal?`	`AbortSignal`	AbortSignal for cancelling the request.

SegmentGroupKey

type SegmentGroupKey = "speaker" | "language";

Fields that can be used to group tokens into segments

SegmentTranscriptOptions

type SegmentTranscriptOptions = {
  group_by?: SegmentGroupKey[];
};

Options for segmenting a transcript

Properties

Property	Type	Description
`group_by?`	`SegmentGroupKey`[]	Fields to group by. A new segment starts when any of these fields changes. Default `['speaker', 'language']`

SendStreamOptions

type SendStreamOptions = {
  finish?: boolean;
  pace_ms?: number;
};

Options for streaming audio from an async iterable source.

Properties

Property	Type	Description
`finish?`	`boolean`	When true, calls finish() automatically after the stream ends. Default `false`
`pace_ms?`	`number`	Delay in milliseconds between sending chunks. Useful for simulating real-time pace when streaming pre-recorded files. Not needed for live audio sources.
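
For raw PCM input, one way to pick `pace_ms` is to use the playback duration of a single chunk, so that a pre-recorded file is sent at roughly real-time speed. The arithmetic is sketched below. `chunkDurationMs` is a hypothetical helper, and the 16 kHz mono 16-bit values are illustrative.

```typescript
// Playback duration in milliseconds of one chunk of raw PCM audio.
function chunkDurationMs(
  chunkBytes: number,
  sampleRate: number,
  channels: number,
  bytesPerSample: number,
): number {
  const bytesPerSecond = sampleRate * channels * bytesPerSample;
  return (chunkBytes * 1000) / bytesPerSecond;
}

// 3200 bytes of 16 kHz, mono, 16-bit PCM.
console.log(chunkDurationMs(3200, 16000, 1, 2)); // 100
```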

SonioxErrorCode

type SonioxErrorCode = 
  | RealtimeErrorCode
  | "soniox_error"
  | HttpErrorCode;

All possible SDK error codes (real-time + HTTP-specific codes)

SonioxFileData

type SonioxFileData = {
  client_reference_id?: string | null;
  created_at: string;
  filename: string;
  id: string;
  size: number;
};

Raw file metadata from the API.

Properties

Property	Type	Description
`client_reference_id?`	`string` \| `null`	Optional tracking identifier string.
`created_at`	`string`	UTC timestamp indicating when the file was uploaded. Format date-time
`filename`	`string`	Name of the file.
`id`	`string`	Unique identifier of the file. Format uuid
`size`	`number`	Size of the file in bytes.

SonioxLanguage

type SonioxLanguage = {
  code: string;
  name: string;
};

Properties

Property	Type	Description
`code`	`string`	2-letter language code.
`name`	`string`	Language name.

SonioxModel

type SonioxModel = {
  aliased_model_id: string | null;
  context_version: number | null;
  id: string;
  languages: SonioxLanguage[];
  name: string;
  one_way_translation: string | null;
  supports_language_hints_strict: boolean;
  supports_max_endpoint_delay: boolean;
  transcription_mode: SonioxTranscriptionMode;
  translation_targets: SonioxTranslationTarget[];
  two_way_translation: string | null;
  two_way_translation_pairs: string[];
};

Properties

Property	Type	Description
`aliased_model_id`	`string` \| `null`	If this is an alias, the id of the aliased model. Null for non-alias models.
`context_version`	`number` \| `null`	Version of context supported.
`id`	`string`	Unique identifier of the model.
`languages`	`SonioxLanguage`[]	List of languages supported by the model.
`name`	`string`	Name of the model.
`one_way_translation`	`string` \| `null`	When this contains the string 'all_languages', any language from `languages` can be used.
`supports_language_hints_strict`	`boolean`	Whether the model supports the `language_hints_strict` option.
`supports_max_endpoint_delay`	`boolean`	-
`transcription_mode`	`SonioxTranscriptionMode`	Transcription mode of the model.
`translation_targets`	`SonioxTranslationTarget`[]	List of supported one-way translation targets. If the list is empty, check the `one_way_translation` field.
`two_way_translation`	`string` \| `null`	When this contains the string 'all_languages', any language pair from `languages` can be used.
`two_way_translation_pairs`	`string`[]	List of supported two-way translation pairs. If the list is empty, check the `two_way_translation` field.
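
The relationship between `translation_targets` and `one_way_translation` can be sketched as a two-step check. This is an interpretation of the descriptions above, not confirmed SDK behavior: `supportsOneWayTarget` is a hypothetical helper, it compares `one_way_translation` to `'all_languages'` exactly, and it ignores `source_languages` and `exclude_source_languages`.

```typescript
// Subset of SonioxModel needed for the check.
type ModelLike = {
  languages: { code: string; name: string }[];
  one_way_translation: string | null;
  translation_targets: { target_language: string }[];
};

// Hypothetical helper: can this model translate one-way into `target`?
function supportsOneWayTarget(model: ModelLike, target: string): boolean {
  if (model.translation_targets.length > 0) {
    return model.translation_targets.some((entry) => entry.target_language === target);
  }
  // Empty list: fall back to the one_way_translation field.
  if (model.one_way_translation === "all_languages") {
    return model.languages.some((language) => language.code === target);
  }
  return false;
}

// Illustrative model data.
const anyLanguageModel: ModelLike = {
  languages: [{ code: "en", name: "English" }, { code: "fr", name: "French" }],
  one_way_translation: "all_languages",
  translation_targets: [],
};
console.log(supportsOneWayTarget(anyLanguageModel, "fr")); // true
console.log(supportsOneWayTarget(anyLanguageModel, "de")); // false
```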

SonioxNodeClientOptions

type SonioxNodeClientOptions = {
  api_key?: string;
  base_domain?: string;
  base_url?: string;
  http_client?: HttpClient;
  realtime?: RealtimeOptions;
  region?: SonioxRegion;
  stt_defaults?: Partial<SttSessionConfig>;
  tts_api_url?: string;
  tts_defaults?: Partial<TtsStreamConfig>;
};

Properties

Property	Type	Description
`api_key?`	`string`	API key for authentication. Falls back to SONIOX_API_KEY environment variable if not provided.
`base_domain?`	`string`	Base domain for all Soniox service URLs. A single override that derives all service endpoints from the pattern `{service}.{base_domain}`. Takes precedence over `region`. Falls back to SONIOX_BASE_DOMAIN environment variable. Individual URL fields (`base_url`, `tts_api_url`, `realtime.ws_base_url`, `realtime.tts_ws_url`) still take final precedence. Example `'eu.soniox.com'`
`base_url?`	`string`	Base URL for the REST API. Falls back to SONIOX_API_BASE_URL environment variable, then to the region-derived URL, then to 'https://api.soniox.com'.
`http_client?`	`HttpClient`	Custom HTTP client implementation.
`realtime?`	`RealtimeOptions`	Real-time API configuration options.
`region?`	`SonioxRegion`	Deployment region. Determines which regional endpoints are used for both the REST API and real-time WebSocket connections. Leave `undefined` for the default (US) region. Shorthand for `base_domain: '{region}.soniox.com'`. `base_domain` takes precedence when both are provided. See https://soniox.com/docs/stt/data-residency
`stt_defaults?`	`Partial`<`SttSessionConfig`>	Default STT session config fields applied to every real-time STT session opened via `client.realtime.stt(config)`. Caller-provided fields override. Equivalent to SonioxConnectionConfig.stt_defaults on the web/react clients. Prefer this when you want the same defaults across your whole Node process.
`tts_api_url?`	`string`	TTS REST API URL. Falls back to SONIOX_TTS_API_URL environment variable, then to the region-derived URL, then to 'https://tts-rt.soniox.com'.
`tts_defaults?`	`Partial`<`TtsStreamConfig`>	Default TTS stream config fields applied to every real-time TTS stream opened via `client.realtime.tts(...)`. Caller-provided fields override. Equivalent to SonioxConnectionConfig.tts_defaults on the web/react clients.
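
The precedence between `base_url`, `base_domain`, and `region` can be sketched for the REST endpoint as follows. This is an approximation under stated assumptions: the `api` service prefix and `https` scheme are inferred from the default `'https://api.soniox.com'`, environment variable fallbacks are left out, and `resolveRestBaseUrl` is a hypothetical helper.

```typescript
type EndpointOptions = { base_domain?: string; base_url?: string; region?: string };

// Hypothetical resolver: individual URL first, then base_domain, then region, then default.
function resolveRestBaseUrl(options: EndpointOptions): string {
  if (options.base_url !== undefined) {
    return options.base_url; // individual URL fields take final precedence
  }
  const regionDomain = options.region !== undefined ? `${options.region}.soniox.com` : "soniox.com";
  const domain = options.base_domain ?? regionDomain; // base_domain beats region
  return `https://api.${domain}`; // assumed {service}.{base_domain} with service "api"
}

console.log(resolveRestBaseUrl({})); // https://api.soniox.com
console.log(resolveRestBaseUrl({ region: "eu" })); // https://api.eu.soniox.com
console.log(resolveRestBaseUrl({ region: "eu", base_domain: "example.test" })); // https://api.example.test
```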

SonioxTranscriptionData

type SonioxTranscriptionData = {
  audio_duration_ms?: number | null;
  audio_url?: string | null;
  client_reference_id?: string | null;
  context?: TranscriptionContext | null;
  created_at: string;
  enable_language_identification: boolean;
  enable_speaker_diarization: boolean;
  error_message?: string | null;
  error_type?: string | null;
  file_id?: string | null;
  filename: string;
  id: string;
  language_hints?: string[] | null;
  model: string;
  status: TranscriptionStatus;
  webhook_auth_header_name?: string | null;
  webhook_auth_header_value?: string | null;
  webhook_status_code?: number | null;
  webhook_url?: string | null;
};

Raw transcription metadata from the API.

Properties

Property	Type	Description
`audio_duration_ms?`	`number` \| `null`	Duration of the audio in milliseconds. Only available after processing begins.
`audio_url?`	`string` \| `null`	URL of the audio file being transcribed.
`client_reference_id?`	`string` \| `null`	Optional tracking identifier. Max Length 256
`context?`	`TranscriptionContext` \| `null`	Additional context provided for the transcription.
`created_at`	`string`	UTC timestamp when the transcription was created. Format date-time
`enable_language_identification`	`boolean`	When true, language is detected for each part of the transcription.
`enable_speaker_diarization`	`boolean`	When true, speakers are identified and separated in the transcription output.
`error_message?`	`string` \| `null`	Error message if transcription failed. Null for successful or in-progress transcriptions.
`error_type?`	`string` \| `null`	Error type if transcription failed. Null for successful or in-progress transcriptions.
`file_id?`	`string` \| `null`	ID of the uploaded file being transcribed. Format uuid
`filename`	`string`	Name of the file being transcribed.
`id`	`string`	Unique identifier of the transcription. Format uuid
`language_hints?`	`string`[] \| `null`	Expected languages in the audio. If not specified, languages are automatically detected.
`model`	`string`	Speech-to-text model used.
`status`	`TranscriptionStatus`	Current status of the transcription.
`webhook_auth_header_name?`	`string` \| `null`	Name of the authentication header sent with webhook notifications.
`webhook_auth_header_value?`	`string` \| `null`	Authentication header value. Always returned masked.
`webhook_status_code?`	`number` \| `null`	HTTP status code received from your server when webhook was delivered. Null if not yet sent.
`webhook_url?`	`string` \| `null`	URL to receive webhook notifications when transcription is completed or fails.

SonioxTranslation

type SonioxTranslation = 
  | OneWayTranslation
  | TwoWayTranslation;

Discriminated translation result returned by SonioxTranslationJob.getTranslation, SonioxTranslationJob.fetchTranslation, and translateFromTranscript.

SonioxTranslationTarget

type SonioxTranslationTarget = {
  exclude_source_languages: string[];
  source_languages: string[];
  target_language: string;
};

Properties

Property	Type
`exclude_source_languages`	`string`[]
`source_languages`	`string`[]
`target_language`	`string`

SonioxUsageLog

type SonioxUsageLog = {
  client_reference_id?: string | null;
  cost_usd: string;
  end_time: string;
  input_audio_cost_usd: string;
  input_audio_duration_ms: number;
  input_audio_tokens: number;
  input_cost_usd: string;
  input_text_cost_usd: string;
  input_text_tokens: number;
  model: string;
  output_audio_cost_usd: string;
  output_audio_duration_ms: number;
  output_audio_tokens: number;
  output_cost_usd: string;
  output_text_cost_usd: string;
  output_text_tokens: number;
  request_scope: string;
  start_time: string;
  uuid: string;
};

Per-request usage log entry.

Properties

Property	Type	Description
`client_reference_id?`	`string` \| `null`	Optional tracking identifier provided by the caller.
`cost_usd`	`string`	Total request cost in USD, represented as a decimal string.
`end_time`	`string`	UTC timestamp indicating when the request ended. Format date-time
`input_audio_cost_usd`	`string`	Input audio cost in USD, represented as a decimal string.
`input_audio_duration_ms`	`number`	Input audio duration in milliseconds.
`input_audio_tokens`	`number`	Number of input audio tokens.
`input_cost_usd`	`string`	Input cost in USD, represented as a decimal string.
`input_text_cost_usd`	`string`	Input text cost in USD, represented as a decimal string.
`input_text_tokens`	`number`	Number of input text tokens.
`model`	`string`	Model used for the request.
`output_audio_cost_usd`	`string`	Output audio cost in USD, represented as a decimal string.
`output_audio_duration_ms`	`number`	Output audio duration in milliseconds.
`output_audio_tokens`	`number`	Number of output audio tokens.
`output_cost_usd`	`string`	Output cost in USD, represented as a decimal string.
`output_text_cost_usd`	`string`	Output text cost in USD, represented as a decimal string.
`output_text_tokens`	`number`	Number of output text tokens.
`request_scope`	`string`	Request scope.
`start_time`	`string`	UTC timestamp indicating when the request started. Format date-time
`uuid`	`string`	Unique identifier of the request. Format uuid
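
A minimal sketch of aggregating usage log entries per model, assuming the entries have already been fetched. `UsageLogLike` is a local subset of SonioxUsageLog, and `summarizeByModel` is a hypothetical helper that is not part of the SDK. The cost fields are left out on purpose: they are decimal strings, and summing them as floating-point numbers can lose precision.

```typescript
// Local subset of SonioxUsageLog, declared here so the sketch is self-contained.
type UsageLogLike = {
  model: string;
  input_audio_duration_ms: number;
  input_text_tokens: number;
  output_text_tokens: number;
};

type ModelUsage = {
  requests: number;
  input_audio_duration_ms: number;
  text_tokens: number;
};

// Hypothetical helper: totals request count, input audio duration, and text tokens per model.
function summarizeByModel(logs: UsageLogLike[]): Record<string, ModelUsage> {
  const summary: Record<string, ModelUsage> = {};
  for (const log of logs) {
    const existing: ModelUsage | undefined = summary[log.model];
    const entry: ModelUsage =
      existing ?? { requests: 0, input_audio_duration_ms: 0, text_tokens: 0 };
    entry.requests += 1;
    entry.input_audio_duration_ms += log.input_audio_duration_ms;
    entry.text_tokens += log.input_text_tokens + log.output_text_tokens;
    summary[log.model] = entry;
  }
  return summary;
}
```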

SonioxVoiceData

type SonioxVoiceData = {
  created_at: string;
  filename: string;
  id: string;
  models: VoiceModelStatusEntry[];
  name: string;
};

Raw voice metadata from the API.

Properties

Property	Type	Description
`created_at`	`string`	UTC timestamp indicating when the voice was created. Format date-time
`filename`	`string`	Original file name of the uploaded audio clip.
`id`	`string`	Unique identifier of the voice. Format uuid
`models`	`VoiceModelStatusEntry`[]	Voice status for each available model. A model with status `not_computed` is not prepared yet (e.g. it was released after the voice was created); call recompute to prepare the voice for it.
`name`	`string`	Name of the voice.

SttSessionConfig

type SttSessionConfig = {
  audio_format?: "auto" | AudioFormat;
  client_reference_id?: string;
  context?: TranscriptionContext;
  enable_endpoint_detection?: boolean;
  enable_language_identification?: boolean;
  enable_speaker_diarization?: boolean;
  endpoint_latency_adjustment_level?: number;
  endpoint_sensitivity?: number;
  language_hints?: string[];
  language_hints_strict?: boolean;
  max_endpoint_delay_ms?: number;
  model: string;
  num_channels?: number;
  sample_rate?: number;
  translation?: TranslationConfig;
};

Configuration sent to the Soniox WebSocket API when starting a session.

Properties

Property	Type	Description
`audio_format?`	`"auto"` \| `AudioFormat`	Audio format. Use 'auto' for automatic detection of container formats. For raw PCM formats, also set sample_rate and num_channels. Default `'auto'`
`client_reference_id?`	`string`	Optional tracking identifier (max 256 chars).
`context?`	`TranscriptionContext`	Additional context to improve transcription accuracy.
`enable_endpoint_detection?`	`boolean`	Enable endpoint detection for utterance boundaries. Useful for voice AI agents.
`enable_language_identification?`	`boolean`	Enable automatic language detection.
`enable_speaker_diarization?`	`boolean`	Enable speaker identification.
`endpoint_latency_adjustment_level?`	`number`	Reduces endpoint latency compared to the default endpointing behavior. Higher values reduce endpoint latency more aggressively, which means endpoints are returned sooner and more endpoints may be emitted. This can split long speech into more segments and may slightly reduce word recognition accuracy because speech is finalized earlier. Allowed values are 0, 1, 2, and 3. The default value is 0 (default semantic endpointing behavior).
`endpoint_sensitivity?`	`number`	Controls how aggressively endpoints are detected. Adjusts how likely the model is to emit an endpoint. Higher values make endpoints more likely, which can finalize segments sooner. Lower values make endpoints less likely, which can help the system wait longer before finalizing. Allowed values are between -1.0 and 1.0. The default value is 0.0.
`language_hints?`	`string`[]	Expected languages in the audio (ISO language codes).
`language_hints_strict?`	`boolean`	When true, recognition is strongly biased toward language hints. Best-effort only, not a hard guarantee.
`max_endpoint_delay_ms?`	`number`	Maximum delay between the end of speech and the returned endpoint. Allowed values are between 500 ms and 3000 ms. The default value is 2000 ms.
`model`	`string`	Speech-to-text model to use.
`num_channels?`	`number`	Number of audio channels (required for raw audio formats).
`sample_rate?`	`number`	Sample rate in Hz (required for PCM formats).
`translation?`	`TranslationConfig`	Translation configuration.
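
The documented constraints can be checked before a session is opened. The sketch below assumes a local subset of SttSessionConfig (`SttConfigLike`) and a hypothetical helper `checkSttConfig`; neither is part of the SDK, and the server remains the authority on what it accepts. Only formats with the `pcm_` prefix are treated as raw PCM here.

```typescript
// Local subset of SttSessionConfig covering only the fields checked below.
type SttConfigLike = {
  model: string;
  audio_format?: string;
  sample_rate?: number;
  num_channels?: number;
  endpoint_latency_adjustment_level?: number;
  endpoint_sensitivity?: number;
  max_endpoint_delay_ms?: number;
};

// Hypothetical helper: returns a list of problems; an empty list means the
// documented constraints are satisfied.
function checkSttConfig(config: SttConfigLike): string[] {
  const problems: string[] = [];
  const format = config.audio_format ?? "auto";

  // Raw PCM formats need an explicit sample rate and channel count.
  if (format.indexOf("pcm_") === 0) {
    if (config.sample_rate === undefined) problems.push("sample_rate is required for raw PCM");
    if (config.num_channels === undefined) problems.push("num_channels is required for raw PCM");
  }

  const level = config.endpoint_latency_adjustment_level;
  if (level !== undefined && [0, 1, 2, 3].indexOf(level) === -1) {
    problems.push("endpoint_latency_adjustment_level must be 0, 1, 2, or 3");
  }

  const sensitivity = config.endpoint_sensitivity;
  if (sensitivity !== undefined && (sensitivity < -1 || sensitivity > 1)) {
    problems.push("endpoint_sensitivity must be between -1.0 and 1.0");
  }

  const delay = config.max_endpoint_delay_ms;
  if (delay !== undefined && (delay < 500 || delay > 3000)) {
    problems.push("max_endpoint_delay_ms must be between 500 and 3000");
  }
  return problems;
}
```

The model name passed in is whatever your project uses; the helper does not validate it.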

SttSessionEvents

type SttSessionEvents = {
  connected: () => void;
  disconnected: (reason?) => void;
  endpoint: () => void;
  error: (error) => void;
  finalized: () => void;
  finished: () => void;
  result: (result) => void;
  state_change: (update) => void;
  token: (token) => void;
};

Event handlers for the STT session.

Properties

Property	Type	Description
`connected`	() => `void`	Session connected and ready.
`disconnected`	(`reason?`) => `void`	Session disconnected.
`endpoint`	() => `void`	Endpoint detected (<end> token).
`error`	(`error`) => `void`	Error occurred.
`finalized`	() => `void`	Finalization complete (<fin> token).
`finished`	() => `void`	Session finished (server signaled end of stream).
`result`	(`result`) => `void`	Parsed result received.
`state_change`	(`update`) => `void`	Session state transition.
`token`	(`token`) => `void`	Individual token received.

SttSessionOptions

type SttSessionOptions = {
  connect_timeout_ms?: number;
  keepalive_interval_ms?: number;
  signal?: AbortSignal;
};

SDK-level session options (not sent to the server).

Properties

Property	Type	Description
`connect_timeout_ms?`	`number`	Maximum time to wait for the WebSocket connection to open (milliseconds). If the connection is not established within this time, a ConnectionError with message "Connection timed out" is thrown. Default `20000`
`keepalive_interval_ms?`	`number`	Interval for sending keepalive messages while paused (milliseconds). Default `5000`
`signal?`	`AbortSignal`	AbortSignal for cancellation.

TemporaryApiKeyRequest

type TemporaryApiKeyRequest = {
  client_reference_id?: string;
  expires_in_seconds: number;
  max_session_duration_seconds?: number;
  single_use?: boolean;
  usage_type: TemporaryApiKeyUsageType;
};

Properties

Property	Type	Description
`client_reference_id?`	`string`	Optional tracking identifier string. Does not need to be unique. Max Length 256
`expires_in_seconds`	`number`	Duration in seconds until the temporary API key expires. Minimum 1 Maximum 3600
`max_session_duration_seconds?`	`number`	Maximum connection duration in seconds for WebSocket and TTS HTTP streaming endpoints. Minimum 1 Maximum 18000
`single_use?`	`boolean`	When true, restricts the temporary API key to a single use.
`usage_type`	`TemporaryApiKeyUsageType`	Intended usage of the temporary API key.
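
As an illustration of the bounds listed above, the sketch below checks a request object locally before it is sent. `TempKeyRequestLike` mirrors the type declaration, and `checkTempKeyRequest` is a hypothetical helper, not an SDK function.

```typescript
// Mirrors TemporaryApiKeyRequest so the sketch is self-contained.
type TempKeyRequestLike = {
  client_reference_id?: string;
  expires_in_seconds: number;
  max_session_duration_seconds?: number;
  single_use?: boolean;
  usage_type: "transcribe_websocket" | "tts_rt";
};

// Hypothetical helper: returns the documented limits that the request violates.
function checkTempKeyRequest(request: TempKeyRequestLike): string[] {
  const problems: string[] = [];
  if (request.expires_in_seconds < 1 || request.expires_in_seconds > 3600) {
    problems.push("expires_in_seconds must be between 1 and 3600");
  }
  const maxSession = request.max_session_duration_seconds;
  if (maxSession !== undefined && (maxSession < 1 || maxSession > 18000)) {
    problems.push("max_session_duration_seconds must be between 1 and 18000");
  }
  const reference = request.client_reference_id;
  if (reference !== undefined && reference.length > 256) {
    problems.push("client_reference_id must be at most 256 characters");
  }
  return problems;
}
```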

TemporaryApiKeyResponse

type TemporaryApiKeyResponse = {
  api_key: string;
  expires_at: string;
};

Properties

Property	Type	Description
`api_key`	`string`	Created temporary API key.
`expires_at`	`string`	UTC timestamp indicating when the generated temporary API key expires. Format date-time
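
Because `expires_at` is a date-time string, a client can work out how long a key remains usable. The helper below is a hypothetical sketch (`msUntilExpiry` is not part of the SDK) and assumes the local clock is reasonably in sync with the server.

```typescript
// Mirrors TemporaryApiKeyResponse so the sketch is self-contained.
type TempKeyResponseLike = {
  api_key: string;
  expires_at: string;
};

// Hypothetical helper: milliseconds remaining before the key expires (never negative).
function msUntilExpiry(response: TempKeyResponseLike, now: Date = new Date()): number {
  const expiresAt = Date.parse(response.expires_at);
  if (isNaN(expiresAt)) {
    throw new Error("expires_at is not a valid date-time string");
  }
  return Math.max(0, expiresAt - now.getTime());
}
```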

TemporaryApiKeyUsageType

type TemporaryApiKeyUsageType = "transcribe_websocket" | "tts_rt";

TranscribeBaseOptions

type TranscribeBaseOptions = {
  cleanup?: CleanupTarget[];
  client_reference_id?: string;
  context?: TranscriptionContext;
  enable_language_identification?: boolean;
  enable_speaker_diarization?: boolean;
  fetch_transcript?: boolean;
  language_hints?: string[];
  language_hints_strict?: boolean;
  model: string;
  signal?: AbortSignal;
  timeout_ms?: number;
  translation?: TranslationConfig;
  wait?: boolean;
  wait_options?: WaitOptions;
  webhook_auth_header_name?: string;
  webhook_auth_header_value?: string;
  webhook_query?: string | URLSearchParams | Record<string, string>;
  webhook_url?: string;
};

Base options shared by all audio source variants.

Properties

Property	Type	Description
`cleanup?`	`CleanupTarget`[]	Resources to clean up after transcription completes or on error/timeout. Only applies when `wait: true`, in which case cleanup always runs: after successful completion, after transcription errors (status: 'error'), and on timeout or abort. This ensures no orphaned resources are left behind. Examples: `cleanup: ['file']` deletes only the uploaded file; `cleanup: ['transcription']` deletes only the transcription record; `cleanup: ['file', 'transcription']` deletes both.
`client_reference_id?`	`string`	Optional tracking identifier. Max Length 256
`context?`	`TranscriptionContext`	Additional context to improve transcription accuracy and formatting of specialized terms.
`enable_language_identification?`	`boolean`	Enable automatic language identification.
`enable_speaker_diarization?`	`boolean`	Enable speaker diarization to identify different speakers.
`fetch_transcript?`	`boolean`	When true (default), fetches the transcript and attaches it to the result when wait=true and the transcription completes successfully. Set to false to skip fetching the full transcript payload. Default `true`
`language_hints?`	`string`[]	Array of expected ISO language codes to bias recognition.
`language_hints_strict?`	`boolean`	When true, model relies more heavily on language hints.
`model`	`string`	Speech-to-text model to use. Max Length 32
`signal?`	`AbortSignal`	AbortSignal to cancel the operation.
`timeout_ms?`	`number`	Timeout in milliseconds.
`translation?`	`TranslationConfig`	Translation configuration.
`wait?`	`boolean`	When true, waits for transcription to complete before returning. Default `false`
`wait_options?`	`WaitOptions`	Options for waiting (only used when wait=true).
`webhook_auth_header_name?`	`string`	Name of the authentication header sent with webhook notifications. Max Length 256
`webhook_auth_header_value?`	`string`	Authentication header value sent with webhook notifications. Max Length 256
`webhook_query?`	`string` \| `URLSearchParams` \| `Record`<`string`, `string`>	Query parameters to append to the webhook URL. Useful for encoding metadata like transcription ID in the webhook callback. Can be a string, URLSearchParams, or Record<string, string>.
`webhook_url?`	`string`	URL to receive webhook notifications when transcription is completed or fails. Max Length 256
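
To illustrate the three accepted shapes of `webhook_query`, the sketch below normalizes each one and appends it to a webhook URL using the standard `URL` and `URLSearchParams` classes. `appendWebhookQuery` is a hypothetical helper; how the SDK itself builds the final URL is an assumption here.

```typescript
type WebhookQuery = string | URLSearchParams | Record<string, string>;

// Hypothetical helper: appends the query parameters to the webhook URL,
// keeping any parameters the URL already carries.
function appendWebhookQuery(webhookUrl: string, query: WebhookQuery): string {
  const url = new URL(webhookUrl);
  // URLSearchParams accepts a query string, another instance, or a plain record.
  const params = new URLSearchParams(query);
  params.forEach((value, key) => {
    url.searchParams.append(key, value);
  });
  return url.toString();
}
```

For example, `appendWebhookQuery("https://example.com/hook", { transcription: "abc" })` yields `https://example.com/hook?transcription=abc`.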

TranscribeFromFile

type TranscribeFromFile = TranscribeBaseOptions & {
  audio_url?: never;
  file: UploadFileInput;
  file_id?: never;
  filename?: string;
};

Transcribe from a direct file upload (Buffer, Uint8Array, Blob, or ReadableStream).

Type Declaration

Name	Type	Description
`audio_url?`	`never`	-
`file`	`UploadFileInput`	File data to upload and transcribe.
`file_id?`	`never`	-
`filename?`	`string`	-

TranscribeFromFileId

type TranscribeFromFileId = TranscribeBaseOptions & {
  audio_url?: never;
  file?: never;
  file_id: string;
  filename?: never;
};

Transcribe from a previously uploaded file.

Type Declaration

Name	Type	Description
`audio_url?`	`never`	-
`file?`	`never`	-
`file_id`	`string`	ID of a previously uploaded file. Format uuid
`filename?`	`never`	-

TranscribeFromFileIdOptions

type TranscribeFromFileIdOptions = Omit<TranscribeFromFileId, "file_id">;

Options for transcribing from an uploaded file ID via transcribeFromFileId.

TranscribeFromFileOptions

type TranscribeFromFileOptions = Omit<TranscribeFromFile, "file">;

Options for transcribing from a file via transcribeFromFile.

TranscribeFromUrl

type TranscribeFromUrl = TranscribeBaseOptions & {
  audio_url: string;
  file?: never;
  file_id?: never;
  filename?: never;
};

Transcribe from a publicly accessible audio URL.

Type Declaration

Name	Type	Description
`audio_url`	`string`	URL of a publicly accessible audio file. Max Length 4096
`file?`	`never`	-
`file_id?`	`never`	-
`filename?`	`never`	-

TranscribeFromUrlOptions

type TranscribeFromUrlOptions = Omit<TranscribeFromUrl, "audio_url">;

Options for transcribing from a URL via transcribeFromUrl.

TranscribeOptions

type TranscribeOptions = 
  | TranscribeFromFile
  | TranscribeFromFileId
  | TranscribeFromUrl;

Options for the unified transcribe method. Exactly one audio source must be provided: file, file_id, or audio_url.
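
The `never` markers on the unused source fields are what make the three variants mutually exclusive at compile time. The sketch below reproduces that pattern with local types (`AudioSource` and `describeSource` are illustrative names, and `file` is narrowed to `Uint8Array` to keep the example free of Node.js types).

```typescript
// Local stand-ins for the three source variants of TranscribeOptions.
type SourceFromFile = { file: Uint8Array; file_id?: never; audio_url?: never; filename?: string };
type SourceFromFileId = { file?: never; file_id: string; audio_url?: never; filename?: never };
type SourceFromUrl = { file?: never; file_id?: never; audio_url: string; filename?: never };
type AudioSource = SourceFromFile | SourceFromFileId | SourceFromUrl;

// Hypothetical helper: reports which source a given options object uses.
function describeSource(source: AudioSource): string {
  if (source.file !== undefined) {
    return `upload (${source.file.byteLength} bytes)`;
  }
  if (source.file_id !== undefined) {
    return `file_id ${source.file_id}`;
  }
  return `url ${source.audio_url}`;
}
```

With these types, an object literal that sets both `file_id` and `audio_url` fails to type-check, because each variant declares the other sources as `never`.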

TranscriptResponse

type TranscriptResponse = {
  id: string;
  text: string;
  tokens: TranscriptToken[];
};

Response from getting a transcription transcript.

Properties

Property	Type	Description
`id`	`string`	Unique identifier of the transcription this transcript belongs to. Format uuid
`text`	`string`	Complete transcribed text content.
`tokens`	`TranscriptToken`[]	List of detailed token information with timestamps and metadata.

TranscriptSegment

type TranscriptSegment = {
  end_ms?: number;
  language?: string;
  speaker?: string;
  start_ms?: number;
  text: string;
  tokens: TranscriptToken[];
};

A segment of contiguous tokens grouped by speaker and language.

Properties

Property	Type	Description
`end_ms?`	`number`	End time of the segment in milliseconds (from last token). Absent for translation-only segments where the underlying tokens carry no timestamps.
`language?`	`string`	Detected language code (if language identification was enabled).
`speaker?`	`string`	Speaker identifier (if speaker diarization was enabled).
`start_ms?`	`number`	Start time of the segment in milliseconds (from first token). Absent for translation-only segments where the underlying tokens carry no timestamps.
`text`	`string`	Concatenated text of all tokens in this segment.
`tokens`	`TranscriptToken`[]	Original tokens in this segment.
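
One way to picture how segments relate to tokens is the grouping sketch below: consecutive tokens that share a speaker and language are merged into one segment. This is an assumption about the grouping rule based on the description above, not the SDK's actual implementation; `TokenLike`, `GroupedSegment`, and `groupIntoSegments` are local, hypothetical names.

```typescript
// Local subsets of TranscriptToken and TranscriptSegment.
type TokenLike = {
  text: string;
  start_ms?: number;
  end_ms?: number;
  speaker?: string | null;
  language?: string | null;
};

type GroupedSegment = {
  text: string;
  start_ms?: number;
  end_ms?: number;
  speaker?: string;
  language?: string;
  tokens: TokenLike[];
};

// Hypothetical helper: starts a new segment whenever speaker or language changes.
function groupIntoSegments(tokens: TokenLike[]): GroupedSegment[] {
  const segments: GroupedSegment[] = [];
  for (const token of tokens) {
    const speaker = token.speaker ?? undefined;
    const language = token.language ?? undefined;
    const last: GroupedSegment | undefined = segments[segments.length - 1];
    if (last !== undefined && last.speaker === speaker && last.language === language) {
      // Same speaker and language: extend the current segment.
      last.text += token.text;
      last.tokens.push(token);
      if (token.end_ms !== undefined) last.end_ms = token.end_ms;
    } else {
      segments.push({
        text: token.text,
        start_ms: token.start_ms,
        end_ms: token.end_ms,
        speaker,
        language,
        tokens: [token],
      });
    }
  }
  return segments;
}
```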

TranscriptToken

type TranscriptToken = {
  confidence: number;
  end_ms?: number;
  is_audio_event?: boolean | null;
  language?: string | null;
  source_language?: string | null;
  speaker?: string | null;
  start_ms?: number;
  text: string;
  translation_status?: "none" | "original" | "translation" | null;
};

A single token from the transcript with timing and confidence information.

Properties

Property	Type	Description
`confidence`	`number`	Confidence score for this token (0.0 to 1.0).
`end_ms?`	`number`	End time of the token in milliseconds. Present on original tokens (`translation_status` of `'original'` or `'none'`) and absent on translation tokens (`translation_status: 'translation'`), which do not carry timing.
`is_audio_event?`	`boolean` \| `null`	Whether this token represents an audio event.
`language?`	`string` \| `null`	Language code for this token. For original tokens (`translation_status` of `'original'` or `'none'`) this is the spoken language. For translation tokens (`translation_status: 'translation'`) this is the target language. Present on every token whenever language identification or translation is configured.
`source_language?`	`string` \| `null`	Source language for translation tokens (`translation_status: 'translation'`). Identifies the language being translated from. Not set on original or `'none'` tokens; their language is in TranscriptToken.language.
`speaker?`	`string` \| `null`	Speaker identifier (if speaker diarization was enabled).
`start_ms?`	`number`	Start time of the token in milliseconds. Present on original tokens (`translation_status` of `'original'` or `'none'`) and absent on translation tokens (`translation_status: 'translation'`), which do not carry timing.
`text`	`string`	The text content of this token.
`translation_status?`	`"none"` \| `"original"` \| `"translation"` \| `null`	Translation status for this token.

TranscriptionContext

type TranscriptionContext = {
  general?: ContextGeneralEntry[];
  terms?: string[];
  text?: string;
  translation_terms?: ContextTranslationTerm[];
};

Additional context to improve transcription and translation accuracy. All sections are optional - include only what's relevant for your use case.

Properties

Property	Type	Description
`general?`	`ContextGeneralEntry`[]	Structured key-value pairs describing domain, topic, intent, participant names, etc.
`terms?`	`string`[]	Domain-specific or uncommon words to recognize.
`text?`	`string`	Longer free-form background text, prior interaction history, reference documents, or meeting notes.
`translation_terms?`	`ContextTranslationTerm`[]	Custom translations for ambiguous terms.
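
Since every section is optional, a caller may want to drop empty sections before sending a context. The sketch below does that with a hypothetical `compactContext` helper. The entry shapes are assumptions: ContextGeneralEntry is taken to be `{ key, value }` and ContextTranslationTerm to be `{ source, target }`; check their own definitions before relying on this.

```typescript
// Assumed shapes; see ContextGeneralEntry and ContextTranslationTerm for the real ones.
type GeneralEntryLike = { key: string; value: string };
type TranslationTermLike = { source: string; target: string };

type ContextLike = {
  general?: GeneralEntryLike[];
  terms?: string[];
  text?: string;
  translation_terms?: TranslationTermLike[];
};

// Hypothetical helper: keeps only the sections that actually carry content.
function compactContext(context: ContextLike): ContextLike {
  const result: ContextLike = {};
  if (context.general && context.general.length > 0) result.general = context.general;
  if (context.terms && context.terms.length > 0) result.terms = context.terms;
  if (context.text && context.text.trim() !== "") result.text = context.text;
  if (context.translation_terms && context.translation_terms.length > 0) {
    result.translation_terms = context.translation_terms;
  }
  return result;
}
```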

TranscriptionIdentifier

type TranscriptionIdentifier = 
  | string
  | {
  id: string;
};

Transcription identifier - either a string ID or an object with an id property.
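
A function that accepts either form can normalize it with a single type check. The sketch below uses local names (`IdentifierLike`, `resolveId`) that are not part of the SDK.

```typescript
// Same shape as TranscriptionIdentifier.
type IdentifierLike = string | { id: string };

// Hypothetical helper: returns the plain string ID for either form.
function resolveId(identifier: IdentifierLike): string {
  return typeof identifier === "string" ? identifier : identifier.id;
}
```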

TranscriptionsCountResponse

type TranscriptionsCountResponse = {
  playground: number;
  public_api: number;
  total: number;
};

Total number of transcriptions, split by request scope.

Properties

Property	Type	Description
`playground`	`number`	Number of transcriptions created via the Playground.
`public_api`	`number`	Number of transcriptions created via Public API.
`total`	`number`	Total number of transcriptions across all scopes.

TranslateAudioSource

type TranslateAudioSource = 
  | {
  audio_url?: never;
  file: UploadFileInput;
  file_id?: never;
  filename?: string;
}
  | {
  audio_url?: never;
  file?: never;
  file_id: string;
  filename?: never;
}
  | {
  audio_url: string;
  file?: never;
  file_id?: never;
  filename?: never;
};

Audio source for SonioxSttApi.translate. Exactly one of file, file_id, or audio_url must be provided.

Type Declaration

{
  audio_url?: never;
  file: UploadFileInput;
  file_id?: never;
  filename?: string;
}

Name	Type	Description
`audio_url?`	`never`	-
`file`	`UploadFileInput`	File data to upload and translate.
`file_id?`	`never`	-
`filename?`	`string`	-

{
  audio_url?: never;
  file?: never;
  file_id: string;
  filename?: never;
}

Name	Type	Description
`audio_url?`	`never`	-
`file?`	`never`	-
`file_id`	`string`	ID of a previously uploaded file. Format uuid
`filename?`	`never`	-

{
  audio_url: string;
  file?: never;
  file_id?: never;
  filename?: never;
}

Name	Type	Description
`audio_url`	`string`	URL of a publicly accessible audio file. Max Length 4096
`file?`	`never`	-
`file_id?`	`never`	-
`filename?`	`never`	-

TranslateBaseOptions

type TranslateBaseOptions = {
  cleanup?: CleanupTarget[];
  client_reference_id?: string;
  context?: TranscriptionContext;
  enable_speaker_diarization?: boolean;
  fetch_translation?: boolean;
  model?: string;
  signal?: AbortSignal;
  timeout_ms?: number;
  wait?: boolean;
  wait_options?: WaitOptions;
  webhook_auth_header_name?: string;
  webhook_auth_header_value?: string;
  webhook_query?: string | URLSearchParams | Record<string, string>;
  webhook_url?: string;
};

Common (non-mode, non-source) options shared by every translate call.

Properties

Property	Type	Description
`cleanup?`	`CleanupTarget`[]	Resources to clean up after translation completes or on error/timeout.
`client_reference_id?`	`string`	Optional tracking identifier. Max Length 256
`context?`	`TranscriptionContext`	Additional context to improve transcription and translation accuracy.
`enable_speaker_diarization?`	`boolean`	Enable speaker diarization to identify different speakers.
`fetch_translation?`	`boolean`	When true (default), fetches and reshapes the translation result when `wait=true` and the job completes successfully. Default `true`
`model?`	`string`	Speech-to-text model to use. Default `'stt-async-v5'` Max Length 32
`signal?`	`AbortSignal`	AbortSignal to cancel the operation.
`timeout_ms?`	`number`	Timeout in milliseconds.
`wait?`	`boolean`	When true, waits for translation to complete before returning. Default `false`
`wait_options?`	`WaitOptions`	Options for waiting on completion.
`webhook_auth_header_name?`	`string`	Name of the authentication header sent with webhook notifications. Max Length 256
`webhook_auth_header_value?`	`string`	Authentication header value sent with webhook notifications. Max Length 256
`webhook_query?`	`string` \| `URLSearchParams` \| `Record`<`string`, `string`>	Query parameters to append to the webhook URL.
`webhook_url?`	`string`	URL to receive webhook notifications when translation is completed or fails. Max Length 256

TranslateFromTranscriptMode

type TranslateFromTranscriptMode = 
  | {
  from?: string;
  to: string;
  type: "one_way";
}
  | {
  language_a: string;
  language_b: string;
  type: "two_way";
};

Mode parameter accepted by translateFromTranscript.

The async translate() method stores this internally on the returned job; webhook handlers (and other callers that already have a transcript in hand) supply it directly.

TranslateMode

type TranslateMode = 
  | {
  between?: never;
  from?: never;
  to: string;
}
  | {
  between?: never;
  from: string;
  to: string;
}
  | {
  between: [string, string];
  from?: never;
  to?: never;
};

Shorthand specification of the translation direction(s) for SonioxSttApi.translate.

Three mutually exclusive shapes:

{ to } — one-way translation into to. Source language(s) are detected automatically.
{ to, from } — one-way translation from from to to. The source language is hinted to the model.
{ between: [a, b] } — two-way translation between a and b. Each side is translated into the other; speech in any third language is passed through as-is.
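
The three shapes line up with the two variants of TranslateFromTranscriptMode. The sketch below shows one plausible mapping between them; it is an illustration based on the field names, not the SDK's internal conversion, and `toExplicitMode` is a hypothetical helper.

```typescript
// Local copies of the shorthand and explicit mode shapes.
type ModeShorthand =
  | { to: string; from?: never; between?: never }
  | { to: string; from: string; between?: never }
  | { between: [string, string]; to?: never; from?: never };

type ExplicitMode =
  | { type: "one_way"; to: string; from?: string }
  | { type: "two_way"; language_a: string; language_b: string };

// Hypothetical helper: converts the shorthand into an explicit one_way / two_way mode.
function toExplicitMode(mode: ModeShorthand): ExplicitMode {
  const { to, from, between } = mode;
  if (between !== undefined) {
    const [language_a, language_b] = between;
    return { type: "two_way", language_a, language_b };
  }
  if (to === undefined) {
    throw new Error("either `to` or `between` must be provided");
  }
  // Include `from` only when the caller hinted a source language.
  return from !== undefined ? { type: "one_way", from, to } : { type: "one_way", to };
}
```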

TranslateOptions

type TranslateOptions = TranslateMode & TranslateAudioSource & TranslateBaseOptions;

Options for SonioxSttApi.translate.

Combines a TranslateMode (the translation direction shorthand), a TranslateAudioSource (file, file_id, or audio_url), and TranslateBaseOptions.

TranslationSegment

type TranslationSegment = {
  end_ms?: number;
  from: string;
  original_text: string;
  original_tokens: TranscriptToken[];
  speaker?: string;
  start_ms?: number;
  to?: string;
  translation_text?: string;
  translation_tokens?: TranscriptToken[];
};

A grouped pair of original speech and (optionally) its translation, derived from the underlying transcript tokens.

In one-way mode every segment that originated from speech in the source language carries both original_* and translation_* fields. In two-way mode the same is true for the two configured languages; speech in a third language flows through with translation_status: 'none' and the translation fields are omitted.

Properties

Property	Type	Description
`end_ms?`	`number`	End time of the segment in milliseconds, taken from the last original token. Absent when the segment has no original tokens.
`from`	`string`	Source language code. Derived from `original_tokens[0].language` when originals are present, otherwise from `translation_tokens[0].source_language`.
`original_text`	`string`	Concatenated text of `original_tokens`.
`original_tokens`	`TranscriptToken`[]	Original tokens (`translation_status` of `'original'` or `'none'`) for this segment, in order.
`speaker?`	`string`	Speaker identifier (when speaker diarization is enabled).
`start_ms?`	`number`	Start time of the segment in milliseconds, taken from the first original token. Absent when the segment has no original tokens.
`to?`	`string`	Target language code. Omitted when there are no translation tokens (e.g. third-language pass-through under `between`).
`translation_text?`	`string`	Concatenated text of `translation_tokens`. Omitted when there are no translation tokens.
`translation_tokens?`	`TranscriptToken`[]	Translation tokens (`translation_status: 'translation'`) for this segment, in order. Omitted when there are no translation tokens.
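
Because the translation fields are optional, code that renders segments has to handle the pass-through case. The sketch below formats a segment as a single line; `TranslationSegmentLike` is a local subset of TranslationSegment and `formatSegment` is a hypothetical helper.

```typescript
// Local subset of TranslationSegment with only the fields used for display.
type TranslationSegmentLike = {
  from: string;
  to?: string;
  original_text: string;
  translation_text?: string;
  speaker?: string;
};

// Hypothetical helper: renders the original text and, when present, its translation.
function formatSegment(segment: TranslationSegmentLike): string {
  const who = segment.speaker !== undefined ? `Speaker ${segment.speaker}: ` : "";
  const original = `${who}[${segment.from}] ${segment.original_text}`;
  // Pass-through segments have no target language and no translation text.
  if (segment.to === undefined || segment.translation_text === undefined) {
    return original;
  }
  return `${original} -> [${segment.to}] ${segment.translation_text}`;
}
```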

TtsAudioFormat

type TtsAudioFormat = 
  | "pcm_f32le"
  | "pcm_s16le"
  | "pcm_mulaw"
  | "pcm_alaw"
  | "wav"
  | "aac"
  | "mp3"
  | "opus"
  | "flac"
  | (string & {});

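Supported audio formats for Text-to-Speech output. The trailing `string & {}` member means any other string is also accepted by the type, while editors can still suggest the listed values.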

TtsConnectionEvents

type TtsConnectionEvents = {
  close: () => void;
  error: (error) => void;
};

Events emitted by a TTS WebSocket connection.

Properties

Property	Type	Description
`close`	() => `void`	The WebSocket connection was closed.
`error`	(`error`) => `void`	A connection-level error occurred. Always a RealtimeError subclass (e.g. ConnectionError, NetworkError, AuthError).

TtsConnectionOptions

type TtsConnectionOptions = {
  connect_timeout_ms?: number;
  keepalive_interval_ms?: number;
};

Options for creating a TTS connection.

Properties

Property	Type	Description
`connect_timeout_ms?`	`number`	Maximum time to wait for the WebSocket connection to open (milliseconds). Default `20000`
`keepalive_interval_ms?`	`number`	Interval for sending keepalive messages (milliseconds). Default `5000` Minimum 1000

TtsEvent

type TtsEvent = {
  audio?: string;
  audio_end?: boolean;
  error_code?: number;
  error_message?: string;
  stream_id?: string;
  terminated?: boolean;
  timestamps?: TtsTimestamps;
};

Raw JSON event received from the TTS WebSocket server.

Properties

Property	Type
`audio?`	`string`
`audio_end?`	`boolean`
`error_code?`	`number`
`error_message?`	`string`
`stream_id?`	`string`
`terminated?`	`boolean`
`timestamps?`	`TtsTimestamps`

TtsLanguage

type TtsLanguage = {
  code: string;
  name: string;
};

A language supported by a Text-to-Speech model.

Properties

Property	Type	Description
`code`	`string`	ISO language code.
`name`	`string`	Human-readable language name.

TtsModel

type TtsModel = {
  aliased_model_id?: string | null;
  id: string;
  languages: TtsLanguage[];
  name: string;
  voices: TtsVoice[];
};

A Text-to-Speech model.

Properties

Property	Type	Description
`aliased_model_id?`	`string` \| `null`	If this is an alias, the id of the aliased model.
`id`	`string`	Unique identifier of the model.
`languages`	`TtsLanguage`[]	Languages supported by this model.
`name`	`string`	Name of the model.
`voices`	`TtsVoice`[]	Voices supported by this model.

TtsStreamConfig

type TtsStreamConfig = {
  audio_format: string;
  bitrate?: number;
  language: string;
  model: string;
  return_timestamps?: boolean;
  sample_rate?: number;
  speed?: number;
  stream_id: string;
  voice: string;
};

Fully resolved TTS stream config sent over the WebSocket. All required fields are present after merging input with defaults.

Properties

Property	Type
`audio_format`	`string`
`bitrate?`	`number`
`language`	`string`
`model`	`string`
`return_timestamps?`	`boolean`
`sample_rate?`	`number`
`speed?`	`number`
`stream_id`	`string`
`voice`	`string`

TtsStreamEvents

type TtsStreamEvents = {
  audio: (chunk, timestamps?) => void;
  audioEnd: () => void;
  error: (error) => void;
  terminated: () => void;
};

Events emitted by a TTS stream.

Properties

Property	Type	Description
`audio`	(`chunk`, `timestamps?`) => `void`	Decoded audio chunk received. When `return_timestamps` is enabled, the second argument carries the character-level alignment for this frame (it is `undefined` for audio-only frames).
`audioEnd`	() => `void`	Server marked the final audio payload for this stream.
`error`	(`error`) => `void`	A stream-level error occurred. Always a RealtimeError subclass mapped from the server `error_code` / `error_message`.
`terminated`	() => `void`	Stream has been fully terminated by the server.
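
A common use of these events is to buffer the audio until the stream ends. The sketch below builds a pair of handlers that collect chunks and concatenate them on demand. It assumes each chunk arrives as a `Uint8Array`, which is not stated above, and `createAudioCollector` is a hypothetical helper rather than an SDK function.

```typescript
// Hypothetical helper: returns handlers for `audio` and `audioEnd` plus accessors
// for the collected bytes. Chunks are assumed to be Uint8Array instances.
function createAudioCollector() {
  const chunks: Uint8Array[] = [];
  let ended = false;

  return {
    handlers: {
      audio: (chunk: Uint8Array): void => {
        chunks.push(chunk);
      },
      audioEnd: (): void => {
        ended = true;
      },
    },
    // True once the server has marked the final audio payload.
    isComplete: (): boolean => ended,
    // Concatenates everything received so far into one buffer.
    toBytes: (): Uint8Array => {
      let total = 0;
      for (const chunk of chunks) total += chunk.byteLength;
      const output = new Uint8Array(total);
      let offset = 0;
      for (const chunk of chunks) {
        output.set(chunk, offset);
        offset += chunk.byteLength;
      }
      return output;
    },
  };
}
```

The `handlers` object would be registered on a stream with whatever subscription method the stream exposes; that call is omitted here because it is not described in this section.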

TtsStreamInput

type TtsStreamInput = {
  audio_format?: TtsAudioFormat;
  bitrate?: number;
  language?: string;
  model?: string;
  return_timestamps?: boolean;
  sample_rate?: number;
  speed?: number;
  stream_id?: string;
  voice?: string;
};

Input for creating a TTS stream. All fields are optional and are merged with tts_defaults from the resolved connection config. After merging, model, language, voice, and audio_format must be present.

Properties

Property	Type	Description
`audio_format?`	`TtsAudioFormat`	Output audio format. Example `'wav'`
`bitrate?`	`number`	Codec bitrate in bps (for compressed formats).
`language?`	`string`	Language code for speech generation. Example `'en'`
`model?`	`string`	Text-to-Speech model to use. Example `'tts-rt-v1'`
`return_timestamps?`	`boolean`	Request character-level audio timestamps in the responses. When enabled, audio frames may carry a TtsTimestamps payload aligning each character of the spoken text to its start/end time in the audio. WebSocket (realtime) only — the REST endpoint streams raw audio bytes and ignores this flag. Timestamps map to the model's preprocessed text, not the raw input. Defaults to `false` when omitted.
`sample_rate?`	`number`	Output sample rate in Hz. Required for raw PCM formats.
`speed?`	`number`	Speaking rate. `1.0` is the normal rate; values below `1.0` slow speech down and values above `1.0` speed it up. Supported range is `0.7`-`1.3`. Defaults to `1.0` when omitted.
`stream_id?`	`string`	Client-generated stream identifier. Must be unique among active streams on the same connection. Auto-generated if omitted.
`voice?`	`string`	Voice identifier. Example `'Adrian'`
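
The merge-then-validate behavior described above can be sketched as follows. This is an illustration only: `resolveTtsInput` is a hypothetical helper, the types are local subsets, and the SDK's own merging may differ (for example in how it treats explicitly `undefined` fields, which in this sketch override the defaults).

```typescript
// Local subset of TtsStreamInput.
type TtsInputLike = {
  audio_format?: string;
  language?: string;
  model?: string;
  voice?: string;
  speed?: number;
  sample_rate?: number;
};

type ResolvedTtsInput = TtsInputLike & {
  audio_format: string;
  language: string;
  model: string;
  voice: string;
};

// Hypothetical helper: merges per-stream input over defaults and checks the result.
function resolveTtsInput(input: TtsInputLike, defaults: TtsInputLike): ResolvedTtsInput {
  const merged: TtsInputLike = { ...defaults, ...input };
  const { audio_format, language, model, voice } = merged;
  if (
    audio_format === undefined ||
    language === undefined ||
    model === undefined ||
    voice === undefined
  ) {
    throw new Error("model, language, voice, and audio_format must be set after merging");
  }
  if (merged.speed !== undefined && (merged.speed < 0.7 || merged.speed > 1.3)) {
    throw new Error("speed must be between 0.7 and 1.3");
  }
  return { ...merged, audio_format, language, model, voice };
}
```

Calling `resolveTtsInput({ voice: 'Adrian' }, { model: 'tts-rt-v1', language: 'en', audio_format: 'wav' })` returns a config with all four required fields set.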

TtsStreamState

type TtsStreamState = "active" | "finishing" | "ended" | "error";

Lifecycle states for a TTS stream.

TtsVoice

type TtsVoice = {
  description: string;
  gender: TtsVoiceGender;
  id: string;
};

A Text-to-Speech voice.

Properties

Property	Type	Description
`description`	`string`	Human-readable voice description.
`gender`	`TtsVoiceGender`	Voice gender metadata.
`id`	`string`	Unique identifier of the voice.

TtsVoiceGender

type TtsVoiceGender = "male" | "female" | "neutral";

Voice gender metadata returned by the TTS models API.

TwoWayTranslation

type TwoWayTranslation = {
  duration_ms: number;
  language_a: string;
  language_b: string;
  mode: "two_way";
  segments: TranslationSegment[];
};

Result of a two-way translation ({ between } mode).

No flat original_text / translation_text strings are exposed because which side is "original" depends on the segment. Read segments and filter or format them by their from / to values as needed.

Properties

Property	Type	Description
`duration_ms`	`number`	Total audio duration in milliseconds. Equals the largest `end_ms` across all original tokens, or `0` when there are no original tokens.
`language_a`	`string`	First configured language (the `between[0]` value).
`language_b`	`string`	Second configured language (the `between[1]` value).
`mode`	`"two_way"`	-
`segments`	`TranslationSegment`[]	Per-utterance segments in audio order.
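
As a sketch of working with a two-way result, the helpers below pick out the translations that target one language and recompute the duration from the original tokens, following the `duration_ms` description above. The types are local subsets and both function names are hypothetical.

```typescript
// Local subsets of TranscriptToken and TranslationSegment.
type TimedTokenLike = { end_ms?: number };
type TwoWaySegmentLike = {
  from: string;
  to?: string;
  translation_text?: string;
  original_tokens: TimedTokenLike[];
};

// Hypothetical helper: translated text of every segment translated into `language`.
function translationsInto(segments: TwoWaySegmentLike[], language: string): string[] {
  const texts: string[] = [];
  for (const segment of segments) {
    if (segment.to === language && segment.translation_text !== undefined) {
      texts.push(segment.translation_text);
    }
  }
  return texts;
}

// Hypothetical helper: largest end_ms across all original tokens, or 0 if there are none.
function computeDurationMs(segments: TwoWaySegmentLike[]): number {
  let max = 0;
  for (const segment of segments) {
    for (const token of segment.original_tokens) {
      if (token.end_ms !== undefined && token.end_ms > max) max = token.end_ms;
    }
  }
  return max;
}
```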

TwoWayTranslationConfig

type TwoWayTranslationConfig = {
  language_a: string;
  language_b: string;
  type: "two_way";
};

Two-way translation configuration. Translates between two specified languages.

Properties

Property	Type	Description
`language_a`	`string`	First language code.
`language_b`	`string`	Second language code.
`type`	`"two_way"`	Translation type.

UploadFileInput

type UploadFileInput = 
  | Buffer
  | Uint8Array
  | Blob
  | ReadableStream<Uint8Array>
  | NodeJS.ReadableStream;

Supported input types for file upload.

UploadFileOptions

type UploadFileOptions = {
  client_reference_id?: string;
  filename?: string;
  signal?: AbortSignal;
  timeout_ms?: number;
};

Options for uploading a file

Properties

Property	Type	Description
`client_reference_id?`	`string`	Optional tracking identifier string. Does not need to be unique. Maximum length: 256.
`filename?`	`string`	Custom filename for the uploaded file
`signal?`	`AbortSignal`	AbortSignal for cancelling the upload
`timeout_ms?`	`number`	Request timeout in milliseconds

UsageLogsSort

type UsageLogsSort = "end_time_asc" | "end_time_desc";

Sort order for usage logs.

VoiceIdentifier

type VoiceIdentifier = 
  | string
  | {
      id: string;
    };

Voice identifier - either a string ID or an object with an id property.
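
Code that accepts a `VoiceIdentifier` usually needs the plain string ID. A minimal sketch of that normalization, where `resolveVoiceId` is a hypothetical helper and not part of the SDK:

```typescript
// Mirrors the VoiceIdentifier type documented above.
type VoiceIdentifier = string | { id: string };

// Hypothetical helper: reduce either form to the plain string ID.
function resolveVoiceId(voice: VoiceIdentifier): string {
  return typeof voice === "string" ? voice : voice.id;
}
```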

VoiceModelStatus

type VoiceModelStatus = "not_computed" | "processing" | "ready" | "failed";

Processing status of a voice for a specific model.

not_computed: Not prepared for this model yet (e.g. the model was released after the voice was created). Call recompute to prepare it.
processing: Still being processed for this model. Wait and check again.
ready: Usable with this model.
failed: Processing failed permanently for this model. Fix the reference clip and create a new voice.

VoiceModelStatusEntry

type VoiceModelStatusEntry = {
  error_message?: string | null;
  error_type?: string | null;
  model: string;
  status: VoiceModelStatus;
};

Voice status for a single model.

Properties

Property	Type	Description
`error_message?`	`string` \| `null`	Human-readable error message when status is `failed` (e.g. the reference audio is too long). `null` otherwise.
`error_type?`	`string` \| `null`	Machine-readable error category when status is `failed`. Stable across releases — safe to use in control flow. `null` otherwise.
`model`	`string`	Name of the model.
`status`	`VoiceModelStatus`	Has to be `ready` for the voice to be usable with this model.
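
The status meanings listed under VoiceModelStatus can be turned into a small decision helper. This is a sketch only: `nextStepForModel` and its return strings are hypothetical, and the types are copied locally so the snippet stands alone.

```typescript
type VoiceModelStatus = "not_computed" | "processing" | "ready" | "failed";

type VoiceModelStatusEntry = {
  error_message?: string | null;
  error_type?: string | null;
  model: string;
  status: VoiceModelStatus;
};

// Hypothetical helper: suggest the next step for a voice on a given model.
function nextStepForModel(entries: VoiceModelStatusEntry[], model: string): string {
  const entry = entries.find((e) => e.model === model);
  if (!entry) return "unknown model";
  switch (entry.status) {
    case "ready":
      return "use voice";
    case "processing":
      return "wait and check again";
    case "not_computed":
      return "call recompute";
    case "failed":
      // Branch on error_type rather than error_message, since only
      // error_type is documented as stable across releases.
      return `create a new voice (${entry.error_type ?? "no error_type"})`;
  }
}
```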

VoicesCountResponse

type VoicesCountResponse = {
  total: number;
};

Total number of voices in your project.

Properties

Property	Type	Description
`total`	`number`	Total number of voices in your project.

WaitOptions

type WaitOptions = {
  interval_ms?: number;
  on_status_change?: (status, transcription) => void;
  signal?: AbortSignal;
  timeout_ms?: number;
};

Options for polling/waiting for transcription completion.

Properties

Property	Type	Description
`interval_ms?`	`number`	Polling interval in milliseconds. Default: `1000`. Minimum: `1000`.
`on_status_change?`	(`status`, `transcription`) => `void`	Callback invoked when status changes.
`signal?`	`AbortSignal`	AbortSignal to cancel waiting.
`timeout_ms?`	`number`	Maximum time to wait in milliseconds. Default: `300000` (5 minutes).
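
The documented defaults and minimum can be applied on the caller's side before polling. The sketch below assumes a hypothetical `resolveWaitTiming` helper; this reference does not state how the SDK itself treats an interval below the minimum (error or clamping), so throwing here is an illustrative choice.

```typescript
// Subset of WaitOptions relevant to timing.
type WaitTiming = {
  interval_ms?: number;
  timeout_ms?: number;
};

const DEFAULT_INTERVAL_MS = 1000; // documented default and minimum
const DEFAULT_TIMEOUT_MS = 300000; // documented default (5 minutes)

// Hypothetical helper: fill in defaults and reject a too-small interval.
function resolveWaitTiming(options: WaitTiming = {}): Required<WaitTiming> {
  const interval_ms = options.interval_ms ?? DEFAULT_INTERVAL_MS;
  const timeout_ms = options.timeout_ms ?? DEFAULT_TIMEOUT_MS;
  if (interval_ms < DEFAULT_INTERVAL_MS) {
    throw new RangeError(`interval_ms must be at least ${DEFAULT_INTERVAL_MS}`);
  }
  return { interval_ms, timeout_ms };
}
```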

WebhookAuthConfig

type WebhookAuthConfig = {
  name: string;
  value: string;
};

Authentication configuration for webhook verification

Properties

Property	Type	Description
`name`	`string`	Expected header name (case-insensitive comparison)
`value`	`string`	Expected header value (exact match)
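
The matching rules in the table (case-insensitive header name, exact header value) can be sketched as below. `matchesWebhookAuth` and `HeaderRecord` are hypothetical names; the SDK's own webhook handlers perform this check for you.

```typescript
type WebhookAuthConfig = { name: string; value: string };
type HeaderRecord = Record<string, string | string[] | undefined>;

// Hypothetical helper: true when the configured header is present with the
// exact expected value. Header names are compared case-insensitively.
function matchesWebhookAuth(headers: HeaderRecord, auth: WebhookAuthConfig): boolean {
  const expectedName = auth.name.toLowerCase();
  for (const [name, raw] of Object.entries(headers)) {
    if (name.toLowerCase() !== expectedName) continue;
    const values = Array.isArray(raw) ? raw : [raw];
    if (values.some((v) => v === auth.value)) return true;
  }
  return false;
}
```

This sketch uses plain `===` for the value comparison; a constant-time comparison is a common hardening step when the value is a secret.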

WebhookEvent

type WebhookEvent = {
  id: string;
  status: WebhookEventStatus;
};

Webhook event payload sent by Soniox when a transcription completes or fails.

Properties

Property	Type	Description
`id`	`string`	Transcription ID. Format: UUID.
`status`	`WebhookEventStatus`	Transcription result status

WebhookEventStatus

type WebhookEventStatus = "completed" | "error";

Webhook event status values

WebhookHandlerResult

type WebhookHandlerResult = {
  error?: string;
  event?: WebhookEvent;
  ok: boolean;
  status: number;
};

Result of webhook handling

Properties

Property	Type	Description
`error?`	`string`	Error message (only present when ok=false)
`event?`	`WebhookEvent`	Parsed webhook event (only present when ok=true)
`ok`	`boolean`	Whether the webhook was handled successfully
`status`	`number`	HTTP status code to return
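
Because `error` and `event` are each present only for one value of `ok`, callers typically branch on `ok` first. A minimal sketch, with the types copied locally and `describeWebhookResult` as a hypothetical helper:

```typescript
type WebhookEventStatus = "completed" | "error";
type WebhookEvent = { id: string; status: WebhookEventStatus };
type WebhookHandlerResult = {
  error?: string;
  event?: WebhookEvent;
  ok: boolean;
  status: number;
};

// Hypothetical helper: turn a handler result into a one-line log message.
function describeWebhookResult(result: WebhookHandlerResult): string {
  if (!result.ok || !result.event) {
    return `rejected (${result.status}): ${result.error ?? "unknown error"}`;
  }
  return result.event.status === "completed"
    ? `transcription ${result.event.id} completed`
    : `transcription ${result.event.id} failed`;
}
```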

WebhookHandlerResultWithFetch

type WebhookHandlerResultWithFetch = WebhookHandlerResult & {
  fetchTranscript: (() => Promise<ISonioxTranscript | null>) | undefined;
  fetchTranscription: (() => Promise<ISonioxTranscription | null>) | undefined;
};

Result of webhook handling with lazy fetch capabilities.

When using client.webhooks.handleExpress() (or other framework handlers), the result includes helper methods to fetch the transcript or transcription.

Type Declaration

Name	Type	Description
`fetchTranscript`	(() => `Promise`<`ISonioxTranscript` \| `null`>) \| `undefined`	Fetch the transcript for a completed transcription. Only available when `ok=true` and `event.status='completed'`. Example: `const result = soniox.webhooks.handleExpress(req); if (result.ok && result.event.status === 'completed') { const transcript = await result.fetchTranscript(); console.log(transcript?.text); }`
`fetchTranscription`	(() => `Promise`<`ISonioxTranscription` \| `null`>) \| `undefined`	Fetch the full transcription object. Useful for both completed (metadata) and error (error details) statuses. Example: `const result = soniox.webhooks.handleExpress(req); if (result.ok && result.event.status === 'error') { const transcription = await result.fetchTranscription(); console.log(transcription?.error_message); }`

WebhookHeaders

type WebhookHeaders = 
  | Headers
  | Record<string, string | string[] | undefined>
  | {
  get: string | null;
};

Headers object type - supports both standard headers and record types

HttpClient

Pluggable HTTP client interface

Methods

request()

request<T>(request): Promise<HttpResponse<T>>;

Perform an HTTP request

Type Parameters

Type Parameter
`T`

Parameters

Parameter	Type	Description
`request`	`HttpRequest`	Request configuration

Returns

Promise<HttpResponse<T>>

Promise resolving to the response

Throws

SonioxHttpError On network errors, timeouts, HTTP errors, or parse errors

HttpErrorDetails

Error details for SonioxHttpError

Properties

Property	Type	Description
`bodyText?`	`string`	Response body text (capped at 4KB)
`cause?`	`unknown`	-
`code`	`HttpErrorCode`	-
`headers?`	`Record`<`string`, `string`>	-
`message`	`string`	-
`method`	`HttpMethod`	-
`statusCode?`	`number`	-
`url`	`string`	-

HttpRequest

HTTP request configuration

Properties

Property	Type	Description
`body?`	`HttpRequestBody`	Request body
`headers?`	`Record`<`string`, `string`>	Request headers
`method`	`HttpMethod`	HTTP method
`path`	`string`	URL path (relative to baseUrl) or absolute URL
`query?`	`QueryParams`	Query parameters (will be URL-encoded)
`responseType?`	`HttpResponseType`	Expected response type. Default: `'json'`.
`signal?`	`AbortSignal`	Optional AbortSignal for request cancellation. If provided along with `timeoutMs`, both are respected.
`timeoutMs?`	`number`	Request timeout in milliseconds. If not specified, the client's default timeout is used.
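
A custom `HttpClient` has to combine `path` and `query` into a final URL. One way to do this with the standard `URL` API is sketched below. `buildUrl`, the `Query` type, and the `example.com` hosts are assumptions for illustration; the real `QueryParams` type is defined elsewhere in this reference.

```typescript
// Hypothetical query type standing in for QueryParams.
type Query = Record<string, string | number | boolean | undefined>;

// Sketch: resolve `path` against `baseUrl` (absolute URLs pass through)
// and append URL-encoded query parameters, skipping undefined values.
function buildUrl(baseUrl: string, path: string, query: Query = {}): string {
  const url = new URL(path, baseUrl);
  for (const [key, value] of Object.entries(query)) {
    if (value !== undefined) url.searchParams.append(key, String(value));
  }
  return url.toString();
}
```

Note that `new URL(path, baseUrl)` follows standard URL resolution, so a `path` starting with `/` replaces any path segment already present in `baseUrl`.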

HttpResponse<T>

HTTP response from the client

Type Parameters

Type Parameter
`T`

Properties

Property	Type	Description
`data`	`T`	Parsed response data
`headers`	`Record`<`string`, `string`>	Response headers (normalized to lowercase keys)
`status`	`number`	HTTP status code
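
A custom `HttpClient` implementation is expected to return headers with lowercase keys. A minimal sketch of that normalization, using a hypothetical `normalizeHeaders` helper:

```typescript
// Hypothetical helper: copy headers into a new record with lowercase keys.
function normalizeHeaders(headers: Record<string, string>): Record<string, string> {
  const normalized: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    normalized[name.toLowerCase()] = value;
  }
  return normalized;
}
```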

ISonioxTranscript

Type contract for SonioxTranscript class.

See

SonioxTranscript for full documentation.

Methods

segments()

segments(options?): TranscriptSegment[];

Parameters

Parameter	Type
`options?`	`SegmentTranscriptOptions`

Returns

TranscriptSegment[]

Properties

Property	Type
`id`	`string`
`text`	`string`
`tokens`	`TranscriptToken`[]

ISonioxTranscription

Type contract for SonioxTranscription class.

See

SonioxTranscription for full documentation.

Extended by

ISonioxTranslationJob

Methods

delete()

delete(): Promise<void>;

Returns

Promise<void>

destroy()

destroy(): Promise<void>;

Returns

Promise<void>

getTranscript()

getTranscript(options?): Promise<ISonioxTranscript | null>;

Parameters

Parameter	Type
`options?`	{ `force?`: `boolean`; `signal?`: `AbortSignal`; }
`options.force?`	`boolean`
`options.signal?`	`AbortSignal`

Returns

Promise<ISonioxTranscript | null>

refresh()

refresh(signal?): Promise<ISonioxTranscription>;

Parameters

Parameter	Type
`signal?`	`AbortSignal`

Returns

Promise<ISonioxTranscription>

toJSON()

toJSON(): SonioxTranscriptionData;

Returns

SonioxTranscriptionData

wait()

wait(options?): Promise<ISonioxTranscription>;

Parameters

Parameter	Type
`options?`	`WaitOptions`

Returns

Promise<ISonioxTranscription>

Properties

Property	Type
`audio_duration_ms`	`number` \| `null` \| `undefined`
`audio_url`	`string` \| `null` \| `undefined`
`client_reference_id`	`string` \| `null` \| `undefined`
`context`	`TranscriptionContext` \| `null` \| `undefined`
`created_at`	`string`
`enable_language_identification`	`boolean`
`enable_speaker_diarization`	`boolean`
`error_message`	`string` \| `null` \| `undefined`
`error_type`	`string` \| `null` \| `undefined`
`file_id`	`string` \| `null` \| `undefined`
`filename`	`string`
`id`	`string`
`language_hints`	`string`[] \| `undefined`
`model`	`string`
`status`	`TranscriptionStatus`
`transcript`	`ISonioxTranscript` \| `null` \| `undefined`
`webhook_auth_header_name`	`string` \| `null` \| `undefined`
`webhook_auth_header_value`	`string` \| `null` \| `undefined`
`webhook_status_code`	`number` \| `null` \| `undefined`
`webhook_url`	`string` \| `null` \| `undefined`

ISonioxTranslationJob

Type contract for SonioxTranslationJob class.

Extends

ISonioxTranscription

Methods

delete()

delete(): Promise<void>;

Returns

Promise<void>

Inherited from

ISonioxTranscription.delete

destroy()

destroy(): Promise<void>;

Returns

Promise<void>

Inherited from

ISonioxTranscription.destroy

fetchTranslation()

fetchTranslation(options?): Promise<SonioxTranslation | null>;

Parameters

Parameter	Type
`options?`	{ `force?`: `boolean`; `signal?`: `AbortSignal`; }
`options.force?`	`boolean`
`options.signal?`	`AbortSignal`

Returns

Promise<SonioxTranslation | null>

getTranscript()

getTranscript(options?): Promise<ISonioxTranscript | null>;

Parameters

Parameter	Type
`options?`	{ `force?`: `boolean`; `signal?`: `AbortSignal`; }
`options.force?`	`boolean`
`options.signal?`	`AbortSignal`

Returns

Promise<ISonioxTranscript | null>

Inherited from

ISonioxTranscription.getTranscript

getTranslation()

getTranslation(options?): Promise<SonioxTranslation | null>;

Parameters

Parameter	Type
`options?`	{ `force?`: `boolean`; `signal?`: `AbortSignal`; }
`options.force?`	`boolean`
`options.signal?`	`AbortSignal`

Returns

Promise<SonioxTranslation | null>

refresh()

refresh(signal?): Promise<ISonioxTranslationJob>;

Parameters

Parameter	Type
`signal?`	`AbortSignal`

Returns

Promise<ISonioxTranslationJob>

Overrides

ISonioxTranscription.refresh

toJSON()

toJSON(): SonioxTranscriptionData;

Returns

SonioxTranscriptionData

Overrides

ISonioxTranscription.toJSON

wait()

wait(options?): Promise<ISonioxTranslationJob>;

Parameters

Parameter	Type
`options?`	`WaitOptions`

Returns

Promise<ISonioxTranslationJob>

Overrides

ISonioxTranscription.wait

Properties

Property	Type
`audio_duration_ms`	`number` \| `null` \| `undefined`
`audio_url`	`string` \| `null` \| `undefined`
`client_reference_id`	`string` \| `null` \| `undefined`
`context`	`TranscriptionContext` \| `null` \| `undefined`
`created_at`	`string`
`enable_language_identification`	`boolean`
`enable_speaker_diarization`	`boolean`
`error_message`	`string` \| `null` \| `undefined`
`error_type`	`string` \| `null` \| `undefined`
`file_id`	`string` \| `null` \| `undefined`
`filename`	`string`
`id`	`string`
`language_hints`	`string`[] \| `undefined`
`model`	`string`
`status`	`TranscriptionStatus`
`transcript`	`ISonioxTranscript` \| `null` \| `undefined`
`translation`	`SonioxTranslation` \| `null` \| `undefined`
`webhook_auth_header_name`	`string` \| `null` \| `undefined`
`webhook_auth_header_value`	`string` \| `null` \| `undefined`
`webhook_status_code`	`number` \| `null` \| `undefined`
`webhook_url`	`string` \| `null` \| `undefined`

translateFromTranscript()

function translateFromTranscript(transcript, mode): SonioxTranslation;

Reshape a transcript produced by a translation-enabled transcription into a structured SonioxTranslation result.

This is the same logic SonioxTranslationJob.getTranslation() applies. Use it directly in webhook handlers or anywhere else you already have a transcript in hand.

Parameters

Parameter	Type	Description
`transcript`	`TranscriptLike`	Transcript (or any object with a `tokens` array) emitted for a translation-enabled transcription.
`mode`	`TranslateFromTranscriptMode`	Whether to reshape as one-way or two-way; the discriminator tells the helper which result shape to produce.

Returns

SonioxTranslation

A SonioxTranslation keyed on mode.

Example

import { translateFromTranscript } from '@soniox/node';

// From a webhook handler that just received the transcript
const result = translateFromTranscript(transcript, { type: 'one_way', to: 'es' });
console.log(result.translation_text);
