Types

ApiKeyConfig

type ApiKeyConfig = string | (() => Promise<string>);

API key configuration.

string - A pre-fetched temporary API key (e.g., injected from SSR)
() => Promise<string> - An async function that fetches a fresh temporary key from your backend. Called once per recording session.

Deprecated

Use SonioxConnectionConfig with SonioxClientOptions.config instead.

Example

// Static key (for demos or SSR-injected keys)
const staticClient = new SonioxClient({ api_key: 'temp:...' });

// Async function (recommended for production)
const client = new SonioxClient({
  api_key: async () => {
    const res = await fetch('/api/get-temporary-key', { method: 'POST' });
    const { api_key } = await res.json();
    return api_key;
  },
});

Note: If you use Node.js, you can use the SonioxNodeClient to fetch a temporary API key via client.auth.createTemporaryKey().

AudioSourceHandlers

type AudioSourceHandlers = {
  onData: (chunk) => void;
  onError: (error) => void;
  onMuted?: () => void;
  onUnmuted?: () => void;
};

Callbacks for receiving audio data and errors from an AudioSource.

Properties

Property	Type	Description
`onData`	(`chunk`) => `void`	Called when an audio chunk is available.
`onError`	(`error`) => `void`	Called when a runtime error occurs during audio capture (after start).
`onMuted?`	() => `void`	Called when the audio source is muted externally (e.g. OS-level or hardware mute).
`onUnmuted?`	() => `void`	Called when the audio source is unmuted after an external mute.

GenerateSpeechOptions

type GenerateSpeechOptions = {
  audio_format?: string;
  bitrate?: number;
  language?: string;
  model?: string;
  sample_rate?: number;
  signal?: AbortSignal;
  text: string;
  voice: string;
};

Options for REST TTS generation (generate / generateStream).

Properties

Property	Type	Description
`audio_format?`	`string`	Output audio format. Default: `'wav'`.
`bitrate?`	`number`	Codec bitrate in bps (for compressed formats).
`language?`	`string`	Language code. Default: `'en'`.
`model?`	`string`	Text-to-Speech model to use. Default: `'tts-rt-v1'`.
`sample_rate?`	`number`	Output sample rate in Hz. Required for raw PCM formats.
`signal?`	`AbortSignal`	Optional AbortSignal for cancellation.
`text`	`string`	Input text to generate as speech.
`voice`	`string`	Voice identifier.
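The defaults in the table above can be sketched as a small helper. This is a sketch only: `withSpeechDefaults` is a hypothetical name, the type is a local copy of the documented shape (without `signal`, to stay self-contained), and this page does not say whether the SDK applies these defaults on the client or leaves them to the server.

```typescript
// Local copy of the documented shape, minus `signal` to stay self-contained.
type GenerateSpeechOptions = {
  audio_format?: string;
  bitrate?: number;
  language?: string;
  model?: string;
  sample_rate?: number;
  text: string;
  voice: string;
};

// Hypothetical helper: fills in the defaults listed in the table above.
// Fields set explicitly by the caller are kept as they are.
function withSpeechDefaults(options: GenerateSpeechOptions) {
  return {
    ...options,
    audio_format: options.audio_format ?? 'wav',
    language: options.language ?? 'en',
    model: options.model ?? 'tts-rt-v1',
  };
}
```

Only `text` and `voice` have to be supplied by the caller; everything else falls back to the documented default.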

MicrophoneSourceOptions

type MicrophoneSourceOptions = {
  constraints?: MediaTrackConstraints;
  recorderOptions?: MediaRecorderOptions;
  timesliceMs?: number;
};

Options for MicrophoneSource.

Properties

Property	Type	Description
`constraints?`	`MediaTrackConstraints`	MediaTrackConstraints for the audio track. Default: `{ echoCancellation: false, noiseSuppression: false, autoGainControl: false, channelCount: 1, sampleRate: 16000 }`.
`recorderOptions?`	`MediaRecorderOptions`	MediaRecorder options. See https://developer.mozilla.org/en-US/docs/Web/API/MediaRecorder/MediaRecorder
`timesliceMs?`	`number`	Time interval in milliseconds between audio data chunks. Default: `60`.

PermissionResult

type PermissionResult = {
  can_request: boolean;
  status: PermissionStatus;
};

Result of a permission check or request.

Properties

Property	Type	Description
`can_request`	`boolean`	Whether the user can be prompted again. `false` means permanently denied (e.g., browser "Block" or iOS settings). Useful for showing "go to settings" instructions.
`status`	`PermissionStatus`	Current permission status.

PermissionStatus

type PermissionStatus = "granted" | "denied" | "prompt" | "unavailable";

Unified permission status across all platforms.
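One way to turn a `PermissionResult` into a UI decision is sketched below. The `MicAction` names are hypothetical and not part of the SDK; the types are local copies of the ones documented here.

```typescript
// Local copies of the documented types.
type PermissionStatus = 'granted' | 'denied' | 'prompt' | 'unavailable';
type PermissionResult = { can_request: boolean; status: PermissionStatus };

// Hypothetical UI actions; the names are illustrative only.
type MicAction = 'start' | 'ask' | 'open-settings' | 'unsupported';

function nextMicAction(result: PermissionResult): MicAction {
  switch (result.status) {
    case 'granted':
      return 'start';
    case 'prompt':
      return 'ask';
    case 'denied':
      // can_request === false means prompting again will not help,
      // so point the user at browser or OS settings instead.
      return result.can_request ? 'ask' : 'open-settings';
    case 'unavailable':
      return 'unsupported';
  }
}
```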

PermissionType

type PermissionType = "microphone";

Permission types supported by the resolver.

RecordOptions

type RecordOptions = SttSessionConfig & ReconnectOptions & {
  buffer_queue_size?: number;
  session_config?: (resolved) => SttSessionConfig;
  session_options?: SttSessionOptions;
  signal?: AbortSignal;
  source?: AudioSource;
};

Options for creating a recording.

Type Declaration

Name	Type	Description
`buffer_queue_size?`	`number`	Maximum number of audio chunks to buffer while waiting for key/connection. Default: `1000`.
`session_config()?`	(`resolved`) => `SttSessionConfig`	Function that receives the resolved connection config (including `stt_defaults` from the server) and returns the final session config. When provided, its return value is used as the session config, and any flat session config fields on this object are ignored. Example: `client.realtime.record({ session_config: (resolved) => ({ ...resolved.stt_defaults, enable_endpoint_detection: true }) });`
`session_options?`	`SttSessionOptions`	SDK-level session options (signal, etc.)
`signal?`	`AbortSignal`	AbortSignal for cancellation
`source?`	`AudioSource`	Audio source to use. Defaults to MicrophoneSource if not provided.
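The `session_config` callback can be written as a standalone function, which makes it easy to test. In this sketch only `stt_defaults` and `enable_endpoint_detection` come from the table above; the other field names and the `ResolvedConnectionConfig` type name are assumptions used for illustration.

```typescript
// Assumed shapes: only `stt_defaults` and `enable_endpoint_detection` appear
// in the reference above; `model` and `language_hints` are illustrative.
type SttSessionConfig = {
  model?: string;
  language_hints?: string[];
  enable_endpoint_detection?: boolean;
};
type ResolvedConnectionConfig = { stt_defaults?: SttSessionConfig };

// Start from the server-provided defaults, then force endpoint detection on.
// Later keys win, so the override replaces any default value.
function buildSessionConfig(resolved: ResolvedConnectionConfig): SttSessionConfig {
  return {
    ...resolved.stt_defaults,
    enable_endpoint_detection: true,
  };
}
```

A function like this would be passed as `session_config: buildSessionConfig` in the record options.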

RecordingEvents

type RecordingEvents = {
  connected: () => void;
  endpoint: () => void;
  error: (error) => void;
  finalized: () => void;
  finished: () => void;
  reconnected: (event) => void;
  reconnecting: (event) => void;
  result: (result) => void;
  session_restart: (event) => void;
  source_muted: () => void;
  source_unmuted: () => void;
  state_change: (update) => void;
  token: (token) => void;
};

Events emitted by a Recording instance.

Properties

Property	Type	Description
`connected`	() => `void`	WebSocket connected and ready.
`endpoint`	() => `void`	Endpoint detected (speaker finished talking).
`error`	(`error`) => `void`	Error occurred during recording.
`finalized`	() => `void`	Finalization complete.
`finished`	() => `void`	Recording finished (server acknowledged end of stream).
`reconnected`	(`event`) => `void`	Successfully reconnected after a drop.
`reconnecting`	(`event`) => `void`	About to attempt a reconnection. Call `preventDefault()` to cancel.
`result`	(`result`) => `void`	Parsed result received from the server.
`session_restart`	(`event`) => `void`	New STT session started (initial or after reconnect). Consumers should reset any session-local tracking state (e.g. token window comparisons). The `reset_transcript` flag indicates whether accumulated transcript state should also be cleared.
`source_muted`	() => `void`	Audio source was muted externally (e.g. OS-level or hardware mute).
`source_unmuted`	() => `void`	Audio source was unmuted after an external mute.
`state_change`	(`update`) => `void`	Recording state transition.
`token`	(`token`) => `void`	Individual token received.
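Two of the events above carry behavior worth illustrating: `session_restart` (with its `reset_transcript` flag) and `reconnecting` (with `preventDefault()`). The handlers below are a sketch under assumptions: the payload types are minimal stand-ins, `text` on a token is an assumed field, and how handlers are attached to a Recording is not shown on this page.

```typescript
// Assumed payload shapes. `reset_transcript` and `preventDefault()` are named
// in the table above; `text` on a token is an assumption.
type TokenLike = { text: string };
type SessionRestartEvent = { reset_transcript: boolean };
type ReconnectingEvent = { preventDefault(): void };

// Hypothetical transcript accumulator driven by `token` and `session_restart`.
class TranscriptBuffer {
  private parts: string[] = [];

  onToken = (token: TokenLike): void => {
    this.parts.push(token.text);
  };

  // Clear the accumulated text only when the event asks for it.
  onSessionRestart = (event: SessionRestartEvent): void => {
    if (event.reset_transcript) this.parts = [];
  };

  get text(): string {
    return this.parts.join('');
  }
}

// Returns a `reconnecting` handler that lets the first `maxAttempts`
// attempts through and cancels every attempt after that.
function limitReconnects(maxAttempts: number): (event: ReconnectingEvent) => void {
  let attempts = 0;
  return (event) => {
    attempts += 1;
    if (attempts > maxAttempts) event.preventDefault();
  };
}
```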

SonioxClientOptions

type SonioxClientOptions = {
  api_key?: ApiKeyConfig;
  buffer_queue_size?: number;
  config?:
    | SonioxConnectionConfig
    | ((context?) => Promise<SonioxConnectionConfig>);
  default_session_options?: SttSessionOptions;
  permissions?: PermissionResolver;
  ws_base_url?: string;
};

Options for creating a SonioxClient instance.

Properties

Property	Type	Description
~~`api_key?`~~	`ApiKeyConfig`	API key configuration: either a `string` (a pre-fetched temporary API key, e.g., injected from SSR) or a `() => Promise<string>` (an async function that fetches a fresh key from your backend). Deprecated: use `config` instead.
`buffer_queue_size?`	`number`	Default maximum number of audio chunks to buffer while waiting for key/connection. Can be overridden per-recording. Default: `1000`.
`config?`	\| `SonioxConnectionConfig` \| (`context?`) => `Promise`<`SonioxConnectionConfig`>	Connection configuration: a sync object or an async function. When provided as a function, it is called once per recording session, allowing you to fetch a fresh temporary API key and connection settings from your backend at runtime. Example (sync config with region): `const client = new SonioxClient({ config: { api_key: tempKey, region: 'eu' } });` Example (async config, recommended for production): `const client = new SonioxClient({ config: async () => { const res = await fetch('/api/soniox-config', { method: 'POST' }); return await res.json(); /* { api_key, region, ... } */ } });`
`default_session_options?`	`SttSessionOptions`	Default session options applied to all sessions. Can be overridden per-recording.
`permissions?`	`PermissionResolver`	Optional permission resolver for pre-flight microphone permission checks. Not set by default (SSR-safe, RN-safe). Example: `import { BrowserPermissionResolver } from '@soniox/client'; const client = new SonioxClient({ config: { api_key: tempKey }, permissions: new BrowserPermissionResolver() });`
~~`ws_base_url?`~~	`string`	WebSocket URL for real-time connections. Default: `'wss://stt-rt.soniox.com/transcribe-websocket'`. Deprecated: use `config.stt_ws_url` or `config.region` instead.
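An async `config` function usually needs some error handling around the backend call. The factory below is a sketch, not SDK code: `makeConfigProvider`, `FetchLike`, and `ConnectionConfigLike` are hypothetical names, the response shape is assumed from the fields mentioned above, and the fetch implementation is injected so the function can be exercised without a network.

```typescript
// Assumed subset of SonioxConnectionConfig, based on fields mentioned above.
type ConnectionConfigLike = { api_key: string; region?: string };

// Minimal structural type so the sketch does not depend on DOM typings.
type FetchLike = (
  url: string,
  init: { method: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical factory for a `config` function. The endpoint path matches the
// example in the table above; the response body shape is an assumption.
function makeConfigProvider(fetchImpl: FetchLike, url = '/api/soniox-config') {
  return async (): Promise<ConnectionConfigLike> => {
    const res = await fetchImpl(url, { method: 'POST' });
    if (!res.ok) {
      throw new Error(`Config request failed with status ${res.status}`);
    }
    const body = (await res.json()) as Partial<ConnectionConfigLike> | null;
    if (!body || typeof body.api_key !== 'string') {
      throw new Error('Config response is missing api_key');
    }
    return { ...body, api_key: body.api_key };
  };
}
```

In a browser, the resulting function could be passed as the `config` option, with the global `fetch` supplied as `fetchImpl`.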

SttOptions

type SttOptions = {
  api_key: string;
  session_options?: SttSessionOptions;
};

Options for creating a low-level STT session.

Properties

Property	Type	Description
`api_key`	`string`	Resolved API key string (temporary key).
`session_options?`	`SttSessionOptions`	Session options (signal, etc.).

TtsAudioFormat

type TtsAudioFormat = 
  | "pcm_f32le"
  | "pcm_s16le"
  | "pcm_mulaw"
  | "pcm_alaw"
  | "wav"
  | "aac"
  | "mp3"
  | "opus"
  | "flac"
  | (string & {});

Supported audio formats for Text-to-Speech output.
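The trailing `string & {}` member keeps editor autocompletion for the known literals while still accepting any other string. The sketch below uses the type for a pre-flight check of the "sample rate is required for raw PCM formats" rule stated elsewhere on this page. Both helper names are hypothetical, and treating the `pcm_` prefix as the marker for raw PCM is an assumption.

```typescript
// Local copy of the documented union.
type TtsAudioFormat =
  | 'pcm_f32le'
  | 'pcm_s16le'
  | 'pcm_mulaw'
  | 'pcm_alaw'
  | 'wav'
  | 'aac'
  | 'mp3'
  | 'opus'
  | 'flac'
  | (string & {});

// Assumption: the raw PCM formats are the ones prefixed with `pcm_`.
function isRawPcm(format: TtsAudioFormat): boolean {
  return format.startsWith('pcm_');
}

// Hypothetical pre-flight check: throws when a raw PCM format is requested
// without a sample rate, and does nothing otherwise.
function assertSampleRate(format: TtsAudioFormat, sample_rate?: number): void {
  if (isRawPcm(format) && sample_rate === undefined) {
    throw new Error(`sample_rate is required for raw PCM format "${format}"`);
  }
}
```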

TtsConnectionEvents

type TtsConnectionEvents = {
  close: () => void;
  error: (error) => void;
};

Events emitted by a TTS WebSocket connection.

Properties

Property	Type	Description
`close`	() => `void`	The WebSocket connection was closed.
`error`	(`error`) => `void`	A connection-level error occurred. Always a RealtimeError subclass (e.g. ConnectionError, NetworkError, AuthError).

TtsConnectionOptions

type TtsConnectionOptions = {
  connect_timeout_ms?: number;
  keepalive_interval_ms?: number;
};

Options for creating a TTS connection.

Properties

Property	Type	Description
`connect_timeout_ms?`	`number`	Maximum time to wait for the WebSocket connection to open (milliseconds). Default: `20000`.
`keepalive_interval_ms?`	`number`	Interval for sending keepalive messages (milliseconds). Default: `5000`. Minimum: `1000`.
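The defaults and the keepalive minimum can be expressed as a normalizer. This is a sketch with a hypothetical function name; this page does not say whether the SDK clamps or rejects a `keepalive_interval_ms` below 1000, so clamping here is an assumption.

```typescript
// Local copy of the documented shape.
type TtsConnectionOptions = {
  connect_timeout_ms?: number;
  keepalive_interval_ms?: number;
};

// Hypothetical normalizer applying the documented defaults and minimum.
// Assumption: values below the minimum are clamped rather than rejected.
function normalizeTtsConnectionOptions(
  options: TtsConnectionOptions = {},
): Required<TtsConnectionOptions> {
  return {
    connect_timeout_ms: options.connect_timeout_ms ?? 20000,
    keepalive_interval_ms: Math.max(1000, options.keepalive_interval_ms ?? 5000),
  };
}
```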

TtsLanguage

type TtsLanguage = {
  code: string;
  name: string;
};

A language supported by a Text-to-Speech model.

Properties

Property	Type	Description
`code`	`string`	ISO language code.
`name`	`string`	Human-readable language name.

TtsModel

type TtsModel = {
  aliased_model_id?: string | null;
  id: string;
  languages: TtsLanguage[];
  name: string;
  voices: TtsVoice[];
};

A Text-to-Speech model.

Properties

Property	Type	Description
`aliased_model_id?`	`string` \| `null`	If this is an alias, the id of the aliased model.
`id`	`string`	Unique identifier of the model.
`languages`	`TtsLanguage`[]	Languages supported by this model.
`name`	`string`	Name of the model.
`voices`	`TtsVoice`[]	Voices supported by this model.
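Given a list of `TtsModel` objects, two lookups come up often: following `aliased_model_id` to the concrete model, and filtering voices by gender. The helpers below are a sketch with hypothetical names. They use local copies of the types on this page and assume an alias points directly at a concrete model (a single hop).

```typescript
// Local copies of the documented types.
type TtsVoiceGender = 'male' | 'female' | 'neutral';
type TtsVoice = { description: string; gender: TtsVoiceGender; id: string };
type TtsLanguage = { code: string; name: string };
type TtsModel = {
  aliased_model_id?: string | null;
  id: string;
  languages: TtsLanguage[];
  name: string;
  voices: TtsVoice[];
};

// Look up a model by id and follow its alias once. Returns undefined when the
// id is not in the list, and the alias entry itself when its target is missing.
function resolveModel(models: TtsModel[], id: string): TtsModel | undefined {
  const model = models.find((m) => m.id === id);
  if (!model || !model.aliased_model_id) return model;
  return models.find((m) => m.id === model.aliased_model_id) ?? model;
}

// Voices of a model that match the given gender metadata.
function voicesByGender(model: TtsModel, gender: TtsVoiceGender): TtsVoice[] {
  return model.voices.filter((v) => v.gender === gender);
}
```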

TtsStreamConfig

type TtsStreamConfig = {
  audio_format: string;
  bitrate?: number;
  language: string;
  model: string;
  sample_rate?: number;
  stream_id: string;
  voice: string;
};

Fully resolved TTS stream config sent over the WebSocket. All required fields are present after merging input with defaults.

Properties

Property	Type
`audio_format`	`string`
`bitrate?`	`number`
`language`	`string`
`model`	`string`
`sample_rate?`	`number`
`stream_id`	`string`
`voice`	`string`

TtsStreamEvents

type TtsStreamEvents = {
  audio: (chunk) => void;
  audioEnd: () => void;
  error: (error) => void;
  terminated: () => void;
};

Events emitted by a TTS stream.

Properties

Property	Type	Description
`audio`	(`chunk`) => `void`	Decoded audio chunk received.
`audioEnd`	() => `void`	Server marked the final audio payload for this stream.
`error`	(`error`) => `void`	A stream-level error occurred. Always a RealtimeError subclass mapped from the server `error_code` / `error_message`.
`terminated`	() => `void`	Stream has been fully terminated by the server.
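A common use of the `audio` and `audioEnd` events is to gather all chunks of one stream into a single buffer. The collector below is a sketch: `AudioCollector` is a hypothetical class, and it assumes chunks arrive as `Uint8Array`, which this page does not state.

```typescript
// Hypothetical collector for the `audio` and `audioEnd` events.
// Assumption: each audio chunk is a Uint8Array.
class AudioCollector {
  private chunks: Uint8Array[] = [];
  private done = false;

  // Store chunks until the end marker has been seen.
  onAudio = (chunk: Uint8Array): void => {
    if (!this.done) this.chunks.push(chunk);
  };

  onAudioEnd = (): void => {
    this.done = true;
  };

  get finished(): boolean {
    return this.done;
  }

  // Concatenate everything received so far into one buffer.
  toBytes(): Uint8Array {
    const total = this.chunks.reduce((n, c) => n + c.length, 0);
    const out = new Uint8Array(total);
    let offset = 0;
    for (const c of this.chunks) {
      out.set(c, offset);
      offset += c.length;
    }
    return out;
  }
}
```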

TtsStreamInput

type TtsStreamInput = {
  audio_format?: TtsAudioFormat;
  bitrate?: number;
  language?: string;
  model?: string;
  sample_rate?: number;
  stream_id?: string;
  voice?: string;
};

Input for creating a TTS stream. All fields are optional and are merged with tts_defaults from the resolved connection config. After merging, model, language, voice, and audio_format must be present.

Properties

Property	Type	Description
`audio_format?`	`TtsAudioFormat`	Output audio format. Example: `'wav'`
`bitrate?`	`number`	Codec bitrate in bps (for compressed formats).
`language?`	`string`	Language code for speech generation. Example: `'en'`
`model?`	`string`	Text-to-Speech model to use. Example: `'tts-rt-v1'`
`sample_rate?`	`number`	Output sample rate in Hz. Required for raw PCM formats.
`stream_id?`	`string`	Client-generated stream identifier. Must be unique among active streams on the same connection. Auto-generated if omitted.
`voice?`	`string`	Voice identifier. Example: `'Adrian'`
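The merge from `TtsStreamInput` plus `tts_defaults` to a full `TtsStreamConfig` might look roughly like the sketch below. The function name, the counter-based `stream_id` format, and the error message are assumptions; the SDK's own merge logic is not shown on this page.

```typescript
// Local copies of the documented shapes (audio_format widened to string).
type TtsStreamInput = {
  audio_format?: string;
  bitrate?: number;
  language?: string;
  model?: string;
  sample_rate?: number;
  stream_id?: string;
  voice?: string;
};
type TtsStreamConfig = {
  audio_format: string;
  bitrate?: number;
  language: string;
  model: string;
  sample_rate?: number;
  stream_id: string;
  voice: string;
};

let nextStreamNumber = 0;

// Hypothetical merge: per-stream input wins over defaults, field by field.
function resolveStreamConfig(input: TtsStreamInput, defaults: TtsStreamInput): TtsStreamConfig {
  const model = input.model ?? defaults.model;
  const language = input.language ?? defaults.language;
  const voice = input.voice ?? defaults.voice;
  const audio_format = input.audio_format ?? defaults.audio_format;
  if (!model || !language || !voice || !audio_format) {
    throw new Error('model, language, voice, and audio_format must be set after merging');
  }
  return {
    model,
    language,
    voice,
    audio_format,
    bitrate: input.bitrate ?? defaults.bitrate,
    sample_rate: input.sample_rate ?? defaults.sample_rate,
    // Assumed id format; the only documented requirement is uniqueness
    // among active streams on the same connection.
    stream_id: input.stream_id ?? `stream-${++nextStreamNumber}`,
  };
}
```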

TtsStreamState

type TtsStreamState = "active" | "finishing" | "ended" | "error";

Lifecycle states for a TTS stream.

TtsVoice

type TtsVoice = {
  description: string;
  gender: TtsVoiceGender;
  id: string;
};

A Text-to-Speech voice.

Properties

Property	Type	Description
`description`	`string`	Human-readable voice description.
`gender`	`TtsVoiceGender`	Voice gender metadata.
`id`	`string`	Unique identifier of the voice.

TtsVoiceGender

type TtsVoiceGender = "male" | "female" | "neutral";

Voice gender metadata returned by the TTS models API.

AudioSource

Platform-agnostic audio source interface.

Implementations must:

- Begin capturing audio in start() and deliver chunks via handlers.onData
- Stop all capture and release resources in stop()
- Throw typed errors from start() if capture cannot begin (e.g., permission denied)

Example

// Built-in browser source
const source = new MicrophoneSource();

// Custom source (e.g., React Native)
class MyAudioSource implements AudioSource {
  async start(handlers: AudioSourceHandlers) { ... }
  stop() { ... }
}

Methods

pause()?

optional pause(): void;

Pause audio capture (optional). When paused, no data should be delivered via onData.

Returns

void

restart()?

optional restart(): void;

Reinitialize the audio encoder without releasing the underlying capture device (optional).

Called during reconnection so the new server session receives a fresh audio stream with proper container headers. Implementations that produce a header-less format (e.g. raw PCM) can omit this.

Returns

void

resume()?

optional resume(): void;

Resume audio capture after pause (optional).

Returns

void

start()

start(handlers): Promise<void>;

Start capturing audio.

Parameters

Parameter	Type	Description
`handlers`	`AudioSourceHandlers`	Callbacks for audio data and errors

Returns

Promise<void>

Throws

AudioPermissionError if microphone access is denied

AudioDeviceError if no audio device is found

AudioUnavailableError if audio capture is not supported

stop()

stop(): void;

Stop capturing audio and release all resources. Safe to call multiple times.

Returns

void
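To make the contract above concrete, here is a sketch of a custom source that replays pre-recorded chunks, which can be handy in tests. `ArrayAudioSource` and its `push()` method are hypothetical, the interface is a local stand-in for the one documented here, and the chunk and error parameter types are assumptions because this page does not spell them out.

```typescript
// Local stand-ins for the documented types. The chunk and error parameter
// types are assumptions.
type AudioSourceHandlers = {
  onData: (chunk: Uint8Array) => void;
  onError: (error: Error) => void;
  onMuted?: () => void;
  onUnmuted?: () => void;
};

interface AudioSource {
  start(handlers: AudioSourceHandlers): Promise<void>;
  stop(): void;
  pause?(): void;
  resume?(): void;
  restart?(): void;
}

// Hypothetical source that replays pre-recorded chunks on demand.
class ArrayAudioSource implements AudioSource {
  private handlers: AudioSourceHandlers | null = null;
  private paused = false;
  private chunks: Uint8Array[];

  constructor(chunks: Uint8Array[]) {
    this.chunks = chunks.slice();
  }

  async start(handlers: AudioSourceHandlers): Promise<void> {
    this.handlers = handlers;
    this.paused = false;
  }

  // Deliver the next chunk unless the source is stopped, paused, or empty.
  // Returns true if a chunk was delivered.
  push(): boolean {
    if (!this.handlers || this.paused) return false;
    const chunk = this.chunks.shift();
    if (!chunk) return false;
    this.handlers.onData(chunk);
    return true;
  }

  pause(): void {
    this.paused = true;
  }

  resume(): void {
    this.paused = false;
  }

  // Safe to call multiple times: clearing an already-null field is a no-op.
  stop(): void {
    this.handlers = null;
  }
}
```

`restart()` is omitted because the replayed chunks are treated as header-less data, which matches the case where the method is optional.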

ClientTtsFactory()

Callable TTS factory with .multiStream() for multi-stream connections.

ClientTtsFactory(input?): Promise<RealtimeTtsStream>;

Parameters

Parameter	Type
`input?`	`TtsStreamInput`

Returns

Promise<RealtimeTtsStream>

Methods

multiStream()

multiStream(): Promise<RealtimeTtsConnection>;

Returns

Promise<RealtimeTtsConnection>

HttpErrorDetails

Error details for SonioxHttpError.

Properties

Property	Type	Description
`bodyText?`	`string`	Response body text (capped at 4KB)
`cause?`	`unknown`	-
`code`	`HttpErrorCode`	-
`headers?`	`Record`<`string`, `string`>	-
`message`	`string`	-
`method`	`HttpMethod`	-
`statusCode?`	`number`	-
`url`	`string`	-
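For logging, the fields above can be condensed into one line. The helpers below are a sketch with hypothetical names. `HttpErrorCode` and `HttpMethod` are widened to `string` because their members are not listed on this page, and treating 429 and 5xx as retryable is a general HTTP convention, not documented SDK behavior.

```typescript
// Local copy of the documented shape; HttpErrorCode and HttpMethod are
// widened to string because their members are not listed on this page.
type HttpErrorDetails = {
  bodyText?: string;
  cause?: unknown;
  code: string;
  headers?: Record<string, string>;
  message: string;
  method: string;
  statusCode?: number;
  url: string;
};

// Hypothetical one-line summary for logs.
function summarizeHttpError(d: HttpErrorDetails): string {
  const status = d.statusCode === undefined ? 'no response' : String(d.statusCode);
  return `${d.method} ${d.url} failed (${status}, ${d.code}): ${d.message}`;
}

// Assumption: only 429 and 5xx responses are treated as retryable.
// Errors without a status code are not retried by this sketch.
function isRetryable(d: HttpErrorDetails): boolean {
  if (d.statusCode === undefined) return false;
  return d.statusCode === 429 || d.statusCode >= 500;
}
```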

PermissionResolver

Platform-agnostic permission resolver.

Implementations handle platform-specific permission APIs:

Browser: navigator.permissions.query + getUserMedia
React Native: expo-av or react-native-permissions

Example

// Check before recording
const mic = await resolver.check('microphone');
if (mic.status === 'denied' && !mic.can_request) {
  showGoToSettingsMessage();
}

Methods

check()

check(permission): Promise<PermissionResult>;

Check current permission status WITHOUT prompting the user.

Parameters

Parameter	Type
`permission`	`"microphone"`

Returns

Promise<PermissionResult>

request()

request(permission): Promise<PermissionResult>;

Request permission from the user (may show a system prompt). On platforms where status is already 'granted', this is a no-op.

Parameters

Parameter	Type
`permission`	`"microphone"`

Returns

Promise<PermissionResult>
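A typical pre-flight flow calls `check()` first and only falls back to `request()` when a prompt can still be shown. The sketch below uses a local stand-in for the interface; `ensureMicrophone` and `StubPermissionResolver` are hypothetical names, and the stub exists only so the flow can be run without a browser.

```typescript
// Local stand-ins for the documented types.
type PermissionStatus = 'granted' | 'denied' | 'prompt' | 'unavailable';
type PermissionResult = { can_request: boolean; status: PermissionStatus };

interface PermissionResolver {
  check(permission: 'microphone'): Promise<PermissionResult>;
  request(permission: 'microphone'): Promise<PermissionResult>;
}

// Check silently first; prompt only when prompting can change the outcome.
async function ensureMicrophone(resolver: PermissionResolver): Promise<PermissionResult> {
  const current = await resolver.check('microphone');
  if (current.status === 'granted' || current.status === 'unavailable') return current;
  if (!current.can_request) return current;
  return resolver.request('microphone');
}

// Hypothetical in-memory resolver for tests.
class StubPermissionResolver implements PermissionResolver {
  requests = 0;
  private state: PermissionResult;
  private answer: PermissionStatus;

  constructor(initial: PermissionResult, answer: PermissionStatus) {
    this.state = initial;
    this.answer = answer;
  }

  async check(): Promise<PermissionResult> {
    return this.state;
  }

  // Simulates the user answering the system prompt with `answer`.
  async request(): Promise<PermissionResult> {
    this.requests += 1;
    this.state = { status: this.answer, can_request: false };
    return this.state;
  }
}
```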

resolveApiKey()

function resolveApiKey(config): Promise<string>;

Resolves an ApiKeyConfig to a plain API key string.

Parameters

Parameter	Type	Description
`config`	`ApiKeyConfig`	The API key configuration

Returns

Promise<string>

The resolved API key string

Throws

If the function rejects or returns a non-string value

Deprecated

Use SonioxConnectionConfig with SonioxClientOptions.config instead.
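The behavior described above (accept a string or an async function, reject on a non-string result) can be sketched as follows. This is an illustrative re-implementation under a hypothetical name, not the SDK's source; the actual error types and messages may differ.

```typescript
// Local copy of the ApiKeyConfig type documented on this page.
type ApiKeyConfig = string | (() => Promise<string>);

// Hypothetical re-implementation for illustration only.
async function resolveApiKeySketch(config: ApiKeyConfig): Promise<string> {
  // A string is already a resolved key.
  if (typeof config === 'string') return config;

  // Otherwise call the function; a rejection propagates to the caller.
  const key: unknown = await config();
  if (typeof key !== 'string') {
    throw new TypeError('API key function must resolve to a string');
  }
  return key;
}
```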
