Soniox Client SDK — Types Reference

Type Alias: ApiKeyConfig

type ApiKeyConfig = string | (() => Promise<string>);

API key configuration.

  • string - A pre-fetched temporary API key (e.g., injected from SSR)
  • () => Promise<string> - An async function that fetches a fresh temporary key from your backend. Called once per recording session.

Deprecated

Use SonioxConnectionConfig with SonioxClientOptions.config instead.

Example

// Static key (for demos or SSR-injected keys)
const client = new SonioxClient({ api_key: 'temp:...' });

// Async function (recommended for production)
const client = new SonioxClient({
  api_key: async () => {
    const res = await fetch('/api/get-temporary-key', { method: 'POST' });
    const { api_key } = await res.json();
    return api_key;
  },
});

Note: If you use Node.js, you can use the SonioxNodeClient to fetch a temporary API key via client.auth.createTemporaryKey().


AudioErrorCode

type AudioErrorCode = "permission_denied" | "device_not_found" | "audio_unavailable";

Error codes for audio-related errors
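
One way to consume these codes is an exhaustive switch that maps each one to a user-facing message. The sketch below redeclares the union locally so it stands alone; the helper name describeAudioError and the message strings are illustrative assumptions, not part of the SDK.

```typescript
// Local copy of the union documented above, so the sketch is self-contained.
type AudioErrorCode = "permission_denied" | "device_not_found" | "audio_unavailable";

// Hypothetical helper: maps each code to a message suitable for display.
function describeAudioError(code: AudioErrorCode): string {
  switch (code) {
    case "permission_denied":
      return "Microphone access was denied.";
    case "device_not_found":
      return "No audio input device was found.";
    case "audio_unavailable":
      return "Audio capture is not supported in this environment.";
    default: {
      // Exhaustiveness check: stops type-checking if a new code is added to the union.
      const unreachable: never = code;
      return unreachable;
    }
  }
}
```

The `never` assignment in the default branch is a compile-time guard only; it does not change runtime behavior for the three documented codes.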


AudioSourceHandlers

type AudioSourceHandlers = {
  onData: (chunk) => void;
  onError: (error) => void;
  onMuted?: () => void;
  onUnmuted?: () => void;
};

Callbacks for receiving audio data and errors from an AudioSource.

Properties

| Property | Type | Description |
| --- | --- | --- |
| onData | (chunk) => void | Called when an audio chunk is available. |
| onError | (error) => void | Called when a runtime error occurs during audio capture (after start). |
| onMuted? | () => void | Called when the audio source is muted externally (e.g. OS-level or hardware mute). |
| onUnmuted? | () => void | Called when the audio source is unmuted after an external mute. |

GenerateSpeechOptions

type GenerateSpeechOptions = {
  audio_format?: string;
  bitrate?: number;
  language?: string;
  model?: string;
  sample_rate?: number;
  signal?: AbortSignal;
  text: string;
  voice: string;
};

Options for REST TTS generation (generate / generateStream).

Properties

| Property | Type | Description |
| --- | --- | --- |
| audio_format? | string | Output audio format. Default: 'wav' |
| bitrate? | number | Codec bitrate in bps (for compressed formats). |
| language? | string | Language code. Default: 'en' |
| model? | string | Text-to-Speech model to use. Default: 'tts-rt-v1-preview' |
| sample_rate? | number | Output sample rate in Hz. Required for raw PCM formats. |
| signal? | AbortSignal | Optional AbortSignal for cancellation. |
| text | string | Input text to generate as speech. |
| voice | string | Voice identifier. |
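
The table states that sample_rate is required for raw PCM formats. A minimal pre-flight check could look like the sketch below. It assumes that "raw PCM formats" are the ones whose names start with "pcm_" (as in the TtsAudioFormat union); the helper names and the error message are hypothetical and not part of the SDK.

```typescript
// Subset of the documented GenerateSpeechOptions fields used by this sketch.
type SpeechOptionsSubset = {
  audio_format?: string;
  sample_rate?: number;
  text: string;
  voice: string;
};

// Assumption: a format counts as raw PCM when its name starts with "pcm_".
// The default argument mirrors the documented audio_format default of 'wav'.
function requiresSampleRate(audioFormat: string = "wav"): boolean {
  return audioFormat.startsWith("pcm_");
}

// Hypothetical guard: throws before a request is made with an incomplete PCM config.
function assertSampleRatePresent(options: SpeechOptionsSubset): void {
  if (requiresSampleRate(options.audio_format) && options.sample_rate === undefined) {
    throw new Error(`sample_rate is required for audio_format "${options.audio_format}"`);
  }
}
```

Running such a check on the client side surfaces the missing field early; whether the server rejects the request in the same way is not stated in this reference.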

HttpErrorCode

type HttpErrorCode = "network_error" | "timeout" | "aborted" | "http_error" | "parse_error";

Error codes for HTTP client errors


HttpMethod

type HttpMethod = "GET" | "POST" | "PUT" | "PATCH" | "DELETE" | "HEAD";

HTTP methods supported by the client


MicrophoneSourceOptions

type MicrophoneSourceOptions = {
  constraints?: MediaTrackConstraints;
  recorderOptions?: MediaRecorderOptions;
  timesliceMs?: number;
};

Options for MicrophoneSource

Properties

| Property | Type | Description |
| --- | --- | --- |
| constraints? | MediaTrackConstraints | MediaTrackConstraints for the audio track. Default: { echoCancellation: false, noiseSuppression: false, autoGainControl: false, channelCount: 1, sampleRate: 16000 } |
| recorderOptions? | MediaRecorderOptions | MediaRecorder options. See https://developer.mozilla.org/en-US/docs/Web/API/MediaRecorder/MediaRecorder |
| timesliceMs? | number | Time interval in milliseconds between audio data chunks. Default: 60 |

PermissionResult

type PermissionResult = {
  can_request: boolean;
  status: PermissionStatus;
};

Result of a permission check or request.

Properties

| Property | Type | Description |
| --- | --- | --- |
| can_request | boolean | Whether the user can be prompted again. false means permanently denied (e.g., browser "Block" or iOS settings). Useful for showing "go to settings" instructions. |
| status | PermissionStatus | Current permission status. |

PermissionStatus

type PermissionStatus = "granted" | "denied" | "prompt" | "unavailable";

Unified permission status across all platforms.
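
Taken together, status and can_request are enough to decide what the UI should do next. The sketch below is one possible mapping; the MicAction names are hypothetical, and the status union is declared as MicPermissionStatus only to avoid clashing with the DOM's built-in PermissionStatus interface when the snippet is compiled as a script.

```typescript
// Local copies of the documented shapes (renamed to avoid a DOM name clash).
type MicPermissionStatus = "granted" | "denied" | "prompt" | "unavailable";
type MicPermissionResult = { can_request: boolean; status: MicPermissionStatus };

// Hypothetical UI actions an app might take.
type MicAction = "start_recording" | "ask_user" | "show_settings_help" | "hide_mic_ui";

function nextMicAction(result: MicPermissionResult): MicAction {
  switch (result.status) {
    case "granted":
      return "start_recording";
    case "prompt":
      return "ask_user";
    case "denied":
      // can_request === false means the user must change the setting manually.
      return result.can_request ? "ask_user" : "show_settings_help";
    case "unavailable":
      return "hide_mic_ui";
  }
}
```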


PermissionType

type PermissionType = "microphone";

Permission types supported by the resolver.


RecordOptions

type RecordOptions = SttSessionConfig & ReconnectOptions & {
  buffer_queue_size?: number;
  session_config?: (resolved) => SttSessionConfig;
  session_options?: SttSessionOptions;
  signal?: AbortSignal;
  source?: AudioSource;
};

Options for creating a recording

Type Declaration

| Name | Type | Description |
| --- | --- | --- |
| buffer_queue_size? | number | Maximum number of audio chunks to buffer while waiting for key/connection. Default: 1000 |
| session_config? | (resolved) => SttSessionConfig | Function that receives the resolved connection config (including stt_defaults from the server) and returns the final session config. When provided, its return value is used as the session config, and any flat session config fields on this object are ignored. |
| session_options? | SttSessionOptions | SDK-level session options (signal, etc.). |
| signal? | AbortSignal | AbortSignal for cancellation. |
| source? | AudioSource | Audio source to use. Defaults to MicrophoneSource if not provided. |

Example (session_config)

client.realtime.record({
  session_config: (resolved) => ({
    ...resolved.stt_defaults,
    enable_endpoint_detection: true,
  }),
});
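
As a rough worked example of what buffer_queue_size means in practice: if one chunk is queued per timeslice (an assumption; this reference does not state how chunks map to queue entries), the default of 1000 chunks combined with the default MicrophoneSourceOptions.timesliceMs of 60 covers about 60 seconds of audio. The helper below is hypothetical and only performs that arithmetic.

```typescript
// Hypothetical helper: seconds of audio a full queue would hold,
// assuming exactly one chunk is produced per timeslice.
function bufferedSeconds(bufferQueueSize: number, timesliceMs: number): number {
  return (bufferQueueSize * timesliceMs) / 1000;
}

// Documented defaults: 1000 chunks x 60 ms per chunk = 60 seconds.
const defaultCapacitySeconds = bufferedSeconds(1000, 60);
```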

RecordingEvents

type RecordingEvents = {
  connected: () => void;
  endpoint: () => void;
  error: (error) => void;
  finalized: () => void;
  finished: () => void;
  reconnected: (event) => void;
  reconnecting: (event) => void;
  result: (result) => void;
  session_restart: (event) => void;
  source_muted: () => void;
  source_unmuted: () => void;
  state_change: (update) => void;
  token: (token) => void;
};

Events emitted by a Recording instance

Properties

| Property | Type | Description |
| --- | --- | --- |
| connected | () => void | WebSocket connected and ready. |
| endpoint | () => void | Endpoint detected (speaker finished talking). |
| error | (error) => void | Error occurred during recording. |
| finalized | () => void | Finalization complete. |
| finished | () => void | Recording finished (server acknowledged end of stream). |
| reconnected | (event) => void | Successfully reconnected after a drop. |
| reconnecting | (event) => void | About to attempt a reconnection. Call preventDefault() to cancel. |
| result | (result) => void | Parsed result received from the server. |
| session_restart | (event) => void | New STT session started (initial or after reconnect). Consumers should reset any session-local tracking state (e.g. token window comparisons). The reset_transcript flag indicates whether accumulated transcript state should also be cleared. |
| source_muted | () => void | Audio source was muted externally (e.g. OS-level or hardware mute). |
| source_unmuted | () => void | Audio source was unmuted after an external mute. |
| state_change | (update) => void | Recording state transition. |
| token | (token) => void | Individual token received. |
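
An event map of this shape is typically used to type-check listener registration. The sketch below shows the general technique with a minimal emitter; TypedEmitter and DemoEvents are hypothetical stand-ins (not the SDK's Recording class), and the payload shapes are placeholders because the real payload types are not shown in this section.

```typescript
// Simplified event map in the style of RecordingEvents (placeholder payloads).
type DemoEvents = {
  connected: () => void;
  token: (token: { text: string }) => void;
  error: (error: Error) => void;
};

// Minimal typed emitter: `on` and `emit` are checked against the event map.
class TypedEmitter<Events extends Record<keyof Events, (...args: any[]) => void>> {
  private listeners = new Map<keyof Events, Array<(...args: any[]) => void>>();

  on<K extends keyof Events>(event: K, listener: Events[K]): void {
    const list = this.listeners.get(event) ?? [];
    list.push(listener);
    this.listeners.set(event, list);
  }

  emit<K extends keyof Events>(event: K, ...args: Parameters<Events[K]>): void {
    for (const listener of this.listeners.get(event) ?? []) {
      listener(...args);
    }
  }
}

// Usage: the listener's parameter type is inferred from the map.
const emitter = new TypedEmitter<DemoEvents>();
const tokens: string[] = [];
emitter.on("token", (token) => {
  tokens.push(token.text);
});
emitter.emit("token", { text: "hello" });
```

With this pattern, registering a listener for an unknown event name or with the wrong parameter type is a compile-time error.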

RecordingState

type RecordingState = 
  | "idle"
  | "starting"
  | "connecting"
  | "recording"
  | "paused"
  | "reconnecting"
  | "stopping"
  | "stopped"
  | "error"
  | "canceled";

Unified recording lifecycle states.
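
UI code often needs to know whether a recording can still produce results. The sketch below groups the states under the assumption that "stopped", "error", and "canceled" are terminal; this reference does not confirm that grouping, so treat it as illustrative. The union is declared as RecordingLifecycleState only to avoid clashing with the DOM's own RecordingState type (used by MediaRecorder).

```typescript
// Local copy of the documented union (renamed to avoid a DOM name clash).
type RecordingLifecycleState =
  | "idle"
  | "starting"
  | "connecting"
  | "recording"
  | "paused"
  | "reconnecting"
  | "stopping"
  | "stopped"
  | "error"
  | "canceled";

// Assumption: these three states are final and never transition further.
const TERMINAL_STATES = new Set<RecordingLifecycleState>(["stopped", "error", "canceled"]);

function isTerminalState(state: RecordingLifecycleState): boolean {
  return TERMINAL_STATES.has(state);
}
```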


SonioxClientOptions

type SonioxClientOptions = {
  api_key?: ApiKeyConfig;
  buffer_queue_size?: number;
  config?:
    | SonioxConnectionConfig
    | ((context?) => Promise<SonioxConnectionConfig>);
  default_session_options?: SttSessionOptions;
  permissions?: PermissionResolver;
  ws_base_url?: string;
};

Options for creating a SonioxClient instance.

Properties

| Property | Type | Description |
| --- | --- | --- |
| api_key? | ApiKeyConfig | API key configuration: either a string (a pre-fetched temporary API key, e.g., injected from SSR) or () => Promise<string> (an async function that fetches a fresh key from your backend). Deprecated: use config instead. |
| buffer_queue_size? | number | Default maximum number of audio chunks to buffer while waiting for key/connection. Can be overridden per-recording. Default: 1000 |
| config? | SonioxConnectionConfig \| ((context?) => Promise<SonioxConnectionConfig>) | Connection configuration: a sync object or an async function. When provided as a function, it is called once per recording session, allowing you to fetch a fresh temporary API key and connection settings from your backend at runtime. |
| default_session_options? | SttSessionOptions | Default session options applied to all sessions. Can be overridden per-recording. |
| permissions? | PermissionResolver | Optional permission resolver for pre-flight microphone permission checks. Not set by default (SSR-safe, RN-safe). |
| ws_base_url? | string | WebSocket URL for real-time connections. Default: 'wss://stt-rt.soniox.com/transcribe-websocket'. Deprecated: use config.stt_ws_url or config.region instead. |

Example (config)

// Sync config with region
const client = new SonioxClient({
  config: { api_key: tempKey, region: 'eu' },
});

// Async config (recommended for production)
const client = new SonioxClient({
  config: async () => {
    const res = await fetch('/api/soniox-config', { method: 'POST' });
    return await res.json(); // { api_key, region, ... }
  },
});

Example (permissions)

import { BrowserPermissionResolver } from '@soniox/client';

const client = new SonioxClient({
  config: { api_key: tempKey },
  permissions: new BrowserPermissionResolver(),
});

SttOptions

type SttOptions = {
  api_key: string;
  session_options?: SttSessionOptions;
};

Options for creating a low-level STT session.

Properties

| Property | Type | Description |
| --- | --- | --- |
| api_key | string | Resolved API key string (temporary key). |
| session_options? | SttSessionOptions | Session options (signal, etc.). |

TtsAudioFormat

type TtsAudioFormat = 
  | "pcm_f32le"
  | "pcm_s16le"
  | "pcm_mulaw"
  | "pcm_alaw"
  | "wav"
  | "aac"
  | "mp3"
  | "opus"
  | "flac"
  | (string & {});

Supported audio formats for Text-to-Speech output.
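
The trailing string & {} member is a common TypeScript idiom: it lets the type accept any string while still letting editors suggest the listed literals. The sketch below redeclares the union locally and adds a hypothetical isKnownFormat guard that separates the documented literals from arbitrary strings; the guard is not part of the SDK.

```typescript
// Documented literals, kept in one place so the type and the guard stay in sync.
const KNOWN_FORMATS = [
  "pcm_f32le",
  "pcm_s16le",
  "pcm_mulaw",
  "pcm_alaw",
  "wav",
  "aac",
  "mp3",
  "opus",
  "flac",
] as const;

type KnownTtsAudioFormat = (typeof KNOWN_FORMATS)[number];

// Same shape as the documented alias: known literals plus any other string.
type TtsAudioFormat = KnownTtsAudioFormat | (string & {});

// Hypothetical guard: true only for the literals listed in the union.
function isKnownFormat(format: TtsAudioFormat): format is KnownTtsAudioFormat {
  return (KNOWN_FORMATS as readonly string[]).includes(format);
}
```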


TtsConnectionEvents

type TtsConnectionEvents = {
  close: () => void;
  error: (error) => void;
};

Events emitted by a TTS WebSocket connection.

Properties

| Property | Type | Description |
| --- | --- | --- |
| close | () => void | The WebSocket connection was closed. |
| error | (error) => void | A connection-level error occurred. Always a RealtimeError subclass (e.g. ConnectionError, NetworkError, AuthError). |

TtsConnectionOptions

type TtsConnectionOptions = {
  connect_timeout_ms?: number;
  keepalive_interval_ms?: number;
};

Options for creating a TTS connection.

Properties

| Property | Type | Description |
| --- | --- | --- |
| connect_timeout_ms? | number | Maximum time to wait for the WebSocket connection to open (milliseconds). Default: 20000 |
| keepalive_interval_ms? | number | Interval for sending keepalive messages (milliseconds). Default: 5000. Minimum: 1000 |
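
The defaults and the minimum above can be applied in a single normalization step. The sketch below is hypothetical: it assumes that a keepalive interval below the documented minimum is clamped to 1000 rather than rejected, which this reference does not specify.

```typescript
// Local copy of the documented options shape.
type TtsConnectionOptions = {
  connect_timeout_ms?: number;
  keepalive_interval_ms?: number;
};

// Hypothetical helper: fills in documented defaults and enforces the minimum.
function withConnectionDefaults(
  options: TtsConnectionOptions = {},
): Required<TtsConnectionOptions> {
  return {
    connect_timeout_ms: options.connect_timeout_ms ?? 20000,
    // Assumption: values below 1000 ms are clamped, not treated as an error.
    keepalive_interval_ms: Math.max(1000, options.keepalive_interval_ms ?? 5000),
  };
}
```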

TtsStreamConfig

type TtsStreamConfig = {
  audio_format: string;
  bitrate?: number;
  language: string;
  model: string;
  sample_rate?: number;
  stream_id: string;
  voice: string;
};

Fully resolved TTS stream config sent over the WebSocket. All required fields are present after merging input with defaults.

Properties

| Property | Type |
| --- | --- |
| audio_format | string |
| bitrate? | number |
| language | string |
| model | string |
| sample_rate? | number |
| stream_id | string |
| voice | string |

TtsStreamEvents

type TtsStreamEvents = {
  audio: (chunk) => void;
  audioEnd: () => void;
  error: (error) => void;
  terminated: () => void;
};

Events emitted by a TTS stream.

Properties

| Property | Type | Description |
| --- | --- | --- |
| audio | (chunk) => void | Decoded audio chunk received. |
| audioEnd | () => void | Server marked the final audio payload for this stream. |
| error | (error) => void | A stream-level error occurred. Always a RealtimeError subclass mapped from the server error_code / error_message. |
| terminated | () => void | Stream has been fully terminated by the server. |

TtsStreamInput

type TtsStreamInput = {
  audio_format?: TtsAudioFormat;
  bitrate?: number;
  language?: string;
  model?: string;
  sample_rate?: number;
  stream_id?: string;
  voice?: string;
};

Input for creating a TTS stream. All fields are optional and are merged with tts_defaults from the resolved connection config. After merging, model, language, voice, and audio_format must be present.

Properties

| Property | Type | Description |
| --- | --- | --- |
| audio_format? | TtsAudioFormat | Output audio format. Example: 'wav' |
| bitrate? | number | Codec bitrate in bps (for compressed formats). |
| language? | string | Language code for speech generation. Example: 'en' |
| model? | string | Text-to-Speech model to use. Example: 'tts-rt-v1-preview' |
| sample_rate? | number | Output sample rate in Hz. Required for raw PCM formats. |
| stream_id? | string | Client-generated stream identifier. Must be unique among active streams on the same connection. Auto-generated if omitted. |
| voice? | string | Voice identifier. Example: 'Adrian' |
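
The merge described for TtsStreamInput (input over tts_defaults, then a check that model, language, voice, and audio_format are present) can be sketched as follows. This is an illustration, not the SDK's implementation: resolveStreamConfig is a hypothetical name, the defaults are assumed to have the same shape as the input, and the counter-based stream_id is just one possible way to auto-generate an identifier.

```typescript
// Local copies of the documented shapes (audio_format simplified to string).
type TtsStreamInput = {
  audio_format?: string;
  bitrate?: number;
  language?: string;
  model?: string;
  sample_rate?: number;
  stream_id?: string;
  voice?: string;
};

type TtsStreamConfig = {
  audio_format: string;
  bitrate?: number;
  language: string;
  model: string;
  sample_rate?: number;
  stream_id: string;
  voice: string;
};

let nextStreamNumber = 0; // hypothetical id scheme, unique within this module

function resolveStreamConfig(input: TtsStreamInput, defaults: TtsStreamInput): TtsStreamConfig {
  // Fields set on the input win over the defaults.
  const merged: TtsStreamInput = { ...defaults, ...input };
  const { model, language, voice, audio_format } = merged;
  if (!model || !language || !voice || !audio_format) {
    throw new Error("model, language, voice, and audio_format must be set after merging");
  }
  return {
    model,
    language,
    voice,
    audio_format,
    stream_id: merged.stream_id ?? `stream-${++nextStreamNumber}`,
    // Optional fields are copied only when present.
    ...(merged.bitrate !== undefined ? { bitrate: merged.bitrate } : {}),
    ...(merged.sample_rate !== undefined ? { sample_rate: merged.sample_rate } : {}),
  };
}
```

Note that a plain object spread also lets an explicit undefined on the input override a default; a stricter merge would skip undefined values.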

TtsStreamState

type TtsStreamState = "active" | "finishing" | "ended" | "error";

Lifecycle states for a TTS stream.


AudioSource

Platform-agnostic audio source interface.

Implementations must:

  • Begin capturing audio in start() and deliver chunks via handlers.onData
  • Stop all capture and release resources in stop()
  • Throw typed errors from start() if capture cannot begin (e.g., permission denied)

Example

// Built-in browser source
const source = new MicrophoneSource();

// Custom source (e.g., React Native)
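class MyAudioSource implements AudioSource {
  async start(handlers: AudioSourceHandlers) {
    // Begin platform-specific capture and pass each chunk to handlers.onData(chunk).
  }
  stop() {
    // Stop all capture and release resources.
  }
}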

Methods

pause()?

optional pause(): void;

Pause audio capture (optional). When paused, no data should be delivered via onData.

Returns

void


restart()?

optional restart(): void;

Reinitialize the audio encoder without releasing the underlying capture device (optional).

Called during reconnection so the new server session receives a fresh audio stream with proper container headers. Implementations that produce a header-less format (e.g. raw PCM) can omit this.

Returns

void


resume()?

optional resume(): void;

Resume audio capture after pause (optional).

Returns

void


start()

start(handlers): Promise<void>;

Start capturing audio.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| handlers | AudioSourceHandlers | Callbacks for audio data and errors |

Returns

Promise<void>

Throws

  • AudioPermissionError if microphone access is denied
  • AudioDeviceError if no audio device is found
  • AudioUnavailableError if audio capture is not supported


stop()

stop(): void;

Stop capturing audio and release all resources. Safe to call multiple times.

Returns

void
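
To make the contract concrete, the sketch below implements the interface with an in-memory source that forwards chunks pushed into it. Everything here is an assumption made for illustration: the interface and handler types are redeclared locally, the chunk type is taken to be ArrayBuffer (this reference does not name it), and PushAudioSource is a hypothetical class, not something the SDK provides.

```typescript
// Local copies of the documented shapes; the chunk type is an assumption.
type AudioSourceHandlers = {
  onData: (chunk: ArrayBuffer) => void;
  onError: (error: Error) => void;
  onMuted?: () => void;
  onUnmuted?: () => void;
};

interface AudioSource {
  start(handlers: AudioSourceHandlers): Promise<void>;
  stop(): void;
  pause?(): void;
  resume?(): void;
  restart?(): void;
}

// Hypothetical source: the app pushes chunks in, the source forwards them.
class PushAudioSource implements AudioSource {
  private handlers: AudioSourceHandlers | null = null;
  private paused = false;

  async start(handlers: AudioSourceHandlers): Promise<void> {
    this.handlers = handlers;
    this.paused = false;
  }

  // Forward one chunk; chunks are dropped while paused or after stop().
  push(chunk: ArrayBuffer): void {
    if (this.handlers !== null && !this.paused) {
      this.handlers.onData(chunk);
    }
  }

  pause(): void {
    this.paused = true;
  }

  resume(): void {
    this.paused = false;
  }

  // Clearing the handlers makes repeated stop() calls harmless.
  stop(): void {
    this.handlers = null;
  }
}
```

This source emits header-less data, so it omits restart(), which the description above allows for formats such as raw PCM.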


ClientTtsFactory()

Callable TTS factory with .multiStream() for multi-stream connections.

ClientTtsFactory(input?): Promise<RealtimeTtsStream>;

Parameters

| Parameter | Type |
| --- | --- |
| input? | TtsStreamInput |

Returns

Promise<RealtimeTtsStream>

Methods

multiStream()

multiStream(): Promise<RealtimeTtsConnection>;

Returns

Promise<RealtimeTtsConnection>


HttpErrorDetails

Error details for SonioxHttpError

Properties

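| Property | Type | Description |
| --- | --- | --- |
| bodyText? | string | Response body text (capped at 4KB) |
| cause? | unknown | - |
| code | HttpErrorCode | - |
| headers? | Record<string, string> | - |
| message | string | - |
| method | HttpMethod | - |
| statusCode? | number | - |
| url | string | - |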
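
These fields are enough to build a compact log line. The sketch below redeclares the shape locally and formats it; formatHttpError is a hypothetical helper, and how the details object is obtained from a SonioxHttpError is not covered in this section.

```typescript
// Local copies of the documented unions and shape.
type HttpErrorCode = "network_error" | "timeout" | "aborted" | "http_error" | "parse_error";
type HttpMethod = "GET" | "POST" | "PUT" | "PATCH" | "DELETE" | "HEAD";

type HttpErrorDetails = {
  bodyText?: string;
  cause?: unknown;
  code: HttpErrorCode;
  headers?: Record<string, string>;
  message: string;
  method: HttpMethod;
  statusCode?: number;
  url: string;
};

// Hypothetical helper: one-line summary; the status is included only when present.
function formatHttpError(details: HttpErrorDetails): string {
  const status = details.statusCode !== undefined ? ` (status ${details.statusCode})` : "";
  return `${details.method} ${details.url} failed with ${details.code}${status}: ${details.message}`;
}
```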

PermissionResolver

Platform-agnostic permission resolver.

Implementations handle platform-specific permission APIs:

  • Browser: navigator.permissions.query + getUserMedia
  • React Native: expo-av or react-native-permissions

Example

// Check before recording
const mic = await resolver.check('microphone');
if (mic.status === 'denied' && !mic.can_request) {
  showGoToSettingsMessage();
}

Methods

check()

check(permission): Promise<PermissionResult>;

Check current permission status WITHOUT prompting the user.

Parameters

| Parameter | Type |
| --- | --- |
| permission | "microphone" |

Returns

Promise<PermissionResult>


request()

request(permission): Promise<PermissionResult>;

Request permission from the user (may show a system prompt). On platforms where status is already 'granted', this is a no-op.

Parameters

| Parameter | Type |
| --- | --- |
| permission | "microphone" |

Returns

Promise<PermissionResult>


Function: resolveApiKey()

function resolveApiKey(config): Promise<string>;

Resolves an ApiKeyConfig to a plain API key string.

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| config | ApiKeyConfig | The API key configuration |

Returns

Promise<string>

The resolved API key string

Throws

An error if the key function rejects or resolves to a non-string value

Deprecated

Use SonioxConnectionConfig with SonioxClientOptions.config instead.
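
For readers maintaining code that still relies on this behavior, the documented contract can be approximated as below. This is a sketch, not the SDK's source: resolveApiKeySketch is a hypothetical name, and the use of TypeError for a non-string result is an assumption, since the reference does not name the error type.

```typescript
// Local copy of the documented alias.
type ApiKeyConfig = string | (() => Promise<string>);

// Sketch of the documented behavior: pass strings through, await functions,
// and reject anything that does not resolve to a string.
async function resolveApiKeySketch(config: ApiKeyConfig): Promise<string> {
  const key: unknown = typeof config === "string" ? config : await config();
  if (typeof key !== "string") {
    // Assumption: the real error type may differ.
    throw new TypeError("API key function must resolve to a string");
  }
  return key;
}
```

If the supplied function rejects, the rejection propagates unchanged through the returned promise.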