Soniox React SDK — Types Reference

SonioxProviderProps

type SonioxProviderProps = {
  children: ReactNode;
} & (SonioxProviderConfigProps | SonioxProviderClientProps);

Props for SonioxProvider.

Supply either a pre-built client instance or configuration props.

Type Declaration

| Name | Type |
| --- | --- |
| children | ReactNode |
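As a sketch of the two variants (the package import path, the SonioxConnectionConfig field names, and the SonioxClient constructor shape below are assumptions for illustration, not confirmed by this reference):

```typescript
import type { ReactNode } from "react";
// Hypothetical import path; use your installed Soniox React SDK package.
import { SonioxProvider, SonioxClient } from "@soniox/react";

// Variant 1: configuration props; the provider creates the client internally.
// (`api_key` is an assumed field name on SonioxConnectionConfig.)
export function AppWithConfig({ children }: { children: ReactNode }) {
  return (
    <SonioxProvider config={{ api_key: "<temporary-key>" }}>
      {children}
    </SonioxProvider>
  );
}

// Variant 2: a pre-built client instance, shared across the tree.
const client = new SonioxClient({ api_key: "<temporary-key>" });
export function AppWithClient({ children }: { children: ReactNode }) {
  return <SonioxProvider client={client}>{children}</SonioxProvider>;
}
```

Hooks such as useRecording and useTts can then omit config and read the client from context.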

TtsState

type TtsState = "idle" | "connecting" | "speaking" | "stopping" | "error";

Aggregate state for the TTS hook.


UnsupportedReason

type UnsupportedReason = "ssr" | "no-mediadevices" | "no-getusermedia" | "insecure-context";

Reason why the built-in browser MicrophoneSource is unavailable:

  • 'ssr' — navigator is undefined (SSR, React Native, or other non-browser JS runtimes).
  • 'no-mediadevices' — navigator exists but navigator.mediaDevices is missing.
  • 'no-getusermedia' — navigator.mediaDevices exists but getUserMedia is not a function.
  • 'insecure-context' — the page is not served over HTTPS.

This only reflects whether the default MicrophoneSource can work. Custom AudioSource implementations (e.g. for React Native) bypass this check entirely and can record regardless of this value.
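The ladder of checks above can be sketched as a plain function. This is an illustration of the documented order of reasons, not the SDK's actual implementation:

```typescript
type UnsupportedReason = "ssr" | "no-mediadevices" | "no-getusermedia" | "insecure-context";
type AudioSupportResult = { isSupported: boolean; reason?: UnsupportedReason };

// Walks the documented checks in order and returns the first failure.
function sketchCheckAudioSupport(): AudioSupportResult {
  // Read via globalThis so this sketch also runs outside a browser.
  const nav = (globalThis as any).navigator;
  if (typeof nav === "undefined") return { isSupported: false, reason: "ssr" };
  if (!nav.mediaDevices) return { isSupported: false, reason: "no-mediadevices" };
  if (typeof nav.mediaDevices.getUserMedia !== "function")
    return { isSupported: false, reason: "no-getusermedia" };
  if (!(globalThis as any).isSecureContext)
    return { isSupported: false, reason: "insecure-context" };
  return { isSupported: true };
}
```

In a Node.js process, for example, this reports unsupported with reason 'ssr' (or 'no-mediadevices' on Node versions that expose a navigator global).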


AudioLevelProps

Extends

  • UseAudioLevelOptions

Properties

| Property | Type | Description |
| --- | --- | --- |
| active? | boolean | Whether volume metering is active. When false, resources are released. |
| bands? | number | Number of frequency bands to return. When set, the bands array is populated with per-band levels (0-1). Useful for spectrum/equalizer visualizations. |
| children | (state) => ReactNode | - |
| fftSize? | number | FFT size for the AnalyserNode. Must be a power of 2. Higher values give more frequency resolution (more bins per band) but update less frequently. Default: 256 |
| smoothing? | number | Exponential smoothing factor (0-1). Higher = smoother/slower decay. Default: 0.85 |
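A render-prop sketch (the package import path is assumed, and the render-prop state is assumed to carry the volume and bands fields described under UseAudioLevelReturn):

```typescript
import { AudioLevel } from "@soniox/react"; // hypothetical import path

export function SpectrumMeter() {
  return (
    <AudioLevel active bands={8} fftSize={256} smoothing={0.85}>
      {({ volume, bands }: { volume: number; bands: readonly number[] }) => (
        <div>
          {/* Overall level as a bar, per-band levels as a mini spectrum. */}
          <div style={{ width: `${volume * 100}%`, height: 4, background: "green" }} />
          {bands.map((level, i) => (
            <span
              key={i}
              style={{ display: "inline-block", width: 8, height: 4 + level * 36, background: "teal" }}
            />
          ))}
        </div>
      )}
    </AudioLevel>
  );
}
```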

AudioSupportResult

Properties

| Property | Type |
| --- | --- |
| isSupported | boolean |
| reason? | UnsupportedReason |

MicrophonePermissionState

Properties

| Property | Type | Description |
| --- | --- | --- |
| canRequest | boolean | Whether the permission can be requested (e.g., via a prompt). |
| check | () => Promise<void> | Check (or re-check) the microphone permission. No-op when unsupported. |
| isDenied | boolean | status === 'denied'. |
| isGranted | boolean | status === 'granted'. |
| isSupported | boolean | Whether permission checking is available. |
| status | MicPermissionStatus | Current permission status. |
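A gate-component sketch using these fields (the package import path is assumed):

```typescript
import { useMicrophonePermission } from "@soniox/react"; // hypothetical import path

export function MicGate() {
  // autoCheck runs an initial permission check on mount.
  const perm = useMicrophonePermission({ autoCheck: true });

  if (!perm.isSupported) return <p>Permission checking is unavailable here.</p>;
  if (perm.isGranted) return <p>Microphone ready.</p>;
  if (perm.isDenied) return <p>Microphone blocked; enable it in browser settings.</p>;
  // check() may trigger a prompt when canRequest is true.
  return <button onClick={() => void perm.check()}>Check microphone access</button>;
}
```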

RecordingSnapshot

Immutable snapshot of the recording state exposed to React.

Extended by

  • UseRecordingReturn

Properties

| Property | Type | Description |
| --- | --- | --- |
| error | Error \| null | Latest error, if any. |
| finalText | string | Accumulated finalized text. |
| finalTokens | readonly RealtimeToken[] | All finalized tokens in chronological order. Useful for rendering per-token metadata (language, speaker, etc.) in the order tokens were spoken. Pair with partialTokens for the complete ordered stream. |
| groups | Readonly<Record<string, TokenGroup>> | Tokens grouped by the active groupBy strategy. Auto-populated when translation config is provided: one_way → keys "original" and "translation"; two_way → keys are language codes (e.g. "en", "es"). Empty {} when no grouping is active. |
| isActive | boolean | true when state is not idle/stopped/canceled/error. |
| isPaused | boolean | true when state === 'paused'. |
| isReconnecting | boolean | true when state === 'reconnecting'. |
| isRecording | boolean | true when state === 'recording'. |
| isSourceMuted | boolean | true when the audio source is muted externally (e.g. OS-level or hardware mute). |
| partialText | string | Text from current non-final tokens. |
| partialTokens | readonly RealtimeToken[] | Non-final tokens from the latest result. |
| reconnectAttempt | number | Current reconnection attempt number (0 when not reconnecting). |
| result | RealtimeResult \| null | Latest raw result from the server. |
| segments | readonly RealtimeSegment[] | Accumulated final segments. |
| state | RecordingState | Current recording lifecycle state. |
| text | string | Full transcript: finalText + partialText. |
| tokens | readonly RealtimeToken[] | Tokens from the latest result message. |
| utterances | readonly RealtimeUtterance[] | Accumulated utterances (one per endpoint). |
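To illustrate the groups field, here is a minimal sketch of grouping tokens by a key function, in the spirit of the 'speaker' strategy described under UseRecordingConfig.groupBy. The Token and TokenGroup shapes below are simplified stand-ins, not the SDK's actual types:

```typescript
type Token = { text: string; speaker?: string; language?: string };
type TokenGroup = { tokens: Token[]; text: string };

// Collects tokens under the key returned by groupBy, preserving order
// within each group; mirrors the documented 'speaker' strategy when
// groupBy reads token.speaker.
function groupTokens(
  tokens: Token[],
  groupBy: (t: Token) => string,
): Record<string, TokenGroup> {
  const groups: Record<string, TokenGroup> = {};
  for (const token of tokens) {
    const key = groupBy(token);
    const group = (groups[key] ??= { tokens: [], text: "" });
    group.tokens.push(token);
    group.text += token.text;
  }
  return groups;
}

const tokens: Token[] = [
  { text: "Hello ", speaker: "1" },
  { text: "Hi ", speaker: "2" },
  { text: "there", speaker: "1" },
];
const bySpeaker = groupTokens(tokens, (t) => t.speaker ?? "unknown");
// bySpeaker["1"].text === "Hello there"; bySpeaker["2"].text === "Hi "
```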

TtsSnapshot

Immutable snapshot of the TTS state exposed to React.

Extended by

  • UseTtsReturn

Properties

| Property | Type |
| --- | --- |
| error | Error \| null |
| isConnecting | boolean |
| isSpeaking | boolean |
| state | TtsState |

UseAudioLevelOptions

Extended by

  • AudioLevelProps

Properties

| Property | Type | Description |
| --- | --- | --- |
| active? | boolean | Whether volume metering is active. When false, resources are released. |
| bands? | number | Number of frequency bands to return. When set, the bands array is populated with per-band levels (0-1). Useful for spectrum/equalizer visualizations. |
| fftSize? | number | FFT size for the AnalyserNode. Must be a power of 2. Higher values give more frequency resolution (more bins per band) but update less frequently. Default: 256 |
| smoothing? | number | Exponential smoothing factor (0-1). Higher = smoother/slower decay. Default: 0.85 |

UseAudioLevelReturn

Properties

| Property | Type | Description |
| --- | --- | --- |
| bands | readonly number[] | Per-band frequency levels, each 0-1. Empty array when the bands option is not set. |
| volume | number | Current volume level, 0 to 1. Updated every animation frame. |

UseMicrophonePermissionOptions

Properties

| Property | Type | Description |
| --- | --- | --- |
| autoCheck? | boolean | Automatically check permission on mount. |

UseRecordingConfig

Configuration for useRecording.

Extends the STT session config (model, language_hints, etc.) with recording-specific and React-specific options.

Can be used with or without a <SonioxProvider>:

  • With Provider: omit config/apiKey — the client is read from context.
  • Without Provider: pass config (or legacy apiKey) — a client is created internally.

Extends

  • SttSessionConfig

Properties

| Property | Type | Description |
| --- | --- | --- |
| apiKey? | ApiKeyConfig | API key: a string, or an async function that fetches a temporary key. Required when not using <SonioxProvider>. Deprecated: use config instead. |
| audio_format? | "auto" \| AudioFormat | Audio format. Use 'auto' for automatic detection of container formats. For raw PCM formats, also set sample_rate and num_channels. Default: 'auto' |
| auto_reconnect? | boolean | Enable automatic reconnection on retriable errors. Default: false |
| buffer_queue_size? | number | Maximum audio chunks to buffer during connection setup. |
| client_reference_id? | string | Optional tracking identifier (max 256 chars). |
| config? | SonioxConnectionConfig \| (context?) => Promise<SonioxConnectionConfig> | Connection configuration as a sync object or async function. Required when not using <SonioxProvider>. |
| context? | TranscriptionContext | Additional context to improve transcription accuracy. |
| enable_endpoint_detection? | boolean | Enable endpoint detection for utterance boundaries. Useful for voice AI agents. |
| enable_language_identification? | boolean | Enable automatic language detection. |
| enable_speaker_diarization? | boolean | Enable speaker identification. |
| groupBy? | "translation" \| "language" \| "speaker" \| (token) => string | Group tokens by a key for easy splitting. 'translation' groups by translation_status (keys "original" and "translation"); 'language' groups by the token language field (keys are language codes); 'speaker' groups by the token speaker field (keys are speaker identifiers); a (token) => string function defines custom grouping. Auto-defaults when translation config is provided: one_way → 'translation', two_way → 'language'. |
| language_hints? | string[] | Expected languages in the audio (ISO language codes). |
| language_hints_strict? | boolean | When true, recognition is strongly biased toward language hints. Best-effort only, not a hard guarantee. |
| max_endpoint_delay_ms? | number | Maximum delay between the end of speech and the returned endpoint. Allowed values: 500-3000 ms. Default: 2000 |
| max_reconnect_attempts? | number | Maximum consecutive reconnection attempts before giving up. Default: 3 |
| model | string | Speech-to-text model to use. |
| num_channels? | number | Number of audio channels (required for raw audio formats). |
| onConnected? | () => void | Called when the WebSocket connects. |
| onEndpoint? | () => void | Called when an endpoint is detected. |
| onError? | (error) => void | Called when an error occurs. |
| onFinalized? | () => void | Called when the server acknowledges a finalize request (see UseRecordingReturn.finalize). |
| onFinished? | () => void | Called when the recording session finishes. |
| onReconnected? | (event) => void | Called after a successful reconnection. |
| onReconnecting? | (event) => void | Called before a reconnection attempt. Call preventDefault() to cancel. |
| onResult? | (result) => void | Called on each result from the server. |
| onSourceMuted? | () => void | Called when the audio source is muted externally (e.g. OS-level or hardware mute). |
| onSourceUnmuted? | () => void | Called when the audio source is unmuted after an external mute. |
| onStateChange? | (update) => void | Called on each state transition. |
| onToken? | (token) => void | Called for each token received from the server (both final and non-final). |
| permissions? | PermissionResolver \| null | Permission resolver override (only used when creating an inline client). Pass null to explicitly disable. |
| reconnect_base_delay_ms? | number | Base delay in milliseconds for exponential backoff. Default: 1000 |
| reset_transcript_on_reconnect? | boolean | Clear accumulated transcript state on reconnect. Window-tracking state is always reset regardless. Default: false |
| resetOnStart? | boolean | Reset transcript state when start() is called. Default: true |
| sample_rate? | number | Sample rate in Hz (required for PCM formats). |
| session_options? | SttSessionOptions | SDK-level session options (signal, etc.). |
| sessionConfig? | (resolved) => SttSessionConfig | Function that receives the resolved connection config (including stt_defaults from the server) and returns session config overrides. When provided, its return value is used as the session config for the recording, and any flat session config fields on this object are ignored. Example: `sessionConfig: (resolved) => ({ ...resolved.stt_defaults, enable_endpoint_detection: true })` |
| source? | AudioSource | Custom audio source (bypasses the default MicrophoneSource). |
| translation? | TranslationConfig | Translation configuration. |
| wsBaseUrl? | string | WebSocket URL override (only used when apiKey is provided). Deprecated: use config.stt_ws_url or config.region instead. |
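A minimal provider-less sketch (the package import path, the SonioxConnectionConfig field names, the model name, and fetchTemporaryKey below are assumptions for illustration):

```typescript
import { useRecording } from "@soniox/react"; // hypothetical import path

// Hypothetical helper that fetches a temporary API key from your backend.
declare function fetchTemporaryKey(): Promise<string>;

export function Transcriber() {
  const recording = useRecording({
    // Without <SonioxProvider>, config is required; `api_key` is an assumed field name.
    config: async () => ({ api_key: await fetchTemporaryKey() }),
    model: "stt-rt-preview", // assumed model name
    language_hints: ["en", "es"],
    enable_speaker_diarization: true,
    auto_reconnect: true, // also required for reconnect()
    onError: (error) => console.error("recording error:", error),
  });

  return (
    <div>
      <button onClick={() => recording.start()} disabled={recording.isActive}>
        Start
      </button>
      <button onClick={() => void recording.stop()} disabled={!recording.isActive}>
        Stop
      </button>
      {/* Finalized text in normal weight, in-flight partial text emphasized. */}
      <p>
        {recording.finalText}
        <em>{recording.partialText}</em>
      </p>
    </div>
  );
}
```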

UseRecordingReturn

Return value of useRecording: the RecordingSnapshot state plus control methods.

Extends

  • RecordingSnapshot

Properties

| Property | Type | Description |
| --- | --- | --- |
| cancel | () => void | Immediately cancel — does not wait for final results. |
| clearTranscript | () => void | Clear transcript state (finalText, partialText, utterances, segments). |
| error | Error \| null | Latest error, if any. |
| finalize | (options?) => void | Request the server to finalize current non-final tokens. |
| finalText | string | Accumulated finalized text. |
| finalTokens | readonly RealtimeToken[] | All finalized tokens in chronological order. Useful for rendering per-token metadata (language, speaker, etc.) in the order tokens were spoken. Pair with partialTokens for the complete ordered stream. |
| groups | Readonly<Record<string, TokenGroup>> | Tokens grouped by the active groupBy strategy. Auto-populated when translation config is provided: one_way → keys "original" and "translation"; two_way → keys are language codes (e.g. "en", "es"). Empty {} when no grouping is active. |
| isActive | boolean | true when state is not idle/stopped/canceled/error. |
| isPaused | boolean | true when state === 'paused'. |
| isReconnecting | boolean | true when the WebSocket is reconnecting after a drop. |
| isRecording | boolean | true when state === 'recording'. |
| isSourceMuted | boolean | true when the audio source is muted externally (e.g. OS-level or hardware mute). |
| isSupported | boolean | Whether the built-in browser MicrophoneSource is available. Custom AudioSource implementations work regardless of this value. |
| partialText | string | Text from current non-final tokens. |
| partialTokens | readonly RealtimeToken[] | Non-final tokens from the latest result. |
| pause | () => void | Pause recording — pauses audio capture and activates keepalive. |
| reconnect | () => void | Force a reconnection — tears down the current session and audio encoder, then establishes a new session. Requires auto_reconnect. Call this from platform lifecycle handlers (e.g. web visibilitychange, React Native AppState) to recover from stale connections after sleep/wake or backgrounding. |
| reconnectAttempt | number | Current reconnection attempt number (0 when not reconnecting). |
| result | RealtimeResult \| null | Latest raw result from the server. |
| resume | () => void | Resume recording after pause. |
| segments | readonly RealtimeSegment[] | Accumulated final segments. |
| start | () => void | Start a new recording. Aborts any in-flight recording first. |
| state | RecordingState | Current recording lifecycle state. |
| stop | () => Promise<void> | Gracefully stop — waits for final results from the server. |
| text | string | Full transcript: finalText + partialText. |
| tokens | readonly RealtimeToken[] | Tokens from the latest result message. |
| unsupportedReason | UnsupportedReason \| undefined | Why the built-in MicrophoneSource is unavailable, if applicable. Custom AudioSource implementations bypass this check entirely. |
| utterances | readonly RealtimeUtterance[] | Accumulated utterances (one per endpoint). |
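The reconnect() description suggests wiring it to platform lifecycle events. A sketch for the web (the hook name useWakeRecovery and the Reconnectable shape are ours, not the SDK's):

```typescript
import { useEffect } from "react";

// Minimal shape of the pieces of UseRecordingReturn this hook needs.
type Reconnectable = { reconnect: () => void; isActive: boolean };

// Forces a fresh session when the tab becomes visible again, recovering
// from stale connections after sleep/wake (requires auto_reconnect).
export function useWakeRecovery(recording: Reconnectable) {
  useEffect(() => {
    const onVisibilityChange = () => {
      if (document.visibilityState === "visible" && recording.isActive) {
        recording.reconnect();
      }
    };
    document.addEventListener("visibilitychange", onVisibilityChange);
    return () => document.removeEventListener("visibilitychange", onVisibilityChange);
  }, [recording]);
}
```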

UseTtsConfig

Configuration for useTts.

Extends TtsStreamInput — flat TTS fields (model, voice, language, audio_format) are merged on top of server-provided tts_defaults.

Can be used with or without a <SonioxProvider>:

  • With Provider: omit config — the client is read from context.
  • Without Provider: pass config — a client is created internally.

In 'rest' mode, voice is required — the REST TTS endpoint (GenerateSpeechOptions.voice) has no default. Discover available voices via client.tts.listModels().

Extends

  • TtsStreamInput

Properties

| Property | Type | Description |
| --- | --- | --- |
| audio_format? | TtsAudioFormat | Output audio format. Example: 'wav' |
| bitrate? | number | Codec bitrate in bps (for compressed formats). |
| config? | SonioxConnectionConfig \| (context?) => Promise<SonioxConnectionConfig> | Connection configuration as a sync object or async function. Required when not using <SonioxProvider>. |
| language? | string | Language code for speech generation. Example: 'en' |
| mode? | "websocket" \| "rest" | Transport mode for TTS generation. 'websocket' (default): real-time streaming via WebSocket; supports incremental text input (sendText/finish) and streaming from an LLM. 'rest': HTTP request/response via the TTS REST endpoint; sends the full text at once and streams audio back. Simpler, but no incremental text input. Default: 'websocket' |
| model? | string | Text-to-speech model to use. Example: 'tts-rt-v1-preview' |
| onAudio? | (chunk) => void | Called when an audio chunk is received. |
| onAudioEnd? | () => void | Called when the server marks the final audio payload. |
| onError? | (error) => void | Called on error. |
| onStateChange? | (event) => void | Called on each state transition. |
| onTerminated? | () => void | Called when generation is complete. |
| sample_rate? | number | Output sample rate in Hz. Required for raw PCM formats. |
| stream_id? | string | Client-generated stream identifier. Must be unique among active streams on the same connection. Auto-generated if omitted. |
| voice? | string | Voice identifier. Example: 'Adrian' |
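A sketch of 'rest' mode, where voice must be set explicitly (the package import path and the api_key field name are assumptions; the model and voice names are the examples given above):

```typescript
import { useTts } from "@soniox/react"; // hypothetical import path

export function Speaker() {
  const tts = useTts({
    config: { api_key: "<temporary-key>" }, // assumed field name
    mode: "rest",               // full text in, streamed audio back
    model: "tts-rt-v1-preview",
    voice: "Adrian",            // required in 'rest' mode: no server default
    language: "en",
    audio_format: "wav",
    onError: (error) => console.error(error),
  });

  return (
    <button onClick={() => tts.speak("Hello from Soniox!")} disabled={tts.isSpeaking}>
      Speak
    </button>
  );
}
```

Available voices can be discovered via client.tts.listModels(), as noted above.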

UseTtsReturn

Return value of useTts: the TtsSnapshot state plus control methods.

Extends

  • TtsSnapshot

Properties

| Property | Type | Description |
| --- | --- | --- |
| cancel | () => void | Cancel the current generation immediately. |
| error | Error \| null | - |
| finish | () => void | Signal that no more text will be sent. WebSocket mode only. |
| isConnecting | boolean | - |
| isSpeaking | boolean | - |
| sendText | (text) => void | Send one text chunk without finishing. WebSocket mode only. |
| speak | (text) => void | Start TTS. Sends text (or pipes an async iterable in WebSocket mode) and generates audio. |
| state | TtsState | - |
| stop | () => Promise<void> | Gracefully stop — sends finish and waits for completion. |
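In WebSocket mode, sendText/finish allow incremental input. A sketch of feeding streamed text (e.g. from an LLM) chunk by chunk; the IncrementalTts shape below is a simplified stand-in for the relevant slice of UseTtsReturn:

```typescript
// Minimal shape of the pieces of UseTtsReturn this helper needs.
type IncrementalTts = { sendText: (text: string) => void; finish: () => void };

// Pipes streamed text chunks into TTS, then signals completion.
export async function speakStream(tts: IncrementalTts, chunks: AsyncIterable<string>) {
  for await (const chunk of chunks) {
    tts.sendText(chunk); // send one chunk without finishing
  }
  tts.finish(); // tell the server no more text is coming
}
```

Note that speak() can also accept an async iterable directly in WebSocket mode, per the table above; this helper just makes the call sequence explicit.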

AudioLevel()

function AudioLevel(__namedParameters): ReactNode;

Parameters

| Parameter | Type |
| --- | --- |
| __namedParameters | AudioLevelProps |

Returns

ReactNode


SonioxProvider()

function SonioxProvider(props): ReactNode;

Parameters

| Parameter | Type |
| --- | --- |
| props | SonioxProviderProps |

Returns

ReactNode


checkAudioSupport()

function checkAudioSupport(): AudioSupportResult;

Check whether the current environment supports the built-in browser MicrophoneSource (which uses navigator.mediaDevices.getUserMedia).

This does not reflect general recording capability — custom AudioSource implementations (e.g. for React Native) bypass this check entirely and can record regardless of the result.

Returns

AudioSupportResult

Platform

browser


useAudioLevel()

function useAudioLevel(options?): UseAudioLevelReturn;

Parameters

| Parameter | Type |
| --- | --- |
| options? | UseAudioLevelOptions |

Returns

UseAudioLevelReturn
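A volume-meter sketch using the hook directly (the package import path is assumed):

```typescript
import { useAudioLevel } from "@soniox/react"; // hypothetical import path

export function VolumeBar({ active }: { active: boolean }) {
  // Meters only while active; resources are released when inactive.
  const { volume } = useAudioLevel({ active, smoothing: 0.85 });
  return (
    <div style={{ width: 200, height: 6, background: "#eee" }}>
      <div
        style={{ width: `${Math.round(volume * 100)}%`, height: "100%", background: "limegreen" }}
      />
    </div>
  );
}
```

Pair the active prop with isRecording from useRecording to meter only during capture.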


useMicrophonePermission()

function useMicrophonePermission(options?): MicrophonePermissionState;

Parameters

| Parameter | Type |
| --- | --- |
| options? | UseMicrophonePermissionOptions |

Returns

MicrophonePermissionState


useRecording()

function useRecording(config): UseRecordingReturn;

Parameters

| Parameter | Type |
| --- | --- |
| config | UseRecordingConfig |

Returns

UseRecordingReturn


useSoniox()

function useSoniox(): SonioxClient;

Returns the SonioxClient instance provided by the nearest SonioxProvider.

Returns

SonioxClient

Throws

Error if called outside a SonioxProvider.
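A sketch of reaching the shared client from any descendant of SonioxProvider (the package import path is assumed; client.tts.listModels() is the discovery call mentioned in the UseTtsConfig notes):

```typescript
import { useSoniox } from "@soniox/react"; // hypothetical import path

export function ModelList() {
  // Throws if this component is rendered outside a <SonioxProvider>.
  const client = useSoniox();
  const load = () => client.tts.listModels().then((models) => console.log(models));
  return <button onClick={load}>List TTS models</button>;
}
```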


useTts()

function useTts(config): UseTtsReturn;

Parameters

| Parameter | Type |
| --- | --- |
| config | UseTtsConfig |

Returns

UseTtsReturn