Soniox React SDK — Types Reference
SonioxProviderProps
Props for SonioxProvider.
Supply either a pre-built client instance or configuration props.
Type Declaration
| Name | Type |
|---|---|
children | ReactNode |
TtsState
Aggregate state for the TTS hook.
UnsupportedReason
Reason why the built-in browser MicrophoneSource is unavailable:
- 'ssr' — navigator is undefined (SSR, React Native, or other non-browser JS runtimes).
- 'no-mediadevices' — navigator exists but navigator.mediaDevices is missing.
- 'no-getusermedia' — navigator.mediaDevices exists but getUserMedia is not a function.
- 'insecure-context' — the page is not served over HTTPS.
This only reflects whether the default MicrophoneSource can work.
Custom AudioSource implementations (e.g. for React Native) bypass this
check entirely and can record regardless of this value.
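The decision order above can be sketched as a small pure function. This is a hypothetical reimplementation for illustration, not the SDK's actual checkAudioSupport; the parameter shapes are assumptions chosen to make the logic testable outside a browser.

```typescript
type UnsupportedReason = 'ssr' | 'no-mediadevices' | 'no-getusermedia' | 'insecure-context';

interface AudioSupportResult {
  isSupported: boolean;
  reason?: UnsupportedReason;
}

// Classify the environment in the same order the reasons are documented:
// no navigator → no mediaDevices → no getUserMedia → insecure context.
function detectMicSupport(nav: any, isSecureContext: boolean): AudioSupportResult {
  if (typeof nav === 'undefined' || nav === null) return { isSupported: false, reason: 'ssr' };
  if (!nav.mediaDevices) return { isSupported: false, reason: 'no-mediadevices' };
  if (typeof nav.mediaDevices.getUserMedia !== 'function')
    return { isSupported: false, reason: 'no-getusermedia' };
  if (!isSecureContext) return { isSupported: false, reason: 'insecure-context' };
  return { isSupported: true };
}
```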
AudioLevelProps
Extends
UseAudioLevelOptions
Properties
| Property | Type | Description |
|---|---|---|
active? | boolean | Whether volume metering is active. When false, resources are released. |
bands? | number | Number of frequency bands to return. When set, the bands array is populated with per-band levels (0-1). Useful for spectrum/equalizer visualizations. |
children | (state) => ReactNode | - |
fftSize? | number | FFT size for the AnalyserNode. Must be a power of 2. Higher values give more frequency resolution (more bins per band) but update less frequently. Default 256 |
smoothing? | number | Exponential smoothing factor (0-1). Higher = smoother/slower decay. Default 0.85 |
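To illustrate the bands option: an AnalyserNode exposes byte-valued frequency bins (0-255), and per-band levels in the 0-1 range can be produced by averaging groups of bins. This is a minimal sketch of one plausible mapping, not the SDK's implementation; the equal-width bin grouping is an assumption.

```typescript
// Collapse raw byte-frequency bins into `bands` averaged levels, each 0-1.
function binsToBands(bins: Uint8Array, bands: number): number[] {
  const out: number[] = [];
  const perBand = Math.floor(bins.length / bands);
  for (let b = 0; b < bands; b++) {
    let sum = 0;
    for (let i = 0; i < perBand; i++) sum += bins[b * perBand + i];
    out.push(sum / perBand / 255); // normalize byte values (0-255) to 0-1
  }
  return out;
}
```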
AudioSupportResult
Properties
| Property | Type |
|---|---|
isSupported | boolean |
reason? | UnsupportedReason |
MicrophonePermissionState
Properties
| Property | Type | Description |
|---|---|---|
canRequest | boolean | Whether the permission can be requested (e.g., via a prompt). |
check | () => Promise<void> | Check (or re-check) the microphone permission. No-op when unsupported. |
isDenied | boolean | status === 'denied'. |
isGranted | boolean | status === 'granted'. |
isSupported | boolean | Whether permission checking is available. |
status | MicPermissionStatus | Current permission status. |
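The boolean flags above are derived from status. A hypothetical sketch of that derivation follows; the status values ('granted', 'denied', 'prompt', 'unsupported') and the canRequest rule are assumptions based on the standard Permissions API, not the SDK's MicPermissionStatus definition.

```typescript
type MicStatus = 'granted' | 'denied' | 'prompt' | 'unsupported';

// Derive the convenience flags from a single permission status value.
function derivePermissionFlags(status: MicStatus) {
  return {
    isGranted: status === 'granted',
    isDenied: status === 'denied',
    isSupported: status !== 'unsupported',
    canRequest: status === 'prompt', // assumption: a prompt is possible only in this state
  };
}
```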
RecordingSnapshot
Immutable snapshot of the recording state exposed to React.
Extended by
UseRecordingReturn
Properties
| Property | Type | Description |
|---|---|---|
error | Error | null | Latest error, if any. |
finalText | string | Accumulated finalized text. |
finalTokens | readonly RealtimeToken[] | All finalized tokens in chronological order. Useful for rendering per-token metadata (language, speaker, etc.) in the order tokens were spoken. Pair with partialTokens for the complete ordered stream. |
groups | Readonly<Record<string, TokenGroup>> | Tokens grouped by the active groupBy strategy. Auto-populated when translation config is provided: - one_way → keys: "original", "translation" - two_way → keys: language codes (e.g. "en", "es") Empty {} when no grouping is active. |
isActive | boolean | true when state is not idle/stopped/canceled/error. |
isPaused | boolean | true when state === 'paused'. |
isReconnecting | boolean | true when state === 'reconnecting'. |
isRecording | boolean | true when state === 'recording'. |
isSourceMuted | boolean | true when the audio source is muted externally (e.g. OS-level or hardware mute). |
partialText | string | Text from current non-final tokens. |
partialTokens | readonly RealtimeToken[] | Non-final tokens from the latest result. |
reconnectAttempt | number | Current reconnection attempt number (0 when not reconnecting). |
result | RealtimeResult | null | Latest raw result from the server. |
segments | readonly RealtimeSegment[] | Accumulated final segments. |
state | RecordingState | Current recording lifecycle state. |
text | string | Full transcript: finalText + partialText. |
tokens | readonly RealtimeToken[] | Tokens from the latest result message. |
utterances | readonly RealtimeUtterance[] | Accumulated utterances (one per endpoint). |
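The groups property can be pictured as a plain key-to-tokens map built from a groupBy function. The sketch below is illustrative only: the Token shape is a stand-in, not the SDK's RealtimeToken, and the grouping logic is an assumption about how keys like speaker identifiers partition the stream.

```typescript
interface Token {
  text: string;
  speaker?: string;
}

// Partition tokens into groups keyed by whatever the groupBy function returns.
function groupTokens(tokens: Token[], keyOf: (t: Token) => string): Record<string, Token[]> {
  const groups: Record<string, Token[]> = {};
  for (const t of tokens) (groups[keyOf(t)] ??= []).push(t);
  return groups;
}
```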
TtsSnapshot
Immutable snapshot of the TTS state exposed to React.
Extended by
UseTtsReturn
Properties
| Property | Type |
|---|---|
error | Error | null |
isConnecting | boolean |
isSpeaking | boolean |
state | TtsState |
UseAudioLevelOptions
Extended by
AudioLevelProps
Properties
| Property | Type | Description |
|---|---|---|
active? | boolean | Whether volume metering is active. When false, resources are released. |
bands? | number | Number of frequency bands to return. When set, the bands array is populated with per-band levels (0-1). Useful for spectrum/equalizer visualizations. |
fftSize? | number | FFT size for the AnalyserNode. Must be a power of 2. Higher values give more frequency resolution (more bins per band) but update less frequently. Default 256 |
smoothing? | number | Exponential smoothing factor (0-1). Higher = smoother/slower decay. Default 0.85 |
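The smoothing option describes standard exponential smoothing: each frame the displayed level moves toward the raw level by a factor of (1 - smoothing), so higher values decay more slowly. A one-line sketch of that update, under the assumption that this is the exact formula used:

```typescript
// Exponential moving average: keep `smoothing` of the previous value,
// blend in (1 - smoothing) of the new raw level each animation frame.
function smooth(prev: number, raw: number, smoothing = 0.85): number {
  return prev * smoothing + raw * (1 - smoothing);
}
```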
UseAudioLevelReturn
Properties
| Property | Type | Description |
|---|---|---|
bands | readonly number[] | Per-band frequency levels, each 0-1. Empty array when the bands option is not set. |
volume | number | Current volume level, 0 to 1. Updated every animation frame. |
UseMicrophonePermissionOptions
Properties
| Property | Type | Description |
|---|---|---|
autoCheck? | boolean | Automatically check permission on mount. |
UseRecordingConfig
Configuration for useRecording.
Extends the STT session config (model, language_hints, etc.) with recording-specific and React-specific options.
Can be used with or without a <SonioxProvider>:
- With Provider: omit config/apiKey — the client is read from context.
- Without Provider: pass config (or legacy apiKey) — a client is created internally.
Extends
SttSessionConfig
Properties
| Property | Type | Description |
|---|---|---|
apiKey? | ApiKeyConfig | API key — string or async function that fetches a temporary key. Required when not using <SonioxProvider>. Deprecated Use config instead. |
audio_format? | "auto" | AudioFormat | Audio format. Use 'auto' for automatic detection of container formats. For raw PCM formats, also set sample_rate and num_channels. Default 'auto' |
auto_reconnect? | boolean | Enable automatic reconnection on retriable errors. Default false |
buffer_queue_size? | number | Maximum audio chunks to buffer during connection setup. |
client_reference_id? | string | Optional tracking identifier (max 256 chars). |
config? | | SonioxConnectionConfig | (context?) => Promise<SonioxConnectionConfig> | Connection configuration — sync object or async function. Required when not using <SonioxProvider>. |
context? | TranscriptionContext | Additional context to improve transcription accuracy. |
enable_endpoint_detection? | boolean | Enable endpoint detection for utterance boundaries. Useful for voice AI agents. |
enable_language_identification? | boolean | Enable automatic language detection. |
enable_speaker_diarization? | boolean | Enable speaker identification. |
groupBy? | "translation" | "language" | "speaker" | (token) => string | Group tokens by a key for easy splitting (e.g. translation, language, speaker). - 'translation' — group by translation_status: keys "original" and "translation" - 'language' — group by token language field: keys are language codes - 'speaker' — group by token speaker field: keys are speaker identifiers - (token) => string — custom grouping function Auto-defaults when translation config is provided: - one_way → 'translation' - two_way → 'language' |
language_hints? | string[] | Expected languages in the audio (ISO language codes). |
language_hints_strict? | boolean | When true, recognition is strongly biased toward language hints. Best-effort only, not a hard guarantee. |
max_endpoint_delay_ms? | number | Maximum delay between the end of speech and the returned endpoint. Allowed values are between 500 and 3000 ms. Default 2000 |
max_reconnect_attempts? | number | Maximum consecutive reconnection attempts before giving up. Default 3 |
model | string | Speech-to-text model to use. |
num_channels? | number | Number of audio channels (required for raw audio formats). |
onConnected? | () => void | Called when the WebSocket connects. |
onEndpoint? | () => void | Called when an endpoint is detected. |
onError? | (error) => void | Called when an error occurs. |
onFinalized? | () => void | Called when the server acknowledges a finalize request (see UseRecordingReturn.finalize). |
onFinished? | () => void | Called when the recording session finishes. |
onReconnected? | (event) => void | Called after a successful reconnection. |
onReconnecting? | (event) => void | Called before a reconnection attempt. Call preventDefault() to cancel. |
onResult? | (result) => void | Called on each result from the server. |
onSourceMuted? | () => void | Called when the audio source is muted externally (e.g. OS-level or hardware mute). |
onSourceUnmuted? | () => void | Called when the audio source is unmuted after an external mute. |
onStateChange? | (update) => void | Called on each state transition. |
onToken? | (token) => void | Called for each token received from the server (both final and non-final). |
permissions? | PermissionResolver | null | Permission resolver override (only used when creating an inline client). Pass null to explicitly disable. |
reconnect_base_delay_ms? | number | Base delay in milliseconds for exponential backoff. Default 1000 |
reset_transcript_on_reconnect? | boolean | Clear accumulated transcript state on reconnect. Window-tracking state is always reset regardless. Default false |
resetOnStart? | boolean | Reset transcript state when start() is called. Default true |
sample_rate? | number | Sample rate in Hz (required for PCM formats). |
session_options? | SttSessionOptions | SDK-level session options (signal, etc.). |
sessionConfig? | (resolved) => SttSessionConfig | Function that receives the resolved connection config (including stt_defaults from the server) and returns session config overrides. When provided, its return value is used as the session config for the recording, and any flat session config fields on this object are ignored. Example const { start } = useRecording({ config: asyncConfigFn, sessionConfig: (resolved) => ({ ...resolved.stt_defaults, enable_endpoint_detection: true, }), }); |
source? | AudioSource | Custom audio source (bypasses default MicrophoneSource). |
translation? | TranslationConfig | Translation configuration. |
wsBaseUrl? | string | WebSocket URL override (only used when apiKey is provided). Deprecated Use config.stt_ws_url or config.region instead. |
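The reconnection options above (auto_reconnect, reconnect_base_delay_ms, max_reconnect_attempts) describe exponential backoff. A minimal sketch of one plausible schedule follows; the exact doubling curve and absence of jitter are assumptions, not the SDK's documented behavior.

```typescript
// Delay before reconnection attempt N (1-based), doubling from the base delay:
// attempt 1 → base, attempt 2 → 2×base, attempt 3 → 4×base, …
function reconnectDelayMs(attempt: number, baseDelayMs = 1000): number {
  return baseDelayMs * 2 ** (attempt - 1);
}

// Whether another attempt should be made, given max_reconnect_attempts.
function shouldRetry(attempt: number, maxAttempts = 3): boolean {
  return attempt <= maxAttempts;
}
```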
UseRecordingReturn
Return value of useRecording: the recording snapshot plus control methods.
Extends
RecordingSnapshot
Properties
| Property | Type | Description |
|---|---|---|
cancel | () => void | Immediately cancel — does not wait for final results. |
clearTranscript | () => void | Clear transcript state (finalText, partialText, utterances, segments). |
error | Error | null | Latest error, if any. |
finalize | (options?) => void | Request the server to finalize current non-final tokens. |
finalText | string | Accumulated finalized text. |
finalTokens | readonly RealtimeToken[] | All finalized tokens in chronological order. Useful for rendering per-token metadata (language, speaker, etc.) in the order tokens were spoken. Pair with partialTokens for the complete ordered stream. |
groups | Readonly<Record<string, TokenGroup>> | Tokens grouped by the active groupBy strategy. Auto-populated when translation config is provided: - one_way → keys: "original", "translation" - two_way → keys: language codes (e.g. "en", "es") Empty {} when no grouping is active. |
isActive | boolean | true when state is not idle/stopped/canceled/error. |
isPaused | boolean | true when state === 'paused'. |
isReconnecting | boolean | true when the WebSocket is reconnecting after a drop. |
isRecording | boolean | true when state === 'recording'. |
isSourceMuted | boolean | true when the audio source is muted externally (e.g. OS-level or hardware mute). |
isSupported | boolean | Whether the built-in browser MicrophoneSource is available. Custom AudioSource implementations work regardless of this value. |
partialText | string | Text from current non-final tokens. |
partialTokens | readonly RealtimeToken[] | Non-final tokens from the latest result. |
pause | () => void | Pause recording — pauses audio capture and activates keepalive. |
reconnect | () => void | Force a reconnection — tears down the current session and audio encoder, then establishes a new session. Requires auto_reconnect. Call this from platform lifecycle handlers (e.g. web visibilitychange, React Native AppState) to recover from stale connections after sleep/wake or backgrounding. |
reconnectAttempt | number | Current reconnection attempt number (0 when not reconnecting). |
result | RealtimeResult | null | Latest raw result from the server. |
resume | () => void | Resume recording after pause. |
segments | readonly RealtimeSegment[] | Accumulated final segments. |
start | () => void | Start a new recording. Aborts any in-flight recording first. |
state | RecordingState | Current recording lifecycle state. |
stop | () => Promise<void> | Gracefully stop — waits for final results from the server. |
text | string | Full transcript: finalText + partialText. |
tokens | readonly RealtimeToken[] | Tokens from the latest result message. |
unsupportedReason | UnsupportedReason | undefined | Why the built-in MicrophoneSource is unavailable, if applicable. Custom AudioSource implementations bypass this check entirely. |
utterances | readonly RealtimeUtterance[] | Accumulated utterances (one per endpoint). |
UseTtsConfig
Configuration for useTts.
Extends TtsStreamInput — flat TTS fields (model, voice, language,
audio_format) are merged on top of server-provided tts_defaults.
Can be used with or without a <SonioxProvider>:
- With Provider: omit config — the client is read from context.
- Without Provider: pass config — a client is created internally.
In 'rest' mode, voice is required — the REST TTS endpoint
(GenerateSpeechOptions.voice) has no default. Discover available voices
via client.tts.listModels().
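The 'rest'-mode constraint above lends itself to an upfront validation step. This is a hypothetical helper sketched for illustration (the function and its return convention are not part of the SDK), encoding only the rules stated here: 'websocket' is the default mode, and 'rest' requires a voice.

```typescript
interface TtsModeConfig {
  mode?: 'websocket' | 'rest';
  voice?: string;
}

// Return an error message when the config violates the REST-mode rule, else null.
function validateTtsConfig(cfg: TtsModeConfig): string | null {
  const mode = cfg.mode ?? 'websocket'; // 'websocket' is the documented default
  if (mode === 'rest' && !cfg.voice) {
    return "voice is required in 'rest' mode — the REST TTS endpoint has no default";
  }
  return null;
}
```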
Extends
TtsStreamInput
Properties
| Property | Type | Description |
|---|---|---|
audio_format? | TtsAudioFormat | Output audio format Example 'wav' |
bitrate? | number | Codec bitrate in bps (for compressed formats). |
config? | | SonioxConnectionConfig | (context?) => Promise<SonioxConnectionConfig> | Connection configuration — sync object or async function. Required when not using <SonioxProvider>. |
language? | string | Language code for speech generation. Example 'en' |
mode? | "websocket" | "rest" | Transport mode for TTS generation. - 'websocket' (default): Real-time streaming via WebSocket. Supports incremental text input (sendText/finish) and streaming from LLM. - 'rest': HTTP request/response via the TTS REST endpoint. Sends full text at once, streams audio back. Simpler but no incremental text input. Default 'websocket' |
model? | string | Text-to-Speech model to use. Example 'tts-rt-v1-preview' |
onAudio? | (chunk) => void | Called when an audio chunk is received. |
onAudioEnd? | () => void | Called when the server marks the final audio payload. |
onError? | (error) => void | Called on error. |
onStateChange? | (event) => void | Called on each state transition. |
onTerminated? | () => void | Called when generation is complete. |
sample_rate? | number | Output sample rate in Hz. Required for raw PCM formats. |
stream_id? | string | Client-generated stream identifier. Must be unique among active streams on the same connection. Auto-generated if omitted. |
voice? | string | Voice identifier. Example 'Adrian' |
UseTtsReturn
Return value of useTts: the TTS snapshot plus control methods.
Extends
TtsSnapshot
Properties
| Property | Type | Description |
|---|---|---|
cancel | () => void | Cancel the current generation immediately. |
error | Error | null | - |
finish | () => void | Signal that no more text will be sent. WebSocket mode only. |
isConnecting | boolean | - |
isSpeaking | boolean | - |
sendText | (text) => void | Send one text chunk without finishing. WebSocket mode only. |
speak | (text) => void | Start TTS. Sends text (or pipes an async iterable in WebSocket mode) and generates audio. |
state | TtsState | - |
stop | () => Promise<void> | Gracefully stop — sends finish and waits for completion. |
AudioLevel()
Parameters
| Parameter | Type |
|---|---|
__namedParameters | AudioLevelProps |
Returns
ReactNode
SonioxProvider()
Parameters
| Parameter | Type |
|---|---|
props | SonioxProviderProps |
Returns
ReactNode
checkAudioSupport()
Check whether the current environment supports the built-in browser
MicrophoneSource (which uses navigator.mediaDevices.getUserMedia).
This does not reflect general recording capability — custom AudioSource
implementations (e.g. for React Native) bypass this check entirely and can
record regardless of the result.
Returns
AudioSupportResult
Platform
browser
useAudioLevel()
Parameters
| Parameter | Type |
|---|---|
options? | UseAudioLevelOptions |
Returns
UseAudioLevelReturn
useMicrophonePermission()
Parameters
| Parameter | Type |
|---|---|
options? | UseMicrophonePermissionOptions |
Returns
MicrophonePermissionState
useRecording()
Parameters
| Parameter | Type |
|---|---|
config | UseRecordingConfig |
Returns
UseRecordingReturn
useSoniox()
Returns the SonioxClient instance provided by the nearest SonioxProvider.
Returns
SonioxClient
Throws
Error if called outside a SonioxProvider
useTts()
Parameters
| Parameter | Type |
|---|---|
config | UseTtsConfig |
Returns
UseTtsReturn