Soniox React SDK — Types Reference
SonioxProviderProps
Props for SonioxProvider.
Supply either a pre-built client instance or configuration props.
Type Declaration
| Name | Type |
|---|---|
| children | ReactNode |
UnsupportedReason
Reason why the built-in browser MicrophoneSource is unavailable:
- 'ssr' — navigator is undefined (SSR, React Native, or other non-browser JS runtimes).
- 'no-mediadevices' — navigator exists but navigator.mediaDevices is missing.
- 'no-getusermedia' — navigator.mediaDevices exists but getUserMedia is not a function.
- 'insecure-context' — the page is not served over HTTPS.
This only reflects whether the default MicrophoneSource can work.
Custom AudioSource implementations (e.g. for React Native) bypass this
check entirely and can record regardless of this value.
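The four reasons correspond to a chain of environment checks. A minimal sketch of that chain, assuming the checks run in the order listed (this is not the SDK's source, just the implied logic):

```typescript
// Illustrative sketch of the UnsupportedReason checks, in the order listed
// above. Not the SDK's actual implementation.
type UnsupportedReason = 'ssr' | 'no-mediadevices' | 'no-getusermedia' | 'insecure-context';

interface AudioSupportResult {
  isSupported: boolean;
  reason?: UnsupportedReason;
}

// `env` stands in for the global scope so the chain is easy to exercise.
function classify(env: {
  navigator?: { mediaDevices?: { getUserMedia?: unknown } };
  isSecureContext?: boolean;
}): AudioSupportResult {
  if (!env.navigator) return { isSupported: false, reason: 'ssr' };
  if (!env.navigator.mediaDevices) return { isSupported: false, reason: 'no-mediadevices' };
  if (typeof env.navigator.mediaDevices.getUserMedia !== 'function') {
    return { isSupported: false, reason: 'no-getusermedia' };
  }
  if (!env.isSecureContext) return { isSupported: false, reason: 'insecure-context' };
  return { isSupported: true };
}
```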
AudioLevelProps
Extends
Properties
| Property | Type | Description |
|---|---|---|
active? | boolean | Whether volume metering is active. When false, resources are released. |
bands? | number | Number of frequency bands to return. When set, the bands array is populated with per-band levels (0-1). Useful for spectrum/equalizer visualizations. |
children | (state) => ReactNode | - |
fftSize? | number | FFT size for the AnalyserNode. Must be a power of 2. Higher values give more frequency resolution (more bins per band) but update less frequently. Default 256 |
smoothing? | number | Exponential smoothing factor (0-1). Higher = smoother/slower decay. Default 0.85 |
AudioSupportResult
Properties
| Property | Type |
|---|---|
isSupported | boolean |
reason? | UnsupportedReason |
MicrophonePermissionState
Properties
| Property | Type | Description |
|---|---|---|
canRequest | boolean | Whether the permission can be requested (e.g., via a prompt). |
check | () => Promise<void> | Check (or re-check) the microphone permission. No-op when unsupported. |
isDenied | boolean | status === 'denied'. |
isGranted | boolean | status === 'granted'. |
isSupported | boolean | Whether permission checking is available. |
status | MicPermissionStatus | Current permission status. |
RecordingSnapshot
Immutable snapshot of the recording state exposed to React.
Extended by
Properties
| Property | Type | Description |
|---|---|---|
error | Error | null | Latest error, if any. |
finalText | string | Accumulated finalized text. |
finalTokens | readonly RealtimeToken[] | All finalized tokens in chronological order. Useful for rendering per-token metadata (language, speaker, etc.) in the order tokens were spoken. Pair with partialTokens for the complete ordered stream. |
groups | Readonly<Record<string, TokenGroup>> | Tokens grouped by the active groupBy strategy. Auto-populated when translation config is provided: - one_way → keys: "original", "translation" - two_way → keys: language codes (e.g. "en", "es") Empty {} when no grouping is active. |
isActive | boolean | true when state is not idle/stopped/canceled/error. |
isPaused | boolean | true when state === 'paused'. |
isRecording | boolean | true when state === 'recording'. |
isSourceMuted | boolean | true when the audio source is muted externally (e.g. OS-level or hardware mute). |
partialText | string | Text from current non-final tokens. |
partialTokens | readonly RealtimeToken[] | Non-final tokens from the latest result. |
result | RealtimeResult | null | Latest raw result from the server. |
segments | readonly RealtimeSegment[] | Accumulated final segments. |
state | RecordingState | Current recording lifecycle state. |
text | string | Full transcript: finalText + partialText. |
tokens | readonly RealtimeToken[] | Tokens from the latest result message. |
utterances | readonly RealtimeUtterance[] | Accumulated utterances (one per endpoint). |
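The boolean convenience fields and text are pure derivations of state, finalText, and partialText. A sketch of those relationships (illustrative, not the SDK's internal code; the intermediate states 'starting' and 'stopping' are assumptions, only the states named in the table above are documented):

```typescript
// Illustrative derivation of the convenience fields on RecordingSnapshot.
type RecordingState =
  | 'idle' | 'starting' | 'recording' | 'paused'
  | 'stopping' | 'stopped' | 'canceled' | 'error';

function deriveSnapshotFields(s: {
  state: RecordingState;
  finalText: string;
  partialText: string;
}) {
  // isActive is documented as "not idle/stopped/canceled/error".
  const inactive: RecordingState[] = ['idle', 'stopped', 'canceled', 'error'];
  return {
    isRecording: s.state === 'recording',
    isPaused: s.state === 'paused',
    isActive: !inactive.includes(s.state),
    text: s.finalText + s.partialText, // full transcript
  };
}
```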
UseAudioLevelOptions
Extended by
Properties
| Property | Type | Description |
|---|---|---|
active? | boolean | Whether volume metering is active. When false, resources are released. |
bands? | number | Number of frequency bands to return. When set, the bands array is populated with per-band levels (0-1). Useful for spectrum/equalizer visualizations. |
fftSize? | number | FFT size for the AnalyserNode. Must be a power of 2. Higher values give more frequency resolution (more bins per band) but update less frequently. Default 256 |
smoothing? | number | Exponential smoothing factor (0-1). Higher = smoother/slower decay. Default 0.85 |
UseAudioLevelReturn
Properties
| Property | Type | Description |
|---|---|---|
bands | readonly number[] | Per-band frequency levels, each 0-1. Empty array when the bands option is not set. |
volume | number | Current volume level, 0 to 1. Updated every animation frame. |
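As an example of consuming bands, a small helper that renders the per-band 0-1 levels as a fixed-height ASCII spectrum (the helper is illustrative and not part of the SDK):

```typescript
// Render `bands` (each 0-1) as rows of an ASCII spectrum, top row first.
function asciiSpectrum(bands: readonly number[], height = 5): string[] {
  const rows: string[] = [];
  for (let row = height; row >= 1; row--) {
    // A band fills this row when its scaled level reaches the row's height.
    rows.push(bands.map((level) => (level * height >= row ? '█' : ' ')).join(''));
  }
  return rows;
}
```

In a component, the same idea maps each band to a bar element's height instead of a character.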
UseMicrophonePermissionOptions
Properties
| Property | Type | Description |
|---|---|---|
autoCheck? | boolean | Automatically check permission on mount. |
UseRecordingConfig
Configuration for useRecording.
Extends the STT session config (model, language_hints, etc.) with recording-specific and React-specific options.
Can be used with or without a <SonioxProvider>:
- With Provider: omit apiKey — the client is read from context.
- Without Provider: pass apiKey directly — a client is created internally.
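For instance, a standalone (provider-less) config might look like the sketch below. The field names come from the Properties table, but the key and model id are placeholders, not documented defaults:

```typescript
// Hypothetical standalone UseRecordingConfig value.
const config = {
  apiKey: 'YOUR_API_KEY', // or an async function returning a temporary key
  model: 'stt-rt-preview', // placeholder model id
  language_hints: ['en', 'es'],
  enable_speaker_diarization: true,
  groupBy: 'speaker',
  resetOnStart: true,
  onError: (error: Error) => console.error('recording error:', error),
};
```

With a SonioxProvider above the component, apiKey would simply be omitted.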
Extends
SttSessionConfig
Properties
| Property | Type | Description |
|---|---|---|
apiKey? | ApiKeyConfig | API key — string or async function that fetches a temporary key. Required when not using <SonioxProvider>. |
audio_format? | "auto" | AudioFormat | Audio format. Use 'auto' for automatic detection of container formats. For raw PCM formats, also set sample_rate and num_channels. Default 'auto' |
buffer_queue_size? | number | Maximum audio chunks to buffer during connection setup. |
client_reference_id? | string | Optional tracking identifier (max 256 chars). |
context? | TranscriptionContext | Additional context to improve transcription accuracy. |
enable_endpoint_detection? | boolean | Enable endpoint detection for utterance boundaries. Useful for voice AI agents. |
enable_language_identification? | boolean | Enable automatic language detection. |
enable_speaker_diarization? | boolean | Enable speaker identification. |
groupBy? | "translation" | "language" | "speaker" | (token) => string | Group tokens by a key for easy splitting (e.g. translation, language, speaker). - 'translation' — group by translation_status: keys "original" and "translation" - 'language' — group by token language field: keys are language codes - 'speaker' — group by token speaker field: keys are speaker identifiers - (token) => string — custom grouping function Auto-defaults when translation config is provided: - one_way → 'translation' - two_way → 'language' |
language_hints? | string[] | Expected languages in the audio (ISO language codes). |
language_hints_strict? | boolean | When true, recognition is strongly biased toward language hints. Best-effort only, not a hard guarantee. |
model | string | Speech-to-text model to use. |
num_channels? | number | Number of audio channels (required for raw audio formats). |
onConnected? | () => void | Called when the WebSocket connects. |
onEndpoint? | () => void | Called when an endpoint is detected. |
onError? | (error) => void | Called when an error occurs. |
onFinished? | () => void | Called when the recording session finishes. |
onResult? | (result) => void | Called on each result from the server. |
onSourceMuted? | () => void | Called when the audio source is muted externally (e.g. OS-level or hardware mute). |
onSourceUnmuted? | () => void | Called when the audio source is unmuted after an external mute. |
onStateChange? | (update) => void | Called on each state transition. |
permissions? | PermissionResolver | null | Permission resolver override (only used when apiKey is provided). Pass null to explicitly disable. |
resetOnStart? | boolean | Reset transcript state when start() is called. Default true |
sample_rate? | number | Sample rate in Hz (required for PCM formats). |
session_options? | SttSessionOptions | SDK-level session options (signal, etc.). |
source? | AudioSource | Custom audio source (bypasses default MicrophoneSource). |
translation? | TranslationConfig | Translation configuration. |
wsBaseUrl? | string | WebSocket URL override (only used when apiKey is provided). |
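The groupBy strategies in the table above amount to a key function plus a reducer. The sketch below reimplements them for illustration (token shape simplified, and the 'unknown' fallback keys are assumptions, not documented behavior):

```typescript
// Simplified token shape; the real RealtimeToken carries more fields.
interface Token {
  text: string;
  translation_status?: 'original' | 'translation';
  language?: string;
  speaker?: string;
}
type GroupBy = 'translation' | 'language' | 'speaker' | ((token: Token) => string);

function groupTokens(tokens: Token[], groupBy: GroupBy): Record<string, Token[]> {
  const keyOf = (t: Token): string => {
    if (typeof groupBy === 'function') return groupBy(t);
    if (groupBy === 'translation') return t.translation_status ?? 'original';
    if (groupBy === 'language') return t.language ?? 'unknown';
    return t.speaker ?? 'unknown';
  };
  const groups: Record<string, Token[]> = {};
  for (const t of tokens) (groups[keyOf(t)] ??= []).push(t);
  return groups;
}
```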
UseRecordingReturn
The RecordingSnapshot fields plus the control methods, as returned by useRecording.
Extends
Properties
| Property | Type | Description |
|---|---|---|
cancel | () => void | Immediately cancel — does not wait for final results. |
clearTranscript | () => void | Clear transcript state (finalText, partialText, utterances, segments). |
error | Error | null | Latest error, if any. |
finalize | (options?) => void | Request the server to finalize current non-final tokens. |
finalText | string | Accumulated finalized text. |
finalTokens | readonly RealtimeToken[] | All finalized tokens in chronological order. Useful for rendering per-token metadata (language, speaker, etc.) in the order tokens were spoken. Pair with partialTokens for the complete ordered stream. |
groups | Readonly<Record<string, TokenGroup>> | Tokens grouped by the active groupBy strategy. Auto-populated when translation config is provided: - one_way → keys: "original", "translation" - two_way → keys: language codes (e.g. "en", "es") Empty {} when no grouping is active. |
isActive | boolean | true when state is not idle/stopped/canceled/error. |
isPaused | boolean | true when state === 'paused'. |
isRecording | boolean | true when state === 'recording'. |
isSourceMuted | boolean | true when the audio source is muted externally (e.g. OS-level or hardware mute). |
isSupported | boolean | Whether the built-in browser MicrophoneSource is available. Custom AudioSource implementations work regardless of this value. |
partialText | string | Text from current non-final tokens. |
partialTokens | readonly RealtimeToken[] | Non-final tokens from the latest result. |
pause | () => void | Pause recording — pauses audio capture and activates keepalive. |
result | RealtimeResult | null | Latest raw result from the server. |
resume | () => void | Resume recording after pause. |
segments | readonly RealtimeSegment[] | Accumulated final segments. |
start | () => void | Start a new recording. Aborts any in-flight recording first. |
state | RecordingState | Current recording lifecycle state. |
stop | () => Promise<void> | Gracefully stop — waits for final results from the server. |
text | string | Full transcript: finalText + partialText. |
tokens | readonly RealtimeToken[] | Tokens from the latest result message. |
unsupportedReason | UnsupportedReason | undefined | Why the built-in MicrophoneSource is unavailable, if applicable. Custom AudioSource implementations bypass this check entirely. |
utterances | readonly RealtimeUtterance[] | Accumulated utterances (one per endpoint). |
AudioLevel()
Parameters
| Parameter | Type |
|---|---|
__namedParameters | AudioLevelProps |
Returns
ReactNode
SonioxProvider()
Parameters
| Parameter | Type |
|---|---|
props | SonioxProviderProps |
Returns
ReactNode
checkAudioSupport()
Check whether the current environment supports the built-in browser
MicrophoneSource (which uses navigator.mediaDevices.getUserMedia).
This does not reflect general recording capability — custom AudioSource
implementations (e.g. for React Native) bypass this check entirely and can
record regardless of the result.
Returns
Platform
browser
useAudioLevel()
Parameters
| Parameter | Type |
|---|---|
options? | UseAudioLevelOptions |
Returns
useMicrophonePermission()
Parameters
| Parameter | Type |
|---|---|
options? | UseMicrophonePermissionOptions |
Returns
useRecording()
Parameters
| Parameter | Type |
|---|---|
config | UseRecordingConfig |
Returns
useSoniox()
Returns the SonioxClient instance provided by the nearest SonioxProvider.
Returns
SonioxClient
Throws
Error if called outside a SonioxProvider.