Soniox React SDK — Types Reference

SonioxProviderProps

```ts
type SonioxProviderProps = {
  children: ReactNode;
} & (SonioxProviderConfigProps | SonioxProviderClientProps);
```

Props for SonioxProvider.

Supply either a pre-built client instance or configuration props.

Type Declaration

| Name | Type |
| --- | --- |
| children | ReactNode |
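
For example, a minimal sketch of both variants. The @soniox/react import path and the apiKey/client prop names are assumptions inferred from SonioxProviderConfigProps and SonioxProviderClientProps:

```tsx
import type { ReactNode } from "react";
import { SonioxProvider } from "@soniox/react"; // import path assumed
import type { SonioxClient } from "@soniox/speech-to-text-web"; // path assumed

// Variant 1: configuration props; the provider builds the client internally.
// The apiKey prop name is an assumption.
export function AppWithConfig({ children }: { children: ReactNode }) {
  return <SonioxProvider apiKey="<YOUR_API_KEY>">{children}</SonioxProvider>;
}

// Variant 2: a pre-built client instance. The client prop name is an
// assumption based on SonioxProviderClientProps.
export function AppWithClient({
  client,
  children,
}: {
  client: SonioxClient;
  children: ReactNode;
}) {
  return <SonioxProvider client={client}>{children}</SonioxProvider>;
}
```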

UnsupportedReason

```ts
type UnsupportedReason = "ssr" | "no-mediadevices" | "no-getusermedia" | "insecure-context";
```

Reason why the built-in browser MicrophoneSource is unavailable:

  • 'ssr' — navigator is undefined (SSR, React Native, or other non-browser JS runtimes).
  • 'no-mediadevices' — navigator exists but navigator.mediaDevices is missing.
  • 'no-getusermedia' — navigator.mediaDevices exists but getUserMedia is not a function.
  • 'insecure-context' — the page is not served from a secure context (HTTPS or localhost).

This only reflects whether the default MicrophoneSource can work. Custom AudioSource implementations (e.g. for React Native) bypass this check entirely and can record regardless of this value.
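
For example, a sketch that maps each reason to a user-facing message using checkAudioSupport() (documented below). The import path and messages are illustrative, not part of the SDK:

```ts
import { checkAudioSupport, type UnsupportedReason } from "@soniox/react"; // path assumed

// Hypothetical user-facing messages for each reason.
const MESSAGES: Record<UnsupportedReason, string> = {
  ssr: "Rendering outside a browser; microphone capture is unavailable.",
  "no-mediadevices": "This browser does not expose navigator.mediaDevices.",
  "no-getusermedia": "This browser does not implement getUserMedia.",
  "insecure-context": "Microphone access requires a secure context (HTTPS or localhost).",
};

const support = checkAudioSupport();
if (!support.isSupported && support.reason) {
  console.warn(MESSAGES[support.reason]);
}
```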


AudioLevelProps

Extends

  • UseAudioLevelOptions

Properties

| Property | Type | Description |
| --- | --- | --- |
| active? | boolean | Whether volume metering is active. When false, resources are released. |
| bands? | number | Number of frequency bands to return. When set, the bands array is populated with per-band levels (0-1). Useful for spectrum/equalizer visualizations. |
| children | (state) => ReactNode | Render prop that receives the current audio level state. |
| fftSize? | number | FFT size for the AnalyserNode. Must be a power of 2. Higher values give more frequency resolution (more bins per band) but update less frequently. Default: 256 |
| smoothing? | number | Exponential smoothing factor (0-1). Higher = smoother/slower decay. Default: 0.85 |
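
For example, a render-prop sketch, assuming the children state has the UseAudioLevelReturn shape (volume, bands) and the @soniox/react import path:

```tsx
import { AudioLevel } from "@soniox/react"; // path assumed

export function SpectrumBars() {
  return (
    <AudioLevel active bands={16} fftSize={512} smoothing={0.85}>
      {({ volume, bands }) => (
        <div aria-label={`Volume ${Math.round(volume * 100)}%`}>
          {bands.map((level, i) => (
            // One bar per frequency band; each level is 0-1.
            <div
              key={i}
              style={{
                display: "inline-block",
                width: 4,
                height: `${level * 40}px`,
                background: "teal",
              }}
            />
          ))}
        </div>
      )}
    </AudioLevel>
  );
}
```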

AudioSupportResult

Properties

| Property | Type |
| --- | --- |
| isSupported | boolean |
| reason? | UnsupportedReason |

MicrophonePermissionState

Properties

| Property | Type | Description |
| --- | --- | --- |
| canRequest | boolean | Whether the permission can be requested (e.g., via a prompt). |
| check | () => Promise<void> | Check (or re-check) the microphone permission. No-op when unsupported. |
| isDenied | boolean | status === 'denied'. |
| isGranted | boolean | status === 'granted'. |
| isSupported | boolean | Whether permission checking is available. |
| status | MicPermissionStatus | Current permission status. |
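
For example, a minimal permission gate (import path assumed):

```tsx
import { useMicrophonePermission } from "@soniox/react"; // path assumed

export function MicPermissionGate() {
  const mic = useMicrophonePermission({ autoCheck: true });

  if (!mic.isSupported) return <p>Permission checking is unavailable here.</p>;
  if (mic.isGranted) return <p>Microphone ready.</p>;
  if (mic.isDenied) return <p>Microphone access was denied in browser settings.</p>;

  // status is neither granted nor denied; re-check the permission.
  return (
    <button disabled={!mic.canRequest} onClick={() => void mic.check()}>
      Check microphone permission
    </button>
  );
}
```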

RecordingSnapshot

Immutable snapshot of the recording state exposed to React.

Extended by

  • UseRecordingReturn

Properties

| Property | Type | Description |
| --- | --- | --- |
| error | Error \| null | Latest error, if any. |
| finalText | string | Accumulated finalized text. |
| finalTokens | readonly RealtimeToken[] | All finalized tokens in chronological order. Useful for rendering per-token metadata (language, speaker, etc.) in the order tokens were spoken. Pair with partialTokens for the complete ordered stream. |
| groups | Readonly<Record<string, TokenGroup>> | Tokens grouped by the active groupBy strategy. Auto-populated when translation config is provided: one_way → keys "original" and "translation"; two_way → keys are language codes (e.g. "en", "es"). Empty {} when no grouping is active. |
| isActive | boolean | true when state is not idle/stopped/canceled/error. |
| isPaused | boolean | true when state === 'paused'. |
| isRecording | boolean | true when state === 'recording'. |
| isSourceMuted | boolean | true when the audio source is muted externally (e.g. OS-level or hardware mute). |
| partialText | string | Text from current non-final tokens. |
| partialTokens | readonly RealtimeToken[] | Non-final tokens from the latest result. |
| result | RealtimeResult \| null | Latest raw result from the server. |
| segments | readonly RealtimeSegment[] | Accumulated final segments. |
| state | RecordingState | Current recording lifecycle state. |
| text | string | Full transcript: finalText + partialText. |
| tokens | readonly RealtimeToken[] | Tokens from the latest result message. |
| utterances | readonly RealtimeUtterance[] | Accumulated utterances (one per endpoint). |
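
For example, a sketch of consuming groups during a two_way translation session. The import path, the translation config shape, and a text field on TokenGroup are assumptions:

```tsx
import { useRecording } from "@soniox/react"; // path assumed

export function BilingualTranscript() {
  const { groups } = useRecording({
    apiKey: "<YOUR_API_KEY>",
    model: "stt-rt-preview", // placeholder model name
    translation: { type: "two_way", language_a: "en", language_b: "es" }, // shape assumed
  });

  // With two_way translation, groups is keyed by language code.
  return (
    <div>
      {Object.entries(groups).map(([language, group]) => (
        <section key={language}>
          <h3>{language}</h3>
          <p>{group.text /* TokenGroup is assumed to expose joined text */}</p>
        </section>
      ))}
    </div>
  );
}
```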

UseAudioLevelOptions

Extended by

  • AudioLevelProps

Properties

| Property | Type | Description |
| --- | --- | --- |
| active? | boolean | Whether volume metering is active. When false, resources are released. |
| bands? | number | Number of frequency bands to return. When set, the bands array is populated with per-band levels (0-1). Useful for spectrum/equalizer visualizations. |
| fftSize? | number | FFT size for the AnalyserNode. Must be a power of 2. Higher values give more frequency resolution (more bins per band) but update less frequently. Default: 256 |
| smoothing? | number | Exponential smoothing factor (0-1). Higher = smoother/slower decay. Default: 0.85 |

UseAudioLevelReturn

Properties

| Property | Type | Description |
| --- | --- | --- |
| bands | readonly number[] | Per-band frequency levels, each 0-1. Empty array when the bands option is not set. |
| volume | number | Current volume level, 0 to 1. Updated every animation frame. |
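
For example, a minimal hook-based meter (import path assumed):

```tsx
import { useAudioLevel } from "@soniox/react"; // path assumed

export function VolumeReadout({ metering }: { metering: boolean }) {
  // Resources are released whenever `active` is false.
  const { volume } = useAudioLevel({ active: metering, smoothing: 0.85 });

  return <progress value={volume} max={1} />;
}
```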

UseMicrophonePermissionOptions

Properties

| Property | Type | Description |
| --- | --- | --- |
| autoCheck? | boolean | Automatically check permission on mount. |

UseRecordingConfig

Configuration for useRecording.

Extends the STT session config (model, language_hints, etc.) with recording-specific and React-specific options.

Can be used with or without a <SonioxProvider>:

  • With Provider: omit apiKey — the client is read from context.
  • Without Provider: pass apiKey directly — a client is created internally.

Extends

  • SttSessionConfig

Properties

| Property | Type | Description |
| --- | --- | --- |
| apiKey? | ApiKeyConfig | API key — string or async function that fetches a temporary key. Required when not using <SonioxProvider>. |
| audio_format? | "auto" \| AudioFormat | Audio format. Use 'auto' for automatic detection of container formats. For raw PCM formats, also set sample_rate and num_channels. Default: 'auto' |
| buffer_queue_size? | number | Maximum audio chunks to buffer during connection setup. |
| client_reference_id? | string | Optional tracking identifier (max 256 chars). |
| context? | TranscriptionContext | Additional context to improve transcription accuracy. |
| enable_endpoint_detection? | boolean | Enable endpoint detection for utterance boundaries. Useful for voice AI agents. |
| enable_language_identification? | boolean | Enable automatic language detection. |
| enable_speaker_diarization? | boolean | Enable speaker identification. |
| groupBy? | "translation" \| "language" \| "speaker" \| (token) => string | Group tokens by a key for easy splitting. 'translation' — group by translation_status (keys "original" and "translation"); 'language' — group by the token language field (keys are language codes); 'speaker' — group by the token speaker field (keys are speaker identifiers); (token) => string — custom grouping function. Auto-defaults when translation config is provided: one_way → 'translation', two_way → 'language'. |
| language_hints? | string[] | Expected languages in the audio (ISO language codes). |
| language_hints_strict? | boolean | When true, recognition is strongly biased toward language hints. Best-effort only, not a hard guarantee. |
| model | string | Speech-to-text model to use. |
| num_channels? | number | Number of audio channels (required for raw audio formats). |
| onConnected? | () => void | Called when the WebSocket connects. |
| onEndpoint? | () => void | Called when an endpoint is detected. |
| onError? | (error) => void | Called when an error occurs. |
| onFinished? | () => void | Called when the recording session finishes. |
| onResult? | (result) => void | Called on each result from the server. |
| onSourceMuted? | () => void | Called when the audio source is muted externally (e.g. OS-level or hardware mute). |
| onSourceUnmuted? | () => void | Called when the audio source is unmuted after an external mute. |
| onStateChange? | (update) => void | Called on each state transition. |
| permissions? | PermissionResolver \| null | Permission resolver override (only used when apiKey is provided). Pass null to explicitly disable. |
| resetOnStart? | boolean | Reset transcript state when start() is called. Default: true |
| sample_rate? | number | Sample rate in Hz (required for PCM formats). |
| session_options? | SttSessionOptions | SDK-level session options (signal, etc.). |
| source? | AudioSource | Custom audio source (bypasses the default MicrophoneSource). |
| translation? | TranslationConfig | Translation configuration. |
| wsBaseUrl? | string | WebSocket URL override (only used when apiKey is provided). |
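
For example, a configuration sketch for the standalone (no-provider) form. The import path, the model name, and the backend endpoint are placeholders:

```tsx
import { useRecording } from "@soniox/react"; // path assumed

export function useMeetingTranscription() {
  return useRecording({
    apiKey: async () => {
      // Fetch a temporary key from your backend; this endpoint is hypothetical.
      const res = await fetch("/api/soniox-temp-key");
      const { apiKey } = await res.json();
      return apiKey;
    },
    model: "stt-rt-preview", // placeholder model name
    language_hints: ["en", "es"],
    enable_speaker_diarization: true,
    groupBy: "speaker", // groups will be keyed by speaker identifier
    onError: (error) => console.error("Recording error:", error),
    onEndpoint: () => console.log("Utterance boundary detected"),
  });
}
```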

UseRecordingReturn

The recording snapshot plus imperative controls, returned by useRecording.

Extends

  • RecordingSnapshot

Properties

| Property | Type | Description |
| --- | --- | --- |
| cancel | () => void | Immediately cancel — does not wait for final results. |
| clearTranscript | () => void | Clear transcript state (finalText, partialText, utterances, segments). |
| error | Error \| null | Latest error, if any. |
| finalize | (options?) => void | Request the server to finalize current non-final tokens. |
| finalText | string | Accumulated finalized text. |
| finalTokens | readonly RealtimeToken[] | All finalized tokens in chronological order. Useful for rendering per-token metadata (language, speaker, etc.) in the order tokens were spoken. Pair with partialTokens for the complete ordered stream. |
| groups | Readonly<Record<string, TokenGroup>> | Tokens grouped by the active groupBy strategy. Auto-populated when translation config is provided: one_way → keys "original" and "translation"; two_way → keys are language codes (e.g. "en", "es"). Empty {} when no grouping is active. |
| isActive | boolean | true when state is not idle/stopped/canceled/error. |
| isPaused | boolean | true when state === 'paused'. |
| isRecording | boolean | true when state === 'recording'. |
| isSourceMuted | boolean | true when the audio source is muted externally (e.g. OS-level or hardware mute). |
| isSupported | boolean | Whether the built-in browser MicrophoneSource is available. Custom AudioSource implementations work regardless of this value. |
| partialText | string | Text from current non-final tokens. |
| partialTokens | readonly RealtimeToken[] | Non-final tokens from the latest result. |
| pause | () => void | Pause recording — pauses audio capture and activates keepalive. |
| result | RealtimeResult \| null | Latest raw result from the server. |
| resume | () => void | Resume recording after pause. |
| segments | readonly RealtimeSegment[] | Accumulated final segments. |
| start | () => void | Start a new recording. Aborts any in-flight recording first. |
| state | RecordingState | Current recording lifecycle state. |
| stop | () => Promise<void> | Gracefully stop — waits for final results from the server. |
| text | string | Full transcript: finalText + partialText. |
| tokens | readonly RealtimeToken[] | Tokens from the latest result message. |
| utterances | readonly RealtimeUtterance[] | Accumulated utterances (one per endpoint). |
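
For example, an end-to-end sketch wiring these controls into a component (import path assumed, model name a placeholder):

```tsx
import { useRecording } from "@soniox/react"; // path assumed

export function Recorder() {
  const rec = useRecording({
    apiKey: "<YOUR_API_KEY>",
    model: "stt-rt-preview", // placeholder model name
  });

  if (!rec.isSupported) {
    return <p>Microphone unavailable: {rec.unsupportedReason}</p>;
  }

  return (
    <div>
      <button onClick={rec.start} disabled={rec.isActive}>
        Start
      </button>
      <button onClick={rec.isPaused ? rec.resume : rec.pause} disabled={!rec.isActive}>
        {rec.isPaused ? "Resume" : "Pause"}
      </button>
      <button onClick={() => void rec.stop()} disabled={!rec.isActive}>
        Stop
      </button>
      {/* Finalized text plus the still-changing partial tail. */}
      <p>
        {rec.finalText}
        <em>{rec.partialText}</em>
      </p>
    </div>
  );
}
```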

AudioLevel()

```ts
function AudioLevel(__namedParameters): ReactNode;
```

Parameters

| Parameter | Type |
| --- | --- |
| __namedParameters | AudioLevelProps |

Returns

ReactNode


SonioxProvider()

```ts
function SonioxProvider(props): ReactNode;
```

Parameters

| Parameter | Type |
| --- | --- |
| props | SonioxProviderProps |

Returns

ReactNode


checkAudioSupport()

```ts
function checkAudioSupport(): AudioSupportResult;
```

Check whether the current environment supports the built-in browser MicrophoneSource (which uses navigator.mediaDevices.getUserMedia).

This does not reflect general recording capability — custom AudioSource implementations (e.g. for React Native) bypass this check entirely and can record regardless of the result.

Returns

AudioSupportResult

Platform

browser


useAudioLevel()

```ts
function useAudioLevel(options?): UseAudioLevelReturn;
```

Parameters

| Parameter | Type |
| --- | --- |
| options? | UseAudioLevelOptions |

Returns

UseAudioLevelReturn


useMicrophonePermission()

```ts
function useMicrophonePermission(options?): MicrophonePermissionState;
```

Parameters

| Parameter | Type |
| --- | --- |
| options? | UseMicrophonePermissionOptions |

Returns

MicrophonePermissionState


useRecording()

```ts
function useRecording(config): UseRecordingReturn;
```

Parameters

| Parameter | Type |
| --- | --- |
| config | UseRecordingConfig |

Returns

UseRecordingReturn


useSoniox()

```ts
function useSoniox(): SonioxClient;
```

Returns the SonioxClient instance provided by the nearest SonioxProvider.

Returns

SonioxClient

Throws

Error if called outside a SonioxProvider
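
For example (import path assumed):

```tsx
import { useSoniox } from "@soniox/react"; // path assumed

function ClientProbe() {
  // Throws if this component renders outside a <SonioxProvider>.
  const client = useSoniox();
  console.debug("Shared SonioxClient:", client);
  return null;
}
```

This is the hook to reach for when a custom hook or component needs the shared client without prop-drilling it from the provider.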