Soniox React SDK — Types Reference

SonioxProviderProps

type SonioxProviderProps = {
  children: ReactNode;
} & (SonioxProviderConfigProps | SonioxProviderClientProps);

Props for SonioxProvider.

Supply either a pre-built client instance or configuration props.

Type Declaration

| Name | Type |
| --- | --- |
| children | ReactNode |
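As a sketch of the two variants (the package import path, the SonioxConnectionConfig field names, and the SonioxClient constructor shape below are assumptions for illustration, not confirmed by this reference):

```typescript
import type { ReactNode } from "react";
// Hypothetical import path; use your installed Soniox React SDK package.
import { SonioxProvider, SonioxClient } from "@soniox/react";

// Variant 1: configuration props; the provider creates the client internally.
// (`api_key` is an assumed field name on SonioxConnectionConfig.)
export function AppWithConfig({ children }: { children: ReactNode }) {
  return (
    <SonioxProvider config={{ api_key: "<temporary-key>" }}>
      {children}
    </SonioxProvider>
  );
}

// Variant 2: a pre-built client instance, shared across the tree.
const client = new SonioxClient({ api_key: "<temporary-key>" });
export function AppWithClient({ children }: { children: ReactNode }) {
  return <SonioxProvider client={client}>{children}</SonioxProvider>;
}
```

Hooks such as useRecording and useTts can then omit config and read the client from context.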

TtsState

type TtsState = "idle" | "connecting" | "speaking" | "stopping" | "error";

Aggregate state for the TTS hook.


UnsupportedReason

type UnsupportedReason = "ssr" | "no-mediadevices" | "no-getusermedia" | "insecure-context";

Reason why the built-in browser MicrophoneSource is unavailable:

  • 'ssr' — navigator is undefined (SSR, React Native, or other non-browser JS runtimes).
  • 'no-mediadevices' — navigator exists but navigator.mediaDevices is missing.
  • 'no-getusermedia' — navigator.mediaDevices exists but getUserMedia is not a function.
  • 'insecure-context' — the page is not served over HTTPS.

This only reflects whether the default MicrophoneSource can work. Custom AudioSource implementations (e.g. for React Native) bypass this check entirely and can record regardless of this value.
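The ladder of checks above can be sketched as a plain function. This is an illustration of the documented order of reasons, not the SDK's actual implementation:

```typescript
type UnsupportedReason = "ssr" | "no-mediadevices" | "no-getusermedia" | "insecure-context";
type AudioSupportResult = { isSupported: boolean; reason?: UnsupportedReason };

// Walks the documented checks in order and returns the first failure.
function sketchCheckAudioSupport(): AudioSupportResult {
  // Read via globalThis so this sketch also runs outside a browser.
  const nav = (globalThis as any).navigator;
  if (typeof nav === "undefined") return { isSupported: false, reason: "ssr" };
  if (!nav.mediaDevices) return { isSupported: false, reason: "no-mediadevices" };
  if (typeof nav.mediaDevices.getUserMedia !== "function")
    return { isSupported: false, reason: "no-getusermedia" };
  if (!(globalThis as any).isSecureContext)
    return { isSupported: false, reason: "insecure-context" };
  return { isSupported: true };
}
```

In a Node.js process, for example, this reports unsupported with reason 'ssr' (or 'no-mediadevices' on Node versions that expose a navigator global).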


AudioLevelProps

Extends

  • UseAudioLevelOptions

Properties

| Property | Type | Description |
| --- | --- | --- |
| active? | boolean | Whether volume metering is active. When false, resources are released. |
| bands? | number | Number of frequency bands to return. When set, the bands array is populated with per-band levels (0-1). Useful for spectrum/equalizer visualizations. |
| children | (state) => ReactNode | - |
| fftSize? | number | FFT size for the AnalyserNode. Must be a power of 2. Higher values give more frequency resolution (more bins per band) but update less frequently. Default: 256 |
| smoothing? | number | Exponential smoothing factor (0-1). Higher = smoother/slower decay. Default: 0.85 |
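A render-prop sketch (the package import path is assumed, and the render-prop state is assumed to carry the volume and bands fields described under UseAudioLevelReturn):

```typescript
import { AudioLevel } from "@soniox/react"; // hypothetical import path

export function SpectrumMeter() {
  return (
    <AudioLevel active bands={8} fftSize={256} smoothing={0.85}>
      {({ volume, bands }: { volume: number; bands: readonly number[] }) => (
        <div>
          {/* Overall level as a bar, per-band levels as a mini spectrum. */}
          <div style={{ width: `${volume * 100}%`, height: 4, background: "green" }} />
          {bands.map((level, i) => (
            <span
              key={i}
              style={{ display: "inline-block", width: 8, height: 4 + level * 36, background: "teal" }}
            />
          ))}
        </div>
      )}
    </AudioLevel>
  );
}
```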

AudioSupportResult

Properties

| Property | Type |
| --- | --- |
| isSupported | boolean |
| reason? | UnsupportedReason |

MicrophonePermissionState

Properties

| Property | Type | Description |
| --- | --- | --- |
| canRequest | boolean | Whether the permission can be requested (e.g., via a prompt). |
| check | () => Promise<void> | Check (or re-check) the microphone permission. No-op when unsupported. |
| isDenied | boolean | status === 'denied'. |
| isGranted | boolean | status === 'granted'. |
| isSupported | boolean | Whether permission checking is available. |
| status | MicPermissionStatus | Current permission status. |
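A gate-component sketch using these fields (the package import path is assumed):

```typescript
import { useMicrophonePermission } from "@soniox/react"; // hypothetical import path

export function MicGate() {
  // autoCheck runs an initial permission check on mount.
  const perm = useMicrophonePermission({ autoCheck: true });

  if (!perm.isSupported) return <p>Permission checking is unavailable here.</p>;
  if (perm.isGranted) return <p>Microphone ready.</p>;
  if (perm.isDenied) return <p>Microphone blocked; enable it in browser settings.</p>;
  // check() may trigger a prompt when canRequest is true.
  return <button onClick={() => void perm.check()}>Check microphone access</button>;
}
```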

RecordingSnapshot

Immutable snapshot of the recording state exposed to React.

Extended by

  • UseRecordingReturn

Properties

| Property | Type | Description |
| --- | --- | --- |
| error | Error \| null | Latest error, if any. |
| finalText | string | Accumulated finalized text. |
| finalTokens | readonly RealtimeToken[] | All finalized tokens in chronological order. Useful for rendering per-token metadata (language, speaker, etc.) in the order tokens were spoken. Pair with partialTokens for the complete ordered stream. |
| groups | Readonly<Record<string, TokenGroup>> | Tokens grouped by the active groupBy strategy. Auto-populated when translation config is provided: one_way → keys "original" and "translation"; two_way → keys are language codes (e.g. "en", "es"). Empty {} when no grouping is active. |
| isActive | boolean | true when state is not idle/stopped/canceled/error. |
| isPaused | boolean | true when state === 'paused'. |
| isReconnecting | boolean | true when state === 'reconnecting'. |
| isRecording | boolean | true when state === 'recording'. |
| isSourceMuted | boolean | true when the audio source is muted externally (e.g. OS-level or hardware mute). |
| partialText | string | Text from current non-final tokens. |
| partialTokens | readonly RealtimeToken[] | Non-final tokens from the latest result. |
| reconnectAttempt | number | Current reconnection attempt number (0 when not reconnecting). |
| result | RealtimeResult \| null | Latest raw result from the server. |
| segments | readonly RealtimeSegment[] | Accumulated final segments. |
| state | RecordingState | Current recording lifecycle state. |
| text | string | Full transcript: finalText + partialText. |
| tokens | readonly RealtimeToken[] | Tokens from the latest result message. |
| utterances | readonly RealtimeUtterance[] | Accumulated utterances (one per endpoint). |
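To illustrate the groups field, here is a minimal sketch of grouping tokens by a key function, in the spirit of the 'speaker' strategy described under UseRecordingConfig.groupBy. The Token and TokenGroup shapes below are simplified stand-ins, not the SDK's actual types:

```typescript
type Token = { text: string; speaker?: string; language?: string };
type TokenGroup = { tokens: Token[]; text: string };

// Collects tokens under the key returned by groupBy, preserving order
// within each group; mirrors the documented 'speaker' strategy when
// groupBy reads token.speaker.
function groupTokens(
  tokens: Token[],
  groupBy: (t: Token) => string,
): Record<string, TokenGroup> {
  const groups: Record<string, TokenGroup> = {};
  for (const token of tokens) {
    const key = groupBy(token);
    const group = (groups[key] ??= { tokens: [], text: "" });
    group.tokens.push(token);
    group.text += token.text;
  }
  return groups;
}

const tokens: Token[] = [
  { text: "Hello ", speaker: "1" },
  { text: "Hi ", speaker: "2" },
  { text: "there", speaker: "1" },
];
const bySpeaker = groupTokens(tokens, (t) => t.speaker ?? "unknown");
// bySpeaker["1"].text === "Hello there"; bySpeaker["2"].text === "Hi "
```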

TtsSnapshot

Immutable snapshot of the TTS state exposed to React.

Extended by

  • UseTtsReturn

Properties

| Property | Type |
| --- | --- |
| error | Error \| null |
| isConnecting | boolean |
| isSpeaking | boolean |
| state | TtsState |

UseAudioLevelOptions

Extended by

  • AudioLevelProps

Properties

| Property | Type | Description |
| --- | --- | --- |
| active? | boolean | Whether volume metering is active. When false, resources are released. |
| bands? | number | Number of frequency bands to return. When set, the bands array is populated with per-band levels (0-1). Useful for spectrum/equalizer visualizations. |
| fftSize? | number | FFT size for the AnalyserNode. Must be a power of 2. Higher values give more frequency resolution (more bins per band) but update less frequently. Default: 256 |
| smoothing? | number | Exponential smoothing factor (0-1). Higher = smoother/slower decay. Default: 0.85 |

UseAudioLevelReturn

Properties

| Property | Type | Description |
| --- | --- | --- |
| bands | readonly number[] | Per-band frequency levels, each 0-1. Empty array when the bands option is not set. |
| volume | number | Current volume level, 0 to 1. Updated every animation frame. |

UseMicrophonePermissionOptions

Properties

| Property | Type | Description |
| --- | --- | --- |
| autoCheck? | boolean | Automatically check permission on mount. |

UseRecordingConfig

Configuration for useRecording.

Extends the STT session config (model, language_hints, etc.) with recording-specific and React-specific options.

Can be used with or without a <SonioxProvider>:

  • With Provider: omit config/apiKey — the client is read from context.
  • Without Provider: pass config (or legacy apiKey) — a client is created internally.

Extends

  • SttSessionConfig

Properties

| Property | Type | Description |
| --- | --- | --- |
| apiKey? | ApiKeyConfig | API key: a string, or an async function that fetches a temporary key. Required when not using <SonioxProvider>. Deprecated: use config instead. |
| audio_format? | "auto" \| AudioFormat | Audio format. Use 'auto' for automatic detection of container formats. For raw PCM formats, also set sample_rate and num_channels. Default: 'auto' |
| auto_reconnect? | boolean | Enable automatic reconnection on retriable errors. Default: false |
| buffer_queue_size? | number | Maximum audio chunks to buffer during connection setup. |
| client_reference_id? | string | Optional tracking identifier (max 256 chars). |
| config? | SonioxConnectionConfig \| (context?) => Promise<SonioxConnectionConfig> | Connection configuration as a sync object or async function. Required when not using <SonioxProvider>. |
| context? | TranscriptionContext | Additional context to improve transcription accuracy. |
| enable_endpoint_detection? | boolean | Enable endpoint detection for utterance boundaries. Useful for voice AI agents. |
| enable_language_identification? | boolean | Enable automatic language detection. |
| enable_speaker_diarization? | boolean | Enable speaker identification. |
| groupBy? | "translation" \| "language" \| "speaker" \| (token) => string | Group tokens by a key for easy splitting. 'translation' groups by translation_status (keys "original" and "translation"); 'language' groups by the token language field (keys are language codes); 'speaker' groups by the token speaker field (keys are speaker identifiers); a (token) => string function defines custom grouping. Auto-defaults when translation config is provided: one_way → 'translation', two_way → 'language'. |
| language_hints? | string[] | Expected languages in the audio (ISO language codes). |
| language_hints_strict? | boolean | When true, recognition is strongly biased toward language hints. Best-effort only, not a hard guarantee. |
| max_endpoint_delay_ms? | number | Maximum delay between the end of speech and the returned endpoint. Allowed values: 500-3000 ms. Default: 2000 |
| max_reconnect_attempts? | number | Maximum consecutive reconnection attempts before giving up. Default: 3 |
| model | string | Speech-to-text model to use. |
| num_channels? | number | Number of audio channels (required for raw audio formats). |
| onConnected? | () => void | Called when the WebSocket connects. |
| onEndpoint? | () => void | Called when an endpoint is detected. |
| onError? | (error) => void | Called when an error occurs. |
| onFinalized? | () => void | Called when the server acknowledges a finalize request (see UseRecordingReturn.finalize). |
| onFinished? | () => void | Called when the recording session finishes. |
| onReconnected? | (event) => void | Called after a successful reconnection. |
| onReconnecting? | (event) => void | Called before a reconnection attempt. Call preventDefault() to cancel. |
| onResult? | (result) => void | Called on each result from the server. |
| onSourceMuted? | () => void | Called when the audio source is muted externally (e.g. OS-level or hardware mute). |
| onSourceUnmuted? | () => void | Called when the audio source is unmuted after an external mute. |
| onStateChange? | (update) => void | Called on each state transition. |
| onToken? | (token) => void | Called for each token received from the server (both final and non-final). |
| permissions? | PermissionResolver \| null | Permission resolver override (only used when creating an inline client). Pass null to explicitly disable. |
| reconnect_base_delay_ms? | number | Base delay in milliseconds for exponential backoff. Default: 1000 |
| reset_transcript_on_reconnect? | boolean | Clear accumulated transcript state on reconnect. Window-tracking state is always reset regardless. Default: false |
| resetOnStart? | boolean | Reset transcript state when start() is called. Default: true |
| sample_rate? | number | Sample rate in Hz (required for PCM formats). |
| session_options? | SttSessionOptions | SDK-level session options (signal, etc.). |
| sessionConfig? | (resolved) => SttSessionConfig | Function that receives the resolved connection config (including stt_defaults from the server) and returns session config overrides. When provided, its return value is used as the session config for the recording, and any flat session config fields on this object are ignored. Example: `sessionConfig: (resolved) => ({ ...resolved.stt_defaults, enable_endpoint_detection: true })` |
| source? | AudioSource | Custom audio source (bypasses the default MicrophoneSource). |
| translation? | TranslationConfig | Translation configuration. |
| wsBaseUrl? | string | WebSocket URL override (only used when apiKey is provided). Deprecated: use config.stt_ws_url or config.region instead. |
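A minimal provider-less sketch (the package import path, the SonioxConnectionConfig field names, the model name, and fetchTemporaryKey below are assumptions for illustration):

```typescript
import { useRecording } from "@soniox/react"; // hypothetical import path

// Hypothetical helper that fetches a temporary API key from your backend.
declare function fetchTemporaryKey(): Promise<string>;

export function Transcriber() {
  const recording = useRecording({
    // Without <SonioxProvider>, config is required; `api_key` is an assumed field name.
    config: async () => ({ api_key: await fetchTemporaryKey() }),
    model: "stt-rt-preview", // assumed model name
    language_hints: ["en", "es"],
    enable_speaker_diarization: true,
    auto_reconnect: true, // also required for reconnect()
    onError: (error) => console.error("recording error:", error),
  });

  return (
    <div>
      <button onClick={() => recording.start()} disabled={recording.isActive}>
        Start
      </button>
      <button onClick={() => void recording.stop()} disabled={!recording.isActive}>
        Stop
      </button>
      {/* Finalized text in normal weight, in-flight partial text emphasized. */}
      <p>
        {recording.finalText}
        <em>{recording.partialText}</em>
      </p>
    </div>
  );
}
```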

UseRecordingReturn

Return value of useRecording: the RecordingSnapshot state plus control methods.

Extends

  • RecordingSnapshot

Properties

| Property | Type | Description |
| --- | --- | --- |
| cancel | () => void | Immediately cancel — does not wait for final results. |
| clearTranscript | () => void | Clear transcript state (finalText, partialText, utterances, segments). |
| error | Error \| null | Latest error, if any. |
| finalize | (options?) => void | Request the server to finalize current non-final tokens. |
| finalText | string | Accumulated finalized text. |
| finalTokens | readonly RealtimeToken[] | All finalized tokens in chronological order. Useful for rendering per-token metadata (language, speaker, etc.) in the order tokens were spoken. Pair with partialTokens for the complete ordered stream. |
| groups | Readonly<Record<string, TokenGroup>> | Tokens grouped by the active groupBy strategy. Auto-populated when translation config is provided: one_way → keys "original" and "translation"; two_way → keys are language codes (e.g. "en", "es"). Empty {} when no grouping is active. |
| isActive | boolean | true when state is not idle/stopped/canceled/error. |
| isPaused | boolean | true when state === 'paused'. |
| isReconnecting | boolean | true when the WebSocket is reconnecting after a drop. |
| isRecording | boolean | true when state === 'recording'. |
| isSourceMuted | boolean | true when the audio source is muted externally (e.g. OS-level or hardware mute). |
| isSupported | boolean | Whether the built-in browser MicrophoneSource is available. Custom AudioSource implementations work regardless of this value. |
| partialText | string | Text from current non-final tokens. |
| partialTokens | readonly RealtimeToken[] | Non-final tokens from the latest result. |
| pause | () => void | Pause recording — pauses audio capture and activates keepalive. |
| reconnect | () => void | Force a reconnection — tears down the current session and audio encoder, then establishes a new session. Requires auto_reconnect. Call this from platform lifecycle handlers (e.g. web visibilitychange, React Native AppState) to recover from stale connections after sleep/wake or backgrounding. |
| reconnectAttempt | number | Current reconnection attempt number (0 when not reconnecting). |
| result | RealtimeResult \| null | Latest raw result from the server. |
| resume | () => void | Resume recording after pause. |
| segments | readonly RealtimeSegment[] | Accumulated final segments. |
| start | () => void | Start a new recording. Aborts any in-flight recording first. |
| state | RecordingState | Current recording lifecycle state. |
| stop | () => Promise<void> | Gracefully stop — waits for final results from the server. |
| text | string | Full transcript: finalText + partialText. |
| tokens | readonly RealtimeToken[] | Tokens from the latest result message. |
| unsupportedReason | UnsupportedReason \| undefined | Why the built-in MicrophoneSource is unavailable, if applicable. Custom AudioSource implementations bypass this check entirely. |
| utterances | readonly RealtimeUtterance[] | Accumulated utterances (one per endpoint). |
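The reconnect() description suggests wiring it to platform lifecycle events. A sketch for the web (the hook name useWakeRecovery and the Reconnectable shape are ours, not the SDK's):

```typescript
import { useEffect } from "react";

// Minimal shape of the pieces of UseRecordingReturn this hook needs.
type Reconnectable = { reconnect: () => void; isActive: boolean };

// Forces a fresh session when the tab becomes visible again, recovering
// from stale connections after sleep/wake (requires auto_reconnect).
export function useWakeRecovery(recording: Reconnectable) {
  useEffect(() => {
    const onVisibilityChange = () => {
      if (document.visibilityState === "visible" && recording.isActive) {
        recording.reconnect();
      }
    };
    document.addEventListener("visibilitychange", onVisibilityChange);
    return () => document.removeEventListener("visibilitychange", onVisibilityChange);
  }, [recording]);
}
```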

UseTtsConfig

Configuration for useTts.

Extends TtsStreamInput — flat TTS fields (model, voice, language, audio_format) are merged on top of server-provided tts_defaults.

Can be used with or without a <SonioxProvider>:

  • With Provider: omit config — the client is read from context.
  • Without Provider: pass config — a client is created internally.

In 'rest' mode, voice is required — the REST TTS endpoint (GenerateSpeechOptions.voice) has no default. Discover available voices via client.tts.listModels().

Extends

  • TtsStreamInput

Properties

| Property | Type | Description |
| --- | --- | --- |
| audio_format? | TtsAudioFormat | Output audio format. Example: 'wav' |
| bitrate? | number | Codec bitrate in bps (for compressed formats). |
| config? | SonioxConnectionConfig \| (context?) => Promise<SonioxConnectionConfig> | Connection configuration as a sync object or async function. Required when not using <SonioxProvider>. |
| language? | string | Language code for speech generation. Example: 'en' |
| mode? | "websocket" \| "rest" | Transport mode for TTS generation. 'websocket' (default): real-time streaming via WebSocket; supports incremental text input (sendText/finish) and streaming from an LLM. 'rest': HTTP request/response via the TTS REST endpoint; sends the full text at once and streams audio back. Simpler, but no incremental text input. Default: 'websocket' |
| model? | string | Text-to-speech model to use. Example: 'tts-rt-v1-preview' |
| onAudio? | (chunk) => void | Called when an audio chunk is received. |
| onAudioEnd? | () => void | Called when the server marks the final audio payload. |
| onError? | (error) => void | Called on error. |
| onStateChange? | (event) => void | Called on each state transition. |
| onTerminated? | () => void | Called when generation is complete. |
| sample_rate? | number | Output sample rate in Hz. Required for raw PCM formats. |
| stream_id? | string | Client-generated stream identifier. Must be unique among active streams on the same connection. Auto-generated if omitted. |
| voice? | string | Voice identifier. Example: 'Adrian' |
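A sketch of 'rest' mode, where voice must be set explicitly (the package import path and the api_key field name are assumptions; the model and voice names are the examples given above):

```typescript
import { useTts } from "@soniox/react"; // hypothetical import path

export function Speaker() {
  const tts = useTts({
    config: { api_key: "<temporary-key>" }, // assumed field name
    mode: "rest",               // full text in, streamed audio back
    model: "tts-rt-v1-preview",
    voice: "Adrian",            // required in 'rest' mode: no server default
    language: "en",
    audio_format: "wav",
    onError: (error) => console.error(error),
  });

  return (
    <button onClick={() => tts.speak("Hello from Soniox!")} disabled={tts.isSpeaking}>
      Speak
    </button>
  );
}
```

Available voices can be discovered via client.tts.listModels(), as noted above.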

UseTtsReturn

Return value of useTts: the TtsSnapshot state plus control methods.

Extends

  • TtsSnapshot

Properties

| Property | Type | Description |
| --- | --- | --- |
| cancel | () => void | Cancel the current generation immediately. |
| error | Error \| null | - |
| finish | () => void | Signal that no more text will be sent. WebSocket mode only. |
| isConnecting | boolean | - |
| isSpeaking | boolean | - |
| sendText | (text) => void | Send one text chunk without finishing. WebSocket mode only. |
| speak | (text) => void | Start TTS. Sends text (or pipes an async iterable in WebSocket mode) and generates audio. |
| state | TtsState | - |
| stop | () => Promise<void> | Gracefully stop — sends finish and waits for completion. |
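In WebSocket mode, sendText/finish allow incremental input. A sketch of feeding streamed text (e.g. from an LLM) chunk by chunk; the IncrementalTts shape below is a simplified stand-in for the relevant slice of UseTtsReturn:

```typescript
// Minimal shape of the pieces of UseTtsReturn this helper needs.
type IncrementalTts = { sendText: (text: string) => void; finish: () => void };

// Pipes streamed text chunks into TTS, then signals completion.
export async function speakStream(tts: IncrementalTts, chunks: AsyncIterable<string>) {
  for await (const chunk of chunks) {
    tts.sendText(chunk); // send one chunk without finishing
  }
  tts.finish(); // tell the server no more text is coming
}
```

Note that speak() can also accept an async iterable directly in WebSocket mode, per the table above; this helper just makes the call sequence explicit.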

AudioLevel()

function AudioLevel(__namedParameters): ReactNode;

Parameters

| Parameter | Type |
| --- | --- |
| __namedParameters | AudioLevelProps |

Returns

ReactNode


SonioxProvider()

function SonioxProvider(props): ReactNode;

Parameters

| Parameter | Type |
| --- | --- |
| props | SonioxProviderProps |

Returns

ReactNode


checkAudioSupport()

function checkAudioSupport(): AudioSupportResult;

Check whether the current environment supports the built-in browser MicrophoneSource (which uses navigator.mediaDevices.getUserMedia).

This does not reflect general recording capability — custom AudioSource implementations (e.g. for React Native) bypass this check entirely and can record regardless of the result.

Returns

AudioSupportResult

Platform

browser


useAudioLevel()

function useAudioLevel(options?): UseAudioLevelReturn;

Parameters

| Parameter | Type |
| --- | --- |
| options? | UseAudioLevelOptions |

Returns

UseAudioLevelReturn
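A volume-meter sketch using the hook directly (the package import path is assumed):

```typescript
import { useAudioLevel } from "@soniox/react"; // hypothetical import path

export function VolumeBar({ active }: { active: boolean }) {
  // Meters only while active; resources are released when inactive.
  const { volume } = useAudioLevel({ active, smoothing: 0.85 });
  return (
    <div style={{ width: 200, height: 6, background: "#eee" }}>
      <div
        style={{ width: `${Math.round(volume * 100)}%`, height: "100%", background: "limegreen" }}
      />
    </div>
  );
}
```

Pair the active prop with isRecording from useRecording to meter only during capture.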


useMicrophonePermission()

function useMicrophonePermission(options?): MicrophonePermissionState;

Parameters

| Parameter | Type |
| --- | --- |
| options? | UseMicrophonePermissionOptions |

Returns

MicrophonePermissionState


useRecording()

function useRecording(config): UseRecordingReturn;

Parameters

| Parameter | Type |
| --- | --- |
| config | UseRecordingConfig |

Returns

UseRecordingReturn


useSoniox()

function useSoniox(): SonioxClient;

Returns the SonioxClient instance provided by the nearest SonioxProvider.

Returns

SonioxClient

Throws

Error if called outside a SonioxProvider.
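A sketch of reaching the shared client from any descendant of SonioxProvider (the package import path is assumed; client.tts.listModels() is the discovery call mentioned in the UseTtsConfig notes):

```typescript
import { useSoniox } from "@soniox/react"; // hypothetical import path

export function ModelList() {
  // Throws if this component is rendered outside a <SonioxProvider>.
  const client = useSoniox();
  const load = () => client.tts.listModels().then((models) => console.log(models));
  return <button onClick={load}>List TTS models</button>;
}
```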


useTts()

function useTts(config): UseTtsReturn;

Parameters

| Parameter | Type |
| --- | --- |
| config | UseTtsConfig |

Returns

UseTtsReturn