Real-time transcription with React SDK
Create and manage real-time speech-to-text sessions with the Soniox React SDK
The Soniox React SDK supports real-time transcription via React hooks, built on top of the @soniox/client Web SDK. This allows you to transcribe live audio with low latency — ideal for live captions, voice input, and interactive experiences.
You can capture audio from the user's microphone, receive transcription results as reactive state, and control sessions with simple start/stop calls.
Soniox Provider
SonioxProvider creates and shares a single SonioxClient instance via React context. Place it near the root of your component tree.
With configuration props
With a pre-built client
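As a sketch, the two setups might look like this; the import path (`@soniox/react`) and the prop names (`apiKey`, `client`) are assumptions based on common provider conventions, not confirmed by this page:

```tsx
import type { ReactNode } from 'react';
import { SonioxClient } from '@soniox/client';
import { SonioxProvider } from '@soniox/react'; // package name assumed

// Variant 1: let the provider build the client from configuration props.
export function AppWithProps({ children }: { children: ReactNode }) {
  return <SonioxProvider apiKey="<SONIOX_API_KEY>">{children}</SonioxProvider>;
}

// Variant 2: share a pre-built SonioxClient instance.
const client = new SonioxClient({ apiKey: '<SONIOX_API_KEY>' });

export function AppWithClient({ children }: { children: ReactNode }) {
  return <SonioxProvider client={client}>{children}</SonioxProvider>;
}
```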
useRecording
useRecording is the primary hook for real-time speech-to-text. It returns a UseRecordingReturn object containing reactive transcript state and control methods.
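A minimal usage sketch (the import path is assumed; finalText and partialText are the transcript fields named elsewhere on this page):

```tsx
import { useRecording } from '@soniox/react'; // package name assumed

export function LiveCaptions() {
  const recording = useRecording();

  return (
    <div>
      <button onClick={() => recording.start()} disabled={recording.isActive}>
        Start
      </button>
      <button onClick={() => recording.stop()} disabled={!recording.isActive}>
        Stop
      </button>
      {/* Finalized text followed by the current non-final tail */}
      <p>
        {recording.finalText}
        <span style={{ opacity: 0.6 }}>{recording.partialText}</span>
      </p>
    </div>
  );
}
```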
Handle session events
| Callback | Signature | Description |
|---|---|---|
onResult | (result: RealtimeResult) => void | Called on each result from the server. |
onEndpoint | () => void | Called when an endpoint is detected. |
onError | (error: Error) => void | Called when an error occurs. |
onStateChange | (update: { old_state, new_state }) => void | Called on each state transition. |
onFinished | () => void | Called when the recording session finishes. |
onConnected | () => void | Called when the WebSocket connects. |
onSourceMuted | () => void | Called when the audio source is muted externally. |
onSourceUnmuted | () => void | Called when the audio source is unmuted. |
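Wired together, the callbacks from the table above might be passed in the hook configuration like this sketch:

```tsx
const recording = useRecording({
  onConnected: () => console.log('WebSocket connected'),
  onResult: (result) => console.log('new result', result),
  onEndpoint: () => console.log('endpoint detected'),
  onStateChange: ({ old_state, new_state }) =>
    console.log(`state: ${old_state} -> ${new_state}`),
  onSourceMuted: () => console.log('audio source muted externally'),
  onSourceUnmuted: () => console.log('audio source unmuted'),
  onError: (error) => console.error('session error', error),
  onFinished: () => console.log('recording session finished'),
});
```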
Session lifecycle
Recording state
| Field | Type | Description |
|---|---|---|
state | RecordingState | Current lifecycle state ('idle', 'recording', 'paused', etc.). |
isActive | boolean | true when state is not idle/stopped/canceled/error. |
isRecording | boolean | true when state === 'recording'. |
isPaused | boolean | true when state === 'paused'. |
isSourceMuted | boolean | true when the audio source is muted externally. |
Available methods
| Method | Signature | Description |
|---|---|---|
start | () => void | Start a new recording. Aborts any in-flight recording first. |
stop | () => Promise<void> | Gracefully stop — waits for final results from the server. |
cancel | () => void | Immediately cancel — does not wait for final results. |
pause | () => void | Pause audio capture (keepalive keeps connection open). |
resume | () => void | Resume after pause. |
finalize | (options?) => void | Request the server to finalize current non-final tokens. |
clearTranscript | () => void | Clear transcript state (finalText, partialText, utterances, etc.). |
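The difference between a graceful stop and a cancel can be sketched like this (save is a hypothetical app-level function):

```tsx
const recording = useRecording();

// Graceful: stop() resolves once the server has sent its final results,
// so the transcript is complete when the promise settles.
const finish = async () => {
  await recording.stop();
  save(recording.finalText); // hypothetical
};

// Immediate: cancel() tears the session down without waiting,
// discarding any still-pending non-final results.
const abort = () => {
  recording.cancel();
  recording.clearTranscript();
};
```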
Endpoint detection and manual finalization
Endpoint detection lets you know when a speaker has finished speaking. This is critical for real-time voice AI assistants, command-and-response systems, and conversational apps where you want to respond immediately without waiting for long silences.
Read more about Endpoint detection
Enable endpoint detection by setting enable_endpoint_detection: true in the hook configuration.
Use the onEndpoint callback to know when a speaker has finished speaking.
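A sketch of the two pieces together (respondToUser is a hypothetical handler):

```tsx
const recording = useRecording({
  enable_endpoint_detection: true,
  onEndpoint: () => {
    // The speaker just finished an utterance; respond immediately
    // instead of waiting for a long silence.
    respondToUser(); // hypothetical
  },
});
```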
Manual finalization gives you precise control over when audio should be finalized — useful for push-to-talk systems and client-side voice activity detection (VAD).
Read more about Manual finalization
The finalize function is returned by useRecording and can be called at any time during an active recording:
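A push-to-talk sketch: finalize when the user releases the talk key, so the server converts the current non-final tokens to final ones right away:

```tsx
const recording = useRecording();

const handleTalkKeyUp = () => {
  if (recording.isRecording) {
    recording.finalize();
  }
};
```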
Pause, resume and muting audio source
The pause and resume functions are returned by useRecording. The isPaused flag reflects the current pause state reactively.
The hook also tracks system-level mute events via isSourceMuted.
When the audio source is muted externally (e.g. OS-level or hardware mute), keepalive messages are sent automatically to keep the session alive.
You can listen for mute state changes with the onSourceMuted and onSourceUnmuted callbacks.
You are billed for the full stream duration even when the session is paused.
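A pause toggle might look like this sketch; note that the connection stays open while paused, so billing continues:

```tsx
function PauseControls() {
  const recording = useRecording({
    onSourceMuted: () => console.log('system-level mute detected'),
    onSourceUnmuted: () => console.log('microphone available again'),
  });

  return (
    <>
      <button
        onClick={() => (recording.isPaused ? recording.resume() : recording.pause())}
      >
        {recording.isPaused ? 'Resume' : 'Pause'}
      </button>
      {recording.isSourceMuted && <p>Your microphone is muted by the system.</p>}
    </>
  );
}
```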
Handling translation
The React SDK supports one-way and two-way real-time translation. Configure translation in the useRecording hook config. The hook automatically groups tokens by translation status or language via the groups snapshot field, so you can render original and translated text separately without manual filtering.
One-way translation
Translates all spoken audio into a single target language.
When translation is provided with type: one_way, the hook automatically sets groupBy: 'translation', splitting tokens into original and translation groups.
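As a sketch, reading both groups from the snapshot might look like this; the config keys inside `translation` (such as `target_language`) are assumptions for illustration, while the group keys `"original"` and `"translation"` are the ones this page documents:

```tsx
const recording = useRecording({
  // Keys inside `translation` are assumptions for illustration.
  translation: { type: 'one_way', target_language: 'es' },
});

// groupBy: 'translation' is applied automatically for one_way translation.
const originalText = recording.groups['original']?.text ?? '';
const translatedText = recording.groups['translation']?.text ?? '';
```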
Two-way translation
Translates between two languages — each speaker's speech is translated into the other language.
When translation is provided with type: two_way, the hook automatically sets groupBy: 'language', splitting tokens by language code (e.g. en, fr).
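A matching sketch for two-way translation; again, the keys inside `translation` (such as `language_a`, `language_b`) are assumptions for illustration:

```tsx
const recording = useRecording({
  // Keys inside `translation` are assumptions for illustration.
  translation: { type: 'two_way', language_a: 'en', language_b: 'fr' },
});

// groupBy: 'language' is applied automatically; keys are language codes.
const englishText = recording.groups['en']?.text ?? '';
const frenchText = recording.groups['fr']?.text ?? '';
```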
Learn more about Real-time translation
Utterances
When enable_endpoint_detection is enabled, the utterances array accumulates utterances separated by natural pauses:
Learn more about Endpoint detection
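Rendering the accumulated utterances might look like this sketch; the shape of each utterance item (a `text` field) is an assumption, as this page does not document it:

```tsx
function UtteranceList() {
  const recording = useRecording({ enable_endpoint_detection: true });

  return (
    <ul>
      {recording.utterances.map((utterance, i) => (
        // The utterance item shape (`text`) is an assumption here.
        <li key={i}>{utterance.text}</li>
      ))}
    </ul>
  );
}
```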
Token grouping
The groupBy option splits tokens into named groups, accessible via
recording.groups. This is particularly useful for translation and
multi-speaker scenarios.
groupBy strategies
| Value | Keys | Description |
|---|---|---|
'translation' | "original", "translation" | Group by translation_status. |
'language' | Language codes (e.g. "en", "fr") | Group by token language field. |
'speaker' | Speaker IDs (e.g. "1") | Group by token speaker field. |
(token) => string | Custom keys | Custom grouping function. |
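A custom groupBy function receives a token and returns its group key. The sketch below keys tokens by speaker with a language fallback; the minimal Token type is an illustration only, since the full RealtimeToken shape is not shown on this page:

```typescript
// Minimal token shape for illustration only.
type Token = { text: string; speaker?: string; language?: string };

// Custom grouping: prefer the speaker ID, fall back to the language code.
export const bySpeakerThenLanguage = (token: Token): string =>
  token.speaker ?? token.language ?? 'unknown';

// Usage (inside a component):
//   const recording = useRecording({ groupBy: bySpeakerThenLanguage });
//   // recording.groups then holds one TokenGroup per returned key.
```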
Learn more about Speaker diarization
TokenGroup fields
Each group in recording.groups contains:
| Field | Type | Description |
|---|---|---|
text | string | Full text: finalText + partialText. |
finalText | string | Accumulated finalized text in this group. |
partialText | string | Text from current non-final tokens. |
partialTokens | RealtimeToken[] | Current non-final tokens (from the latest result only). |
Automatic grouping for translation
When a translation config is provided, groupBy is set automatically:
- one_way translation → groups by 'translation' (keys: "original", "translation")
- two_way translation → groups by 'language' (keys: language codes like "en", "es")
useSoniox
Returns the SonioxClient instance from the nearest SonioxProvider. Useful for low-level session access.
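A minimal sketch (import path assumed):

```tsx
import { useSoniox } from '@soniox/react'; // package name assumed

function SessionDebug() {
  // The same SonioxClient instance the hooks use internally.
  const client = useSoniox();
  console.log('client', client);
  return null;
}
```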
useMicrophonePermission
Hook for checking and requesting microphone permission before recording.
Requires a SonioxProvider with a permission resolver configured (default in browsers).
Options
| Option | Type | Default | Description |
|---|---|---|---|
autoCheck | boolean | false | Automatically check permission on mount. |
Return value
| Field | Type | Description |
|---|---|---|
status | MicPermissionStatus | Current status: 'granted', 'denied', 'prompt', 'unavailable', 'unsupported', or 'unknown'. |
canRequest | boolean | Whether the user can be prompted again. false when permanently denied. |
isGranted | boolean | status === 'granted'. |
isDenied | boolean | status === 'denied'. |
isSupported | boolean | Whether permission checking is available. |
check | () => Promise<void> | Check (or re-check) the microphone permission. No-op when unsupported. |
Status values
| Status | Description |
|---|---|
'granted' | Microphone access is granted. |
'denied' | Microphone access is denied. |
'prompt' | User hasn't been asked yet. |
'unavailable' | Permissions API not available in this browser. |
'unsupported' | No PermissionResolver configured in the provider. |
'unknown' | Initial state before the first check() call. |
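A gating sketch built from the fields above, checking permission on mount and re-checking on demand:

```tsx
function MicGate({ children }: { children: React.ReactNode }) {
  const permission = useMicrophonePermission({ autoCheck: true });

  if (permission.isGranted) return <>{children}</>;
  if (permission.isDenied && !permission.canRequest) {
    return <p>Microphone access is blocked in your browser settings.</p>;
  }
  return <button onClick={() => permission.check()}>Check microphone access</button>;
}
```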
useAudioLevel
Hook for real-time audio volume metering. Useful for building recording indicators and animations.
Next.js (App Router)
The package declares 'use client' at the entry point. All hooks must be used
inside Client Components. Server Components cannot use useRecording or other
hooks directly.
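A minimal Client Component sketch (wrap it in a SonioxProvider higher in the tree; the import path is assumed):

```tsx
'use client';

import { useRecording } from '@soniox/react'; // package name assumed

export function Recorder() {
  const recording = useRecording();

  return (
    <button onClick={() => (recording.isActive ? recording.stop() : recording.start())}>
      {recording.isActive ? 'Stop' : 'Start'}
    </button>
  );
}
```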