
Real-time transcription with React SDK

Create and manage real-time speech-to-text sessions with the Soniox React SDK

Soniox React SDK supports real-time transcription via React hooks, built on top of the @soniox/client Web SDK. This allows you to transcribe live audio with low latency — ideal for live captions, voice input, and interactive experiences.

You can capture audio from the user's microphone, receive transcription results as reactive state, and control sessions with simple start/stop calls.

Soniox Provider

SonioxProvider creates and shares a single SonioxClient instance via React context. Place it near the root of your component tree.

With configuration props

import { SonioxProvider } from "@soniox/react";

function App({ children }) {
  return (
    <SonioxProvider
      config={async () => {
        const res = await fetch("/api/get-temporary-key", { method: "POST" });
        const { api_key } = await res.json();
        return { api_key };
      }}
    >
      {children}
    </SonioxProvider>
  );
}

The config resolver runs once per recording session and can return any SonioxConnectionConfig fields — for example region, stt_ws_url, or tts_ws_url — alongside the api_key.

With a pre-built client

import { SonioxClient } from "@soniox/client";
import { SonioxProvider } from "@soniox/react";

const client = new SonioxClient({
  config: async () => {
    const { api_key } = await fetchKey();
    return { api_key };
  },
});

function App({ children }) {
  return <SonioxProvider client={client}>{children}</SonioxProvider>;
}

useRecording

useRecording is the primary hook for real-time speech-to-text. It returns a UseRecordingReturn object containing reactive transcript state and control methods.

function Transcriber() {
  const recording = useRecording({
    model: "stt-rt-v4",
    language_hints: ["en", "es"],
    enable_endpoint_detection: true,
  });

  return (
    <div>
      <p>State: {recording.state}</p>
      <p>{recording.text}</p>
      <button onClick={recording.start} disabled={recording.isActive}>
        Start
      </button>
      <button onClick={recording.stop} disabled={!recording.isActive}>
        Stop
      </button>
    </div>
  );
}

Handle session events

| Callback | Signature | Description |
| --- | --- | --- |
| onResult | (result: RealtimeResult) => void | Called on each result from the server. |
| onToken | (token: RealtimeToken) => void | Called once per token, in addition to onResult. |
| onEndpoint | () => void | Called when an endpoint is detected. |
| onError | (error: Error) => void | Called when an error occurs. |
| onStateChange | (update: { old_state, new_state, reason? }) => void | Called on each state transition. |
| onFinished | () => void | Called when the recording session finishes. |
| onConnected | () => void | Called when the WebSocket connects. |
| onReconnecting | ({ attempt, max_attempts, delay_ms, preventDefault }) => void | Called before each reconnect attempt (requires auto_reconnect). |
| onReconnected | ({ attempt }) => void | Called after a successful reconnect. |
| onSessionRestart | ({ reset_transcript }) => void | Called when a new STT session is started (initial or after reconnect). |
| onSourceMuted | () => void | Called when the audio source is muted externally. |
| onSourceUnmuted | () => void | Called when the audio source is unmuted. |

Session lifecycle

Recording state

| Field | Type | Description |
| --- | --- | --- |
| state | RecordingState | Current lifecycle state ('idle', 'recording', 'paused', etc.). |
| isActive | boolean | true when state is not idle/stopped/canceled/error. |
| isRecording | boolean | true when state === 'recording'. |
| isPaused | boolean | true when state === 'paused'. |
| isSourceMuted | boolean | true when the audio source is muted externally. |
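The boolean flags derive mechanically from state. A minimal sketch, assuming the RecordingState union consists of the values listed above (the real union may include more):

```typescript
// Sketch: deriving the boolean flags from `state`, per the table above.
// The RecordingState union shown here is an assumption based on the listed values.
type RecordingState = "idle" | "recording" | "paused" | "stopped" | "canceled" | "error";

const INACTIVE_STATES: RecordingState[] = ["idle", "stopped", "canceled", "error"];

function isActive(state: RecordingState): boolean {
  return !INACTIVE_STATES.includes(state);
}

console.log(isActive("recording")); // true
console.log(isActive("paused"));    // true
console.log(isActive("idle"));      // false
```

In practice you read recording.isActive directly from the hook; this only illustrates the relationship between the fields.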

Available methods

| Method | Signature | Description |
| --- | --- | --- |
| start | () => void | Start a new recording. Aborts any in-flight recording first. |
| stop | () => Promise<void> | Gracefully stop; waits for final results from the server. |
| cancel | () => void | Immediately cancel; does not wait for final results. |
| pause | () => void | Pause audio capture (keepalive keeps the connection open). |
| resume | () => void | Resume after pause. |
| finalize | (options?) => void | Request the server to finalize current non-final tokens. |
| clearTranscript | () => void | Clear transcript state (finalText, partialText, utterances, etc.). |

Endpoint detection and manual finalization

Endpoint detection lets you know when a speaker has finished speaking. This is critical for real-time voice AI assistants, command-and-response systems, and conversational apps where you want to respond immediately without waiting for long silences.

Read more about Endpoint detection

Enable endpoint detection by setting enable_endpoint_detection: true in the hook configuration. Use the onEndpoint callback to know when a speaker has finished speaking.

const { start, stop, text } = useRecording({
  model: "stt-rt-v4",
  enable_endpoint_detection: true,
  onEndpoint: () => {
    console.log("--- speaker finished ---");
  },
});

Manual finalization gives you precise control over when audio should be finalized — useful for push-to-talk systems and client-side voice activity detection (VAD).

Read more about Manual finalization

The finalize function is returned by useRecording and can be called at any time during an active recording:

const { start, stop, finalize } = useRecording({ model: "stt-rt-v4" });

// Later, when you want to force finalization:
finalize();

Pause, resume and muting audio source

The pause and resume functions are returned by useRecording. The isPaused flag reflects the current pause state reactively.

const { start, stop, pause, resume, isPaused } = useRecording({ model: "stt-rt-v4" });

pause();  // keeps connection alive, drops audio while paused
resume(); // resume sending audio

The SDK finalizes audio on pause. Make sure to adjust your VAD sensitivity so there is enough silence before pausing. Learn more about Manual finalization

The hook also tracks system-level mute events via isSourceMuted. When the audio source is muted externally (e.g. OS-level or hardware mute), keepalive messages are sent automatically to keep the session alive.

You can listen for mute state changes with the onSourceMuted and onSourceUnmuted callbacks.

const { isSourceMuted } = useRecording({
  onSourceMuted: () => {
    console.log("Microphone muted externally");
  },
  onSourceUnmuted: () => {
    console.log("Microphone unmuted");
  },
});

You are billed for the full stream duration even when the session is paused.

Auto-reconnect

useRecording can transparently recover from transient network drops. Opt in with auto_reconnect: true — on a retriable error, the hook tears down the current WebSocket and audio encoder, re-resolves the connection config, and starts a new session with exponential backoff. Audio captured during the reconnect is buffered and flushed on resume.

const { start, stop, state, isReconnecting, reconnectAttempt } = useRecording({
  model: "stt-rt-v4",
  auto_reconnect: true,
  max_reconnect_attempts: 3,       // default: 3
  reconnect_base_delay_ms: 1000,   // exponential backoff: 1s, 2s, 4s, ...
  onReconnecting: ({ attempt, max_attempts, delay_ms, preventDefault }) => {
    console.log(`Reconnect attempt ${attempt}/${max_attempts} in ${delay_ms}ms`);
    // preventDefault() cancels this attempt (e.g. for manual backoff control).
  },
  onReconnected: ({ attempt }) => {
    console.log(`Reconnected after ${attempt} attempt(s)`);
  },
});

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| auto_reconnect | boolean | false | Enable automatic reconnection on retriable errors. |
| max_reconnect_attempts | number | 3 | Maximum consecutive attempts before surfacing the error. |
| reconnect_base_delay_ms | number | 1000 | Base delay in ms for exponential backoff (1x, 2x, 4x, ...). |
| reset_transcript_on_reconnect | boolean | false | Clear accumulated transcript state on reconnect. |
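The backoff schedule implied by reconnect_base_delay_ms can be sketched as follows (a hypothetical helper for illustration, not part of the SDK):

```typescript
// Hypothetical helper illustrating the exponential backoff schedule above:
// attempt 1 waits base * 1, attempt 2 waits base * 2, attempt 3 waits base * 4, ...
function reconnectDelayMs(attempt: number, baseDelayMs: number = 1000): number {
  return baseDelayMs * 2 ** (attempt - 1);
}

console.log([1, 2, 3].map((n) => reconnectDelayMs(n))); // [ 1000, 2000, 4000 ]
```

With the defaults (3 attempts, 1000 ms base), a failing connection surfaces its error after roughly 1s + 2s + 4s of waiting.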

Return values

| Field | Type | Description |
| --- | --- | --- |
| isReconnecting | boolean | true while the hook is in the reconnecting state. |
| reconnectAttempt | number | Current attempt number (resets to 0 after a successful reconnect). |

Force a reconnect

useRecording also returns a reconnect() function — call it from platform lifecycle handlers like visibilitychange or React Native AppState to proactively rebuild the session when you suspect a stale connection. Requires auto_reconnect: true.

const { reconnect } = useRecording({
  model: "stt-rt-v4",
  auto_reconnect: true,
});

useEffect(() => {
  const onVisibility = () => {
    if (document.visibilityState === "visible") reconnect();
  };
  document.addEventListener("visibilitychange", onVisibility);
  return () => document.removeEventListener("visibilitychange", onVisibility);
}, [reconnect]);

Handling translation

The React SDK supports one-way and two-way real-time translation. Configure translation in the useRecording hook config. The hook automatically groups tokens by translation status or language via the groups snapshot field, so you can render original and translated text separately without manual filtering.

One-way translation

Translates all spoken audio into a single target language.

When translation is provided with type: one_way, the hook automatically sets groupBy: 'translation', splitting tokens into original and translation groups.

function OneWayTranslation() {
  const { groups } = useRecording({
    model: "stt-rt-v4",
    translation: {
      type: "one_way",
      target_language: "es", // Translate everything to Spanish
    },
  });

  // Render grouped text
  return (
    <div>
      <p>Original: {groups.original?.text}</p>
      <p>Translated: {groups.translation?.text}</p>
    </div>
  );
}

Two-way translation

Translates between two languages — each speaker's speech is translated into the other language.

When translation is provided with type: two_way, the hook automatically sets groupBy: 'language', splitting tokens by language code (e.g. en, fr).

function TwoWayTranslation() {
  const { groups } = useRecording({
    model: "stt-rt-v4",
    translation: {
      type: "two_way",
      language_a: "en",
      language_b: "fr",
    },
  });

  // Render grouped text by language
  return (
    <div>
      <p>English: {groups.en?.text}</p>
      <p>French: {groups.fr?.text}</p>
    </div>
  );
}

Learn more about Real-time translation

Utterances

When enable_endpoint_detection is enabled, the utterances array accumulates utterances separated by natural pauses:

function TranscriptWithUtterances() {
  const { utterances, partialText, start, stop, isActive } = useRecording({
    model: "stt-rt-v4",
    enable_endpoint_detection: true,
  });

  return (
    <div>
      <button onClick={isActive ? stop : start}>
        {isActive ? "Stop" : "Start"}
      </button>

      {utterances.map((utterance, i) => (
        <p key={i}>{utterance.text}</p>
      ))}

      {partialText && <p className="text-gray-400">{partialText}</p>}
    </div>
  );
}

Learn more about Endpoint detection

Token grouping

The groupBy option splits tokens into named groups, accessible via recording.groups. This is particularly useful for translation and multi-speaker scenarios.

groupBy strategies

| Value | Keys | Description |
| --- | --- | --- |
| 'translation' | "original", "translation" | Group by translation_status. |
| 'language' | Language codes (e.g. "en", "fr") | Group by token language field. |
| 'speaker' | Speaker IDs (e.g. "1") | Group by token speaker field. |
| (token) => string | Custom keys | Custom grouping function. |

Learn more about Speaker diarization
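To illustrate the custom-function strategy, here is a standalone sketch of the grouping idea. The Token shape and plain string concatenation are simplified assumptions for illustration; the hook's real implementation also tracks final vs. partial text per group:

```typescript
// Simplified sketch of token grouping with a custom key function.
// Token is a reduced stand-in for RealtimeToken (an assumption for illustration).
type Token = { text: string; speaker?: string; language?: string };

function groupText(tokens: Token[], keyOf: (t: Token) => string): Record<string, string> {
  const groups: Record<string, string> = {};
  for (const t of tokens) {
    const key = keyOf(t);
    // Concatenate each token's text into its group's bucket.
    groups[key] = (groups[key] ?? "") + t.text;
  }
  return groups;
}

// Custom strategy: group by speaker, with a fallback bucket.
const bySpeaker = (t: Token) => t.speaker ?? "unknown";

const tokens: Token[] = [
  { text: "Hello ", speaker: "1" },
  { text: "Hi ", speaker: "2" },
  { text: "there", speaker: "1" },
];

console.log(groupText(tokens, bySpeaker)); // { '1': 'Hello there', '2': 'Hi ' }
```

In the hook, the equivalent custom strategy would be passed as groupBy: (token) => token.speaker ?? "unknown", and the results read from recording.groups.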

TokenGroup fields

Each group in recording.groups contains:

| Field | Type | Description |
| --- | --- | --- |
| text | string | Full text: finalText + partialText. |
| finalText | string | Accumulated finalized text in this group. |
| partialText | string | Text from current non-final tokens. |
| partialTokens | RealtimeToken[] | Current non-final tokens (from the latest result only). |

Automatic grouping for translation

When a translation config is provided, groupBy is set automatically:

  • one_way translation → groups by 'translation' (keys: "original", "translation")
  • two_way translation → groups by 'language' (keys: language codes like "en", "es")

function TranslatedTranscript() {
  const { groups, start, stop, isActive } = useRecording({
    model: "stt-rt-v4",
    translation: { type: "one_way", target_language: "es" },
  });

  return (
    <div>
      <button onClick={isActive ? stop : start}>
        {isActive ? "Stop" : "Start"}
      </button>

      <div>
        <h3>Original</h3>
        <p>{groups.original?.text}</p>
      </div>
      <div>
        <h3>Translation</h3>
        <p>{groups.translation?.text}</p>
      </div>
    </div>
  );
}

useSoniox

Returns the SonioxClient instance from the nearest SonioxProvider. Useful for low-level session access.

import { useSoniox } from "@soniox/react";

function MyComponent() {
  const client = useSoniox();

  // Low-level session access (non-reactive):
  // const session = client.realtime.stt({ model: 'stt-rt-v4' }, { api_key: '...' });
  //
  // Permission helpers (if a resolver is configured):
  // const result = await client.permissions?.check('microphone');

  return null;
}

client.tts in the browser only exposes generate() and generateStream(). To enumerate available TTS models and voices, use the Node SDK's client.tts.listModels() on your server.

useMicrophonePermission

Hook for checking and requesting microphone permission before recording. Requires a SonioxProvider with a permission resolver configured (default in browsers).

import { useMicrophonePermission } from "@soniox/react";

function PermissionGate({ children }) {
  const mic = useMicrophonePermission({ autoCheck: true });

  if (!mic.isSupported) {
    return <p>Microphone permissions are not available.</p>;
  }

  if (mic.status === "unknown") {
    return <p>Checking permission...</p>;
  }

  if (mic.isDenied) {
    return (
      <div>
        <p>Microphone access denied.</p>
        {!mic.canRequest && (
          <p>Please enable microphone access in your browser settings.</p>
        )}
      </div>
    );
  }

  if (mic.status === "prompt") {
    // `mic.check` re-queries the permission state. To actually show the
    // browser prompt, start a recording or call `getUserMedia({ audio: true })`
    // from the click handler.
    return (
      <button onClick={() => void navigator.mediaDevices.getUserMedia({ audio: true })}>
        Allow microphone access
      </button>
    );
  }

  return children;
}

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| autoCheck | boolean | false | Automatically check permission on mount. |

Return value

| Field | Type | Description |
| --- | --- | --- |
| status | MicPermissionStatus | Current status: 'granted', 'denied', 'prompt', 'unavailable', 'unsupported', or 'unknown'. |
| canRequest | boolean | Whether the user can be prompted again. false when permanently denied. |
| isGranted | boolean | status === 'granted'. |
| isDenied | boolean | status === 'denied'. |
| isSupported | boolean | Whether permission checking is available. |
| check | () => Promise<void> | Check (or re-check) the microphone permission. No-op when unsupported. |

Status values

| Status | Description |
| --- | --- |
| 'granted' | Microphone access is granted. |
| 'denied' | Microphone access is denied. |
| 'prompt' | User hasn't been asked yet. |
| 'unavailable' | Permissions API not available in this browser. |
| 'unsupported' | No PermissionResolver configured in the provider. |
| 'unknown' | Initial state before the first check() call. |
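The convenience flags follow mechanically from status. A sketch of that derivation (the isSupported rule here is an assumption inferred from the descriptions above, not the SDK's actual code):

```typescript
// Sketch: deriving the convenience flags from MicPermissionStatus.
type MicPermissionStatus =
  | "granted" | "denied" | "prompt" | "unavailable" | "unsupported" | "unknown";

function derivedFlags(status: MicPermissionStatus) {
  return {
    isGranted: status === "granted",
    isDenied: status === "denied",
    // Assumption: "supported" means a permission check can actually run.
    isSupported: status !== "unavailable" && status !== "unsupported",
  };
}

console.log(derivedFlags("granted")); // { isGranted: true, isDenied: false, isSupported: true }
```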

useAudioLevel

Hook for real-time audio volume metering. Useful for building recording indicators and animations.

import { useAudioLevel } from "@soniox/react";

function VolumeIndicator({ isActive }) {
  const { volume } = useAudioLevel({ active: isActive }); // float value between 0 and 1

  return (
    <div
      className="h-4 bg-green-500 rounded transition-all"
      style={{ width: `${volume * 100}%` }}
    />
  );
}

Next.js (App Router)

The package declares 'use client' at the entry point. All hooks must be used inside Client Components. Server Components cannot use useRecording or other hooks directly.