
Real-time transcription with React SDK

Create and manage real-time speech-to-text sessions with the Soniox React SDK

Soniox React SDK supports real-time transcription via React hooks, built on top of the @soniox/client Web SDK. This allows you to transcribe live audio with low latency — ideal for live captions, voice input, and interactive experiences.

You can capture audio from the user's microphone, receive transcription results as reactive state, and control sessions with simple start/stop calls.

Soniox Provider

SonioxProvider creates and shares a single SonioxClient instance via React context. Place it near the root of your component tree.

With configuration props

import { SonioxProvider } from "@soniox/react";

function App({ children }) {
  return (
    <SonioxProvider
      config={async () => {
        const res = await fetch("/api/get-temporary-key", { method: "POST" });
        const { api_key } = await res.json();
        return { api_key };
      }}
    >
      {children}
    </SonioxProvider>
  );
}

The config resolver runs once per recording session and can return any SonioxConnectionConfig fields — for example region, stt_ws_url, or tts_ws_url — alongside the api_key.

With a pre-built client

import { SonioxClient } from "@soniox/client";
import { SonioxProvider } from "@soniox/react";

const client = new SonioxClient({
  config: async () => {
    const { api_key } = await fetchKey();
    return { api_key };
  },
});

function App({ children }) {
  return <SonioxProvider client={client}>{children}</SonioxProvider>;
}

useRecording

useRecording is the primary hook for real-time speech-to-text. It returns a UseRecordingReturn object containing reactive transcript state and control methods.

function Transcriber() {
  const recording = useRecording({
    model: "stt-rt-v4",
    language_hints: ["en", "es"],
    enable_endpoint_detection: true,
  });

  return (
    <div>
      <p>State: {recording.state}</p>
      <p>{recording.text}</p>
      <button onClick={recording.start} disabled={recording.isActive}>
        Start
      </button>
      <button onClick={recording.stop} disabled={!recording.isActive}>
        Stop
      </button>
    </div>
  );
}

Handle session events

| Callback | Signature | Description |
| --- | --- | --- |
| onResult | (result: RealtimeResult) => void | Called on each result from the server. |
| onToken | (token: RealtimeToken) => void | Called once per token, in addition to onResult. |
| onEndpoint | () => void | Called when an endpoint is detected. |
| onError | (error: Error) => void | Called when an error occurs. |
| onStateChange | (update: { old_state, new_state, reason? }) => void | Called on each state transition. |
| onFinished | () => void | Called when the recording session finishes. |
| onConnected | () => void | Called when the WebSocket connects. |
| onReconnecting | ({ attempt, max_attempts, delay_ms, preventDefault }) => void | Called before each reconnect attempt (requires auto_reconnect). |
| onReconnected | ({ attempt }) => void | Called after a successful reconnect. |
| onSessionRestart | ({ reset_transcript }) => void | Called when a new STT session is started (initial or after reconnect). |
| onSourceMuted | () => void | Called when the audio source is muted externally. |
| onSourceUnmuted | () => void | Called when the audio source is unmuted. |

Session lifecycle

Recording state

| Field | Type | Description |
| --- | --- | --- |
| state | RecordingState | Current lifecycle state ('idle', 'recording', 'paused', etc.). |
| isActive | boolean | true when state is not idle/stopped/canceled/error. |
| isRecording | boolean | true when state === 'recording'. |
| isPaused | boolean | true when state === 'paused'. |
| isSourceMuted | boolean | true when the audio source is muted externally. |
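The boolean flags derive mechanically from state. A minimal sketch, assuming the RecordingState union consists of the values listed above (the real union may include more):

```typescript
// Sketch: deriving the boolean flags from `state`, per the table above.
// The RecordingState union shown here is an assumption based on the listed values.
type RecordingState = "idle" | "recording" | "paused" | "stopped" | "canceled" | "error";

const INACTIVE_STATES: RecordingState[] = ["idle", "stopped", "canceled", "error"];

function isActive(state: RecordingState): boolean {
  return !INACTIVE_STATES.includes(state);
}

console.log(isActive("recording")); // true
console.log(isActive("paused"));    // true
console.log(isActive("idle"));      // false
```

In practice you read recording.isActive directly from the hook; this only illustrates the relationship between the fields.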

Available methods

| Method | Signature | Description |
| --- | --- | --- |
| start | () => void | Start a new recording. Aborts any in-flight recording first. |
| stop | () => Promise<void> | Gracefully stop; waits for final results from the server. |
| cancel | () => void | Immediately cancel; does not wait for final results. |
| pause | () => void | Pause audio capture (keepalive keeps the connection open). |
| resume | () => void | Resume after pause. |
| finalize | (options?) => void | Request the server to finalize current non-final tokens. |
| clearTranscript | () => void | Clear transcript state (finalText, partialText, utterances, etc.). |

Endpoint detection and manual finalization

Endpoint detection lets you know when a speaker has finished speaking. This is critical for real-time voice AI assistants, command-and-response systems, and conversational apps where you want to respond immediately without waiting for long silences.

Read more about Endpoint detection

Enable endpoint detection by setting enable_endpoint_detection: true in the hook configuration. Use the onEndpoint callback to know when a speaker has finished speaking.

const { start, stop, text } = useRecording({
  model: "stt-rt-v4",
  enable_endpoint_detection: true,
  onEndpoint: () => {
    console.log("--- speaker finished ---");
  },
});

Manual finalization gives you precise control over when audio should be finalized — useful for push-to-talk systems and client-side voice activity detection (VAD).

Read more about Manual finalization

The finalize function is returned by useRecording and can be called at any time during an active recording:

const { start, stop, finalize } = useRecording({ model: "stt-rt-v4" });

// Later, when you want to force finalization:
finalize();

Pause, resume and muting audio source

The pause and resume functions are returned by useRecording. The isPaused flag reflects the current pause state reactively.

const { start, stop, pause, resume, isPaused } = useRecording({ model: "stt-rt-v4" });

pause();  // keeps connection alive, drops audio while paused
resume(); // resume sending audio

The SDK finalizes audio on pause. Make sure to adjust your VAD sensitivity so there is enough silence before pausing. Learn more about Manual finalization

The hook also tracks system-level mute events via isSourceMuted. When the audio source is muted externally (e.g. OS-level or hardware mute), keepalive messages are sent automatically to keep the session alive.

You can listen for mute state changes with the onSourceMuted and onSourceUnmuted callbacks.

const { isSourceMuted } = useRecording({
  onSourceMuted: () => {
    console.log("Microphone muted externally");
  },
  onSourceUnmuted: () => {
    console.log("Microphone unmuted");
  },
});

You are billed for the full stream duration even when the session is paused.

Auto-reconnect

useRecording can transparently recover from transient network drops. Opt in with auto_reconnect: true — on a retriable error, the hook tears down the current WebSocket and audio encoder, re-resolves the connection config, and starts a new session with exponential backoff. Audio captured during the reconnect is buffered and flushed on resume.

const { start, stop, state, isReconnecting, reconnectAttempt } = useRecording({
  model: "stt-rt-v4",
  auto_reconnect: true,
  max_reconnect_attempts: 3,       // default: 3
  reconnect_base_delay_ms: 1000,   // exponential backoff: 1s, 2s, 4s, ...
  onReconnecting: ({ attempt, max_attempts, delay_ms, preventDefault }) => {
    console.log(`Reconnect attempt ${attempt}/${max_attempts} in ${delay_ms}ms`);
    // preventDefault() cancels this attempt (e.g. for manual backoff control).
  },
  onReconnected: ({ attempt }) => {
    console.log(`Reconnected after ${attempt} attempt(s)`);
  },
});

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| auto_reconnect | boolean | false | Enable automatic reconnection on retriable errors. |
| max_reconnect_attempts | number | 3 | Maximum consecutive attempts before surfacing the error. |
| reconnect_base_delay_ms | number | 1000 | Base delay in ms for exponential backoff (1x, 2x, 4x, ...). |
| reset_transcript_on_reconnect | boolean | false | Clear accumulated transcript state on reconnect. |
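The backoff schedule implied by reconnect_base_delay_ms can be sketched as follows (a hypothetical helper for illustration, not part of the SDK):

```typescript
// Hypothetical helper illustrating the exponential backoff schedule above:
// attempt 1 waits base * 1, attempt 2 waits base * 2, attempt 3 waits base * 4, ...
function reconnectDelayMs(attempt: number, baseDelayMs: number = 1000): number {
  return baseDelayMs * 2 ** (attempt - 1);
}

console.log([1, 2, 3].map((n) => reconnectDelayMs(n))); // [ 1000, 2000, 4000 ]
```

With the defaults (3 attempts, 1000 ms base), a failing connection surfaces its error after roughly 1s + 2s + 4s of waiting.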

Return values

| Field | Type | Description |
| --- | --- | --- |
| isReconnecting | boolean | true while the hook is in the reconnecting state. |
| reconnectAttempt | number | Current attempt number (resets to 0 after a successful reconnect). |

Force a reconnect

useRecording also returns a reconnect() function — call it from platform lifecycle handlers like visibilitychange or React Native AppState to proactively rebuild the session when you suspect a stale connection. Requires auto_reconnect: true.

const { reconnect } = useRecording({
  model: "stt-rt-v4",
  auto_reconnect: true,
});

useEffect(() => {
  const onVisibility = () => {
    if (document.visibilityState === "visible") reconnect();
  };
  document.addEventListener("visibilitychange", onVisibility);
  return () => document.removeEventListener("visibilitychange", onVisibility);
}, [reconnect]);

Handling translation

The React SDK supports one-way and two-way real-time translation. Configure translation in the useRecording hook config. The hook automatically groups tokens by translation status or language via the groups snapshot field, so you can render original and translated text separately without manual filtering.

One-way translation

Translates all spoken audio into a single target language.

When translation is provided with type: one_way, the hook automatically sets groupBy: 'translation', splitting tokens into original and translation groups.

function OneWayTranslation() {
  const { groups } = useRecording({
    model: "stt-rt-v4",
    translation: {
      type: "one_way",
      target_language: "es", // Translate everything to Spanish
    },
  });

  // Render grouped text
  return (
    <div>
      <p>Original: {groups.original?.text}</p>
      <p>Translated: {groups.translation?.text}</p>
    </div>
  );
}

Two-way translation

Translates between two languages — each speaker's speech is translated into the other language.

When translation is provided with type: two_way, the hook automatically sets groupBy: 'language', splitting tokens by language code (e.g. en, fr).

function TwoWayTranslation() {
  const { groups } = useRecording({
    model: "stt-rt-v4",
    translation: {
      type: "two_way",
      language_a: "en",
      language_b: "fr",
    },
  });

  // Render grouped text by language
  return (
    <div>
      <p>English: {groups.en?.text}</p>
      <p>French: {groups.fr?.text}</p>
    </div>
  );
}

Learn more about Real-time translation

Utterances

When enable_endpoint_detection is enabled, the utterances array accumulates utterances separated by natural pauses:

function TranscriptWithUtterances() {
  const { utterances, partialText, start, stop, isActive } = useRecording({
    model: "stt-rt-v4",
    enable_endpoint_detection: true,
  });

  return (
    <div>
      <button onClick={isActive ? stop : start}>
        {isActive ? "Stop" : "Start"}
      </button>

      {utterances.map((utterance, i) => (
        <p key={i}>{utterance.text}</p>
      ))}

      {partialText && <p className="text-gray-400">{partialText}</p>}
    </div>
  );
}

Learn more about Endpoint detection

Token grouping

The groupBy option splits tokens into named groups, accessible via recording.groups. This is particularly useful for translation and multi-speaker scenarios.

groupBy strategies

| Value | Keys | Description |
| --- | --- | --- |
| 'translation' | "original", "translation" | Group by translation_status. |
| 'language' | Language codes (e.g. "en", "fr") | Group by token language field. |
| 'speaker' | Speaker IDs (e.g. "1") | Group by token speaker field. |
| (token) => string | Custom keys | Custom grouping function. |

Learn more about Speaker diarization
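To illustrate the custom-function strategy, here is a standalone sketch of the grouping idea. The Token shape and plain string concatenation are simplified assumptions for illustration; the hook's real implementation also tracks final vs. partial text per group:

```typescript
// Simplified sketch of token grouping with a custom key function.
// Token is a reduced stand-in for RealtimeToken (an assumption for illustration).
type Token = { text: string; speaker?: string; language?: string };

function groupText(tokens: Token[], keyOf: (t: Token) => string): Record<string, string> {
  const groups: Record<string, string> = {};
  for (const t of tokens) {
    const key = keyOf(t);
    // Concatenate each token's text into its group's bucket.
    groups[key] = (groups[key] ?? "") + t.text;
  }
  return groups;
}

// Custom strategy: group by speaker, with a fallback bucket.
const bySpeaker = (t: Token) => t.speaker ?? "unknown";

const tokens: Token[] = [
  { text: "Hello ", speaker: "1" },
  { text: "Hi ", speaker: "2" },
  { text: "there", speaker: "1" },
];

console.log(groupText(tokens, bySpeaker)); // { '1': 'Hello there', '2': 'Hi ' }
```

In the hook, the equivalent custom strategy would be passed as groupBy: (token) => token.speaker ?? "unknown", and the results read from recording.groups.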

TokenGroup fields

Each group in recording.groups contains:

| Field | Type | Description |
| --- | --- | --- |
| text | string | Full text: finalText + partialText. |
| finalText | string | Accumulated finalized text in this group. |
| partialText | string | Text from current non-final tokens. |
| partialTokens | RealtimeToken[] | Current non-final tokens (from the latest result only). |

Automatic grouping for translation

When a translation config is provided, groupBy is set automatically:

  • one_way translation → groups by 'translation' (keys: "original", "translation")
  • two_way translation → groups by 'language' (keys: language codes like "en", "es")

function TranslatedTranscript() {
  const { groups, start, stop, isActive } = useRecording({
    model: "stt-rt-v4",
    translation: { type: "one_way", target_language: "es" },
  });

  return (
    <div>
      <button onClick={isActive ? stop : start}>
        {isActive ? "Stop" : "Start"}
      </button>

      <div>
        <h3>Original</h3>
        <p>{groups.original?.text}</p>
      </div>
      <div>
        <h3>Translation</h3>
        <p>{groups.translation?.text}</p>
      </div>
    </div>
  );
}

useSoniox

Returns the SonioxClient instance from the nearest SonioxProvider. Useful for low-level session access.

import { useSoniox } from "@soniox/react";

function MyComponent() {
  const client = useSoniox();

  // Low-level session access (non-reactive):
  // const session = client.realtime.stt({ model: 'stt-rt-v4' }, { api_key: '...' });
  //
  // Permission helpers (if a resolver is configured):
  // const result = await client.permissions?.check('microphone');

  return null;
}

client.tts in the browser only exposes generate() and generateStream(). To enumerate available TTS models and voices, use the Node SDK's client.tts.listModels() on your server.

useMicrophonePermission

Hook for checking and requesting microphone permission before recording. Requires a SonioxProvider with a permission resolver configured (default in browsers).

import { useMicrophonePermission } from "@soniox/react";

function PermissionGate({ children }) {
  const mic = useMicrophonePermission({ autoCheck: true });

  if (!mic.isSupported) {
    return <p>Microphone permissions are not available.</p>;
  }

  if (mic.status === "unknown") {
    return <p>Checking permission...</p>;
  }

  if (mic.isDenied) {
    return (
      <div>
        <p>Microphone access denied.</p>
        {!mic.canRequest && (
          <p>Please enable microphone access in your browser settings.</p>
        )}
      </div>
    );
  }

  if (mic.status === "prompt") {
    // `mic.check` re-queries the permission state. To actually show the
    // browser prompt, start a recording or call `getUserMedia({ audio: true })`
    // from the click handler.
    return (
      <button onClick={() => void navigator.mediaDevices.getUserMedia({ audio: true })}>
        Allow microphone access
      </button>
    );
  }

  return children;
}

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| autoCheck | boolean | false | Automatically check permission on mount. |

Return value

| Field | Type | Description |
| --- | --- | --- |
| status | MicPermissionStatus | Current status: 'granted', 'denied', 'prompt', 'unavailable', 'unsupported', or 'unknown'. |
| canRequest | boolean | Whether the user can be prompted again. false when permanently denied. |
| isGranted | boolean | status === 'granted'. |
| isDenied | boolean | status === 'denied'. |
| isSupported | boolean | Whether permission checking is available. |
| check | () => Promise<void> | Check (or re-check) the microphone permission. No-op when unsupported. |

Status values

| Status | Description |
| --- | --- |
| 'granted' | Microphone access is granted. |
| 'denied' | Microphone access is denied. |
| 'prompt' | User hasn't been asked yet. |
| 'unavailable' | Permissions API not available in this browser. |
| 'unsupported' | No PermissionResolver configured in the provider. |
| 'unknown' | Initial state before the first check() call. |
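The convenience flags follow mechanically from status. A sketch of that derivation (the isSupported rule here is an assumption inferred from the descriptions above, not the SDK's actual code):

```typescript
// Sketch: deriving the convenience flags from MicPermissionStatus.
type MicPermissionStatus =
  | "granted" | "denied" | "prompt" | "unavailable" | "unsupported" | "unknown";

function derivedFlags(status: MicPermissionStatus) {
  return {
    isGranted: status === "granted",
    isDenied: status === "denied",
    // Assumption: "supported" means a permission check can actually run.
    isSupported: status !== "unavailable" && status !== "unsupported",
  };
}

console.log(derivedFlags("granted")); // { isGranted: true, isDenied: false, isSupported: true }
```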

useAudioLevel

Hook for real-time audio volume metering. Useful for building recording indicators and animations.

import { useAudioLevel } from "@soniox/react";

function VolumeIndicator({ isActive }) {
  const { volume } = useAudioLevel({ active: isActive }); // float value between 0 and 1

  return (
    <div
      className="h-4 bg-green-500 rounded transition-all"
      style={{ width: `${volume * 100}%` }}
    />
  );
}

Next.js (App Router)

The package declares 'use client' at the entry point. All hooks must be used inside Client Components. Server Components cannot use useRecording or other hooks directly.