Real-time transcription with React SDK

Create and manage real-time speech-to-text sessions with the Soniox React SDK

The Soniox React SDK supports real-time transcription via React hooks and is built on top of the @soniox/client Web SDK. This allows you to transcribe live audio with low latency — ideal for live captions, voice input, and interactive experiences.

You can capture audio from the user's microphone, receive transcription results as reactive state, and control sessions with simple start/stop calls.

Soniox Provider

SonioxProvider creates and shares a single SonioxClient instance via React context. Place it near the root of your component tree.

With configuration props

import { SonioxProvider } from "@soniox/react";

function App({ children }) {
  return (
    <SonioxProvider
      apiKey={async () => {
        const res = await fetch("/api/get-temporary-key", { method: "POST" });
        return (await res.json()).api_key;
      }}
    >
      {children}
    </SonioxProvider>
  );
}

With a pre-built client

import { SonioxClient } from "@soniox/client";
import { SonioxProvider } from "@soniox/react";

const client = new SonioxClient({
  api_key: async () => fetchKey(),
});

function App({ children }) {
  return <SonioxProvider client={client}>{children}</SonioxProvider>;
}

useRecording

useRecording is the primary hook for real-time speech-to-text. It returns a UseRecordingReturn object containing reactive transcript state and control methods.

import { useRecording } from "@soniox/react";

function Transcriber() {
  const recording = useRecording({
    model: "stt-rt-v4",
    language_hints: ["en", "es"],
    enable_endpoint_detection: true,
  });

  return (
    <div>
      <p>State: {recording.state}</p>
      <p>{recording.text}</p>
      <button onClick={recording.start} disabled={recording.isActive}>
        Start
      </button>
      <button onClick={recording.stop} disabled={!recording.isActive}>
        Stop
      </button>
    </div>
  );
}

Handle session events

| Callback | Signature | Description |
| --- | --- | --- |
| onResult | (result: RealtimeResult) => void | Called on each result from the server. |
| onEndpoint | () => void | Called when an endpoint is detected. |
| onError | (error: Error) => void | Called when an error occurs. |
| onStateChange | (update: { old_state, new_state }) => void | Called on each state transition. |
| onFinished | () => void | Called when the recording session finishes. |
| onConnected | () => void | Called when the WebSocket connects. |
| onSourceMuted | () => void | Called when the audio source is muted externally. |
| onSourceUnmuted | () => void | Called when the audio source is unmuted. |
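
Callbacks are passed in the hook configuration alongside the transcription options. A minimal sketch (the logging is illustrative only):

const recording = useRecording({
  model: "stt-rt-v4",
  onResult: (result) => {
    // Inspect each result as it arrives from the server.
    console.log("result:", result);
  },
  onStateChange: ({ old_state, new_state }) => {
    console.log(`state: ${old_state} -> ${new_state}`);
  },
  onError: (error) => {
    console.error("recording error:", error);
  },
});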

Session lifecycle

Recording state

| Field | Type | Description |
| --- | --- | --- |
| state | RecordingState | Current lifecycle state ('idle', 'recording', 'paused', etc.). |
| isActive | boolean | true when state is not idle/stopped/canceled/error. |
| isRecording | boolean | true when state === 'recording'. |
| isPaused | boolean | true when state === 'paused'. |
| isSourceMuted | boolean | true when the audio source is muted externally. |

Available methods

| Method | Signature | Description |
| --- | --- | --- |
| start | () => void | Start a new recording. Aborts any in-flight recording first. |
| stop | () => Promise<void> | Gracefully stop — waits for final results from the server. |
| cancel | () => void | Immediately cancel — does not wait for final results. |
| pause | () => void | Pause audio capture (keepalive keeps connection open). |
| resume | () => void | Resume after pause. |
| finalize | (options?) => void | Request the server to finalize current non-final tokens. |
| clearTranscript | () => void | Clear transcript state (finalText, partialText, utterances, etc.). |
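
Because stop() returns a promise that resolves only after the server has sent its final results, you can await it before acting on the transcript. A minimal sketch (the handler name is illustrative):

const recording = useRecording({ model: "stt-rt-v4" });

// Illustrative handler: stop gracefully and wait for the final results.
async function handleStop() {
  await recording.stop();
  // On the next render, recording.text reflects the fully finalized transcript.
}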

Endpoint detection and manual finalization

Endpoint detection lets you know when a speaker has finished speaking. This is critical for real-time voice AI assistants, command-and-response systems, and conversational apps where you want to respond immediately without waiting for long silences.

Read more about Endpoint detection

Enable endpoint detection by setting enable_endpoint_detection: true in the hook configuration. Use the onEndpoint callback to know when a speaker has finished speaking.

const { start, stop, text } = useRecording({
  apiKey: "<YOUR_API_KEY>",
  model: "stt-rt-v4",
  enable_endpoint_detection: true,
  onEndpoint: () => {
    console.log("--- speaker finished ---");
  },
});

Manual finalization gives you precise control over when audio should be finalized — useful for push-to-talk systems and client-side voice activity detection (VAD).

Read more about Manual finalization

The finalize function is returned by useRecording and can be called at any time during an active recording:

const { start, stop, finalize } = useRecording({});

// Later, when you want to force finalization:
finalize();

Pause, resume and muting audio source

The pause and resume functions are returned by useRecording. The isPaused flag reflects the current pause state reactively.

const { start, stop, pause, resume, isPaused } = useRecording({});
pause();   // keeps connection alive, drops audio while paused
resume();  // resume sending audio

The hook also tracks system-level mute events via isSourceMuted. When the audio source is muted externally (e.g. OS-level or hardware mute), keepalive messages are sent automatically to keep the session alive.

You can listen for mute state changes with the onSourceMuted and onSourceUnmuted callbacks.

const { isSourceMuted } = useRecording({
  onSourceMuted: () => {
    console.log("Microphone muted externally");
  },
  onSourceUnmuted: () => {
    console.log("Microphone unmuted");
  },
});

You are billed for the full stream duration even when the session is paused.

Handling translation

The React SDK supports one-way and two-way real-time translation. Configure translation in the useRecording hook config. The hook automatically groups tokens by translation status or language via the groups snapshot field, so you can render original and translated text separately without manual filtering.

One-way translation

Translates all spoken audio into a single target language.

When translation is provided with type: one_way, the hook automatically sets groupBy: 'translation', splitting tokens into original and translation groups.

function OneWayTranslation() {
  const { groups } = useRecording({
    apiKey: "<YOUR_API_KEY>",
    model: "stt-rt-preview",
    translation: {
      type: "one_way",
      target_language: "es", // Translate everything to Spanish
    },
  });

  // Render grouped text
  return (
    <div>
      <p>Original: {groups.original?.text}</p>
      <p>Translated: {groups.translation?.text}</p>
    </div>
  );
}

Two-way translation

Translates between two languages — each speaker's speech is translated into the other language.

When translation is provided with type: two_way, the hook automatically sets groupBy: 'language', splitting tokens by language code (e.g. en, fr).

function TwoWayTranslation() {
  const { groups } = useRecording({
    apiKey: "<YOUR_API_KEY>",
    model: "stt-rt-preview",
    translation: {
      type: "two_way",
      language_a: "en",
      language_b: "fr",
    },
  });

  // Render grouped text by language
  return (
    <div>
      <p>English: {groups.en?.text}</p>
      <p>French: {groups.fr?.text}</p>
    </div>
  );
}

Learn more about Real-time translation

Utterances

When enable_endpoint_detection is enabled, the utterances array accumulates utterances separated by natural pauses:

function TranscriptWithUtterances() {
  const { utterances, partialText, start, stop, isActive } = useRecording({
    model: "stt-rt-v4",
    enable_endpoint_detection: true,
  });

  return (
    <div>
      <button onClick={isActive ? stop : start}>
        {isActive ? "Stop" : "Start"}
      </button>

      {utterances.map((utterance, i) => (
        <p key={i}>{utterance.text}</p>
      ))}

      {partialText && <p className="text-gray-400">{partialText}</p>}
    </div>
  );
}

Learn more about Endpoint detection

Token grouping

The groupBy option splits tokens into named groups, accessible via recording.groups. This is particularly useful for translation and multi-speaker scenarios.

groupBy strategies

| Value | Keys | Description |
| --- | --- | --- |
| 'translation' | "original", "translation" | Group by translation_status. |
| 'language' | Language codes (e.g. "en", "fr") | Group by token language field. |
| 'speaker' | Speaker IDs (e.g. "1") | Group by token speaker field. |
| (token) => string | Custom keys | Custom grouping function. |

Learn more about Speaker diarization
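
The function form receives each token and returns a group key. A minimal sketch, assuming tokens expose the speaker field used by the 'speaker' strategy (the key names here are illustrative):

const { groups } = useRecording({
  model: "stt-rt-v4",
  // Custom grouping: one group per speaker, with a fallback key for
  // tokens that have no speaker assigned yet.
  groupBy: (token) => (token.speaker ? `speaker-${token.speaker}` : "unassigned"),
});

// groups["speaker-1"]?.text, groups["unassigned"]?.text, ...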

TokenGroup fields

Each group in recording.groups contains:

| Field | Type | Description |
| --- | --- | --- |
| text | string | Full text: finalText + partialText. |
| finalText | string | Accumulated finalized text in this group. |
| partialText | string | Text from current non-final tokens. |
| partialTokens | RealtimeToken[] | Current non-final tokens (from the latest result only). |
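
To style finalized and in-progress text differently, read the two fields separately. A small sketch (component and class names are illustrative):

function GroupView({ group }) {
  // Finalized text is stable; partial text may still be revised by the server.
  return (
    <p>
      {group.finalText}
      <span className="text-gray-400">{group.partialText}</span>
    </p>
  );
}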

Automatic grouping for translation

When a translation config is provided, groupBy is set automatically:

  • one_way translation → groups by 'translation' (keys: "original", "translation")
  • two_way translation → groups by 'language' (keys: language codes like "en", "es")

function TranslatedTranscript() {
  const { groups, start, stop, isActive } = useRecording({
    model: "stt-rt-v4",
    translation: { type: "one_way", target_language: "es" },
  });

  return (
    <div>
      <button onClick={isActive ? stop : start}>
        {isActive ? "Stop" : "Start"}
      </button>

      <div>
        <h3>Original</h3>
        <p>{groups.original?.text}</p>
      </div>
      <div>
        <h3>Translation</h3>
        <p>{groups.translation?.text}</p>
      </div>
    </div>
  );
}

useSoniox

useSoniox returns the SonioxClient instance from the nearest SonioxProvider. It is useful for low-level session access.

import { useSoniox } from "@soniox/react";

function MyComponent() {
  const client = useSoniox();
  // Use client.realtime.stt() for low-level session access
  // Use client.permissions for permission checks
}

useMicrophonePermission

Hook for checking and requesting microphone permission before recording. Requires a SonioxProvider with a permission resolver configured (default in browsers).

import { useMicrophonePermission } from "@soniox/react";

function PermissionGate({ children }) {
  const mic = useMicrophonePermission({ autoCheck: true });

  if (!mic.isSupported) {
    return <p>Microphone permissions are not available.</p>;
  }

  if (mic.status === "unknown") {
    return <p>Checking permission...</p>;
  }

  if (mic.isDenied) {
    return (
      <div>
        <p>Microphone access denied.</p>
        {!mic.canRequest && (
          <p>Please enable microphone access in your browser settings.</p>
        )}
      </div>
    );
  }

  if (mic.status === "prompt") {
    return (
      <button onClick={mic.check}>
        Allow microphone access
      </button>
    );
  }

  return children;
}

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| autoCheck | boolean | false | Automatically check permission on mount. |

Return value

| Field | Type | Description |
| --- | --- | --- |
| status | MicPermissionStatus | Current status: 'granted', 'denied', 'prompt', 'unavailable', 'unsupported', or 'unknown'. |
| canRequest | boolean | Whether the user can be prompted again. false when permanently denied. |
| isGranted | boolean | status === 'granted'. |
| isDenied | boolean | status === 'denied'. |
| isSupported | boolean | Whether permission checking is available. |
| check | () => Promise<void> | Check (or re-check) the microphone permission. No-op when unsupported. |

Status values

| Status | Description |
| --- | --- |
| 'granted' | Microphone access is granted. |
| 'denied' | Microphone access is denied. |
| 'prompt' | User hasn't been asked yet. |
| 'unavailable' | Permissions API not available in this browser. |
| 'unsupported' | No PermissionResolver configured in the provider. |
| 'unknown' | Initial state before the first check() call. |

useAudioLevel

Hook for real-time audio volume metering. Useful for building recording indicators and animations.

import { useAudioLevel } from "@soniox/react";

function VolumeIndicator({ isActive }) {
  const { volume } = useAudioLevel({ active: isActive }); // float value between 0 and 1

  return (
    <div
      className="h-4 bg-green-500 rounded transition-all"
      style={{ width: `${volume * 100}%` }}
    />
  );
}
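
The indicator composes naturally with useRecording. A sketch driving it from the isRecording flag (VolumeIndicator is the component above):

function Recorder() {
  const recording = useRecording({ model: "stt-rt-v4" });

  return (
    <div>
      {/* The meter animates only while audio is actually being captured */}
      <VolumeIndicator isActive={recording.isRecording} />
      <button onClick={recording.isActive ? recording.stop : recording.start}>
        {recording.isActive ? "Stop" : "Start"}
      </button>
    </div>
  );
}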

Next.js (App Router)

The package declares 'use client' at the entry point. All hooks must be used inside Client Components. Server Components cannot use useRecording or other hooks directly.
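
A minimal sketch of such a Client Component (file and component names are illustrative, and it assumes a SonioxProvider is mounted higher in a client part of the tree):

// app/transcriber/LiveTranscriber.tsx
"use client";

import { useRecording } from "@soniox/react";

export default function LiveTranscriber() {
  const { text, start, stop, isActive } = useRecording({ model: "stt-rt-v4" });

  return (
    <div>
      <button onClick={isActive ? stop : start}>{isActive ? "Stop" : "Start"}</button>
      <p>{text}</p>
    </div>
  );
}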