
Real-time speech generation with React SDK

Stream text to speech in React with the useTts hook

The Soniox React SDK exposes a single useTts hook that covers both real-time WebSocket TTS and one-shot REST TTS. It manages the stream lifecycle, surfaces reactive state for rendering, and handles cleanup automatically when your component unmounts.

Transport modes

useTts has two modes selected via the mode option:

| Feature | 'websocket' (default) | 'rest' |
| --- | --- | --- |
| speak(string) | Yes | Yes |
| speak(asyncIterable) (LLM token streaming) | Yes | No |
| sendText() / finish() (incremental text) | Yes | No-op |
| Audio delivery | Streaming chunks | Streaming chunks |
| Error detection | Full (in-band WebSocket errors) | Pre-stream only (HTTP status) |
| Use when... | Narrating LLM output, lowest latency to first audio | You have the full text and want a simple HTTP request |

Set up your temporary API key endpoint

In a browser environment you don't want to expose your primary API key. Create a temporary key endpoint on your server using the Soniox Node SDK. TTS keys use the tts_rt usage type.

```typescript
import express from 'express';
import { SonioxNodeClient } from '@soniox/node';

const app = express();
const client = new SonioxNodeClient(); // reads SONIOX_API_KEY from env

app.get('/tts-tmp-key', async (_req, res) => {
  try {
    const { api_key, expires_at } = await client.auth.createTemporaryKey({
      usage_type: 'tts_rt',
      expires_in_seconds: 300,
    });
    res.json({ api_key, expires_at });
  } catch (err) {
    res.status(500).json({ error: err instanceof Error ? err.message : 'Failed to create temporary key' });
  }
});

app.listen(3000);
```

Because TTS and STT temporary keys have different usage_type values, useTts always creates its own client from the inline config prop — even when a <SonioxProvider> is present. Pass the config prop on useTts with a resolver that fetches a tts_rt key.

Quickstart

Pass a config resolver and a voice; that's all that's required. speak() starts generation, state reflects the lifecycle, and audio arrives through the onAudio callback for you to play however you choose.

```tsx
import { useTts } from "@soniox/react";
import { useRef } from "react";

async function fetchTtsKey() {
  const res = await fetch("/tts-tmp-key");
  const { api_key } = await res.json();
  return { api_key };
}

export function TtsButton() {
  const audioChunksRef = useRef<Uint8Array[]>([]);

  const { speak, state, isSpeaking, error } = useTts({
    config: fetchTtsKey,
    voice: "Adrian",
    model: "tts-rt-v1-preview",
    language: "en",
    audio_format: "wav",
    onAudio: (chunk) => {
      audioChunksRef.current.push(chunk);
    },
    onTerminated: () => {
      const blob = new Blob(audioChunksRef.current, { type: "audio/wav" });
      new Audio(URL.createObjectURL(blob)).play();
      audioChunksRef.current = [];
    },
    onError: (err) => console.error(err),
  });

  return (
    <div>
      <button onClick={() => speak("Hello from Soniox React SDK!")} disabled={isSpeaking}>
        Speak
      </button>
      <p>State: {state}</p>
      {error && <p style={{ color: "red" }}>{error.message}</p>}
    </div>
  );
}
```

Stream from an LLM

WebSocket mode accepts an AsyncIterable<string> — pipe LLM tokens straight into speak() and audio starts playing as the first tokens arrive.

```tsx
import { useRef } from "react";
import { useTts } from "@soniox/react";

async function* llmTokens(prompt: string): AsyncIterable<string> {
  const res = await fetch("/llm/stream", {
    method: "POST",
    body: JSON.stringify({ prompt }),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { value, done } = await reader.read();
    if (done) return;
    // stream: true handles multi-byte characters split across chunks
    yield decoder.decode(value, { stream: true });
  }
}

function Narrator() {
  const audioChunksRef = useRef<Uint8Array[]>([]);

  const { speak, isSpeaking, stop } = useTts({
    config: fetchTtsKey,
    voice: "Adrian",
    audio_format: "wav",
    onAudio: (chunk) => audioChunksRef.current.push(chunk),
    onTerminated: () => {
      // Buffered playback, as in the quickstart.
      const blob = new Blob(audioChunksRef.current, { type: "audio/wav" });
      new Audio(URL.createObjectURL(blob)).play();
      audioChunksRef.current = [];
    },
  });

  return (
    <button
      onClick={() => (isSpeaking ? stop() : speak(llmTokens("Tell me a story.")))}
    >
      {isSpeaking ? "Stop" : "Narrate story"}
    </button>
  );
}
```

Send text incrementally

For finer control over when text arrives, call sendText for each chunk and finish when done. This is useful when the text source isn't already an async iterable.

```tsx
import { useRef } from "react";
import { useTts } from "@soniox/react";

function IncrementalTts() {
  const audioChunksRef = useRef<Uint8Array[]>([]);

  const { sendText, finish, stop, isSpeaking } = useTts({
    config: fetchTtsKey,
    voice: "Adrian",
    audio_format: "wav",
    onAudio: (chunk) => audioChunksRef.current.push(chunk),
  });

  const speakParagraphs = (paragraphs: string[]) => {
    for (const p of paragraphs) sendText(p + " ");
    finish();
  };

  return (
    <div>
      <button onClick={() => speakParagraphs(["Hello.", "How are you?"])}>
        Speak
      </button>
      {isSpeaking && <button onClick={stop}>Stop</button>}
    </div>
  );
}
```
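If the source text is one long string, a small helper can break it into sendText-sized pieces. A minimal sketch; the helper name, the sentence-splitting regex, and the 200-character cap are illustrative choices, not SDK requirements:

```typescript
// Hypothetical helper: splits text into roughly sentence-aligned chunks
// so each piece can be passed to sendText() as it becomes relevant.
function chunkText(text: string, maxLen = 200): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const sentence of text.split(/(?<=[.!?])\s+/)) {
    if (current && current.length + sentence.length + 1 > maxLen) {
      chunks.push(current);
      current = sentence;
    } else {
      current = current ? current + " " + sentence : sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each chunk can then be sent with sendText(chunk + " "), followed by a single finish() once the loop completes.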

sendText and finish are no-ops in REST mode — the REST endpoint only accepts the full text in a single request.

REST mode

Set mode: 'rest' to run TTS over HTTP. speak(string) still works, but speak(asyncIterable), sendText, and finish are not available (the hook emits an error if you try to stream an async iterable).

```typescript
const audioChunksRef = useRef<Uint8Array[]>([]);

const { speak, state, isSpeaking } = useTts({
  config: fetchTtsKey,
  mode: "rest",
  voice: "Adrian",
  audio_format: "wav",
  onAudio: (chunk) => audioChunksRef.current.push(chunk),
  onTerminated: () => console.log("done"),
});

speak("Hello over REST.");
```

Use REST mode for one-off playback (confirmations, notifications) where the lower latency of WebSocket isn't worth the extra connection.

REST mode requires voice in the hook config — there is no built-in fallback. Omitting it surfaces an error via onError and leaves state: 'error'.

WebSocket mode also needs a voice, but it can come from the hook config or from server-returned tts_defaults (see Server-driven defaults).

To discover available voices, call client.tts.listModels() from the Node SDK on your server — it is not available in the browser client.

Lifecycle

useTts exposes a single state that transitions through the lifecycle below. isSpeaking and isConnecting are derived booleans you can use directly in UI.

| State | Meaning |
| --- | --- |
| idle | No generation in flight. Initial state, or after a completed / cancelled run. |
| connecting | Opening the WebSocket (or issuing the REST request). |
| speaking | Receiving audio chunks. |
| stopping | stop() was called; waiting for terminated to flush. |
| error | Last run failed. Inspect error and call speak() again to retry. |

Transitions, in plain terms:

  • The hook starts in idle. Calling speak() (or sendText() in WebSocket mode) moves it to connecting.
  • Once the first audio chunk is received, it moves to speaking.
  • When the server finishes (either because speak() passed a complete string, you called finish(), or the LLM async iterable ended), the hook fires onTerminated and returns to idle.
  • Calling stop() during speaking moves it to stopping and then back to idle once the server flushes. Calling cancel() at any point jumps straight back to idle.
  • Any error (connect failure or in-stream error) moves it to error. The next speak() resets it back to connecting.
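The transitions above can be sketched as a pure reducer. This is only a mental model of the documented lifecycle, not the SDK's internal implementation; the state and event names follow the tables in this page:

```typescript
type TtsState = "idle" | "connecting" | "speaking" | "stopping" | "error";
type TtsEvent = "speak" | "audio" | "terminated" | "stop" | "cancel" | "error";

// Maps (state, event) to the next lifecycle state, per the rules above.
function nextState(state: TtsState, event: TtsEvent): TtsState {
  switch (event) {
    case "speak":
      return "connecting"; // also covers retrying out of 'error'
    case "audio":
      return state === "connecting" ? "speaking" : state;
    case "stop":
      return state === "speaking" ? "stopping" : state;
    case "terminated":
      return "idle"; // normal completion, or a flushed stop()
    case "cancel":
      return "idle"; // immediate, from any state
    case "error":
      return "error";
  }
}
```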

Methods

| Method | Signature | Description |
| --- | --- | --- |
| speak | (text: string \| AsyncIterable<string>) => void | Start a new run. Cancels any in-flight generation first. |
| sendText | (text: string) => void | WebSocket only. Send one chunk without finishing. Use with finish(). |
| finish | () => void | WebSocket only. Signal no more text; the server finishes and sends terminated. |
| stop | () => Promise<void> | Graceful stop. Sends finish() and resolves when the server reaches terminated. |
| cancel | () => void | Immediate cancel. Audio stops right away. |

Callbacks

| Callback | Signature | Description |
| --- | --- | --- |
| onAudio | (chunk: Uint8Array) => void | Audio chunk received. Fired in both REST and WebSocket modes. |
| onAudioEnd | () => void | Server marked the final audio payload. |
| onTerminated | () => void | Generation is fully complete. |
| onError | (error: Error) => void | Stream or connection error. |
| onStateChange | ({ old_state, new_state }) => void | Fired on every state transition. |

Return values

Reactive snapshot exposed by the hook — see UseTtsReturn:

| Field | Type | Description |
| --- | --- | --- |
| state | TtsState | Current lifecycle state. |
| isSpeaking | boolean | state === 'speaking'. |
| isConnecting | boolean | state === 'connecting'. |
| error | Error \| null | Last error, or null if none. |
| speak, sendText, finish, stop, cancel | - | Control methods (see above). |

Configuration

See UseTtsConfig for the full type. Most-used options:

| Option | Type | Description |
| --- | --- | --- |
| config | SonioxConnectionConfig \| (() => Promise<SonioxConnectionConfig>) | Connection configuration. Required; useTts always uses its own client. |
| mode | 'websocket' \| 'rest' | Transport mode. Default 'websocket'. |
| voice | string | Voice identifier (e.g. "Adrian"). Required unless provided via server-returned tts_defaults (WebSocket mode only). Discover available voices via the Node SDK's client.tts.listModels(). |
| model | string | TTS model. Default "tts-rt-v1-preview". |
| language | string | Language code. Default "en". |
| audio_format | TtsAudioFormat | Output audio format. Default "wav". |
| sample_rate | number | Output sample rate in Hz. Required for raw PCM formats. |
| bitrate | number | Codec bitrate in bps (for compressed formats). |
| stream_id | string | Override the auto-generated stream id. WebSocket mode only. |

See Available models for the full list of TTS models, voices, and supported audio formats.
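Raw PCM output carries no header, so the hook cannot infer the sample rate from the format alone; sample_rate must be set explicitly. A sketch of such a configuration; the "pcm_s16le" format identifier is an assumption here, so check Available models for the exact names:

```typescript
// Hypothetical raw-PCM options: "pcm_s16le" is an assumed format name.
// sample_rate is required because raw PCM has no header declaring it.
const pcmOptions = {
  audio_format: "pcm_s16le",
  sample_rate: 24000,
};
```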

Server-driven defaults

There's no first-class endpoint for TTS defaults — you own them. Keep them on your server next to the temporary-key endpoint and return them via SonioxConnectionConfig.tts_defaults. useTts consumes the defaults automatically through its config resolver; caller-provided hook options (voice, model, audio_format, ...) still override them.

```typescript
// server
app.get('/tts-tmp-key', async (_req, res) => {
  const { api_key, expires_at } = await client.auth.createTemporaryKey({
    usage_type: 'tts_rt',
    expires_in_seconds: 300,
  });

  res.json({
    api_key,
    expires_at,
    tts_defaults: {
      model: 'tts-rt-v1-preview',
      language: 'en',
      voice: 'Adrian',
      audio_format: 'wav',
    },
  });
});
```
```tsx
// client
import { useRef } from "react";
import { useTts } from "@soniox/react";

async function fetchTtsKey() {
  const res = await fetch("/tts-tmp-key");
  return await res.json(); // { api_key, tts_defaults, ... }
}

function TtsButton() {
  const audioChunksRef = useRef<Uint8Array[]>([]);

  // Inherits model / voice / audio_format from tts_defaults returned by the server.
  const { speak } = useTts({
    config: fetchTtsKey,
    onAudio: (chunk) => audioChunksRef.current.push(chunk),
  });

  return <button onClick={() => speak("Hello from server defaults!")}>Speak</button>;
}
```
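The precedence rule (hook options win over server-returned tts_defaults) behaves like a shallow merge. A sketch of the mental model, not the SDK's internal code; the function and type names are illustrative:

```typescript
// Illustrative only: tts_defaults fill the gaps, hook options always win.
type TtsOptions = Partial<{
  model: string;
  language: string;
  voice: string;
  audio_format: string;
}>;

function resolveTtsOptions(ttsDefaults: TtsOptions, hookOptions: TtsOptions): TtsOptions {
  return { ...ttsDefaults, ...hookOptions };
}
```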

See also