
Real-time speech generation with React SDK

Stream text to speech in React with the useTts hook

The Soniox React SDK exposes a single useTts hook that covers both real-time WebSocket TTS and one-shot REST TTS. It manages the stream lifecycle, surfaces reactive state for rendering, and handles cleanup automatically when your component unmounts.

Transport modes

useTts has two modes selected via the mode option:

| Feature | 'websocket' (default) | 'rest' |
| --- | --- | --- |
| speak(string) | Yes | Yes |
| speak(asyncIterable) (LLM token streaming) | Yes | No |
| sendText() / finish() (incremental text) | Yes | No-op |
| Audio delivery | Streaming chunks | Streaming chunks |
| Error detection | Full (in-band WebSocket errors) | Pre-stream only (HTTP status) |
| Use when... | Narrating LLM output, lowest latency to first audio | You have the full text and want a simple HTTP request |

Set up your temporary API key endpoint

In a browser environment you don't want to expose your primary API key. Create a temporary key endpoint on your server using the Soniox Node SDK. TTS keys use the tts_rt usage type.

```typescript
import express from 'express';
import { SonioxNodeClient } from '@soniox/node';

const app = express();
const client = new SonioxNodeClient(); // reads SONIOX_API_KEY from env

app.get('/tts-tmp-key', async (_req, res) => {
  try {
    const { api_key, expires_at } = await client.auth.createTemporaryKey({
      usage_type: 'tts_rt',
      expires_in_seconds: 300,
    });
    res.json({ api_key, expires_at });
  } catch (err) {
    res.status(500).json({ error: err instanceof Error ? err.message : 'Failed to create temporary key' });
  }
});

app.listen(3000);
```

Because TTS and STT temporary keys have different usage_type values, useTts always creates its own client from the inline config prop — even when a <SonioxProvider> is present. Pass the config prop on useTts with a resolver that fetches a tts_rt key.

Quickstart

Pass a config resolver and a voice; that's all that's required. speak() starts generation, state reflects the lifecycle, and audio arrives through the onAudio callback for you to play however you choose.

```tsx
import { useTts } from "@soniox/react";
import { useRef } from "react";

async function fetchTtsKey() {
  const res = await fetch("/tts-tmp-key");
  const { api_key } = await res.json();
  return { api_key };
}

export function TtsButton() {
  const audioChunksRef = useRef<Uint8Array[]>([]);

  const { speak, state, isSpeaking, error } = useTts({
    config: fetchTtsKey,
    voice: "Adrian",
    model: "tts-rt-v1-preview",
    language: "en",
    audio_format: "wav",
    onAudio: (chunk) => {
      audioChunksRef.current.push(chunk);
    },
    onTerminated: () => {
      const blob = new Blob(audioChunksRef.current, { type: "audio/wav" });
      new Audio(URL.createObjectURL(blob)).play();
      audioChunksRef.current = [];
    },
    onError: (err) => console.error(err),
  });

  return (
    <div>
      <button onClick={() => speak("Hello from Soniox React SDK!")} disabled={isSpeaking}>
        Speak
      </button>
      <p>State: {state}</p>
      {error && <p style={{ color: "red" }}>{error.message}</p>}
    </div>
  );
}
```

Stream from an LLM

WebSocket mode accepts an AsyncIterable<string> — pipe LLM tokens straight into speak() and audio starts playing as the first tokens arrive.

```tsx
import { useRef } from "react";
import { useTts } from "@soniox/react";

async function* llmTokens(prompt: string): AsyncIterable<string> {
  const res = await fetch("/llm/stream", {
    method: "POST",
    body: JSON.stringify({ prompt }),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { value, done } = await reader.read();
    if (done) return;
    // stream: true handles multi-byte characters split across chunks
    yield decoder.decode(value, { stream: true });
  }
}

function Narrator() {
  const audioChunksRef = useRef<Uint8Array[]>([]);

  const { speak, isSpeaking, stop } = useTts({
    config: fetchTtsKey,
    voice: "Adrian",
    audio_format: "wav",
    onAudio: (chunk) => audioChunksRef.current.push(chunk),
    onTerminated: () => {
      // Buffered playback, as in the quickstart.
      const blob = new Blob(audioChunksRef.current, { type: "audio/wav" });
      new Audio(URL.createObjectURL(blob)).play();
      audioChunksRef.current = [];
    },
  });

  return (
    <button
      onClick={() => (isSpeaking ? stop() : speak(llmTokens("Tell me a story.")))}
    >
      {isSpeaking ? "Stop" : "Narrate story"}
    </button>
  );
}
```

Send text incrementally

For finer control over when text arrives, call sendText for each chunk and finish when done. This is useful when the text source isn't already an async iterable.

```tsx
import { useRef } from "react";
import { useTts } from "@soniox/react";

function IncrementalTts() {
  const audioChunksRef = useRef<Uint8Array[]>([]);

  const { sendText, finish, stop, isSpeaking } = useTts({
    config: fetchTtsKey,
    voice: "Adrian",
    audio_format: "wav",
    onAudio: (chunk) => audioChunksRef.current.push(chunk),
  });

  const speakParagraphs = (paragraphs: string[]) => {
    for (const p of paragraphs) sendText(p + " ");
    finish();
  };

  return (
    <div>
      <button onClick={() => speakParagraphs(["Hello.", "How are you?"])}>
        Speak
      </button>
      {isSpeaking && <button onClick={stop}>Stop</button>}
    </div>
  );
}
```
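If the source text is one long string, a small helper can break it into sendText-sized pieces. A minimal sketch; the helper name, the sentence-splitting regex, and the 200-character cap are illustrative choices, not SDK requirements:

```typescript
// Hypothetical helper: splits text into roughly sentence-aligned chunks
// so each piece can be passed to sendText() as it becomes relevant.
function chunkText(text: string, maxLen = 200): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const sentence of text.split(/(?<=[.!?])\s+/)) {
    if (current && current.length + sentence.length + 1 > maxLen) {
      chunks.push(current);
      current = sentence;
    } else {
      current = current ? current + " " + sentence : sentence;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each chunk can then be sent with sendText(chunk + " "), followed by a single finish() once the loop completes.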

sendText and finish are no-ops in REST mode — the REST endpoint only accepts the full text in a single request.

REST mode

Set mode: 'rest' to run TTS over HTTP. speak(string) still works, but speak(asyncIterable), sendText, and finish are not available (the hook emits an error if you try to stream an async iterable).

```typescript
const audioChunksRef = useRef<Uint8Array[]>([]);

const { speak, state, isSpeaking } = useTts({
  config: fetchTtsKey,
  mode: "rest",
  voice: "Adrian",
  audio_format: "wav",
  onAudio: (chunk) => audioChunksRef.current.push(chunk),
  onTerminated: () => console.log("done"),
});

speak("Hello over REST.");
```

Use REST mode for one-off playback (confirmations, notifications) where the lower latency of WebSocket isn't worth the extra connection.

REST mode requires voice in the hook config — there is no built-in fallback. Omitting it surfaces an error via onError and leaves state: 'error'.

WebSocket mode also needs a voice, but it can come from the hook config or from server-returned tts_defaults (see Server-driven defaults).

To discover available voices, call client.tts.listModels() from the Node SDK on your server — it is not available in the browser client.

Lifecycle

useTts exposes a single state that transitions through the lifecycle below. isSpeaking and isConnecting are derived booleans you can use directly in UI.

| State | Meaning |
| --- | --- |
| idle | No generation in flight. Initial state, or after a completed / cancelled run. |
| connecting | Opening the WebSocket (or issuing the REST request). |
| speaking | Receiving audio chunks. |
| stopping | stop() was called; waiting for terminated to flush. |
| error | Last run failed. Inspect error and call speak() again to retry. |

Transitions, in plain terms:

  • The hook starts in idle. Calling speak() (or sendText() in WebSocket mode) moves it to connecting.
  • Once the first audio chunk is received, it moves to speaking.
  • When the server finishes (either because speak() passed a complete string, you called finish(), or the LLM async iterable ended), the hook fires onTerminated and returns to idle.
  • Calling stop() during speaking moves it to stopping and then back to idle once the server flushes. Calling cancel() at any point jumps straight back to idle.
  • Any error (connect failure or in-stream error) moves it to error. The next speak() resets it back to connecting.
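The transitions above can be sketched as a pure reducer. This is only a mental model of the documented lifecycle, not the SDK's internal implementation; the state and event names follow the tables in this page:

```typescript
type TtsState = "idle" | "connecting" | "speaking" | "stopping" | "error";
type TtsEvent = "speak" | "audio" | "terminated" | "stop" | "cancel" | "error";

// Maps (state, event) to the next lifecycle state, per the rules above.
function nextState(state: TtsState, event: TtsEvent): TtsState {
  switch (event) {
    case "speak":
      return "connecting"; // also covers retrying out of 'error'
    case "audio":
      return state === "connecting" ? "speaking" : state;
    case "stop":
      return state === "speaking" ? "stopping" : state;
    case "terminated":
      return "idle"; // normal completion, or a flushed stop()
    case "cancel":
      return "idle"; // immediate, from any state
    case "error":
      return "error";
  }
}
```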

Methods

| Method | Signature | Description |
| --- | --- | --- |
| speak | (text: string \| AsyncIterable<string>) => void | Start a new run. Cancels any in-flight generation first. |
| sendText | (text: string) => void | WebSocket only. Send one chunk without finishing. Use with finish(). |
| finish | () => void | WebSocket only. Signal no more text; the server finishes and sends terminated. |
| stop | () => Promise<void> | Graceful stop. Sends finish() and resolves when the server reaches terminated. |
| cancel | () => void | Immediate cancel. Audio stops right away. |

Callbacks

| Callback | Signature | Description |
| --- | --- | --- |
| onAudio | (chunk: Uint8Array) => void | Audio chunk received. Fired in both REST and WebSocket modes. |
| onAudioEnd | () => void | Server marked the final audio payload. |
| onTerminated | () => void | Generation is fully complete. |
| onError | (error: Error) => void | Stream or connection error. |
| onStateChange | ({ old_state, new_state }) => void | Fired on every state transition. |

Return values

Reactive snapshot exposed by the hook — see UseTtsReturn:

| Field | Type | Description |
| --- | --- | --- |
| state | TtsState | Current lifecycle state. |
| isSpeaking | boolean | state === 'speaking'. |
| isConnecting | boolean | state === 'connecting'. |
| error | Error \| null | Last error, or null if none. |
| speak, sendText, finish, stop, cancel | - | Control methods (see above). |

Configuration

See UseTtsConfig for the full type. Most-used options:

| Option | Type | Description |
| --- | --- | --- |
| config | SonioxConnectionConfig \| (() => Promise<SonioxConnectionConfig>) | Connection configuration. Required; useTts always uses its own client. |
| mode | 'websocket' \| 'rest' | Transport mode. Default 'websocket'. |
| voice | string | Voice identifier (e.g. "Adrian"). Required unless provided via server-returned tts_defaults (WebSocket mode only). Discover available voices via the Node SDK's client.tts.listModels(). |
| model | string | TTS model. Default "tts-rt-v1-preview". |
| language | string | Language code. Default "en". |
| audio_format | TtsAudioFormat | Output audio format. Default "wav". |
| sample_rate | number | Output sample rate in Hz. Required for raw PCM formats. |
| bitrate | number | Codec bitrate in bps (for compressed formats). |
| stream_id | string | Override the auto-generated stream id. WebSocket mode only. |

See Available models for the full list of TTS models, voices, and supported audio formats.
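Raw PCM output carries no header, so the hook cannot infer the sample rate from the format alone; sample_rate must be set explicitly. A sketch of such a configuration; the "pcm_s16le" format identifier is an assumption here, so check Available models for the exact names:

```typescript
// Hypothetical raw-PCM options: "pcm_s16le" is an assumed format name.
// sample_rate is required because raw PCM has no header declaring it.
const pcmOptions = {
  audio_format: "pcm_s16le",
  sample_rate: 24000,
};
```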

Server-driven defaults

There's no first-class endpoint for TTS defaults — you own them. Keep them on your server next to the temporary-key endpoint and return them via SonioxConnectionConfig.tts_defaults. useTts consumes the defaults automatically through its config resolver; caller-provided hook options (voice, model, audio_format, ...) still override them.

```typescript
// server
app.get('/tts-tmp-key', async (_req, res) => {
  const { api_key, expires_at } = await client.auth.createTemporaryKey({
    usage_type: 'tts_rt',
    expires_in_seconds: 300,
  });

  res.json({
    api_key,
    expires_at,
    tts_defaults: {
      model: 'tts-rt-v1-preview',
      language: 'en',
      voice: 'Adrian',
      audio_format: 'wav',
    },
  });
});
```
```tsx
// client
import { useRef } from "react";
import { useTts } from "@soniox/react";

async function fetchTtsKey() {
  const res = await fetch("/tts-tmp-key");
  return await res.json(); // { api_key, tts_defaults, ... }
}

function TtsButton() {
  const audioChunksRef = useRef<Uint8Array[]>([]);

  // Inherits model / voice / audio_format from tts_defaults returned by the server.
  const { speak } = useTts({
    config: fetchTtsKey,
    onAudio: (chunk) => audioChunksRef.current.push(chunk),
  });

  return <button onClick={() => speak("Hello from server defaults!")}>Speak</button>;
}
```
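The precedence rule (hook options win over server-returned tts_defaults) behaves like a shallow merge. A sketch of the mental model, not the SDK's internal code; the function and type names are illustrative:

```typescript
// Illustrative only: tts_defaults fill the gaps, hook options always win.
type TtsOptions = Partial<{
  model: string;
  language: string;
  voice: string;
  audio_format: string;
}>;

function resolveTtsOptions(ttsDefaults: TtsOptions, hookOptions: TtsOptions): TtsOptions {
  return { ...ttsDefaults, ...hookOptions };
}
```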

See also