Real-time speech generation with React SDK
Stream text to speech in React with the useTts hook
The Soniox React SDK exposes a single useTts hook that covers both real-time WebSocket TTS and one-shot REST TTS. It manages the stream lifecycle, surfaces reactive state for rendering, and handles cleanup automatically when your component unmounts.
Transport modes
useTts has two modes selected via the mode option:
| Feature | 'websocket' (default) | 'rest' |
|---|---|---|
| speak(string) | Yes | Yes |
| speak(asyncIterable): LLM token streaming | Yes | No |
| sendText() / finish(): incremental text | Yes | No-op |
| Audio delivery | Streaming chunks | Streaming chunks |
| Error detection | Full (in-band WebSocket errors) | Pre-stream only (HTTP status) |
| Use when... | Narrating LLM output, lowest latency to first audio | You have the full text and want a simple HTTP request |
Set up your temporary API key endpoint
In a browser environment you don't want to expose your primary API key. Create a temporary key endpoint on your server using the Soniox Node SDK. TTS keys use the tts_rt usage type.
Because TTS and STT temporary keys have different usage_type values, useTts always creates its own client from the inline config prop — even when a <SonioxProvider> is present. Pass the config prop on useTts with a resolver that fetches a tts_rt key.
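On the client side, the resolver passed as config could look roughly like the sketch below. The endpoint path, the `{ api_key }` response shape, and the `api_key` field on the returned object are assumptions made for illustration, not confirmed SDK details; check SonioxConnectionConfig for the real field names. The fetch function is injected so the resolver can be exercised without a network.

```typescript
// Structural stand-ins so the sketch runs without the SDK.
// The field name `api_key` is an assumption; check SonioxConnectionConfig.
type ConnectionConfigSketch = { api_key: string };
type FetchLike = (
  url: string,
  init?: { method?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Builds a resolver that asks your own server for a short-lived tts_rt key.
// `endpoint` is a hypothetical route on your backend.
function createTtsConfigResolver(
  endpoint: string,
  fetchImpl: FetchLike,
): () => Promise<ConnectionConfigSketch> {
  return async () => {
    const res = await fetchImpl(endpoint, { method: "POST" });
    if (!res.ok) {
      throw new Error(`Temporary key request failed with HTTP ${res.status}`);
    }
    const body = (await res.json()) as { api_key?: unknown };
    if (typeof body.api_key !== "string" || body.api_key.length === 0) {
      throw new Error("Temporary key response did not contain an api_key");
    }
    return { api_key: body.api_key };
  };
}
```

In a component, something like `createTtsConfigResolver("/api/tts-temporary-key", (url, init) => fetch(url, init))` would then be passed as the config option of useTts.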
Quickstart
Pass a config resolver and a voice; nothing else is required. speak() starts generation, state reflects the lifecycle, and audio chunks arrive through the onAudio callback for you to play or process however you like.
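As one possible hook-up for onAudio, the sketch below buffers incoming chunks and joins them into a single byte array once generation completes. The AudioCollector name is illustrative and not part of the SDK.

```typescript
// Collects the Uint8Array chunks delivered to onAudio and joins them on demand.
class AudioCollector {
  private chunks: Uint8Array[] = [];
  private byteLength = 0;

  // Pass as the onAudio callback.
  push = (chunk: Uint8Array): void => {
    this.chunks.push(chunk);
    this.byteLength += chunk.byteLength;
  };

  // Call from onTerminated: returns all audio received so far and resets the buffer.
  flush = (): Uint8Array => {
    const out = new Uint8Array(this.byteLength);
    let offset = 0;
    for (const chunk of this.chunks) {
      out.set(chunk, offset);
      offset += chunk.byteLength;
    }
    this.chunks = [];
    this.byteLength = 0;
    return out;
  };
}
```

Buffering until the end trades latency for simplicity. For playback that starts with the first chunk, the same push callback would hand each chunk to a streaming player instead of storing it.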
Stream from an LLM
WebSocket mode accepts an AsyncIterable<string> — pipe LLM tokens straight into speak() and audio starts playing as the first tokens arrive.
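Many LLM clients yield structured events rather than bare strings. The sketch below adapts such a stream into the AsyncIterable<string> that speak() accepts. The event shape (`{ delta?: string }`) and the fakeLlm generator are hypothetical stand-ins for whatever your LLM client returns.

```typescript
// Adapts a stream of provider-specific events into an async iterable of text.
// Events with empty or missing text are skipped.
async function* toTextStream<T>(
  source: AsyncIterable<T>,
  pickText: (event: T) => string | undefined,
): AsyncGenerator<string> {
  for await (const event of source) {
    const text = pickText(event);
    if (text) yield text;
  }
}

// Hypothetical LLM stream, used only to exercise the adapter.
type FakeDelta = { delta?: string };
async function* fakeLlm(): AsyncGenerator<FakeDelta> {
  yield { delta: "Hello" };
  yield {}; // e.g. a metadata-only event with no text
  yield { delta: ", world." };
}
```

With the hook in WebSocket mode, the call would then be along the lines of `speak(toTextStream(fakeLlm(), (e) => e.delta))`.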
Send text incrementally
For finer control over when text arrives, call sendText for each chunk and finish when done. This is useful when the text source isn't already an async iterable.
sendText and finish are no-ops in REST mode — the REST endpoint only accepts the full text in a single request.
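A push-style text source can be wired to sendText and finish as sketched below. The ChunkSource shape is a hypothetical stand-in for an event emitter or callback API; only the sendText and finish signatures come from the Methods table.

```typescript
// The two hook methods this sketch needs.
type IncrementalTts = {
  sendText: (text: string) => void;
  finish: () => void;
};

// Hypothetical push-style source: calls back once per chunk, then once at the end.
type ChunkSource = {
  onChunk: (handler: (text: string) => void) => void;
  onEnd: (handler: () => void) => void;
};

// Forwards every non-empty chunk with sendText and signals completion with finish.
function forwardToTts(source: ChunkSource, tts: IncrementalTts): void {
  source.onChunk((text) => {
    if (text.length > 0) tts.sendText(text);
  });
  source.onEnd(() => tts.finish());
}
```

Calling finish exactly once, after the last chunk, is what lets the server complete the run and send terminated.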
REST mode
Set mode: 'rest' to run TTS over HTTP. speak(string) still works, but streaming input is unavailable: passing an async iterable to speak() makes the hook emit an error, and sendText and finish are no-ops.
Use REST mode for one-off playback (confirmations, notifications) where the lower latency of WebSocket isn't worth the extra connection.
REST mode requires voice in the hook config — there is no built-in fallback. Omitting it surfaces an error via onError and leaves state: 'error'.
WebSocket mode also needs a voice, but it can come from the hook config or from server-returned tts_defaults (see Server-driven defaults).
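The voice rules above can be summarized as a small pre-flight check. This is an illustrative helper, not SDK code; the hook performs its own validation and reports failures through onError.

```typescript
type TtsMode = "websocket" | "rest";

// Returns the voice to use, or throws when none is available for the mode.
// REST mode only considers the hook config; WebSocket mode may fall back to
// the voice from server-returned tts_defaults.
function resolveVoice(
  mode: TtsMode,
  hookVoice?: string,
  defaultsVoice?: string,
): string {
  if (hookVoice) return hookVoice;
  if (mode === "websocket" && defaultsVoice) return defaultsVoice;
  throw new Error(
    mode === "rest"
      ? "REST mode requires voice in the hook config"
      : "No voice in the hook config or tts_defaults",
  );
}
```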
To discover available voices, call client.tts.listModels() from the Node SDK on your server — it is not available in the browser client.
Lifecycle
useTts exposes a single state that transitions through the lifecycle below. isSpeaking and isConnecting are derived booleans you can use directly in UI.
| State | Meaning |
|---|---|
| idle | No generation in flight. Initial state, or after a completed or cancelled run. |
| connecting | Opening the WebSocket (or issuing the REST request). |
| speaking | Receiving audio chunks. |
| stopping | stop() was called; waiting for the server to flush remaining audio and send terminated. |
| error | Last run failed. Inspect error and call speak() again to retry. |
Transitions, in plain terms:
- The hook starts in idle. Calling speak() (or sendText() in WebSocket mode) moves it to connecting.
- Once the first audio chunk is received, it moves to speaking.
- When the server finishes (because speak() passed a complete string, you called finish(), or the LLM async iterable ended), the hook fires onTerminated and returns to idle.
- Calling stop() during speaking moves it to stopping and then back to idle once the server flushes. Calling cancel() at any point jumps straight back to idle.
- Any error (connect failure or in-stream error) moves it to error. The next speak() resets it back to connecting.
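The transitions listed above can be modeled as a small pure function, which is handy for reasoning about UI states or for tests. The event names are illustrative and not SDK identifiers, and leaving the state unchanged for combinations the list does not describe (for example stop() while connecting) is an assumption.

```typescript
type TtsState = "idle" | "connecting" | "speaking" | "stopping" | "error";
type TtsEvent = "speak" | "firstAudio" | "terminated" | "stop" | "cancel" | "error";

// Returns the state that follows `state` when `event` occurs.
function nextState(state: TtsState, event: TtsEvent): TtsState {
  switch (event) {
    case "speak":
      // speak() (or sendText() in WebSocket mode) always starts a new run,
      // including after an error.
      return "connecting";
    case "cancel":
      return "idle";
    case "error":
      return "error";
    case "firstAudio":
      return state === "connecting" ? "speaking" : state;
    case "stop":
      return state === "speaking" ? "stopping" : state;
    case "terminated":
      return state === "speaking" || state === "stopping" ? "idle" : state;
  }
}
```

A graceful run therefore passes through idle, connecting, speaking, stopping, and back to idle, while cancel() short-circuits to idle from anywhere.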
Methods
| Method | Signature | Description |
|---|---|---|
| speak | (text: string \| AsyncIterable<string>) => void | Start a new run. Cancels any in-flight generation first. |
| sendText | (text: string) => void | WebSocket only. Send one chunk without finishing. Use with finish(). |
| finish | () => void | WebSocket only. Signal that no more text is coming; the server finishes and sends terminated. |
| stop | () => Promise<void> | Graceful stop. Sends finish() and resolves when the server reaches terminated. |
| cancel | () => void | Immediate cancel. Audio stops right away. |
Callbacks
| Callback | Signature | Description |
|---|---|---|
| onAudio | (chunk: Uint8Array) => void | Audio chunk received. Fired in both REST and WebSocket modes. |
| onAudioEnd | () => void | Server marked the final audio payload. |
| onTerminated | () => void | Generation is fully complete. |
| onError | (error: Error) => void | Stream or connection error. |
| onStateChange | ({ old_state, new_state }) => void | Fired on every state transition. |
Return values
Reactive snapshot exposed by the hook — see UseTtsReturn:
| Field | Type | Description |
|---|---|---|
| state | TtsState | Current lifecycle state. |
| isSpeaking | boolean | state === 'speaking'. |
| isConnecting | boolean | state === 'connecting'. |
| error | Error \| null | Last error, or null if none. |
| speak, sendText, finish, stop, cancel | function | Control methods (see Methods). |
Configuration
See UseTtsConfig for the full type. Most-used options:
| Option | Type | Description |
|---|---|---|
| config | SonioxConnectionConfig \| (() => Promise<SonioxConnectionConfig>) | Connection configuration. Required: useTts always uses its own client. |
| mode | 'websocket' \| 'rest' | Transport mode. Default 'websocket'. |
| voice | string | Voice identifier (e.g. "Adrian"). Required unless provided via server-returned tts_defaults (WebSocket mode only). Discover available voices via the Node SDK's client.tts.listModels(). |
| model | string | TTS model. Default "tts-rt-v1-preview". |
| language | string | Language code. Default "en". |
| audio_format | TtsAudioFormat | Output audio format. Default "wav". |
| sample_rate | number | Output sample rate in Hz. Required for raw PCM formats. |
| bitrate | number | Codec bitrate in bps (for compressed formats). |
| stream_id | string | Override the auto-generated stream id. WebSocket mode only. |
See Available models for the full list of TTS models, voices, and supported audio formats.
Server-driven defaults
There's no first-class endpoint for TTS defaults — you own them. Keep them on your server next to the temporary-key endpoint and return them via SonioxConnectionConfig.tts_defaults. useTts consumes the defaults automatically through its config resolver; caller-provided hook options (voice, model, audio_format, ...) still override them.
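The precedence rule (server defaults first, hook options on top) can be illustrated with a small merge helper. This is a sketch of the behavior described above, not the SDK's implementation; the option subset and the choice to ignore undefined hook options are assumptions.

```typescript
// Subset of options that can come from tts_defaults or from the hook config.
type TtsOptionsSketch = {
  voice?: string;
  model?: string;
  language?: string;
  audio_format?: string;
};

// Starts from the server-returned defaults and lets every explicitly set
// hook option override them. Undefined hook options leave the default intact.
function mergeTtsOptions(
  defaults: TtsOptionsSketch,
  hookOptions: TtsOptionsSketch,
): TtsOptionsSketch {
  const merged: TtsOptionsSketch = { ...defaults };
  for (const key of Object.keys(hookOptions) as (keyof TtsOptionsSketch)[]) {
    const value = hookOptions[key];
    if (value !== undefined) merged[key] = value;
  }
  return merged;
}
```

For example, defaults of `{ voice: "Adrian", language: "en" }` combined with hook options `{ language: "de" }` would keep the server's voice and use the caller's language.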
See also
- useTts return type and config
- TtsState reference
- Web SDK real-time TTS: the underlying transport used in 'websocket' mode.
- Web SDK REST TTS: the underlying transport used in 'rest' mode.