Real-time speech generation with the Node SDK

The Soniox Node SDK supports real-time Text-to-Speech generation over WebSocket. You send text — all at once or incrementally — and receive decoded audio chunks as they are generated. This is the lowest-latency path for voice agents, LLM output narration, and any scenario where text arrives progressively.

If you already have the full text up front and don't need chunk-by-chunk streaming, use REST speech generation instead — it's a single HTTP request.

Quickstart

client.realtime.tts() creates a single-stream session: it opens a WebSocket, configures a stream, and returns a RealtimeTtsStream. Send text, then consume audio by async iteration.

import { writeFileSync } from "node:fs";
import { SonioxNodeClient } from "@soniox/node";

const client = new SonioxNodeClient();

const stream = await client.realtime.tts({
  voice: "Adrian",
  model: "tts-rt-v1",
  language: "en",
  audio_format: "wav",
});

// Send all the text and mark the stream as finished.
stream.sendText(
  "Hello from Soniox real-time text-to-speech. This is a single-stream example.",
  { end: true },
);

// Collect audio as it arrives.
const chunks: Uint8Array[] = [];
for await (const chunk of stream) {
  chunks.push(chunk);
}

const audio = Buffer.concat(chunks);
writeFileSync("tts_realtime.wav", audio);

console.log(`Wrote ${audio.byteLength} bytes`);

The stream closes itself (and the underlying WebSocket) once the terminated event fires, so you never have to call close() in single-stream mode.

Send text incrementally

Use sendText(text) for each chunk as it becomes available, then either set { end: true } on the last call or invoke finish() explicitly. This is the pattern for narrating an LLM response token-by-token.

const stream = await client.realtime.tts({
  voice: "Adrian",
  model: "tts-rt-v1",
  audio_format: "wav",
});

stream.sendText("Hello from Soniox ");
stream.sendText("real-time TTS. ");
stream.sendText("This is the final chunk.", { end: true });

for await (const chunk of stream) {
  playback(chunk); // play, buffer, or write the chunk somewhere
}

Equivalent with an explicit finish():

stream.sendText("Hello from Soniox real-time TTS.");
stream.finish();

for await (const chunk of stream) {
  playback(chunk);
}
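
The playback function in these snippets is a placeholder, not an SDK export. As a minimal sketch, assuming you only want to accumulate chunks in memory and join them later, a hypothetical createChunkSink helper could stand in for it:

```typescript
// Hypothetical helper, not part of @soniox/node: an in-memory sink whose
// `playback` method can stand in for the placeholder used in the snippets above.
function createChunkSink() {
  const chunks: Uint8Array[] = [];
  let totalBytes = 0;

  return {
    // Store one audio chunk in arrival order.
    playback(chunk: Uint8Array): void {
      chunks.push(chunk);
      totalBytes += chunk.byteLength;
    },
    // Join everything received so far into one contiguous byte array.
    toBytes(): Uint8Array {
      const out = new Uint8Array(totalBytes);
      let offset = 0;
      for (const chunk of chunks) {
        out.set(chunk, offset);
        offset += chunk.byteLength;
      }
      return out;
    },
  };
}
```

With const sink = createChunkSink(), call sink.playback(chunk) inside the loop and sink.toBytes() after it ends. A real player would more likely forward each chunk to an audio device or an HTTP response instead of buffering.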

Pipe from an async iterable

stream.sendStream(source) pipes any AsyncIterable<string> into the TTS session and auto-finishes when the iterable completes. This is the idiomatic way to connect an LLM token stream directly to speech output — sending and receiving run concurrently.

async function* tokensFromLlm(): AsyncIterable<string> {
  const words = "Hello from Soniox real-time TTS.".split(" ");
  for (let i = 0; i < words.length; i++) {
    await new Promise((r) => setTimeout(r, 50));
    yield i === 0 ? words[i] : " " + words[i];
  }
}

const stream = await client.realtime.tts({
  voice: "Adrian",
  model: "tts-rt-v1",
  audio_format: "wav",
});

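// Start piping text without awaiting, so audio consumption runs concurrently below.
// Log a failed pipe here so it does not surface as an unhandled rejection
// (assumes sendStream returns a promise, as the `void` in the original implies).
stream
  .sendStream(tokensFromLlm())
  .catch((err) => console.error("sendStream failed:", err));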

for await (const chunk of stream) {
  playback(chunk);
}
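
Real LLM clients usually yield structured events rather than bare strings. A minimal sketch of an adapter follows. The LlmDelta shape and the textDeltas name are hypothetical, so adjust them to whatever your LLM client actually emits:

```typescript
// Hypothetical event shape; real LLM SDKs differ.
interface LlmDelta {
  text?: string;
}

// Convert a stream of LLM events into the AsyncIterable<string> that
// sendStream expects, skipping events that carry no text.
async function* textDeltas(
  events: AsyncIterable<LlmDelta>,
): AsyncGenerator<string> {
  for await (const event of events) {
    if (event.text) {
      yield event.text;
    }
  }
}
```

The result can then be passed to stream.sendStream(textDeltas(llmEvents)), where llmEvents is your own event source.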

Event-based consumption

RealtimeTtsStream is also a TypedEmitter. When you prefer an event-driven style over async iteration, listen for TtsStreamEvents:

Event	Payload	Description
`audio`	`Uint8Array`	Decoded audio chunk.
`audioEnd`	—	Server marked the final audio payload for this stream.
`terminated`	—	Stream fully closed by the server.
`error`	`Error`	Stream-level error.

const stream = await client.realtime.tts({
  voice: "Adrian",
  model: "tts-rt-v1",
  audio_format: "wav",
});

stream.on("audio", (chunk) => playback(chunk));
stream.on("audioEnd", () => console.log("last audio payload received"));
stream.on("error", (err) => console.error("Stream error:", err));
stream.on("terminated", () => console.log("stream done"));

stream.sendText("Hello from event-based TTS.", { end: true });

Choose either async iteration or event listeners — not both. The async iterator consumes audio events internally.
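
If the rest of your code is promise-based, the event style can still be wrapped in a single awaitable. This is a minimal sketch: collectViaEvents and the TtsEventSource type are hypothetical names, and the structural type is an assumption that may need adjusting to the SDK's actual RealtimeTtsStream typings.

```typescript
// Structural type covering only the events listed in the table above.
// The real RealtimeTtsStream type may be stricter; this shape is an assumption.
interface TtsEventSource {
  on(event: string, listener: (...args: any[]) => void): unknown;
}

// Resolve with all audio chunks once `terminated` fires; reject on `error`.
function collectViaEvents(stream: TtsEventSource): Promise<Uint8Array[]> {
  return new Promise((resolve, reject) => {
    const chunks: Uint8Array[] = [];
    stream.on("audio", (chunk: Uint8Array) => {
      chunks.push(chunk);
    });
    stream.on("error", (err: Error) => reject(err));
    // If `error` already rejected the promise, this resolve is a no-op.
    stream.on("terminated", () => resolve(chunks));
  });
}
```

Register the listeners (that is, call collectViaEvents) before sending text, so no early audio event is missed.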

Multi-stream connection

A single WebSocket connection can carry up to 5 concurrent TTS streams. Use client.realtime.tts.multiStream() to open a RealtimeTtsConnection, then call connection.stream() for each stream. Each stream has its own streamId and can have different voice, model, and audio format settings.

const connection = await client.realtime.tts.multiStream();

const streamA = await connection.stream({
  voice: "Adrian",
  audio_format: "wav",
});
const streamB = await connection.stream({
  // Enumerate available voices via `client.tts.listModels()`.
  voice: "<another-voice>",
  audio_format: "wav",
});

streamA.sendText("Hello from stream A.", { end: true });
streamB.sendText("Hello from stream B.", { end: true });

const [audioA, audioB] = await Promise.all([
  collect(streamA),
  collect(streamB),
]);

connection.close();

async function collect(stream: AsyncIterable<Uint8Array>): Promise<Buffer> {
  const chunks: Uint8Array[] = [];
  for await (const chunk of stream) {
    chunks.push(chunk);
  }
  return Buffer.concat(chunks);
}

Call connection.close() when you're done — this ends all active streams and closes the WebSocket.
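
With more than 5 texts to synthesize, the per-connection limit means the work has to be split up. Batching is one possible approach, sketched below. The toBatches and runInBatches helpers are hypothetical, and the sketch assumes a stream stops counting toward the limit once it has terminated, which this page does not state explicitly.

```typescript
// Limit stated above: up to 5 concurrent TTS streams per connection.
const MAX_STREAMS_PER_CONNECTION = 5;

// Split a list into consecutive groups of at most `size` items.
function toBatches<T>(
  items: readonly T[],
  size: number = MAX_STREAMS_PER_CONNECTION,
): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Run `worker` for every item, never starting more than `size` at once.
// Each batch must finish completely before the next one starts.
async function runInBatches<T, R>(
  items: readonly T[],
  worker: (item: T) => Promise<R>,
  size: number = MAX_STREAMS_PER_CONNECTION,
): Promise<R[]> {
  const results: R[] = [];
  for (const batch of toBatches(items, size)) {
    results.push(...(await Promise.all(batch.map(worker))));
  }
  return results;
}
```

For example, the worker could open a stream with connection.stream(...), call sendText(text, { end: true }), and return collect(stream). Batching is simple but leaves slots idle while the slowest stream in a batch finishes; a sliding-window pool would keep all 5 slots busy at the cost of more code.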

Cancel, finish, and close

Method	Behavior
`stream.finish()`	Signals "no more text". The server finishes generating audio and sends `terminated`.
`stream.cancel()`	Aborts generation immediately. The server stops producing audio and sends `terminated`.
`stream.close()`	Terminates the stream. In single-stream mode (`client.realtime.tts(...)`) this also closes the WebSocket.
`connection.close()`	Closes the WebSocket and terminates all streams on a multi-stream connection.

// Graceful stop
stream.finish();

// User-triggered cancel
stream.cancel();
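
To wire stream.cancel() to a user action or a timeout, the standard AbortSignal is a convenient carrier. This is a minimal sketch: cancelOnAbort and the Cancellable type are hypothetical names, and the only SDK method relied on is cancel() from the table above.

```typescript
// Anything with a cancel() method, such as the stream described above.
interface Cancellable {
  cancel(): void;
}

// Cancel the stream when the signal aborts.
// Returns a function that detaches the listener again.
function cancelOnAbort(stream: Cancellable, signal: AbortSignal): () => void {
  if (signal.aborted) {
    // The signal fired before we were attached: cancel right away.
    stream.cancel();
    return () => {};
  }
  const onAbort = () => stream.cancel();
  signal.addEventListener("abort", onAbort, { once: true });
  return () => signal.removeEventListener("abort", onAbort);
}
```

Create an AbortController, pass controller.signal to cancelOnAbort, and call controller.abort() from a "stop" button handler or a timer. Call the returned function once the stream has finished normally so a late abort does not touch a closed stream.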

Error handling

A failed stream does not close the whole WebSocket connection by default. Stream-level errors finalize only that stream (terminated fires for the same streamId), while other streams on the same connection can continue. Connection-level failures end the whole connection and all active streams.

import { RealtimeError, SonioxError } from "@soniox/node";

try {
  const stream = await client.realtime.tts({ voice: "Adrian" });
  stream.sendText("Hello!", { end: true });
  for await (const _ of stream) {
    // consume audio
  }
} catch (err) {
  if (err instanceof RealtimeError) {
    console.error(`Realtime TTS error (${err.code}):`, err.message);
  } else if (err instanceof SonioxError) {
    console.error("Soniox SDK error:", err.message);
  } else {
    throw err;
  }
}
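
On a multi-stream connection, the isolation described above means one stream's failure should not discard audio from the others. The sketch below assumes a failed stream surfaces its error by rejecting during async iteration, as the try/catch above implies. The collectSafely, collectAllSafely, and StreamResult names are hypothetical.

```typescript
// Outcome of consuming one stream: either its audio chunks or its error.
type StreamResult =
  | { ok: true; chunks: Uint8Array[] }
  | { ok: false; error: unknown };

// Drain one stream, capturing a failure instead of rethrowing it.
async function collectSafely(
  stream: AsyncIterable<Uint8Array>,
): Promise<StreamResult> {
  const chunks: Uint8Array[] = [];
  try {
    for await (const chunk of stream) {
      chunks.push(chunk);
    }
    return { ok: true, chunks };
  } catch (error) {
    return { ok: false, error };
  }
}

// Drain several streams concurrently; the returned promise never rejects
// because of a single failed stream.
function collectAllSafely(
  streams: AsyncIterable<Uint8Array>[],
): Promise<StreamResult[]> {
  return Promise.all(streams.map(collectSafely));
}
```

Results come back in the same order as the input streams, so each entry can be inspected with the instanceof checks shown above when ok is false.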

Server-driven defaults

Set shared TTS fields once on the client via tts_defaults and they'll be merged as the base layer every time you open a stream. Caller-provided fields on client.realtime.tts(...) / connection.stream(...) override the defaults, so you never need to spread them manually.

import { SonioxNodeClient } from "@soniox/node";

const client = new SonioxNodeClient({
  tts_defaults: {
    model: "tts-rt-v1",
    language: "en",
    voice: "Adrian",
    audio_format: "wav",
  },
});

// Inherits everything from tts_defaults.
const stream = await client.realtime.tts();

// Overrides the default voice for this stream only.
const customVoice = await client.realtime.tts({ voice: "<another-voice>" });

tts_defaults is also accepted on RealtimeOptions if you want to scope defaults to a specific realtime namespace.
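
The layering described in this section can be pictured as an object spread in which later layers win. This is only an illustration of the precedence, not the SDK's actual merge code: resolveTtsConfig and the TtsConfig type are hypothetical, and the field list is limited to the fields used on this page.

```typescript
// Fields used in the examples on this page; the SDK may accept more.
interface TtsConfig {
  model?: string;
  language?: string;
  voice?: string;
  audio_format?: string;
}

// Defaults form the base layer; caller-provided fields override them.
// Note: a key explicitly set to undefined in `overrides` also replaces the
// default here. Whether the SDK behaves the same way is not stated on this page.
function resolveTtsConfig(
  defaults: TtsConfig,
  overrides: TtsConfig = {},
): TtsConfig {
  return { ...defaults, ...overrides };
}
```

Under this model, resolveTtsConfig(defaults, { voice: "<another-voice>" }) keeps the default model, language, and audio_format and changes only the voice, which matches the customVoice example above.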

On the Web and React SDKs, the equivalent is SonioxConnectionConfig.tts_defaults — return it from the async config resolver alongside the temporary api_key so the server owns the defaults.