
Real-time speech generation with Node SDK

Stream text to speech with the Soniox Node SDK over WebSocket

The Soniox Node SDK supports real-time Text-to-Speech generation over WebSocket. You send text — all at once or incrementally — and receive decoded audio chunks as they are generated. This is the lowest-latency path for voice agents, LLM output narration, and any scenario where text arrives progressively.

If you already have the full text up front and don't need chunk-by-chunk streaming, use REST speech generation instead — it's a single HTTP request.

Quickstart

client.realtime.tts() creates a single-stream session: it opens a WebSocket, configures a stream, and returns a RealtimeTtsStream. Send text, then consume audio by async iteration.

import { writeFileSync } from "node:fs";
import { SonioxNodeClient } from "@soniox/node";

const client = new SonioxNodeClient();

const stream = await client.realtime.tts({
  voice: "Adrian",
  model: "tts-rt-v1-preview",
  language: "en",
  audio_format: "wav",
});

// Send all the text and mark the stream as finished.
stream.sendText(
  "Hello from Soniox real-time text-to-speech. This is a single-stream example.",
  { end: true },
);

// Collect audio as it arrives.
const chunks: Uint8Array[] = [];
for await (const chunk of stream) {
  chunks.push(chunk);
}

const audio = Buffer.concat(chunks);
writeFileSync("tts_realtime.wav", audio);

console.log(`Wrote ${audio.byteLength} bytes`);

The stream closes itself (and the underlying WebSocket) once terminated fires. You never have to call close() in single-stream mode.

Send text incrementally

Use sendText(text) for each chunk as it becomes available, then either set { end: true } on the last call or invoke finish() explicitly. This is the pattern for narrating an LLM response token-by-token.

const stream = await client.realtime.tts({
  voice: "Adrian",
  model: "tts-rt-v1-preview",
  audio_format: "wav",
});

stream.sendText("Hello from Soniox ");
stream.sendText("real-time TTS. ");
stream.sendText("This is the final chunk.", { end: true });

for await (const chunk of stream) {
  playback(chunk); // play, buffer, or write the chunk somewhere
}

Equivalent with an explicit finish():

stream.sendText("Hello from Soniox real-time TTS.");
stream.finish();

for await (const chunk of stream) {
  playback(chunk);
}
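If your text arrives word-by-word, you may want to batch tokens into sentence-sized chunks before each sendText() call, so every send carries a natural prosodic unit. A minimal sketch of such a batcher (illustrative only, not part of the SDK):

```typescript
// Batch incoming tokens into complete sentences. A sentence is emitted
// as soon as closing punctuation (. ! ?) is followed by whitespace.
async function* batchBySentence(
  tokens: AsyncIterable<string>,
): AsyncIterable<string> {
  let buffer = "";
  for await (const token of tokens) {
    buffer += token;
    let idx: number;
    while ((idx = buffer.search(/[.!?]\s/)) !== -1) {
      yield buffer.slice(0, idx + 1);
      buffer = buffer.slice(idx + 1).trimStart();
    }
  }
  // Flush any trailing partial sentence.
  if (buffer.length > 0) yield buffer;
}
```

You would then iterate batchBySentence(yourTokenSource) and call stream.sendText(sentence) once per emitted sentence.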

Pipe from an async iterable

stream.sendStream(source) pipes any AsyncIterable<string> into the TTS session and auto-finishes when the iterable completes. This is the idiomatic way to connect an LLM token stream directly to speech output — sending and receiving run concurrently.

async function* tokensFromLlm(): AsyncIterable<string> {
  const words = "Hello from Soniox real-time TTS.".split(" ");
  for (let i = 0; i < words.length; i++) {
    await new Promise((r) => setTimeout(r, 50));
    yield i === 0 ? words[i] : " " + words[i];
  }
}

const stream = await client.realtime.tts({
  voice: "Adrian",
  model: "tts-rt-v1-preview",
  audio_format: "wav",
});

// Start piping text. Audio consumption runs concurrently below.
void stream.sendStream(tokensFromLlm());

for await (const chunk of stream) {
  playback(chunk);
}
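Conceptually, sendStream behaves like the loop below. This is a sketch of the assumed behavior, not the SDK's actual implementation; TextSink is a hypothetical stand-in for the two stream methods involved.

```typescript
// What sendStream is assumed to do: forward every string chunk from
// the source, then signal end-of-text once the iterable completes.
interface TextSink {
  sendText(text: string): void;
  finish(): void;
}

async function pipeText(
  sink: TextSink,
  source: AsyncIterable<string>,
): Promise<void> {
  for await (const text of source) {
    sink.sendText(text);
  }
  // Auto-finish, matching sendStream's documented behavior.
  sink.finish();
}
```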

Event-based consumption

RealtimeTtsStream is also a TypedEmitter. When you prefer an event-driven style over async iteration, listen for TtsStreamEvents:

Event      | Payload    | Description
-----------|------------|------------
audio      | Uint8Array | Decoded audio chunk.
audioEnd   | (none)     | Server marked the final audio payload for this stream.
terminated | (none)     | Stream fully closed by the server.
error      | Error      | Stream-level error.

const stream = await client.realtime.tts({
  voice: "Adrian",
  model: "tts-rt-v1-preview",
  audio_format: "wav",
});

stream.on("audio", (chunk) => playback(chunk));
stream.on("audioEnd", () => console.log("last audio payload received"));
stream.on("error", (err) => console.error("Stream error:", err));
stream.on("terminated", () => console.log("stream done"));

stream.sendText("Hello from event-based TTS.", { end: true });

Choose either async iteration or event listeners — not both. The async iterator consumes audio events internally.
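In the event-driven style you often still want a promise that settles when the stream ends. A small helper along these lines works, assuming the stream's once() is EventEmitter-compatible (RealtimeTtsStream is a TypedEmitter):

```typescript
import { EventEmitter } from "node:events";

// Resolve when "terminated" fires, reject if "error" fires first.
function streamDone(stream: EventEmitter): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    stream.once("terminated", () => resolve());
    stream.once("error", (err: Error) => reject(err));
  });
}
```

With this, the event-based example can end with await streamDone(stream) instead of an explicit terminated listener.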

Multi-stream connection

A single WebSocket connection can carry up to 5 concurrent TTS streams. Use client.realtime.tts.multiStream() to open a RealtimeTtsConnection, then call connection.stream() for each stream. Each stream has its own streamId and can have different voice, model, and audio format settings.

const connection = await client.realtime.tts.multiStream();

const streamA = await connection.stream({
  voice: "Adrian",
  audio_format: "wav",
});
const streamB = await connection.stream({
  // Enumerate available voices via `client.tts.listModels()`.
  voice: "<another-voice>",
  audio_format: "wav",
});

streamA.sendText("Hello from stream A.", { end: true });
streamB.sendText("Hello from stream B.", { end: true });

const [audioA, audioB] = await Promise.all([
  collect(streamA),
  collect(streamB),
]);

connection.close();

async function collect(stream: AsyncIterable<Uint8Array>): Promise<Buffer> {
  const chunks: Uint8Array[] = [];
  for await (const chunk of stream) {
    chunks.push(chunk);
  }
  return Buffer.concat(chunks);
}

Call connection.close() when you're done — this ends all active streams and closes the WebSocket.

Cancel, finish, and close

Method             | Behavior
-------------------|---------
stream.finish()    | Signals "no more text". The server finishes generating audio and sends terminated.
stream.cancel()    | Aborts generation immediately. The server stops producing audio and sends terminated.
stream.close()     | Terminates the stream. In single-stream mode (client.realtime.tts(...)) this also closes the WebSocket.
connection.close() | Closes the WebSocket and terminates all streams on a multi-stream connection.

// Graceful stop
stream.finish();

// User-triggered cancel
stream.cancel();

Error handling

A failed stream does not close the whole WebSocket connection by default. Stream-level errors finalize only that stream (terminated fires for the same streamId), while other streams on the same connection can continue. Connection-level failures end the whole connection and all active streams.

import { RealtimeError, SonioxError } from "@soniox/node";

try {
  const stream = await client.realtime.tts({ voice: "Adrian" });
  stream.sendText("Hello!", { end: true });
  for await (const _ of stream) {
    // consume audio
  }
} catch (err) {
  if (err instanceof RealtimeError) {
    console.error(`Realtime TTS error (${err.code}):`, err.message);
  } else if (err instanceof SonioxError) {
    console.error("Soniox SDK error:", err.message);
  } else {
    throw err;
  }
}
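For transient connection-level failures during setup, a generic retry wrapper can help. This is a sketch (withRetry is not part of the SDK); retry only stream creation, never a stream that has already received text.

```typescript
// Retry an async factory with exponential backoff, e.g. pass
// () => client.realtime.tts({ voice: "Adrian" }) as the factory.
async function withRetry<T>(
  factory: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await factory();
    } catch (err) {
      lastErr = err;
      if (i < attempts - 1) {
        // Back off before the next attempt: base, 2x base, 4x base, ...
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastErr;
}
```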

Server-driven defaults

Set shared TTS fields once on the client via tts_defaults and they'll be merged as the base layer every time you open a stream. Caller-provided fields on client.realtime.tts(...) / connection.stream(...) override the defaults, so you never need to spread them manually.

import { SonioxNodeClient } from "@soniox/node";

const client = new SonioxNodeClient({
  tts_defaults: {
    model: "tts-rt-v1-preview",
    language: "en",
    voice: "Adrian",
    audio_format: "wav",
  },
});

// Inherits everything from tts_defaults.
const stream = await client.realtime.tts();

// Overrides the default voice for this stream only.
const customVoice = await client.realtime.tts({ voice: "<another-voice>" });
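Conceptually, the merge is a shallow base-plus-override, along these lines (illustrative; the field names mirror the options used above):

```typescript
type TtsConfig = {
  model?: string;
  language?: string;
  voice?: string;
  audio_format?: string;
};

// Caller-provided fields win over client-level tts_defaults.
function resolveTtsConfig(
  defaults: TtsConfig,
  overrides: TtsConfig = {},
): TtsConfig {
  return { ...defaults, ...overrides };
}
```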

tts_defaults is also accepted on RealtimeOptions if you want to scope defaults to a specific realtime namespace.

On the Web and React SDKs, the equivalent is SonioxConnectionConfig.tts_defaults — return it from the async config resolver alongside the temporary api_key so the server owns the defaults.

See also