REST speech generation with Web SDK

The Soniox Web SDK supports Text-to-Speech generation over HTTP with SonioxClient. Use REST when you have the full text up front — the SDK returns audio bytes that you can play, download, or hand to an <audio> element or MediaSource.

Use real-time speech generation when you want to narrate text as it arrives from an LLM or need the lowest latency to first audio.

Set up your temporary API key endpoint

Browser code is visible to every visitor, so never ship your primary API key to it. Instead, create a temporary key endpoint on your server using the Soniox Node SDK and request a key with usage_type: 'tts_rt'.

To attribute browser-side TTS traffic to an end user or session, pass client_reference_id to createTemporaryKey. Every request authenticated with that key is recorded under the identifier in usage logs, and clients cannot override it.

import express from 'express';
import { SonioxNodeClient } from '@soniox/node';

const app = express();
const client = new SonioxNodeClient(); // reads SONIOX_API_KEY from env

app.get('/tts-tmp-key', async (_req, res) => {
  try {
    const { api_key, expires_at } = await client.auth.createTemporaryKey({
      usage_type: 'tts_rt',
      expires_in_seconds: 300,
    });
    res.json({ api_key, expires_at });
  } catch (err) {
    res.status(500).json({ error: err instanceof Error ? err.message : 'Failed to create temporary key' });
  }
});

app.listen(3000);
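
The attribution described above can be sketched as a small helper that builds the arguments for createTemporaryKey. The field names (usage_type, expires_in_seconds, client_reference_id) come from the text above; the helper name buildTempKeyParams and the idea of passing in a session ID are assumptions made for illustration.

```typescript
// Hypothetical helper (not part of the Soniox SDK): builds the argument
// object passed to createTemporaryKey on the server.
interface TempKeyParams {
  usage_type: "tts_rt";
  expires_in_seconds: number;
  client_reference_id?: string;
}

function buildTempKeyParams(sessionId?: string, ttlSeconds = 300): TempKeyParams {
  const params: TempKeyParams = {
    usage_type: "tts_rt",
    expires_in_seconds: ttlSeconds,
  };
  // Attach the identifier only when the server actually knows who is asking.
  const trimmed = sessionId?.trim();
  if (trimmed) {
    params.client_reference_id = trimmed;
  }
  return params;
}
```

In the endpoint this would be used as client.auth.createTemporaryKey(buildTempKeyParams(sessionId)), where sessionId should come from your own server-side session state rather than from anything the browser sends in the request.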

Quickstart

Create a SonioxClient with a config resolver that fetches a fresh temporary key, then call client.tts.generate() and play the resulting audio.

import { SonioxClient } from "@soniox/client";

const client = new SonioxClient({
  config: async () => {
    const res = await fetch("/tts-tmp-key");
    const { api_key } = await res.json();
    return { api_key };
  },
});

const audio = await client.tts.generate({
  text: "Hello from the Soniox Web SDK text-to-speech example.",
  voice: "Adrian",
  model: "tts-rt-v1",
  language: "en",
  audio_format: "wav",
});

const blob = new Blob([audio], { type: "audio/wav" });
const audioEl = new Audio(URL.createObjectURL(blob));
await audioEl.play();

Generate to bytes

client.tts.generate() returns a Promise<Uint8Array> with the full audio payload. Use this when you want to upload the audio, store it locally, or decode it with the Web Audio API.

const audio = await client.tts.generate({
  text: "This response is generated in memory.",
  voice: "Adrian",
  audio_format: "wav",
});

console.log(`Received ${audio.byteLength} bytes`);
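
If you store or post-process the bytes, it can help to check what was actually returned. The sketch below reads channel count, sample rate, and bit depth from the payload. It assumes that the "wav" format is a standard RIFF/WAVE container, which this page does not state explicitly; readWavInfo is a hypothetical helper, not an SDK function.

```typescript
interface WavInfo {
  channels: number;
  sampleRate: number;
  bitsPerSample: number;
}

// Four-character chunk tags, read as big-endian 32-bit integers.
const TAG_RIFF = 0x52494646; // "RIFF"
const TAG_WAVE = 0x57415645; // "WAVE"
const TAG_FMT = 0x666d7420; // "fmt "

// Returns null when the bytes do not look like a RIFF/WAVE file.
function readWavInfo(bytes: Uint8Array): WavInfo | null {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (
    bytes.byteLength < 12 ||
    view.getUint32(0, false) !== TAG_RIFF ||
    view.getUint32(8, false) !== TAG_WAVE
  ) {
    return null;
  }

  // Walk the chunk list until the "fmt " chunk is found.
  let offset = 12;
  while (offset + 8 <= bytes.byteLength) {
    const size = view.getUint32(offset + 4, true);
    if (view.getUint32(offset, false) === TAG_FMT && offset + 24 <= bytes.byteLength) {
      return {
        channels: view.getUint16(offset + 10, true),
        sampleRate: view.getUint32(offset + 12, true),
        bitsPerSample: view.getUint16(offset + 22, true),
      };
    }
    offset += 8 + size + (size % 2); // chunks are padded to an even length
  }
  return null;
}
```

Calling readWavInfo(audio) on the result of client.tts.generate() would then give you the values to log or validate before upload.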

Stream audio chunks

client.tts.generateStream() returns an AsyncIterable<Uint8Array> so you can start playback before the full payload has arrived. Pair it with MediaSource for progressive playback.

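const mediaSource = new MediaSource();
const audioEl = new Audio(URL.createObjectURL(mediaSource));

// Register the handler before starting playback. Awaiting play() first can
// stall, because play() only settles once some audio has been buffered.
mediaSource.addEventListener("sourceopen", async () => {
  // Browsers differ in which MIME types MediaSource accepts. Check with
  // MediaSource.isTypeSupported() and fall back to a single Blob if needed.
  const sourceBuffer = mediaSource.addSourceBuffer("audio/wav");

  for await (const chunk of client.tts.generateStream({
    text: "Streaming audio as it arrives from the server.",
    voice: "Adrian",
    audio_format: "wav",
  })) {
    // Wait for each append to finish before appending the next chunk.
    await new Promise<void>((resolve) => {
      sourceBuffer.addEventListener("updateend", () => resolve(), { once: true });
      sourceBuffer.appendBuffer(chunk);
    });
  }

  mediaSource.endOfStream();
}, { once: true });

// Start playback without awaiting; most browsers require a user gesture.
audioEl.play().catch((err) => console.error("Playback failed:", err));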

For simpler use cases, you can concatenate chunks and play a single Blob:

const chunks: Uint8Array[] = [];
for await (const chunk of client.tts.generateStream({
  text: "Hello!",
  voice: "Adrian",
  audio_format: "wav",
})) {
  chunks.push(chunk);
}

const blob = new Blob(chunks, { type: "audio/wav" });
new Audio(URL.createObjectURL(blob)).play();
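
When you need the streamed audio as one contiguous byte array rather than a Blob (for example, to pass it to an upload call), the collected chunks can be joined manually. concatChunks below is a hypothetical helper written for this page, not an SDK function.

```typescript
// Joins the chunks collected from the stream into a single Uint8Array.
function concatChunks(chunks: Uint8Array[]): Uint8Array {
  const total = chunks.reduce((sum, chunk) => sum + chunk.byteLength, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset); // copy this chunk at the current write position
    offset += chunk.byteLength;
  }
  return out;
}
```

With the chunks array from the loop above, concatChunks(chunks) yields the same bytes that generate() would have returned in one piece.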

Generation options

Both generate() and generateStream() accept the same GenerateSpeechOptions shape:

Option	Type	Description
`text`	`string`	Input text to synthesize. Required.
`voice`	`string`	Voice identifier (e.g. `"Adrian"`). Required.
`model`	`string`	TTS model. Default `"tts-rt-v1"`.
`language`	`string`	Language code. Default `"en"`.
`audio_format`	`TtsAudioFormat`	Output audio format. Default `"wav"`.
`sample_rate`	`number`	Output sample rate in Hz. Required for raw PCM formats.
`bitrate`	`number`	Codec bitrate in bps (for compressed formats).
`signal`	`AbortSignal`	Optional signal to cancel the request.

See Available models for the full list of TTS models, voices, and supported audio formats.
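
A few of the constraints in the table can be checked before a request is sent. The sketch below is only an illustration: SpeechOptionsSketch is a local mirror of part of the table rather than the SDK's GenerateSpeechOptions type, and the list of raw PCM format names is passed in by the caller because this page does not enumerate them.

```typescript
// Local mirror of part of the options table; not imported from the SDK.
interface SpeechOptionsSketch {
  text: string;
  voice?: string;
  audio_format?: string;
  sample_rate?: number;
  bitrate?: number;
}

// rawPcmFormats must be supplied by the caller, using the format names
// listed on the "Available models" page.
function findOptionProblems(
  opts: SpeechOptionsSketch,
  rawPcmFormats: readonly string[],
): string[] {
  const problems: string[] = [];
  if (opts.text.trim() === "") {
    problems.push("text is empty");
  }
  const isRawPcm =
    opts.audio_format !== undefined && rawPcmFormats.indexOf(opts.audio_format) !== -1;
  if (isRawPcm && opts.sample_rate === undefined) {
    problems.push("sample_rate is required for raw PCM formats");
  }
  if (opts.sample_rate !== undefined && !(opts.sample_rate > 0)) {
    problems.push("sample_rate must be positive");
  }
  if (opts.bitrate !== undefined && !(opts.bitrate > 0)) {
    problems.push("bitrate must be positive");
  }
  return problems;
}
```

For example, findOptionProblems({ text: "Hi", audio_format: "<raw-pcm-format>" }, ["<raw-pcm-format>"]) reports the missing sample_rate, where "<raw-pcm-format>" stands in for a real format name.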

Cancel a request

Pass an AbortSignal to cancel a generation request — useful when the user clicks "stop" mid-playback.

import { SonioxHttpError } from "@soniox/client";

const controller = new AbortController();

const audioPromise = client.tts.generate({
  text: "Some long text...",
  voice: "Adrian",
  signal: controller.signal,
});

stopButton.addEventListener("click", () => controller.abort());

try {
  const audio = await audioPromise;
  // ... play audio ...
} catch (err) {
  if (err instanceof SonioxHttpError && err.code === "aborted") {
    console.log("Generation cancelled");
  } else {
    throw err;
  }
}
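
A common variation is to cancel the previous request automatically whenever the user starts a new one, so that only the latest generation is ever in flight. The sketch below uses only the standard AbortController; the LatestOnly class is a hypothetical helper, not part of the SDK.

```typescript
// Hands out one AbortSignal per request and aborts the previous one.
class LatestOnly {
  private current: AbortController | null = null;

  // Call when starting a new request; cancels whatever was running before.
  next(): AbortSignal {
    this.current?.abort();
    this.current = new AbortController();
    return this.current.signal;
  }

  // Call from a "stop" button to cancel without starting anything new.
  stop(): void {
    this.current?.abort();
    this.current = null;
  }
}
```

Each call would then pass signal: requests.next() to client.tts.generate(), where requests is a shared LatestOnly instance, and the superseded call rejects with the aborted error handled above.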

Error handling

REST TTS requests can throw the following errors:

Error	When it's thrown
`SonioxHttpError`	Covers all HTTP failures: non-2xx responses (`code: 'http_error'`), network failures (`code: 'network_error'`), timeouts (`code: 'timeout'`), aborted requests (`code: 'aborted'`), and parse errors (`code: 'parse_error'`). Inspect `code`, `statusCode`, `message`, and `bodyText`.
`SonioxError`	Base class for all SDK errors (`SonioxHttpError` extends it). Catch this if you want a single branch for every Soniox-originated failure.

import { SonioxHttpError, SonioxError } from "@soniox/client";

try {
  await client.tts.generate({ text: "Hello!", voice: "Adrian" });
} catch (err) {
  if (err instanceof SonioxHttpError) {
    console.error(
      `HTTP ${err.statusCode ?? "n/a"} (${err.code}): ${err.message}`,
    );
  } else if (err instanceof SonioxError) {
    console.error("Soniox SDK error:", err.message);
  } else {
    throw err;
  }
}
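
The error codes listed above can also drive a retry decision. The policy in this sketch (retry network failures, timeouts, HTTP 429, and 5xx responses; never retry aborted requests or parse errors) is an assumption chosen for illustration, not documented SDK guidance, and isRetryable is a hypothetical helper.

```typescript
// Codes taken from the SonioxHttpError row in the table above.
type HttpErrorCode =
  | "http_error"
  | "network_error"
  | "timeout"
  | "aborted"
  | "parse_error";

// Assumed policy: transient failures are retried, everything else is not.
function isRetryable(code: HttpErrorCode, statusCode?: number): boolean {
  switch (code) {
    case "network_error":
    case "timeout":
      return true;
    case "http_error":
      // Rate limiting and server-side failures are usually transient.
      return statusCode === 429 || (statusCode !== undefined && statusCode >= 500);
    default:
      // "aborted" was requested by the caller; "parse_error" will not fix itself.
      return false;
  }
}
```

Inside the catch branch for SonioxHttpError you could call isRetryable(err.code, err.statusCode), assuming err.code is typed as one of the strings listed in the table.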

For raw HTTP integration details, see the TTS REST API reference.

Error handling limitations

Once the server has started streaming audio, the HTTP response status has already been sent, so a failure partway through cannot be reported to the client as an error. If you need guaranteed error delivery, use real-time WebSocket TTS instead.

Server-driven defaults

TTS defaults travel with your temporary API key. Return a tts_defaults object next to api_key from your key endpoint and the Web SDK will merge it as the base layer for every REST (and WebSocket) TTS call. Caller-provided fields on client.tts.generate(...) / generateStream(...) override the defaults.

// Server: nodeClient is a SonioxNodeClient instance, created as in the setup section.
app.get('/tts-tmp-key', async (_req, res) => {
  const { api_key, expires_at } = await nodeClient.auth.createTemporaryKey({
    usage_type: 'tts_rt',
    expires_in_seconds: 300,
  });

  res.json({
    api_key,
    expires_at,
    tts_defaults: {
      model: 'tts-rt-v1',
      language: 'en',
      voice: 'Adrian',
      audio_format: 'wav',
    },
  });
});

// Browser: the resolver forwards the whole response, including tts_defaults.
const client = new SonioxClient({
  config: async () => {
    const res = await fetch("/tts-tmp-key");
    return await res.json(); // { api_key, tts_defaults, ... }
  },
});

// Inherits model / language / voice / audio_format from tts_defaults.
const audio = await client.tts.generate({ text: "Hello!" });

// Override only what you need per call.
const other = await client.tts.generate({ text: "Hi!", voice: "<another-voice>" });
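
The layering described above behaves like a shallow merge in which per-call fields win. The sketch below illustrates that semantics only; it is not the SDK's implementation, and treating an explicit undefined as "not provided" is an assumption. TtsFields and resolveOptions are hypothetical names.

```typescript
type TtsFields = {
  model?: string;
  language?: string;
  voice?: string;
  audio_format?: string;
};

// Defaults form the base layer; defined per-call fields override them.
function resolveOptions(
  defaults: TtsFields,
  call: TtsFields & { text: string },
): Record<string, unknown> {
  const merged: Record<string, unknown> = { ...defaults };
  const overrides = call as Record<string, unknown>;
  for (const key of Object.keys(overrides)) {
    // Assumption: undefined means "fall back to the server default".
    if (overrides[key] !== undefined) {
      merged[key] = overrides[key];
    }
  }
  return merged;
}
```

With defaults of { voice: "Adrian", audio_format: "wav" }, a call carrying { text: "Hi!", voice: "<another-voice>" } resolves to the overridden voice while keeping the default audio format.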
