
REST speech generation with Web SDK

Generate speech from text in the browser with the Soniox Web SDK over HTTP

The Soniox Web SDK supports Text-to-Speech generation over HTTP with SonioxClient. Use REST when you have the full text up front — the SDK returns audio bytes that you can play, download, or hand to an <audio> element or MediaSource.

Use real-time speech generation when you want to narrate text as it arrives from an LLM or need the lowest latency to first audio.

Set up your temporary API key endpoint

Browser code is visible to every user, so it must not contain your primary API key. Instead, create a temporary key endpoint on your server with the Soniox Node SDK and request a key with usage_type: 'tts_rt'.

import express from 'express';
import { SonioxNodeClient } from '@soniox/node';

const app = express();
const client = new SonioxNodeClient(); // reads SONIOX_API_KEY from env

app.get('/tts-tmp-key', async (_req, res) => {
  try {
    const { api_key, expires_at } = await client.auth.createTemporaryKey({
      usage_type: 'tts_rt',
      expires_in_seconds: 300,
    });
    res.json({ api_key, expires_at });
  } catch (err) {
    res.status(500).json({ error: err instanceof Error ? err.message : 'Failed to create temporary key' });
  }
});

app.listen(3000);

Read more about Temporary API keys.

Quickstart

Create a SonioxClient with a config resolver that fetches a fresh temporary key, then call client.tts.generate() and play the resulting audio.

import { SonioxClient } from "@soniox/client";

const client = new SonioxClient({
  config: async () => {
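    const res = await fetch("/tts-tmp-key");
    // The endpoint answers with { error } and a 500 status on failure.
    if (!res.ok) throw new Error(`Temporary key request failed: ${res.status}`);
    const { api_key } = await res.json();
    return { api_key };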
  },
});

const audio = await client.tts.generate({
  text: "Hello from the Soniox Web SDK text-to-speech example.",
  voice: "Adrian",
  model: "tts-rt-v1-preview",
  language: "en",
  audio_format: "wav",
});

const blob = new Blob([audio], { type: "audio/wav" });
const audioEl = new Audio(URL.createObjectURL(blob));
await audioEl.play();
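
If the config resolver is invoked more than once (this page does not say how often the SDK calls it, so treat that as an assumption), it can be worth reusing a temporary key until shortly before it expires. The sketch below is one possible way to do that. The names createCachedKeyResolver and isFresh are hypothetical helpers, not part of the SDK, and expires_at is assumed to be an ISO 8601 timestamp string.

```typescript
// Shape returned by the /tts-tmp-key endpoint shown earlier.
interface TemporaryKey {
  api_key: string;
  expires_at: string; // assumption: ISO 8601 timestamp
}

// True while the key still has more than `skewMs` of lifetime left.
function isFresh(key: TemporaryKey, nowMs: number, skewMs: number): boolean {
  return Date.parse(key.expires_at) - nowMs > skewMs;
}

// Hypothetical helper: wraps a key fetcher and reuses its result until the
// key is close to expiring. Concurrent callers share one in-flight request.
function createCachedKeyResolver(
  fetchKey: () => Promise<TemporaryKey>,
  skewMs: number = 30000,
): () => Promise<{ api_key: string }> {
  let cached: TemporaryKey | null = null;
  let inFlight: Promise<TemporaryKey> | null = null;

  return async () => {
    if (cached !== null && isFresh(cached, Date.now(), skewMs)) {
      return { api_key: cached.api_key };
    }
    if (inFlight === null) {
      inFlight = fetchKey().finally(() => {
        inFlight = null; // allow a new request after success or failure
      });
    }
    const key = await inFlight;
    cached = key;
    return { api_key: key.api_key };
  };
}
```

The returned function has the same shape as the config resolver in the Quickstart, so it could be passed as the config value, with fetchKey performing the fetch("/tts-tmp-key") call.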

Generate to bytes

client.tts.generate() returns a Promise<Uint8Array> with the full audio payload. Use this when you want to upload the audio, store it locally, or decode it with the Web Audio API.

const audio = await client.tts.generate({
  text: "This response is generated in memory.",
  voice: "Adrian",
  audio_format: "wav",
});

console.log(`Received ${audio.byteLength} bytes`);
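
When audio_format is "wav", the returned bytes can be inspected before storing or decoding them. The following is a minimal sketch that reads the standard RIFF/WAVE "fmt " chunk. readWavInfo is a hypothetical helper, and the exact header layout produced by the service is not confirmed by this page, so the function returns null when the bytes do not match the canonical layout.

```typescript
interface WavInfo {
  channels: number;
  sampleRate: number;
  bitsPerSample: number;
}

// Hypothetical helper: reads the "fmt " chunk of a RIFF/WAVE payload.
// Returns null when the bytes do not look like a WAV file.
function readWavInfo(bytes: Uint8Array): WavInfo | null {
  if (bytes.byteLength < 12) return null;
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const tag = (offset: number): string =>
    String.fromCharCode(
      view.getUint8(offset),
      view.getUint8(offset + 1),
      view.getUint8(offset + 2),
      view.getUint8(offset + 3),
    );

  if (tag(0) !== "RIFF" || tag(8) !== "WAVE") return null;

  // Walk the chunk list until the "fmt " chunk is found.
  let offset = 12;
  while (offset + 8 <= bytes.byteLength) {
    const id = tag(offset);
    const size = view.getUint32(offset + 4, true); // little-endian
    if (id === "fmt " && size >= 16 && offset + 24 <= bytes.byteLength) {
      return {
        channels: view.getUint16(offset + 10, true),
        sampleRate: view.getUint32(offset + 12, true),
        bitsPerSample: view.getUint16(offset + 22, true),
      };
    }
    offset += 8 + size + (size % 2); // chunks are padded to even sizes
  }
  return null;
}
```

Calling readWavInfo(audio) on the result of client.tts.generate() would then give the sample rate and channel count without decoding the whole payload.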

Stream audio chunks

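client.tts.generateStream() returns an AsyncIterable<Uint8Array> so you can start playback before the full payload has arrived. Pair it with MediaSource for progressive playback, provided the browser's Media Source Extensions implementation supports the MIME type of the chosen audio_format (check with MediaSource.isTypeSupported(); WAV is often not supported).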

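const mimeType = "audio/wav"; // must match audio_format below

// Not every format works with Media Source Extensions; check before use.
if (!MediaSource.isTypeSupported(mimeType)) {
  throw new Error(`MediaSource does not support ${mimeType}`);
}

const mediaSource = new MediaSource();

// Register the handler before attaching the MediaSource to an element,
// so the "sourceopen" event cannot be missed.
mediaSource.addEventListener("sourceopen", async () => {
  const sourceBuffer = mediaSource.addSourceBuffer(mimeType);

  for await (const chunk of client.tts.generateStream({
    text: "Streaming audio as it arrives from the server.",
    voice: "Adrian",
    audio_format: "wav",
  })) {
    // Wait for each append to finish before queuing the next chunk.
    await new Promise<void>((resolve) => {
      sourceBuffer.addEventListener("updateend", () => resolve(), { once: true });
      sourceBuffer.appendBuffer(chunk);
    });
  }

  mediaSource.endOfStream();
}, { once: true });

const audioEl = new Audio(URL.createObjectURL(mediaSource));
// Do not await play() here: it settles only once playback can start,
// which requires data appended by the handler above.
audioEl.play().catch((err) => console.error("Playback failed:", err));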

For simpler use cases, you can concatenate chunks and play a single Blob:

const chunks: Uint8Array[] = [];
for await (const chunk of client.tts.generateStream({
  text: "Hello!",
  voice: "Adrian",
  audio_format: "wav",
})) {
  chunks.push(chunk);
}

const blob = new Blob(chunks, { type: "audio/wav" });
new Audio(URL.createObjectURL(blob)).play();
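
If you need a single Uint8Array rather than a Blob (for example to upload the audio or pass it to a decoder), the chunks can be merged manually. This is a small sketch. concatChunks and collectChunks are hypothetical helper names, not SDK functions.

```typescript
// Copies all chunks into one contiguous Uint8Array.
function concatChunks(chunks: readonly Uint8Array[]): Uint8Array {
  const total = chunks.reduce((sum, chunk) => sum + chunk.byteLength, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return out;
}

// Drains any async iterable of byte chunks into a single Uint8Array.
async function collectChunks(stream: AsyncIterable<Uint8Array>): Promise<Uint8Array> {
  const chunks: Uint8Array[] = [];
  for await (const chunk of stream) {
    chunks.push(chunk);
  }
  return concatChunks(chunks);
}
```

Because generateStream() is described above as returning an AsyncIterable<Uint8Array>, its result could be passed directly to collectChunks.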

Generation options

Both generate() and generateStream() accept the same GenerateSpeechOptions shape:

| Option | Type | Description |
| --- | --- | --- |
| text | string | Input text to synthesize. Required. |
| voice | string | Voice identifier (e.g. "Adrian"). Required. |
| model | string | TTS model. Default "tts-rt-v1-preview". |
| language | string | Language code. Default "en". |
| audio_format | TtsAudioFormat | Output audio format. Default "wav". |
| sample_rate | number | Output sample rate in Hz. Required for raw PCM formats. |
| bitrate | number | Codec bitrate in bps (for compressed formats). |
| signal | AbortSignal | Optional signal to cancel the request. |

See Available models for the full list of TTS models, voices, and supported audio formats.

Cancel a request

Pass an AbortSignal to cancel a generation request — useful when the user clicks "stop" mid-playback.

import { SonioxHttpError } from "@soniox/client";

const controller = new AbortController();

const audioPromise = client.tts.generate({
  text: "Some long text...",
  voice: "Adrian",
  signal: controller.signal,
});

stopButton.addEventListener("click", () => controller.abort());

try {
  const audio = await audioPromise;
  // ... play audio ...
} catch (err) {
  if (err instanceof SonioxHttpError && err.code === "aborted") {
    console.log("Generation cancelled");
  } else {
    throw err;
  }
}
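
A related pattern is "latest request wins": when the user triggers new speech while an earlier request is still running, the earlier one is cancelled. The sketch below shows one way to manage the signals with the standard AbortController. LatestRequest is a hypothetical helper class, not part of the SDK.

```typescript
// Hypothetical helper: hands out one AbortSignal at a time and aborts
// the previous one whenever a new request starts.
class LatestRequest {
  private current: AbortController | null = null;

  // Cancels any request still in flight and returns a signal for the next one.
  next(): AbortSignal {
    if (this.current !== null) {
      this.current.abort();
    }
    this.current = new AbortController();
    return this.current.signal;
  }

  // Cancels the request in flight, if any (for example from a "stop" button).
  stop(): void {
    if (this.current !== null) {
      this.current.abort();
      this.current = null;
    }
  }
}
```

Each call would pass signal: latest.next() in its options, and the catch branch for code "aborted" shown above would handle the cancelled request.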

Error handling

REST TTS requests can throw the following errors:

| Error | When it's thrown |
| --- | --- |
| SonioxHttpError | Covers all HTTP failures: non-2xx responses (code: 'http_error'), network failures (code: 'network_error'), timeouts (code: 'timeout'), aborted requests (code: 'aborted'), and parse errors (code: 'parse_error'). Inspect code, statusCode, message, and bodyText. |
| SonioxError | Base class for all SDK errors (SonioxHttpError extends it). Catch this if you want a single branch for every Soniox-originated failure. |

import { SonioxHttpError, SonioxError } from "@soniox/client";

try {
  await client.tts.generate({ text: "Hello!", voice: "Adrian" });
} catch (err) {
  if (err instanceof SonioxHttpError) {
    console.error(
      `HTTP ${err.statusCode ?? "n/a"} (${err.code}): ${err.message}`,
    );
  } else if (err instanceof SonioxError) {
    console.error("Soniox SDK error:", err.message);
  } else {
    throw err;
  }
}
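
The code values listed above can also drive a retry decision. The following is only a sketch of one possible policy: which failures are worth retrying is an assumption made here, not something this page specifies (in particular, treating 429 and 5xx as transient). isRetryable and backoffDelayMs are hypothetical helpers that work on any object exposing the code and statusCode fields.

```typescript
// Structural view of the fields documented for SonioxHttpError above.
interface HttpErrorLike {
  code: string;
  statusCode?: number | null;
}

// Assumed policy: retry transient failures, never retry aborts,
// parse errors, or other client-side (4xx) errors.
function isRetryable(err: HttpErrorLike): boolean {
  if (err.code === "network_error" || err.code === "timeout") return true;
  if (err.code === "http_error" && typeof err.statusCode === "number") {
    return err.statusCode === 429 || err.statusCode >= 500;
  }
  return false;
}

// Exponential backoff: baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs.
function backoffDelayMs(attempt: number, baseMs: number = 500, maxMs: number = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}
```

Inside the SonioxHttpError branch, a caller could check isRetryable(err) and wait backoffDelayMs(attempt) milliseconds before calling generate() again.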

For raw HTTP integration details, see the TTS REST API reference.

Error handling limitations

Mid-stream errors reported via HTTP trailers (X-Tts-Error-Code, X-Tts-Error-Message) are not exposed by the browser fetch API, so the Soniox Web SDK cannot surface them. For guaranteed error delivery, use the real-time WebSocket TTS instead.

Server-driven defaults

TTS defaults travel with your temporary API key. Return a tts_defaults object next to api_key from your key endpoint and the Web SDK will merge it as the base layer for every REST (and WebSocket) TTS call. Caller-provided fields on client.tts.generate(...) / generateStream(...) override the defaults.

// Server (Node): nodeClient is the SonioxNodeClient instance, as in the key endpoint above.
app.get('/tts-tmp-key', async (_req, res) => {
  const { api_key, expires_at } = await nodeClient.auth.createTemporaryKey({
    usage_type: 'tts_rt',
    expires_in_seconds: 300,
  });

  res.json({
    api_key,
    expires_at,
    tts_defaults: {
      model: 'tts-rt-v1-preview',
      language: 'en',
      voice: 'Adrian',
      audio_format: 'wav',
    },
  });
});

// Client (browser)
const client = new SonioxClient({
  config: async () => {
    const res = await fetch("/tts-tmp-key");
    return await res.json(); // { api_key, tts_defaults, ... }
  },
});

// Inherits model / language / voice / audio_format from tts_defaults.
const audio = await client.tts.generate({ text: "Hello!" });

// Override only what you need per call.
const other = await client.tts.generate({ text: "Hi!", voice: "<another-voice>" });
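
The layering described above behaves like a shallow merge in which per-call fields win over tts_defaults. The sketch below only illustrates that precedence and is not the SDK's actual implementation. mergeWithDefaults and the TtsOptions type are hypothetical, and treating an explicit undefined as "not provided" is an assumption.

```typescript
// Subset of the generation options from the table above.
type TtsOptions = {
  text?: string;
  voice?: string;
  model?: string;
  language?: string;
  audio_format?: string;
};

// Defaults form the base layer; defined per-call fields override them.
function mergeWithDefaults(defaults: TtsOptions, call: TtsOptions): TtsOptions {
  const merged: TtsOptions = { ...defaults };
  for (const [key, value] of Object.entries(call)) {
    if (value !== undefined) {
      merged[key as keyof TtsOptions] = value;
    }
  }
  return merged;
}

const effective = mergeWithDefaults(
  { model: "tts-rt-v1-preview", language: "en", voice: "Adrian", audio_format: "wav" },
  { text: "Hi!", voice: "<another-voice>" },
);
// effective.voice comes from the call, effective.language from the defaults.
```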
