TanStack AI SDK

Soniox transcription adapter for the TanStack AI SDK.

Overview

TanStack AI is a TypeScript toolkit for building AI applications. It provides a unified API that abstracts away the differences between various AI providers, allowing developers to switch models with just a few lines of code.

This package (@soniox/tanstack-ai-adapter) implements the SDK's transcription adapter, enabling you to use Soniox's Speech-to-Text models directly within the standard TanStack AI workflow.

Installation

npm install @soniox/tanstack-ai-adapter

Authentication

Set SONIOX_API_KEY in your environment or pass apiKey when creating the adapter. Get your API key from the Soniox Console.
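
As a quick sketch, the two approaches look like this (the explicit-key call mirrors the createSonioxTranscription signature shown under Adapter configuration below):

import { sonioxTranscription, createSonioxTranscription } from '@soniox/tanstack-ai-adapter';

// Reads the key from the SONIOX_API_KEY environment variable.
const adapter = sonioxTranscription('stt-async-v3');

// Passes the key explicitly instead.
const explicitAdapter = createSonioxTranscription('stt-async-v3', process.env.SONIOX_API_KEY!);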

Example

import { generateTranscription } from '@tanstack/ai';
import { sonioxTranscription } from '@soniox/tanstack-ai-adapter';

const result = await generateTranscription({
  adapter: sonioxTranscription('stt-async-v3'),
  audio: new URL(
    'https://soniox.com/media/examples/coffee_shop.mp3',
  ),
  modelOptions: {
    enableLanguageIdentification: true,
    enableSpeakerDiarization: true,
  },
});

console.log(result.text);
console.log(result.segments); // Timestamped segments with speaker info
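
To render the diarized transcript, you can iterate over result.segments. The field names below (speaker, text) are assumptions for illustration; check the TanStack AI transcription types for the actual segment shape:

for (const segment of result.segments as any[]) {
  // Assumed fields; adjust to the real segment type.
  console.log(`[speaker ${segment.speaker}] ${segment.text}`);
}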

Adapter configuration

Use createSonioxTranscription to customize the adapter instance:

import { createSonioxTranscription } from '@soniox/tanstack-ai-adapter';

const adapter = createSonioxTranscription('stt-async-v3', process.env.SONIOX_API_KEY!, {
  baseUrl: 'https://api.soniox.com',
  pollingIntervalMs: 1000,
  timeout: 180000,
});

Options:

  • apiKey: the API key passed as the second argument, overriding SONIOX_API_KEY (required when using createSonioxTranscription).
  • baseUrl: custom API base URL, e.g. a regional endpoint (see the list of regional API endpoints in the Soniox docs, and the sketch after this list). Default is https://api.soniox.com.
  • headers: additional request headers.
  • timeout: transcription timeout in milliseconds. Default is 180000ms (3 minutes).
  • pollingIntervalMs: transcription polling interval in milliseconds. Default is 1000ms.
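
For instance, pointing the adapter at a different endpoint with an extra request header might look like this (the hostname and header below are placeholders, not real Soniox values):

import { createSonioxTranscription } from '@soniox/tanstack-ai-adapter';

const regionalAdapter = createSonioxTranscription('stt-async-v3', process.env.SONIOX_API_KEY!, {
  // Placeholder: substitute one of the regional endpoints from the Soniox docs.
  baseUrl: 'https://<regional-endpoint>',
  // Hypothetical header, shown only to illustrate the headers option.
  headers: { 'X-Request-Source': 'my-app' },
});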

Transcription options

Per-request options are passed via modelOptions:

const result = await generateTranscription({
  adapter: sonioxTranscription('stt-async-v3'),
  audio,
  modelOptions: {
    languageHints: ['en', 'es'],
    enableLanguageIdentification: true,
    enableSpeakerDiarization: true,
    context: {
      terms: ['Soniox', 'TanStack'],
    },
  },
});

Available options:

  • languageHints - Array of ISO language codes to bias recognition. If you pass the TanStack language option, this adapter will merge it into languageHints for convenience.
  • languageHintsStrict - When true, rely more heavily on language hints (note: not supported by all models)
  • enableLanguageIdentification - Automatically detect spoken language
  • enableSpeakerDiarization - Identify and separate different speakers
  • context - Additional context to improve accuracy
  • clientReferenceId - Optional client-defined reference ID
  • webhookUrl - Webhook URL for transcription completion notifications (see the sketch after this list)
  • webhookAuthHeaderName - Webhook authentication header name
  • webhookAuthHeaderValue - Webhook authentication header value
  • translation - Translation configuration
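
As a sketch, the webhook-related options fit together like this (the endpoint URL, secret, and reference ID are made up for illustration):

const result = await generateTranscription({
  adapter: sonioxTranscription('stt-async-v3'),
  audio,
  modelOptions: {
    // Hypothetical values, for illustration only.
    clientReferenceId: 'meeting-42',
    webhookUrl: 'https://example.com/webhooks/soniox',
    webhookAuthHeaderName: 'Authorization',
    webhookAuthHeaderValue: `Bearer ${process.env.WEBHOOK_SECRET}`,
  },
});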

For more information on the available options, see the Speech-to-Text API reference.

Accessing raw tokens

When using translation or working with multilingual audio, you may need access to raw tokens with per-token language information and translation status. The adapter attaches a non-standard providerMetadata field at runtime:

const result = await generateTranscription({
  adapter: sonioxTranscription('stt-async-v3'),
  audio,
  modelOptions: {
    translation: { type: 'one_way', targetLanguage: 'es' },
  },
});

// Access raw Soniox tokens with full metadata
const rawTokens = (result as any).providerMetadata?.soniox?.tokens;

if (rawTokens) {
  rawTokens.forEach((token) => {
    // token.text - token text
    // token.start_ms - start time in milliseconds
    // token.end_ms - end time in milliseconds
    // token.language - detected language for this token
    // token.translation_status - translation status (if translation enabled)
    // token.speaker - speaker identifier
    // token.confidence - confidence score
  });
}

Note: When using translation, the API returns both transcription tokens (original) and translation tokens. The segments array always includes only transcription tokens. To access translation tokens, filter by translation_status === 'translation'.
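
Building on the rawTokens example above, separating the two token streams looks like this (a sketch; it assumes rawTokens is non-null and that concatenating token text reconstructs the sentence):

// Original-language tokens (the same tokens result.segments is built from).
const transcriptionTokens = rawTokens.filter((t: any) => t.translation_status !== 'translation');

// Translated tokens, e.g. the Spanish output from the one-way translation above.
const translationTokens = rawTokens.filter((t: any) => t.translation_status === 'translation');

console.log(translationTokens.map((t: any) => t.text).join(''));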