
Vercel AI SDK

Soniox transcription provider for the Vercel AI SDK.


Overview

Vercel AI SDK is a TypeScript toolkit for building AI applications. Its unified API hides the differences between AI providers, so switching models or providers usually takes only a few changed lines of code.

The @soniox/vercel-ai-sdk-provider package implements the SDK's Transcription Interface, enabling you to use Soniox's Speech-to-Text models directly within the standard Vercel AI workflow. Learn more about the Soniox provider in the Vercel AI SDK Community Providers documentation.

Installation

npm install @soniox/vercel-ai-sdk-provider ai

Authentication

Set the SONIOX_API_KEY environment variable, or pass apiKey to createSoniox when creating a provider instance (see Provider options).
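A minimal sketch of the key lookup described above, assuming you want your app to fail fast at startup when no key is configured. resolveSonioxApiKey is a hypothetical helper, not part of the provider; it only mirrors the documented rule that an explicit apiKey overrides SONIOX_API_KEY.

```typescript
// Hypothetical helper (not part of @soniox/vercel-ai-sdk-provider).
// Mirrors the documented precedence: an explicit apiKey wins over SONIOX_API_KEY.
function resolveSonioxApiKey(
  explicitKey?: string,
  env: Record<string, string | undefined> = process.env,
): string {
  // Fall back to the environment only when no explicit key was passed.
  const key = explicitKey ?? env.SONIOX_API_KEY;
  // Missing or empty keys are rejected so the error surfaces before any request.
  if (!key) {
    throw new Error('Missing Soniox API key: set SONIOX_API_KEY or pass apiKey.');
  }
  return key;
}
```

The result can then be passed as createSoniox({ apiKey: resolveSonioxApiKey() }), which turns a missing key into an immediate, descriptive error instead of a failed request later.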

Example

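import { soniox } from '@soniox/vercel-ai-sdk-provider';
import { experimental_transcribe as transcribe } from 'ai';

// The default soniox instance reads SONIOX_API_KEY from the environment.
// Transcribe a publicly reachable audio file with the stt-async-v3 model.
const { text } = await transcribe({
  model: soniox.transcription('stt-async-v3'),
  audio: new URL('https://soniox.com/media/examples/coffee_shop.mp3'),
});

console.log(text);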

Provider options

Use createSoniox to customize the provider instance:

import { createSoniox } from '@soniox/vercel-ai-sdk-provider';

const soniox = createSoniox({
  apiKey: process.env.SONIOX_API_KEY,
  apiBaseUrl: 'https://api.soniox.com',
});

Options:

  • apiKey: Soniox API key; overrides SONIOX_API_KEY.
  • apiBaseUrl: custom API base URL. See the list of regional API endpoints.
  • headers: additional request headers.
  • fetch: custom fetch implementation.
  • pollingIntervalMs: interval, in milliseconds, at which the provider polls for transcription status. Defaults to 1000.
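The fetch option accepts any function with the shape of fetch, which makes it a convenient place for logging, tracing or test doubles. A minimal sketch, assuming you only want to see which requests the provider makes; createLoggingFetch and the FetchFn type are hypothetical and simplify the global fetch type.

```typescript
// Simplified fetch signature used by this sketch.
type FetchFn = (input: string | URL | Request, init?: RequestInit) => Promise<Response>;

// Hypothetical wrapper: forwards each request to baseFetch unchanged and logs
// method, URL and response status. Headers, which may carry credentials, are
// never logged. Network errors propagate unchanged and are not logged.
function createLoggingFetch(baseFetch: FetchFn, log: (line: string) => void): FetchFn {
  return async (input, init) => {
    const method = init?.method ?? (input instanceof Request ? input.method : 'GET');
    const url = input instanceof Request ? input.url : input.toString();
    const response = await baseFetch(input, init);
    log(`${method} ${url} -> ${response.status}`);
    return response;
  };
}
```

Usage would look like createSoniox({ fetch: createLoggingFetch(fetch, console.log) }). Because transcription results are polled, you would likely see repeated status requests spaced roughly pollingIntervalMs apart.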

Transcription options

Per-request options are passed via providerOptions:

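import { soniox } from '@soniox/vercel-ai-sdk-provider';
import { experimental_transcribe as transcribe } from 'ai';

const audio = new URL('https://soniox.com/media/examples/coffee_shop.mp3');

const { text } = await transcribe({
  model: soniox.transcription('stt-async-v3'),
  audio,
  providerOptions: {
    soniox: {
      languageHints: ['en', 'es'], // languages expected in the audio
      enableLanguageIdentification: true,
      enableSpeakerDiarization: true,
      context: {
        terms: ['Soniox', 'Vercel'], // domain terms to recognize correctly
      },
    },
  },
});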

Available options:

  • languageHints - Array of ISO language codes that bias recognition toward the expected languages
  • languageHintsStrict - When true, rely more heavily on language hints (note: not supported by all models)
  • enableLanguageIdentification - Automatically detect the spoken language
  • enableSpeakerDiarization - Identify and separate different speakers
  • context - Additional context, such as domain terms, to improve accuracy
  • clientReferenceId - Optional client-defined reference ID
  • webhookUrl - URL that receives a notification when the transcription completes
  • webhookAuthHeaderName - Webhook authentication header name
  • webhookAuthHeaderValue - Webhook authentication header value
  • translation - Translation configuration

For more information on the available options, see the Speech-to-Text API reference.
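The webhook options are easiest to understand from the receiving side. A minimal sketch, assuming Soniox sends the configured header name and value with each request to webhookUrl (check the Speech-to-Text API reference for the payload format). It generates a random secret, builds the webhook part of providerOptions.soniox, and checks an incoming header value in constant time. The header name X-Webhook-Secret and the helpers createWebhookSecret, webhookOptions and verifyWebhookHeader are hypothetical.

```typescript
import { randomBytes, timingSafeEqual } from 'node:crypto';
import { Buffer } from 'node:buffer';

// Hypothetical header name; use whatever name your endpoint expects.
const WEBHOOK_HEADER = 'X-Webhook-Secret';

// 32 random bytes, hex-encoded, to use as webhookAuthHeaderValue.
function createWebhookSecret(): string {
  return randomBytes(32).toString('hex');
}

// Webhook-related fields for providerOptions.soniox.
function webhookOptions(url: string, secret: string) {
  return {
    webhookUrl: url,
    webhookAuthHeaderName: WEBHOOK_HEADER,
    webhookAuthHeaderValue: secret,
  };
}

// True only if the received value equals the expected secret. timingSafeEqual
// throws on buffers of different length, so lengths are compared first.
function verifyWebhookHeader(received: string | undefined, expected: string): boolean {
  if (received === undefined) return false;
  const a = Buffer.from(received);
  const b = Buffer.from(expected);
  return a.length === b.length && timingSafeEqual(a, b);
}
```

Note that Node's http module lowercases incoming header names, so a handler built on it would read the value from req.headers['x-webhook-secret'].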