
Vercel AI SDK

Soniox transcription provider for the Vercel AI SDK.


Overview

The Vercel AI SDK is a TypeScript toolkit for building AI applications. It provides a unified API that abstracts away the differences between AI providers, so you can switch models by changing only a few lines of code.

This package (@soniox/vercel-ai-sdk-provider) implements the SDK's Transcription Interface, enabling you to use Soniox's Speech-to-Text models directly within the standard Vercel AI workflow.

Installation

npm install @soniox/vercel-ai-sdk-provider

Authentication

Set SONIOX_API_KEY in your environment or pass apiKey when creating the provider.
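
If you want a missing key to fail at startup rather than on the first request, you can mirror this lookup order (an explicit apiKey overrides SONIOX_API_KEY) in your own code. This is a minimal sketch: resolveSonioxApiKey is a hypothetical helper, not something the package exports, and the provider's own error handling may differ.

```typescript
// Hypothetical helper (NOT part of @soniox/vercel-ai-sdk-provider).
// An explicitly passed key takes precedence; otherwise SONIOX_API_KEY is used.
function resolveSonioxApiKey(
  explicitKey?: string,
  env: Record<string, string | undefined> = process.env,
): string {
  const key = explicitKey ?? env.SONIOX_API_KEY;
  // Treat an empty or whitespace-only key as missing and fail fast.
  if (!key || key.trim() === '') {
    throw new Error(
      'Soniox API key is missing: set SONIOX_API_KEY or pass apiKey explicitly.',
    );
  }
  return key;
}
```

The resolved value can then be passed as apiKey when creating the provider.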

Example

import { soniox } from '@soniox/vercel-ai-sdk-provider';
import { experimental_transcribe as transcribe } from 'ai';

const { text } = await transcribe({
  model: soniox.transcription('stt-async-v3'),
  audio: new URL(
    'https://github.com/vercel/ai/raw/refs/heads/main/examples/ai-core/data/galileo.mp3',
  ),
});

Provider options

Use createSoniox to customize the provider instance:

import { createSoniox } from '@soniox/vercel-ai-sdk-provider';

const soniox = createSoniox({
  apiKey: process.env.SONIOX_API_KEY,
  apiBaseUrl: 'https://api.soniox.com',
});

Options:

  • apiKey: override SONIOX_API_KEY.
  • apiBaseUrl: custom API base URL, for example a regional endpoint. See the Soniox documentation for the list of regional API endpoints.
  • headers: additional request headers.
  • fetch: custom fetch implementation.
  • pollingIntervalMs: transcription polling interval in milliseconds. Default is 1000ms.
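
One common use of the fetch option is request logging. The sketch below wraps any fetch-compatible function and logs the method, URL, status, and duration of each call; createLoggingFetch is a hypothetical helper, and the only assumption about the provider is that its fetch option accepts a function with the standard fetch signature.

```typescript
// A function with the same call signature as the global fetch.
type FetchFn = (...args: Parameters<typeof fetch>) => ReturnType<typeof fetch>;

// Hypothetical helper: wraps a fetch implementation and logs one line per request.
// Headers are never logged, so credentials sent in them are not printed.
function createLoggingFetch(
  baseFetch: FetchFn,
  log: (line: string) => void = console.log,
): FetchFn {
  return async (input, init) => {
    // Normalize the three possible input shapes (string, URL, Request) to a URL string.
    const url =
      typeof input === 'string' ? input : input instanceof URL ? input.href : input.url;
    const method = init?.method ?? (input instanceof Request ? input.method : 'GET');
    const started = Date.now();
    const response = await baseFetch(input, init);
    log(`${method} ${url} -> ${response.status} (${Date.now() - started} ms)`);
    return response;
  };
}
```

You would then pass it when creating the provider, for example createSoniox({ fetch: createLoggingFetch(fetch) }). Because the wrapper returns the original Response untouched, the provider sees exactly what the underlying fetch returned.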

Transcription options

Per-request options are passed via providerOptions:

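import { readFile } from 'node:fs/promises';
import { soniox } from '@soniox/vercel-ai-sdk-provider';
import { experimental_transcribe as transcribe } from 'ai';

// Placeholder path: audio can be raw bytes (as here) or a URL.
const audio = await readFile('./audio.mp3');

const { text } = await transcribe({
  model: soniox.transcription('stt-async-v3'),
  audio,
  providerOptions: {
    soniox: {
      languageHints: ['en', 'es'],
      enableLanguageIdentification: true,
      enableSpeakerDiarization: true,
      context: {
        terms: ['Soniox', 'Vercel'],
      },
    },
  },
});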

Available options:

  • languageHints
  • enableLanguageIdentification
  • enableSpeakerDiarization
  • context
  • clientReferenceId
  • webhookUrl
  • webhookAuthHeaderName
  • webhookAuthHeaderValue
  • translation
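
The webhook options suggest a shared-secret check on the receiving side: your endpoint accepts a call only if it carries the header you configured. The sketch below assumes the webhook request includes a header named webhookAuthHeaderName whose value equals webhookAuthHeaderValue; isAuthorizedWebhook is a hypothetical helper, so confirm the exact webhook contract in the API reference.

```typescript
import { timingSafeEqual } from 'node:crypto';

// Hypothetical receiver-side check. Assumes the webhook request carries the
// header configured via webhookAuthHeaderName / webhookAuthHeaderValue.
function isAuthorizedWebhook(
  headers: Record<string, string | string[] | undefined>,
  expectedName: string,
  expectedValue: string,
): boolean {
  // HTTP header names are case-insensitive, so match them in lowercase.
  const wanted = expectedName.toLowerCase();
  const entry = Object.entries(headers).find(([name]) => name.toLowerCase() === wanted);
  const value = entry?.[1];
  const received = Array.isArray(value) ? value[0] : value;
  if (typeof received !== 'string') return false;

  const a = Buffer.from(received);
  const b = Buffer.from(expectedValue);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

The header-shape type matches Node's incoming request headers, so in a plain Node HTTP handler you could call isAuthorizedWebhook(req.headers, name, value) and respond with 401 when it returns false. The constant-time comparison avoids leaking how many leading characters of the secret matched.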

For more information on the available options, see the Speech-to-Text API reference.