API Reference#

API Key#

Export your Soniox API key as the SONIOX_API_KEY environment variable:

export SONIOX_API_KEY="your_soniox_api_key_here"

Models#

Lists and describes the various models available in the API.

List models#

GET https://api.llm.soniox.com/v1/models

Lists the currently available models and provides basic information about each one, such as the owner and availability.

Examples#

Python:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["SONIOX_API_KEY"],
    base_url="https://api.llm.soniox.com/v1",
)

client.models.list()

curl:

curl https://api.llm.soniox.com/v1/models \
    -H "Authorization: Bearer $SONIOX_API_KEY"

Response#

A list of model objects.

{
  "object": "list",
  "data": [
    {
      "id": "omnio-chat-audio-preview",
      "object": "model",
      "created": 1728482400,
      "owned_by": "system"
    }
  ]
}
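
For instance, a minimal sketch that iterates over the returned list with the Python client from the example above and prints basic information about each model:

for model in client.models.list():
    # Each item is a model object as described below.
    print(model.id, model.owned_by)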

The model object#

Describes a Soniox Omnio model offering that can be used with the API.

id string

The model identifier, which can be referenced in the API endpoints.

created integer

The Unix timestamp (in seconds) when the model was created.

object string

The object type, which is always "model".

owned_by string

The organization that owns the model.

{
  "id": "omnio-chat-audio-preview",
  "object": "model",
  "created": 1720434075,
  "owned_by": "system"
}

Chat#

Given a list of messages comprising a conversation, the model will return a response.

Create chat completion#

POST https://api.llm.soniox.com/v1/chat/completions

Creates a model response for the given chat conversation.

Request body#

messages array Required

A list of messages comprising the conversation.

System message object

content string or array Required

The contents of the system message.

role string Required

The role of the message's author, in this case system.

name string Optional

An optional name for the participant. Provides the model information to differentiate between participants of the same role.

User message object

content string or array Required

The contents of the user message.

Text content string

The text contents of the message.

If your SDK does not support custom content parts, you can include audio data inside text between <audio_data_b64> and </audio_data_b64> tags.

For best results, place the audio data before the text.

<audio_data_b64>BASE_64_ENCODED_AUDIO_DATA</audio_data_b64>

Write me a short summary of this audio file.
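
For example, a minimal sketch of this fallback with the Python client, embedding the base64 audio directly in the message text using the tag format above (file path and prompt are placeholders):

import base64
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["SONIOX_API_KEY"],
    base_url="https://api.llm.soniox.com/v1",
)

with open("podcast.mp3", "rb") as audio_file:
    audio_data_b64 = base64.b64encode(audio_file.read()).decode("utf-8")

# Embed the audio inside the plain-text content, placing the audio
# data before the text prompt.
completion = client.chat.completions.create(
    model="omnio-chat-audio-preview",
    messages=[
        {
            "role": "user",
            "content": (
                f"<audio_data_b64>{audio_data_b64}</audio_data_b64>\n"
                "Write me a short summary of this audio file."
            ),
        }
    ],
)

print(completion.choices[0].message.content)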

Array of content parts array

An array of content parts, each with a defined type: text, or audio_data_b64 when passing in audio. You can pass multiple audio files by adding multiple audio_data_b64 content parts.

For best results, place the audio data before the text.

Text content part object

type string Required

The type of the content part, text in this case.

text string Required

The text contents of the message.

If your SDK does not support custom content parts, you can include audio data inside text between <audio_data_b64> and </audio_data_b64> tags.

For best results, place the audio data before the text.

<audio_data_b64>BASE_64_ENCODED_AUDIO_DATA</audio_data_b64>

Write me a short summary of this audio file.

Audio content part object

type string Required

The type of the content part, audio_data_b64 in this case.

audio_data_b64 string Required

Base64 encoded audio data.

role string Required

The role of the message's author, in this case user.

name string Optional

An optional name for the participant. Provides the model information to differentiate between participants of the same role.
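
Putting these fields together, a sketch of a user message with one audio content part followed by one text content part (written as a Python literal; the base64 value is a placeholder):

user_message = {
    "role": "user",
    "content": [
        # Audio data goes first for best results.
        {"type": "audio_data_b64", "audio_data_b64": "BASE_64_ENCODED_AUDIO_DATA"},
        {"type": "text", "text": "Write me a short summary of this audio file."},
    ],
}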

Assistant message object

content string or array Required

The contents of the assistant message.

role string Required

The role of the message's author, in this case assistant.

name string Optional

An optional name for the participant. Provides the model information to differentiate between participants of the same role.

model string Required

ID of the model to use.

max_tokens integer or null Optional

The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

stream boolean or null Optional Defaults to false

If set, partial message deltas will be sent. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message.

stream_options object or null Optional Defaults to null

Options for streaming response. Only set this when you set stream: true.

include_usage boolean Optional

If set, an additional chunk will be streamed before the data: [DONE] message. The usage field on this chunk shows the token usage statistics for the entire request, and the choices field will always be an empty array. All other chunks will also include a usage field, but with a null value.
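
For instance, a minimal sketch of a streamed request with usage reporting enabled, assuming client and audio_data_b64 are set up as in the examples below:

response = client.chat.completions.create(
    model="omnio-chat-audio-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "audio_data_b64", "audio_data_b64": audio_data_b64},
                {"type": "text", "text": "Write me a short summary of this audio file."},
            ],
        }
    ],
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in response:
    # Content chunks carry message deltas; choices is empty on the
    # final usage-only chunk.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
    # usage is null on every chunk except the last one.
    if chunk.usage:
        print(f"\nTotal tokens: {chunk.usage.total_tokens}")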

temperature number or null Optional Defaults to 1

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

We generally recommend altering this or top_p but not both.

top_p number or null Optional Defaults to 1

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

We generally recommend altering this or temperature but not both.
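
As an illustration, a sketch of a request that caps the completion length and lowers the sampling temperature (parameter values are arbitrary; client and audio_data_b64 as in the examples below):

completion = client.chat.completions.create(
    model="omnio-chat-audio-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "audio_data_b64", "audio_data_b64": audio_data_b64},
                {"type": "text", "text": "Write me a short summary of this audio file."},
            ],
        }
    ],
    max_tokens=200,   # cap the generated completion at 200 tokens
    temperature=0.2,  # more focused, less random output
)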

Examples#

Download the audio file podcast.mp3 and update the path in the code examples below to point to your downloaded file.

Python:

import base64
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["SONIOX_API_KEY"],
    base_url="https://api.llm.soniox.com/v1",
)

with open("podcast.mp3", "rb") as audio_file:
    audio_data_b64 = base64.b64encode(audio_file.read()).decode("utf-8")

completion = client.chat.completions.create(
    model="omnio-chat-audio-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"audio_data_b64": audio_data_b64},
                {"text": "Write me a short summary of this audio file."},
            ],
        }
    ],
)

print(completion.choices[0].message.content)

Python (streaming response):

import base64
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["SONIOX_API_KEY"],
    base_url="https://api.llm.soniox.com/v1",
)

with open("podcast.mp3", "rb") as audio_file:
    audio_data_b64 = base64.b64encode(audio_file.read()).decode("utf-8")

response = client.chat.completions.create(
    model="omnio-chat-audio-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"audio_data_b64": audio_data_b64},
                {"text": "Write me a short summary of this audio file."},
            ],
        }
    ],
    stream=True,
)

for chunk in response:
    # Skip chunks that carry no content delta (e.g. the final chunk).
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()

curl:

curl https://api.llm.soniox.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $SONIOX_API_KEY" \
  -d @- <<EOF
{
  "model": "omnio-chat-audio-preview",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "audio_data_b64",
          "audio_data_b64": "$(base64 < podcast.mp3 | tr -d '\n')"
        },
        {
          "type": "text",
          "text": "Write me a short summary of this audio file."
        }
      ]
    }
  ]
}
EOF

Response#

Returns a chat completion object, or a streamed sequence of chat completion chunk objects if the request is streamed.

Single response:

{
  "id": "cmpl-b2eb62cf-50c5-434a-9fde-089c633f1c77",
  "object": "chat.completion",
  "created": 1726684053,
  "model": "omnio-chat-audio-preview",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Short summary of the podcast."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 1176,
    "completion_tokens": 53,
    "total_tokens": 1229
  }
}

Stream response:

data: {"id":"cmpl-0849fc6c-2a79-4b76-9465-5e9c35722c77","object":"chat.completion.chunk","created":1726684137,"model":"omnio-chat-audio-preview","choices":[{"index":0,"delta":{"role":"assistant","content":"","refusal":null},"finish_reason":null}]}

data: {"id":"cmpl-0849fc6c-2a79-4b76-9465-5e9c35722c77","object":"chat.completion.chunk","created":1726684137,"model":"omnio-chat-audio-preview","choices":[{"index":0,"delta":{"content":"Response"},"finish_reason":null}]}

data: {"id":"cmpl-0849fc6c-2a79-4b76-9465-5e9c35722c77","object":"chat.completion.chunk","created":1726684137,"model":"omnio-chat-audio-preview","choices":[{"index":0,"delta":{"content":"."},"finish_reason":null}]}

data: {"id":"cmpl-0849fc6c-2a79-4b76-9465-5e9c35722c77","object":"chat.completion.chunk","created":1726684137,"model":"omnio-chat-audio-preview","choices":[{"index":0,"delta":{"content":""},"finish_reason":null}]}

data: {"id":"cmpl-0849fc6c-2a79-4b76-9465-5e9c35722c77","object":"chat.completion.chunk","created":1726684137,"model":"omnio-chat-audio-preview","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
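
As a usage note, a minimal sketch of reading the fields described below from a non-streamed completion returned by the Python client:

# completion is the return value of client.chat.completions.create(...)
# from the non-streaming example above.
choice = completion.choices[0]
print(choice.message.role)            # "assistant"
print(choice.message.content)         # the generated text
print(choice.finish_reason)           # "stop" or "length"
print(completion.usage.total_tokens)  # prompt + completion tokens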

The chat completion object#

Represents a chat completion response returned by the model, based on the provided input.

id string

A unique identifier for the chat completion.

choices array

A list of chat completion choices. There will be zero or one item.

finish_reason string

The reason the model stopped generating tokens. This will be stop if the model hit a natural stop point, or length if the maximum number of tokens specified in the request was reached or the model's size limit was hit.

index integer

The index of the choice in the list of choices.

message object

A chat completion message generated by the model.

content string or null

The contents of the message.

role string

The role of the author of this message.

created integer

The Unix timestamp (in seconds) of when the chat completion was created.

model string

The model used for the chat completion.

object string

The object type, which is always chat.completion.

usage object

Usage statistics for the completion request.

completion_tokens integer

Number of tokens in the generated completion.

prompt_tokens integer

Number of tokens in the prompt.

total_tokens integer

Total number of tokens used in the request (prompt + completion).

The chat completion chunk object#

Represents a streamed chunk of a chat completion response returned by the model, based on the provided input.

id string

A unique identifier for the chat completion. Each chunk has the same ID.

choices array

A list of chat completion choices. There will be zero or one item. Can also be empty for the last chunk if you set stream_options: {"include_usage": true}.

delta object

A chat completion delta generated by streamed model responses.

content string or null

The contents of the chunk message.

role string

The role of the author of this message.

finish_reason string

The reason the model stopped generating tokens. This will be stop if the model hit a natural stop point, or length if the maximum number of tokens specified in the request was reached or the model's size limit was hit.

index integer

The index of the choice in the list of choices.

created integer

The Unix timestamp (in seconds) of when the chat completion was created. Each chunk has the same timestamp.

model string

The model used for the chat completion.

object string

The object type, which is always chat.completion.chunk.

usage object

An optional field that will only be present when you set stream_options: {"include_usage": true} in your request. When present, it contains a null value except for the last chunk, which contains the token usage statistics for the entire request.

completion_tokens integer

Number of tokens in the generated completion.

prompt_tokens integer

Number of tokens in the prompt.

total_tokens integer

Total number of tokens used in the request (prompt + completion).