API Reference
Learn how to use and integrate the Soniox Omnio API.
The Omnio API is fully compatible with the OpenAI API. It is available at https://api.llm.soniox.com/v1. We suggest using the OpenAI client to interact with the API.
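For example, with the official OpenAI Python SDK you can point the client at the Omnio endpoint (the SONIOX_API_KEY environment variable name is an assumption; substitute however you store your API key):

```python
import os

from openai import OpenAI

# Point the standard OpenAI client at the Omnio-compatible endpoint.
# SONIOX_API_KEY is an illustrative environment variable name.
client = OpenAI(
    base_url="https://api.llm.soniox.com/v1",
    api_key=os.environ["SONIOX_API_KEY"],
)
```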
List models
Lists the currently available models and provides basic information about each one, such as the owner and availability.
Response
A list of model objects.
Model object
Describes an Omnio model offering that can be used with the API.
id
string
The model identifier, which can be referenced in the API endpoints.
created
integer
The Unix timestamp (in seconds) when the model was created.
object
string
The object type, which is always model.
owned_by
string
The organization that owns the model.
Example
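A minimal sketch using the client configured above:

```python
# List the available models and print basic information about each one.
for model in client.models.list():
    print(model.id, model.object, model.created, model.owned_by)
```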
Create chat completion
Given a list of messages comprising a conversation, the model will return a response.
Request body
messages
array, required
A list of messages comprising the conversation.
System message
content
string or array, required
The contents of the system message.
role
string, required
The role of the message's author, in this case system.
name
string
An optional name for the participant. Provides the model information to differentiate between participants of the same role.
User message
content
string or array, required
The contents of the user message.
Text content
The text contents of the message.
If your SDK does not support custom content parts, you can include audio data inside text between <audio_data_b64> and </audio_data_b64> tags. For best results, audio data must be placed before text.
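A sketch of this fallback, assuming the client configured above and a local audio file (the path, prompt, and model ID are placeholders):

```python
import base64

# Base64-encode the audio and embed it in the text content,
# before the text instruction, between <audio_data_b64> tags.
with open("podcast.mp3", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

completion = client.chat.completions.create(
    model="<model-id>",  # use an ID from the List models endpoint
    messages=[
        {
            "role": "user",
            "content": f"<audio_data_b64>{audio_b64}</audio_data_b64>Transcribe this audio.",
        }
    ],
)
print(completion.choices[0].message.content)
```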
Array of content parts
An array of content parts with a defined type; each part can be of type text, or audio_data_b64 when passing in audio. You can pass multiple audio files by adding multiple audio_data_b64 content parts (see the sketch after the user message fields below).
For best results, audio data must be placed before text.
Text content part
type
string, required
The type of the content part, text in this case.
text
string, required
The text contents of the message.
If your SDK does not support custom content parts, you can include audio data inside text between <audio_data_b64> and </audio_data_b64> tags. For best results, audio data must be placed before text.
Audio content part
type
string, required
The type of the content part, audio_data_b64 in this case.
audio_data_b64
string, required
Base64-encoded audio data.
role
string, required
The role of the message's author, in this case user.
name
string
An optional name for the participant. Provides the model information to differentiate between participants of the same role.
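A sketch of a user message built from typed content parts (client and audio_b64 as in the previous sketch; the model ID is a placeholder):

```python
completion = client.chat.completions.create(
    model="<model-id>",  # use an ID from the List models endpoint
    messages=[
        {
            "role": "user",
            "content": [
                # Audio parts go before the text part for best results.
                {"type": "audio_data_b64", "audio_data_b64": audio_b64},
                {"type": "text", "text": "Transcribe this audio."},
            ],
        }
    ],
)
print(completion.choices[0].message.content)
```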
Assistant message
content
string or array, required
The contents of the assistant message.
role
string, required
The role of the message's author, in this case assistant.
name
string
An optional name for the participant. Provides the model information to differentiate between participants of the same role.
model
string, required
ID of the model to use.
max_tokens
integer or null
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via the API.
stream
boolean or null, default: false
If set, partial message deltas will be sent. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message.
stream_options
object or null, default: null
Options for streaming response. Only set this when you set stream: true.
include_usage
boolean, default: false
If set, an additional chunk will be streamed before the data: [DONE] message. The usage field on this chunk shows the token usage statistics for the entire request, and the choices field will always be an empty array. All other chunks will also include a usage field, but with a null value.
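A sketch of a streamed request with usage reporting enabled (client as configured above; the model ID is a placeholder):

```python
stream = client.chat.completions.create(
    model="<model-id>",  # use an ID from the List models endpoint
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in stream:
    # Every chunk except the last carries a null usage field.
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
    if chunk.usage is not None:
        print("\nTotal tokens:", chunk.usage.total_tokens)
```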
temperature
number or null, default: 1
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
We generally recommend altering this or top_p but not both.
top_p
number or null, default: 1
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
We generally recommend altering this or temperature but not both.
Response
Returns a chat completion object, or a streamed sequence of chat completion chunk objects if the request is streamed.
Streamed responses are returned as chat completion chunk objects, described below.
Chat completion object
Represents a chat completion response returned by the model, based on the provided input.
id
string
A unique identifier for the chat completion.
choices
array
A list of chat completion choices. There will be zero or one item.
finish_reason
string
The reason the model stopped generating tokens. This will be stop if the model hit a natural stop point, or length if the maximum number of tokens specified in the request was reached or the model size limit was hit.
index
integer
The index of the choice in the list of choices.
message
object
A chat completion message generated by the model.
content
string or null
The contents of the message.
role
string
The role of the author of this message.
created
integer
The Unix timestamp (in seconds) of when the chat completion was created.
model
string
The model used for the chat completion.
object
string
The object type, which is always chat.completion.
usage
object
Usage statistics for the completion request.
completion_tokens
integer
Number of tokens in the generated completion.
prompt_tokens
integer
Number of tokens in the prompt.
total_tokens
integer
Total number of tokens used in the request (prompt + completion).
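As a sketch, the fields above can be read off the object returned by the OpenAI Python client, where completion is the result of a create call like the ones above:

```python
print(completion.id, completion.object, completion.created, completion.model)

choice = completion.choices[0]
print(choice.index, choice.finish_reason)  # e.g. 0, "stop" or "length"
print(choice.message.role, choice.message.content)

usage = completion.usage
print(usage.prompt_tokens, usage.completion_tokens, usage.total_tokens)
```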
Chat completion chunk object
Represents a streamed chunk of a chat completion response returned by the model, based on the provided input.
id
string
A unique identifier for the chat completion. Each chunk has the same ID.
choices
array
A list of chat completion choices. There will be zero or one item. Can also be empty for the last chunk if you set stream_options: {"include_usage": true}.
delta
object
A chat completion delta generated by streamed model responses.
content
string or null
The contents of the chunk message.
role
string
The role of the author of this message.
finish_reason
string
The reason the model stopped generating tokens. This will be stop if the model hit a natural stop point, or length if the maximum number of tokens specified in the request was reached or the model size limit was hit.
index
integer
The index of the choice in the list of choices.
created
integer
The Unix timestamp (in seconds) of when the chat completion was created. Each chunk has the same timestamp.
model
string
The model used for the chat completion.
object
string
The object type, which is always chat.completion.chunk.
usage
object
An optional field that will only be present when you set stream_options: {"include_usage": true} in your request. When present, it contains a null value except for the last chunk, which contains the token usage statistics for the entire request.
completion_tokens
integer
Number of tokens in the generated completion.
prompt_tokens
integer
Number of tokens in the prompt.
total_tokens
integer
Total number of tokens used in the request (prompt + completion).
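A sketch of inspecting these fields chunk by chunk, continuing the streamed request from the earlier sketch (created with stream=True and stream_options={"include_usage": True}):

```python
for chunk in stream:
    # id, created, and model are identical across chunks of one response.
    assert chunk.object == "chat.completion.chunk"
    if chunk.choices:
        choice = chunk.choices[0]
        if choice.delta.content:
            print(choice.delta.content, end="")
        if choice.finish_reason:  # "stop" or "length" on the final content chunk
            print("\nfinish_reason:", choice.finish_reason)
    elif chunk.usage is not None:
        # Final chunk when include_usage is set: empty choices, usage stats.
        print("total_tokens:", chunk.usage.total_tokens)
```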
Example
Download the audio file podcast.mp3 and update the path in the code example below to point to your downloaded file.
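A complete sketch using the OpenAI Python client (the SONIOX_API_KEY variable name and the prompt are illustrative; replace the model ID with one returned by the List models endpoint):

```python
import base64
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.llm.soniox.com/v1",
    api_key=os.environ["SONIOX_API_KEY"],  # illustrative variable name
)

# Base64-encode the downloaded audio file.
with open("podcast.mp3", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

completion = client.chat.completions.create(
    model="<model-id>",  # use an ID from the List models endpoint
    messages=[
        {
            "role": "user",
            "content": [
                # Place the audio part before the text part for best results.
                {"type": "audio_data_b64", "audio_data_b64": audio_b64},
                {"type": "text", "text": "Summarize this podcast episode."},
            ],
        }
    ],
)
print(completion.choices[0].message.content)
```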