API Reference#
API Key#
Export your Soniox API key as the SONIOX_API_KEY environment variable:
export SONIOX_API_KEY="your_soniox_api_key_here"
Models#
Lists and describes the various models available in the API.
List models#
GET https://api.llm.soniox.com/v1/models
Lists the currently available models and provides basic information about each one, such as the owner and availability.
Examples#
Python:
import os
from openai import OpenAI
client = OpenAI(
    api_key=os.environ["SONIOX_API_KEY"],
    base_url="https://api.llm.soniox.com/v1",
)

client.models.list()
curl:
curl https://api.llm.soniox.com/v1/models \
  -H "Authorization: Bearer $SONIOX_API_KEY"
Response#
A list of model objects.
{
  "object": "list",
  "data": [
    {
      "id": "omnio-chat-audio-preview",
      "object": "model",
      "created": 1728482400,
      "owned_by": "system"
    }
  ]
}
The model object#
Describes a Soniox Omnio model offering that can be used with the API.
id string
The model identifier, which can be referenced in the API endpoints.
created integer
The Unix timestamp (in seconds) when the model was created.
object string
The object type, which is always "model".
owned_by string
The organization that owns the model.
{
  "id": "omnio-chat-audio-preview",
  "object": "model",
  "created": 1720434075,
  "owned_by": "system"
}
Chat#
Given a list of messages comprising a conversation, the model will return a response.
Create chat completion#
POST https://api.llm.soniox.com/v1/chat/completions
Creates a model response for the given chat conversation.
Request body#
messages array Required
A list of messages comprising the conversation.
System message object
content string or array Required
The contents of the system message.
role string Required
The role of the message's author, in this case system.
name string Optional
An optional name for the participant. Provides the model information to differentiate between participants of the same role.
User message object
content string or array Required
The contents of the user message.
Text content string
The text contents of the message.
If your SDK does not support custom content parts, you can include audio data inside text between <audio_data_b64> and </audio_data_b64> tags. For best results, audio data must be placed before text.
<audio_data_b64>BASE_64_ENCODED_AUDIO_DATA</audio_data_b64>
Write me a short summary of this audio file.
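As a sketch of this approach in Python (the podcast.mp3 path is a placeholder; any base64-encoded audio works):

import base64

# Base64-encode the audio file (placeholder path).
with open("podcast.mp3", "rb") as audio_file:
    audio_data_b64 = base64.b64encode(audio_file.read()).decode("utf-8")

# Place the audio before the text, as recommended above.
content = (
    f"<audio_data_b64>{audio_data_b64}</audio_data_b64>\n"
    "Write me a short summary of this audio file."
)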
Array of content parts array
An array of content parts with a defined type; each part can be of type text or audio_data_b64 when passing in audio. You can pass multiple audio files by adding multiple audio_data_b64 content parts. For best results, audio data must be placed before text.
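As a sketch, a content array passing two audio files before the text prompt could look like this (the audio_1_b64 and audio_2_b64 variables are assumed placeholders holding base64-encoded audio):

# Assumed placeholders holding base64-encoded audio data.
audio_1_b64 = "..."
audio_2_b64 = "..."

# Multiple audio parts, placed before the text part.
content = [
    {"type": "audio_data_b64", "audio_data_b64": audio_1_b64},
    {"type": "audio_data_b64", "audio_data_b64": audio_2_b64},
    {"type": "text", "text": "Compare these two recordings."},
]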
Text content part object
type string Required
The type of the content part, text in this case.
text string Required
The text contents of the message.
If your SDK does not support custom content parts, you can include audio data inside text between <audio_data_b64> and </audio_data_b64> tags. For best results, audio data must be placed before text.
<audio_data_b64>BASE_64_ENCODED_AUDIO_DATA</audio_data_b64>
Write me a short summary of this audio file.
Audio content part object
type string Required
The type of the content part, audio_data_b64 in this case.
audio_data_b64 string Required
Base64 encoded audio data.
role string Required
The role of the message's author, in this case user.
name string Optional
An optional name for the participant. Provides the model information to differentiate between participants of the same role.
Assistant message object
content string or array Required
The contents of the assistant message.
role string Required
The role of the message's author, in this case assistant.
name string Optional
An optional name for the participant. Provides the model information to differentiate between participants of the same role.
model string Required
ID of the model to use.
max_tokens integer or null Optional
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.
stream boolean or null Optional Defaults to false
If set, partial message deltas will be sent. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message.
stream_options object or null Optional Defaults to null
Options for streaming response. Only set this when you set stream: true.
include_usage boolean Optional
If set, an additional chunk will be streamed before the data: [DONE] message. The usage field on this chunk shows the token usage statistics for the entire request, and the choices field will always be an empty array. All other chunks will also include a usage field, but with a null value.
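A minimal sketch of reading the usage statistics from a stream (client and messages are assumed to be set up as in the examples below):

response = client.chat.completions.create(
    model="omnio-chat-audio-preview",
    messages=messages,
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in response:
    if chunk.usage is not None:
        # Final chunk: empty choices, usage for the entire request.
        print(chunk.usage.total_tokens)
    elif chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)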
temperature number or null Optional Defaults to 1
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
We generally recommend altering this or top_p but not both.
top_p number or null Optional Defaults to 1
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
We generally recommend altering this or temperature but not both.
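For example, a minimal sketch that lowers the temperature for more deterministic output (client and messages are assumed to be set up as in the examples below):

completion = client.chat.completions.create(
    model="omnio-chat-audio-preview",
    messages=messages,
    temperature=0.2,  # lower value: more focused and deterministic
)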
Examples#
Download the audio file podcast.mp3 and update the path in the code examples below to point to your downloaded file.
Python:
import base64
import os
from openai import OpenAI
client = OpenAI(
    api_key=os.environ["SONIOX_API_KEY"],
    base_url="https://api.llm.soniox.com/v1",
)

with open("podcast.mp3", "rb") as audio_file:
    audio_data_b64 = base64.b64encode(audio_file.read()).decode("utf-8")

completion = client.chat.completions.create(
    model="omnio-chat-audio-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "audio_data_b64", "audio_data_b64": audio_data_b64},
                {"type": "text", "text": "Write me a short summary of this audio file."},
            ],
        }
    ],
)

print(completion.choices[0].message.content)
Python (streaming response):
import base64
import os
from openai import OpenAI
client = OpenAI(
    api_key=os.environ["SONIOX_API_KEY"],
    base_url="https://api.llm.soniox.com/v1",
)

with open("podcast.mp3", "rb") as audio_file:
    audio_data_b64 = base64.b64encode(audio_file.read()).decode("utf-8")

response = client.chat.completions.create(
    model="omnio-chat-audio-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "audio_data_b64", "audio_data_b64": audio_data_b64},
                {"type": "text", "text": "Write me a short summary of this audio file."},
            ],
        }
    ],
    stream=True,
)

for chunk in response:
    # Print each content delta as it arrives; print a newline otherwise.
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
    else:
        print()
curl:
curl https://api.llm.soniox.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $SONIOX_API_KEY" \
  -d @- <<EOF
{
  "model": "omnio-chat-audio-preview",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "audio_data_b64",
          "audio_data_b64": "$(cat podcast.mp3 | base64 | tr -d '\n')"
        },
        {
          "type": "text",
          "text": "Write me a short summary of this audio file."
        }
      ]
    }
  ]
}
EOF
Response#
Returns a chat completion object, or a streamed sequence of chat completion chunk objects if the request is streamed.
Single response:
{
  "id": "cmpl-b2eb62cf-50c5-434a-9fde-089c633f1c77",
  "object": "chat.completion",
  "created": 1726684053,
  "model": "omnio-chat-audio-preview",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Short summary of the podcast."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 1176,
    "completion_tokens": 53,
    "total_tokens": 1229
  }
}
Stream response:
data: {"id":"cmpl-0849fc6c-2a79-4b76-9465-5e9c35722c77","object":"chat.completion.chunk","created":1726684137,"model":"omnio-chat-audio-preview","choices":[{"index":0,"delta":{"role":"assistant","content":"","refusal":null},"finish_reason":null}]}
data: {"id":"cmpl-0849fc6c-2a79-4b76-9465-5e9c35722c77","object":"chat.completion.chunk","created":1726684137,"model":"omnio-chat-audio-preview","choices":[{"index":0,"delta":{"content":"Response"},"finish_reason":null}]}
data: {"id":"cmpl-0849fc6c-2a79-4b76-9465-5e9c35722c77","object":"chat.completion.chunk","created":1726684137,"model":"omnio-chat-audio-preview","choices":[{"index":0,"delta":{"content":"."},"finish_reason":null}]}
data: {"id":"cmpl-0849fc6c-2a79-4b76-9465-5e9c35722c77","object":"chat.completion.chunk","created":1726684137,"model":"omnio-chat-audio-preview","choices":[{"index":0,"delta":{"content":""},"finish_reason":null}]}
data: {"id":"cmpl-0849fc6c-2a79-4b76-9465-5e9c35722c77","object":"chat.completion.chunk","created":1726684137,"model":"omnio-chat-audio-preview","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
The chat completion object#
Represents a chat completion response returned by the model, based on the provided input.
id string
A unique identifier for the chat completion.
choices array
A list of chat completion choices. There will be zero or one item.
finish_reason string
The reason the model stopped generating tokens. This will be stop if the model hit a natural stop point, or length if the maximum number of tokens specified in the request was reached or the model size limit was hit.
index integer
The index of the choice in the list of choices.
message object
A chat completion message generated by the model.
content string or null
The contents of the message.
role string
The role of the author of this message.
created integer
The Unix timestamp (in seconds) of when the chat completion was created.
model string
The model used for the chat completion.
object string
The object type, which is always chat.completion.
usage object
Usage statistics for the completion request.
completion_tokens integer
Number of tokens in the generated completion.
prompt_tokens integer
Number of tokens in the prompt.
total_tokens integer
Total number of tokens used in the request (prompt + completion).
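With the Python SDK used in the examples above, these fields are available as attributes on the returned object (a sketch; completion is assumed to come from client.chat.completions.create):

print(completion.id)                          # unique completion identifier
print(completion.model)                       # model used
print(completion.choices[0].finish_reason)    # "stop" or "length"
print(completion.choices[0].message.content)  # generated message text
print(completion.usage.total_tokens)          # prompt + completion tokens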
The chat completion chunk object#
Represents a streamed chunk of a chat completion response returned by the model, based on the provided input.
id string
A unique identifier for the chat completion. Each chunk has the same ID.
choices array
A list of chat completion choices. There will be zero or one item. Can also be empty for the last chunk if you set stream_options: {"include_usage": true}.
delta object
A chat completion delta generated by streamed model responses.
content string or null
The contents of the chunk message.
role string
The role of the author of this message.
finish_reason string
The reason the model stopped generating tokens. This will be stop if the model hit a natural stop point, or length if the maximum number of tokens specified in the request was reached or the model size limit was hit.
index integer
The index of the choice in the list of choices.
created integer
The Unix timestamp (in seconds) of when the chat completion was created. Each chunk has the same timestamp.
model string
The model used for the chat completion.
object string
The object type, which is always chat.completion.chunk.
usage object
An optional field that will only be present when you set stream_options: {"include_usage": true} in your request. When present, it contains a null value except for the last chunk, which contains the token usage statistics for the entire request.
completion_tokens integer
Number of tokens in the generated completion.
prompt_tokens integer
Number of tokens in the prompt.
total_tokens integer
Total number of tokens used in the request (prompt + completion).