API Reference
Learn how to use and integrate the Soniox Omnio API.
The Omnio API is fully compatible with the OpenAI API. It is available at https://api.llm.soniox.com/v1. We suggest using the OpenAI client to interact with the API.
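For example, with the official OpenAI Python SDK you can point the client at the Omnio endpoint (the SONIOX_API_KEY environment variable name is an assumption; substitute however you store your API key):

```python
import os

from openai import OpenAI

# Point the standard OpenAI client at the Omnio-compatible endpoint.
# SONIOX_API_KEY is an illustrative environment variable name.
client = OpenAI(
    base_url="https://api.llm.soniox.com/v1",
    api_key=os.environ["SONIOX_API_KEY"],
)
```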
List models
Lists the currently available models and provides basic information about each one, such as the owner and availability.
Response
A list of model objects.
Model object
Describes an Omnio model offering that can be used with the API.
id
string
The model identifier, which can be referenced in the API endpoints.
created
integer
The Unix timestamp (in seconds) when the model was created.
object
string
The object type, which is always model.
owned_by
string
The organization that owns the model.
Example
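A minimal sketch using the client configured above:

```python
# List the available models and print basic information about each one.
for model in client.models.list():
    print(model.id, model.object, model.created, model.owned_by)
```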
Create chat completion
Given a list of messages comprising a conversation, the model will return a response.
Request body
messages
array, required
A list of messages comprising the conversation.
System message
content
string or array, required
The contents of the system message.
role
string, required
The role of the message's author, in this case system.
name
string
An optional name for the participant. Provides the model information to differentiate between participants of the same role.
User message
content
string or array, required
The contents of the user message.
Text content
The text contents of the message.
If your SDK does not support custom content parts, you can include audio data inside text between <audio_data_b64> and </audio_data_b64> tags. For best results, audio data must be placed before text.
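A sketch of this fallback, assuming the client configured above and a local audio file (the path, prompt, and model ID are placeholders):

```python
import base64

# Base64-encode the audio and embed it in the text content,
# before the text instruction, between <audio_data_b64> tags.
with open("podcast.mp3", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

completion = client.chat.completions.create(
    model="<model-id>",  # use an ID from the List models endpoint
    messages=[
        {
            "role": "user",
            "content": f"<audio_data_b64>{audio_b64}</audio_data_b64>Transcribe this audio.",
        }
    ],
)
print(completion.choices[0].message.content)
```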
Array of content parts
An array of content parts with a defined type; each part can be of type text, or audio_data_b64 when passing in audio. You can pass multiple audio files by adding multiple audio_data_b64 content parts (see the sketch after the user message fields below).
For best results, audio data must be placed before text.
Text content part
type
string, required
The type of the content part, text in this case.
text
string, required
The text contents of the message.
If your SDK does not support custom content parts, you can include audio data inside text between <audio_data_b64> and </audio_data_b64> tags. For best results, audio data must be placed before text.
Audio content part
type
string, required
The type of the content part, audio_data_b64 in this case.
audio_data_b64
string, required
Base64-encoded audio data.
role
string, required
The role of the message's author, in this case user.
name
string
An optional name for the participant. Provides the model information to differentiate between participants of the same role.
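A sketch of a user message built from typed content parts (client and audio_b64 as in the previous sketch; the model ID is a placeholder):

```python
completion = client.chat.completions.create(
    model="<model-id>",  # use an ID from the List models endpoint
    messages=[
        {
            "role": "user",
            "content": [
                # Audio parts go before the text part for best results.
                {"type": "audio_data_b64", "audio_data_b64": audio_b64},
                {"type": "text", "text": "Transcribe this audio."},
            ],
        }
    ],
)
print(completion.choices[0].message.content)
```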
Assistant message
content
string or array, required
The contents of the assistant message.
role
string, required
The role of the message's author, in this case assistant.
name
string
An optional name for the participant. Provides the model information to differentiate between participants of the same role.
model
string, required
ID of the model to use.
max_tokens
integer or null
The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via the API.
stream
boolean or null, default: false
If set, partial message deltas will be sent. Tokens will be sent as data-only server-sent events as they become available, with the stream terminated by a data: [DONE] message.
stream_options
object or null, default: null
Options for streaming response. Only set this when you set stream: true.
include_usage
boolean, default: false
If set, an additional chunk will be streamed before the data: [DONE] message. The usage field on this chunk shows the token usage statistics for the entire request, and the choices field will always be an empty array. All other chunks will also include a usage field, but with a null value.
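A sketch of a streamed request with usage reporting enabled (client as configured above; the model ID is a placeholder):

```python
stream = client.chat.completions.create(
    model="<model-id>",  # use an ID from the List models endpoint
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in stream:
    # Every chunk except the last carries a null usage field.
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
    if chunk.usage is not None:
        print("\nTotal tokens:", chunk.usage.total_tokens)
```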
temperature
number or null, default: 1
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
We generally recommend altering this or top_p but not both.
top_p
number or null, default: 1
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
We generally recommend altering this or temperature but not both.
Response
Returns a chat completion object, or a streamed sequence of chat completion chunk objects if the request is streamed.
Streamed responses are returned as chat completion chunk objects, described below.
Chat completion object
Represents a chat completion response returned by the model, based on the provided input.
id
string
A unique identifier for the chat completion.
choices
array
A list of chat completion choices. There will be zero or one item.
finish_reason
string
The reason the model stopped generating tokens. This will be stop if the model hit a natural stop point, or length if the maximum number of tokens specified in the request was reached or the model size limit was hit.
index
integer
The index of the choice in the list of choices.
message
object
A chat completion message generated by the model.
content
string or null
The contents of the message.
role
string
The role of the author of this message.
created
integer
The Unix timestamp (in seconds) of when the chat completion was created.
model
string
The model used for the chat completion.
object
string
The object type, which is always chat.completion.
usage
object
Usage statistics for the completion request.
completion_tokens
integer
Number of tokens in the generated completion.
prompt_tokens
integer
Number of tokens in the prompt.
total_tokens
integer
Total number of tokens used in the request (prompt + completion).
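As a sketch, the fields above can be read off the object returned by the OpenAI Python client, where completion is the result of a create call like the ones above:

```python
print(completion.id, completion.object, completion.created, completion.model)

choice = completion.choices[0]
print(choice.index, choice.finish_reason)  # e.g. 0, "stop" or "length"
print(choice.message.role, choice.message.content)

usage = completion.usage
print(usage.prompt_tokens, usage.completion_tokens, usage.total_tokens)
```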
Chat completion chunk object
Represents a streamed chunk of a chat completion response returned by the model, based on the provided input.
id
string
A unique identifier for the chat completion. Each chunk has the same ID.
choices
array
A list of chat completion choices. There will be zero or one item. Can also be empty for the last chunk if you set stream_options: {"include_usage": true}.
delta
object
A chat completion delta generated by streamed model responses.
content
string or null
The contents of the chunk message.
role
string
The role of the author of this message.
finish_reason
string
The reason the model stopped generating tokens. This will be stop if the model hit a natural stop point, or length if the maximum number of tokens specified in the request was reached or the model size limit was hit.
index
integer
The index of the choice in the list of choices.
created
integer
The Unix timestamp (in seconds) of when the chat completion was created. Each chunk has the same timestamp.
model
string
The model used for the chat completion.
object
string
The object type, which is always chat.completion.chunk.
usage
object
An optional field that will only be present when you set stream_options: {"include_usage": true} in your request. When present, it contains a null value except for the last chunk, which contains the token usage statistics for the entire request.
completion_tokens
integer
Number of tokens in the generated completion.
prompt_tokens
integer
Number of tokens in the prompt.
total_tokens
integer
Total number of tokens used in the request (prompt + completion).
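A sketch of inspecting these fields chunk by chunk, continuing the streamed request from the earlier sketch (created with stream=True and stream_options={"include_usage": True}):

```python
for chunk in stream:
    # id, created, and model are identical across chunks of one response.
    assert chunk.object == "chat.completion.chunk"
    if chunk.choices:
        choice = chunk.choices[0]
        if choice.delta.content:
            print(choice.delta.content, end="")
        if choice.finish_reason:  # "stop" or "length" on the final content chunk
            print("\nfinish_reason:", choice.finish_reason)
    elif chunk.usage is not None:
        # Final chunk when include_usage is set: empty choices, usage stats.
        print("total_tokens:", chunk.usage.total_tokens)
```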
Example
Download the audio file podcast.mp3 and update the path in the code example below to point to your downloaded file.
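A complete sketch using the OpenAI Python client (the SONIOX_API_KEY variable name and the prompt are illustrative; replace the model ID with one returned by the List models endpoint):

```python
import base64
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.llm.soniox.com/v1",
    api_key=os.environ["SONIOX_API_KEY"],  # illustrative variable name
)

# Base64-encode the downloaded audio file.
with open("podcast.mp3", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

completion = client.chat.completions.create(
    model="<model-id>",  # use an ID from the List models endpoint
    messages=[
        {
            "role": "user",
            "content": [
                # Place the audio part before the text part for best results.
                {"type": "audio_data_b64", "audio_data_b64": audio_b64},
                {"type": "text", "text": "Summarize this podcast episode."},
            ],
        }
    ],
)
print(completion.choices[0].message.content)
```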