Limitations

Context window

The omnio-chat-audio-preview context window is limited by the sum of the input and output tokens. The current model supports up to 45 minutes of input audio together with up to 4096 input text tokens, and up to 16,384 output text tokens. Support for longer input audio is coming soon.
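A client can check these limits before sending a request. The following is a minimal sketch, not part of any official SDK: the constant and function names are hypothetical, and it assumes you already know the audio duration and the text token counts.

```python
# Limits quoted from the documentation above; all names here are hypothetical.
MAX_AUDIO_SECONDS = 45 * 60
MAX_INPUT_TEXT_TOKENS = 4096
MAX_OUTPUT_TEXT_TOKENS = 16_384


def check_request_limits(audio_seconds: float,
                         input_text_tokens: int,
                         max_tokens: int) -> list[str]:
    """Return a list of limit violations; an empty list means the request fits."""
    problems = []
    if audio_seconds > MAX_AUDIO_SECONDS:
        problems.append(f"audio is {audio_seconds:.0f}s, limit is {MAX_AUDIO_SECONDS}s")
    if input_text_tokens > MAX_INPUT_TEXT_TOKENS:
        problems.append(f"{input_text_tokens} input text tokens, limit is {MAX_INPUT_TEXT_TOKENS}")
    if max_tokens > MAX_OUTPUT_TEXT_TOKENS:
        problems.append(f"max_tokens={max_tokens}, limit is {MAX_OUTPUT_TEXT_TOKENS}")
    return problems
```

Returning a list rather than raising on the first violation lets the caller report every problem at once.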

When the model reaches its context limit, the finish reason will be labeled as length, similar to when the max_tokens parameter is reached.
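To detect truncation on the client, inspect the finish reason of each returned choice. The sketch below assumes the response body has already been parsed into a dict with an OpenAI-style shape (a choices list whose items carry a finish_reason field). That shape is an assumption, and was_truncated is a hypothetical helper name.

```python
def was_truncated(response: dict) -> bool:
    """Return True if any choice stopped because a length limit was hit.

    Assumes an OpenAI-style payload: {"choices": [{"finish_reason": ...}, ...]}.
    """
    choices = response.get("choices") or []
    return any(choice.get("finish_reason") == "length" for choice in choices)
```

Because the same finish reason covers both the context limit and max_tokens, this check alone cannot tell which of the two was reached.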

Response timeout

The server aborts an HTTP request if no response is produced for more than 100 seconds. This can occur with long audio processing tasks (e.g., transcribing long audio files). In such cases, enable streaming responses and join the returned outputs on the client side. A request that times out returns HTTP status code 429.
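Joining streamed outputs on the client side can be sketched as follows. This assumes each streamed chunk has already been decoded into a dict with an OpenAI-style shape, where the incremental text sits under choices[0].delta.content. That chunk layout is an assumption, and join_stream is a hypothetical helper name. Only the first choice is read, so a single completion per request is assumed.

```python
from typing import Iterable


def join_stream(chunks: Iterable[dict]) -> str:
    """Concatenate the text deltas of a streamed response into one string.

    Assumes chunks shaped like {"choices": [{"delta": {"content": "..."}}]}.
    Chunks without text (e.g. role-only or final chunks) are skipped.
    """
    parts = []
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta") or {}
        text = delta.get("content")
        if text:
            parts.append(text)
    return "".join(parts)
```

Collecting the pieces in a list and joining once at the end avoids repeated string concatenation for long transcripts.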

HTTP request body size

The maximum size of an HTTP request body is 512 MB. If your audio file exceeds this size, consider encoding the audio in a more efficient format, such as MP3.
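A rough pre-flight size check might look like the sketch below. Several points are assumptions rather than documented behavior: that "MB" means 10^6 bytes, that the audio may be embedded in the body as base64 (which inflates it by about one third), and the helper names themselves. Pass base64_encoded=False if the audio is sent as raw bytes.

```python
# Assumption: 512 MB is interpreted as decimal megabytes.
MAX_BODY_BYTES = 512 * 1000 * 1000


def base64_size(raw_bytes: int) -> int:
    """Size in bytes of padded base64 output for raw_bytes of input."""
    return 4 * ((raw_bytes + 2) // 3)


def fits_in_body(audio_bytes: int,
                 overhead_bytes: int = 0,
                 base64_encoded: bool = True,
                 limit: int = MAX_BODY_BYTES) -> bool:
    """Estimate whether the audio plus other body content stays within the limit."""
    payload = base64_size(audio_bytes) if base64_encoded else audio_bytes
    return payload + overhead_bytes <= limit
```

For a file on disk, os.path.getsize(path) gives the audio_bytes value to pass in.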