Types

Soniox Python SDK - Types Reference


Token

Token metadata emitted during realtime streaming transcriptions.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `text` | `str` | The transcribed text. |
| `start_ms` | `int \| None` | Start time in milliseconds relative to audio start. |
| `end_ms` | `int \| None` | End time in milliseconds relative to audio start. |
| `confidence` | `float \| None` | Confidence score (0.0 to 1.0). |
| `is_final` | `bool \| None` | Whether this is a finalized token. |
| `speaker` | `str \| None` | Speaker identifier (if diarization enabled). |
| `translation_status` | `str \| None` | Translation status of this token. |
| `language` | `str \| None` | Detected language code (if language identification enabled). |
| `source_language` | `str \| None` | Source language for translated tokens. |
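
The fields above are enough to rebuild readable text from a token stream. The following is a minimal sketch, not SDK code: the `Token` dataclass is a stand-in that mirrors only three of the documented fields, `final_text_by_speaker` is a hypothetical helper, and it assumes each token's `text` already carries its own spacing.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import groupby


@dataclass
class Token:
    """Stand-in mirroring a subset of the documented Token fields (not the SDK class)."""

    text: str
    is_final: bool | None = None
    speaker: str | None = None


def final_text_by_speaker(tokens: list[Token]) -> list[tuple[str | None, str]]:
    """Join finalized tokens into (speaker, text) turns, preserving order."""
    final = [t for t in tokens if t.is_final]
    return [
        (speaker, "".join(t.text for t in group).strip())
        for speaker, group in groupby(final, key=lambda t: t.speaker)
    ]


tokens = [
    Token("Hello", is_final=True, speaker="1"),
    Token(" there", is_final=True, speaker="1"),
    Token(" Hi", is_final=True, speaker="2"),
    Token(" ther", is_final=False, speaker="2"),  # non-final token, skipped
]
turns = final_text_by_speaker(tokens)
```

Filtering on `is_final` first keeps provisional tokens, which may still change, out of the joined text.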

ApiError

Structured representation of a non-2xx API response payload.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `status_code` | `int` | HTTP status code. |
| `error_type` | `str` | High-level error code (e.g., 'bad_request', 'quota_exceeded') for programmatic handling. |
| `message` | `str` | Detailed error message describing the failure. |
| `validation_errors` | `list[ApiErrorValidationError]` | List of specific field validation failures, if applicable. |
| `request_id` | `str \| None` | Unique identifier for the request, useful for troubleshooting. |
| `more_info` | `str \| None` | Optional URL pointing to documentation for resolving this error. |

ApiErrorValidationError

Details a single validation error reported by the Soniox API.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `error_type` | `str` | The category of validation error. |
| `location` | `str` | The location of the error, e.g. ['body', 'audio_url']. |
| `message` | `str` | A human-readable description of the validation failure. |
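
When logging a failed request it can help to flatten an `ApiError` and its validation errors into one report. The sketch below uses stand-in dataclasses that copy the documented field names; the `describe` helper and all sample values are illustrative assumptions, not part of the SDK.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ApiErrorValidationError:
    """Stand-in with the documented field names (not the SDK class)."""

    error_type: str
    location: str
    message: str


@dataclass
class ApiError:
    """Stand-in with a subset of the documented fields (not the SDK class)."""

    status_code: int
    error_type: str
    message: str
    validation_errors: list[ApiErrorValidationError] = field(default_factory=list)
    request_id: str | None = None


def describe(error: ApiError) -> str:
    """Render an error and its field-level failures as a multi-line string."""
    lines = [f"{error.status_code} {error.error_type}: {error.message}"]
    for item in error.validation_errors:
        lines.append(f"  - {item.location}: {item.message} ({item.error_type})")
    if error.request_id:
        lines.append(f"  request_id={error.request_id}")
    return "\n".join(lines)


error = ApiError(
    status_code=400,
    error_type="bad_request",
    message="Invalid request.",
    validation_errors=[
        ApiErrorValidationError("missing", "body.audio_url", "Field required.")
    ],
    request_id="req-123",  # made-up identifier for illustration
)
report = describe(error)
```

Including `request_id` in the report keeps the value that the table above describes as useful for troubleshooting next to the failure details.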

CreateTemporaryApiKeyPayload

Payload for requesting a temporary API key (e.g., for websocket use).

Properties

| Property | Type | Description |
| --- | --- | --- |
| `usage_type` | `TemporaryApiKeyUsageType` | Intended usage of the temporary API key. |
| `expires_in_seconds` | `int` | Duration in seconds until the temporary API key expires. |
| `client_reference_id` | `str \| None` | Optional tracking identifier string. Does not need to be unique. |
| `single_use` | `bool \| None` | When true, restricts the temporary API key to a single use. |
| `max_session_duration_seconds` | `int \| None` | Maximum connection duration in seconds for WebSocket and TTS HTTP streaming endpoints. |
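
As a rough illustration of how these fields fit together, the sketch below assembles a plain dict with the documented field names. The `temporary_key_payload` helper is hypothetical; rejecting non-positive `expires_in_seconds` and omitting unset optional fields are assumptions, not documented SDK behavior. The allowed `usage_type` values are taken from `TemporaryApiKeyUsageType` on this page.

```python
from __future__ import annotations

# Values listed for TemporaryApiKeyUsageType in this reference.
USAGE_TYPES = ("transcribe_websocket", "tts_rt")


def temporary_key_payload(
    usage_type: str,
    expires_in_seconds: int,
    client_reference_id: str | None = None,
    single_use: bool | None = None,
    max_session_duration_seconds: int | None = None,
) -> dict:
    """Build a dict shaped like CreateTemporaryApiKeyPayload (hypothetical helper)."""
    if usage_type not in USAGE_TYPES:
        raise ValueError(f"usage_type must be one of {USAGE_TYPES}")
    if expires_in_seconds <= 0:  # assumption: a non-positive lifetime is meaningless
        raise ValueError("expires_in_seconds must be positive")
    optional = {
        "client_reference_id": client_reference_id,
        "single_use": single_use,
        "max_session_duration_seconds": max_session_duration_seconds,
    }
    payload = {"usage_type": usage_type, "expires_in_seconds": expires_in_seconds}
    # Assumption: optional fields left as None are simply omitted.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


payload = temporary_key_payload("transcribe_websocket", 60, single_use=True)
```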

CreateTemporaryApiKeyResponse

Response data for a temporary API key request.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `api_key` | `str` | Created temporary API key. |
| `expires_at` | `datetime` | UTC timestamp indicating when the generated temporary API key expires. |

CreateTtsPayload

Payload sent to generate speech audio from text via REST.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `model` | `str` | Text-to-Speech model to use. |
| `language` | `str` | Language code for Text-to-Speech (e.g., "en"). |
| `voice` | `str` | Voice identifier to generate speech audio with. |
| `audio_format` | `TtsAudioFormat` | Requested output audio format. |
| `sample_rate` | `TtsSampleRate \| None` | Output sample rate in Hz. |
| `bitrate` | `TtsBitrate \| None` | Output bitrate in bits-per-second for compressed formats. |
| `text` | `str` | Input text to generate into speech. |

ConcurrencyCurrentValues

Live counts of concurrent sessions.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `transcribe_concurrent` | `int` | Number of concurrent realtime STT sessions currently active. |
| `tts_concurrent` | `int` | Number of concurrent realtime TTS sessions currently active. |

ConcurrencyLimitValues

Configured concurrency limits. None means no limit.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `transcribe_concurrent` | `int \| None` | Maximum concurrent realtime STT sessions, or None if unlimited. |
| `tts_concurrent` | `int \| None` | Maximum concurrent realtime TTS sessions, or None if unlimited. |

ConcurrencyScopeValues

Current and limit values for a single scope (project or organization).

Properties

| Property | Type | Description |
| --- | --- | --- |
| `current` | `ConcurrencyCurrentValues` | Live counts of active sessions. |
| `limits` | `ConcurrencyLimitValues` | Configured limits. |
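
Because a limit of None means "no limit", remaining capacity has to treat None separately from 0. A minimal sketch using plain dicts shaped like the three concurrency types; `remaining_slots` and `remaining_for_scope` are hypothetical helpers, not SDK functions.

```python
from __future__ import annotations


def remaining_slots(current: int, limit: int | None) -> int | None:
    """Return free session slots, or None when the limit is None (unlimited)."""
    if limit is None:
        return None
    return max(limit - current, 0)


def remaining_for_scope(scope: dict) -> dict:
    """Compute free slots per session kind for one scope (project or organization)."""
    current, limits = scope["current"], scope["limits"]
    return {
        name: remaining_slots(current[name], limits[name])
        for name in ("transcribe_concurrent", "tts_concurrent")
    }


# Sample data shaped like ConcurrencyScopeValues; the numbers are made up.
project = {
    "current": {"transcribe_concurrent": 3, "tts_concurrent": 1},
    "limits": {"transcribe_concurrent": 10, "tts_concurrent": None},
}
free = remaining_for_scope(project)
```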

CreateTtsConfig

Helper config used when building Text-to-Speech payloads.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `model` | `str \| None` | Text-to-Speech model to use. |
| `language` | `str \| None` | Language code for Text-to-Speech (e.g., "en"). |
| `voice` | `str \| None` | Voice identifier to generate speech audio with. |
| `audio_format` | `TtsAudioFormat \| None` | Requested output audio format. |
| `sample_rate` | `TtsSampleRate \| None` | Output sample rate in Hz. |
| `bitrate` | `TtsBitrate \| None` | Output bitrate in bits-per-second for compressed formats. |

CreateTranscriptionPayload

Payload sent to create an asynchronous transcription job.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `model` | `str` | Speech-to-text model to use. |
| `audio_url` | `str \| None` | URL of a publicly accessible audio file. |
| `file_id` | `str \| None` | ID of a previously uploaded file (UUID). |
| `language_hints` | `list[str] \| None` | Array of expected ISO language codes to bias recognition. |
| `language_hints_strict` | `bool \| None` | When true, the model relies more heavily on language hints (best results with one language hint set). |
| `enable_speaker_diarization` | `bool \| None` | Enable speaker diarization to identify different speakers. |
| `enable_language_identification` | `bool \| None` | Enable automatic language identification. |
| `translation` | `TranslationConfigInput \| None` | Translation configuration. |
| `context` | `StructuredContextInput \| None` | Additional context to improve transcription accuracy and formatting of specialized terms. |
| `webhook_url` | `str \| None` | URL to receive webhook notifications when transcription is completed or fails. |
| `webhook_auth_header_name` | `str \| None` | Name of the authentication header sent with webhook notifications. |
| `webhook_auth_header_value` | `str \| None` | Authentication header value sent with webhook notifications. |
| `client_reference_id` | `str \| None` | Optional tracking identifier. |

CreateTranscriptionConfig

Helper config used when building transcription payloads.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `model` | `str \| None` | Speech-to-text model to use. |
| `language_hints` | `list[str] \| None` | Array of expected ISO language codes to bias recognition. |
| `language_hints_strict` | `bool \| None` | When true, the model relies more heavily on language hints. |
| `enable_speaker_diarization` | `bool \| None` | Enable speaker diarization to identify different speakers. |
| `enable_language_identification` | `bool \| None` | Enable automatic language identification. |
| `translation` | `TranslationConfigInput \| None` | Translation configuration. |
| `context` | `StructuredContextInput \| None` | Additional context to improve transcription accuracy and formatting of specialized terms. |
| `webhook_url` | `str \| None` | URL to receive webhook notifications when transcription is completed or fails. |
| `webhook_auth_header_name` | `str \| None` | Name of the authentication header sent with webhook notifications. |
| `webhook_auth_header_value` | `str \| None` | Authentication header value sent with webhook notifications. |
| `client_reference_id` | `str \| None` | Optional tracking identifier. |

File

Metadata describing an uploaded file in the Soniox API.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `id` | `str` | Unique identifier of the file (UUID). |
| `filename` | `str` | Name of the file. |
| `size` | `int` | Size of the file in bytes. |
| `created_at` | `datetime` | UTC timestamp indicating when the file was uploaded. |
| `client_reference_id` | `str \| None` | Optional tracking identifier string. |

GetConcurrencyLimitsResponse

Response returned when fetching concurrency limits.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `project` | `ConcurrencyScopeValues` | Project-scoped current counts and configured limits. |
| `organization` | `ConcurrencyScopeValues` | Organization-scoped current counts and configured limits. |

GetFilesCountResponse

Breakdown of uploaded file counts by source.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `total` | `int` | Total number of files across all sources. |
| `public_api` | `int` | Number of files uploaded via the Public API. |
| `playground` | `int` | Number of files uploaded via the Playground. |

GetFilesPayload

Parameters accepted by the file listing endpoint.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `limit` | `int` | Maximum number of files to return. |
| `cursor` | `str \| None` | Pagination cursor for the next page of results. |

GetFilesResponse

Paginated response returned when listing uploaded files.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `files` | `list[File]` | List of uploaded files. |
| `next_page_cursor` | `str \| None` | A pagination token that references the next page of results. When None, no additional results are available. |
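
The `cursor` / `next_page_cursor` pair implies a simple loop: pass the previous response's `next_page_cursor` as the next request's `cursor` until it comes back as None. The sketch below shows that loop against a fake two-page backend. `Page`, `iter_files`, and `fetch_page` are hypothetical names standing in for whatever call actually lists files; no SDK method is assumed.

```python
from __future__ import annotations

from collections.abc import Callable, Iterator
from dataclasses import dataclass


@dataclass
class Page:
    """Stand-in for GetFilesResponse; `files` holds plain ids here for brevity."""

    files: list[str]
    next_page_cursor: str | None


def iter_files(fetch_page: Callable[[str | None], Page]) -> Iterator[str]:
    """Yield every file by following next_page_cursor until it is None."""
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page.files
        cursor = page.next_page_cursor
        if cursor is None:
            break


# Fake backend with two pages, keyed by the cursor that requests each page.
PAGES = {None: Page(["a", "b"], "c1"), "c1": Page(["c"], None)}
all_files = list(iter_files(lambda cursor: PAGES[cursor]))
```

The same loop shape applies to the other paginated responses on this page that expose `next_page_cursor`.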

GetModelsResponse

Response returned when listing available models.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `models` | `list[Model]` | List of all available models. |

GetTTSModelsResponse

GetTTSModelsResponse = GetTtsModelsResponse

GetTtsModelsResponse

Response returned when listing available Text-to-Speech models.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `models` | `list[TtsModel]` | List of available Text-to-Speech models. |

GetTranscriptionsCountResponse

Breakdown of transcription counts by scope.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `total` | `int` | Total number of transcriptions across all scopes. |
| `public_api` | `int` | Number of transcriptions created via the Public API. |
| `playground` | `int` | Number of transcriptions created via the Playground. |

GetTranscriptionsPayload

Parameters for listing transcription jobs.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `limit` | `int` | Maximum number of transcriptions to return. |
| `cursor` | `str \| None` | Pagination cursor for the next page of results. |

GetTranscriptionsResponse

Paginated response for transcription listings.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `transcriptions` | `list[Transcription]` | List of transcriptions. |
| `next_page_cursor` | `str \| None` | A pagination token that references the next page of results. When None, no additional results are available. |

GetUsageLogsPayload

Parameters accepted by the usage logs listing endpoint.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `start_time` | `str` | Start of the time window (inclusive). Filters by request end time. |
| `end_time` | `str` | End of the time window (exclusive). Filters by request end time. |
| `limit` | `int` | Maximum number of usage log entries to return. |
| `sort` | `UsageLogsSort` | Sort order by end_time. Use end_time_desc to get the most recent entries first. |
| `cursor` | `str \| None` | Pagination cursor for the next page of results. |

GetUsageLogsResponse

Paginated response for usage-log listings.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `usage_logs` | `list[UsageLogEntry]` | Per-request usage log entries ordered by end_time. |
| `next_page_cursor` | `str \| None` | Pagination cursor for the next page of results. None if no more pages. |

Language

Represents a supported language for transcription or translation.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `code` | `str` | 2-letter language code (ISO format). |
| `name` | `str` | Language name. |

Model

Describes a Soniox transcription model.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `id` | `str` | Unique identifier of the model. |
| `aliased_model_id` | `str \| None` | If this is an alias, the id of the aliased model. None for non-alias models. |
| `name` | `str` | Name of the model. |
| `context_version` | `int \| None` | Version of context supported. |
| `transcription_mode` | `TranscriptionMode` | Transcription mode of the model. |
| `languages` | `list[Language]` | List of languages supported by the model. |
| `supports_language_hints_strict` | `bool` | Whether the model supports the 'language_hints_strict' option. |
| `supports_max_endpoint_delay` | `bool` | Whether the model supports the 'max_endpoint_delay_ms' option. |
| `translation_targets` | `list[TranslationTarget]` | List of supported one-way translation targets. If the list is empty, check the one_way_translation field. |
| `two_way_translation_pairs` | `list[str]` | List of supported two-way translation pairs. If the list is empty, check the two_way_translation field. |
| `one_way_translation` | `str \| None` | When this contains the string 'all_languages', any language from languages can be used. |
| `two_way_translation` | `str \| None` | When this contains the string 'all_languages', any language pair from languages can be used. |
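
The fallback described for `translation_targets` and `one_way_translation` can be read as a two-step check. The sketch below is one interpretation of those descriptions, written against a plain dict shaped like `Model`; `supports_one_way_target` is a hypothetical helper, and it deliberately ignores `source_languages` and `exclude_source_languages`.

```python
from __future__ import annotations


def supports_one_way_target(model: dict, target: str) -> bool:
    """Check whether `target` is usable as a one-way translation target."""
    targets = model.get("translation_targets") or []
    if targets:
        # Explicit list: the target must appear in it.
        return any(t["target_language"] == target for t in targets)
    if model.get("one_way_translation") == "all_languages":
        # Empty list plus 'all_languages': any supported language qualifies.
        return target in {lang["code"] for lang in model["languages"]}
    return False


# Made-up model data for illustration only.
open_model = {
    "languages": [{"code": "en", "name": "English"}, {"code": "es", "name": "Spanish"}],
    "translation_targets": [],
    "one_way_translation": "all_languages",
}
listed_model = {
    "languages": open_model["languages"],
    "translation_targets": [
        {"target_language": "es", "source_languages": ["en"], "exclude_source_languages": []}
    ],
    "one_way_translation": None,
}
```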

RealtimeSTTAudioFormat

RealtimeSTTAudioFormat = Literal["auto"] | RealtimeSTTHeaderFormat | RealtimeSTTRawFormat

Audio formats accepted by the realtime STT websocket.


RealtimeSTTHeaderFormat

RealtimeSTTHeaderFormat = Literal[
    "aac", "aiff", "amr", "asf", "flac", "mp3", "ogg", "wav", "webm",
]

Container formats whose header carries sample rate and channels.


RealtimeSTTRawFormat

RealtimeSTTRawFormat = Literal[
    "pcm_s8",
    "pcm_s16le", "pcm_s16be",
    "pcm_s24le", "pcm_s24be",
    "pcm_s32le", "pcm_s32be",
    "pcm_u8",
    "pcm_u16le", "pcm_u16be",
    "pcm_u24le", "pcm_u24be",
    "pcm_u32le", "pcm_u32be",
    "pcm_f32le", "pcm_f32be",
    "pcm_f64le", "pcm_f64be",
    "mulaw", "alaw",
]

Raw formats with no header; these require sample_rate and num_channels.
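
A small pre-flight check can catch a raw format configured without its required parameters. This sketch copies the raw format names from the `RealtimeSTTRawFormat` literal above; `missing_raw_params` is a hypothetical helper, not an SDK function.

```python
from __future__ import annotations

# Values copied from the RealtimeSTTRawFormat literal in this reference.
RAW_FORMATS = frozenset({
    "pcm_s8", "pcm_s16le", "pcm_s16be", "pcm_s24le", "pcm_s24be",
    "pcm_s32le", "pcm_s32be", "pcm_u8", "pcm_u16le", "pcm_u16be",
    "pcm_u24le", "pcm_u24be", "pcm_u32le", "pcm_u32be",
    "pcm_f32le", "pcm_f32be", "pcm_f64le", "pcm_f64be", "mulaw", "alaw",
})


def missing_raw_params(
    audio_format: str,
    sample_rate: int | None = None,
    num_channels: int | None = None,
) -> list[str]:
    """Return the names of required parameters that are unset for a raw format."""
    if audio_format not in RAW_FORMATS:
        return []  # "auto" and container formats carry this information themselves
    missing = []
    if sample_rate is None:
        missing.append("sample_rate")
    if num_channels is None:
        missing.append("num_channels")
    return missing
```

For example, `missing_raw_params("pcm_s16le", sample_rate=16000)` reports that `num_channels` is still needed, while `"wav"` and `"auto"` need neither.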


StructuredContext

Optional structured context provided to the transcription engine.

For ergonomics, general and translation_terms also accept a plain dict in addition to the typed item lists:

  • general={"domain": "Healthcare"} (dict of key -> value)
  • translation_terms={"Mr. Smith": "Sr. Smith"} (dict of source -> target)

Properties

| Property | Type | Description |
| --- | --- | --- |
| `general` | `Annotated[StructuredContextGeneralInput, Field(union_mode='left_to_right')] \| None` | Structured key-value pairs describing domain, topic, intent, participant names, etc. |
| `text` | `str \| None` | Longer free-form background text, prior interaction history, reference documents, or meeting notes. |
| `terms` | `list[str] \| None` | Domain-specific or uncommon words to recognize. |
| `translation_terms` | `Annotated[StructuredContextTranslationTermsInput, Field(union_mode='left_to_right')] \| None` | Custom translations for ambiguous terms. |
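
The dict shorthand and the typed item lists carry the same information. The sketch below shows one plausible mapping between the two shapes, using the `key`/`value` and `source`/`target` field names documented for the item types. How the SDK converts between them internally is an assumption; both helpers are hypothetical.

```python
from __future__ import annotations


def normalize_general(
    general: list[dict[str, str]] | dict[str, str] | None,
) -> list[dict[str, str]] | None:
    """Expand {"domain": "Healthcare"} into [{"key": "domain", "value": "Healthcare"}]."""
    if general is None:
        return None
    if isinstance(general, dict):
        return [{"key": key, "value": value} for key, value in general.items()]
    return list(general)


def normalize_translation_terms(
    terms: list[dict[str, str]] | dict[str, str] | None,
) -> list[dict[str, str]] | None:
    """Expand {"Mr. Smith": "Sr. Smith"} into [{"source": ..., "target": ...}]."""
    if terms is None:
        return None
    if isinstance(terms, dict):
        return [{"source": source, "target": target} for source, target in terms.items()]
    return list(terms)


general_items = normalize_general({"domain": "Healthcare"})
term_items = normalize_translation_terms({"Mr. Smith": "Sr. Smith"})
```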

StructuredContextGeneralInput

StructuredContextGeneralInput = list[StructuredContextGeneralItem] | dict[str, str]

Accepted input shapes for StructuredContext.general.


StructuredContextGeneralItem

Single general context key/value pair for transcription context.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `key` | `str` | The key describing the context type (e.g., "domain", "topic", "doctor"). |
| `value` | `str` | The value for the context key. |

StructuredContextInput

StructuredContextInput = StructuredContext | dict[str, Any]

Accepted input for the context field - typed object or a plain dict.


StructuredContextTranslationTerm

Defines a translation term mapping used in structured context.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `source` | `str` | The source term to translate. |
| `target` | `str` | The target translation for the term. |

StructuredContextTranslationTermsInput

StructuredContextTranslationTermsInput = list[StructuredContextTranslationTerm] | dict[str, str]

Accepted input shapes for StructuredContext.translation_terms.


Transcription

Represents a transcription job tracked by Soniox.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `id` | `str` | Unique identifier of the transcription (UUID). |
| `status` | `TranscriptionStatus` | Current status of the transcription. |
| `created_at` | `datetime` | UTC timestamp when the transcription was created. |
| `model` | `str` | Speech-to-text model used. |
| `audio_url` | `str \| None` | URL of the audio file being transcribed. |
| `file_id` | `str \| None` | ID of the uploaded file being transcribed (UUID). |
| `filename` | `str` | Name of the file being transcribed. |
| `language_hints` | `list[str] \| None` | Expected languages in the audio. If not specified, languages are automatically detected. |
| `enable_speaker_diarization` | `bool` | When true, speakers are identified and separated in the transcription output. |
| `enable_language_identification` | `bool` | When true, language is detected for each part of the transcription. |
| `audio_duration_ms` | `int \| None` | Duration of the audio in milliseconds. Only available after processing begins. |
| `error_type` | `str \| None` | Error type if transcription failed. None for successful or in-progress transcriptions. |
| `error_message` | `str \| None` | Error message if transcription failed. None for successful or in-progress transcriptions. |
| `webhook_url` | `str \| None` | URL to receive webhook notifications when transcription is completed or fails. |
| `webhook_auth_header_name` | `str \| None` | Name of the authentication header sent with webhook notifications. |
| `webhook_auth_header_value` | `str \| None` | Authentication header value. Always returned masked. |
| `webhook_status_code` | `int \| None` | HTTP status code received from your server when webhook was delivered. None if not yet sent. |
| `client_reference_id` | `str \| None` | Optional tracking identifier. |

TranscriptionStatus

TranscriptionStatus = Literal["queued", "processing", "completed", "error"]

Current status of the transcription job.


TranscriptionTranscript

Transcript data including the full text and tokens.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `id` | `str` | Unique identifier of the transcription this transcript belongs to (UUID). |
| `text` | `str` | Complete transcribed text content. |
| `tokens` | `list[Token]` | List of detailed token information with timestamps and metadata. |

TranslationConfig

Configuration describing how translation should be performed.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `type` | `TranslationType` | Translation type. |
| `target_language` | `str \| None` | Target language code for translation (e.g., "fr", "es", "de") (one_way). |
| `language_a` | `str \| None` | First language code (two_way). |
| `language_b` | `str \| None` | Second language code (two_way). |

validate_logic()

validate_logic() -> TranslationConfig

Returns

TranslationConfig
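
This page does not say what `validate_logic()` checks. The "(one_way)" and "(two_way)" annotations on the fields suggest cross-field rules along the following lines. This is a guess at such rules written as a stand-in class, not the SDK's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TranslationConfigSketch:
    """Stand-in with the documented field names (not the SDK class)."""

    type: str
    target_language: str | None = None
    language_a: str | None = None
    language_b: str | None = None

    def validate_logic(self) -> TranslationConfigSketch:
        """Assumed rules: each type needs its own language fields to be set."""
        if self.type == "one_way":
            if not self.target_language:
                raise ValueError("one_way translation requires target_language")
        elif self.type == "two_way":
            if not (self.language_a and self.language_b):
                raise ValueError("two_way translation requires language_a and language_b")
        else:
            raise ValueError(f"unknown translation type: {self.type!r}")
        return self


one_way = TranslationConfigSketch("one_way", target_language="fr").validate_logic()
```

Returning `self` matches the documented return type and allows the call to be chained.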


TranslationConfigInput

TranslationConfigInput = TranslationConfig | dict[str, Any]

Accepted input for the translation field - typed object or a plain dict.


TranslationTarget

Describes translation targets offered by a model.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `target_language` | `str` | Target language code for translation (e.g., "fr", "es", "de") (one_way). |
| `source_languages` | `list[str]` | List of source language codes. |
| `exclude_source_languages` | `list[str]` | Source language codes excluded for this target. |

TranslationType

TranslationType = Literal["one_way", "two_way"]

Supported translation configuration types.


TTSModel

TTSModel = TtsModel

TTSVoice

TTSVoice = TtsVoice

TtsAudioFormat

TtsAudioFormat = Literal[
    "pcm_f32le",
    "pcm_s16le",
    "pcm_mulaw",
    "pcm_alaw",
    "wav",
    "aac",
    "mp3",
    "opus",
    "flac",
]

Allowed audio formats for Text-to-Speech output.


TtsBitrate

TtsBitrate = Literal[32000, 64000, 96000, 128000, 192000, 256000, 320000]

Allowed output bitrates in bits-per-second for compressed Text-to-Speech formats.


TtsModel

Represents a Text-to-Speech model.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `id` | `str` | Unique identifier of the model. |
| `aliased_model_id` | `str \| None` | If this is an alias, the id of the aliased model. None for non-alias models. |
| `name` | `str` | Name of the model. |
| `voices` | `list[TtsVoice]` | Voices supported by this model. |
| `languages` | `list[Language]` | Languages supported by this model. |

TtsSampleRate

TtsSampleRate = Literal[8000, 16000, 24000, 44100, 48000]

Allowed output sample rates in Hz for Text-to-Speech.


TtsVoice

Represents a Text-to-Speech voice.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `id` | `str` | Unique identifier of the voice. |
| `description` | `str` | Description of the voice. |
| `gender` | `TtsVoiceGender` | Gender of the voice. |

TtsVoiceGender

TtsVoiceGender = Literal["male", "female", "neutral"]

Reported gender of a Text-to-Speech voice.


TemporaryApiKeyUsageType

TemporaryApiKeyUsageType = Literal["transcribe_websocket", "tts_rt"]

Intended usage for temporary API keys.


UploadFilePayload

Optional metadata supplied at upload time.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `client_reference_id` | `str \| None` | Optional tracking identifier string. Does not need to be unique. |

UsageLogEntry

A single usage-log entry describing one API request.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `uuid` | `str` | Unique identifier of the request. |
| `request_scope` | `str` | Scope of the request (api / playground). |
| `client_reference_id` | `str` | Client reference ID supplied on the original request. Empty string if none. |
| `model` | `str` | Model identifier. |
| `start_time` | `datetime` | When the request started. |
| `end_time` | `datetime` | When the request ended. |
| `input_text_tokens` | `int` | - |
| `input_audio_tokens` | `int` | - |
| `input_audio_duration_ms` | `int` | - |
| `output_text_tokens` | `int` | - |
| `output_audio_tokens` | `int` | - |
| `output_audio_duration_ms` | `int` | - |
| `cost_usd` | `str` | - |
| `input_cost_usd` | `str` | - |
| `input_text_cost_usd` | `str` | - |
| `input_audio_cost_usd` | `str` | - |
| `output_cost_usd` | `str` | - |
| `output_text_cost_usd` | `str` | - |
| `output_audio_cost_usd` | `str` | - |
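
The cost fields are typed as `str`, so they need parsing before any arithmetic. The sketch below totals `cost_usd` per model with `decimal.Decimal` to avoid float rounding. It assumes the strings are plain decimal numbers such as "0.0125", which this page does not confirm; `cost_by_model` is a hypothetical helper and the entries are made-up data.

```python
from __future__ import annotations

from collections import defaultdict
from decimal import Decimal


def cost_by_model(entries: list[dict]) -> dict[str, Decimal]:
    """Sum cost_usd per model, parsing each string cost as an exact Decimal."""
    totals: dict[str, Decimal] = defaultdict(Decimal)
    for entry in entries:
        totals[entry["model"]] += Decimal(entry["cost_usd"])
    return dict(totals)


# Dicts shaped like UsageLogEntry, reduced to the two fields used here.
entries = [
    {"model": "model-a", "cost_usd": "0.10"},
    {"model": "model-a", "cost_usd": "0.20"},
    {"model": "model-b", "cost_usd": "0.05"},
]
totals = cost_by_model(entries)
```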

UsageLogsSort

UsageLogsSort = Literal["end_time_asc", "end_time_desc"]

Sort order for usage-log entries by end_time.


RealtimeEvent

Event payload received from the realtime STT websocket.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `tokens` | `list[Token]` | Tokens in this result. |
| `final_audio_proc_ms` | `int \| None` | Milliseconds of audio that have been finalized. |
| `total_audio_proc_ms` | `int \| None` | Total milliseconds of audio processed. |
| `finished` | `bool` | Whether this is the final result (session ending). |
| `error_code` | `int \| None` | Error code if the realtime operation failed. |
| `error_message` | `str \| None` | Human-readable description of the error. |

validate_event()

validate_event(raw: str | bytes) -> RealtimeEvent

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `raw` | `str \| bytes` | Raw event payload from the realtime API. |

Returns

RealtimeEvent
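
Conceptually, `validate_event` turns a raw websocket message into a typed event. The sketch below shows that idea with the standard library only. It assumes the raw payload is a JSON object whose keys match the documented property names, which this page does not state; `RealtimeEventSketch` and `parse_event` are stand-ins, not the SDK implementation.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class RealtimeEventSketch:
    """Stand-in with a subset of the documented fields (not the SDK class)."""

    tokens: list[dict] = field(default_factory=list)
    finished: bool = False
    error_code: int | None = None
    error_message: str | None = None


def parse_event(raw: str | bytes) -> RealtimeEventSketch:
    """Parse a raw JSON payload (assumed wire format) into an event object."""
    data = json.loads(raw)  # json.loads accepts both str and bytes
    if not isinstance(data, dict):
        raise ValueError("event payload must be a JSON object")
    return RealtimeEventSketch(
        tokens=data.get("tokens") or [],
        finished=bool(data.get("finished", False)),
        error_code=data.get("error_code"),
        error_message=data.get("error_message"),
    )


event = parse_event(b'{"tokens": [{"text": "Hi", "is_final": true}], "finished": false}')
```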


RealtimeSTTConfig

Configuration for initiating a realtime transcription session.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `api_key` | `str \| None` | API key for real-time sessions. |
| `model` | `str` | Speech-to-text model to use. |
| `audio_format` | `RealtimeSTTAudioFormat` | Audio format. Use 'auto' for automatic detection of container formats. |
| `num_channels` | `int \| None` | Number of audio channels (required for raw audio formats). |
| `sample_rate` | `int \| None` | Sample rate in Hz (required for raw audio formats). |
| `language_hints` | `list[str] \| None` | Expected languages in the audio (ISO language codes). |
| `language_hints_strict` | `bool \| None` | When true, recognition is strongly biased toward language hints (best results when using one language in language_hints). |
| `context` | `StructuredContextInput \| None` | Additional context to improve transcription accuracy. |
| `enable_speaker_diarization` | `bool \| None` | Enable speaker identification. |
| `enable_language_identification` | `bool \| None` | Enable automatic language detection. |
| `enable_endpoint_detection` | `bool \| None` | Enable endpoint detection for utterance boundaries. |
| `max_endpoint_delay_ms` | `int \| None` | Maximum delay between the end of speech and the returned endpoint. Allowed values are between 500 ms and 3000 ms. The default is 2000 ms. |
| `translation` | `TranslationConfigInput \| None` | Translation configuration. |
| `client_reference_id` | `str \| None` | Optional tracking identifier (max 256 chars). |
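
The range and default stated for `max_endpoint_delay_ms` can be checked before opening a session. A minimal sketch, assuming the 500 to 3000 bounds are inclusive (the text does not say); `resolve_max_endpoint_delay_ms` is a hypothetical helper.

```python
from __future__ import annotations

MIN_ENDPOINT_DELAY_MS = 500
MAX_ENDPOINT_DELAY_MS = 3000
DEFAULT_ENDPOINT_DELAY_MS = 2000  # documented default


def resolve_max_endpoint_delay_ms(value: int | None) -> int:
    """Return the effective delay, falling back to the documented default."""
    if value is None:
        return DEFAULT_ENDPOINT_DELAY_MS
    # Assumption: both bounds are inclusive.
    if not MIN_ENDPOINT_DELAY_MS <= value <= MAX_ENDPOINT_DELAY_MS:
        raise ValueError(
            f"max_endpoint_delay_ms must be between {MIN_ENDPOINT_DELAY_MS} "
            f"and {MAX_ENDPOINT_DELAY_MS}, got {value}"
        )
    return value
```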

build_payload()

build_payload(api_key: str) -> RealtimeSTTConfig

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `api_key` | `str` | API key used for authentication. |

Returns

RealtimeSTTConfig


RealtimeTTSConfig

Configuration for initiating a realtime Text-to-Speech stream.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `api_key` | `str \| None` | API key for real-time sessions. |
| `stream_id` | `str` | Client stream identifier unique among active streams on a connection. |
| `model` | `str` | Text-to-Speech model to use. |
| `language` | `str` | Language code for Text-to-Speech (e.g., "en"). |
| `voice` | `str` | Voice identifier to generate speech audio with. |
| `audio_format` | `TtsAudioFormat` | Requested output audio format. |
| `sample_rate` | `TtsSampleRate \| None` | Output sample rate in Hz. |
| `bitrate` | `TtsBitrate \| None` | Output bitrate in bits-per-second for compressed formats. |

build_payload()

build_payload(api_key: str) -> RealtimeTTSConfig

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `api_key` | `str` | API key used for authentication. |

Returns

RealtimeTTSConfig


RealtimeTTSEvent

Event payload received from the realtime Text-to-Speech websocket.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `stream_id` | `str \| None` | Stream identifier associated with this event. |
| `audio` | `str \| None` | Base64 encoded audio chunk, when present. |
| `audio_end` | `bool` | Whether this event contains the last audio payload for the stream. |
| `terminated` | `bool` | Whether the stream has been fully terminated. |
| `error_code` | `int \| None` | Error code if the Text-to-Speech stream failed. |
| `error_message` | `str \| None` | Human-readable error message. |

validate_event()

validate_event(raw: str | bytes) -> RealtimeTTSEvent

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `raw` | `str \| bytes` | Raw event payload from the realtime API.  |

Returns

RealtimeTTSEvent


audio_bytes()

audio_bytes() -> bytes | None

Decode and return the audio bytes for this event, if present.

Returns

bytes | None


RealtimeTTSTextMessage

Text chunk message sent over realtime Text-to-Speech websocket.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `text` | `str` | Text chunk to generate into speech. |
| `text_end` | `bool` | Whether this message marks the final text chunk for the stream. |
| `stream_id` | `str` | Stream identifier the chunk belongs to. |
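
When text arrives in pieces, only the last message for a stream should carry `text_end=True`. The sketch below builds such a sequence as plain dicts with the documented field names; `text_messages` is a hypothetical helper and the stream id is a made-up value.

```python
from __future__ import annotations


def text_messages(chunks: list[str], stream_id: str) -> list[dict]:
    """Build one message per chunk, marking only the last one with text_end."""
    last = len(chunks) - 1
    return [
        {"text": chunk, "text_end": index == last, "stream_id": stream_id}
        for index, chunk in enumerate(chunks)
    ]


messages = text_messages(["Hello, ", "world."], stream_id="stream-1")
```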

Headers

Headers = Mapping[str, str]

WebhookAuthConfig

Configuration for webhook authentication headers.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `name` | `str` | Expected header name (case-insensitive comparison). |
| `value` | `str` | Expected header value (exact match). |
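
The two comparison rules above (case-insensitive name, exact value) can be sketched with the standard library as follows. `is_authorized` is a hypothetical helper, not the SDK's verifier; using `hmac.compare_digest` for the value is a choice made here to keep the comparison constant-time, and the header name and secret are made up.

```python
from __future__ import annotations

import hmac
from collections.abc import Mapping


def is_authorized(headers: Mapping[str, str], name: str, value: str) -> bool:
    """Match the header name case-insensitively and the value exactly."""
    wanted = name.lower()
    for header_name, header_value in headers.items():
        if header_name.lower() == wanted:
            # Constant-time comparison of the exact bytes.
            return hmac.compare_digest(header_value.encode(), value.encode())
    return False


incoming = {"x-webhook-secret": "s3cret"}
ok = is_authorized(incoming, "X-Webhook-Secret", "s3cret")
```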

WebhookEvent

Basic webhook event metadata.

Properties

| Property | Type | Description |
| --- | --- | --- |
| `id` | `str` | Transcription ID (UUID). |
| `status` | `Literal['completed', 'error']` | Transcription result status. |
