Types

Soniox Python SDK - Types Reference


Token

Token metadata emitted during realtime streaming transcriptions.

Properties

Property (Type): Description
text (str): The transcribed text.
start_ms (int | None): Start time in milliseconds relative to audio start.
end_ms (int | None): End time in milliseconds relative to audio start.
confidence (float | None): Confidence score (0.0 to 1.0).
is_final (bool | None): Whether this is a finalized token.
speaker (str | None): Speaker identifier (if diarization enabled).
translation_status (str | None): Translation status of this token.
language (str | None): Detected language code (if language identification enabled).
source_language (str | None): Source language for translated tokens.
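
As a rough illustration of how these fields might be consumed, the sketch below joins finalized tokens into per-speaker text. `TokenSketch` is a hypothetical stand-in that mirrors a few of the fields listed above; it is not the SDK's own class, and the "unknown" fallback key is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenSketch:
    """Hypothetical stand-in mirroring a subset of the Token fields."""
    text: str
    is_final: Optional[bool] = None
    speaker: Optional[str] = None


def final_text_by_speaker(tokens: list[TokenSketch]) -> dict[str, str]:
    """Concatenate the text of finalized tokens, grouped by speaker."""
    grouped: dict[str, str] = {}
    for token in tokens:
        if not token.is_final:  # skips both False and None
            continue
        key = token.speaker or "unknown"  # assumed fallback when no speaker is set
        grouped[key] = grouped.get(key, "") + token.text
    return grouped


tokens = [
    TokenSketch("Hello", is_final=True, speaker="1"),
    TokenSketch(" there", is_final=True, speaker="1"),
    TokenSketch("Hi", is_final=True, speaker="2"),
    TokenSketch(" tentative", is_final=False, speaker="2"),
]
print(final_text_by_speaker(tokens))
```

Non-final tokens are skipped here because they may still change; whether that suits a given application depends on how it displays interim results.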

ApiError

Structured representation of a non-2xx API response payload.

Properties

Property (Type): Description
status_code (int): HTTP status code.
error_type (str): High-level error code (e.g., 'bad_request', 'quota_exceeded') for programmatic handling.
message (str): Detailed error message describing the failure.
validation_errors (list[ApiErrorValidationError]): List of specific field validation failures, if applicable.
request_id (str | None): Unique identifier for the request, useful for troubleshooting.

ApiErrorValidationError

Details a single validation error reported by the Soniox API.

Properties

Property (Type): Description
error_type (str): The category of validation error.
location (str): The location of the error, e.g. ['body', 'audio_url'].
message (str): A human-readable description of the validation failure.
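
One way to surface these fields in logs is to flatten an error and its validation failures into a short report. The sketch below uses hypothetical stand-in dataclasses (`ApiErrorSketch`, `ValidationErrorSketch`) that mirror the fields listed above; the sample values are invented for illustration and do not come from the Soniox API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ValidationErrorSketch:
    """Hypothetical stand-in for ApiErrorValidationError."""
    error_type: str
    location: str
    message: str


@dataclass
class ApiErrorSketch:
    """Hypothetical stand-in for ApiError."""
    status_code: int
    error_type: str
    message: str
    validation_errors: list[ValidationErrorSketch] = field(default_factory=list)
    request_id: Optional[str] = None


def describe(err: ApiErrorSketch) -> str:
    """Render an error and its validation failures as a multi-line string."""
    lines = [f"{err.status_code} {err.error_type}: {err.message}"]
    for item in err.validation_errors:
        lines.append(f"  - {item.location}: {item.message} ({item.error_type})")
    if err.request_id:
        lines.append(f"  request_id={err.request_id}")
    return "\n".join(lines)


# Invented sample values, for illustration only.
err = ApiErrorSketch(
    status_code=400,
    error_type="bad_request",
    message="Invalid request.",
    validation_errors=[
        ValidationErrorSketch("invalid_url", "body.audio_url", "URL is not reachable.")
    ],
    request_id="req-123",
)
print(describe(err))
```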

CreateTemporaryApiKeyPayload

Payload for requesting a temporary API key (e.g., for websocket use).

Properties

Property (Type): Description
usage_type (TemporaryApiKeyUsageType): Intended usage of the temporary API key.
expires_in_seconds (int): Duration in seconds until the temporary API key expires.
client_reference_id (str | None): Optional tracking identifier string. Does not need to be unique.

CreateTemporaryApiKeyResponse

Response data for a temporary API key request.

Properties

Property (Type): Description
api_key (str): Created temporary API key.
expires_at (datetime): UTC timestamp indicating when the generated temporary API key expires.
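
Because `expires_at` is a UTC timestamp, a client can work out how long a key remains usable and request a new one shortly before it lapses. The following is a minimal sketch under that assumption; the function names and the 30-second safety margin are illustrative choices, not part of the SDK.

```python
from datetime import datetime, timedelta, timezone


def seconds_until_expiry(expires_at: datetime, now: datetime) -> float:
    """Return the remaining lifetime in seconds, never below zero."""
    return max(0.0, (expires_at - now).total_seconds())


def needs_refresh(expires_at: datetime, now: datetime, margin_seconds: float = 30.0) -> bool:
    """True once the key is within the safety margin of expiring."""
    return seconds_until_expiry(expires_at, now) <= margin_seconds


# Fixed timestamps keep the example deterministic.
issued = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
expires_at = issued + timedelta(seconds=60)

print(seconds_until_expiry(expires_at, issued))
print(needs_refresh(expires_at, issued + timedelta(seconds=45)))
```

In real use, `now` would typically be `datetime.now(timezone.utc)`; both values must be timezone-aware for the subtraction to work.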

CreateTranscriptionPayload

Payload sent to create an asynchronous transcription job.

Properties

Property (Type): Description
model (str): Speech-to-text model to use.
audio_url (str | None): URL of a publicly accessible audio file.
file_id (str | None): ID of a previously uploaded file (UUID).
language_hints (list[str] | None): Array of expected ISO language codes to bias recognition.
language_hints_strict (bool | None): When true, model relies more heavily on language hints (best results with one language hint set).
enable_speaker_diarization (bool | None): Enable speaker diarization to identify different speakers.
enable_language_identification (bool | None): Enable automatic language identification.
translation (TranslationConfig | None): Translation configuration.
context (StructuredContext | None): Additional context to improve transcription accuracy and formatting of specialized terms.
webhook_url (str | None): URL to receive webhook notifications when transcription is completed or fails.
webhook_auth_header_name (str | None): Name of the authentication header sent with webhook notifications.
webhook_auth_header_value (str | None): Authentication header value sent with webhook notifications.
client_reference_id (str | None): Optional tracking identifier.
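
Most of these fields are optional, so a request body usually contains only the ones that were set. The sketch below assembles such a body as a plain dictionary; `build_transcription_body` is a hypothetical helper, the model name is a placeholder, and dropping unset fields (rather than sending nulls) is an assumption rather than documented SDK behavior.

```python
from typing import Any, Optional


def build_transcription_body(
    model: str,
    audio_url: Optional[str] = None,
    file_id: Optional[str] = None,
    **options: Any,
) -> dict[str, Any]:
    """Assemble a request body, leaving out fields that were not set."""
    body = {"model": model, "audio_url": audio_url, "file_id": file_id, **options}
    return {key: value for key, value in body.items() if value is not None}


body = build_transcription_body(
    "example-model",  # placeholder, not a real model id
    audio_url="https://example.com/audio.mp3",
    language_hints=["en", "es"],
    enable_speaker_diarization=True,
    webhook_url=None,  # unset, so it is dropped from the body
)
print(body)
```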

CreateTranscriptionConfig

Helper config used when building transcription payloads.

Properties

Property (Type): Description
model (str | None): Speech-to-text model to use.
language_hints (list[str] | None): Array of expected ISO language codes to bias recognition.
language_hints_strict (bool | None): When true, model relies more heavily on language hints.
enable_speaker_diarization (bool | None): Enable speaker diarization to identify different speakers.
enable_language_identification (bool | None): Enable automatic language identification.
translation (TranslationConfig | None): Translation configuration.
context (StructuredContext | None): Additional context to improve transcription accuracy and formatting of specialized terms.
webhook_url (str | None): URL to receive webhook notifications when transcription is completed or fails.
webhook_auth_header_name (str | None): Name of the authentication header sent with webhook notifications.
webhook_auth_header_value (str | None): Authentication header value sent with webhook notifications.
client_reference_id (str | None): Optional tracking identifier.

File

Metadata describing an uploaded file in the Soniox API.

Properties

Property (Type): Description
id (str): Unique identifier of the file (UUID).
filename (str): Name of the file.
size (int): Size of the file in bytes.
created_at (datetime): UTC timestamp indicating when the file was uploaded.
client_reference_id (str | None): Optional tracking identifier string.

GetFilesPayload

Parameters accepted by the file listing endpoint.

Properties

Property (Type): Description
limit (int): Maximum number of files to return.
cursor (str | None): Pagination cursor for the next page of results.

GetFilesResponse

Paginated response returned when listing uploaded files.

Properties

Property (Type): Description
files (list[File]): List of uploaded files.
next_page_cursor (str | None): A pagination token that references the next page of results. When None, no additional results are available.
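
The `limit`, `cursor`, and `next_page_cursor` fields describe a standard cursor-pagination loop: request a page, pass the returned cursor to the next request, and stop when the cursor is None. The sketch below shows that loop against an in-memory fake; `iter_files` and `fake_fetch_page` are hypothetical names, and the fake stands in for whatever call actually hits the file listing endpoint.

```python
from typing import Callable, Iterator, Optional

# A page is modeled as a dict shaped like GetFilesResponse.
FetchPage = Callable[[int, Optional[str]], dict]


def iter_files(fetch_page: FetchPage, limit: int = 2) -> Iterator[dict]:
    """Yield every file by following next_page_cursor until it is None."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(limit, cursor)
        yield from page["files"]
        cursor = page.get("next_page_cursor")
        if cursor is None:
            break


# In-memory fake of a listing endpoint, for illustration only.
_FAKE_FILES = [{"id": f"file-{i}"} for i in range(5)]


def fake_fetch_page(limit: int, cursor: Optional[str]) -> dict:
    start = int(cursor) if cursor else 0
    end = start + limit
    next_cursor = str(end) if end < len(_FAKE_FILES) else None
    return {"files": _FAKE_FILES[start:end], "next_page_cursor": next_cursor}


print([f["id"] for f in iter_files(fake_fetch_page)])
```

The same loop shape applies to transcription listings, which expose the same `limit`, `cursor`, and `next_page_cursor` fields.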

GetModelsResponse

Response returned when listing available models.

Properties

Property (Type): Description
models (list[Model]): List of all available models.

GetTranscriptionsPayload

Parameters for listing transcription jobs.

Properties

Property (Type): Description
limit (int): Maximum number of transcriptions to return.
cursor (str | None): Pagination cursor for the next page of results.

GetTranscriptionsResponse

Paginated response for transcription listings.

Properties

Property (Type): Description
transcriptions (list[Transcription]): List of transcriptions.
next_page_cursor (str | None): A pagination token that references the next page of results. When None, no additional results are available.

Model

Describes a Soniox transcription model.

Properties

Property (Type): Description
id (str): Unique identifier of the model.
aliased_model_id (str | None): If this is an alias, the id of the aliased model. None for non-alias models.
name (str): Name of the model.
context_version (int | None): Version of context supported.
transcription_mode (TranscriptionMode): Transcription mode of the model.
languages (list[Language]): List of languages supported by the model.
supports_language_hints_strict (bool): Whether the model supports the 'language_hints_strict' option.
translation_targets (list[TranslationTarget]): List of supported one-way translation targets. If the list is empty, check the one_way_translation field.
two_way_translation_pairs (list[str]): List of supported two-way translation pairs. If the list is empty, check the two_way_translation field.
one_way_translation (str | None): When set to 'all_languages', any language from languages can be used.
two_way_translation (str | None): When set to 'all_languages', any language pair from languages can be used.
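
The fallback between the explicit target list and the 'all_languages' flag can be sketched as a small lookup. This is an illustration only: the model is represented as a plain dictionary, `supports_one_way_target` is a hypothetical helper, and `language_codes` is an assumed flat list of codes standing in for `languages`, whose `Language` type is not described in this section.

```python
def supports_one_way_target(model: dict, target: str) -> bool:
    """Check whether a model can translate one-way into the target language."""
    targets = model.get("translation_targets") or []
    if targets:
        # An explicit list of targets takes precedence.
        return any(entry["target_language"] == target for entry in targets)
    if model.get("one_way_translation") == "all_languages":
        # Empty list plus the flag: any supported language is allowed.
        return target in model.get("language_codes", [])
    return False


explicit_model = {
    "translation_targets": [
        {"target_language": "fr", "source_languages": ["en"], "exclude_source_languages": []}
    ],
    "one_way_translation": None,
    "language_codes": ["en", "fr", "de"],  # assumed simplification of `languages`
}
open_model = {
    "translation_targets": [],
    "one_way_translation": "all_languages",
    "language_codes": ["en", "fr", "de"],
}

print(supports_one_way_target(explicit_model, "fr"))
print(supports_one_way_target(open_model, "de"))
```

The sketch checks only the target side; a fuller check would presumably also consult `source_languages` and `exclude_source_languages` for the source language.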

StructuredContext

Optional structured context provided to the transcription engine.

Properties

Property (Type): Description
general (list[StructuredContextGeneralItem] | None): Structured key-value pairs describing domain, topic, intent, participant names, etc.
text (str | None): Longer free-form background text, prior interaction history, reference documents, or meeting notes.
terms (list[str] | None): Domain-specific or uncommon words to recognize.
translation_terms (list[StructuredContextTranslationTerm] | None): Custom translations for ambiguous terms.

StructuredContextGeneralItem

Single general context key/value pair for transcription context.

Properties

Property (Type): Description
key (str): The key describing the context type (e.g., "domain", "topic", "doctor").
value (str): The value for the context key.

StructuredContextTranslationTerm

Defines a translation term mapping used in structured context.

Properties

Property (Type): Description
source (str): The source term to translate.
target (str): The target translation for the term.
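
Put together, the three context types nest as follows. The example below writes a structured context as a plain dictionary using the field names documented above; every value is invented for illustration, and the SDK's own classes may be constructed differently.

```python
import json

# Illustrative values only; field names follow StructuredContext,
# StructuredContextGeneralItem, and StructuredContextTranslationTerm.
context = {
    "general": [
        {"key": "domain", "value": "Healthcare"},
        {"key": "topic", "value": "Diabetes management consultation"},
    ],
    "text": "Follow-up visit notes from the previous appointment.",
    "terms": ["metformin", "HbA1c"],
    "translation_terms": [
        {"source": "MRI", "target": "RM"},
    ],
}

# Round-trip through JSON to confirm the structure is serializable.
encoded = json.dumps(context)
decoded = json.loads(encoded)
print(decoded["general"][0])
```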

Transcription

Represents a transcription job tracked by Soniox.

Properties

Property (Type): Description
id (str): Unique identifier of the transcription (UUID).
status (TranscriptionStatus): Current status of the transcription.
created_at (datetime): UTC timestamp when the transcription was created.
model (str): Speech-to-text model used.
audio_url (str | None): URL of the audio file being transcribed.
file_id (str | None): ID of the uploaded file being transcribed (UUID).
filename (str): Name of the file being transcribed.
language_hints (list[str] | None): Expected languages in the audio. If not specified, languages are automatically detected.
enable_speaker_diarization (bool): When true, speakers are identified and separated in the transcription output.
enable_language_identification (bool): When true, language is detected for each part of the transcription.
audio_duration_ms (int | None): Duration of the audio in milliseconds. Only available after processing begins.
error_type (str | None): Error type if transcription failed. None for successful or in-progress transcriptions.
error_message (str | None): Error message if transcription failed. None for successful or in-progress transcriptions.
webhook_url (str | None): URL to receive webhook notifications when transcription is completed or fails.
webhook_auth_header_name (str | None): Name of the authentication header sent with webhook notifications.
webhook_auth_header_value (str | None): Authentication header value. Always returned masked.
webhook_status_code (int | None): HTTP status code received from your server when webhook was delivered. None if not yet sent.
client_reference_id (str | None): Optional tracking identifier.

TranscriptionStatus

TranscriptionStatus = Literal["queued", "processing", "completed", "error"]

Current status of the transcription job.
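
When polling a job, the useful question is usually whether the status can still change. The sketch below treats "completed" and "error" as terminal, which is an assumption based on the status names rather than something stated here; `is_terminal` is a hypothetical helper.

```python
from typing import Literal, get_args

TranscriptionStatus = Literal["queued", "processing", "completed", "error"]

# Assumed terminal states: a job in either one is not expected to change again.
TERMINAL_STATUSES = frozenset({"completed", "error"})


def is_terminal(status: str) -> bool:
    """Return True when polling can stop; reject values outside the Literal."""
    if status not in get_args(TranscriptionStatus):
        raise ValueError(f"unexpected status: {status!r}")
    return status in TERMINAL_STATUSES


for status in ("queued", "processing", "completed", "error"):
    print(status, is_terminal(status))
```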


TranscriptionTranscript

Transcript data including the full text and tokens.

Properties

Property (Type): Description
id (str): Unique identifier of the transcription this transcript belongs to (UUID).
text (str): Complete transcribed text content.
tokens (list[Token]): List of detailed token information with timestamps and metadata.

TranslationConfig

Configuration describing how translation should be performed.

Properties

Property (Type): Description
type (TranslationType): Translation type.
target_language (str | None): Target language code for translation (e.g., "fr", "es", "de"). Used with one_way.
language_a (str | None): First language code. Used with two_way.
language_b (str | None): Second language code. Used with two_way.
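
Since the language fields apply to different translation types, a configuration can be checked before it is sent. The sketch below is one possible client-side check on a plain dictionary; `validate_translation` is a hypothetical helper, and treating the matching fields as required is an assumption inferred from the descriptions above, not a documented rule.

```python
def validate_translation(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the config looks consistent."""
    problems: list[str] = []
    kind = config.get("type")
    if kind == "one_way":
        if not config.get("target_language"):
            problems.append("one_way requires target_language")
    elif kind == "two_way":
        for key in ("language_a", "language_b"):
            if not config.get(key):
                problems.append(f"two_way requires {key}")
    else:
        problems.append(f"unknown type: {kind!r}")
    return problems


print(validate_translation({"type": "one_way", "target_language": "fr"}))
print(validate_translation({"type": "two_way", "language_a": "en"}))
```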

TranslationTarget

Describes translation targets offered by a model.

Properties

Property (Type): Description
target_language (str): Target language code for translation (e.g., "fr", "es", "de") (one_way).
source_languages (list[str]): List of source language codes.
exclude_source_languages (list[str]): Source language codes excluded for this target.

TranslationType

TranslationType = Literal["one_way", "two_way"]

Supported translation configuration types.


TemporaryApiKeyUsageType

TemporaryApiKeyUsageType = Literal["transcribe_websocket"]

Intended usage for temporary API keys.


UploadFilePayload

Optional metadata supplied at upload time.

Properties

Property (Type): Description
client_reference_id (str | None): Optional tracking identifier string. Does not need to be unique.

RealtimeEvent

Event payload received from the realtime STT websocket.

Properties

Property (Type): Description
tokens (list[Token]): Tokens in this result.
final_audio_proc_ms (int | None): Milliseconds of audio that have been finalized.
total_audio_proc_ms (int | None): Total milliseconds of audio processed.
finished (bool): Whether this is the final result (session ending).
error_code (int | None): Error code if the realtime operation failed.
error_message (str | None): Human-readable description of the error.

validate_event()

validate_event(raw: str | bytes) -> RealtimeEvent

Parameters

Parameter (Type): Description
raw (str | bytes): Raw event payload from the realtime API.

Returns

RealtimeEvent
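
To make the `str | bytes` input concrete, the sketch below parses a raw payload into a simplified event object. It is not the SDK's `validate_event` implementation: `EventSketch` and `parse_event` are hypothetical, and the sketch assumes the payload is JSON whose keys match the property names listed for RealtimeEvent.

```python
import json
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class EventSketch:
    """Hypothetical stand-in mirroring a subset of the RealtimeEvent fields."""
    tokens: list[dict] = field(default_factory=list)
    finished: bool = False
    error_code: Optional[int] = None
    error_message: Optional[str] = None


def parse_event(raw: Union[str, bytes]) -> EventSketch:
    """Decode a raw payload, assumed to be JSON, into an EventSketch."""
    data = json.loads(raw)  # json.loads accepts both str and bytes
    return EventSketch(
        tokens=data.get("tokens", []),
        finished=bool(data.get("finished", False)),
        error_code=data.get("error_code"),
        error_message=data.get("error_message"),
    )


event = parse_event(b'{"tokens": [{"text": "Hi", "is_final": true}], "finished": false}')
print(event.tokens[0]["text"], event.finished)
```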


RealtimeSTTConfig

Configuration for initiating a realtime transcription session.

Properties

Property (Type): Description
api_key (str | None): API key for real-time sessions.
model (str): Speech-to-text model to use.
audio_format (str): Audio format. Use 'auto' for automatic detection of container formats.
num_channels (int | None): Number of audio channels (required for raw audio formats).
sample_rate (int | None): Sample rate in Hz (required for PCM formats).
language_hints (list[str] | None): Expected languages in the audio (ISO language codes).
language_hints_strict (bool | None): When true, recognition is strongly biased toward language hints (best results when using one language in language_hints).
context (StructuredContext | None): Additional context to improve transcription accuracy.
enable_speaker_diarization (bool | None): Enable speaker identification.
enable_language_identification (bool | None): Enable automatic language detection.
enable_endpoint_detection (bool | None): Enable endpoint detection for utterance boundaries.
translation (TranslationConfig | None): Translation configuration.
client_reference_id (str | None): Optional tracking identifier (max 256 chars).

build_payload()

build_payload(api_key: str) -> RealtimeSTTConfig

Parameters

Parameter (Type): Description
api_key (str): API key used for authentication.

Returns

RealtimeSTTConfig


Headers

Headers = Mapping[str, str]

WebhookAuthConfig

Configuration for webhook authentication headers.

Properties

Property (Type): Description
name (str): Expected header name (case-insensitive comparison).
value (str): Expected header value (exact match).
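
The two comparison rules (case-insensitive name, exact value) can be sketched as a small check over incoming request headers. `is_authorized` is a hypothetical helper and the header name and secret are invented; the constant-time comparison via `hmac.compare_digest` is a design choice of this sketch, not something the SDK is documented to do.

```python
import hmac
from typing import Mapping


def is_authorized(headers: Mapping[str, str], name: str, value: str) -> bool:
    """Match the header name case-insensitively and its value exactly."""
    wanted = name.lower()
    for header_name, header_value in headers.items():
        if header_name.lower() == wanted:
            # Compare as bytes in constant time to avoid leaking timing information.
            return hmac.compare_digest(header_value.encode(), value.encode())
    return False


incoming = {"X-Webhook-Secret": "s3cret", "Content-Type": "application/json"}
print(is_authorized(incoming, "x-webhook-secret", "s3cret"))
print(is_authorized(incoming, "x-webhook-secret", "S3CRET"))
```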

WebhookEvent

Basic webhook event metadata.

Properties

Property (Type): Description
id (str): Transcription ID (UUID).
status (Literal['completed', 'error']): Transcription result status.