Timestamps
Learn how to use timestamps and understand their granularity.
Overview
Soniox Speech-to-Text AI provides precise timestamps for each recognized token (word or sub-word) in your transcription. These timestamps allow you to align text with the original audio for use cases like subtitles, audio indexing, keyword search, and real-time captioning.
Timestamps are returned by default — no configuration is required — and are supported in both asynchronous and real-time processing.
Output format
Each token in the response includes:
text: The recognized word or sub-word token
start_ms: The start time of the token in milliseconds
end_ms: The end time of the token in milliseconds
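The three fields above map naturally onto a small record type. The following is a minimal sketch, not part of any Soniox SDK: the `Token` class, its `from_dict` helper, and the assumption that both timestamps arrive as integer-compatible values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical container for one token; field names follow the list above."""

    text: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        # Length of the token in the audio, in milliseconds.
        return self.end_ms - self.start_ms

    @classmethod
    def from_dict(cls, raw: dict) -> "Token":
        # Assumes the raw token is a dict with exactly these three keys.
        return cls(
            text=raw["text"],
            start_ms=int(raw["start_ms"]),
            end_ms=int(raw["end_ms"]),
        )


# Made-up values, for illustration only.
token = Token.from_dict({"text": "ful", "start_ms": 540, "end_ms": 780})
print(token.text, token.duration_ms)
```

Keeping the record immutable (`frozen=True`) is a design choice for this sketch; it avoids accidental edits to timing data once a token has been parsed.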
Example response
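As an illustrative sketch of the response shape, written as a Python literal: the top-level `tokens` key and every timing value below are assumptions chosen for illustration, not output captured from the service. The hypothetical `span_of` helper shows how the time range of a whole word can be recovered from its sub-word tokens.

```python
# Illustrative only: the "tokens" key and all timing values are assumptions.
example_response = {
    "tokens": [
        {"text": "Beau", "start_ms": 300, "end_ms": 420},
        {"text": "ti", "start_ms": 420, "end_ms": 540},
        {"text": "ful", "start_ms": 540, "end_ms": 780},
    ]
}


def span_of(tokens):
    """Join token texts and return (text, start_ms, end_ms) for the group.

    The group starts where its first token starts and ends where its
    last token ends.
    """
    text = "".join(t["text"] for t in tokens)
    return text, tokens[0]["start_ms"], tokens[-1]["end_ms"]


word, start_ms, end_ms = span_of(example_response["tokens"])
print(f"{word}: {start_ms} ms to {end_ms} ms")
```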
In this example, the word "beautiful" is split into three tokens with corresponding timestamp ranges.
Use cases
| Use case | Description |
|---|---|
| Subtitles & captions | Sync spoken words with video playback. |
| Audio editing | Locate and extract segments of interest. |
| Keyword spotting | Jump to where specific words are spoken. |
| Visualization | Build real-time transcript viewers with time markers. |
| Live captioning | Stream partial results with timing for broadcast or accessibility tools. |
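Two of these use cases, subtitles and keyword spotting, can be sketched with a few lines of code. The helpers below (`to_srt_time`, `srt_cue`, `find_keyword`) are hypothetical and not part of any Soniox SDK. The sample tokens use made-up timings and assume that spaces between words are carried inside the token text. The subtitle output follows the common SubRip (SRT) cue layout.

```python
def to_srt_time(ms: int) -> str:
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def srt_cue(index: int, tokens: list) -> str:
    """Build one SRT cue spanning the first to the last token."""
    text = "".join(t["text"] for t in tokens).strip()
    start = to_srt_time(tokens[0]["start_ms"])
    end = to_srt_time(tokens[-1]["end_ms"])
    return f"{index}\n{start} --> {end}\n{text}\n"


def find_keyword(tokens: list, keyword: str):
    """Return (start_ms, end_ms) of the first case-insensitive match, or None.

    The match is located in the concatenated text, then mapped back to the
    tokens that cover it, so keywords split across sub-word tokens are found.
    """
    needle = keyword.lower()
    full = "".join(t["text"] for t in tokens).lower()
    pos = full.find(needle)
    if pos < 0:
        return None
    end_pos = pos + len(needle)
    offset = 0
    start_ms = None
    for t in tokens:
        tok_end = offset + len(t["text"])
        if start_ms is None and tok_end > pos:
            start_ms = t["start_ms"]  # first token overlapping the match
        if tok_end >= end_pos:
            return start_ms, t["end_ms"]  # last token overlapping the match
        offset = tok_end
    return None


# Made-up tokens for illustration; leading spaces in text are an assumption.
tokens = [
    {"text": "What", "start_ms": 0, "end_ms": 200},
    {"text": " a", "start_ms": 200, "end_ms": 300},
    {"text": " beau", "start_ms": 300, "end_ms": 420},
    {"text": "ti", "start_ms": 420, "end_ms": 540},
    {"text": "ful", "start_ms": 540, "end_ms": 780},
    {"text": " day", "start_ms": 780, "end_ms": 1000},
]

print(srt_cue(1, tokens))
print(find_keyword(tokens, "beautiful"))
```

Because a match is reported at token granularity, the returned range can be slightly wider than the keyword itself when a token also carries a leading space or neighboring characters.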