Timestamps
Learn how to use timestamps and understand their granularity.
Overview
Soniox Speech-to-Text AI provides precise timestamps for every recognized token (word or sub-word). Timestamps let you align transcriptions with audio, so you know exactly when each word was spoken.
Timestamps are included by default; no extra configuration is needed.
Output format
Each token in the response includes:
text → The recognized token.
start_ms → Token start time (in milliseconds).
end_ms → Token end time (in milliseconds).
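Because timestamps are attached to tokens rather than whole words, a client often needs to combine sub-word tokens to get word-level timing. The following Python sketch shows one way to do that using only the three fields listed above. The helper name merge_tokens and the sample values are hypothetical, and the rule that a leading space marks the start of a new word is an assumption for illustration, not documented behavior; adjust it to match the tokens you actually receive.

```python
from typing import TypedDict


class Token(TypedDict):
    text: str
    start_ms: int
    end_ms: int


def merge_tokens(tokens: list[Token]) -> list[Token]:
    """Merge consecutive sub-word tokens into word-level entries.

    Assumption (not confirmed by the docs): a token whose text begins
    with a space starts a new word; any other token continues the
    previous word.
    """
    words: list[Token] = []
    for tok in tokens:
        if not words or tok["text"].startswith(" "):
            # Start a new word and copy this token's time range.
            words.append(
                {
                    "text": tok["text"].strip(),
                    "start_ms": tok["start_ms"],
                    "end_ms": tok["end_ms"],
                }
            )
        else:
            # Continue the current word: append text, extend the end time.
            words[-1]["text"] += tok["text"]
            words[-1]["end_ms"] = tok["end_ms"]
    return words


# Illustrative tokens only; the timing values are made up.
sample_tokens: list[Token] = [
    {"text": "Beau", "start_ms": 300, "end_ms": 420},
    {"text": "ti", "start_ms": 420, "end_ms": 540},
    {"text": "ful", "start_ms": 540, "end_ms": 780},
    {"text": " day", "start_ms": 840, "end_ms": 1020},
]

for word in merge_tokens(sample_tokens):
    print(f'{word["text"]}: {word["start_ms"]}-{word["end_ms"]} ms')
```

Each merged word keeps the start_ms of its first token and the end_ms of its last token, so the word-level range spans every sub-word token it contains.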
Example response
In this example, the word “Beautiful” is split into three tokens, each with its own timestamp range: