Final vs non-final tokens
This page explains the differences between final and non-final tokens and how to include non-final tokens in transcription results.
The distinction between final and non-final tokens is relevant only when using stream transcription in low-latency mode, that is, when the include_nonfinal field of TranscriptionConfig is set to true. Otherwise, all tokens are final.
Non-final tokens are tokens that are recognized immediately as audio is being transcribed. They may change later, once more audio has been transcribed and more context is available.
Final tokens are tokens that will not change in the future.
In each received result, non-final tokens always follow any final tokens.
Typically, when a token is first recognized, it is returned as non-final, and it may be returned again as non-final a number of times. After a certain period, the token is returned as final. However, a token returned as non-final may later be returned as a different non-final token, or it may disappear altogether. Users should not make any assumptions about how the non-final tokens in one result relate to those in subsequent results.
Latest transcript
The latest transcript of the stream from the start is obtained by joining, in order:
- final tokens from all results received so far,
- non-final tokens from the last received result.
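The joining rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Token class, its text and is_final fields, and the idea that each token's text carries its own leading whitespace are all hypothetical stand-ins, not the library's actual result types.

```python
from dataclasses import dataclass


@dataclass
class Token:
    # Hypothetical token shape used only for this sketch.
    text: str
    is_final: bool


def latest_transcript(results: list[list[Token]]) -> str:
    """Join final tokens from all results with non-final tokens from the last one."""
    if not results:
        return ""
    # Final tokens never change, so they are collected from every result.
    final_parts = [t.text for tokens in results for t in tokens if t.is_final]
    # Non-final tokens are only meaningful for the most recent result.
    nonfinal_parts = [t.text for t in results[-1] if not t.is_final]
    return "".join(final_parts + nonfinal_parts)


# Simulated sequence of received results (assumed data, not real output).
results = [
    [Token("Hel", False)],
    [Token("Hello", True), Token(" wor", False)],
    [Token(" world", True), Token("!", False)],
]
print(latest_transcript(results))  # Hello world!
```

Note that the non-final tokens of earlier results ("Hel", " wor") are discarded entirely; only the final tokens accumulate.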
Here is an example of how this works:
Example
In the example below, we will include both final and non-final tokens when we transcribe a stream with low-latency mode enabled.
First, to transcribe a stream, we define a generator over the audio chunks from the stream. In this example,
we simulate the stream by reading audio chunks from a file. We then pass this generator to transcribe_stream()
which returns the transcription results as soon as they become available.
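A generator over audio chunks of the kind described above might look like the following sketch. The function names iter_audio_chunks and iter_audio_file and the 4096-byte chunk size are assumptions made for illustration. The call to transcribe_stream() is deliberately left out, because its exact arguments are not given on this page.

```python
import io
from typing import BinaryIO, Iterator


def iter_audio_chunks(stream: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield successive chunks of raw bytes until the stream is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def iter_audio_file(path: str, chunk_size: int = 4096) -> Iterator[bytes]:
    """Simulate a live stream by reading an audio file chunk by chunk."""
    with open(path, "rb") as f:
        yield from iter_audio_chunks(f, chunk_size)


# Demonstration with an in-memory buffer standing in for an audio file.
chunks = list(iter_audio_chunks(io.BytesIO(b"\x00" * 10000), chunk_size=4096))
print([len(c) for c in chunks])  # [4096, 4096, 1808]
```

Because the generator is lazy, chunks are read only as the consumer asks for them, which is what makes a file a reasonable stand-in for a live source.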
On each received result, we split the recognized tokens into final and non-final tokens. We then add the new final tokens to the list of all final tokens, and print out all final tokens and current non-final tokens.
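The per-result processing described in this paragraph could be sketched as follows. This is not the page's original example: the Token class, its fields, and the simulated results are hypothetical, and in real use the results would come from transcribe_stream() rather than from a hard-coded list.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Token:
    # Hypothetical token shape used only for this sketch.
    text: str
    is_final: bool


def process_results(results: Iterable[list[Token]]) -> Iterator[str]:
    """For each result, accumulate final tokens and report the current state."""
    all_final: list[Token] = []
    for tokens in results:
        # Split the recognized tokens into final and non-final tokens.
        final = [t for t in tokens if t.is_final]
        nonfinal = [t for t in tokens if not t.is_final]
        # Final tokens are kept forever; non-final tokens are replaced each time.
        all_final.extend(final)
        final_text = "".join(t.text for t in all_final)
        nonfinal_text = "".join(t.text for t in nonfinal)
        yield f"final={final_text!r} nonfinal={nonfinal_text!r}"


# Simulated results (assumed data standing in for a real stream).
simulated = [
    [Token("How", False)],
    [Token("How", True), Token(" are", False)],
    [Token(" are", True), Token(" you", False)],
    [Token(" you", True)],
]
lines = list(process_results(simulated))
for line in lines:
    print(line)
```

Each printed line shows the stable part of the transcript alongside the part that may still change in the next result.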
Run
Output