Final vs non-final tokens

This page explains the differences between final and non-final tokens and how to include non-final tokens in transcription results.

This API is deprecated and is being phased out. Please switch to our new multilingual Speech-to-Text API.

The distinction between final and non-final tokens is relevant only when using stream transcription in low-latency mode, which is when the include_nonfinal TranscriptionConfig field is set to true. Otherwise, all tokens are final.

Non-final tokens are tokens that are recognized immediately as audio is being transcribed. A non-final token may change once more audio has been transcribed and more context is available.

Final tokens are tokens that will not change in the future.

In each received result, non-final tokens always follow any final tokens.
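Because of this ordering guarantee, a result can be split at the first non-final token. The following is a minimal sketch of that idea, not SDK code: `Token` is a hypothetical stand-in for the tokens in a received result, and `split_result` is an illustrative helper. Only the `text` and `is_final` field names mirror those used in the example on this page.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Token:
    # Hypothetical stand-in for a token in a received result (not the SDK type).
    text: str
    is_final: bool


def split_result(tokens: List[Token]) -> Tuple[List[Token], List[Token]]:
    # Index of the first non-final token; if there is none, all tokens are final.
    split_at = next(
        (i for i, tok in enumerate(tokens) if not tok.is_final), len(tokens)
    )
    final, nonfinal = tokens[:split_at], tokens[split_at:]
    # Check the ordering guarantee: no final token follows a non-final one.
    assert not any(tok.is_final for tok in nonfinal)
    return final, nonfinal


result = [Token("a", True), Token("b", False), Token("c", False)]
final, nonfinal = split_result(result)
print([t.text for t in final], [t.text for t in nonfinal])
```

Filtering on `is_final` token by token, as the full example on this page does, gives the same split; the slice-based version simply makes the ordering assumption explicit.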

Typically, when a token is first recognized, it is returned as non-final, and it may be returned again as non-final a number of times. After a certain period, the token is returned as final. However, a token returned as non-final may later be returned as a different non-final token, or it may disappear entirely. Do not assume any relationship between the non-final tokens in one received result and those in subsequent results.

Latest transcript

The latest transcript of the stream from the start is obtained by joining, in order:

final tokens from all results received so far,
non-final tokens from the last received result.

Here is an example of how this works:

1. Received result: final tokens: [], nonfinal tokens: [a, b]
All final tokens: []
Latest transcript: [a, b]
 
2. Received result: final tokens: [a], nonfinal tokens: [b, c]
All final tokens: [a]
Latest transcript: [a, b, c]
 
3. Received result: final tokens: [], nonfinal tokens: []
All final tokens: [a]
Latest transcript: [a]
 
4. Received result: final tokens: [], nonfinal tokens: [d, e]
All final tokens: [a]
Latest transcript: [a, d, e]
 
5. Received result: final tokens: [d, f], nonfinal tokens: [g, h, i]
All final tokens: [a, d, f]
Latest transcript: [a, d, f, g, h, i]
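The five steps above can be replayed in a few lines of Python. This is a minimal sketch that models each received result as a pair of plain string lists; the `Result` alias, the `RESULTS` data, and the `latest_transcripts` helper are illustrative names, not part of the SDK.

```python
from typing import List, Tuple

# Each received result is modeled as (final tokens, non-final tokens).
Result = Tuple[List[str], List[str]]

RESULTS: List[Result] = [
    ([], ["a", "b"]),
    (["a"], ["b", "c"]),
    ([], []),
    ([], ["d", "e"]),
    (["d", "f"], ["g", "h", "i"]),
]


def latest_transcripts(results: List[Result]) -> List[List[str]]:
    all_final: List[str] = []
    transcripts: List[List[str]] = []
    for final, nonfinal in results:
        # Final tokens accumulate across all results received so far.
        all_final.extend(final)
        # Only the non-final tokens of the current result are used;
        # non-final tokens from earlier results are discarded.
        transcripts.append(all_final + nonfinal)
    return transcripts


for step, transcript in enumerate(latest_transcripts(RESULTS), start=1):
    print(step, transcript)
```

Note that the non-final tokens are never merged or diffed against the previous result. They are replaced wholesale on every result, which is why `b` and `c` vanish at step 3 and `e` never becomes final.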

Example

In the example below, we will include both final and non-final tokens when we transcribe a stream with low-latency mode enabled.

final_nonfinal_tokens.py

from typing import Iterable
from soniox.transcribe_live import transcribe_stream
from soniox.speech_service import SpeechClient
 
 
def iter_audio() -> Iterable[bytes]:
    # This function should yield audio bytes from your stream.
    # Here we simulate the stream by reading a file in small chunks.
    with open("../test_data/test_audio_long.flac", "rb") as fh:
        while True:
            audio = fh.read(1024)
            if len(audio) == 0:
                break
            yield audio
 
 
# Do not forget to set your API key in the SONIOX_API_KEY environment variable.
def main():
    with SpeechClient() as client:
        # All final tokens will be collected in this list.
        all_final_tokens = []
 
        for result in transcribe_stream(
            iter_audio(),
            client,
            model="en_v2_lowlatency",
            include_nonfinal=True,
        ):
            # Split current result response into final and non-final tokens.
            final_tokens = []
            nonfinal_tokens = []
            for word in result.words:
                if word.is_final:
                    final_tokens.append(word.text)
                else:
                    nonfinal_tokens.append(word.text)
 
            # Append current final tokens to all final tokens.
            all_final_tokens += final_tokens
 
            # Print all final tokens and current non-final tokens.
            all_final_tokens_str = "".join(all_final_tokens)
            nonfinal_tokens_str = "".join(nonfinal_tokens)
            print(f"Final: {all_final_tokens_str}")
            print(f"Non-final: {nonfinal_tokens_str}")
            print("-----")
 
 
if __name__ == "__main__":
    main()

First, to transcribe a stream, we define a generator over the audio chunks from the stream. In this example, we simulate the stream by reading audio chunks from a file. We then pass this generator to transcribe_stream(), which returns the transcription results as soon as they become available.

On each received result, we split the recognized tokens into final and non-final tokens. We then add the new final tokens to the list of all final tokens, and print out all final tokens and current non-final tokens.

Run

Terminal

python3 final_nonfinal_tokens.py

Output

-----
Final:
Non-final: But
-----
Final:
Non-final: But there
-----
Final:
Non-final: But there is
-----
Final:
Non-final: But there is always
-----
...
-----
Final:
Non-final: But there is always a stronger sense of life when the sun is brilliant after rain
-----
Final: But there is always
Non-final: a stronger sense of life when the sun is brilliant after rain
-----
Final: But there is always a stronger sense
Non-final: of life when the sun is brilliant after rain. And now he
-----
