Final vs Non-Final Tokens

The distinction between final and non-final tokens matters only for stream transcription in low-latency mode, that is, when the include_nonfinal field of TranscriptionConfig is set to true. Otherwise, all tokens are final.

Non-final tokens are recognized immediately, while the audio is still being transcribed. They may change later, once more audio has been transcribed and more context is available.

Final tokens are tokens that will not change in the future.

In each received result, non-final tokens always follow any final tokens.

Typically, a token is first returned as non-final, may be returned as non-final several more times, and is eventually returned as final. However, a non-final token may also be replaced by a different non-final token in a later result, or disappear entirely. Do not assume any relationship between the non-final tokens of successive results.
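The per-result ordering rule can be sketched in Python. This is a minimal sketch that assumes each token exposes text and is_final fields, as the words in the examples below do; the Token class and the split_tokens helper are hypothetical and not part of the Soniox SDK. Because non-final tokens always follow any final tokens, a final token appearing after a non-final one is treated as an error.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Token:
    # Hypothetical stand-in for a recognized token; mirrors the
    # text / is_final fields used in the examples.
    text: str
    is_final: bool


def split_tokens(tokens: Sequence[Token]) -> Tuple[List[str], List[str]]:
    """Split one result into (final, non-final) token texts.

    Relies on the ordering rule: non-final tokens always follow any
    final tokens, so a final token after a non-final one is rejected.
    """
    final: List[str] = []
    nonfinal: List[str] = []
    for token in tokens:
        if token.is_final:
            if nonfinal:
                raise ValueError("final token found after a non-final token")
            final.append(token.text)
        else:
            nonfinal.append(token.text)
    return final, nonfinal


final, nonfinal = split_tokens(
    [Token("But", True), Token(" there", True), Token(" is", False)]
)
print(final, nonfinal)  # ['But', ' there'] [' is']
```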

Latest Transcript

The latest transcript of the stream from the start is obtained by joining, in order:

final tokens from all results received so far,
non-final tokens from the last received result.

Here is an example of how this works:

Received result: final tokens: [], non-final tokens: [a, b]
All final tokens: []
Latest transcript: [a, b]

Received result: final tokens: [a], non-final tokens: [b, c]
All final tokens: [a]
Latest transcript: [a, b, c]

Received result: final tokens: [], non-final tokens: []
All final tokens: [a]
Latest transcript: [a]

Received result: final tokens: [], non-final tokens: [d, e]
All final tokens: [a]
Latest transcript: [a, d, e]

Received result: final tokens: [d, f], non-final tokens: [g, h, i]
All final tokens: [a, d, f]
Latest transcript: [a, d, f, g, h, i]
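The walkthrough above can be replayed in Python to check the rule. This is a minimal sketch: the TranscriptTracker class is hypothetical (not part of the Soniox SDK), and each result is modeled as a pair of lists, (final tokens, non-final tokens). Final tokens accumulate across results, while the non-final tokens are replaced wholesale by each new result.

```python
from typing import List, Sequence


class TranscriptTracker:
    """Hypothetical helper that applies the latest-transcript rule."""

    def __init__(self) -> None:
        self.all_final: List[str] = []  # final tokens from all results so far
        self.nonfinal: List[str] = []   # non-final tokens from the last result

    def update(self, final: Sequence[str], nonfinal: Sequence[str]) -> None:
        # Final tokens never change, so they are appended permanently.
        self.all_final.extend(final)
        # Earlier non-final tokens are discarded, not merged.
        self.nonfinal = list(nonfinal)

    def latest(self) -> List[str]:
        return self.all_final + self.nonfinal


# The same sequence of results as in the walkthrough above.
results = [
    ([], ["a", "b"]),
    (["a"], ["b", "c"]),
    ([], []),
    ([], ["d", "e"]),
    (["d", "f"], ["g", "h", "i"]),
]

tracker = TranscriptTracker()
for final, nonfinal in results:
    tracker.update(final, nonfinal)
    print(tracker.all_final, tracker.latest())
```

Each printed line pairs the "All final tokens" and "Latest transcript" values for the corresponding received result in the walkthrough.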

Example

In the example below, we will include both final and non-final tokens when we transcribe a stream with low-latency mode enabled.

final_nonfinal_tokens.py

from typing import Iterable
from soniox.transcribe_live import transcribe_stream
from soniox.speech_service import SpeechClient


def iter_audio() -> Iterable[bytes]:
    # This function should yield audio bytes from your stream.
    # Here we simulate the stream by reading a file in small chunks.
    with open("../test_data/test_audio_long.flac", "rb") as fh:
        while True:
            audio = fh.read(1024)
            if len(audio) == 0:
                break
            yield audio


# Do not forget to set your API key in the SONIOX_API_KEY environment variable.
def main():
    with SpeechClient() as client:
        # All final tokens will be collected in this list.
        all_final_tokens = []

        for result in transcribe_stream(
            iter_audio(),
            client,
            model="en_v2_lowlatency",
            include_nonfinal=True,
        ):
            # Split current result response into final and non-final tokens.
            final_tokens = []
            nonfinal_tokens = []
            for word in result.words:
                if word.is_final:
                    final_tokens.append(word.text)
                else:
                    nonfinal_tokens.append(word.text)

            # Append current final tokens to all final tokens.
            all_final_tokens += final_tokens

            # Print all final tokens and current non-final tokens.
            all_final_tokens_str = "".join(all_final_tokens)
            nonfinal_tokens_str = "".join(nonfinal_tokens)
            print(f"Final: {all_final_tokens_str}")
            print(f"Non-final: {nonfinal_tokens_str}")
            print("-----")


if __name__ == "__main__":
    main()

First, we define a generator that yields audio chunks from the stream. In this example, we simulate the stream by reading the audio file in 1024-byte chunks. We then pass this generator to transcribe_stream(), which yields transcription results as soon as they become available.
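In a real application, the generator would read from a live source rather than a file. A minimal sketch, assuming the audio arrives on any binary file-like object (for example sys.stdin.buffer when audio is piped into the script); the iter_audio_from name and the 1024-byte default are illustrative and not part of the Soniox SDK. Since the example above only requires an Iterable[bytes], a generator like this could presumably be passed to transcribe_stream() in place of iter_audio().

```python
import io
from typing import BinaryIO, Iterator


def iter_audio_from(source: BinaryIO, chunk_size: int = 1024) -> Iterator[bytes]:
    """Yield chunks of audio bytes from a binary stream until EOF.

    iter() with a sentinel keeps calling source.read(chunk_size) and
    stops as soon as it returns b"" (end of stream).
    """
    for chunk in iter(lambda: source.read(chunk_size), b""):
        yield chunk


# Hypothetical usage with piped audio (requires `import sys`):
#   audio_chunks = iter_audio_from(sys.stdin.buffer)

# Offline check with an in-memory buffer standing in for a stream.
chunks = list(iter_audio_from(io.BytesIO(b"x" * 2500), chunk_size=1024))
print([len(c) for c in chunks])  # [1024, 1024, 452]
```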

For each received result, we split its tokens into final and non-final tokens, append the new final tokens to the list of all final tokens, and print all final tokens followed by the current non-final tokens.

Run

python3 final_nonfinal_tokens.py

Output

-----
Final:
Non-final: But
-----
Final:
Non-final: But there
-----
Final:
Non-final: But there is
-----
Final:
Non-final: But there is always
-----
...
-----
Final:
Non-final: But there is always a stronger sense of life when the sun is brilliant after rain
-----
Final: But there is always
Non-final: a stronger sense of life when the sun is brilliant after rain
-----
Final: But there is always a stronger sense
Non-final: of life when the sun is brilliant after rain. And now he
-----

final_nonfinal_tokens.js

const fs = require("fs");
const { SpeechClient } = require("@soniox/soniox-node");

// Do not forget to set your API key in the SONIOX_API_KEY environment variable.
const speechClient = new SpeechClient();

(async function () {
    // All final tokens will be collected in this list.
    let all_final_tokens = [];

    const onDataHandler = async (result) => {
        if (!result) {
            return;
        }

        // Split current result into final and non-final tokens.
        const final_tokens = [];
        const nonfinal_tokens = [];

        for (const word of result.words) {
            if (word.is_final) {
                final_tokens.push(word.text);
            } else {
                nonfinal_tokens.push(word.text);
            }
        }

        // Append current final tokens to all final tokens.
        all_final_tokens = all_final_tokens.concat(final_tokens);

        console.log(`Final: ${all_final_tokens.join("")}`);
        console.log(`Non-final: ${nonfinal_tokens.join("")}`);
        console.log("-----");
    };

    const onEndHandler = (error) => {
        if (error) {
            console.log(`Transcription error: ${error}`);
        }
    };

    // transcribeStream returns an object with ".writeAsync()" and ".end()"
    // methods - use them to send data and end stream when done.
    const stream = speechClient.transcribeStream(
        {
            model: "en_v2_lowlatency",
            include_nonfinal: true
        },
        onDataHandler,
        onEndHandler
    );

    // Open file as stream.
    const CHUNK_SIZE = 1024;
    const streamSource = fs.createReadStream(
        "../test_data/test_audio_long.flac",
        {
            highWaterMark: CHUNK_SIZE,
        }
    );

    // Simulate data streaming
    for await (const audioChunk of streamSource) {
        await stream.writeAsync(audioChunk);
    }
    stream.end();
})();

To transcribe a stream, we read its audio in chunks. In this example, we simulate the stream by reading the file with fs.createReadStream() in 1024-byte chunks (highWaterMark: CHUNK_SIZE) and iterating over it with for await.

Each audio chunk is passed to stream.writeAsync() for transcription, and stream.end() is called once all chunks have been written. As soon as transcription results become available, onDataHandler() is called with each result.

Run

node final_nonfinal_tokens.js

Output

-----
Final:
Non-final: But
-----
Final:
Non-final: But there
-----
Final:
Non-final: But there is
-----
Final:
Non-final: But there is always
-----
...
-----
Final:
Non-final: But there is always a stronger sense of life when the sun is brilliant after rain
-----
Final: But there is always
Non-final: a stronger sense of life when the sun is brilliant after rain
-----
Final: But there is always a stronger sense
Non-final: of life when the sun is brilliant after rain. And now he
-----