Final vs Non-Final Words

Non-final words are words that are recognized immediately as audio is being transcribed. They may change later, once more audio has been transcribed and more context is available.

Final words are words that will not change in the future.

In each received result, non-final words always follow any final words.

Typically, when a word is first recognized, it is returned as non-final, and it may be returned as non-final several times. After a certain period, the word is returned as final. However, a word returned as non-final may later be returned as a different non-final word, or may disappear entirely. Do not make any assumptions about how the non-final words in one result relate to those in subsequent results.
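Because non-final words always follow any final words, a result can be split at the first non-final word. The following is a minimal Python sketch of that idea, assuming each word exposes `text` and `is_final` as in the example below. The `Word` class and the `split_words` helper are hypothetical stand-ins for illustration and are not part of the Soniox library.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    # Hypothetical stand-in for a word in a received result.
    text: str
    is_final: bool


def split_words(words: List[Word]) -> Tuple[List[str], List[str]]:
    """Split one result into (final, non-final) word texts.

    Relies on the ordering rule: non-final words follow any final words.
    """
    # Find the index of the first non-final word (or the end of the list).
    boundary = len(words)
    for i, word in enumerate(words):
        if not word.is_final:
            boundary = i
            break

    # Guard against input that violates the ordering rule.
    if any(w.is_final for w in words[boundary:]):
        raise ValueError("final word found after a non-final word")

    final = [w.text for w in words[:boundary]]
    non_final = [w.text for w in words[boundary:]]
    return final, non_final
```

The non-final part should be treated as provisional: it is replaced, not extended, when the next result arrives.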

Example

In the example below, we include both final and non-final words when transcribing a stream in real time with low latency.

final_nonfinal_words.py

from typing import Iterable
from soniox.transcribe_live import transcribe_stream
from soniox.speech_service import SpeechClient, set_api_key

set_api_key("<YOUR-API-KEY>")


def iter_audio() -> Iterable[bytes]:
    # This function should yield audio bytes from your stream.

    # Here we simulate the stream by reading a file in small chunks.
    with open("../test_data/test_audio_long.flac", "rb") as fh:
        while True:
            audio = fh.read(1024)
            if len(audio) == 0:
                break
            yield audio


def main():
    with SpeechClient() as client:
        all_final_words = []

        for result in transcribe_stream(iter_audio(), client):
            # Split current result response into final words and non-final words.
            final_words = []
            non_final_words = []
            for word in result.words:
                if word.is_final:
                    final_words.append(word.text)
                else:
                    non_final_words.append(word.text)

            # Append current final words to a list of all final words.
            all_final_words += final_words

            # Print all final words and current non-final words.
            all_final_words_str = " ".join(all_final_words)
            non_final_words_str = " ".join(non_final_words)
            print(f"Final: {all_final_words_str}")
            print(f"Non-final: {non_final_words_str}")
            print("-----")


if __name__ == "__main__":
    main()

First, to transcribe a stream, we define a generator over the audio chunks from the stream. In this example, we simulate the stream by reading audio chunks from a file. We then pass this generator to transcribe_stream(), which returns the transcription results as soon as they become available.

On each received result, we split the recognized words into final and non-final words. We then add the new final words to the list of all final words, and print out all final words and the current non-final words.

Run

python3 final_nonfinal_words.py

Output

-----
Final: 
Non-final: But
-----
Final: 
Non-final: But there
-----
Final: 
Non-final: But there is
-----
Final: 
Non-final: But there is always
-----
...
-----
Final: 
Non-final: But there is always a stronger sense of life when the sun is brilliant after rain
-----
Final: But there is always
Non-final: a stronger sense of life when the sun is brilliant after rain   
-----
Final: But there is always a stronger sense
Non-final: of life when the sun is brilliant after rain . And now he 
-----

final_nonfinal_words.js

const fs = require("fs");
const { SpeechClient } = require("@soniox/soniox-node");

// Do not forget to set your Soniox API key.
const speechClient = new SpeechClient();

(async function () {
    let all_final_words = [];
    const onDataHandler = async (result) => {
        if (!result) {
            return;
        }

        const final_words = [];
        const non_final_words = [];

        for (const word of result.words) {
            if (word.is_final) {
                final_words.push(word.text);
            } else {
                non_final_words.push(word.text);
            }
        }

        all_final_words = all_final_words.concat(final_words);

        console.log(`Final: ${all_final_words.join(" ")}`);
        console.log(`Non-final: ${non_final_words.join(" ")}`);
        console.log("-----");
    };

    const onEndHandler = (error) => {
        console.log("END!", error);
    };

    // transcribeStream() returns an object with ".writeAsync()" and ".end()" methods.
    // Use them to send data and end the stream when done.
    const stream = speechClient.transcribeStream(
        { include_nonfinal: true },
        onDataHandler,
        onEndHandler
    );

    // Here we simulate the stream by reading a file in small chunks.
    const CHUNK_SIZE = 1024;
    const streamSource = fs.createReadStream(
        "../test_data/test_audio_long.flac",
        {
            highWaterMark: CHUNK_SIZE,
        }
    );

    for await (const audioChunk of streamSource) {
        await stream.writeAsync(audioChunk);
    }

    stream.end();
})();

To transcribe a stream, we read audio chunks from the stream source. In this example, we simulate the stream by reading audio chunks from a file.

As audio chunks become available, they are passed to stream.writeAsync() for transcription. As soon as transcription results become available, onDataHandler() is called.

On each received result, we split the recognized words into final and non-final words. We then add the new final words to the list of all final words, and print out all final words and the current non-final words.

Run

node final_nonfinal_words.js

Output

-----
Final: 
Non-final: But
-----
Final: 
Non-final: But there
-----
Final: 
Non-final: But there is
-----
Final: 
Non-final: But there is always
-----
...
-----
Final: 
Non-final: But there is always a stronger sense of life when the sun is brilliant after rain
-----
Final: But there is always
Non-final: a stronger sense of life when the sun is brilliant after rain   
-----
Final: But there is always a stronger sense
Non-final: of life when the sun is brilliant after rain . And now he 
-----

The full transcript of a stream can be obtained by joining:

  1. All final words from all previous results.
  2. All words from the last received result (both final and non-final).
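
The joining rule can be sketched in Python as a small accumulator. This is an illustrative sketch only: `TranscriptBuilder` is a hypothetical helper, not part of the Soniox library, and it takes plain `(text, is_final)` pairs so that it runs without a live stream. It assumes, as the examples above do, that each final word is returned only once.

```python
from typing import Iterable, List, Tuple


class TranscriptBuilder:
    """Hypothetical helper that tracks the current full transcript."""

    def __init__(self) -> None:
        self._final: List[str] = []      # final words from all results so far
        self._non_final: List[str] = []  # non-final words from the latest result only

    def update(self, words: Iterable[Tuple[str, bool]]) -> None:
        """Apply one received result, given as (text, is_final) pairs."""
        # Non-final words from earlier results are discarded, not merged.
        self._non_final = []
        for text, is_final in words:
            if is_final:
                self._final.append(text)
            else:
                self._non_final.append(text)

    def text(self) -> str:
        """Return the full transcript as of the last received result."""
        return " ".join(self._final + self._non_final)


# Simulated results, loosely following the output shown above.
builder = TranscriptBuilder()
builder.update([("But", False), ("there", False)])
builder.update([("But", True), ("there", True), ("is", False)])
builder.update([("is", True), ("always", False)])
print(builder.text())
```

With a real stream, `update()` would be called once per received result with `(word.text, word.is_final)` for each word in `result.words`.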