Transcribe Streams

In this example, we transcribe a stream in bidirectional streaming mode. We simulate the stream by reading a file in small chunks. This demonstrates how to transcribe any stream of data, including real-time streams.

transcribe_any_stream.py

from typing import Iterable
from soniox.transcribe_live import transcribe_stream
from soniox.speech_service import SpeechClient, set_api_key

set_api_key("<YOUR-API-KEY>")


def iter_audio() -> Iterable[bytes]:
    # This function should yield audio bytes from your stream.

    # Here we simulate the stream by reading a file in small chunks.
    with open("../test_data/test_audio_long.flac", "rb") as fh:
        while True:
            audio = fh.read(1024)
            if len(audio) == 0:
                break
            yield audio


def main():
    with SpeechClient() as client:
        for result in transcribe_stream(iter_audio(), client):
            print(" ".join(w.text for w in result.words))


if __name__ == "__main__":
    main()

To transcribe any stream, you only need to define a generator that yields audio chunks from the stream. In our example, we simulate this by reading audio chunks from a file. We then pass this generator to transcribe_stream(), which returns transcription results as soon as they become available.
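In practice, the audio often comes from something other than a file on disk, such as a pipe or a network socket. The sketch below shows one way to wrap any binary file-like object in a generator of audio chunks. The helper name iter_chunks and its chunk_size parameter are hypothetical and are not part of the Soniox library.

```python
import io
from typing import BinaryIO, Iterator


def iter_chunks(stream: BinaryIO, chunk_size: int = 1024) -> Iterator[bytes]:
    """Yield successive chunks of at most chunk_size bytes until end of stream."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read signals end of stream
            break
        yield chunk


# Example with an in-memory stream standing in for a real audio source.
demo_chunks = list(iter_chunks(io.BytesIO(b"x" * 2500), chunk_size=1024))
print([len(c) for c in demo_chunks])  # [1024, 1024, 452]
```

Under these assumptions, a call such as iter_chunks(sys.stdin.buffer) would read audio piped into the process, and the resulting generator could be passed to transcribe_stream() in place of iter_audio().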

Run

python3 transcribe_any_stream.py

Output

But
But there
But there is
But there is always
But there is always a
But there is always a strong
But there is always a stronger
But there is always a stronger sense
But there is always a stronger sense of
But there is always a stronger sense of life
But there is always a stronger sense of life

transcribe_any_stream.js

const fs = require("fs");
const { SpeechClient } = require("@soniox/soniox-node");

// Do not forget to set your Soniox API key.
const speechClient = new SpeechClient();

(async function () {
    const onDataHandler = async (result) => {
        console.log(`Words: ${result.words.map((word) => word.text).join(" ")}`);
    };

    const onEndHandler = (error) => {
        console.log("END!", error);
    };

    // transcribeStream() returns an object with ".writeAsync()" and ".end()" methods.
    // Use them to send data and end the stream when done.
    const stream = speechClient.transcribeStream(
        { include_nonfinal: true },
        onDataHandler,
        onEndHandler
    );

    // Here we simulate the stream by reading a file in small chunks.
    const CHUNK_SIZE = 1024;
    const readable = fs.createReadStream("../test_data/test_audio_long.flac", {
        highWaterMark: CHUNK_SIZE,
    });

    for await (const chunk of readable) {
        await stream.writeAsync(chunk);
    }

    stream.end();
})();

To transcribe any stream, you only need to pass its audio chunks to the stream object returned by transcribeStream(). In our example, we simulate this by reading audio chunks from a file. As audio chunks become available, they are passed to stream.writeAsync() for transcription, and onDataHandler() is called as soon as transcription results become available. When there is no more audio to send, stream.end() ends the stream.

Run

node transcribe_any_stream.js

Output

Words: 
Words: But
Words: But there
Words: But there is
Words: But there is always
Words: But there is always a
Words: But there is always a strong
Words: But there is always a stronger
Words: But there is always a stronger sense
Words: But there is always a stronger sense of
Words: But there is always a stronger sense of life
Words: But there is always a stronger sense of life

Minimizing Latency

When transcribing a real-time stream, the lowest latency is achieved with raw audio encoded as 16-bit little-endian PCM (pcm_s16le) at a 16 kHz sample rate. The example below shows how to transcribe such audio.

transcribe_any_stream_audio_format.py

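# iter_audio() must yield raw PCM bytes in the format declared below.
for result in transcribe_stream(
    iter_audio(),
    client,
    audio_format="pcm_s16le",
    sample_rate_hertz=16000,
    num_audio_channels=1,
):
    print(" ".join(w.text for w in result.words))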

transcribe_any_stream_audio_format.js

const stream = speechClient.transcribeStream(
    { 
        audio_format: "pcm_s16le",
        sample_rate_hertz: 16000,
        num_audio_channels: 1,
        include_nonfinal: true
    },
    onDataHandler,
    onEndHandler
);

Other PCM formats and configurations, as listed here, can also be used at the cost of a small increase in latency.
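If the source audio is a 16-bit PCM WAV file, its frames are already little-endian signed 16-bit samples, so they can be sent as pcm_s16le once the WAV header is stripped. The sketch below uses Python's standard wave module to check the format and yield raw chunks of a fixed duration. The function name iter_wav_pcm and the 100 ms default chunk duration are illustrative assumptions, not Soniox recommendations.

```python
import os
import tempfile
import wave
from typing import Iterator


def iter_wav_pcm(path: str, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield raw pcm_s16le chunks of about chunk_ms milliseconds from a WAV file."""
    with wave.open(path, "rb") as wav:
        # The low-latency configuration expects 16-bit mono audio at 16 kHz.
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wav.getframerate() != 16000:
            raise ValueError("expected 16 kHz sample rate")

        # At 16 kHz, 100 ms is 1600 frames, which is 3200 bytes of 16-bit mono.
        frames_per_chunk = max(1, wav.getframerate() * chunk_ms // 1000)
        while True:
            frames = wav.readframes(frames_per_chunk)
            if not frames:
                break
            yield frames


# Demo: write one second of silence to a temporary WAV file and chunk it.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "silence.wav")
    with wave.open(demo_path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(16000)
        out.writeframes(b"\x00\x00" * 16000)
    chunk_sizes = [len(c) for c in iter_wav_pcm(demo_path)]

print(chunk_sizes)  # ten chunks of 3200 bytes each
```

A generator like this could be passed to transcribe_stream() together with audio_format="pcm_s16le", sample_rate_hertz=16000, and num_audio_channels=1. Smaller chunks are sent sooner but in greater number, so the chunk duration is a trade-off worth measuring for your own stream.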
