Transcribe Streams
In this example, we transcribe a stream in bidirectional streaming mode. We simulate the stream by reading a file in small chunks, which demonstrates how to transcribe any stream of data, including real-time streams.
from typing import Iterable
from soniox.transcribe_live import transcribe_stream
from soniox.speech_service import SpeechClient, set_api_key

set_api_key("<YOUR-API-KEY>")

def iter_audio() -> Iterable[bytes]:
    # This function should yield audio bytes from your stream.
    # Here we simulate the stream by reading a file in small chunks.
    with open("../test_data/test_audio_long.flac", "rb") as fh:
        while True:
            audio = fh.read(1024)
            if len(audio) == 0:
                break
            yield audio

def main():
    with SpeechClient() as client:
        for result in transcribe_stream(iter_audio(), client):
            print(" ".join(w.text for w in result.words))

if __name__ == "__main__":
    main()
To transcribe any stream, you only need to define a generator over the audio chunks from the stream. In our example, we simulate this by reading audio chunks from a file. We then pass this generator to transcribe_stream(), which returns transcription results as soon as they become available.
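The same pattern works for any binary source, not just files. As an illustration, the generator below chunks an arbitrary binary file-like object (an in-memory buffer, a pipe, or a socket wrapped with makefile()); the read_chunks helper is our own sketch, not part of the Soniox SDK:

```python
import io
from typing import BinaryIO, Iterable

def read_chunks(source: BinaryIO, chunk_size: int = 1024) -> Iterable[bytes]:
    # Yield fixed-size chunks until the source is exhausted.
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        yield chunk

# Works with any binary stream, e.g. an in-memory buffer:
buffer = io.BytesIO(b"\x00" * 3000)
chunks = list(read_chunks(buffer))
# Yields chunks of 1024, 1024, and 952 bytes.
```

A generator like this can be passed to transcribe_stream() in place of iter_audio() above.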
Run
python3 transcribe_any_stream.py
Output
But
But there
But there is
But there is always
But there is always a
But there is always a strong
But there is always a stronger
But there is always a stronger sense
But there is always a stronger sense of
But there is always a stronger sense of life
But there is always a stronger sense of life
const fs = require("fs");
const { SpeechClient } = require("@soniox/soniox-node");

// Do not forget to set your Soniox API key.
const speechClient = new SpeechClient();

(async function () {
  const onDataHandler = async (result) => {
    console.log(`Words: ${result.words.map((word) => word.text).join(" ")}`);
  };

  const onEndHandler = (error) => {
    console.log("END!", error);
  };

  // transcribeStream() returns an object with ".writeAsync()" and ".end()" methods.
  // Use them to send data and end the stream when done.
  const stream = speechClient.transcribeStream(
    { include_nonfinal: true },
    onDataHandler,
    onEndHandler
  );

  // Here we simulate the stream by reading a file in small chunks.
  const CHUNK_SIZE = 1024;
  const readable = fs.createReadStream("../test_data/test_audio_long.flac", {
    highWaterMark: CHUNK_SIZE,
  });
  for await (const chunk of readable) {
    await stream.writeAsync(chunk);
  }
  stream.end();
})();
To transcribe any stream, you only need to define a generator over the audio chunks from the stream. In our example, we simulate this by reading audio chunks from a file. As audio chunks become available, they are passed to stream.writeAsync() for transcription, and as soon as transcription results become available, onDataHandler() is called.
Run
node transcribe_any_stream.js
Output
Words:
Words: But
Words: But there
Words: But there is
Words: But there is always
Words: But there is always a
Words: But there is always a strong
Words: But there is always a stronger
Words: But there is always a stronger sense
Words: But there is always a stronger sense of
Words: But there is always a stronger sense of life
Words: But there is always a stronger sense of life
using System.Linq;
using System.Runtime.CompilerServices;
using Soniox.Types;
using Soniox.Client;
using Soniox.Client.Proto;

using var client = new SpeechClient();

// TranscribeStream requires the user to provide the audio to transcribe
// as an IAsyncEnumerable<byte[]> instance. This can be implemented as
// an async function that uses "yield return". This example function
// reads a file in chunks.
async IAsyncEnumerable<byte[]> EnumerateAudioChunks(
    [EnumeratorCancellation] CancellationToken cancellationToken = default(CancellationToken)
)
{
    string filePath = "../../test_data/test_audio_long.flac";
    int bufferSize = 1024;
    await using var fileStream = new FileStream(
        filePath, FileMode.Open, FileAccess.Read, FileShare.Read,
        bufferSize: bufferSize, useAsync: true
    );
    while (true)
    {
        byte[] buffer = new byte[bufferSize];
        int numRead = await fileStream.ReadAsync(buffer, cancellationToken);
        if (numRead == 0)
        {
            break;
        }
        Array.Resize(ref buffer, numRead);
        yield return buffer;
    }
}

IAsyncEnumerable<Result> resultsEnumerable = client.TranscribeStream(
    EnumerateAudioChunks(),
    new TranscriptionConfig
    {
        IncludeNonfinal = true,
    });

await foreach (var result in resultsEnumerable)
{
    // Note: result.Words contains both final and non-final words;
    // we do not distinguish them in this example.
    var wordsStr = string.Join(" ", result.Words.Select(word => word.Text).ToArray());
    Console.WriteLine($"Words: {wordsStr}");
}
To transcribe any stream, you only need to define a generator over the audio chunks from the stream. In our example, we simulate this by reading audio chunks from a file. We then pass this generator to TranscribeStream(), which returns transcription results as soon as they become available.
Run
cd soniox_examples/csharp/TranscribeAnyStream
dotnet run
Output
Words:
Words: But
Words: But there
Words: But there is
Words: But there is always
Words: But there is always a
Words: But there is always a strong
Words: But there is always a stronger
Words: But there is always a stronger sense
Words: But there is always a stronger sense of
Words: But there is always a stronger sense of life
Words: But there is always a stronger sense of life
Minimizing Latency
When transcribing a real-time stream, the lowest latency is achieved with raw audio encoded as PCM 16-bit little-endian (pcm_s16le) at a 16 kHz sample rate. The example below shows how to transcribe such audio.
transcribe_any_stream_audio_format.py
for result in transcribe_stream(
    iter_audio(),
    client,
    audio_format="pcm_s16le",
    sample_rate_hertz=16000,
    num_audio_channels=1):
transcribe_any_stream_audio_format.js
const stream = speechClient.transcribeStream(
  {
    audio_format: "pcm_s16le",
    sample_rate_hertz: 16000,
    num_audio_channels: 1,
    include_nonfinal: true,
  },
  onDataHandler,
  onEndHandler
);
TranscribeAnyStreamAudioFormat.cs
IAsyncEnumerable<Result> resultsEnumerable = client.TranscribeStream(
    EnumerateAudioChunks(),
    new TranscriptionConfig
    {
        IncludeNonfinal = true,
        AudioFormat = "pcm_s16le",
        SampleRateHertz = 16000,
        NumAudioChannels = 1,
    });
It is possible to use other PCM formats and configurations, as listed here, at the cost of a small increase in latency.
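If your capture pipeline produces floating-point samples, packing them into pcm_s16le needs only the Python standard library. The floats_to_pcm_s16le helper below is our own illustration, independent of the Soniox SDK:

```python
import struct

def floats_to_pcm_s16le(samples):
    """Pack float samples in [-1.0, 1.0] as 16-bit little-endian PCM bytes."""
    clamped = (max(-1.0, min(1.0, s)) for s in samples)
    # "<h" = little-endian signed 16-bit integer.
    return b"".join(struct.pack("<h", int(s * 32767)) for s in clamped)

pcm = floats_to_pcm_s16le([0.0, 0.5, -0.5, 1.0])
# 4 samples -> 8 bytes of little-endian 16-bit PCM.
```

Bytes produced this way (from a 16 kHz mono source) can be yielded directly from iter_audio() when audio_format="pcm_s16le" is set.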