4. Transcribe Any Stream
In this example, we will transcribe a file in bidirectional streaming mode with non-final words. This serves as a demonstration of how to transcribe any stream of data, including real-time streams.
from soniox.transcribe_live import transcribe_stream
from soniox.speech_service import Client, set_api_key
from soniox.test_data import TEST_AUDIO_LONG_FLAC

set_api_key("<YOUR-API-KEY>")


def iter_audio():
    with open(TEST_AUDIO_LONG_FLAC, "rb") as fh:
        while True:
            audio = fh.read(1024)
            if len(audio) == 0:
                break
            yield audio


def main():
    with Client() as client:
        for result in transcribe_stream(iter_audio(), client):
            # Variable result contains final and non-final words.
            print(" ".join(w.text for w in result.words))


if __name__ == "__main__":
    main()
To transcribe any stream, we only need to define a generator over the audio chunks from the stream. In our example, we simulate this by reading audio chunks from a file. We then pass this generator to transcribe_stream(), which returns the transcription results as they become available.
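The same generator pattern applies to any binary source, not only a file on disk. Here is a minimal sketch, assuming the source exposes a file-like read() method (for example a pipe or sys.stdin.buffer). The names iter_chunks and chunk_size are illustrative and not part of the Soniox library.

```python
import io


def iter_chunks(stream, chunk_size=1024):
    """Yield successive byte chunks from a binary file-like object.

    iter_chunks is a hypothetical helper for illustration; it is not
    part of the soniox package.
    """
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # read() returns b"" at end of stream
            break
        yield chunk


# Simulate a stream with an in-memory buffer of 2500 zero bytes.
fake_stream = io.BytesIO(bytes(2500))
chunk_sizes = [len(chunk) for chunk in iter_chunks(fake_stream)]
print(chunk_sizes)  # [1024, 1024, 452]
```

A generator like this could be passed to transcribe_stream() in place of iter_audio().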
The difference between transcribe_stream() and transcribe_file_stream() is that the former returns both final and non-final words, whereas the latter returns only final words.
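When a result mixes both kinds of words, it can help to handle them separately, since non-final words are provisional. The sketch below uses a stand-in Word class so that it runs without the service. The is_final flag is an assumption about the word objects and should be checked against the library reference. split_words is an illustrative helper, not a Soniox function.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Stand-in for a transcribed word; `is_final` is an assumed field name.
    text: str
    is_final: bool


def split_words(words):
    """Separate final words from non-final (provisional) words."""
    final = [w.text for w in words if w.is_final]
    non_final = [w.text for w in words if not w.is_final]
    return final, non_final


words = [
    Word("hello", True),
    Word("world", True),
    Word("how", False),
    Word("are", False),
]
final, non_final = split_words(words)
print("final:    ", " ".join(final))
print("non-final:", " ".join(non_final))
```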
When transcribing a real-time stream, the lowest latency is achieved with raw audio encoded as PCM 16-bit little endian (pcm_s16le) at a 16 kHz sample rate with one audio channel. The example below shows how to transcribe such audio. It is possible to use other supported PCM formats or configurations, at the cost of a small increase in latency.
for result in transcribe_stream(
    iter_audio(),
    client,
    audio_format="pcm_s16le",
    sample_rate_hertz=16000,
    num_audio_channels=1,
):
    print(" ".join(w.text for w in result.words))
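For this raw format, the relationship between chunk size and audio duration is simple arithmetic: each sample takes 2 bytes, so 16 kHz mono audio amounts to 32,000 bytes per second. The sketch below works this out for the 1024-byte chunks read in the first example. chunk_duration_ms is an illustrative helper, not a Soniox function.

```python
def chunk_duration_ms(chunk_bytes, sample_rate_hertz=16000,
                      num_audio_channels=1, bytes_per_sample=2):
    """Return the duration in milliseconds of a raw PCM chunk.

    Illustrative helper (not part of the soniox package).
    bytes_per_sample=2 corresponds to 16-bit samples (pcm_s16le).
    """
    bytes_per_second = sample_rate_hertz * num_audio_channels * bytes_per_sample
    return 1000 * chunk_bytes / bytes_per_second


duration = chunk_duration_ms(1024)
print(duration)  # 32.0
```

When simulating a real-time source from a file, sleeping for roughly this long between chunks would approximate live pacing.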