Web Library API Reference

The Soniox Web Voice library is based around the RecordTranscribe class, which represents a session of integrated audio recording and transcription.

Basic Usage

To start recording and transcribing, create an instance of RecordTranscribe, set your API key, configuration, and event handlers, and then call start(). The example below sets all supported event handlers, but any handler that is not needed can be left unset.

let recordTranscribe = new sonioxWebVoice.RecordTranscribe();
recordTranscribe.setApiKey("your-api-key");
recordTranscribe.setModel("en_v2_lowlatency");
recordTranscribe.setIncludeNonFinal(true);
recordTranscribe.setOnStarted(onStarted);
recordTranscribe.setOnPartialResult(onPartialResult);
recordTranscribe.setOnFinished(onFinished);
recordTranscribe.setOnError(onError);
recordTranscribe.start();

IncludeNonFinal specifies whether low-latency transcription results are desired. Set it to true to receive recognized tokens as they are spoken; set it to false if only the complete transcript after the end of the recording is of interest.

When start() is called, the library first requests access to the user’s microphone, and the user may be prompted to allow access and/or select the microphone. Then the library establishes a connection to the Soniox service and starts a transcription session. At this point, audio recording starts and the onStarted event handler is called if provided.

function onStarted() {
  console.log("Recording/transcription started!");
}

After transcription/recording starts, partial transcription results will be provided to the onPartialResult handler if one is set. Note that if IncludeNonFinal is false, handling onPartialResult is likely not needed because the complete transcript will be available at the end of the transcription as described below.

function onPartialResult(result) {
  console.log("Partial result:");
  for (let i = 0; i < result.words.length; ++i) {
    let word = result.words[i];
    if (word.is_final) {
      console.log("  final token: '" + word.text + "'");
    } else {
      console.log("  non-final token: '" + word.text + "'");
    }
  }
}

As can be seen, tokens are returned as either final or non-final. Refer to How-to Guides / Final vs Non-final Tokens for an explanation. Non-final tokens are returned only if IncludeNonFinal is true.

Partial results provided to onPartialResult are frozen objects which will not change and can safely be stored for later use.
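
One way to render a live transcript from partial results is to keep the final tokens and redraw the non-final tail on every update. The sketch below assumes that each final token is delivered exactly once across partial results and that the non-final tokens of a new result supersede those of the previous one; check the Final vs Non-final Tokens guide before relying on that. createTranscriptView is a hypothetical helper, not part of the library.

```javascript
// Hypothetical helper (not part of the Soniox library).
// Assumption: each final token arrives exactly once, and the non-final
// tokens of a new partial result replace those of the previous one.
function createTranscriptView() {
  let finalText = "";
  let nonFinalText = "";
  return {
    // Call this from the onPartialResult handler.
    update(result) {
      nonFinalText = "";
      for (const word of result.words) {
        if (word.is_final) {
          finalText += word.text;
        } else {
          nonFinalText += word.text;
        }
      }
    },
    // Text that will no longer change.
    finalText: () => finalText,
    // Final text followed by the current non-final tail.
    text: () => finalText + nonFinalText,
  };
}
```

It could be wired up with recordTranscribe.setOnPartialResult((result) => view.update(result)), where view is the object returned by createTranscriptView().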

To stop recording and finish the transcription, call stop(). Recording stops immediately, but it takes a short time to finish the transcription; during this time the onPartialResult handler may still be called if provided. When the transcription is finished, the onFinished handler will be called if provided. After that point no more handlers will be called.

// Call when you want to stop.
recordTranscribe.stop();

function onFinished() {
  let result = recordTranscribe.getResult();
  let text = result.words.map((word) => word.text).join("");
  console.log("Finished: " + text);
}

As demonstrated above, the complete result of the transcription can be retrieved using getResult(). If called after the transcription is finished, all tokens in the result will be final.

If getResult() is called before the transcription is finished, the result will only contain tokens recognized so far. In that case, if IncludeNonFinal is true, the result may contain non-final tokens at the end.

getResult() always returns the same internal object which is automatically updated based on new partial transcription results, and it is not permitted to modify this object. If you need to store the result before the transcription is finished, use getResultCopy(), which returns a frozen copy of the current result.
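
The difference between the live object and a frozen copy matters when snapshots are kept over time. The sketch below uses a hypothetical stand-in, makeFakeRecorder, that only mimics the behavior described above so the effect can be seen without a microphone; it is not the library's implementation.

```javascript
// Hypothetical stand-in that mimics the documented behavior of
// getResult() (live object) and getResultCopy() (frozen copy).
// It is NOT the library's implementation.
function makeFakeRecorder() {
  const live = { words: [] };
  return {
    getResult: () => live,
    getResultCopy: () =>
      Object.freeze({
        words: Object.freeze(live.words.map((w) => Object.freeze({ ...w }))),
      }),
    // Test hook simulating a newly recognized token arriving.
    simulateToken: (text) => live.words.push({ text, is_final: true }),
  };
}

const recorder = makeFakeRecorder();
recorder.simulateToken("Hello");

const liveRef = recorder.getResult();      // keeps changing as tokens arrive
const snapshot = recorder.getResultCopy(); // fixed at this moment

recorder.simulateToken(" world");

console.log(liveRef.words.length);  // 2
console.log(snapshot.words.length); // 1
```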

Errors, Canceling and States

An error can occur at any point after start() is called and until the transcription is finished as indicated by onFinished. When an error occurs, recording and transcription are immediately stopped and the onError handler is called if it is set. The onError handler is provided a status string and an error message.

function onError(status, message) {
  console.log("Error: status=" + status + ", message=" + message);
}

It is possible to cancel all operations at any time using cancel(). If this is called before the transcription is finished, recording and transcription are immediately stopped; any audio already recorded but not yet transcribed will not be transcribed.

If audio recording stops due to an external reason, such as the user revoking access to the microphone, the library behaves as if stop() was called, rather than treating that as an error.

// Call when you want to cancel without completing the transcription.
recordTranscribe.cancel();

After an error or cancel(), no more handlers will be called.
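
Because a session that is not canceled ends with exactly one of onFinished or onError, the two handlers can be folded into a single Promise. This is a sketch rather than a library feature: transcribeToPromise is a hypothetical helper that uses only setOnFinished, setOnError, start() and getResult() as described above.

```javascript
// Hypothetical helper (not part of the library). It relies only on
// setOnFinished, setOnError, start and getResult as described above.
function transcribeToPromise(recordTranscribe) {
  return new Promise((resolve, reject) => {
    recordTranscribe.setOnFinished(() => {
      resolve(recordTranscribe.getResult());
    });
    recordTranscribe.setOnError((status, message) => {
      const error = new Error(message);
      error.status = status; // keep the status string for callers
      reject(error);
    });
    // If start() throws, the Promise executor turns that into a rejection.
    recordTranscribe.start();
  });
}
```

Since no handlers are called after cancel(), a Promise created this way never settles if the session is canceled; code that uses cancel() needs to handle that case separately.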

For convenience, a getState() function is provided to determine the current state of the RecordTranscribe object:

// If you want to check the state.
console.log("State: " + recordTranscribe.getState());

getState returns one of the following states as a string:

- Init: start() has not been called yet.
- Starting: start() has been called, not yet recording or transcribing.
- Running: Recording and transcribing.
- Finishing: stop() has been called, finishing the transcription.
- Finished: The transcription is finished.
- Error: An error has occurred.
- Cancel: cancel() has been called in a state other than Init, Finished or Error.

Multiple Transcriptions

A RecordTranscribe object can be used only for a single transcription. To perform another transcription, use a new RecordTranscribe object.

It is not permitted to perform multiple transcriptions at the same time; more specifically, no more than one RecordTranscribe object may be in the Starting, Running or Finishing state. If you call start() in violation of this, an exception will be thrown.
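
A small guard can make this rule explicit in application code. The sketch below is a hypothetical wrapper, not a library feature; createSession stands for whatever constructs the object, for example () => new sonioxWebVoice.RecordTranscribe().

```javascript
// States in which a RecordTranscribe object counts as an active transcription.
const ACTIVE_STATES = new Set(["Starting", "Running", "Finishing"]);

// Hypothetical wrapper (not part of the library) that hands out a new
// session object only when the previous one is no longer active.
function createSessionGuard(createSession) {
  let current = null;
  return {
    // Returns a fresh session, or null if the previous one is still active.
    next() {
      if (current !== null && ACTIVE_STATES.has(current.getState())) {
        return null;
      }
      current = createSession();
      return current;
    },
    // Returns the most recently created session, or null if none exists.
    current: () => current,
  };
}
```

With this in place, a null return from next() signals that the caller should wait for the current session to finish, or cancel it, before starting another.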

Transcription Features

The following subsections show how to enable various features of Soniox speech recognition with Web Voice. Note that the respective options need to be set before calling recordTranscribe.start().

Specify Model

Refer to How-to Guides / Models and Languages for available models.

recordTranscribe.setModel("en_v2");

Customization

Refer to How-to Guides / Customization.

let speechContext = {
  entries: [
    {
      phrases: ["acetylcarnitine"],
      boost: 20,
    },
    {
      phrases: ["zestoretic"],
      boost: 20,
    },
  ],
};
recordTranscribe.setSpeechContext(speechContext);
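
When the phrase list comes from application data, the same structure can be generated instead of written by hand. The sketch below assumes one entry per phrase with a shared boost value, mirroring the example above; buildSpeechContext is a hypothetical helper, not part of the library.

```javascript
// Hypothetical helper (not part of the library): builds a speech context
// in the { entries: [{ phrases, boost }] } shape shown above, with one
// entry per phrase and the same boost for each.
function buildSpeechContext(phrases, boost) {
  return {
    entries: phrases.map((phrase) => ({ phrases: [phrase], boost })),
  };
}

const generatedContext = buildSpeechContext(["acetylcarnitine", "zestoretic"], 20);
console.log(JSON.stringify(generatedContext, null, 2));
```

The resulting object can be passed to recordTranscribe.setSpeechContext() in the same way as a hand-written one.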

To use a stored speech context:

let speechContext = {
  name: "my_context",
};
recordTranscribe.setSpeechContext(speechContext);

Speaker Diarization

Refer to How-to Guides / Separate Speakers.

Global speaker diarization (requires a default model and IncludeNonFinal set to false):

recordTranscribe.setModel("en_v2");
recordTranscribe.setIncludeNonFinal(false);
recordTranscribe.setEnableGlobalSpeakerDiarization(true);

Streaming speaker diarization (requires a low-latency model and IncludeNonFinal set to true):

recordTranscribe.setModel("en_v2_lowlatency");
recordTranscribe.setIncludeNonFinal(true);
recordTranscribe.setEnableStreamingSpeakerDiarization(true);
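
Since each diarization mode requires a matching model type and IncludeNonFinal value, it can help to set all three together. The sketch below encodes the two combinations shown above; diarizationSettings and applyDiarization are hypothetical helpers, and the model names are simply the ones used in this document's examples.

```javascript
// Hypothetical helper (not part of the library). It encodes the two
// combinations described above; the model names are the ones used in
// this document's examples.
function diarizationSettings(mode) {
  switch (mode) {
    case "global":
      return { model: "en_v2", includeNonFinal: false };
    case "streaming":
      return { model: "en_v2_lowlatency", includeNonFinal: true };
    default:
      throw new Error("Unknown diarization mode: " + mode);
  }
}

// Applies a consistent model / IncludeNonFinal / diarization combination.
function applyDiarization(recordTranscribe, mode) {
  const settings = diarizationSettings(mode);
  recordTranscribe.setModel(settings.model);
  recordTranscribe.setIncludeNonFinal(settings.includeNonFinal);
  if (mode === "global") {
    recordTranscribe.setEnableGlobalSpeakerDiarization(true);
  } else {
    recordTranscribe.setEnableStreamingSpeakerDiarization(true);
  }
}
```

For example, applyDiarization(recordTranscribe, "streaming") would be called before recordTranscribe.start().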

Speaker numbers will be provided in word.speaker.

For example, to print words with speaker numbers in onFinished:

function onFinished() {
  console.log("Finished:");
  let result = recordTranscribe.getResult();
  for (let i = 0; i < result.words.length; ++i) {
    let word = result.words[i];
    console.log("[" + word.speaker + "] '" + word.text + "'");
  }
}
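
Printing one line per word is rarely what an application wants; merging consecutive words from the same speaker into turns is usually more readable. The sketch below assumes that word.text already carries any spacing, as the join("") in the earlier onFinished example suggests. groupBySpeaker is a hypothetical helper, not part of the library.

```javascript
// Hypothetical helper (not part of the library): merges consecutive
// words from the same speaker into turns.
// Assumption: word.text already includes any needed spacing.
function groupBySpeaker(words) {
  const turns = [];
  for (const word of words) {
    const last = turns[turns.length - 1];
    if (last !== undefined && last.speaker === word.speaker) {
      last.text += word.text;
    } else {
      turns.push({ speaker: word.speaker, text: word.text });
    }
  }
  return turns;
}
```

In onFinished, groupBySpeaker(recordTranscribe.getResult().words) returns a new array of { speaker, text } objects and does not modify the result.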

If you want to set the minimum/maximum number of speakers:

recordTranscribe.setMinNumSpeakers(1);
recordTranscribe.setMaxNumSpeakers(6);

Speaker Identification

Refer to How-to Guides / Identify Speakers.

First enable either form of speaker diarization (see above), then enable speaker identification and set candidate speakers:

recordTranscribe.setEnableSpeakerIdentification(true);
recordTranscribe.setCandSpeakerNames(["John", "Judy"]);

Information about identified speakers is provided in result.speakers. This is a Map that maps speaker numbers to { speaker, name }, where speaker is the speaker number and name is the speaker name.

For example, to print tokens with speaker names in onFinished:

function onFinished() {
  let result = recordTranscribe.getResult();
  for (let i = 0; i < result.words.length; ++i) {
    let word = result.words[i];
    let speakerName = "unknown";
    if (result.speakers.has(word.speaker)) {
      speakerName = result.speakers.get(word.speaker).name;
    }
    console.log("[" + speakerName + "] '" + word.text + "'");
  }
}