Web Library API Reference
The Soniox Web Voice library is based around the RecordTranscribe class, which represents a session of integrated audio recording and transcription.
Basic Usage
To start recording and transcribing, create an instance of RecordTranscribe, set your API key, configuration and event handlers, and then call start(). In the example below, all supported event handlers are set, but any of them can be left unset if not needed.
let recordTranscribe = new sonioxWebVoice.RecordTranscribe();
recordTranscribe.setApiKey("your-api-key");
recordTranscribe.setModel("en_v2_lowlatency");
recordTranscribe.setIncludeNonFinal(true);
recordTranscribe.setOnStarted(onStarted);
recordTranscribe.setOnPartialResult(onPartialResult);
recordTranscribe.setOnFinished(onFinished);
recordTranscribe.setOnError(onError);
recordTranscribe.start();
IncludeNonFinal specifies whether low-latency transcription results are desired. To receive recognized tokens as they are spoken, set it to true; if only the complete transcript after the end of the recording is of interest, set it to false.
When start() is called, the library first requests access to the user's microphone, and the user may be prompted to allow access and/or select a microphone. The library then establishes a connection to the Soniox service and starts a transcription session. At this point, audio recording starts and the onStarted event handler is called, if provided.
function onStarted() {
console.log("Recording/transcription started!");
}
After recording and transcription start, partial transcription results are passed to the handler registered with setOnPartialResult(), if any. Note that if IncludeNonFinal is false, handling onPartialResult is likely unnecessary, because the complete transcript will be available at the end of the transcription, as described below.
function onPartialResult(result) {
console.log("Partial result:");
for (let i = 0; i < result.words.length; ++i) {
let word = result.words[i];
if (word.is_final) {
console.log(" final token: '" + word.text + "'");
} else {
console.log(" non-final token: '" + word.text + "'");
}
}
}
As can be seen, tokens are returned as either final or non-final; refer to How-to Guides / Final vs Non-final Tokens for an explanation. Non-final tokens are returned only if IncludeNonFinal is true.
Partial results provided to onPartialResult are frozen objects that will not change and can safely be stored for later use.
To stop recording and finish the transcription, call stop(). Recording stops immediately, but finishing the transcription takes a short time; during this time, the onPartialResult handler may still be called, if provided. When the transcription is finished, the onFinished handler is called, if provided. After that point, no more handlers will be called.
// Call when you want to stop.
recordTranscribe.stop();
function onFinished() {
let result = recordTranscribe.getResult();
let text = result.words.map((word) => word.text).join("");
console.log("Finished: " + text);
}
As demonstrated above, the complete result of the transcription can be retrieved using getResult(). If it is called after the transcription is finished, all tokens in the result will be final.
If getResult() is called before the transcription is finished, the result contains only the tokens recognized so far. In that case, if IncludeNonFinal is true, the result may contain non-final tokens at the end.
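While a transcription is in progress, an application often wants to render the confirmed text separately from the still-changing tail. The following is a minimal sketch, assuming only the words, text and is_final fields used in the examples above; splitResult is a hypothetical helper, not part of the library.

```javascript
// Hypothetical helper (not part of the Web Voice library): split a result
// into final and non-final text. Only result.words, word.text and
// word.is_final are assumed, as used in the examples above.
function splitResult(result) {
  let finalText = "";
  let nonFinalText = "";
  for (const word of result.words) {
    // Tokens are concatenated without separators, as in onFinished above.
    if (word.is_final) {
      finalText += word.text;
    } else {
      nonFinalText += word.text;
    }
  }
  return { finalText, nonFinalText };
}

// Example usage inside a partial result handler:
// function onPartialResult(result) {
//   const { finalText, nonFinalText } = splitResult(recordTranscribe.getResult());
//   console.log(finalText + "[" + nonFinalText + "]");
// }
```

The helper only reads the result, so it is safe to call on the internal object returned by getResult().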
getResult() always returns the same internal object, which is automatically updated as new partial transcription results arrive; this object must not be modified. If you need to store the result before the transcription is finished, use getResultCopy(), which returns a frozen copy of the current result.
Errors, Canceling and States
An error can occur at any point after start() is called and until the transcription is finished, as indicated by onFinished. When an error occurs, recording and transcription are immediately stopped and the onError handler is called, if it is set. The onError handler receives a status string and an error message.
function onError(status, message) {
console.log("Error: status=" + status + ", message=" + message);
}
All operations can be canceled at any time using cancel(). If it is called before the transcription is finished, recording and transcription are immediately stopped; any audio already recorded but not yet transcribed will not be transcribed.
If audio recording stops for an external reason, such as the user revoking access to the microphone, the library behaves as if stop() was called, rather than treating it as an error.
// Call when you want to cancel without completing the transcription.
recordTranscribe.cancel();
After an error or a call to cancel(), no more handlers will be called.
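Because a session ends through onFinished, onError or cancel() as described above, the callback style can be wrapped in a Promise. The following is a minimal sketch, not part of the library: wrapRecordTranscribe and the returned start/stop/cancel/done object are hypothetical names, and the wrapper assumes it owns the onFinished and onError handlers (it replaces any set earlier).

```javascript
// Hypothetical wrapper (not part of the Web Voice library) that exposes a
// RecordTranscribe session as a Promise. It uses only setOnFinished,
// setOnError, getResult, start, stop and cancel as described above.
// Note: it replaces any onFinished/onError handlers set earlier.
function wrapRecordTranscribe(recordTranscribe) {
  let resolveDone;
  let rejectDone;
  const done = new Promise((resolve, reject) => {
    resolveDone = resolve;
    rejectDone = reject;
  });

  // onFinished: the transcription completed; hand back the final result.
  recordTranscribe.setOnFinished(() => {
    resolveDone(recordTranscribe.getResult());
  });

  // onError: recording and transcription stopped because of an error.
  recordTranscribe.setOnError((status, message) => {
    rejectDone(new Error("Transcription error: status=" + status + ", message=" + message));
  });

  return {
    done,
    start: () => recordTranscribe.start(),
    stop: () => recordTranscribe.stop(),
    // No handler is called after cancel(), so settle the promise here.
    // If the promise has already settled, this reject is a no-op.
    cancel: () => {
      recordTranscribe.cancel();
      rejectDone(new Error("Transcription canceled"));
    },
  };
}
```

A caller would start the session with session.start(), later call session.stop(), and then await session.done to obtain the result. Since cancel() rejects session.done, callers should attach a rejection handler to avoid unhandled rejections.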
For convenience, a getState() function is provided to determine the current state of the RecordTranscribe object:
// If you want to check the state.
console.log("State: " + recordTranscribe.getState());
getState() returns one of the following states as a string:
- Init: start() has not been called yet.
- Starting: start() has been called, but recording and transcription have not started yet.
- Running: Recording and transcribing.
- Finishing: stop() has been called; the transcription is being finished.
- Finished: The transcription is finished.
- Error: An error has occurred.
- Cancel: cancel() has been called in a state other than Init, Finished or Error.
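Since no handlers are called after the transcription finishes, fails or is canceled, an application may want to check whether a session has reached one of these end states, for example to re-enable a control that starts a new recording. A minimal sketch, assuming the state strings listed above; isTerminalState is a hypothetical helper, not part of the library.

```javascript
// Hypothetical helper: states after which, per the descriptions above,
// no more handlers will be called. State names are the strings listed above.
const TERMINAL_STATES = new Set(["Finished", "Error", "Cancel"]);

function isTerminalState(state) {
  return TERMINAL_STATES.has(state);
}

// Example usage:
// if (isTerminalState(recordTranscribe.getState())) {
//   // e.g. re-enable a "New recording" button
// }
```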
Multiple Transcriptions
A RecordTranscribe object can be used for only a single transcription. To perform another transcription, use a new RecordTranscribe object.
It is not permitted to perform multiple transcriptions at the same time; more specifically, at most one RecordTranscribe object may be in the Starting, Running or Finishing state. If you call start() in a way that violates this rule, an exception will be thrown.
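One way to respect this rule when the user starts a new recording is to cancel the previous session if it is still active, then create a fresh object. The following is a minimal sketch: startFresh and the createSession callback are hypothetical names, and it assumes cancel() takes effect synchronously (moving the old object to the Cancel state), as "immediately stopped" above suggests.

```javascript
// States in which, per the rule above, a RecordTranscribe object blocks
// another transcription from starting.
const ACTIVE_STATES = new Set(["Starting", "Running", "Finishing"]);

// Hypothetical helper: cancel the current session if it is still active,
// then create and start a new one. createSession is any function that
// returns a fully configured RecordTranscribe (API key, model, handlers).
function startFresh(current, createSession) {
  if (current && ACTIVE_STATES.has(current.getState())) {
    // After cancel() the old object is in the Cancel state, which is not an
    // active state, so the new start() does not violate the rule.
    current.cancel();
  }
  const next = createSession();
  next.start();
  return next;
}
```

Canceling discards any audio not yet transcribed; an application that wants to keep the previous transcript could instead call stop() and wait for onFinished before starting the next session.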
Transcription Features
The following subsections show how to enable various features of Soniox speech recognition with Web Voice. Note that the respective options must be set before calling recordTranscribe.start().
Specify Model
Refer to How-to Guides / Models and Languages for available models.
recordTranscribe.setModel("en_v2");
Customization
Refer to How-to Guides / Customization.
let speechContext = {
entries: [
{
phrases: ["acetylcarnitine"],
boost: 20,
},
{
phrases: ["zestoretic"],
boost: 20,
},
],
};
recordTranscribe.setSpeechContext(speechContext);
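When the list of phrases comes from application data, the inline speech context above can be built programmatically. A minimal sketch that reproduces the shape of the example above (one entry per phrase, all with the same boost); buildSpeechContext is a hypothetical helper, and whether grouping several phrases into one entry behaves differently is not covered here.

```javascript
// Hypothetical helper: build a speech context with the same shape as the
// example above (one entry per phrase, each with the same boost value).
function buildSpeechContext(phrases, boost) {
  return {
    entries: phrases.map((phrase) => ({
      phrases: [phrase],
      boost: boost,
    })),
  };
}

// Produces the same object as the example above:
// recordTranscribe.setSpeechContext(buildSpeechContext(["acetylcarnitine", "zestoretic"], 20));
```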
To use a stored speech context:
let speechContext = {
name: "my_context",
};
recordTranscribe.setSpeechContext(speechContext);
Speaker Diarization
Refer to How-to Guides / Separate Speakers.
Global speaker diarization (requires a default model and IncludeNonFinal set to false):
recordTranscribe.setModel("en_v2");
recordTranscribe.setIncludeNonFinal(false);
recordTranscribe.setEnableGlobalSpeakerDiarization(true);
Streaming speaker diarization (requires a low-latency model and IncludeNonFinal set to true):
recordTranscribe.setModel("en_v2_lowlatency");
recordTranscribe.setIncludeNonFinal(true);
recordTranscribe.setEnableStreamingSpeakerDiarization(true);
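Since each diarization mode requires a matching model and IncludeNonFinal setting, it can help to apply the three calls together so they cannot drift apart. A minimal sketch: configureDiarization and its "global"/"streaming" mode strings are hypothetical; the setter calls and model names are taken from the examples above.

```javascript
// Hypothetical helper: apply one of the two diarization configurations above,
// keeping the model and IncludeNonFinal settings consistent with each mode.
// The model names are the ones used in the examples above.
function configureDiarization(recordTranscribe, mode) {
  if (mode === "global") {
    // Global diarization: default model, IncludeNonFinal = false.
    recordTranscribe.setModel("en_v2");
    recordTranscribe.setIncludeNonFinal(false);
    recordTranscribe.setEnableGlobalSpeakerDiarization(true);
  } else if (mode === "streaming") {
    // Streaming diarization: low-latency model, IncludeNonFinal = true.
    recordTranscribe.setModel("en_v2_lowlatency");
    recordTranscribe.setIncludeNonFinal(true);
    recordTranscribe.setEnableStreamingSpeakerDiarization(true);
  } else {
    throw new Error("Unknown diarization mode: " + mode);
  }
}
```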
Speaker numbers are provided in word.speaker. For example, to print words with speaker numbers in onFinished:
function onFinished() {
console.log("Finished:");
let result = recordTranscribe.getResult();
for (let i = 0; i < result.words.length; ++i) {
let word = result.words[i];
console.log("[" + word.speaker + "] '" + word.text + "'");
}
}
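Printing one line per token is verbose; a transcript is often easier to read when consecutive words from the same speaker are merged into turns. A minimal sketch, assuming only word.text and word.speaker as above; groupBySpeaker is a hypothetical helper, not part of the library.

```javascript
// Hypothetical helper: merge consecutive words with the same word.speaker
// into speaker turns. Token texts are concatenated as-is (as in the
// onFinished examples above) and each turn is trimmed.
function groupBySpeaker(words) {
  const turns = [];
  for (const word of words) {
    const last = turns[turns.length - 1];
    if (last && last.speaker === word.speaker) {
      last.text += word.text;
    } else {
      turns.push({ speaker: word.speaker, text: word.text });
    }
  }
  return turns.map((turn) => ({ speaker: turn.speaker, text: turn.text.trim() }));
}

// Example: print one line per turn in onFinished.
// for (const turn of groupBySpeaker(recordTranscribe.getResult().words)) {
//   console.log("[" + turn.speaker + "] " + turn.text);
// }
```

The helper builds new objects and never modifies the words it reads, so it can be called on the object returned by getResult().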
To set the minimum and maximum number of speakers:
recordTranscribe.setMinNumSpeakers(1);
recordTranscribe.setMaxNumSpeakers(6);
Speaker Identification
Refer to How-to Guides / Identify Speakers.
First enable either form of speaker diarization (see above), then enable speaker identification and set candidate speakers:
recordTranscribe.setEnableSpeakerIdentification(true);
recordTranscribe.setCandSpeakerNames(["John", "Judy"]);
Information about identified speakers is provided in result.speakers. This is a Map that maps speaker numbers to { speaker, name }, where speaker is the speaker number and name is the speaker name.
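To make the shape of this Map concrete, the sketch below lists every identified speaker by iterating over a Map of that form. The Map contents are mock data for illustration only (real values come from result.speakers), and listIdentifiedSpeakers is a hypothetical helper.

```javascript
// Hypothetical helper: list identified speakers from a Map shaped like
// result.speakers (speaker number -> { speaker, name }).
function listIdentifiedSpeakers(speakers) {
  const lines = [];
  // Map iteration yields [key, value] pairs in insertion order.
  for (const [number, info] of speakers) {
    lines.push("Speaker " + number + ": " + info.name);
  }
  return lines;
}

// Mock data for illustration only; real values come from result.speakers.
const exampleSpeakers = new Map([
  [1, { speaker: 1, name: "John" }],
  [2, { speaker: 2, name: "Judy" }],
]);
const exampleLines = listIdentifiedSpeakers(exampleSpeakers);
// exampleLines is ["Speaker 1: John", "Speaker 2: Judy"]
```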
For example, to print tokens with speaker names in onFinished:
function onFinished() {
let result = recordTranscribe.getResult();
for (let i = 0; i < result.words.length; ++i) {
let word = result.words[i];
let speakerName = "unknown";
if (result.speakers.has(word.speaker)) {
speakerName = result.speakers.get(word.speaker).name;
}
console.log("[" + speakerName + "] '" + word.text + "'");
}
}