Web library API reference
The Soniox Web Voice library is based around the RecordTranscribe class, which represents a session of integrated audio recording and transcription.
Basic usage
To start recording and transcribing, create an instance of RecordTranscribe, set your API key, configuration and event handlers, and then call start(). All supported event handlers are optional: any of them can be left unset if it is not needed.
IncludeNonFinal specifies whether low-latency transcription results are desired. To receive recognized tokens as they are spoken, set this to true; if only the complete transcript after the end of the recording is of interest, set this to false.
When start() is called, the library first requests access to the user's microphone, and the user may be prompted to allow access and/or select the microphone. Then the library establishes a connection to the Soniox service and starts a transcription session. At this point, audio recording starts and the onStarted event handler is called if provided.
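The setup and start sequence described above can be sketched as follows. Only RecordTranscribe, start() and setOnPartialResult are named in this reference; the other setter names used here (setApiKey, setIncludeNonFinal, setOnStarted, setOnFinished, setOnError) and the helper startSession are assumptions made by analogy and should be checked against the library itself.

```javascript
// Hypothetical helper: configures a RecordTranscribe-like object and starts it.
// ASSUMPTION: setter names other than setOnPartialResult are not confirmed.
function startSession(recordTranscribe, options) {
  const {
    apiKey,
    includeNonFinal = true, // true: tokens as they are spoken; false: final transcript only
    onStarted,
    onPartialResult,
    onFinished,
    onError,
  } = options;

  recordTranscribe.setApiKey(apiKey);
  recordTranscribe.setIncludeNonFinal(includeNonFinal);

  // Every handler is optional, so register only the ones that were supplied.
  if (onStarted) recordTranscribe.setOnStarted(onStarted);
  if (onPartialResult) recordTranscribe.setOnPartialResult(onPartialResult);
  if (onFinished) recordTranscribe.setOnFinished(onFinished);
  if (onError) recordTranscribe.setOnError(onError);

  // Configuration is complete; start() triggers the microphone request.
  recordTranscribe.start();
  return recordTranscribe;
}
```

Passing the session object in, rather than constructing it inside the helper, keeps the sketch independent of how the library is loaded in the page.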
After transcription/recording starts, partial transcription results will be provided to the onPartialResult handler (set via setOnPartialResult) if one is set. Note that if IncludeNonFinal is false, handling onPartialResult is likely not needed because the complete transcript will be available at the end of the transcription as described below.
Tokens are returned as either final or non-final. Refer to How-to guides / Final vs non-final tokens for an explanation. Non-final tokens are returned only if IncludeNonFinal is true.
Partial results provided to onPartialResult are frozen objects which will not change and can safely be stored for later use.
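A handler for partial results might separate the text that is already final from the text that may still change. The sketch below assumes that a result exposes a words array whose entries carry text and is_final fields; these field names and the helper splitTranscript are assumptions and are not confirmed by this reference.

```javascript
// Hypothetical helper: splits a result into final and non-final text.
// ASSUMPTION: result.words is an array of { text, is_final } entries.
function splitTranscript(result) {
  let finalText = "";
  let nonFinalText = "";
  for (const word of result.words) {
    if (word.is_final) {
      finalText += word.text; // will not change any more
    } else {
      nonFinalText += word.text; // may be revised by later results
    }
  }
  return { finalText, nonFinalText };
}
```

The function only reads from the result, so it works with the frozen objects passed to the handler.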
To stop recording and finish the transcription, call stop(). Recording will stop immediately, but it will take a short time to finish the transcription; during this time onPartialResult may still be called if provided. When the transcription is finished, the onFinished handler will be called if provided. After that point no more handlers will be called.
The complete result of the transcription can be retrieved using getResult(). If called after the transcription is finished, all tokens in the result will be final.
If getResult() is called before the transcription is finished, the result will only contain tokens recognized so far. In that case, if IncludeNonFinal is true, the result may contain non-final tokens at the end.
getResult() always returns the same internal object, which is automatically updated based on new partial transcription results; it is not permitted to modify this object. If you need to store the result before the transcription is finished, use getResultCopy(), which returns a frozen copy of the current result.
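The difference between a live object and a frozen copy can be illustrated in plain JavaScript. The class LiveResult below is a hypothetical stand-in written for this illustration, not the library's implementation, and the words field is likewise an assumption.

```javascript
// Recursively freezes plain objects and arrays.
function deepFreeze(value) {
  if (value !== null && typeof value === "object" && !Object.isFrozen(value)) {
    for (const key of Object.keys(value)) deepFreeze(value[key]);
    Object.freeze(value);
  }
  return value;
}

// Hypothetical stand-in that mimics the getResult()/getResultCopy() contract.
class LiveResult {
  constructor() {
    this.result = { words: [] };
  }
  // Simulates newly recognized words arriving.
  append(word) {
    this.result.words.push(word);
  }
  // Always the same object; its contents keep changing.
  getResult() {
    return this.result;
  }
  // Independent snapshot (JSON round trip is enough for plain data), then frozen.
  getResultCopy() {
    return deepFreeze(JSON.parse(JSON.stringify(this.result)));
  }
}
```

A reference obtained from getResult() sees later updates, whereas a snapshot from getResultCopy() keeps the contents it had when it was taken.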
Errors, canceling and states
An error can occur at any point after start() is called and until the transcription is finished as indicated by onFinished. When an error occurs, recording and transcription are immediately stopped and the onError handler is called if it is set. The onError handler is provided a status string and an error message.
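An onError handler receives the two arguments described above. The sketch below builds such a handler around a reporting callback; createErrorHandler and the message format are illustrative choices, and the possible status values are not listed in this reference.

```javascript
// Hypothetical factory: returns a handler with the (status, message) signature.
function createErrorHandler(report) {
  return function onError(status, message) {
    // Recording and transcription have already stopped when this runs.
    report(`Transcription error (${status}): ${message}`);
  };
}
```

For example, createErrorHandler(console.error) yields a handler that writes errors to the browser console.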
It is possible to cancel all operations at any time using cancel(). If this is called before the transcription is finished, recording and transcription are immediately stopped; any audio already recorded but not yet transcribed will not be transcribed.
If audio recording stops due to an external reason, such as the user revoking access to the microphone, the library behaves as if stop() was called, rather than treating that as an error.
After an error or cancel(), no more handlers will be called.
For convenience, a getState() function is provided to determine the current state of the RecordTranscribe object. getState() returns one of the following states as a string:
Init: start() has not been called yet.
Starting: start() has been called, not yet recording or transcribing.
Running: Recording and transcribing.
Finishing: stop() has been called, finishing the transcription.
Finished: The transcription is finished.
Error: An error has occurred.
Cancel: cancel() has been called in a state other than Init, Finished or Error.
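The state strings can be used to drive a user interface. The sketch below maps a state to a button label and checks whether a session is still active; the function names and labels are illustrative only, and the state value would come from getState().

```javascript
// Maps a state string (as returned by getState()) to arbitrary UI text.
function buttonLabelFor(state) {
  switch (state) {
    case "Init":
      return "Start";
    case "Starting":
      return "Starting...";
    case "Running":
      return "Stop";
    case "Finishing":
      return "Finishing...";
    default:
      // Finished, Error or canceled: the object cannot be reused.
      return "New transcription";
  }
}

// True while the session is between start() and the end of the transcription.
function isActive(state) {
  return state === "Starting" || state === "Running" || state === "Finishing";
}
```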
Multiple transcriptions
A RecordTranscribe object can be used only for a single transcription. To perform another transcription, use a new RecordTranscribe object.
It is not permitted to perform multiple transcriptions at the same time, that is, to have more than one RecordTranscribe object in the Starting, Running or Finishing state. If you attempt to start() a transcription in violation of this, an exception will be thrown.
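One way to respect both rules is to route every start through a small manager that creates a fresh object per transcription and refuses to start while another one is active. This is a sketch under stated assumptions: createSessionManager and startNew are hypothetical names, and the factory argument is expected to return a configured RecordTranscribe-like object exposing getState() and start().

```javascript
// Hypothetical manager that allows at most one active session at a time.
function createSessionManager(createRecordTranscribe) {
  const busyStates = new Set(["Starting", "Running", "Finishing"]);
  let current = null;

  return {
    // Returns the new session, or null if another one is still active.
    startNew() {
      if (current !== null && busyStates.has(current.getState())) {
        return null; // refuse here instead of letting start() throw
      }
      current = createRecordTranscribe(); // one object per transcription
      current.start();
      return current;
    },
  };
}
```

Returning null instead of throwing lets the caller decide how to react, for example by disabling a start button.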
Transcription features
The following subsections show how to enable various features of Soniox speech recognition with Web Voice. Note that the respective options need to be set before calling recordTranscribe.start().
Specify model
Refer to How-to guides / Models and languages for available models.
Customization
Refer to How-to Guides / Customization.
To use a stored speech context:
Speaker diarization
Refer to How-to Guides / Separate speakers.
Global speaker diarization (requires a default model and includeNonFinal equal to false):
Streaming speaker diarization (requires a low-latency model and includeNonFinal equal to true):
Speaker numbers will be provided in word.speaker.
For example, to print words with speaker numbers in onFinished:
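The following is a minimal sketch of such formatting. It relies on word.speaker as described above, and additionally assumes that the result exposes a words array and that each word has a text field; those two names and the helper linesBySpeaker are assumptions.

```javascript
// Hypothetical helper: groups consecutive words by speaker number.
// ASSUMPTION: result.words is an array of { text, speaker } entries.
function linesBySpeaker(result) {
  const lines = [];
  let lastSpeaker = null;
  for (const word of result.words) {
    // Start a new line whenever the speaker changes.
    if (lines.length === 0 || word.speaker !== lastSpeaker) {
      lines.push(`Speaker ${word.speaker}: `);
      lastSpeaker = word.speaker;
    }
    lines[lines.length - 1] += word.text;
  }
  return lines;
}
```

Inside an onFinished handler, the lines could be printed with linesBySpeaker(recordTranscribe.getResult()).forEach((line) => console.log(line)).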
If you want to set the minimum/maximum number of speakers:
Speaker identification
Refer to How-to Guides / Identify speakers.
First enable either form of speaker diarization (see above), then enable speaker identification and set candidate speakers:
Information about identified speakers is provided in result.speakers. This is a Map that maps speaker numbers to { speaker, name }, where speaker is the speaker number and name is the speaker name.
For example, to print tokens with speaker names in onFinished:
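A minimal sketch of this lookup is shown below. It uses result.speakers and word.speaker as described above, and assumes that the result exposes a words array whose entries have a text field; those names, the helper labelWords, and the fallback label for speakers missing from the Map are assumptions.

```javascript
// Hypothetical helper: prefixes each word with the identified speaker name.
// ASSUMPTION: result.words is an array of { text, speaker } entries.
function labelWords(result) {
  return result.words.map((word) => {
    const entry = result.speakers.get(word.speaker);
    // Defensive fallback in case a speaker number has no entry in the Map.
    const label = entry ? entry.name : `Speaker ${word.speaker}`;
    return `${label}: ${word.text}`;
  });
}
```

Inside an onFinished handler, the output could be printed with labelWords(recordTranscribe.getResult()).forEach((line) => console.log(line)).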