Response
Result
The Result message is returned when transcribing audio and contains information about the recognized words, along with other data.
message Result {
  repeated Word words = 1;
  int32 final_proc_time_ms = 2;
  int32 total_proc_time_ms = 3;
  repeated ResultSpeaker speakers = 6;
  int32 channel = 7;
}
The words field contains a sequence of Word messages representing recognized words.
The final_proc_time_ms and total_proc_time_ms fields give the duration of audio, in milliseconds, that has been processed to produce the final words and all words (final and non-final), respectively. In a Transcribe request, both values are equal. In a TranscribeStream request, they behave as described in the Final vs Non-Final Words section.
For the speakers field, see the ResultSpeaker section below.
When separate recognition per channel is used, the channel field indicates the audio channel that the result is associated with (starting at 0).
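As a rough illustration of how the channel field might be used, the Python sketch below groups the recognized text of several results by channel. The Word and Result dataclasses are hypothetical stand-ins for the generated protobuf classes and carry only the fields needed here; text_by_channel is an illustrative helper, not part of the API, and joining words with single spaces is a simplifying assumption.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Word:
    # Hypothetical stand-in for the generated protobuf Word class.
    text: str


@dataclass
class Result:
    # Hypothetical stand-in for the generated protobuf Result class.
    words: List[Word] = field(default_factory=list)
    channel: int = 0


def text_by_channel(results: List[Result]) -> Dict[int, str]:
    # Collect word texts per channel, preserving the order of the results.
    per_channel: Dict[int, List[str]] = defaultdict(list)
    for result in results:
        per_channel[result.channel].extend(w.text for w in result.words)
    # Assumption: a single space is an acceptable separator between words.
    return {ch: " ".join(texts) for ch, texts in per_channel.items()}


results = [
    Result(words=[Word("good"), Word("morning")], channel=0),
    Result(words=[Word("hi")], channel=1),
    Result(words=[Word("everyone")], channel=0),
]
print(text_by_channel(results))
```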
Word
message Word {
  string text = 1;
  int32 start_ms = 2;
  int32 duration_ms = 3;
  bool is_final = 4;
  int32 speaker = 5;
  string orig_text = 8;
  double confidence = 9;
}
The Word message represents an individual recognized word, which is given in the text field.
The start_ms and duration_ms fields give the time interval of the word in the audio. Interpreted as the half-open interval [start_ms, start_ms + duration_ms), these intervals are guaranteed not to overlap across all words transcribed from the start of the audio.
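To make the half-open interval convention concrete, the following Python sketch computes the exclusive end time of a word and checks a list of words for overlap. The Word dataclass is a hypothetical stand-in for the generated protobuf class, and end_ms and has_overlap are illustrative helper names that are not part of the API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    # Hypothetical stand-in for the generated protobuf Word class.
    text: str
    start_ms: int
    duration_ms: int


def end_ms(word: Word) -> int:
    # Exclusive end of the half-open interval [start_ms, start_ms + duration_ms).
    return word.start_ms + word.duration_ms


def has_overlap(words: List[Word]) -> bool:
    # Assumes the words are ordered by start time.
    # Two consecutive words overlap if the later one starts before the earlier one ends.
    return any(nxt.start_ms < end_ms(prev) for prev, nxt in zip(words, words[1:]))


words = [Word("hello", 0, 400), Word("world", 400, 350)]
print(has_overlap(words))  # False: "world" starts exactly where "hello" ends
```

Because the interval is half-open, a word that starts at the exact millisecond where the previous one ends does not count as overlapping.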
The is_final field specifies whether the word is final. This distinction is relevant only when using TranscribeStream with include_nonfinal=true; in all other cases, is_final is always true. Refer to Final vs Non-Final Words.
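A minimal sketch of separating final from non-final words using is_final is shown below. The Word dataclass is a hypothetical stand-in for the generated protobuf class, and split_by_finality is an illustrative helper; how the two groups should be combined across successive streaming results is covered by the Final vs Non-Final Words section and is not assumed here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    # Hypothetical stand-in for the generated protobuf Word class.
    text: str
    is_final: bool = True


def split_by_finality(words: List[Word]) -> Tuple[List[Word], List[Word]]:
    # Partition the words of one result into (final, non-final), keeping order.
    final = [w for w in words if w.is_final]
    nonfinal = [w for w in words if not w.is_final]
    return final, nonfinal


words = [Word("hello", is_final=True), Word("wor", is_final=False)]
final, nonfinal = split_by_finality(words)
print([w.text for w in final], [w.text for w in nonfinal])
```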
The speaker field indicates the speaker number. Valid speaker numbers are greater than 0. This field is only available when using Speaker Diarization.
The orig_text field contains the original word when the word in text was masked by the profanity filter; otherwise, it is empty. Refer to Profanity Filter.
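The relationship between text and orig_text can be sketched in Python as follows. The Word dataclass is a hypothetical stand-in for the generated protobuf class, unmasked_text is an illustrative helper name, and the masked form "****" is an invented example value rather than the filter's actual output format.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical stand-in for the generated protobuf Word class.
    text: str
    orig_text: str = ""  # empty unless the profanity filter masked the word


def unmasked_text(word: Word) -> str:
    # Prefer the original word when the filter replaced it; otherwise use text.
    return word.orig_text if word.orig_text else word.text


masked = Word(text="****", orig_text="darn")  # example values only
plain = Word(text="hello")
print(unmasked_text(masked), unmasked_text(plain))
```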
The confidence field holds a value between 0 and 1 indicating the system's confidence that the word was recognized correctly.
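One possible use of the confidence field is flagging words for manual review. The Python sketch below does this under stated assumptions: the Word dataclass is a hypothetical stand-in for the generated protobuf class, low_confidence is an illustrative helper, and the 0.8 threshold is an arbitrary example value, not a recommendation from the API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    # Hypothetical stand-in for the generated protobuf Word class.
    text: str
    confidence: float


def low_confidence(words: List[Word], threshold: float = 0.8) -> List[str]:
    # Return the text of every word whose confidence falls below the threshold.
    return [w.text for w in words if w.confidence < threshold]


words = [Word("the", 0.98), Word("quick", 0.45), Word("fox", 0.91)]
print(low_confidence(words))
```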
ResultSpeaker
message ResultSpeaker {
  int32 speaker = 1;
  string name = 2;
}
When using Speaker Identification, the Result.speakers field contains associations between speaker numbers and the specified candidate speaker names. These associations (speaker number -> speaker name) provide all the information needed to assign the registered speakers to the recognized words.
The Result.speakers field contains the latest or best associations so far. These associations hold for all words from the start of the audio, not just for words in the latest result.
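The speaker-number-to-name lookup described above could be applied along the following lines. This is a sketch only: the ResultSpeaker and Word dataclasses are hypothetical stand-ins for the generated protobuf classes, label_words is an illustrative helper, and the "speaker-N" fallback for numbers without an association is an assumption made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ResultSpeaker:
    # Hypothetical stand-in for the generated protobuf ResultSpeaker class.
    speaker: int
    name: str


@dataclass
class Word:
    # Hypothetical stand-in for the generated protobuf Word class.
    text: str
    speaker: int = 0


def label_words(words: List[Word], speakers: List[ResultSpeaker]) -> List[Tuple[str, str]]:
    # Build the speaker-number -> speaker-name lookup from the associations.
    names = {s.speaker: s.name for s in speakers}
    # Assumption: unidentified speakers fall back to a generic "speaker-N" label.
    return [(names.get(w.speaker, f"speaker-{w.speaker}"), w.text) for w in words]


all_words = [Word("hi", speaker=1), Word("hello", speaker=2), Word("bye", speaker=1)]
latest_speakers = [ResultSpeaker(speaker=1, name="Alice")]
print(label_words(all_words, latest_speakers))
```

Since the associations apply to all words from the start of the audio, a streaming client would typically re-label every word accumulated so far with the associations from the most recent result, rather than only the newly received words.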