The Result message is returned when transcribing audio and contains the information about the recognized words and other data.

message Result {
  repeated Word words = 1;
  int32 final_proc_time_ms = 2;
  int32 total_proc_time_ms = 3;
  repeated ResultSpeaker speakers = 6;
  int32 channel = 7;
}

The words field contains a sequence of Word messages representing recognized words.

The final_proc_time_ms and total_proc_time_ms fields give the duration of processed audio, in milliseconds, that has resulted in final words and in all words (final and non-final), respectively. In a Transcribe request, both values are equal. In a TranscribeStream request, they behave as described in the Final vs Non-Final Words section.
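Since both values measure processed audio from the start, their difference is the amount of audio that has so far produced only non-final words. The sketch below computes it; the Result dataclass is a hypothetical stand-in for the generated protobuf class, and describe_progress is an illustrative helper, not part of the API.

```python
from dataclasses import dataclass


# Hypothetical stand-in for the Result message (only the fields used here).
@dataclass
class Result:
    final_proc_time_ms: int
    total_proc_time_ms: int


def describe_progress(result: Result) -> str:
    """Summarize how much processed audio is final vs. still non-final."""
    # Audio past final_proc_time_ms (up to total_proc_time_ms) has only non-final words.
    tentative = result.total_proc_time_ms - result.final_proc_time_ms
    return (f"final up to {result.final_proc_time_ms} ms, "
            f"{tentative} ms of audio with non-final words")


print(describe_progress(Result(final_proc_time_ms=4200, total_proc_time_ms=5100)))
# final up to 4200 ms, 900 ms of audio with non-final words
```

For a Transcribe request the difference is always 0, because both values are equal.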

For the speakers field, see the ResultSpeaker section below.

When separate recognition per channel is used, the channel field indicates the audio channel that the result is associated with (starting at 0).
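With per-channel recognition, results for different channels can be separated by their channel field. This is a minimal sketch assuming each Result contains only final words (for example, Results from a Transcribe request); the dataclasses are hypothetical stand-ins for the protobuf messages and words_by_channel is an illustrative helper.

```python
from collections import defaultdict
from dataclasses import dataclass, field


# Hypothetical stand-ins for the Word and Result messages (only fields used here).
@dataclass
class Word:
    text: str


@dataclass
class Result:
    words: list = field(default_factory=list)
    channel: int = 0  # zero-based audio channel


def words_by_channel(results):
    """Collect word texts per audio channel, keeping their arrival order."""
    per_channel = defaultdict(list)
    for result in results:
        per_channel[result.channel].extend(w.text for w in result.words)
    return dict(per_channel)


results = [
    Result(words=[Word("hello")], channel=0),
    Result(words=[Word("hi"), Word("there")], channel=1),
    Result(words=[Word("world")], channel=0),
]
print(words_by_channel(results))  # {0: ['hello', 'world'], 1: ['hi', 'there']}
```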


message Word {
  string text = 1;
  int32 start_ms = 2;
  int32 duration_ms = 3;
  bool is_final = 4;
  int32 speaker = 5;
  string orig_text = 8;
  double confidence = 9;
}

The Word message represents an individual recognized word, which is given in the text field.

The start_ms and duration_ms fields give the time interval of the word in the audio, in milliseconds. Interpreted as the half-open interval [start_ms, start_ms + duration_ms), the intervals of all words transcribed since the start of the audio are guaranteed not to overlap.
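Because the interval is half-open, a word ending at 300 ms and the next word starting at 300 ms do not overlap. The sketch below computes the exclusive end time and checks the non-overlap property; the Word dataclass is a hypothetical stand-in for the protobuf message and intervals_disjoint is an illustrative helper.

```python
from dataclasses import dataclass


# Hypothetical stand-in for the Word message (only the fields used here).
@dataclass
class Word:
    text: str
    start_ms: int
    duration_ms: int

    @property
    def end_ms(self) -> int:
        # Exclusive end of the half-open interval [start_ms, start_ms + duration_ms).
        return self.start_ms + self.duration_ms


def intervals_disjoint(words) -> bool:
    """Return True if no two words overlap once sorted by start time."""
    ordered = sorted(words, key=lambda w: w.start_ms)
    # Half-open intervals: touching (prev.end_ms == cur.start_ms) is not an overlap.
    return all(prev.end_ms <= cur.start_ms for prev, cur in zip(ordered, ordered[1:]))


words = [Word("good", 0, 300), Word("morning", 300, 450)]
print(words[1].end_ms)            # 750
print(intervals_disjoint(words))  # True
```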

The is_final field specifies whether the word is final. This distinction is relevant only when using TranscribeStream with include_nonfinal=true; in all other cases, is_final is always true. Refer to Final vs Non-Final Words.
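A common way to display a live transcript with include_nonfinal=true is to keep final words permanently and show only the most recent non-final words after them. This sketch assumes, per Final vs Non-Final Words, that each final word is delivered once and that the non-final words of a result supersede those of the previous result; TranscriptAssembler and the Word dataclass are hypothetical, not part of the API.

```python
from dataclasses import dataclass


# Hypothetical stand-in for the Word message (only the fields used here).
@dataclass
class Word:
    text: str
    is_final: bool


class TranscriptAssembler:
    """Builds a running transcript from a sequence of streaming results.

    Assumption: final words arrive once and never change, while the
    non-final words of each result replace those of the previous result.
    """

    def __init__(self):
        self.final_words = []     # accumulated across results, never revised
        self.nonfinal_words = []  # tentative words from the latest result only

    def add_result(self, words):
        self.final_words.extend(w.text for w in words if w.is_final)
        self.nonfinal_words = [w.text for w in words if not w.is_final]

    def text(self) -> str:
        return " ".join(self.final_words + self.nonfinal_words)


asm = TranscriptAssembler()
asm.add_result([Word("the", True), Word("cad", False)])
asm.add_result([Word("cat", True), Word("sat", False)])
print(asm.text())  # the cat sat
```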

The speaker field indicates the speaker number; valid speaker numbers are greater than 0. This field is only available when using Speaker Diarization.
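With Speaker Diarization, consecutive words that share a speaker number can be merged into speaker turns. In this sketch, words whose speaker is not a valid number (not greater than 0) are skipped; the Word dataclass is a hypothetical stand-in for the protobuf message and speaker_turns is an illustrative helper.

```python
from dataclasses import dataclass
from itertools import groupby


# Hypothetical stand-in for the Word message (only the fields used here).
@dataclass
class Word:
    text: str
    speaker: int  # valid speaker numbers are > 0


def speaker_turns(words):
    """Merge consecutive words with the same speaker number into (speaker, text) turns."""
    valid = [w for w in words if w.speaker > 0]
    return [
        (spk, " ".join(w.text for w in group))
        for spk, group in groupby(valid, key=lambda w: w.speaker)
    ]


words = [Word("hi", 1), Word("there", 1), Word("hello", 2), Word("ok", 1)]
print(speaker_turns(words))  # [(1, 'hi there'), (2, 'hello'), (1, 'ok')]
```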

The orig_text field contains the original word if the word in text was masked by the profanity filter; otherwise, it is empty. Refer to Profanity Filter.
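Since orig_text is empty unless a word was masked, the unmasked transcript can be rebuilt by preferring orig_text when it is set. The masked form "****" below is purely illustrative (the actual masking format is described in Profanity Filter), and the Word dataclass and unmasked helper are hypothetical.

```python
from dataclasses import dataclass


# Hypothetical stand-in for the Word message (only the fields used here).
@dataclass
class Word:
    text: str
    orig_text: str = ""  # non-empty only if the profanity filter masked `text`


def unmasked(word: Word) -> str:
    """Return the original word if it was masked, otherwise the word as returned."""
    return word.orig_text or word.text


words = [Word("what"), Word("the"), Word("****", orig_text="heck")]
print(" ".join(w.text for w in words))       # what the ****
print(" ".join(unmasked(w) for w in words))  # what the heck
```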

The confidence field is a value between 0 and 1 indicating the system's confidence that the word was recognized correctly.
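One way to use confidence is to flag words for review when it falls below a threshold. The threshold of 0.6 is a hypothetical value chosen for illustration, not a recommended setting; the Word dataclass and mark_uncertain helper are likewise hypothetical.

```python
from dataclasses import dataclass


# Hypothetical stand-in for the Word message (only the fields used here).
@dataclass
class Word:
    text: str
    confidence: float  # between 0 and 1


LOW_CONFIDENCE = 0.6  # hypothetical threshold; tune for your application


def mark_uncertain(words, threshold: float = LOW_CONFIDENCE) -> str:
    """Render words, wrapping those below the threshold in [brackets]."""
    return " ".join(
        w.text if w.confidence >= threshold else f"[{w.text}]" for w in words
    )


words = [Word("send", 0.97), Word("the", 0.95), Word("invoice", 0.42)]
print(mark_uncertain(words))  # send the [invoice]
```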


message ResultSpeaker {
  int32 speaker = 1;
  string name = 2;
}

When using Speaker Identification, the Result.speakers field contains associations between speaker numbers and the candidate speaker names you specified. These associations (speaker number -> speaker name) give you everything needed to attribute recognized words to the registered speakers, by matching each word's speaker field against ResultSpeaker.speaker.

The Result.speakers field contains the latest or best associations so far. These associations hold for all words from the start of the audio, not just for words in the latest result.
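Because the latest associations apply to every word since the start of the audio, a client can re-apply the most recent Result.speakers to its full list of words rather than only to the newest result. In this sketch, the dataclasses are hypothetical stand-ins for the protobuf messages, label_words is an illustrative helper, and falling back to "Speaker N" for a speaker number without an association is an assumed presentation choice.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the ResultSpeaker and Word messages.
@dataclass
class ResultSpeaker:
    speaker: int
    name: str


@dataclass
class Word:
    text: str
    speaker: int


def label_words(all_words, latest_speakers):
    """Attach names to all words using the most recent Result.speakers list."""
    names = {s.speaker: s.name for s in latest_speakers}
    # Assumed fallback for speaker numbers with no association yet.
    return [(names.get(w.speaker, f"Speaker {w.speaker}"), w.text) for w in all_words]


words = [Word("hello", 1), Word("hi", 2), Word("bye", 1)]
speakers = [ResultSpeaker(1, "Alice")]
print(label_words(words, speakers))
# [('Alice', 'hello'), ('Speaker 2', 'hi'), ('Alice', 'bye')]
```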
