Transcription

Transcribe audio files into text.

Speaker labels

Transcription with speaker labels is the default. Omnio infers the speaker labels from the audio.

Transcription output can be Markdown or JSON. Markdown is the default.

Prompt

Transcribe the audio.
Transcribe the audio with speaker labels.

Output

Alice: Hello.

Bob: Hi, is this Alice?

Alice: It is, yeah.

Bob: Alice, how is it going? My name is Bob.


Prompt

Transcribe the audio as JSON.
Transcribe the audio with speaker labels as JSON.

Output

[
  {
    "speaker": "Alice",
    "text": "Hello."
  },
  {
    "speaker": "Bob",
    "text": "Hi, is this Alice?"
  },
  {
    "speaker": "Alice",
    "text": "It is, yeah."
  },
  {
    "speaker": "Bob",
    "text": "Alice, how is it going? My name is Bob."
  }
]
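The JSON output above is a list of objects, each with a "speaker" and a "text" field, so it can be read with any standard JSON parser. The following is a minimal sketch in Python, assuming the response text has already been obtained as a string. The sample string is copied from the output above, and the function name `to_labeled_lines` is a hypothetical helper, not part of any Soniox API.

```python
import json

# Sample response text, copied from the JSON output shown above.
# In practice this string would come from the model's response.
raw = """
[
  {"speaker": "Alice", "text": "Hello."},
  {"speaker": "Bob", "text": "Hi, is this Alice?"},
  {"speaker": "Alice", "text": "It is, yeah."},
  {"speaker": "Bob", "text": "Alice, how is it going? My name is Bob."}
]
"""


def to_labeled_lines(raw_json: str) -> list[str]:
    """Parse a JSON transcript and return one 'Speaker: text' line per segment."""
    segments = json.loads(raw_json)
    return [f'{seg["speaker"]}: {seg["text"]}' for seg in segments]


lines = to_labeled_lines(raw)

# Separate the lines with blank lines, as in the default output above.
print("\n\n".join(lines))
```

Joining the lines with blank lines yields text in the same shape as the default speaker-labeled output shown earlier.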

Speaker numbers

If you don't want Omnio to infer speaker labels but still want to distinguish speakers, you can have the speakers labeled numerically.

Prompt

Transcribe the audio with speaker numbers.

Output

Speaker 1: Hello.

Speaker 2: Hi, is this Alice?

Speaker 1: It is, yeah.

Speaker 2: Alice, how is it going? My name is Bob.


Prompt

Transcribe the audio with speaker numbers as JSON.

Output

[
  {
    "speaker": "Speaker 1",
    "text": "Hello."
  },
  {
    "speaker": "Speaker 2",
    "text": "Hi, is this Alice?"
  },
  {
    "speaker": "Speaker 1",
    "text": "It is, yeah."
  },
  {
    "speaker": "Speaker 2",
    "text": "Alice, how is it going? My name is Bob."
  }
]
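Because each segment carries its speaker number, the JSON output is easy to regroup per speaker. The sketch below assumes the response text is available as a string. The sample is copied from the output above, and `group_by_speaker` is a hypothetical helper name chosen for illustration.

```python
import json
from collections import defaultdict

# Sample response text, copied from the JSON output shown above.
raw = """
[
  {"speaker": "Speaker 1", "text": "Hello."},
  {"speaker": "Speaker 2", "text": "Hi, is this Alice?"},
  {"speaker": "Speaker 1", "text": "It is, yeah."},
  {"speaker": "Speaker 2", "text": "Alice, how is it going? My name is Bob."}
]
"""


def group_by_speaker(raw_json: str) -> dict[str, list[str]]:
    """Collect each speaker's utterances, preserving their original order."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for seg in json.loads(raw_json):
        grouped[seg["speaker"]].append(seg["text"])
    return dict(grouped)


grouped = group_by_speaker(raw)

for speaker, utterances in grouped.items():
    print(f"{speaker}: {len(utterances)} turn(s)")
```

The same grouping works unchanged on the speaker-label JSON, since both formats use the same "speaker" and "text" fields.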

Without speaker separation

Transcribing without speaker separation merges the speech from all speakers into a single block of text.

Prompt

Transcribe the audio without speaker separation.

Output

Hello. Hi, is this Alice? It is, yeah. Alice, how is it going? My name is Bob.
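For this sample conversation, the merged output equals the "text" fields of the JSON segments joined with single spaces. The sketch below illustrates that relationship on the sample data from this page. It is an illustration only, not a description of how Omnio produces the merged transcript, and `merge_segments` is a hypothetical helper name.

```python
# Segments copied from the JSON output shown earlier on this page.
segments = [
    {"speaker": "Alice", "text": "Hello."},
    {"speaker": "Bob", "text": "Hi, is this Alice?"},
    {"speaker": "Alice", "text": "It is, yeah."},
    {"speaker": "Bob", "text": "Alice, how is it going? My name is Bob."},
]


def merge_segments(segments: list[dict[str, str]]) -> str:
    """Drop the speaker field and join the segment texts with single spaces."""
    return " ".join(seg["text"] for seg in segments)


merged = merge_segments(segments)
print(merged)
```

If a speaker-separated JSON transcript is already available, a helper like this avoids a second transcription request when only the plain text is needed.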
