Transcription advanced

Transcribe audio files and generate text from them with advanced features.

Context

Adding context can improve transcription accuracy. It is most useful for uncommon spoken words that are otherwise hard to transcribe correctly, such as:

  • entity names (people, company or product names, etc.)
  • technical jargon (in medicine, engineering, science, etc.)

Context can be any text (e.g. a summary or a related document) or simply a list of relevant words. It may also contain words or phrases that do not occur in the audio; Omnio uses context only when necessary.

Prompt

Transcribe the audio. Here's the relevant context:
XiangXYZ Electronics
TitanHex Technologies
ExoBook ZX5
Diagon TekGear Z7
OmegaDrive T5
QuantumGear E9

Output

Alice: Hey Bob, did you hear about the new laptop from XiangXYZ Electronics?

Bob: Yeah, the ExoBook ZX5, right? It's supposed to be a contender against the Diagon TekGear Z7.
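When the context is a list of terms kept in code (for example, a product catalogue), the prompt above can be assembled programmatically. The following is a minimal sketch: the prompt wording is taken from the example above, while the helper name `build_context_prompt` is hypothetical and not part of any Soniox SDK. How the resulting prompt is sent to Omnio is not shown here.

```python
def build_context_prompt(terms):
    """Return the context prompt followed by one context term per line.

    Hypothetical helper for illustration; blank and duplicate terms are dropped.
    """
    cleaned = []
    for term in terms:
        term = term.strip()
        if term and term not in cleaned:
            cleaned.append(term)
    header = "Transcribe the audio. Here's the relevant context:"
    return "\n".join([header, *cleaned])


prompt = build_context_prompt(
    ["XiangXYZ Electronics", "ExoBook ZX5", "ExoBook ZX5", " "]
)
print(prompt)
```

Dropping duplicates and blank entries keeps the context short, which matters when the term list is generated from a larger data source.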

Acoustic tags

Acoustic tags provide additional information about unspoken acoustic elements in the audio, such as background sounds, music, speech tone (screaming, whispering), etc.

Prompt

Transcribe the audio with acoustic tag information.

Output

Show host: Welcome back to the show! [applause] Let's start the next round. [buzzer]
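Downstream code often needs the tags separately from the spoken text. The sketch below assumes that acoustic tags always appear as bracketed text, as in the output above; the function names are hypothetical, and the pattern would also match any other bracketed annotation in the transcript.

```python
import re

# Assumption: every acoustic tag is a bracketed span such as [applause].
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def extract_acoustic_tags(transcript):
    """Return the tag names in the order they appear."""
    return TAG_PATTERN.findall(transcript)


def strip_acoustic_tags(transcript):
    """Return the transcript with tags removed and extra spaces collapsed."""
    without_tags = TAG_PATTERN.sub("", transcript)
    return re.sub(r"\s{2,}", " ", without_tags).strip()


line = "Show host: Welcome back to the show! [applause] Let's start the next round. [buzzer]"
tags = extract_acoustic_tags(line)
spoken = strip_acoustic_tags(line)
print(tags)
print(spoken)
```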

Timestamps

Transcription with timestamps marks the start and end of each segment with a timestamp in [MM:SS] format (minutes and seconds).

Prompt

Transcribe the audio with timestamps.

Output

Alice:

[00:00] Hello. [00:00]

Bob:

[00:01] Hi, is this Alice? [00:02]

Alice:

[00:03] It is, yeah. [00:05]

Bob:

[00:05] Alice, how is it going? [00:08]

[00:09] My name is Bob. [00:11]

You can also specify how frequently timestamps are inserted.

Prompt

Transcribe the audio with timestamps.
Insert timestamps every 50 characters on average.

Output

Speaker:

[01:25] No, it's very hard on your horse, and they [01:27]

[01:27] recommend starting with duration first. So this [01:30]

[01:30] long, slow distance is you build up over the 3 [01:33]

[01:33] to 12 months to 45 to 60 minutes depending on [01:36]

[01:36] your discipline of walk, trot, canter. Your horse [01:40]

[01:40] should be able to do that and not be winded. [01:42]
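Timestamped output like the examples above can be turned into numeric offsets for subtitles or search. The following sketch assumes each segment sits on its own line with a start and an end timestamp in [MM:SS] form, as shown; `parse_segment` is a hypothetical helper, and lines that do not match (such as speaker labels) yield `None`.

```python
import re

# Assumption: a segment line looks like "[MM:SS] text [MM:SS]".
SEGMENT = re.compile(r"^\[(\d+):(\d{2})\]\s*(.*?)\s*\[(\d+):(\d{2})\]$")


def parse_segment(line):
    """Return (start_seconds, end_seconds, text), or None if the line is not a segment."""
    match = SEGMENT.match(line.strip())
    if match is None:
        return None
    start_min, start_sec, text, end_min, end_sec = match.groups()
    start = int(start_min) * 60 + int(start_sec)
    end = int(end_min) * 60 + int(end_sec)
    return start, end, text


segment = parse_segment("[01:25] No, it's very hard on your horse, and they [01:27]")
label = parse_segment("Speaker:")
print(segment)
print(label)
```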

Verbatim

Verbatim transcription is a word-for-word transcription of spoken language. This means that every single word, including fillers, pauses and false starts, is transcribed exactly as it was heard.

Prompt

Create a verbatim transcript of the audio.

Output

Alice: He-Hello.

Bob: Hi, um, is this Alice?

Alice: It is, yeah.

Bob: Alice, how-how is it going? My name, um, is Bob.

Clean verbatim

Clean verbatim transcription removes filler words, stammers and interjections from other speakers (e.g. "mm-hmm", "um").

Alice: Hello.

Bob: Hi, is this Alice?

Alice: It is, yeah.

Bob: Alice, how is it going? My name is Bob.

Profanity

Omnio can mask, remove or tag profanity.

Prompt

Transcribe the audio with profanity masked.

Output

Person: I'm done with this s***, you f****** a******.


Prompt

Transcribe the audio with profanity removed.

Output

Person: I'm done with this [profanity removed], you [profanity removed] [profanity removed].


Prompt

Transcribe the audio with profanity tagged.

Output

Person: I'm done with this [profanity:shit], you [profanity:fucking] [profanity:asshole].
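Because the tagged form keeps the original word, it can be converted to a masked form after transcription, for example when the same transcript is shown to different audiences. This is a sketch under the assumption that tags always follow the `[profanity:word]` shape shown above; the masking rule (first letter kept, remaining letters replaced by asterisks) is inferred from the masked example and is not a documented guarantee.

```python
import re

# Assumption: tagged profanity always has the form [profanity:word].
PROFANITY_TAG = re.compile(r"\[profanity:([^\]]+)\]")


def mask_word(word):
    """Keep the first character and replace the rest with asterisks."""
    return word[0] + "*" * (len(word) - 1)


def mask_tagged_profanity(transcript):
    """Replace each [profanity:word] tag with the masked word."""
    return PROFANITY_TAG.sub(lambda m: mask_word(m.group(1)), transcript)


tagged = "Person: I'm done with this [profanity:shit], you [profanity:fucking] [profanity:asshole]."
masked = mask_tagged_profanity(tagged)
print(masked)
```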

Personally identifiable information

Omnio can remove or tag personally identifiable information (PII), such as names, addresses, dates of birth, and phone numbers.

Prompt

Transcribe the audio with personal information removed.

Output

Interviewer: Please state your full name and address.

Interviewee: My name is [name and surname removed]. I currently live on [address removed].


Prompt

Transcribe the audio with personal information tagged.

Output

Interviewer: Please state your full name and address.

Interviewee: My name is [pii:Jane Doe]. I currently live on [pii:North 15th Street].
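Tagged output makes it possible to collect the PII for review and redact it later. The sketch below assumes tags always have the `[pii:value]` shape shown above. The function names and the generic `[pii removed]` placeholder are hypothetical; the placeholder differs from the more specific labels Omnio produces in its own removed output.

```python
import re

# Assumption: tagged PII always has the form [pii:value].
PII_TAG = re.compile(r"\[pii:([^\]]+)\]")


def extract_pii(transcript):
    """Return the tagged PII values in the order they appear."""
    return PII_TAG.findall(transcript)


def redact_pii(transcript, placeholder="[pii removed]"):
    """Replace each [pii:value] tag with a generic placeholder."""
    return PII_TAG.sub(placeholder, transcript)


tagged = "Interviewee: My name is [pii:Jane Doe]. I currently live on [pii:North 15th Street]."
values = extract_pii(tagged)
redacted = redact_pii(tagged)
print(values)
print(redacted)
```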
