Document formatting
Soniox Document Formatting enables you to convert a transcript into a custom-formatted document when using Soniox in async mode. The output is a high-quality document that needs little post-recognition editing. Document Formatting can also annotate sections of the text according to configuration parameters. Together, these capabilities save significant time and expense in producing final documents for a variety of industries and applications.
Document formatting and annotation are configured on the fly: each async transcribe request carries its own configuration. You can specify various format configurations (e.g., dates, numbers, measurement units, spacing), which are then applied on top of the transcript to produce the formatted text you require. You can also specify annotation configurations, which let you split the text into discrete sections and annotate them using specified key phrases.
Format a transcript according to a custom configuration.
Annotate and split the text into sections depending on the configuration.
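To make the idea of splitting text into sections by key phrases concrete, here is a minimal local sketch. It is not Soniox's implementation and does not call any Soniox API; the function split_by_key_phrases, the Section class, and the "preamble" label for text before the first key phrase are all hypothetical choices made for illustration. Matching is case-insensitive and respects word boundaries.

```python
import re
from dataclasses import dataclass


@dataclass
class Section:
    """One annotated section: the label of the key phrase that opened it, and its text."""
    label: str
    text: str


def split_by_key_phrases(text: str, key_phrases: dict[str, str]) -> list[Section]:
    """Hypothetical illustration: split text wherever a key phrase occurs.

    key_phrases maps a spoken key phrase to a section label. Text that
    precedes the first key phrase is put into a section labelled "preamble".
    """
    if not key_phrases:
        return [Section("preamble", text.strip())] if text.strip() else []
    # Try longer phrases first so overlapping phrases prefer the longer match.
    phrases = sorted(key_phrases, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in phrases) + r")\b", re.IGNORECASE
    )
    lookup = {p.lower(): label for p, label in key_phrases.items()}

    sections: list[Section] = []
    label, start = "preamble", 0
    for match in pattern.finditer(text):
        # Close the current section at the start of the next key phrase.
        chunk = text[start:match.start()].strip()
        if chunk:
            sections.append(Section(label, chunk))
        label, start = lookup[match.group(0).lower()], match.end()
    # Whatever follows the last key phrase belongs to the last section.
    tail = text[start:].strip()
    if tail:
        sections.append(Section(label, tail))
    return sections
```

For example, splitting "Patient seen today. History cough for three days. Plan rest and fluids." with {"history": "HISTORY", "plan": "PLAN"} yields a preamble section followed by HISTORY and PLAN sections. The word-boundary check keeps "plan" from matching inside words such as "explanation".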
Example (figure with two panels: "Dictated speech" and "Formatted document")
Document formatting configuration is specified in the transcription configuration as a separate JSON object stored in a string field (document_formatting_config.config_json). This JSON object can have two fields, format and annotation, which contain the specific configuration parameters.
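The nesting described above (a JSON object serialized into a string field) is easy to get wrong, so here is a minimal sketch of building it. It assumes the transcription configuration is represented as a plain Python dict; with a real client you would set the same string on the corresponding message field. The helper name build_document_formatting_config and any keys placed inside format or annotation are hypothetical; consult the Soniox documentation for the actual parameters each section accepts.

```python
import json


def build_document_formatting_config(format_cfg: dict, annotation_cfg: dict) -> dict:
    """Hypothetical helper: wrap format/annotation settings as a JSON string.

    The JSON object may contain two top-level fields, "format" and
    "annotation"; empty sections are left out. The serialized object is
    placed in document_formatting_config.config_json.
    """
    config_object = {}
    if format_cfg:
        config_object["format"] = format_cfg
    if annotation_cfg:
        config_object["annotation"] = annotation_cfg
    # config_json is a string field, so the object is serialized with json.dumps.
    return {"document_formatting_config": {"config_json": json.dumps(config_object)}}
```

Serializing with json.dumps, rather than assembling the string by hand, guarantees that the field always holds valid JSON, even when key phrases contain quotes or non-ASCII characters.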
The formatted document is returned by the GetTranscribeAsyncResult API call in the Document data structure. Refer to speech_service.proto for the definition of this data structure.
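Once the Document has been retrieved, a common next step is to turn it into plain text. The sketch below uses small local dataclasses as a stand-in for the result. The class names and the fields section_id, title, and text are hypothetical and are not taken from speech_service.proto, so map them onto the real fields defined there before using this approach.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentSection:
    # Hypothetical fields; check speech_service.proto for the real definition.
    section_id: str
    title: str
    text: str


@dataclass
class Document:
    # Hypothetical stand-in for the Document returned by GetTranscribeAsyncResult.
    sections: list[DocumentSection] = field(default_factory=list)


def render_plain_text(doc: Document) -> str:
    """Join sections into plain text: an optional title line, then the body,
    with a blank line between sections."""
    blocks = []
    for section in doc.sections:
        heading = section.title.strip()
        body = section.text.strip()
        blocks.append(f"{heading}\n{body}" if heading else body)
    return "\n\n".join(blocks)
```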