Soniox Docs

Search#

Once audio has been transcribed and stored, you can search the stored data by object ID, metadata, datetime, and transcript content. This is done using the Search API call.

Example#

In this example, we search for the top 20 phone calls that were transcribed on or after 2023-01-31, have the metadata company="Nike" and agent="12345", and match the text query "air jordan".

search.py

from datetime import datetime
from soniox.speech_service import SpeechClient
from soniox.storage import search_objects


# Do not forget to set your API key in the SONIOX_API_KEY environment variable.
def main():
    with SpeechClient() as client:
        # Search for objects.
        search_response = search_objects(
            client,
            datetime_from=datetime.fromisoformat("2023-01-31T00:00+00:00"),
            metadata_query='company="Nike" AND agent="12345"',
            text_query="air jordan",
            num=20,
        )

        # Print search results.
        print(f"Results: {search_response.num_found}")
        for result in search_response.results:
            print(f"Object ID: {result.object_id}")
            print(f"Preview: {result.preview}")


if __name__ == "__main__":
    main()

Run

python3 search.py

Output

Results: 1
Object ID: my_id_for_audio
Preview: This, my friends, is the <em>Air</em> <em>Jordan</em> 6 in what's being labeled currently the Toro colorway.

Parameters#

The Search API takes several parameters that configure the search and retrieval of objects from Soniox storage.

Object ID#

You can search for objects by object ID. If the object with the specified ID exists in the storage, search results will contain a single result. Otherwise, search results will be empty.

Metadata#

Metadata search uses a simple boolean grammar that supports conjunction, disjunction, and nesting. An example metadata query is presented below.

Let the stored metadata be:

{
    "key1": "val1",
    "key2": "val2",
    "key3": "val3"
}

The metadata query language supports expressions of the form:

(key1 = val1 OR key2 = someval) AND key3 = val3

Terms AND and OR can be used interchangeably with && and ||, respectively.

Values in the query are preferably represented within double-quotes. In this case, the value can contain any character, but any occurrence of double quote (") or backslash (\) must be escaped by adding a backslash in front. For example, to search for key1 equal to a\b, use the query key1 = "a\\b". Note that this is in addition to any escaping required by your programming language.

Values can also be specified without using double-quotes, but this is only possible if the value contains only A-Z 0-9 - _ (space is not allowed).
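
The escaping rules above can be sketched as a small helper that always quotes values, so the restriction on unquoted values never applies. The function names quote_value and all_of are hypothetical and not part of the Soniox library; the resulting string is meant to be passed as metadata_query.

```python
def quote_value(value: str) -> str:
    """Wrap a metadata value in double quotes.

    Backslashes are escaped first, then double quotes, as the query
    grammar requires for quoted values.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def all_of(**conditions: str) -> str:
    """Join key = "value" terms with AND (hypothetical helper)."""
    return " AND ".join(
        f"{key} = {quote_value(val)}" for key, val in conditions.items()
    )


if __name__ == "__main__":
    # Builds: company = "Nike" AND agent = "12345"
    print(all_of(company="Nike", agent="12345"))
    # The Python literal "a\\b" is the three-character value a\b.
    print("key1 = " + quote_value("a\\b"))
```

Quoting every value is a design choice for simplicity: it avoids having to check whether a value fits the unquoted character set.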

Datetime#

You can search for objects by datetime. You can specify datetime_from (inclusive) and/or datetime_to (non-inclusive).

If you want to search based on date rather than datetime:

  • To search from some date, set datetime_from to that date and time 00:00.
  • To search up to some date, set datetime_to to that date plus one day and time 00:00.
  • To search for a specific date, do both of the above.
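
The date-based rules above can be sketched with the standard library alone. The helper name day_range is hypothetical, and using UTC as the default time zone is an assumption that mirrors the +00:00 offset in the example at the top of this page.

```python
from datetime import date, datetime, time, timedelta, timezone, tzinfo
from typing import Tuple


def day_range(day: date, tz: tzinfo = timezone.utc) -> Tuple[datetime, datetime]:
    """Return (datetime_from, datetime_to) covering one calendar day.

    datetime_from is the day at 00:00 (inclusive) and datetime_to is the
    following day at 00:00 (non-inclusive).
    """
    start = datetime.combine(day, time.min, tzinfo=tz)
    return start, start + timedelta(days=1)


if __name__ == "__main__":
    datetime_from, datetime_to = day_range(date(2023, 1, 31))
    print(datetime_from.isoformat())
    print(datetime_to.isoformat())
```

The two returned values can then be supplied as datetime_from and datetime_to in the search request.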

Text Query#

You can search for objects by title and transcript content. Simply specify the search query (e.g. “jordan shoes”) and Soniox will return search results ordered by relevance for the given search query.

Result Range#

You can obtain only a specific range of search results using the start and num parameters.

  • start defines the index of the first returned result. If it is 0 (the default), results are returned from the start.
  • num defines the desired number of results to return. If it is 0 (the default), a default number (currently 20) is used. Any larger value is capped at a maximum (currently 100).

These parameters can be used to implement pagination of search results in combination with the num_found field in the search response (see below).
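
A minimal pagination sketch, independent of the client library, might look like the following. The callable fetch_page is a hypothetical stand-in for a function that runs one search with the given start and num and returns the num_found value together with the results of that page; how it maps onto search_objects (for example, whether start is accepted as a keyword argument) is an assumption to verify against the library.

```python
from typing import Callable, Iterator, Sequence, Tuple, TypeVar

T = TypeVar("T")

# fetch_page(start, num) -> (num_found, results_for_this_page)
FetchPage = Callable[[int, int], Tuple[int, Sequence[T]]]


def iter_results(fetch_page: FetchPage, page_size: int = 20) -> Iterator[T]:
    """Yield every matching result by requesting consecutive pages."""
    start = 0
    while True:
        num_found, results = fetch_page(start, page_size)
        if not results:
            break  # Nothing returned, so stop even if num_found says more.
        yield from results
        start += len(results)
        if start >= num_found:
            break  # All matching objects have been returned.


if __name__ == "__main__":
    # Fake backend with 45 matching objects, used only for illustration.
    data = [f"obj-{i}" for i in range(45)]

    def fake_fetch(start: int, num: int) -> Tuple[int, Sequence[str]]:
        return len(data), data[start:start + num]

    print(len(list(iter_results(fake_fetch, page_size=20))))
```

Advancing start by the number of results actually returned, rather than by page_size, keeps the loop correct if the service returns fewer results than requested.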

Response#

The SearchResponse structure contains all information returned by a search request.

message SearchResponse {
    int32 num_found = 1;
    int32 start = 2;
    repeated SearchResult results = 3;
}
  • num_found is the total number of objects matching the search query. It is useful to implement pagination in combination with the start and num parameters, which do not affect num_found.
  • start is the index of the first returned result, the same as specified in the request.
  • results is the list of returned search results.
message SearchResult {
    string object_id = 1;
    map<string, string> metadata = 2;
    string title = 3;
    google.protobuf.Timestamp datetime = 4;
    int32 duration_ms = 5;
    string preview = 6;
}
  • object_id is the object ID.
  • metadata is the user-specified metadata.
  • title is the user-specified title.
  • datetime is the user-specified or auto-generated datetime.
  • duration_ms is the audio duration in milliseconds.
  • preview is a short extract from the transcript that matched the text query, or empty if not available. Words or phrases in the preview that match the text query are enclosed within <em>...</em> tags.
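
As an illustration of working with the preview field, the sketch below strips the <em> markers for plain-text display and lists the highlighted terms. The helper names are hypothetical, and the code assumes the preview contains no markup other than the <em>...</em> tags described above.

```python
import re
from typing import List

_EM_TAG = re.compile(r"</?em>")
_EM_SPAN = re.compile(r"<em>(.*?)</em>")


def plain_preview(preview: str) -> str:
    """Remove <em> and </em> markers, keeping the enclosed text."""
    return _EM_TAG.sub("", preview)


def matched_terms(preview: str) -> List[str]:
    """Return the words or phrases enclosed in <em>...</em>, in order."""
    return _EM_SPAN.findall(preview)


if __name__ == "__main__":
    preview = "This, my friends, is the <em>Air</em> <em>Jordan</em> 6"
    print(plain_preview(preview))
    print(matched_terms(preview))
```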