Search#
Once the audio has been transcribed and stored, you can search over the data by id, metadata, datetime and transcript content. This is done using the Search API call.
Example#
In this example we are going to search for the top 20 phone calls that were transcribed after 2023-01-31 from company="Nike" and agent="12345", and the call was about jordan shoes.
from datetime import datetime

from soniox.speech_service import SpeechClient
from soniox.storage import search_objects


# Do not forget to set your API key in the SONIOX_API_KEY environment variable.
def main():
    with SpeechClient() as client:
        # Search for objects.
        search_response = search_objects(
            client,
            datetime_from=datetime.fromisoformat("2023-01-31T00:00+00:00"),
            metadata_query='company="Nike" AND agent="12345"',
            text_query="air jordan",
            num=20,
        )

        # Print search results.
        print(f"Results: {search_response.num_found}")
        for result in search_response.results:
            print(f"Object ID: {result.object_id}")
            print(f"Preview: {result.preview}")


if __name__ == "__main__":
    main()
Run
python3 search.py
Output
Results: 1
Object ID: my_id_for_audio
Preview: This, my Friends, is the <em>air</em> <em>Jordan</em> 6 in the what's being labeled currently the toro colorway
Parameters#
The search API takes several parameters that configure the search and retrieval of objects from Soniox storage.
Object ID#
You can search for objects by object ID. If the object with the specified ID exists in the storage, search results will contain a single result. Otherwise, search results will be empty.
Metadata#
Search over metadata is supported by a simple boolean grammar that supports conjunction, disjunction and nesting. An example metadata query is presented below.
Let the stored metadata be:
{
    "key1": "val1",
    "key2": "val2",
    "key3": "val3"
}
The metadata query language supports expressions of the form:
(key1 = val1 OR key2 = someval) AND key3 = val3
Terms AND and OR can be used interchangeably with && and ||, respectively.
Values in the query are preferably represented within double-quotes. In this case, the value can contain any character, but any occurrence of double quote (") or backslash (\) must be escaped by adding a backslash in front. For example, to search for key1 equal to a\b, use the query key1 = "a\\b". Note that this is in addition to any escaping required by your programming language.
Values can also be specified without using double-quotes, but this is only possible if the value contains only A-Z 0-9 - _ (space is not allowed).
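The escaping rule above can be sketched as a small helper. This is only an illustration: quote_metadata_value and build_and_query are hypothetical names, not part of the Soniox library, and the sketch assumes metadata keys are plain identifiers that need no quoting.

```python
def quote_metadata_value(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and double quotes."""
    # Escape backslashes first so the backslashes added for quotes are not doubled.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_and_query(conditions: dict[str, str]) -> str:
    """Join key = "value" terms with AND (hypothetical helper for illustration)."""
    return " AND ".join(
        f"{key} = {quote_metadata_value(value)}" for key, value in conditions.items()
    )


# The Python literal "a\\b" is the three-character value a\b.
print(quote_metadata_value("a\\b"))
print(build_and_query({"company": "Nike", "agent": "12345"}))
```

The resulting string could then be passed as metadata_query in the search call shown in the example above.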
Datetime#
You can search for objects by datetime.
You can specify datetime_from (inclusive) and/or datetime_to (non-inclusive).
If you want to search based on date rather than datetime:
- To search from some date, set datetime_from to that date and time 00:00.
- To search up to some date, set datetime_to to that date plus one day and time 00:00.
- To search for a specific date, do both of the above.
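The date rules above can be sketched as a conversion from calendar dates to the two datetime bounds. The helper name date_bounds is hypothetical, and the sketch assumes dates are interpreted in UTC, as in the example at the top of this page; adjust the timezone if your data uses a different one.

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import Optional, Tuple


def date_bounds(
    date_from: Optional[date] = None,
    date_to: Optional[date] = None,
) -> Tuple[Optional[datetime], Optional[datetime]]:
    """Convert inclusive calendar dates into (datetime_from, datetime_to)."""
    datetime_from = None
    datetime_to = None
    if date_from is not None:
        # Inclusive lower bound: the start date at 00:00 (UTC is an assumption).
        datetime_from = datetime.combine(date_from, time.min, tzinfo=timezone.utc)
    if date_to is not None:
        # Non-inclusive upper bound: the day after the end date at 00:00.
        datetime_to = datetime.combine(
            date_to + timedelta(days=1), time.min, tzinfo=timezone.utc
        )
    return datetime_from, datetime_to


# Bounds for a single specific date: both rules applied together.
start, end = date_bounds(date(2023, 1, 31), date(2023, 1, 31))
print(start.isoformat(), end.isoformat())
```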
Text Query#
You can search for objects by title and transcript content. Simply specify the search query (e.g. “jordan shoes”) and Soniox will return search results ordered by relevance for the given search query.
Result Range#
You can obtain only a specific range of search results using the start and num parameters.
- start defines the index of the first returned result. If it is 0 (the default), results are returned from the start.
- num defines the desired number of results to return. If it is 0 (the default), a specific default number (currently 20) will be used. Otherwise, the number will still be limited to a specific maximum number (currently 100).
These parameters can be used to implement pagination of search results in combination with the num_found field in the search response (see below).
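A minimal pagination sketch, under stated assumptions: fetch_page is a hypothetical callable that takes (start, num) and returns (num_found, results). In real use it would wrap the search call and read num_found and results from the response; here an in-memory list stands in for storage so the loop can be run on its own.

```python
from typing import Callable, Iterator, List, Tuple

# Hypothetical page fetcher: takes (start, num) and returns (num_found, results).
FetchPage = Callable[[int, int], Tuple[int, List[str]]]


def iter_all_results(fetch_page: FetchPage, page_size: int = 20) -> Iterator[str]:
    """Yield every result by requesting consecutive pages."""
    start = 0
    while True:
        num_found, results = fetch_page(start, page_size)
        yield from results
        start += len(results)
        # Stop when everything has been returned, or if a page comes back empty.
        if not results or start >= num_found:
            break


# In-memory stand-in for a search backend with 45 matching objects.
fake_store = [f"object_{i}" for i in range(45)]


def fake_fetch(start: int, num: int) -> Tuple[int, List[str]]:
    return len(fake_store), fake_store[start:start + num]


print(len(list(iter_all_results(fake_fetch))))  # prints 45
```

Advancing start by the number of results actually received, rather than by page_size, keeps the loop correct if the service returns fewer results than requested.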
Response#
The SearchResponse structure contains all information returned by a search request.
message SearchResponse {
    int32 num_found = 1;
    int32 start = 2;
    repeated SearchResult results = 3;
}
- num_found is the total number of objects matching the search query. It is useful to implement pagination in combination with the start and num parameters, which do not affect num_found.
- start is the index of the first returned result, the same as specified in the request.
- results is the list of returned search results.
message SearchResult {
    string object_id = 1;
    map<string, string> metadata = 2;
    string title = 3;
    google.protobuf.Timestamp datetime = 4;
    int32 duration_ms = 5;
    string preview = 6;
}
- object_id is the object ID.
- metadata is the user-specified metadata.
- title is the user-specified title.
- datetime is the user-specified or auto-generated datetime.
- duration_ms is the audio duration in milliseconds.
- preview is a short extract from the transcript that matched the text query, or empty if not available. Words or phrases in the preview that match the text query are enclosed within <em>...</em> tags.
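If you want to show the preview as plain text or list the matched words, the <em> markers can be processed with a regular expression. This is a small sketch using only the Python standard library; the helper names are hypothetical, and it assumes the tags are not nested.

```python
import re
from typing import List

# Non-greedy match so each <em>...</em> pair is captured separately.
EM_TAG = re.compile(r"<em>(.*?)</em>")


def highlighted_terms(preview: str) -> List[str]:
    """Return the words or phrases marked as matches in the preview."""
    return EM_TAG.findall(preview)


def strip_highlight(preview: str) -> str:
    """Return the preview with the <em> markers removed."""
    return EM_TAG.sub(r"\1", preview)


preview = "This, my Friends, is the <em>air</em> <em>Jordan</em> 6"
print(highlighted_terms(preview))  # prints ['air', 'Jordan']
print(strip_highlight(preview))    # prints This, my Friends, is the air Jordan 6
```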