Soniox Voice Agent

Overview

Soniox Voice Agent is a demo app that shows how to build a complete voice-to-voice conversational AI assistant. It demonstrates how to integrate streaming speech-to-text, a large language model (LLM), and streaming text-to-speech (TTS) into a seamless, low-latency application.

The demo bot is pre-configured as an appointment booking assistant for a fictional car repair shop, "Soniox AutoWorks." It can book appointments for services such as oil changes and car repairs, collect customer names and vehicle information, provide available appointment slots, and interactively guide users through the booking process.

The entire voice bot codebase is designed for easy customization and extension to other domains. You can quickly adapt the bot to different business needs, integrate new tools, or change its persona, making it a flexible starting point for any conversational AI application.

Features

End-to-end real-time: Fully streaming architecture (voice-in, voice-out) for natural, low-latency conversations
Multilingual: Understands and responds to users in multiple languages
Customizable AI: The bot's persona and business logic are defined in a single, easy-to-edit file
Extensible tools: Connect the LLM to your own APIs and databases to perform real-world actions
Multiple ways to interact: Web frontend, Twilio phone call, or any other WebSocket-based connection

Usage flow

Connect via web browser or phone call
Speak naturally in any language to the AI agent
The bot transcribes your speech, understands intent, and generates a response
Listen to the AI's spoken response in real-time
The conversation continues with full context awareness

Architecture

Server (Python): Orchestrates the conversation with modular processors (VAD, STT, LLM, TTS)
Frontend (React): Captures microphone audio, streams it to the backend, and plays back responses
Twilio proxy (Python): Optional bridge to connect phone calls to the voice bot backend

We provide all the implementations with links to GitHub:

Python server
React frontend (web)
Twilio proxy (phone integration)

Customize the bot's persona and instructions in server/tools.py
Implement your own tools to connect to external APIs and databases
Adapt the frontend or Twilio integration to your needs

Soniox Voice Agent

Overview

Features

Usage flow

Architecture

How it works

Voice Activity Detection (VAD) Processor

Speech-to-Text (STT) Processor

Language Model (LLM) Processor

Text-to-Speech (TTS) Processor

Next steps

On this page