The Voice AI Wiki

A knowledge base for voice AI: speech recognition, synthesis, real-time translation, voice agents, and everything in between.

Foundations

Orientation: what the field is, what the words mean, and how it got here, before you drill into a topic.

Speech-to-text

The core. Every concept that sits between a microphone and a transcript.

Text-to-speech

Making machines speak: how voices are built, streamed, and judged.

Speech translation

Translating speech as it happens, before the sentence is even finished.

Voice agents

Closing the loop: software that listens, decides, and speaks back fast enough to hold a conversation.

Audio intelligence

Everything you can pull out of audio once the words are no longer the point.

Glossary

Terms used throughout The Voice AI Wiki.