About the role
Soniox is pushing the boundaries of real-time speech AI, and we’re looking for an engineer to help us scale the world’s most advanced language models across a low-latency, high-throughput, production-grade inference stack.
You’ll work at the intersection of deep learning, systems engineering, and performance optimization, helping us squeeze every FLOP out of our GPUs, cut latency down to the millisecond, and keep our systems running at global scale.
In this role, you will:
- Work closely with researchers, engineers, and product teams to bring cutting-edge AI models into real-world production.
- Architect and optimize our inference infrastructure to deliver low-latency, high-reliability performance across thousands of concurrent requests.
- Identify and eliminate system bottlenecks, improving throughput and GPU utilization across the fleet.
- Introduce and implement tools and techniques to monitor, debug, and improve model inference at scale.
- Tune our VM fleet to maximize compute, memory, and network efficiency — down to the last GPU cycle.
- Support advanced research workflows by building robust, scalable systems that enable rapid experimentation.
You might thrive in this role if you:
- Have a strong intuition for optimizing modern ML architectures for inference performance.
- Are deeply familiar with PyTorch, CUDA, NCCL, and GPU internals — or excited to become an expert quickly.
- Understand HPC fundamentals and have worked with technologies like InfiniBand, NVLink, or MPI.
- Have experience building and scaling distributed systems in production, ideally performance-critical ones.
- Have rebuilt or refactored systems due to 10x+ scale increases — and know what to watch out for.
- Are a self-starter who thrives in fast-moving environments and finds clarity amidst ambiguity.
- Care about reliability, simplicity, and performance — and take ownership from design to deployment.
- Have at least 5 years of professional software engineering experience.
Why Soniox
You’ll help build one of the most technically advanced AI platforms in the world — and shape how it reaches and supports users globally.
You’ll work directly with a world-class team of engineers and researchers solving frontier problems in speech and language AI.
You’ll have a voice in how our company grows, how our customers succeed, and how AI transforms human communication.
Ready to join Soniox? Apply now.