Software engineer, GPU inference

Ljubljana, Slovenia


About the role

Soniox is pushing the boundaries of real-time speech AI, and we’re looking for an engineer to help us scale the world’s most advanced language models across a low-latency, high-throughput, production-grade inference stack.

You’ll work at the intersection of deep learning, systems engineering, and performance optimization, helping us squeeze every FLOP out of our GPUs, shave milliseconds off latency, and keep our systems running at global scale.

In this role, you will:

  • Work closely with researchers, engineers, and product teams to bring cutting-edge AI models into real-world production.
  • Architect and optimize our inference infrastructure to deliver low-latency, high-reliability performance across thousands of concurrent requests.
  • Identify and eliminate system bottlenecks, improving throughput and GPU utilization across the fleet.
  • Introduce and implement tools and techniques to monitor, debug, and improve model inference at scale.
  • Tune our VM fleet to maximize compute, memory, and network efficiency — down to the last GPU cycle.
  • Support advanced research workflows by building robust, scalable systems that enable rapid experimentation.

You might thrive in this role if you:

  • Have a strong intuition for optimizing modern ML architectures for inference performance.
  • Are deeply familiar with PyTorch, CUDA, NCCL, and GPU internals — or excited to become an expert quickly.
  • Understand HPC fundamentals and have worked with technologies like InfiniBand, NVLink, or MPI.
  • Have experience building and scaling distributed systems in production, ideally performance-critical ones.
  • Have rebuilt or refactored systems to handle 10x+ increases in scale, and know what to watch out for.
  • Are a self-starter who thrives in fast-moving environments and finds clarity amidst ambiguity.
  • Care about reliability, simplicity, and performance — and take ownership from design to deployment.
  • Have at least 5 years of professional software engineering experience.

Why Soniox

You’ll help build one of the most technically advanced AI platforms in the world — and shape how it reaches and supports users globally.

You’ll work directly with a world-class team of engineers and researchers solving frontier problems in speech and language AI.

You’ll have a voice in how our company grows, how our customers succeed, and how AI transforms human communication.
