// POSTED: Apr 15, 2026

AI Inference Engineer

Be part of the team creating the software foundation for next-generation AI compute platforms. In this role, you’ll work across the full stack — from low-level kernels and hardware-optimized operators to large-scale ML deployment frameworks — in close collaboration with compiler developers, ML scientists, and hardware specialists. This position offers the chance to contribute to state-of-the-art AI infrastructure, fine-tune software for custom hardware, and deepen your expertise in system software and machine learning.

Responsibilities (some of the following):
- Design, develop, and maintain components of the deployment stack and software kernels for AI compute platforms
- Optimize and implement core ML operators (e.g., GEMMs, convolutions, BLAS routines, SIMD kernels)
- Translate computational graphs from ML frameworks onto the underlying hardware
- Contribute to compiler infrastructure together with compiler and hardware teams
- Investigate and resolve issues through system-level debugging and performance analysis
- Deliver scalable software solutions under ambitious development schedules
- Define and apply practices for testing, deployment, and scaling AI systems

Minimum qualifications:
- Bachelor’s degree in Computer Science, Engineering, Mathematics, or a related discipline, with 3+ years of professional software development experience
- Solid knowledge of computer architecture, system software, and data structures
- Strong programming skills in C/C++ or Python in Linux environments using common development tools
- Hands-on experience implementing algorithms in high-level languages (C/C++/Python)
- Exposure to specialized hardware (GPUs, FPGAs, DSPs, AI accelerators) and frameworks such as OpenCL or CUDA
- Experience designing or working with high-performance software systems
- Solid knowledge of ML fundamentals
- Motivated team player with a strong sense of responsibility

You are a great fit if you have experience in at least one of the following areas:
- Model serving frameworks (e.g., Triton Inference Server, DeepSpeed Inference, vLLM)
- Deep learning frameworks (e.g., PyTorch, TensorFlow)
- ML runtimes (e.g., ONNX Runtime, TVM, IREE, XLA)
- Distributed collectives (e.g., Gloo, MPI)
- Software testing and validation methodologies
- Deploying ML workloads (LLMs, VLMs, NLP, etc.) across distributed systems
- Implementation of ML operators and kernels (e.g., SIMD routines, activation functions, pooling layers, quantization layers)
- Hardware-aware optimizations and performance tuning
- 2+ years of experience developing software targeting AI hardware
- Contributions to open-source projects (e.g., LLVM, PyTorch, TensorFlow, ONNX Runtime, xDSL, IREE) are a big plus
Interested in this role? Apply on iHire.