What does it take to build a machine learning inference engine in Rust - and what does that experience reveal about Rust as a language for ML infrastructure?
This talk presents lessons learned from designing and implementing a GPU-accelerated ONNX runtime in Rust. The project, Onyxia, parses ONNX graphs, compiles them through a multi-pass pipeline, generates WGSL compute shaders, and executes them via wgpu across desktop, mobile, and the web.
The session focuses on how Rust's language, tooling, and ecosystem made the project possible. A friendly compiler and a strong type system make working with graphs a joy. Tests and documentation colocated with the code make it easy to understand the internals of the crates. Together, these properties make it possible to build a runtime that is both efficient and understandable.
The talk also reflects on Rust's growing adoption in machine learning infrastructure and the ecosystem crates that make such projects practical today.