Library Liberation: Competitive Performance Matmul Through Compiler-composed Nanokernels
Arun Thangamani, Md Asghar Ahmad Shahid, Adam Siemieniuk, Rolf Morel, Renato Golin, Alexander Heinecke

TL;DR
This paper presents an MLIR-based compilation scheme that automatically generates high-performance, scalable microkernels for AI workloads, reducing reliance on hand-crafted libraries and improving hardware utilization.
Contribution
It introduces a novel compiler technique for composing nanokernels from IR constructs, enabling automatic generation of near-peak performance microkernels tailored to hardware.
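As a rough illustration of the nanokernel-to-microkernel idea, the sketch below builds a small matmul out of a fixed-size accumulator-tile update that is chained along the reduction dimension. It is not the paper's MLIR pipeline: the tile shape (MR, NR) and the function names nanokernel, microkernel, and matmul are assumptions made for this example, and plain Python lists stand in for vector registers.

```python
# Illustrative sketch only: tile sizes and function names are assumptions,
# not taken from the paper. Python lists stand in for hardware registers.
MR, NR = 2, 4  # hypothetical accumulator tile shape


def nanokernel(acc, a_col, b_row):
    """Rank-1 update of an MR x NR accumulator tile: acc += outer(a_col, b_row)."""
    for i in range(MR):
        for j in range(NR):
            acc[i][j] += a_col[i] * b_row[j]


def microkernel(A, B, C, i0, j0, K):
    """Compute the MR x NR tile of C at (i0, j0) by chaining nanokernels over K."""
    # The accumulator tile is reused across the whole K loop, which mimics
    # keeping partial sums in registers rather than reloading them from memory.
    acc = [[0.0] * NR for _ in range(MR)]
    for k in range(K):
        a_col = [A[i0 + i][k] for i in range(MR)]
        b_row = [B[k][j0 + j] for j in range(NR)]
        nanokernel(acc, a_col, b_row)
    # Write the finished tile back once.
    for i in range(MR):
        for j in range(NR):
            C[i0 + i][j0 + j] += acc[i][j]


def matmul(A, B):
    """Tile the output into MR x NR blocks and run one microkernel per block."""
    M, K, N = len(A), len(B), len(B[0])
    assert M % MR == 0 and N % NR == 0, "sketch assumes dimensions divide evenly"
    C = [[0.0] * N for _ in range(M)]
    for i0 in range(0, M, MR):
        for j0 in range(0, N, NR):
            microkernel(A, B, C, i0, j0, K)
    return C
```

In a real compiler the tile shape would be chosen from the target's register count and vector width; here it is fixed by hand purely to keep the example short.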
Findings
Generated nanokernels are of production quality.
Performance is competitive with state-of-the-art libraries.
Supports both vector and tile CPU instructions.
Abstract
The rapidly evolving landscape of AI and machine learning workloads has widened the gap between high-level domain operations and efficient hardware utilization. Achieving near-peak performance still demands deep hardware expertise: experts either handcraft target-specific kernels (e.g., DeepSeek) or rely on specialized libraries (e.g., CUTLASS), both of which add complexity and limit scalability for most ML practitioners. This paper introduces a compilation scheme that automatically generates scalable, high-performance microkernels by leveraging the MLIR dialects to bridge domain-level operations and processor capabilities. Our approach removes dependence on low-level libraries by enabling the compiler to auto-generate near-optimal code directly. At its core is a mechanism for composing nanokernels from low-level IR constructs with near-optimal register utilization, forming efficient…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management · Big Data and Digital Economy
