Hexagon-MLIR: An AI Compilation Stack For Qualcomm's Neural Processing Units (NPUs)
Mohammed Javed Absar, Muthu Baskaran, Abhikrant Sharma, Abhilash Bhandari, Ankit Aggarwal, Arun Rangasamy, Dibyendu Das, Fateme Hosseini, Franck Slama, Iulian Brumar, Jyotsna Verma, Krishnaprasad Bindumadhavan, Mitesh Kothari, Mohit Gupta, Ravishankar Kolachana, Richard Lethin

TL;DR
Hexagon-MLIR is an open-source compilation framework that optimizes AI workloads on Qualcomm's NPUs by automating kernel-to-binary translation and improving data locality, which speeds up deployment and improves runtime performance.
Contribution
It introduces a unified, open-source MLIR-based compilation stack for Qualcomm NPUs that streamlines deployment of Triton kernels and PyTorch models with optimized data handling.
Findings
Automates compilation from Triton kernels to NPU binaries.
Maximizes data locality in the NPU's Tightly Coupled Memory (TCM) to reduce bandwidth bottlenecks.
Supports faster deployment of AI workloads on Qualcomm NPUs.
Abstract
In this paper, we present Hexagon-MLIR, an open-source compilation stack that targets the Qualcomm Hexagon Neural Processing Unit (NPU) and provides unified support for lowering Triton kernels and PyTorch models. Built using the MLIR framework, our compiler applies a structured sequence of passes to exploit NPU architectural features to accelerate AI workloads. It enables faster deployment of new Triton kernels (hand-written or subgraphs from PyTorch 2.0) for our target by providing automated compilation from kernel to binary. By ingesting Triton kernels, we generate mega-kernels that maximize data locality in the NPU's Tightly Coupled Memory (TCM), reducing the bandwidth bottlenecks inherent in library-based approaches. This initiative complements our commercial toolchains by providing developers with an open-source MLIR-based compilation stack that gives them a path to advance AI…
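The locality argument in the abstract can be illustrated with a toy traffic model. The sketch below is not Hexagon-MLIR code and does not model real NPU hardware; the function names `unfused_chain` and `fused_chain`, the tile size, and the use of a Python list as a stand-in for TCM are all assumptions made for illustration. It only counts how many elements cross the off-chip boundary when a chain of elementwise operations runs as separate library-style calls versus as one fused kernel that keeps each tile local.

```python
# Toy traffic model. All names are hypothetical; this is not the
# Hexagon-MLIR implementation and does not reflect real TCM sizes.

def unfused_chain(data, ops):
    """Library-style execution: every op reads its full input from
    off-chip memory and writes its full output back before the next op."""
    traffic = 0  # elements moved across the off-chip boundary
    for op in ops:
        traffic += len(data)              # load the whole input
        data = [op(x) for x in data]
        traffic += len(data)              # store the whole output
    return data, traffic


def fused_chain(data, ops, tile=4):
    """Fused execution: each tile is loaded once into a small local
    buffer (a stand-in for TCM), all ops run on it, then it is stored once."""
    out, traffic = [], 0
    for start in range(0, len(data), tile):
        local = data[start:start + tile]  # one load per tile
        traffic += len(local)
        for op in ops:
            local = [op(x) for x in local]  # intermediate stays local
        out.extend(local)
        traffic += len(local)             # one store per tile
    return out, traffic


# A three-op chain: scale, shift, ReLU.
ops = [lambda x: x * 2.0, lambda x: x + 1.0, lambda x: max(x, 0.0)]
data = [float(i - 8) for i in range(16)]

ref, t_unfused = unfused_chain(data, ops)
res, t_fused = fused_chain(data, ops)

assert ref == res                 # both schedules compute the same values
print(t_unfused, t_fused)         # off-chip element transfers per schedule
```

In this model the unfused schedule moves every intermediate result off-chip and back, so its traffic grows with the number of operations, while the fused schedule pays for one load and one store per element regardless of chain length. Real compilers must also fit tiles within the local memory capacity, which this sketch ignores.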
Taxonomy
Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Machine Learning in Materials Science
