Scalable Packed Layouts for Vector-Length-Agnostic ML Code Generation
Ege Beysel, Maximilian Bartel, Jan Moritz Joseph

TL;DR
This paper introduces vector-length-aware packed data layouts and compiler extensions to enable scalable, vector-length-agnostic ML code generation, improving performance and portability across Arm CPUs with SVE.
Contribution
It presents a novel approach integrating vector-length-aware layouts into MLIR/IREE, extending tiling, fusion, and vectorization for scalable vector lengths in ML compilation.
Findings
Generated SVE code often outperforms NEON-based code within IREE.
Achieved up to 1.45× speedup on real-world ML workloads.
Code scales with increasing SVE vector length, supporting portability.
Abstract
Scalable vector instruction sets such as Arm SVE enable vector-length-agnostic (VLA) execution, allowing a single implementation to adapt across hardware with different vector lengths. However, they complicate compiler code generation, as tiling and data layout decisions can no longer be fixed at compile time. We present an approach for enabling VLA code generation in an end-to-end ML compilation pipeline through vector-length-aware packed data layouts and corresponding compiler extensions. We integrate these mechanisms into MLIR/IREE and extend tiling, fusion, and vectorization to operate with scalable vector lengths. Evaluated on real-world ML workloads on Arm CPUs, our approach generates SVE code that is competitive with, and often outperforms, existing NEON-based code generation within IREE, achieving up to 1.45× speedup. We also outperform PyTorch ecosystem frameworks,…
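To make the idea of a vector-length-aware packed layout concrete, the following is a minimal sketch in plain Python and not the paper's MLIR/IREE implementation. It assumes a K x N row-major matrix is packed into column tiles whose width is `base_tile * vscale`, where `vscale` stands for the hardware vector length divided by 128 bits. The function name `pack_rhs` and its parameters are hypothetical, and `vscale` is passed as an argument here, whereas real VLA code would read it at run time.

```python
def pack_rhs(matrix, base_tile, vscale):
    """Pack a K x N row-major matrix into [n_tiles][K][tile_n] order.

    tile_n = base_tile * vscale, so the inner tile width follows the
    vector length instead of being fixed at compile time. Hypothetical
    helper for illustration only.
    """
    k = len(matrix)
    n = len(matrix[0]) if k else 0
    tile_n = base_tile * vscale
    n_tiles = -(-n // tile_n)  # ceiling division
    packed = []
    for t in range(n_tiles):
        start = t * tile_n
        tile = []
        for row in range(k):
            chunk = list(matrix[row][start:start + tile_n])
            # Zero-pad the last tile so every inner row has tile_n elements.
            chunk += [0] * (tile_n - len(chunk))
            tile.append(chunk)
        packed.append(tile)
    return packed


# The same 2 x 6 matrix packed for two assumed vector lengths.
matrix = [[1, 2, 3, 4, 5, 6],
          [7, 8, 9, 10, 11, 12]]
packed_128 = pack_rhs(matrix, base_tile=4, vscale=1)  # 4 lanes per tile row
packed_256 = pack_rhs(matrix, base_tile=4, vscale=2)  # 8 lanes per tile row
```

With `vscale=1` the matrix splits into two tiles of width 4; with `vscale=2` it fits into a single tile of width 8. The packing code is identical in both cases and only the tile width changes, which is the property a VLA layout has to preserve.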
