Axe: A Simple Unified Layout Abstraction for Machine Learning Compilers

Bohan Hou; Hongyi Jin; Guanjie Wang; Jinqi Chen; Yaxing Cai; Lijie Yang; Zihao Ye; Yaoyao Ding; Ruihang Lai; Tianqi Chen

arXiv:2601.19092·cs.DC·January 30, 2026

Open Access

TL;DR

Axe Layout is a hardware-aware abstraction that maps logical tensor coordinates to a multi-axis physical space through named axes, unifying tiling, sharding, and replication so that deep learning workloads can be optimized across heterogeneous hardware.

Contribution

We propose Axe Layout, a unified, hardware-aware layout abstraction that simplifies and optimizes data placement and collective operations in machine learning compilers.

Findings

01. Achieves performance close to hand-tuned kernels on GPUs and multi-device setups.

02. Unifies tiling, sharding, and replication across device hierarchies.

03. Enables a single kernel design with collective primitives for diverse hardware.

Abstract

Scaling modern deep learning workloads demands coordinated placement of data and compute across device meshes, memory hierarchies, and heterogeneous accelerators. We present Axe Layout, a hardware-aware abstraction that maps logical tensor coordinates to a multi-axis physical space via named axes. Axe unifies tiling, sharding, replication, and offsets across inter-device distribution and on-device layouts, enabling collective primitives to be expressed consistently from device meshes to threads. Building on Axe, we design a multi-granularity, distribution-aware DSL and compiler that composes thread-local control with collective operators in a single kernel. Experiments show that our unified approach can bring performance close to hand-tuned kernels across the latest GPU devices, multi-device environments, and accelerator backends.
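
To make the idea of mapping logical coordinates onto named physical axes concrete, here is a minimal Python sketch. It is not the Axe API or the paper's formal definition: the names `Iter`, `ToyLayout`, and `place`, and the axis names `gpu`, `m`, and `node`, are hypothetical and chosen only for illustration. The sketch assumes a layout can be described by (extent, stride, axis) factors that split the logical index (tiling and sharding), extra factors that do not consume the index (replication), and a constant per-axis offset.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Iter:
    """One factor: `extent` steps, each moving `stride` along physical axis `axis`."""
    extent: int
    stride: int
    axis: str


@dataclass(frozen=True)
class ToyLayout:
    """Hypothetical toy layout for illustration only (not the Axe API).

    shard:   factors that split the logical index (tiling / sharding)
    replica: factors that do not consume the index (replication)
    offset:  constant base coordinate per named axis
    """
    shard: Tuple[Iter, ...]
    replica: Tuple[Iter, ...] = ()
    offset: Tuple[Tuple[str, int], ...] = ()

    def size(self) -> int:
        """Number of logical elements covered by the shard factors."""
        n = 1
        for it in self.shard:
            n *= it.extent
        return n

    def place(self, index: int) -> List[Dict[str, int]]:
        """Return every physical coordinate that holds logical element `index`."""
        if not 0 <= index < self.size():
            raise IndexError(index)
        base: Dict[str, int] = dict(self.offset)
        # Decompose the index row-major: the last shard factor varies fastest.
        for it in reversed(self.shard):
            index, digit = divmod(index, it.extent)
            base[it.axis] = base.get(it.axis, 0) + digit * it.stride
        # Replication: emit one copy per combination of replica steps.
        copies = [base]
        for it in self.replica:
            copies = [
                {**c, it.axis: c.get(it.axis, 0) + r * it.stride}
                for c in copies
                for r in range(it.extent)
            ]
        return copies


# An 8-element vector: split across 2 GPUs, 4 contiguous elements per GPU
# starting at memory offset 16, with the whole vector replicated on 2 nodes.
layout = ToyLayout(
    shard=(Iter(2, 1, "gpu"), Iter(4, 1, "m")),
    replica=(Iter(2, 1, "node"),),
    offset=(("m", 16),),
)
placements = layout.place(5)
```

In this sketch, logical element 5 lands on GPU 1 at memory position 17, and one copy exists on each of the two nodes. Because every level of the hierarchy is just another named axis, the same description covers inter-device sharding and on-device tiling, which is the kind of unification the abstract describes.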

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Cloud Computing and Resource Management