TileLoom: Automatic Dataflow Planning for Tile-Based Languages on Spatial Dataflow Accelerators
Wei Li, Zhenyu Bai, Heru Wang, Pranav Dangi, Zhiqiang Zhang, Cheng Tan, Huiying Lan, Weng-Fai Wong, Tulika Mitra

TL;DR
TileLoom is an MLIR-based framework that automates dataflow planning for tile-based programs on spatial dataflow accelerators, improving performance and programmability.
Contribution
TileLoom introduces a hardware-aware compiler framework that distributes tile instances across cores to optimize data reuse and communication on spatial dataflow architectures.
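To make "distributing tile instances across cores" concrete, the following is a minimal toy sketch, not TileLoom's actual planner. It assumes a tiled matrix multiply whose output tiles are assigned to a 2D core grid by a simple block-cyclic rule, and it compares global-memory operand fetches with and without forwarding operand tiles between cores over the on-chip network. The names `assign_tiles` and `global_fetches`, the block-cyclic rule, and the idealized cost model are all illustrative assumptions.

```python
from itertools import product


def assign_tiles(m_tiles, n_tiles, grid_rows, grid_cols):
    """Map each output tile (i, j) to a core (r, c) with a block-cyclic rule.

    Hypothetical placement rule chosen for illustration only.
    """
    return {
        (i, j): (i % grid_rows, j % grid_cols)
        for i, j in product(range(m_tiles), range(n_tiles))
    }


def global_fetches(mapping, k_tiles, forward_over_noc):
    """Count operand-tile fetches from global memory under a toy cost model.

    Without forwarding, every tile instance fetches its own k_tiles tiles
    of A and k_tiles tiles of B. With forwarding, an A tile is fetched once
    per grid row that needs it and a B tile once per grid column that needs
    it, then passed between neighboring cores (idealized: forwarding is free).
    """
    if not forward_over_noc:
        return 2 * k_tiles * len(mapping)
    a_fetches = {
        (core[0], i, k)
        for (i, _j), core in mapping.items()
        for k in range(k_tiles)
    }
    b_fetches = {
        (core[1], k, j)
        for (_i, j), core in mapping.items()
        for k in range(k_tiles)
    }
    return len(a_fetches) + len(b_fetches)


# 4x4 output tiles, reduction split into 2 tiles, on a 2x2 core grid.
mapping = assign_tiles(m_tiles=4, n_tiles=4, grid_rows=2, grid_cols=2)
naive = global_fetches(mapping, k_tiles=2, forward_over_noc=False)
shared = global_fetches(mapping, k_tiles=2, forward_over_noc=True)
print(naive, shared)
```

Under these assumptions the forwarding variant needs far fewer global fetches than the naive one. A real planner would also have to weigh network latency, link bandwidth, and local memory capacity, none of which this sketch models.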
Findings
Achieves performance comparable to vendor libraries on multiple kernels.
Effectively exploits on-chip network and distributed memories for data reuse.
Supports diverse spatial dataflow targets through architecture-specific optimizations.
Abstract
Spatial dataflow accelerators are a promising direction for next-generation computer systems because they can reduce the memory bottlenecks of traditional von Neumann machines such as CPUs and GPUs. They organize computation around explicit, compiler-managed data movement over on-chip networks, allowing operands to be forwarded directly between processing elements and reducing reliance on high-latency, bandwidth-limited global shared memory. However, their performance depends strongly on how workloads are mapped to hardware. Naive mappings can perform poorly, and most users rely on hand-tuned vendor libraries. Thus, despite their potential for high performance, energy efficiency, and cost efficiency, limited programmability remains a major barrier to wider adoption. This paper presents TileLoom, an MLIR-based end-to-end framework that compiles tile-based programs, such as Triton…
