An MLIR Lowering Pipeline for Stencils at Wafer-Scale
Nicolai Stawinoga, David Katz, Anton Lydike, Justs Zarins, Nick Brown, George Bisbas, Tobias Grosser

TL;DR
This paper introduces a compiler pipeline that automatically transforms stencil-based HPC kernels into optimized code for the Cerebras WSE, achieving significant performance gains without application code modifications.
Contribution
The paper presents a novel MLIR-based lowering pipeline that enables automatic targeting of the WSE for stencil computations, bridging the gap between mathematical models and hardware execution.
Findings
On the WSE3, the generated code runs 14x faster than 128 Nvidia A100 GPUs.
On the WSE3, the generated code runs 20x faster than 128 nodes of a CPU supercomputer.
The generated code matches or exceeds the performance of manually optimized code.
Abstract
The Cerebras Wafer-Scale Engine (WSE) delivers performance at an unprecedented scale of over 900,000 compute units, all connected via a single-wafer on-chip interconnect. Initially designed for AI, the WSE architecture is also well-suited for High Performance Computing (HPC). However, its distributed asynchronous programming model diverges significantly from the simple sequential or bulk-synchronous programs that one would typically derive for a given mathematical program description. Targeting the WSE requires a bespoke re-implementation when porting existing code. The absence of WSE support in compilers such as MLIR meant that there was little hope for automating this process. Stencils are ubiquitous in HPC, and in this paper we explore the hypothesis that domain-specific information about stencils can be leveraged by the compiler to automatically target the WSE without requiring…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Interconnection Networks and Systems
