Stencil Computations on Cerebras Wafer-Scale Engine
Elia Belli, Daniele De Sensi

TL;DR
This paper demonstrates that the Cerebras WSE-3 can efficiently run 2D stencil computations, achieving up to a 342x speedup over a GPU-based solver by leveraging its on-chip memory and massive core parallelism.
Contribution
The paper introduces CStencil, a framework for 2D stencil computations on the WSE-3, and shows that it outperforms GPU-based solvers, bridging AI hardware and scientific computing.
Findings
CStencil achieves up to a 342x speedup over a GPU-based solver.
The WSE-3's on-chip memory eliminates off-chip memory bottlenecks.
The architecture effectively saturates compute and memory resources.
Abstract
Stencil computations are a fundamental kernel in scientific computing, critical for simulations in domains such as fluid dynamics and climate modeling. However, these computations are often memory-bound on traditional High-Performance Computing architectures like GPUs, struggling against the "Memory Wall". Simultaneously, the rise of AI-oriented hardware, such as the Cerebras Wafer-Scale Engine, offers massive core parallelism and high-bandwidth on-chip memory, though typically optimized for lower-precision workloads. This work investigates the viability of bridging this divergence by mapping stencil algorithms onto the Cerebras WSE-3. The study introduces CStencil, a novel framework designed to implement two-dimensional stencil computations on the WSE-3. To ensure a rigorous and fair performance evaluation, the research also adapts ConvStencil, a state-of-the-art GPU stencil solver,…
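For readers unfamiliar with the kernel, the following is a minimal sketch of a 2D stencil sweep, assuming a 5-point averaging (Jacobi-style) stencil with fixed boundary cells. The function names and the grid setup are illustrative assumptions and are not taken from CStencil or ConvStencil. Each interior update reads four neighbours and performs only a handful of arithmetic operations, which is why such kernels tend to be limited by memory traffic rather than compute.

```python
def jacobi_step(grid):
    """Apply one 5-point averaging stencil sweep to the interior of a 2D grid.

    Boundary cells are copied unchanged (a fixed-boundary assumption
    made for this sketch).
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # copy so reads always see the old values
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # New value is the mean of the four direct neighbours.
            out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return out


def run(grid, steps):
    """Repeat the stencil sweep for a given number of time steps."""
    for _ in range(steps):
        grid = jacobi_step(grid)
    return grid


# Hypothetical 4x4 grid: top edge held at 1.0, everything else at 0.0.
grid = [[1.0] * 4] + [[0.0] * 4 for _ in range(3)]
result = run(grid, 1)
```

After one sweep, only the interior cells adjacent to the top edge change. On a spatial architecture, each processing element would hold a tile of such a grid in local memory and exchange only tile borders with its neighbours, though how CStencil actually partitions the grid is not stated in this summary.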
