Stencil Computations on Cerebras Wafer-Scale Engine

Elia Belli; Daniele De Sensi

arXiv:2605.07954 · cs.DC · May 11, 2026

TL;DR

This paper demonstrates that the Cerebras WSE-3 can run 2D stencil computations efficiently, achieving up to a 342x speedup over a GPU-based solver by exploiting the wafer's on-chip memory and massive core parallelism.
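To make concrete what a 2D stencil computation is, the sketch below applies one sweep of a generic 5-point Jacobi stencil in plain Python. This is an illustrative assumption, not CStencil's kernel: the function name `jacobi_5pt`, the 5-point shape, the equal weights, and the fixed boundary are all chosen for the example, since this page does not describe which stencils CStencil implements.

```python
def jacobi_5pt(grid):
    """Apply one sweep of a 2D 5-point Jacobi stencil (illustrative only).

    Each interior cell becomes the average of itself and its four
    neighbours (north, south, west, east). Boundary cells are copied
    unchanged, acting as fixed (Dirichlet) boundary values.
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # separate output buffer: reads never see this sweep's writes
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i][j] = (grid[i][j]
                         + grid[i - 1][j] + grid[i + 1][j]
                         + grid[i][j - 1] + grid[i][j + 1]) / 5.0
    return out


if __name__ == "__main__":
    g = [[0.0, 0.0, 0.0],
         [0.0, 5.0, 0.0],
         [0.0, 0.0, 0.0]]
    print(jacobi_5pt(g)[1])  # [0.0, 1.0, 0.0]
```

Reading from `grid` while writing to `out` is what makes this a Jacobi sweep rather than an in-place Gauss-Seidel update; a full simulation repeats the sweep for many time steps, swapping the two buffers each time.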

Contribution

It introduces CStencil, a framework for two-dimensional stencil computations on the WSE-3, and shows that it outperforms a GPU-based solver (an adapted version of ConvStencil), bridging AI hardware and scientific computing.

Findings

01

CStencil achieves up to a 342x speedup over a GPU-based solver.

02

WSE-3's on-chip memory eliminates off-chip bottlenecks.

03

The architecture effectively saturates compute and memory resources.

Abstract

Stencil computations are a fundamental kernel in scientific computing, critical for simulations in domains such as fluid dynamics and climate modeling. However, these computations are often memory-bound on traditional High-Performance Computing architectures like GPUs, struggling against the "Memory Wall". Simultaneously, the rise of AI-oriented hardware, such as the Cerebras Wafer-Scale Engine, offers massive core parallelism and high-bandwidth on-chip memory, though typically optimized for lower-precision workloads. This work investigates the viability of bridging this divergence by mapping stencil algorithms onto the Cerebras WSE-3. The study introduces CStencil, a novel framework designed to implement two-dimensional stencil computations on the WSE-3. To ensure a rigorous and fair performance evaluation, the research also adapts ConvStencil, a state-of-the-art GPU stencil solver,…
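Why stencils tend to be memory-bound can be estimated with the roofline model, which caps attainable performance at the minimum of peak compute and arithmetic intensity times memory bandwidth. The sketch below assumes a 5-point stencil in float64 with perfect cache reuse (one 8-byte read and one 8-byte write per grid point, 4 additions plus 1 scaling operation). The machine figures of 10 TFLOP/s and 1 TB/s are hypothetical placeholders, not measurements of any GPU or of the WSE-3.

```python
def arithmetic_intensity(flops_per_point, bytes_per_point):
    """FLOPs performed per byte moved to or from main memory."""
    return flops_per_point / bytes_per_point


def roofline_attainable(peak_flops, bandwidth_bytes, intensity):
    """Roofline model: performance is capped by compute or by memory traffic."""
    return min(peak_flops, intensity * bandwidth_bytes)


# Assumed 5-point stencil in float64: 4 adds + 1 scaling op per point,
# one 8-byte read plus one 8-byte write per point (perfect cache reuse).
ai = arithmetic_intensity(5, 16)  # 0.3125 FLOP/byte

# Hypothetical machine: 10 TFLOP/s peak compute, 1 TB/s memory bandwidth.
peak, bw = 10e12, 1e12
attainable = roofline_attainable(peak, bw, ai)
ridge = peak / bw  # intensity a kernel needs before it becomes compute-bound

print(f"AI = {ai} FLOP/B, ridge = {ridge} FLOP/B")
print(f"attainable = {attainable / 1e12} TFLOP/s of {peak / 1e12} peak")
```

Under these assumptions the kernel sits far below the ridge point and reaches only about 3% of peak, which illustrates the "Memory Wall" the abstract refers to. Keeping the whole grid in high-bandwidth on-chip memory, as the WSE-3 allows, raises the effective bandwidth term rather than the intensity.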
