Beyond Exascale: Dataflow Domain Translation on a Cerebras Cluster
Tomas Oppelstrup, Nicholas Giamblanco, Delyan Z. Kalchev, Ilya Sharapov, Mark Taylor, Dirk Van Essendelft, Sivasankaran Rajamanickam, Michael James

TL;DR
This paper presents the Domain Translation algorithm for simulating physical systems on Cerebras clusters. On 64 CS-3 systems it runs simulations at more than 1.6 million time steps per second and shows perfect weak scaling at 88% of peak performance.
Contribution
The paper introduces a novel Domain Translation algorithm that improves simulation rates and scaling efficiency on Cerebras clusters, surpassing traditional domain decomposition methods.
Findings
Simulations exceeding 1.6 million time steps per second
Perfect weak scaling at 88% of peak performance
112 PFLOP/s in a power-unconstrained environment
Abstract
Simulation of physical systems is essential across scientific and engineering domains. Commonly used domain decomposition methods are unable to simultaneously deliver both high simulation rate and high utilization in network computing environments. In particular, Exascale systems deliver only a small fraction of their peak performance for these workloads. This paper introduces the novel Domain Translation algorithm, designed to overcome these limitations. On a cluster of 64 Cerebras CS-3 systems, we use this method to demonstrate unprecedented cluster performance across a range of metrics: we show simulations running in excess of 1.6 million time steps per second; we also demonstrate perfect weak scaling at 88% of peak performance. At this cluster scale, our implementation provides 112 PFLOP/s in a power-unconstrained environment, and 57 GFLOP/J in a power-limited environment. We…
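To put the headline figures in more familiar terms, the short sketch below converts them into per-step and per-system quantities. It uses only numbers quoted in the abstract. The function names are illustrative, the per-system figure is a plain average that assumes work is spread evenly over the 64 systems, and the weak-scaling helper uses the textbook definition (time per step with one system divided by time per step with N systems, at fixed work per system) rather than anything taken from the paper's methodology.

```python
# Figures quoted in the abstract for the 64-system CS-3 cluster.
STEPS_PER_SECOND = 1.6e6   # lower bound: "in excess of 1.6 million"
CLUSTER_PFLOPS = 112.0     # power-unconstrained throughput
NUM_SYSTEMS = 64


def time_per_step_us(steps_per_second: float) -> float:
    """Wall-clock time per simulation step, in microseconds."""
    return 1e6 / steps_per_second


def per_system_pflops(cluster_pflops: float, num_systems: int) -> float:
    """Average throughput per system (assumes an even split of work)."""
    return cluster_pflops / num_systems


def weak_scaling_efficiency(t_step_one: float, t_step_n: float) -> float:
    """Textbook weak-scaling efficiency: T(1) / T(N) at fixed work per system.

    A value of 1.0 corresponds to "perfect" weak scaling.
    """
    return t_step_one / t_step_n


if __name__ == "__main__":
    # At most 0.625 microseconds per step, since the quoted rate is a lower bound.
    print(time_per_step_us(STEPS_PER_SECOND))            # 0.625
    # Average of 1.75 PFLOP/s per system under the even-split assumption.
    print(per_system_pflops(CLUSTER_PFLOPS, NUM_SYSTEMS))  # 1.75
```

The sub-microsecond step time is what makes the simulation-rate result notable: each time step has to complete, communication included, in well under a microsecond.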
