Quantifying performance bottlenecks of stencil computations using the Execution-Cache-Memory model
Holger Stengel, Jan Treibig, Georg Hager, Gerhard Wellein

TL;DR
This paper refines the Execution-Cache-Memory (ECM) model and uses it to quantify and predict performance bottlenecks of stencil computations on a contemporary Intel processor, giving optimization efforts a quantitative basis.
Contribution
It advances the ECM model to accurately quantify performance bottlenecks and the effects of various optimizations for stencil algorithms on contemporary hardware.
Findings
The ECM model yields accurate single-core performance and scalability predictions for typical stencil loop kernels.
Layer conditions determine the estimated data traffic through the memory hierarchy.
Optimization techniques like spatial and temporal blocking show expected benefits.
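To make the layer-condition finding concrete, here is a minimal sketch for a 2D 5-point Jacobi sweep in double precision. The stencil choice, the safety factor of 1/2, the 32 KiB cache size, and the function names are assumptions made for illustration; they are not taken from the text above. The idea is that if three consecutive grid rows fit in the usable part of a cache, the source array needs only one read stream from the next memory level instead of three.

```python
def layer_condition_holds(row_len, cache_bytes, rows=3, elem_bytes=8, safety=0.5):
    """Return True if `rows` consecutive grid rows fit in the usable cache share.

    `safety` is an assumed rule-of-thumb fraction of the cache that is
    effectively available to the stencil rows.
    """
    return rows * row_len * elem_bytes <= safety * cache_bytes


def bytes_per_update(row_len, cache_bytes, elem_bytes=8):
    """Estimate traffic per lattice update between this cache and the next level."""
    # Source array: one read stream if neighboring rows are reused from cache,
    # otherwise the rows j-1, j, and j+1 are each fetched separately.
    if layer_condition_holds(row_len, cache_bytes, elem_bytes=elem_bytes):
        read_streams = 1
    else:
        read_streams = 3
    # Target array: write-allocate transfer plus the eventual store.
    write_streams = 2
    return (read_streams + write_streams) * elem_bytes


if __name__ == "__main__":
    l1_bytes = 32 * 1024  # hypothetical L1 data cache size
    for n in (500, 1000):
        print(n, layer_condition_holds(n, l1_bytes), bytes_per_update(n, l1_bytes))
```

Under these assumptions the condition breaks once a row exceeds 682 elements, which is the kind of threshold that spatial blocking of the inner loop is meant to restore.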
Abstract
Stencil algorithms on regular lattices appear in many fields of computational science, and much effort has been put into optimized implementations. Such activities are usually not guided by performance models that provide estimates of expected speedup. Understanding the performance properties and bottlenecks by performance modeling enables a clear view on promising optimization opportunities. In this work we refine the recently developed Execution-Cache-Memory (ECM) model and use it to quantify the performance bottlenecks of stencil algorithms on a contemporary Intel processor. This includes applying the model to arrive at single-core performance and scalability predictions for typical corner case stencil loop kernels. Guided by the ECM model we accurately quantify the significance of "layer conditions," which are required to estimate the data traffic through the memory hierarchy, and…
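The single-core and scalability predictions mentioned in the abstract can be sketched as follows. This is a simplified reading of the ECM model, not the authors' code: it assumes that in-core time splits into an overlapping part and a non-overlapping part, that the non-overlapping part adds up serially with the inter-cache and memory transfer times, and that multicore performance scales linearly until the memory transfer time becomes the limit. All cycle counts below are made-up inputs, not values from the paper, and every function name is hypothetical.

```python
import math


def ecm_cycles(t_ol, t_nol, transfers):
    """Predicted cycles per unit of work (e.g. one cache line of results).

    t_ol      -- in-core cycles that can overlap with data transfers
    t_nol     -- in-core cycles that cannot overlap (e.g. load/store issue)
    transfers -- cycles per transfer stage, e.g. [L1-L2, L2-L3, L3-memory]
    """
    return max(t_ol, t_nol + sum(transfers))


def saturation_cores(t_ecm, t_mem):
    """Smallest core count at which the memory interface is expected to saturate."""
    return math.ceil(t_ecm / t_mem)


def multicore_perf(n_cores, t_ecm, t_mem, work_per_unit=8):
    """Predicted updates per cycle for n_cores, capped by the memory bottleneck."""
    return min(n_cores * work_per_unit / t_ecm, work_per_unit / t_mem)


if __name__ == "__main__":
    # Illustrative inputs only.
    t_ecm = ecm_cycles(t_ol=6, t_nol=8, transfers=[6, 6, 13])
    t_mem = 13
    print("cycles per unit of work:", t_ecm)
    print("saturation at cores:", saturation_cores(t_ecm, t_mem))
    for n in range(1, 5):
        print(n, round(multicore_perf(n, t_ecm, t_mem), 3))
```

With inputs of this shape, the gap between the single-core cycle count and the memory transfer time shows how many cores a kernel needs before it becomes bandwidth-bound, which is the quantity an optimization such as temporal blocking aims to change.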
