Quantifying performance bottlenecks of stencil computations using the Execution-Cache-Memory model

Holger Stengel; Jan Treibig; Georg Hager; Gerhard Wellein

arXiv:1410.5010 · cs.PF · January 28, 2016

TL;DR

This paper refines the Execution-Cache-Memory (ECM) model and uses it to analyze and predict the performance bottlenecks of stencil computations on a contemporary Intel processor, so that optimization work can be guided by expected speedups.

Contribution

It advances the ECM model to accurately quantify performance bottlenecks and the effects of various optimizations for stencil algorithms on contemporary hardware.

Findings

01 ECM model accurately predicts single-core performance and scalability.

02 Layer conditions significantly influence data traffic estimates.

03 Optimization techniques like spatial and temporal blocking show expected benefits.

Abstract

Stencil algorithms on regular lattices appear in many fields of computational science, and much effort has been put into optimized implementations. Such activities are usually not guided by performance models that provide estimates of expected speedup. Understanding the performance properties and bottlenecks by performance modeling enables a clear view on promising optimization opportunities. In this work we refine the recently developed Execution-Cache-Memory (ECM) model and use it to quantify the performance bottlenecks of stencil algorithms on a contemporary Intel processor. This includes applying the model to arrive at single-core performance and scalability predictions for typical corner case stencil loop kernels. Guided by the ECM model we accurately quantify the significance of "layer conditions," which are required to estimate the data traffic through the memory hierarchy, and…
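
The abstract excerpt names "layer conditions" without spelling one out, so the following is only an illustrative sketch and not the authors' code. It assumes a 2D five-point Jacobi-type stencil in double precision, a write-allocate cache, and the common rule of thumb that only about half of a cache is usable for stencil rows. The function names `layer_condition_met` and `traffic_bytes_per_update`, and the default `safety_factor`, are hypothetical choices made for this example.

```python
def layer_condition_met(row_length, cache_bytes, rows=3,
                        bytes_per_element=8, safety_factor=0.5):
    """Check whether `rows` consecutive grid rows fit in the usable cache.

    Assumption: a five-point stencil touches three neighboring rows, and
    only `safety_factor` of the cache is available for them.
    """
    needed_bytes = rows * row_length * bytes_per_element
    usable_bytes = safety_factor * cache_bytes
    return needed_bytes <= usable_bytes


def traffic_bytes_per_update(condition_met, bytes_per_element=8):
    """Estimate the data traffic per lattice-site update across one cache level.

    Assumption: if the layer condition holds, only one source element per
    update misses the cache; otherwise all three rows miss. The target
    array always costs a write-allocate load plus the store itself.
    """
    source_streams = 1 if condition_met else 3
    store_streams = 2  # write-allocate load plus the store
    return (source_streams + store_streams) * bytes_per_element


if __name__ == "__main__":
    l1_bytes = 32 * 1024  # assumed 32 KiB cache, for illustration only
    for n in (500, 1000):
        ok = layer_condition_met(n, l1_bytes)
        print(n, ok, traffic_bytes_per_update(ok))
```

Under these assumptions the traffic estimate changes stepwise with the row length, which is why a layer condition has to be evaluated before the data volume through each level of the memory hierarchy can be estimated.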
