Efficient multicore-aware parallelization strategies for iterative stencil computations

Jan Treibig; Gerhard Wellein; Georg Hager

arXiv:1004.1741·cs.PF·March 1, 2012

TL;DR

This paper presents optimized multicore-aware parallelization strategies for iterative stencil computations, focusing on cache efficiency and multi-threading to improve performance of Jacobi and Gauss-Seidel smoothers.

Contribution

It refines temporal cache blocking techniques specifically for multicore architectures and demonstrates performance gains using simultaneous multi-threading for Gauss-Seidel smoothers.
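To make the idea of temporal blocking concrete, here is a minimal sketch in Python. It is not the multicore-aware strategy from the paper: it uses plain overlapped (redundant-halo) tiling on a 1D Jacobi iteration, chosen only because it is short enough to verify by hand. The function names `jacobi_steps` and `jacobi_steps_blocked`, the 1D two-point averaging stencil, and the parameters `steps` and `block` are all assumptions made for illustration.

```python
def jacobi_steps(u, steps):
    """Reference version: `steps` full sweeps over the whole 1D grid.

    The two end values act as fixed boundary conditions.
    """
    cur = list(u)
    for _ in range(steps):
        nxt = list(cur)
        for i in range(1, len(cur) - 1):
            nxt[i] = 0.5 * (cur[i - 1] + cur[i + 1])
        cur = nxt
    return cur


def jacobi_steps_blocked(u, steps, block):
    """Temporally blocked version (overlapped tiling, illustrative only).

    Each block of interior points is advanced by all `steps` time steps
    before the next block is touched. A halo of width `steps` is copied
    on each side and recomputed redundantly; the stale halo edge corrupts
    one more cell per step, so after `steps` steps the error has reached
    the block's border but not its interior.
    """
    n = len(u)
    out = list(u)
    for lo in range(1, n - 1, block):
        hi = min(lo + block, n - 1)      # block covers indices lo .. hi-1
        a = max(lo - steps, 0)           # window start, clipped to the domain
        b = min(hi + steps, n)           # window end (exclusive), clipped
        cur = u[a:b]                     # small working set that is reused
        for _ in range(steps):
            nxt = list(cur)
            for i in range(1, len(cur) - 1):
                nxt[i] = 0.5 * (cur[i - 1] + cur[i + 1])
            cur = nxt
        out[lo:hi] = cur[lo - a:hi - a]  # keep only the valid block
    return out


u0 = [float(i * i % 7) for i in range(20)]
reference = jacobi_steps(u0, 3)
blocked = jacobi_steps_blocked(u0, 3, 4)
```

Both versions produce the same values. The blocked one trades some redundant arithmetic in the halos for reuse of a small window across several time steps, and that reuse is what lowers memory traffic in a compiled implementation. Python itself will not show the speedup.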

Findings

01 Temporal cache blocking reduces memory bus pressure

02 SMT significantly improves Gauss-Seidel smoother performance

03 Optimized strategies outperform baseline implementations

Abstract

Stencil computations consume a major part of runtime in many scientific simulation codes. As prototypes for this class of algorithms we consider the iterative Jacobi and Gauss-Seidel smoothers and aim at highly efficient parallel implementations for cache-based multicore architectures. Temporal cache blocking is a known advanced optimization technique, which can reduce the pressure on the memory bus significantly. We apply and refine this optimization for a recently presented temporal blocking strategy designed to explicitly utilize multicore characteristics. Especially for the case of Gauss-Seidel smoothers we show that simultaneous multi-threading (SMT) can yield substantial performance improvements for our optimized algorithm.
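The difference between the two smoothers named in the abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's code: it uses a 2D five-point averaging stencil on nested lists, and the function names `jacobi_sweep` and `gauss_seidel_sweep` are hypothetical.

```python
def jacobi_sweep(grid):
    """One Jacobi sweep: every interior point is computed from the OLD grid.

    A second grid is needed, but all updates are independent of each other.
    """
    n, m = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return new


def gauss_seidel_sweep(grid):
    """One Gauss-Seidel sweep in lexicographic order, updating in place.

    Points above and to the left have already been updated in this sweep,
    so each update depends on earlier ones within the same sweep.
    """
    n, m = len(grid), len(grid[0])
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            grid[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                 + grid[i][j - 1] + grid[i][j + 1])
    return grid


def make_grid(size):
    """Boundary values 1.0, interior values 0.0 (test data for the sketch)."""
    return [[1.0 if i in (0, size - 1) or j in (0, size - 1) else 0.0
             for j in range(size)] for i in range(size)]


jacobi_result = jacobi_sweep(make_grid(4))
gs_result = gauss_seidel_sweep(make_grid(4))
```

On this 4x4 grid, one Jacobi sweep sets all four interior points to 0.5. One Gauss-Seidel sweep gives 0.5, 0.625, 0.625, and 0.8125 in update order, because later points already see updated neighbours. That in-sweep dependency is why Gauss-Seidel is harder to parallelize than Jacobi.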
