Multicore-aware parallel temporal blocking of stencil codes for shared and distributed memory

Markus Wittmann; Georg Hager; Gerhard Wellein

arXiv:0912.4506·cs.PF·March 1, 2012

TL;DR

This paper introduces a pipelined, multicore-aware temporal blocking algorithm for stencil codes. It makes explicit use of shared caches to relieve bandwidth-limited multicore chips and extends to hybrid shared/distributed-memory clusters.

Contribution

It presents a novel pipelined approach to temporal blocking that explicitly utilizes shared caches and extends to hybrid shared/distributed-memory clusters.

Findings

01 Pipelined temporal blocking improves stencil code performance on bandwidth-limited multicore chips

02 Explicit use of shared caches lessens the pressure on the memory interface

03 Temporal blocking can be applied successfully in hybrid shared/distributed-memory environments

Abstract

New algorithms and optimization techniques are needed to balance the accelerating trend towards bandwidth-starved multicore chips. It is well known that the performance of stencil codes can be improved by temporal blocking, lessening the pressure on the memory interface. We introduce a new pipelined approach that makes explicit use of shared caches in multicore environments and minimizes synchronization and boundary overhead. For clusters of shared-memory nodes we demonstrate how temporal blocking can be employed successfully in a hybrid shared/distributed-memory environment.
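
To make the idea of temporal blocking concrete, here is a minimal sketch on a 1D three-point Jacobi stencil. It is not the authors' pipelined scheme: it uses the simpler overlapped-tile variant, which recomputes a halo of `steps` cells on each side of a block instead of pipelining threads through a shared cache. The function names, the `block` parameter, and the choice of stencil are all assumptions made for illustration.

```python
def jacobi_step(u):
    """One 3-point Jacobi sweep; the two end points are held fixed."""
    v = u[:]
    for i in range(1, len(u) - 1):
        v[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return v


def jacobi_naive(u, steps):
    """Reference version: every time step streams the whole grid."""
    for _ in range(steps):
        u = jacobi_step(u)
    return u


def jacobi_temporal_blocked(u, steps, block=8):
    """Overlapped temporal blocking (illustrative, not the paper's pipeline).

    Each block is loaded once together with a halo of `steps` cells per side,
    advanced `steps` time steps while it is small enough to stay in cache,
    and only the block's own cells are written back.
    """
    n = len(u)
    out = u[:]
    for start in range(0, n, block):
        end = min(start + block, n)
        lo = max(start - steps, 0)   # halo clipped at the domain boundary
        hi = min(end + steps, n)
        tile = u[lo:hi]
        for _ in range(steps):
            tile = jacobi_step(tile)
        # Cells within `steps` of an artificial tile edge are stale by now;
        # the block interior [start, end) lies outside that zone.
        out[start:end] = tile[start - lo:end - lo]
    return out
```

Both functions produce the same result, but the blocked version touches each block's data in main memory once per `steps` updates rather than once per update. The cost is redundant halo work, which is the kind of boundary overhead the pipelined approach described in the abstract aims to minimize.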
