An Efficient Vectorization Scheme for Stencil Computation

Kun Li; Liang Yuan; Yunquan Zhang; Yue Yue; Hang Cao; Pengqi Lu

arXiv:2103.08825 · cs.DC · March 19, 2021

Open Access

TL;DR

This paper introduces a novel transpose layout and a time loop unroll-and-jam strategy to improve vectorization and data locality in stencil computations, achieving competitive performance on AVX-2 and AVX-512 CPUs.

Contribution

A new transpose layout combined with a time loop unroll-and-jam technique that addresses data alignment and locality issues in stencil vectorization.

Findings

01. Achieves competitive performance on AVX-2 and AVX-512 CPUs.

02. Reduces data reorganization overhead in vectorized stencil computation.

03. Improves data reuse at register level through multistep computation.

Abstract

Stencil computation is one of the most important kernels in many scientific and engineering applications. Much prior work has focused on vectorization and tiling techniques, which aim to exploit in-core data parallelism and data locality, respectively. In this paper, we analyze the downsides of existing vectorization schemes: they either incur data alignment conflicts or hurt data locality when integrated with tiling. We then propose a novel transpose layout that preserves data locality for tiling while reducing the data reorganization overhead of vectorization. To further improve data reuse at the register level, we design a time loop unroll-and-jam strategy that performs multistep stencil computation along the time dimension. Experimental results on AVX-2 and AVX-512 CPUs show that our approach achieves competitive performance.
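
The multistep idea in the abstract can be illustrated with a minimal scalar sketch. It is not the paper's vectorized implementation and does not use the transpose layout. It assumes a 1D 3-point Jacobi stencil with fixed boundary values, and the weights and function names (`point`, `one_step`, `two_steps_jammed`) are hypothetical, chosen only for illustration. The time loop is unrolled by two and both stencil applications are jammed into one spatial sweep, so first-step values are held in three local variables (standing in for registers) instead of being written to an intermediate array.

```python
# Illustrative 3-point stencil weights (hypothetical values, not from the paper).
W_L, W_C, W_R = 0.25, 0.5, 0.25


def point(left, center, right):
    """Apply the 3-point stencil to one grid point."""
    return W_L * left + W_C * center + W_R * right


def one_step(a):
    """Reference: one Jacobi time step; the two boundary points stay fixed."""
    out = list(a)
    for i in range(1, len(a) - 1):
        out[i] = point(a[i - 1], a[i], a[i + 1])
    return out


def two_steps_jammed(a):
    """Two time steps in a single spatial sweep.

    The time loop is unrolled by two and both stencil applications are
    jammed into one loop body. Step-1 values live only in the scalars
    prev/cur/nxt, which stand in for registers.
    """
    n = len(a)
    out = list(a)
    if n < 3:
        return out  # no interior points to update
    prev = a[0]                    # step-1 value at i-1 (fixed left boundary)
    cur = point(a[0], a[1], a[2])  # step-1 value at i = 1
    for i in range(1, n - 1):
        if i + 1 == n - 1:
            nxt = a[n - 1]         # fixed right boundary
        else:
            nxt = point(a[i], a[i + 1], a[i + 2])  # step-1 value at i+1
        out[i] = point(prev, cur, nxt)             # step-2 value at i
        prev, cur = cur, nxt       # slide the three-value window
    return out


grid = [0.0, 0.0, 0.0, 16.0, 0.0, 0.0, 0.0]
reference = one_step(one_step(grid))
jammed = two_steps_jammed(grid)
assert all(abs(x - y) < 1e-12 for x, y in zip(reference, jammed))
```

In this sketch each input element is loaded once per two time steps, and no intermediate grid is stored. A SIMD version would hold vectors rather than scalars in the sliding window, which is where the data layout starts to matter.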

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Advanced Data Storage Technologies