Temporal Vectorization for Stencils
Liang Yuan, Hang Cao, Yunquan Zhang, Kun Li, Pengqi Lu, and Yue Yue

TL;DR
This paper introduces a novel temporal vectorization method for stencil computations, improving CPU performance by addressing data sharing conflicts and enabling efficient vectorization of both Jacobi and Gauss-Seidel stencils.
Contribution
It proposes a new temporal vectorization scheme that vectorizes in iteration space, reducing reorganizations and extending to Gauss-Seidel stencils, which are less studied.
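To make "vectorizing in iteration space" concrete, the following is a minimal sketch and not the authors' implementation. It assumes a 1D 3-point Jacobi stencil with fixed boundaries, models a vector as `vl` Python "lanes", and uses a hypothetical skew parameter `stride`; the function names are illustrative. Lane `k` updates time level `k + 1` at spatial index `j - stride * k`, so each group of lanes holds points with different time coordinates. For simplicity the sketch keeps every time level in memory, which a real implementation would avoid.

```python
def jacobi_reference(u0, steps):
    """Plain 3-point Jacobi: each time step sweeps the whole array.

    The two boundary values are kept fixed.
    """
    u = list(u0)
    for _ in range(steps):
        nxt = list(u)
        for i in range(1, len(u) - 1):
            nxt[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
        u = nxt
    return u


def jacobi_temporal_lanes(u0, vl, stride=2):
    """Advance vl time steps in a single skewed sweep (illustrative only).

    At sweep position j, lane k computes time level k + 1 at index
    j - stride * k. All lanes read their inputs before any lane writes,
    mimicking one simultaneous vector operation.
    """
    # With simultaneous lanes, lane k needs level k at index i + 1 to have
    # been produced in an earlier sweep step, which requires stride >= 2.
    assert stride >= 2
    n = len(u0)
    # levels[t] holds the array at time t (assumption: all levels stored).
    levels = [list(u0) for _ in range(vl + 1)]
    last_j = (n - 2) + stride * (vl - 1)
    for j in range(1, last_j + 1):
        updates = []
        for k in range(vl):  # the "vector": one lane per time level
            i = j - stride * k
            if 1 <= i <= n - 2:
                prev = levels[k]
                value = (prev[i - 1] + prev[i] + prev[i + 1]) / 3.0
                updates.append((k + 1, i, value))
        for t, i, value in updates:  # write back after all lanes have read
            levels[t][i] = value
    return levels[vl]
```

Calling `jacobi_temporal_lanes(u0, 4)` should give the same values as `jacobi_reference(u0, 4)`, because each lane applies the same arithmetic to inputs that were finalized in an earlier sweep step.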
Findings
Significant performance improvements demonstrated on various stencil types.
Applicable to both Jacobi and Gauss-Seidel stencils with minimal reorganizations.
Small, fixed number of vector reorganizations, independent of vector length, stencil order, and dimension.
Abstract
Stencil computations represent a very common class of nested loops in scientific and engineering applications. Exploiting vector units in modern CPUs is crucial to achieving peak performance. Previous vectorization approaches often consider the data space, in particular the innermost unit-strided loop. This leads to the well-known data alignment conflict problem, in which vector loads overlap because consecutive stencil computations share data. This paper proposes a novel temporal vectorization scheme for stencils. It vectorizes the stencil computation in the iteration space and assembles points with different time coordinates in one vector. The temporal vectorization leads to a small fixed number of vector reorganizations that is independent of the vector length, stencil order, and dimension. Furthermore, it is also applicable to Gauss-Seidel stencils, whose vectorization is…
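The data alignment conflict described in the abstract can be illustrated with a small sketch. It assumes a 1D 3-point stencil vectorized along the unit-stride loop with vector length `vl`; the helper names are hypothetical and only index arithmetic is modeled, not actual SIMD loads.

```python
def spatial_vector_loads(i, vl):
    """Indices touched by the three vector loads producing outputs i .. i+vl-1."""
    left = list(range(i - 1, i - 1 + vl))
    center = list(range(i, i + vl))
    right = list(range(i + 1, i + 1 + vl))
    return left, center, right


def load_overlap(i, vl):
    """Return (elements loaded, distinct elements) for one vector of outputs."""
    loads = spatial_vector_loads(i, vl)
    loaded = sum(len(v) for v in loads)
    distinct = len(set().union(*loads))
    return loaded, distinct


def aligned_flags(i, vl):
    """True where a load starts on a multiple of vl (assumed alignment rule)."""
    return [v[0] % vl == 0 for v in spatial_vector_loads(i, vl)]


# For i = 8 and vl = 4 the loads cover [7..10], [8..11], and [9..12]:
# 12 elements are loaded but only 6 are distinct, and only the center
# load starts on an aligned boundary.
print(spatial_vector_loads(8, 4))
print(load_overlap(8, 4))
print(aligned_flags(8, 4))
```

In this model, `3 * vl` elements are loaded for `vl + 2` distinct values, and at most one of the three loads can be aligned at a time.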
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Advanced Data Storage Technologies
