Exploiting long vectors with a CFD code: a co-design show case
Marc Blancafort, Roger Ferrer, Guillaume Houzeaux, Marta Garcia-Gasulla, Filippo Mantovani

TL;DR
This paper presents an iterative methodology to optimize vectorization in CFD codes using autovectorization, achieving significant speedups on RISC-V hardware while maintaining portability across architectures.
Contribution
It introduces a detailed, iterative approach to enhance autovectorization efficiency in CFD applications, demonstrating substantial performance gains on RISC-V and portability to other architectures.
Findings
Single-core speedup of 7.6× on RISC-V
Methodology improves autovectorization efficiency
Performance benefits maintained across architectures
Abstract
A current trend in HPC systems is the utilization of architectures with SIMD or vector extensions to exploit data parallelism. There are several ways to take advantage of such modern vector architectures, each with a different impact on the code and its portability: for example, the use of intrinsics, guided vectorization via pragmas, or compiler autovectorization. Our objectives are to maximize vectorization efficiency and minimize code specialization. To achieve these objectives, we rely on compiler autovectorization. We leverage a set of hardware and software tools that allow us to analyze in detail where autovectorization is suboptimal. Thus, we apply an iterative methodology that allows us to incrementally improve the efficient use of the underlying hardware. In this paper, we apply this methodology to a CFD production code. We evaluate the performance on an innovative configurable…
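To make the difference between two of the approaches named in the abstract concrete, here is a minimal sketch in C. It is not taken from the paper: the kernel (a simple `y = a*x + y` update) and the function names `axpy_auto` and `axpy_guided` are hypothetical, chosen only for illustration. The first variant relies purely on compiler autovectorization; the second adds an OpenMP `simd` pragma as guidance. Intrinsics are left out because they tie the code to a single instruction set, which is the kind of specialization the abstract says the authors want to minimize.

```c
#include <assert.h>
#include <stddef.h>

/* Variant 1: plain loop left to the compiler's autovectorizer.
 * `restrict` promises that x and y do not overlap, which removes one
 * common obstacle to autovectorization (possible aliasing). */
void axpy_auto(size_t n, double a,
               const double *restrict x, double *restrict y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Variant 2: guided vectorization. The OpenMP `simd` pragma asks the
 * compiler to vectorize the loop. It only takes effect when OpenMP SIMD
 * support is enabled (for example -fopenmp-simd with GCC or Clang);
 * otherwise the pragma is ignored and the loop behaves like plain C. */
void axpy_guided(size_t n, double a, const double *x, double *y)
{
    assert(x != NULL && y != NULL);
    #pragma omp simd
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Both functions compute the same result; they differ only in how much the source tells the compiler. Whether either loop is actually vectorized depends on the compiler, its flags, and the target architecture, so the compiler's vectorization report is the place to check.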
