Locality Optimized Unstructured Mesh Algorithms on GPUs
András Attila Sulyok, Gábor Dániel Balogh, István Zoltán Reguly, Gihan R. Mudalige

TL;DR
This paper introduces novel locality-aware optimizations for unstructured-mesh algorithms on GPUs. By reducing data movement and improving memory locality, they yield reported speedups of 1.1 to 1.75 times over state-of-the-art methods on NVIDIA GPUs.
Contribution
The paper presents new reordering and partitioning techniques that, combined with a two-layered coloring strategy, optimize unstructured-mesh algorithms for GPU execution, achieving notable speedups.
Findings
Speedups of 1.1 to 1.75 times over state-of-the-art methods.
Improved GPU occupancy and data reuse factors.
Enhanced understanding of performance bottlenecks.
Abstract
Unstructured-mesh based numerical algorithms such as finite volume and finite element algorithms form an important class of applications for many scientific and engineering domains. The key difficulty in achieving higher performance from these applications is the indirect accesses that lead to data races when parallelized. Current methods for handling such data races lead to reduced parallelism and suboptimal performance. Particularly on modern many-core architectures, such as GPUs, that have increasing core/thread counts, reducing data movement and exploiting memory locality are vital for gaining good performance. In this work we present novel locality-exploiting optimizations for the efficient execution of unstructured-mesh algorithms on GPUs. Building on a two-layered coloring strategy for handling data races, we introduce novel reordering and partitioning techniques to further…
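To illustrate the data-race problem and the idea of coloring in two layers, here is a minimal Python sketch. It is not the authors' implementation. It assumes one common reading of the strategy: edges are grouped into fixed-size blocks, blocks of the same color share no mesh node and so could run concurrently, and within a block, edges of the same color share no node and so could apply their increments without conflicts. The function names (`greedy_color`, `two_layer_coloring`), the greedy heuristic, the contiguous blocking, and the toy mesh are all illustrative assumptions.

```python
from collections import defaultdict


def greedy_color(items, resources_of):
    """Assign each item the smallest color not already used by an
    earlier item that touches one of the same resources (mesh nodes)."""
    used = defaultdict(set)  # resource -> colors taken by items touching it
    colors = {}
    for item in items:
        taken = set()
        for r in resources_of(item):
            taken |= used[r]
        c = 0
        while c in taken:
            c += 1
        colors[item] = c
        for r in resources_of(item):
            used[r].add(c)
    return colors


def two_layer_coloring(edges, block_size):
    """Hypothetical two-layer scheme: color blocks, then edges per block."""
    # Blocks are runs of consecutive edge indices (an assumption; a
    # partitioner or reordering could choose the blocks differently).
    blocks = [list(range(s, min(s + block_size, len(edges))))
              for s in range(0, len(edges), block_size)]

    # Layer 1: two blocks conflict if they write to a common node.
    def block_nodes(b):
        return {n for e in blocks[b] for n in edges[e]}

    block_colors = greedy_color(range(len(blocks)), block_nodes)

    # Layer 2: inside one block, two edges conflict if they share a node.
    edge_colors = {}
    for blk in blocks:
        edge_colors.update(greedy_color(blk, lambda e: edges[e]))
    return blocks, block_colors, edge_colors


# Toy mesh: each edge indirectly updates its two end nodes.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (1, 3)]
blocks, block_colors, edge_colors = two_layer_coloring(edges, block_size=2)
```

In this sketch, fewer colors mean more work can proceed at once, which is one way to see why the choice of blocks (reordering and partitioning) affects the available parallelism.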
