TL;DR
This paper introduces a hardware-agnostic, high-performance implementation strategy for lattice Boltzmann simulations using C++17 Parallel Algorithms, enabling efficient execution on both CPUs and GPUs without vendor-specific code.
Contribution
It presents a novel, portable approach for LB simulations that achieves state-of-the-art performance across diverse many-core platforms using standard C++17 features.
Findings
Reaches state-of-the-art performance on both many-core CPUs and GPUs from a single codebase
Demonstrates versatility across six implementation schemes and nine collision models
Shows modern CPUs can narrow the performance gap with GPUs
Abstract
We present a novel, hardware-agnostic implementation strategy for lattice Boltzmann (LB) simulations, which yields massive performance on homogeneous and heterogeneous many-core platforms. Based solely on C++17 Parallel Algorithms, our approach does not rely on any language extensions, external libraries, vendor-specific code annotations, or pre-compilation steps. Thanks in particular to a recently proposed GPU back-end to C++17 Parallel Algorithms, it is shown that a single code can compile and reach state-of-the-art performance on both many-core CPU and GPU environments for the solution of a given non-trivial fluid dynamics problem. The proposed strategy is evaluated with six different, commonly used implementation schemes to assess the performance impact of memory access patterns on different platforms. Nine different LB collision models are included in the tests and exhibit good…
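The core idea in the abstract, writing each lattice update as a standard C++17 parallel algorithm so that the execution policy and toolchain decide where it runs, can be illustrated with a minimal sketch. The code below is not taken from the paper: it assumes a one-dimensional D1Q3 lattice with a BGK collision step purely for brevity, and the names `Lattice`, `collide_bgk`, and `total_mass` are hypothetical. The cell index is recovered from the address of the element passed to `std::for_each`, one possible idiom for indexing without extra index arrays.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <execution>
#include <utility>
#include <vector>

// Hypothetical structure-of-arrays storage for a D1Q3 lattice (illustration only).
// f0: rest population, fp: velocity +1, fm: velocity -1.
struct Lattice {
    std::vector<double> f0, fp, fm;
    // Initialise every cell to the rest equilibrium (rho = 1, u = 0).
    explicit Lattice(std::size_t n)
        : f0(n, 2.0 / 3.0), fp(n, 1.0 / 6.0), fm(n, 1.0 / 6.0) {}
};

// BGK collision on every cell, expressed as a single std::for_each call.
// The execution policy is a parameter (an assumption of this sketch), so the
// same function body serves sequential and parallel runs.
template <class Policy>
void collide_bgk(Policy&& policy, Lattice& lat, double omega) {
    assert(lat.fp.size() == lat.f0.size() && lat.fm.size() == lat.f0.size());
    double* f0 = lat.f0.data();
    double* fp = lat.fp.data();
    double* fm = lat.fm.data();
    std::for_each(std::forward<Policy>(policy), f0, f0 + lat.f0.size(),
        [=](double& rest) {
            // Recover the cell index from the element's address.
            const std::ptrdiff_t i = &rest - f0;
            // Macroscopic density and velocity of this cell.
            const double rho = rest + fp[i] + fm[i];
            const double u = (fp[i] - fm[i]) / rho;
            const double usq = 1.5 * u * u;
            // Second-order equilibrium with D1Q3 weights 2/3, 1/6, 1/6.
            const double eq0 = (2.0 / 3.0) * rho * (1.0 - usq);
            const double eqp = (1.0 / 6.0) * rho * (1.0 + 3.0 * u + 4.5 * u * u - usq);
            const double eqm = (1.0 / 6.0) * rho * (1.0 - 3.0 * u + 4.5 * u * u - usq);
            // Relax each population towards its equilibrium value.
            rest  += omega * (eq0 - rest);
            fp[i] += omega * (eqp - fp[i]);
            fm[i] += omega * (eqm - fm[i]);
        });
}

// Sum of all populations over all cells; the collision step should conserve it.
double total_mass(const Lattice& lat) {
    double m = 0.0;
    for (std::size_t i = 0; i < lat.f0.size(); ++i) {
        m += lat.f0[i] + lat.fp[i] + lat.fm[i];
    }
    return m;
}
```

Calling `collide_bgk(std::execution::par_unseq, lat, omega)` requests parallel, vectorised execution. Whether that call runs on CPU threads or is offloaded to a GPU depends on the compiler and back-end in use, not on the source, which is the portability property the abstract describes. Passing `std::execution::seq` runs the same code sequentially, which can be handy for debugging.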
