Performance engineering for the Lattice Boltzmann method on GPGPUs: Architectural requirements and performance results
Johannes Habich, Christian Feichtinger, Harald Köstler, Georg Hager, Gerhard Wellein

TL;DR
This paper presents advanced optimization strategies for implementing the lattice Boltzmann method on GPGPUs, focusing on memory bandwidth improvements, resulting in high-performance flow simulations.
Contribution
It introduces novel data layout and algorithmic rearrangements for efficient GPU implementation of the lattice Boltzmann method, achieving significant performance gains.
Findings
The GPU implementation reaches 650 MLUPS (million lattice updates per second) in single precision
It achieves 290 MLUPS in double precision on an NVIDIA Tesla C2070
Optimizations improve memory access and concurrency
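To relate the update rates above to memory traffic, the following is a rough back-of-the-envelope sketch. It assumes that each lattice cell update reads 19 and writes 19 distribution values and that no other memory traffic is counted. That assumption is common for D3Q19 kernels but is not stated in this summary. The function names are illustrative only.

```python
Q = 19  # number of discrete velocities in the D3Q19 model


def bytes_per_update(bytes_per_value, q=Q):
    """Assumed traffic per lattice cell update: q loads plus q stores."""
    return 2 * q * bytes_per_value


def implied_bandwidth_gbs(mlups, bytes_per_value):
    """Memory bandwidth in GB/s (1 GB = 1e9 bytes) implied by an update rate."""
    return mlups * 1e6 * bytes_per_update(bytes_per_value) / 1e9


# Update rates taken from the findings listed above.
sp = implied_bandwidth_gbs(650, 4)  # single precision, 4 bytes per value
dp = implied_bandwidth_gbs(290, 8)  # double precision, 8 bytes per value

print(f"single precision: {sp:.1f} GB/s")
print(f"double precision: {dp:.1f} GB/s")
```

Under these assumptions, the reported rates correspond to roughly 99 GB/s and 88 GB/s of sustained memory traffic, which is one way to see why the solver is described as limited by memory bandwidth.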
Abstract
GPUs offer several times the floating point performance and memory bandwidth of current standard two-socket CPU servers, e.g. NVIDIA C2070 vs. Intel Xeon Westmere X5650. The lattice Boltzmann method has been established as a flow solver in recent years and was one of the first flow solvers to be successfully ported to GPUs and to perform well on them. We demonstrate advanced optimization strategies for a D3Q19 lattice Boltzmann based incompressible flow solver for GPGPUs and CPUs based on NVIDIA CUDA and OpenCL. Since the implemented algorithm is limited by memory bandwidth, we concentrate on improving memory access. Basic data layout issues for optimal data access are explained and discussed. Furthermore, the algorithmic steps are rearranged to improve scattered access to the GPU memory. The importance of occupancy is discussed as well as optimization strategies to improve overall…
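The data layout issue mentioned in the abstract can be illustrated with a minimal sketch. It compares two common ways to linearize the D3Q19 distribution array: array of structures (AoS) and structure of arrays (SoA). The index functions, grid size, and variable names are hypothetical, and this summary does not say which layout the paper adopts. SoA is shown only because it is the layout usually favored on GPUs.

```python
Q = 19  # number of discrete velocities in the D3Q19 model


def idx_aos(x, y, z, q, nx, ny, nz):
    """Array of structures: the Q values of one cell are stored contiguously."""
    return ((z * ny + y) * nx + x) * Q + q


def idx_soa(x, y, z, q, nx, ny, nz):
    """Structure of arrays: all cells of one direction q are stored contiguously."""
    return ((q * nz + z) * ny + y) * nx + x


nx, ny, nz = 8, 8, 8  # small illustrative grid
q = 5                 # any fixed direction

# Distance in elements between two x-neighbours for the same direction q.
stride_aos = idx_aos(1, 0, 0, q, nx, ny, nz) - idx_aos(0, 0, 0, q, nx, ny, nz)
stride_soa = idx_soa(1, 0, 0, q, nx, ny, nz) - idx_soa(0, 0, 0, q, nx, ny, nz)

print("AoS stride between x-neighbours:", stride_aos)
print("SoA stride between x-neighbours:", stride_soa)
```

If consecutive GPU threads handle consecutive x positions, the SoA layout lets them touch adjacent memory addresses for a given direction, while AoS separates their accesses by 19 elements. Adjacent accesses are what allow the hardware to combine the loads of neighbouring threads into fewer memory transactions.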
Taxonomy
Topics: Lattice Boltzmann Simulation Studies · Aerosol Filtration and Electrostatic Precipitation · Generative Adversarial Networks and Image Synthesis
