A Flexible Patch-Based Lattice Boltzmann Parallelization Approach for Heterogeneous GPU-CPU Clusters
Christian Feichtinger, Johannes Habich, Harald Koestler, Georg Hager, Ulrich Ruede, Gerhard Wellein

TL;DR
This paper presents a multi-GPU lattice Boltzmann flow solver within the WaLBerla framework, demonstrating near-perfect weak scalability on InfiniBand clusters and analyzing load balancing and performance in heterogeneous CPU-GPU environments.
Contribution
It introduces a flexible block-structured MPI parallelization for multi-GPU lattice Boltzmann simulations that supports load balancing and heterogeneous computation on CPU-GPU clusters.
Findings
Achieves nearly perfect weak scalability on InfiniBand clusters.
The multi-GPU implementation sustains single-GPU kernel performance to a large extent despite the overhead of multi-GPU simulation.
Heterogeneous CPU-GPU simulations show effective weak scaling across different node configurations.
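For readers unfamiliar with the term, weak scaling keeps the workload per device fixed while adding devices, so ideal aggregate performance grows linearly with the device count. A minimal sketch of the usual efficiency metric follows; the function name and the sample figures are illustrative assumptions and are not taken from the paper.

```python
def weak_scaling_efficiency(perf_single, perf_parallel, num_devices):
    """Ratio of measured aggregate performance to the ideal value.

    perf_single   -- performance of one device (e.g. in MLUPS)
    perf_parallel -- measured aggregate performance on num_devices devices
    num_devices   -- number of devices, each with the same per-device workload
    """
    if perf_single <= 0 or num_devices <= 0:
        raise ValueError("perf_single and num_devices must be positive")
    ideal = num_devices * perf_single
    return perf_parallel / ideal


# Illustrative numbers only (hypothetical, not results from the paper).
efficiency = weak_scaling_efficiency(perf_single=100.0,
                                     perf_parallel=760.0,
                                     num_devices=8)
print(f"weak-scaling efficiency: {efficiency:.2f}")
```

An efficiency close to 1.0 corresponds to what the summary calls nearly perfect weak scalability.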
Abstract
Sustaining a large fraction of single GPU performance in parallel computations is considered to be the major problem of GPU-based clusters. In this article, this topic is addressed in the context of a lattice Boltzmann flow solver that is integrated in the WaLBerla software framework. We propose a multi-GPU implementation using a block-structured MPI parallelization, suitable for load balancing and heterogeneous computations on CPUs and GPUs. The overhead required for multi-GPU simulations is discussed in detail and it is demonstrated that the kernel performance can be sustained to a large extent. With our GPU implementation, we achieve nearly perfect weak scalability on InfiniBand clusters. However, in strong scaling scenarios multi-GPUs make less efficient use of the hardware than IBM BG/P and x86 clusters. Hence, a cost analysis must determine the best course of action for a…
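The abstract does not spell out how blocks are assigned to processes, so the following is only a sketch of one simple way a block-structured decomposition can balance load on heterogeneous hardware: equally sized blocks are handed out in proportion to a per-process performance weight. The function `distribute_blocks`, the weights, and the block count are hypothetical and do not reproduce the WaLBerla implementation.

```python
def distribute_blocks(num_blocks, weights):
    """Assign equally sized blocks to processes in proportion to weights.

    weights -- assumed relative performance of each process
               (e.g. a GPU process weighted higher than a CPU process).
    Returns a list with the number of blocks per process.
    """
    if num_blocks < 0 or not weights or any(w <= 0 for w in weights):
        raise ValueError("need num_blocks >= 0 and positive weights")
    total = sum(weights)
    # Ideal (fractional) share of blocks for each process.
    shares = [num_blocks * w / total for w in weights]
    counts = [int(s) for s in shares]
    leftover = num_blocks - sum(counts)
    # Give the remaining blocks to the largest fractional remainders.
    order = sorted(range(len(weights)),
                   key=lambda i: shares[i] - counts[i],
                   reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


# Hypothetical node: two GPU processes (weight 4) and two CPU processes (weight 1).
counts = distribute_blocks(12, [4.0, 4.0, 1.0, 1.0])
print(counts)
```

Because every block has the same size, changing the balance only means changing how many blocks each process owns, which is what makes a patch-based layout convenient for mixed CPU-GPU runs.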
