A Flexible Patch-Based Lattice Boltzmann Parallelization Approach for Heterogeneous GPU-CPU Clusters

Christian Feichtinger; Johannes Habich; Harald Koestler; Georg Hager; Ulrich Ruede; Gerhard Wellein

arXiv:1007.1388·cs.DC·March 1, 2012

TL;DR

This paper presents a multi-GPU lattice Boltzmann flow solver within the WaLBerla framework, demonstrating near-perfect weak scalability on InfiniBand clusters and analyzing load balancing and performance in heterogeneous CPU-GPU environments.

Contribution

The paper introduces a flexible block-structured MPI parallelization for multi-GPU lattice Boltzmann simulations that supports load balancing and heterogeneous computation on CPU-GPU clusters.
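
To make the load-balancing idea concrete, here is a minimal sketch of how equally sized patches (blocks) could be divided among processes of different speed. It is an illustration under stated assumptions, not the scheme used in the paper: the function name `distribute_patches`, the largest-remainder rounding, and the throughput numbers are all hypothetical.

```python
def distribute_patches(num_patches, throughputs):
    """Split whole patches among processes in proportion to throughput.

    Hypothetical helper: `throughputs` holds one relative speed per
    process (for example measured lattice updates per second). Rounding
    uses the largest-remainder method so every patch is assigned.
    """
    if num_patches < 0 or not throughputs or min(throughputs) <= 0:
        raise ValueError("need num_patches >= 0 and positive throughputs")

    total = sum(throughputs)
    # Ideal (fractional) share of patches for each process.
    ideal = [num_patches * t / total for t in throughputs]
    # Start from the rounded-down share.
    counts = [int(share) for share in ideal]
    leftover = num_patches - sum(counts)
    # Hand the remaining patches to the largest fractional remainders.
    by_remainder = sorted(
        range(len(ideal)), key=lambda i: ideal[i] - counts[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


# Assumed example: one GPU process four times faster than one CPU process.
gpu_cpu_split = distribute_patches(10, [4.0, 1.0])
```

With the assumed 4:1 speed ratio, the faster process receives four times as many patches, so both processes would finish a time step at roughly the same moment. Real patch sizes, communication cost, and memory limits are ignored here.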

Findings

01 Achieves nearly perfect weak scalability on InfiniBand clusters.

02 GPU implementation sustains kernel performance with manageable overhead.

03 Heterogeneous CPU-GPU simulations show effective weak scaling across different node configurations.
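
For readers unfamiliar with the term, weak scaling keeps the problem size per process fixed while the process count grows, so ideal aggregate performance rises linearly. The sketch below computes the usual parallel efficiency for that setting. The function name and the sample numbers are hypothetical and are not results from the paper.

```python
def weak_scaling_efficiency(perf_single, perf_parallel, num_processes):
    """Parallel efficiency for weak scaling.

    perf_single:   performance of one process (any rate unit, e.g.
                   lattice updates per second).
    perf_parallel: aggregate performance of `num_processes` processes,
                   each working on the same per-process problem size.
    Returns a value of 1.0 for perfect scaling.
    """
    if perf_single <= 0 or num_processes <= 0:
        raise ValueError("perf_single and num_processes must be positive")
    return perf_parallel / (num_processes * perf_single)


# Hypothetical numbers chosen only to illustrate the formula.
efficiency = weak_scaling_efficiency(100.0, 760.0, 8)
```

An efficiency close to 1.0 is what "nearly perfect weak scalability" refers to; values well below 1.0 indicate that communication or other overhead grows with the process count.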

Abstract

Sustaining a large fraction of single GPU performance in parallel computations is considered to be the major problem of GPU-based clusters. In this article, this topic is addressed in the context of a lattice Boltzmann flow solver that is integrated in the WaLBerla software framework. We propose a multi-GPU implementation using a block-structured MPI parallelization, suitable for load balancing and heterogeneous computations on CPUs and GPUs. The overhead required for multi-GPU simulations is discussed in detail and it is demonstrated that the kernel performance can be sustained to a large extent. With our GPU implementation, we achieve nearly perfect weak scalability on InfiniBand clusters. However, in strong scaling scenarios multi-GPUs make less efficient use of the hardware than IBM BG/P and x86 clusters. Hence, a cost analysis must determine the best course of action for a…
