Multi-GPU thermal lattice Boltzmann simulations using OpenACC and MPI
Ao Xu, Bo-Tao Li

TL;DR
This paper evaluates a hybrid OpenACC and MPI approach for multi-GPU thermal lattice Boltzmann simulations, demonstrating high performance and scalability up to 16 GPUs.
Contribution
It introduces an optimized hybrid OpenACC and MPI method for multi-GPU thermal lattice Boltzmann simulations that reaches more than 76% of the theoretical maximum performance on a single GPU and scales well in strong-scaling tests up to 16 GPUs.
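Lattice Boltzmann kernels are usually limited by memory bandwidth rather than arithmetic, so a common way to define the theoretical maximum performance of a GPU is its memory bandwidth divided by the bytes moved per lattice update. The sketch below assumes a double-distribution thermal model (a D2Q9 or D3Q19 flow lattice plus a D2Q5 or D3Q7 temperature lattice) stored in double precision, with each distribution value read and written once per update. The lattice choices, the helper `bandwidth_bound_glups`, and the 448 GB/s bandwidth are illustrative assumptions, not details taken from the paper.

```python
def bandwidth_bound_glups(bandwidth_gb_s: float, q_flow: int, q_temp: int,
                          bytes_per_value: int = 8) -> float:
    """Memory-bandwidth upper bound, in GLUPS, for a double-distribution
    thermal LB kernel (hypothetical helper).

    Assumes each lattice update reads and writes every flow (q_flow) and
    temperature (q_temp) distribution value exactly once and moves no other
    data (no macroscopic-field arrays, no boundary flags).
    """
    bytes_per_update = 2 * bytes_per_value * (q_flow + q_temp)  # one read + one write
    # (1e9 bytes/s) / (bytes/update) = 1e9 updates/s, i.e. GLUPS
    return bandwidth_gb_s / bytes_per_update


if __name__ == "__main__":
    bw = 448.0  # GB/s; hypothetical value, not the GPU used in the paper
    print(f"D2Q9 + D2Q5, double:  {bandwidth_bound_glups(bw, 9, 5):.2f} GLUPS")
    print(f"D3Q19 + D3Q7, double: {bandwidth_bound_glups(bw, 19, 7):.2f} GLUPS")
```

Dividing a measured GLUPS value by this bound gives the fraction of peak. Real kernels may also move macroscopic fields or boundary flags, so the byte count behind the paper's own percentage may differ from this idealized one.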
Findings
Single GPU 2D simulation: 1.93 GLUPS
Single GPU 3D simulation: 1.04 GLUPS
Strong scaling with 16 GPUs: 30.42 GLUPS (2D), 14.52 GLUPS (3D)
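The strong-scaling figures can be turned into a speedup and a parallel efficiency with simple arithmetic. This sketch assumes that the single-GPU throughputs listed above are a fair one-GPU baseline for the 16-GPU runs; if the strong-scaling test used a different grid, the paper's own efficiency figures may differ. The names `strong_scaling`, `SINGLE_GPU`, and `SIXTEEN_GPUS` are hypothetical.

```python
# Throughputs reported in the Findings, in GLUPS.
SINGLE_GPU = {"2D": 1.93, "3D": 1.04}
SIXTEEN_GPUS = {"2D": 30.42, "3D": 14.52}
N_GPUS = 16


def strong_scaling(glups_1: float, glups_n: float, n: int) -> tuple[float, float]:
    """Return (speedup, parallel efficiency) from throughputs on 1 and n GPUs.

    With a fixed total problem size, the throughput ratio equals the
    wall-clock speedup, and the efficiency is speedup / n.
    """
    speedup = glups_n / glups_1
    return speedup, speedup / n


if __name__ == "__main__":
    for dim in ("2D", "3D"):
        s, e = strong_scaling(SINGLE_GPU[dim], SIXTEEN_GPUS[dim], N_GPUS)
        print(f"{dim}: speedup {s:.2f}x, efficiency {e:.1%}")
    # prints:
    # 2D: speedup 15.76x, efficiency 98.5%
    # 3D: speedup 13.96x, efficiency 87.3%
```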
Abstract
We assess the performance of the hybrid Open Accelerator (OpenACC) and Message Passing Interface (MPI) approach for thermal lattice Boltzmann (LB) simulations accelerated on multiple graphics processing units (GPUs). OpenACC accelerates the computation on a single GPU, and MPI synchronizes information between GPUs. With a single GPU, the two-dimensional (2D) simulation achieved 1.93 billion lattice updates per second (GLUPS) with a grid number of , and the three-dimensional (3D) simulation achieved 1.04 GLUPS with a grid number of , which is more than 76% of the theoretical maximum performance. On multiple GPUs, we adopt block partitioning, overlapping communications with computations, and concurrent computation to optimize parallel efficiency. We show that in the strong scaling test, using 16 GPUs, the 2D simulation achieved 30.42 GLUPS and the 3D simulation…
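The abstract names block partitioning and overlapping communication with computation without giving details. The sketch below shows one common way to set this up for a 2D lattice: each rank (one per GPU) owns a contiguous block, knows its four face neighbors, and splits its cells into a one-cell boundary layer, which must be updated before halo data are sent, and an interior that can be updated while messages are in flight. The helpers `split`, `Block`, and `partition_2d`, the periodic wrap-around, and the one-cell halo width are assumptions for illustration; the MPI calls appear only as comments.

```python
from dataclasses import dataclass, field


def split(n: int, parts: int, i: int) -> tuple[int, int]:
    """Half-open range [start, stop) of chunk i when n cells are split into
    `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(n, parts)
    start = i * base + min(i, extra)
    return start, start + base + (1 if i < extra else 0)


@dataclass
class Block:
    rank: int
    xs: tuple[int, int]  # owned columns [x0, x1)
    ys: tuple[int, int]  # owned rows [y0, y1)
    neighbors: dict[str, int] = field(default_factory=dict)

    def boundary_and_interior(self, width: int = 1) -> tuple[int, int]:
        """Return (boundary, interior) cell counts. Boundary cells lie within
        `width` of the block edge and exchange data with neighbors; interior
        cells can be updated while the halo exchange is in flight."""
        nx = self.xs[1] - self.xs[0]
        ny = self.ys[1] - self.ys[0]
        interior = max(nx - 2 * width, 0) * max(ny - 2 * width, 0)
        return nx * ny - interior, interior


def partition_2d(nx: int, ny: int, px: int, py: int) -> list[Block]:
    """Split an nx-by-ny lattice over a px-by-py grid of ranks (row-major rank
    order) with periodic neighbors in both directions."""

    def rank_of(ix: int, iy: int) -> int:
        return (iy % py) * px + (ix % px)

    blocks = []
    for rank in range(px * py):
        ix, iy = rank % px, rank // px
        block = Block(rank, split(nx, px, ix), split(ny, py, iy))
        block.neighbors = {
            "west": rank_of(ix - 1, iy),
            "east": rank_of(ix + 1, iy),
            "south": rank_of(ix, iy - 1),
            "north": rank_of(ix, iy + 1),
        }
        blocks.append(block)
    return blocks


# One time step per rank, overlapping communication with computation:
#   1. update the boundary cells
#   2. post non-blocking halo sends/receives to the four neighbors
#      (e.g. MPI_Isend / MPI_Irecv)
#   3. update the interior cells while the messages are in flight
#   4. wait for completion (e.g. MPI_Waitall) and unpack the halos
if __name__ == "__main__":
    blocks = partition_2d(1000, 600, 4, 4)  # 16 ranks, one per GPU
    b = blocks[5]
    print(b.xs, b.ys, b.neighbors, b.boundary_and_interior())
    # prints: (250, 500) (150, 300) {'west': 4, 'east': 6, 'south': 1, 'north': 9} (796, 36704)
```

Only face neighbors are listed. Diagonal D2Q9 populations also reach corner neighbors; a common workaround is to exchange in x first and then in y with the freshly filled halo columns included, so corner values arrive in two hops.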
Taxonomy
Topics: Lattice Boltzmann Simulation Studies
