An Improved Framework of GPU Computing for CFD Applications on Structured Grids using OpenACC
Weicheng Xue, Charles W. Jackson, Christopher J. Roy

TL;DR
This paper presents an enhanced GPU computing framework for CFD applications on structured grids. It scales a research CFD code to 16 GPUs using MPI and OpenACC directives and reports speedups of 30x (P100) and 70x (V100) over 16 Xeon E5-2680v4 CPU cores.
Contribution
The paper introduces optimization techniques for multi-GPU, multi-block CFD simulations, covering MPI message handling, device memory allocation, and host-device data movement, to improve scaling up to 16 GPUs.
Findings
30x speedup with 16 P100 GPUs over 16 Xeon E5-2680v4 CPU cores
70x speedup with 16 V100 GPUs over 16 Xeon E5-2680v4 CPU cores
Performance improvements from specific MPI and memory optimizations
Abstract
This paper focuses on improving the multi-GPU performance of a research CFD code on structured grids. MPI and OpenACC directives are used to scale the code up to 16 GPUs. This paper shows that 16 P100 GPUs and 16 V100 GPUs can be 30x and 70x faster, respectively, than 16 Xeon CPU E5-2680v4 cores for three different test cases. A series of performance issues related to the scaling of the multi-block CFD code are addressed by applying various optimizations. Performance optimizations such as the pack/unpack message method, removing temporary arrays as arguments to procedure calls, allocating global memory for limiters and connected boundary data, reordering non-blocking MPI_Isend/MPI_Irecv and MPI_Wait calls, reducing unnecessary implicit derived type member data movement between the host and the device, and the use of GPUDirect can improve the compute utilization, memory…
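The pack/unpack message method named in the abstract can be illustrated with a minimal sketch in C with OpenACC. This is not the authors' implementation: the function names (`pack_column`, `unpack_column`, `exchange_interface`), the row-major 2D layout, and the single ghost column are all assumptions made for illustration, and a plain `memcpy` stands in for the MPI transfer so the example runs without an MPI library. The idea shown is that a strided boundary column is first gathered into a contiguous buffer, sent as one message, and then scattered into the neighbor block's ghost column.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: a block is stored row-major as q[j * ni + i],
 * so a column of fixed i is strided in memory. */

/* Gather column icol of q into the contiguous buffer buf (length nj). */
void pack_column(const double *q, int ni, int nj, int icol, double *buf)
{
    assert(icol >= 0 && icol < ni);
    /* Data clauses keep the sketch standalone; a production code would
     * typically keep q resident on the device instead. */
    #pragma acc parallel loop copyin(q[0:ni*nj]) copyout(buf[0:nj])
    for (int j = 0; j < nj; ++j)
        buf[j] = q[(size_t)j * ni + icol];
}

/* Scatter the contiguous buffer buf (length nj) into column icol of q. */
void unpack_column(double *q, int ni, int nj, int icol, const double *buf)
{
    assert(icol >= 0 && icol < ni);
    #pragma acc parallel loop copy(q[0:ni*nj]) copyin(buf[0:nj])
    for (int j = 0; j < nj; ++j)
        q[(size_t)j * ni + icol] = buf[j];
}

/* Copy the last interior column of the left block (i = ni - 2) into the
 * ghost column of the right block (i = 0). Returns 0 on success, -1 if
 * a buffer allocation fails. */
int exchange_interface(const double *left, double *right, int ni, int nj)
{
    double *sendbuf = malloc((size_t)nj * sizeof *sendbuf);
    double *recvbuf = malloc((size_t)nj * sizeof *recvbuf);
    if (!sendbuf || !recvbuf) {
        free(sendbuf);
        free(recvbuf);
        return -1;
    }

    pack_column(left, ni, nj, ni - 2, sendbuf);
    /* Stand-in for the message transfer: in a multi-process code this
     * would be one send/receive pair on the packed buffers. */
    memcpy(recvbuf, sendbuf, (size_t)nj * sizeof *recvbuf);
    unpack_column(right, ni, nj, 0, recvbuf);

    free(sendbuf);
    free(recvbuf);
    return 0;
}
```

Without an OpenACC-enabled compiler the pragmas are ignored and the loops run serially on the host, which is enough to check the indexing. Packing trades one extra copy for a single contiguous message per interface instead of many small strided transfers.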
