Multi-GPU Performance Optimization of a CFD Code using OpenACC on Different Platforms
Weicheng Xue, Christopher J. Roy

TL;DR
This paper explores multi-GPU performance optimization of a CFD code using OpenACC across various platforms, focusing on domain decomposition, communication strategies, and hardware-specific enhancements to improve scalability and efficiency.
Contribution
The paper introduces a set of performance optimizations, including data packing, variable-based data transfer, and GPUDirect, tailored for multi-GPU CFD simulations with significant scalability improvements.
Findings
Performance improved by a factor of 2 with data packing/unpacking.
Communication overhead reduced by variable-based data transfer.
GPUDirect enhances GPU-to-GPU communication efficiency.
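The packing finding can be pictured with a small sketch. It is not the authors' implementation: it assumes a row-major 3D field stored as a flat list, and the helper names (`flat_index`, `face_offsets`, `pack`, `unpack`) are hypothetical. It shows that one boundary face is already contiguous in memory while another is strided, and that packing gathers the strided face into one contiguous send buffer. Which face is strided depends on the storage order, so a column-major code would see the roles reversed.

```python
def flat_index(i, j, k, ny, nz):
    """Row-major offset of cell (i, j, k); k varies fastest (assumed layout)."""
    return (i * ny + j) * nz + k


def face_offsets(nx, ny, nz, axis, pos):
    """Flat offsets of the plane where the index along `axis` equals `pos`."""
    offsets = []
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                if (i, j, k)[axis] == pos:
                    offsets.append(flat_index(i, j, k, ny, nz))
    return offsets


def is_contiguous(offsets):
    """True when the offsets form one unbroken run in memory."""
    return all(b - a == 1 for a, b in zip(offsets, offsets[1:]))


def pack(field, offsets):
    """Gather scattered face values into one contiguous buffer."""
    return [field[o] for o in offsets]


def unpack(field, offsets, buffer):
    """Scatter a received buffer back into the face cells."""
    for o, value in zip(offsets, buffer):
        field[o] = value


nx, ny, nz = 4, 3, 5
field = [float(n) for n in range(nx * ny * nz)]

i_face = face_offsets(nx, ny, nz, axis=0, pos=0)  # one unbroken block
k_face = face_offsets(nx, ny, nz, axis=2, pos=0)  # stride of nz between cells

send_buffer = pack(field, k_face)  # contiguous, ready to send as one message
```

On a GPU the gather and scatter loops would typically run in parallel on the device so that only the packed buffer crosses the interconnect; how the paper arranges this is not shown in the text above.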
Abstract
This paper investigates the multi-GPU performance of a 3D buoyancy-driven cavity solver using MPI and OpenACC directives on different platforms. It shows that the dimensions along which the problem is decomposed significantly affect strong scaling on GPUs. Without proper performance optimizations, 1D domain decomposition scales poorly on multiple GPUs because of noncontiguous memory access. Performance under any decomposition benefits from the series of optimizations presented in the paper. Since the buoyancy-driven cavity code is latency-bound on the clusters examined, a series of optimizations, some platform-agnostic and some tailored to specific platforms, are designed to reduce the latency cost and improve memory throughput between hosts and devices. First, the parallel message packing/unpacking strategy developed for…
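As a rough illustration of why the choice of decomposition matters for a latency-bound code, the sketch below counts halo cells and messages per subdomain for a cubic grid. It is a simplified model, not taken from the paper: the function name `halo_exchange`, the one-cell halo depth, and the assumption that every split axis has neighbors on both sides (an upper bound for edge subdomains) are all illustrative.

```python
def halo_exchange(n, procs):
    """Return (halo_cells, messages) for one subdomain of an n^3 grid.

    `procs` is (px, py, pz), the number of subdomains along each axis.
    Assumes a one-cell-deep halo and two neighbors along every split axis.
    """
    px, py, pz = procs
    lx, ly, lz = n // px, n // py, n // pz  # local subdomain extents
    cells = 0
    messages = 0
    if px > 1:
        cells += 2 * ly * lz
        messages += 2
    if py > 1:
        cells += 2 * lx * lz
        messages += 2
    if pz > 1:
        cells += 2 * lx * ly
        messages += 2
    return cells, messages


one_d = halo_exchange(256, (8, 1, 1))    # slabs: two large faces
three_d = halo_exchange(256, (2, 2, 2))  # blocks: six smaller faces
```

Under these assumptions, eight slabs exchange more cells in fewer messages than eight blocks do, so the better choice depends on whether per-message latency or transferred volume dominates, and on whether the exchanged faces are contiguous in memory.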
Taxonomy
Topics: Advanced Data Storage Technologies · Underwater Vehicles and Communication Systems · Parallel Computing and Optimization Techniques
