Accelerating lattice QCD simulations with 2 flavours of staggered fermions on multiple GPUs using OpenACC - a first attempt
Sourendu Gupta, Pushan Majumdar

TL;DR
This paper demonstrates a more than threefold speed-up in lattice QCD simulations with 2 flavours of staggered fermions by using multiple GPUs and OpenACC, highlighting the potential of directive-based programming for scientific computing.
Contribution
A first attempt at accelerating lattice QCD simulations on multiple GPUs with OpenACC, achieving a more than threefold speed-up without writing CUDA.
Findings
More than threefold speed-up over the CPU-only MPI program
The bandwidth-bound nature of the algorithm limits the gains from parallelization
OpenACC enables effective GPU acceleration for lattice QCD
Abstract
We present the results of an effort to accelerate a Rational Hybrid Monte Carlo (RHMC) program for lattice quantum chromodynamics (QCD) simulations with 2 flavours of staggered fermions on multiple Kepler K20X GPUs distributed on different nodes of a Cray XC30. We do not use CUDA but adopt a higher-level, directive-based programming approach using the OpenACC platform. The lattice QCD algorithm is known to be bandwidth-bound; our timing results illustrate this clearly, and we discuss how this limits the parallelization gains. We achieve a speed-up of more than a factor of three compared to the CPU-only MPI program.
