Parallelizing the QUDA Library for Multi-GPU Calculations in Lattice Quantum Chromodynamics
Ronald Babich, Michael A. Clark, Bálint Joó

TL;DR
This paper describes the MPI-based parallelization of the QUDA library for multi-GPU lattice QCD calculations and reports more than 4 Tflops of performance on 32 GPUs, with both weak and strong scaling results.
Contribution
The paper introduces an MPI-based strategy for running QUDA across multiple GPUs. This lifts the single-GPU memory limit on problem size and improves scalability in lattice QCD simulations.
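To make the idea of splitting a lattice across devices concrete, here is a minimal sketch in plain Python. It is not QUDA code and does not use MPI or CUDA: it assumes a toy 1D periodic lattice, a simple nearest-neighbour stencil, and equal-sized subdomains, and the names `stencil_global` and `stencil_decomposed` are hypothetical. Each simulated "rank" updates its interior sites from local data only, then finishes its two boundary sites once the neighbouring halo values are available. That interior/boundary split is the general pattern that lets communication overlap with computation.

```python
def stencil_global(field):
    """Apply a periodic nearest-neighbour stencil to the whole 1D lattice."""
    n = len(field)
    return [field[(i - 1) % n] + field[(i + 1) % n] - 2 * field[i]
            for i in range(n)]


def stencil_decomposed(field, n_ranks):
    """Same stencil, with the lattice split into n_ranks equal subdomains.

    Toy model only: "ranks" are list slices, and the halo exchange is a
    plain copy standing in for a message between neighbouring processes.
    """
    n = len(field)
    assert n % n_ranks == 0, "lattice must divide evenly among ranks"
    local_n = n // n_ranks
    assert local_n >= 2, "each subdomain needs at least two sites"
    subdomains = [field[r * local_n:(r + 1) * local_n] for r in range(n_ranks)]

    # Step 1: gather the halo values each rank needs from its neighbours.
    # In a real multi-GPU code this is where the transfers would be started.
    halos = []
    for r in range(n_ranks):
        left = subdomains[(r - 1) % n_ranks][-1]   # last site of left neighbour
        right = subdomains[(r + 1) % n_ranks][0]   # first site of right neighbour
        halos.append((left, right))

    result = []
    for r in range(n_ranks):
        loc = subdomains[r]
        out = [0] * local_n

        # Step 2: interior sites use local data only, so this work could
        # proceed while the halo transfers are still in flight.
        for i in range(1, local_n - 1):
            out[i] = loc[i - 1] + loc[i + 1] - 2 * loc[i]

        # Step 3: boundary sites need the halos, so they are done last.
        left, right = halos[r]
        out[0] = left + loc[1] - 2 * loc[0]
        out[-1] = loc[-2] + right - 2 * loc[-1]

        result.extend(out)
    return result


if __name__ == "__main__":
    lattice = [i * i for i in range(16)]
    print(stencil_decomposed(lattice, 4) == stencil_global(lattice))
```

The decomposed result matches the single-domain result for any rank count that divides the lattice evenly, which is the basic correctness check one would want before tuning a real implementation for overlap.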
Findings
Achieved over 4 Tflops performance on 32 GPUs
Demonstrated effective weak and strong scaling
Implemented communication-computation overlap strategies
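The difference between weak and strong scaling can be illustrated with a little arithmetic. The sketch below is an assumption-laden toy, not taken from the paper: it supposes the lattice is cut along a single (time) dimension, and the extents used (256 and 8 time slices) are hypothetical values chosen only to make the numbers easy to read. The function names are likewise hypothetical.

```python
def strong_scaling(global_t, gpu_counts):
    """Fixed global lattice: the per-GPU time extent shrinks as GPUs are added."""
    return [(n, global_t // n) for n in gpu_counts if global_t % n == 0]


def weak_scaling(local_t, gpu_counts):
    """Fixed per-GPU lattice: the global time extent grows with the GPU count."""
    return [(n, local_t * n) for n in gpu_counts]


def boundary_fraction(local_t):
    """Share of local time slices lying on one of the two cut faces."""
    return min(1.0, 2 / local_t)


if __name__ == "__main__":
    counts = [1, 2, 4, 8, 16, 32]
    for n_gpus, local_t in strong_scaling(256, counts):
        print(n_gpus, local_t, boundary_fraction(local_t))
```

In the strong-scaling case the boundary share of each subdomain rises as GPUs are added, so communication becomes relatively more expensive. In the weak-scaling case the per-GPU work and boundary share stay constant, which is why weak scaling is usually the easier of the two to sustain.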
Abstract
Graphics Processing Units (GPUs) are having a transformational effect on numerical lattice quantum chromodynamics (LQCD) calculations of importance in nuclear and particle physics. The QUDA library provides a package of mixed precision sparse matrix linear solvers for LQCD applications, supporting single GPUs based on NVIDIA's Compute Unified Device Architecture (CUDA). This library, interfaced to the QDP++/Chroma framework for LQCD calculations, is currently in production use on the "9g" cluster at the Jefferson Laboratory, enabling unprecedented price/performance for a range of problems in LQCD. Nevertheless, memory constraints on current GPU devices limit the problem sizes that can be tackled. In this contribution we describe the parallelization of the QUDA library onto multiple GPUs using MPI, including strategies for the overlapping of communication and computation. We report on…
