Scaling Lattice QCD beyond 100 GPUs
R. Babich, M. A. Clark, B. Joó, G. Shi, R. C. Brower, S. Gottlieb

TL;DR
This paper demonstrates strong scaling of lattice QCD beyond 100 GPUs, a regime not previously reached, by combining multi-dimensional parallelization with a domain-decomposed preconditioner. This brings GPU clusters into the capability-computing range that gauge field generation requires.
Contribution
It introduces a novel multi-dimensional parallelization strategy and domain-decomposed preconditioner for scaling gauge field generation in lattice QCD beyond 100 GPUs.
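To illustrate why partitioning the lattice along several dimensions helps strong scaling, the sketch below compares the boundary-to-volume ratio of a single GPU's sub-lattice under two decompositions. It is not the authors' code: the lattice size, the GPU grids, and the helper `halo_fraction` are hypothetical choices for illustration, and the model assumes a nearest-neighbor stencil in which each partitioned dimension exchanges one forward and one backward face.

```python
from math import prod

def halo_fraction(lattice, gpu_grid):
    """Return (local_dims, face_sites / local_volume) for one GPU.

    Hypothetical helper: counts one forward and one backward face for
    every dimension that is split across more than one GPU.
    """
    if any(extent % parts for extent, parts in zip(lattice, gpu_grid)):
        raise ValueError("each lattice extent must divide evenly across GPUs")
    local = tuple(extent // parts for extent, parts in zip(lattice, gpu_grid))
    volume = prod(local)
    # A face orthogonal to dimension d holds volume / local[d] sites.
    face_sites = sum(
        2 * (volume // side)
        for side, parts in zip(local, gpu_grid)
        if parts > 1
    )
    return local, face_sites / volume

# Assumed example: a 32^3 x 256 lattice spread over 256 GPUs.
lattice = (32, 32, 32, 256)
one_dim = halo_fraction(lattice, (1, 1, 1, 256))    # split the last dimension only
multi_dim = halo_fraction(lattice, (2, 2, 4, 16))   # split all four dimensions

print(one_dim)    # ((32, 32, 32, 1), 2.0)
print(multi_dim)  # ((16, 16, 8, 16), 0.625)
```

Both decompositions give each GPU the same local volume, but under these assumptions the four-dimensional split exchanges far fewer boundary sites per local site, which is the communication cost that limits strong scaling.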
Findings
Achieved strong scaling on up to 256 GPUs on the Edge cluster.
Applied methods to Wilson-clover and improved staggered discretizations.
Reached the capability-computing regime of O(100) GPUs or more that gauge field generation requires.
Abstract
Over the past five years, graphics processing units (GPUs) have had a transformational effect on numerical lattice quantum chromodynamics (LQCD) calculations in nuclear and particle physics. While GPUs have been applied with great success to the post-Monte Carlo "analysis" phase, which accounts for a substantial fraction of the workload in a typical LQCD calculation, the initial Monte Carlo "gauge field generation" phase requires capability-level supercomputing, corresponding to O(100) GPUs or more. Such strong scaling has not been previously achieved. In this contribution, we demonstrate that using a multi-dimensional parallelization strategy and a domain-decomposed preconditioner allows us to scale into this regime. We present results for two popular discretizations of the Dirac operator, Wilson-clover and improved staggered, employing up to 256 GPUs on the Edge cluster at Lawrence…
