Maximizing the Bang Per Bit
M. A. Clark, Dean Howarth, Jiqun Tu, Mathias Wagner, Evan Weinberg

TL;DR
This paper discusses strategies to improve mixed-precision solvers for Lattice QCD computations, focusing on stability and precision enhancements through customized numerical formats to reduce memory traffic and accelerate calculations.
Contribution
It introduces customized numerical storage formats in QUDA that enhance precision and stability of mixed-precision conjugate gradient solvers for Lattice QCD.
Findings
Customized storage formats significantly improve on the precision achievable with IEEE numerical formats at fixed word size.
The added precision also improves the stability of mixed-precision solvers.
Examples are given for BiCGStab(l) and multi-shift CG solvers using the HISQ operator.
Abstract
Reducing memory traffic is critical to accelerate Lattice QCD computations on modern processors, given that such computations are memory-bandwidth bound. A commonly used strategy is to employ mixed-precision solvers; however, these require careful treatment to ensure stable convergence. We give an overview of the strategies employed in QUDA to stabilize mixed-precision variants of Conjugate Gradient (CG) and its multi-shift brethren. Through the use of customized numerical storage formats we can significantly improve upon the precision achievable compared to IEEE numerical formats, increasing both the solver precision and stability achievable at fixed word size. We give examples using BiCGStab(l) and multi-shift CG solvers using the HISQ operator.
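To make the "precision at fixed word size" claim concrete, here is a minimal sketch comparing two ways of storing numbers in 16 bits: IEEE half precision, and a fixed-point format with one shared scale factor. The fixed-point layout, the helper names (to_fixed16, from_fixed16) and the test data are assumptions made for illustration; the abstract does not specify the formats QUDA actually uses.

```python
import struct

def to_ieee_half(x: float) -> float:
    """Round x to IEEE 754 binary16 (10 stored mantissa bits) and back."""
    return struct.unpack("e", struct.pack("e", x))[0]

def to_fixed16(values):
    """Hypothetical format: 16-bit signed integers plus one shared float scale."""
    scale = max(abs(v) for v in values) or 1.0
    ints = [round(v / scale * 32767) for v in values]
    return scale, ints

def from_fixed16(scale, ints):
    """Reconstruct floating-point values from the fixed-point representation."""
    return [i * scale / 32767 for i in ints]

def max_abs_error(exact, approx):
    """Largest absolute difference between two equally long sequences."""
    return max(abs(a - b) for a, b in zip(exact, approx))

# Made-up data: 64 values of similar magnitude, all in [0.5, 0.9).
values = [0.5 + 0.4 * ((i * 37) % 101) / 101 for i in range(64)]

half_err = max_abs_error(values, [to_ieee_half(v) for v in values])
scale, ints = to_fixed16(values)
fixed_err = max_abs_error(values, from_fixed16(scale, ints))

print(f"IEEE half   max error: {half_err:.2e}")
print(f"fixed-point max error: {fixed_err:.2e}")
```

When the stored values share a similar magnitude, the exponent bits of IEEE half carry little information, so spending all 16 bits on a scaled integer gives a finer grid. The price is the extra scale factor, which has to be stored once per block of values, and a loss of relative accuracy for entries much smaller than the block maximum.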
