Two-link Staggered Quark Smearing in QUDA
Steven Gottlieb, Hwancheol Jeong, Alexei Strelchenko

TL;DR
This paper presents a QUDA implementation of two-link staggered quark smearing that cuts the smearing time for a baryon correlator measurement by a factor of 100-120 relative to the original MILC code and the overall measurement time by 60-70%.
Contribution
A QUDA implementation of gauge-covariant smearing for staggered quarks. It uses two-link parallel transport to preserve taste properties and reuses precomputed two-link products for all sources and sinks.
Findings
Reduces the total smearing time for a baryon correlator measurement by a factor of 100-120 relative to the original MILC code.
Decreases overall measurement time by 60-70%.
Reports performance and scalability on NVIDIA A100 GPUs (Big Red 200) and AMD MI250X GPUs (Crusher), and the performance gain from NVSHMEM on Summit.
Abstract
Gauge covariant smearing based on the 3D lattice Laplacian can be used to create extended operators that have better overlap with hadronic ground states. For staggered quarks, we make use of two-link parallel transport to preserve taste properties. We have implemented the procedure in QUDA. We present the performance of this code on the NVIDIA A100 GPUs in Indiana University's Big Red 200 supercomputer and on the AMD MI250X GPUs in the Oak Ridge Leadership Computing Facility's (OLCF's) Crusher, and we discuss its scalability. We also study the performance improvement from using NVSHMEM on OLCF's Summit. By reusing precomputed two-link products for all sources and sinks, the code reduces the total smearing time for a baryon correlator measurement by a factor of 100-120 compared with the original MILC code and reduces the overall time by 60-70%.
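To make the idea in the abstract concrete, here is a minimal NumPy sketch of two-link smearing on a single time slice. It is not the QUDA or MILC implementation, and every name in it (`two_link`, `laplacian`, `smear`, `coeff`, `n_iter`) is hypothetical. It assumes periodic boundaries, a color-vector field of shape (L, L, L, 3), and gauge links of shape (3, L, L, L, 3, 3). It also assumes the smearing is an iterated step of the form phi + coeff * Laplacian(phi), with any normalization for the doubled step size absorbed into `coeff`.

```python
import numpy as np

def two_link(U):
    """Two-link products W_i(x) = U_i(x) U_i(x + i_hat).

    U has shape (3, L, L, L, 3, 3). The result is computed once and can be
    reused for every source and sink that is smeared on this gauge field.
    """
    W = np.empty_like(U)
    for i in range(3):
        U_fwd = np.roll(U[i], -1, axis=i)  # U_i(x + i_hat), periodic
        W[i] = U[i] @ U_fwd                # 3x3 product at every site
    return W

def laplacian(W, phi):
    """3D covariant Laplacian built from distance-2 hops (assumed form).

    (Lap phi)(x) = sum_i [ W_i(x) phi(x + 2 i_hat)
                           + W_i(x - 2 i_hat)^dagger phi(x - 2 i_hat)
                           - 2 phi(x) ]
    """
    phi = np.asarray(phi, dtype=complex)
    out = -6.0 * phi
    for i in range(3):
        # Forward hop: W_i(x) phi(x + 2 i_hat)
        phi_fwd = np.roll(phi, -2, axis=i)
        out += np.einsum('...ab,...b->...a', W[i], phi_fwd)
        # Backward hop: form W_i(y)^dagger phi(y), then move it to x = y + 2 i_hat
        back = np.einsum('...ba,...b->...a', W[i].conj(), phi)
        out += np.roll(back, 2, axis=i)
    return out

def smear(W, phi, coeff, n_iter):
    """Apply (1 + coeff * Laplacian) to phi n_iter times."""
    phi = np.asarray(phi, dtype=complex)
    for _ in range(n_iter):
        phi = phi + coeff * laplacian(W, phi)
    return phi

if __name__ == "__main__":
    L = 4
    # Unit gauge field, used here only to exercise the code path.
    U = np.broadcast_to(np.eye(3, dtype=complex), (3, L, L, L, 3, 3)).copy()
    W = two_link(U)                      # precompute once
    src = np.zeros((L, L, L, 3), dtype=complex)
    src[0, 0, 0, 0] = 1.0                # point source, color 0
    smeared = smear(W, src, coeff=0.1, n_iter=3)
    print(np.abs(smeared).sum())
```

Because every hop moves two sites, a point source spreads only to sites whose coordinates have the same parity as the source. In this sketch, `two_link` is called once per gauge configuration and its output is passed to every `smear` call, which mirrors the reuse of precomputed two-link products described in the abstract.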
Taxonomy
Topics: Particle physics theoretical and experimental studies · High-Energy Particle Collisions Research · Quantum Chromodynamics and Particle Interactions
