Fast Algorithms for Scheduling Many-body Correlation Functions on Accelerators
Oguz Selvitopi, Emin Ozturk, Jie Chen, Ponnuswamy Sadayappan, Robert G. Edwards, Aydın Buluç

TL;DR
This paper introduces two scheduling algorithms that reorder binary tensor contractions in Lattice QCD correlation-function computations on GPUs, reducing peak memory usage by up to 2.1x and computation time by up to 1.9x.
Contribution
The paper presents new scheduling algorithms tailored for binary tensor contractions in LQCD. They exploit application-specific features, such as contractions being binary and locality within contraction trees, to minimize peak memory.
Findings
Up to 2.1x reduction in peak memory usage
Up to 4.2x fewer evictions
Up to 1.9x faster computation time
Abstract
Computation of correlation functions is a key operation in Lattice quantum chromodynamics (LQCD) simulations to extract nuclear physics observables. These functions involve many binary batch tensor contractions, each tensor possibly occupying hundreds of MBs of memory. Performing these contractions on GPU accelerators poses the challenge of scheduling them so as to optimize tensor reuse and reduce data traffic. In this work we propose two fast novel scheduling algorithms that reorder contractions to increase temporal locality via input/intermediate tensor reuse. Our schedulers take advantage of application-specific features, such as contractions being binary and locality within contraction trees, to optimize the objective of minimizing peak memory. We integrate them into the LQCD analysis software suite Redstar and improve time-to-solution. Our schedulers attain up to 2.1x improvement in…
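To illustrate why the order of contractions affects peak memory, here is a minimal sketch in Python. It is not the paper's scheduling algorithm and not part of Redstar. It only evaluates a given order under an assumed liveness model in which a tensor occupies device memory from its first use to its last use. The tensor names, the 200 MB sizes, and the function `peak_memory` are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple

# A binary contraction: (output, left input, right input).
Contraction = Tuple[str, str, str]


def peak_memory(schedule: List[Contraction], size_mb: Dict[str, int]) -> int:
    """Peak resident memory (MB) for a schedule, assuming each tensor is
    resident from the first step that touches it to the last one."""
    # Record the last step at which each tensor appears.
    last_use: Dict[str, int] = {}
    for step, contraction in enumerate(schedule):
        for tensor in contraction:
            last_use[tensor] = step

    live = set()
    peak = 0
    for step, contraction in enumerate(schedule):
        # Inputs and the output must all be resident during the contraction.
        live.update(contraction)
        peak = max(peak, sum(size_mb[t] for t in live))
        # Free every tensor that is never touched again.
        live -= {t for t in contraction if last_use[t] == step}
    return peak


# Two hypothetical contraction trees: r1 = (A*B)*C and r2 = (D*E)*F.
size_mb = {name: 200 for name in ["A", "B", "C", "D", "E", "F", "t1", "t2", "r1", "r2"]}

# Interleaving the trees keeps both intermediates t1 and t2 alive at once.
interleaved = [("t1", "A", "B"), ("t2", "D", "E"), ("r1", "t1", "C"), ("r2", "t2", "F")]

# Finishing one tree before starting the next frees t1 before t2 is created.
grouped = [("t1", "A", "B"), ("r1", "t1", "C"), ("t2", "D", "E"), ("r2", "t2", "F")]

print(peak_memory(interleaved, size_mb))  # 800
print(peak_memory(grouped, size_mb))      # 600
```

Under this model, the grouped order needs three resident tensors at its worst step, while the interleaved order needs four. A scheduler that reorders contractions searches for orders of the second kind across many trees that share inputs and intermediates.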
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Quantum Computing Algorithms and Architecture · Tensor decomposition and applications
