FusionRCG: Orchestrating Recursive Computation Graphs across GPU Memory Hierarchies
Yihong Zhang, Xinran Wei, Junshi Chen, Fusong Ju, Wei Hu, Jinlong Yang, Huanhuan Xia

TL;DR
FusionRCG is a framework that optimizes recursive computation graphs on GPUs, reducing memory usage and significantly improving performance in quantum chemistry calculations.
Contribution
It introduces a novel approach combining graph orchestration, algebraic reduction, and multi-tier kernel architecture to enhance GPU efficiency for high-dimensional integrals.
Findings
Up to 3.09x speedup over existing GPU implementations.
Shrinks intermediate footprints by up to 7.7x.
Maintains 75% parallel efficiency at 64 GPUs.
Abstract
Evaluating high-dimensional integrals via deep hierarchical recurrences is a dominant cost in quantum chemistry. While CPUs manage these efficiently, GPUs suffer a critical mismatch: limited per-thread memory is quickly overwhelmed by an explosion of simultaneously live intermediate variables. As recurrence scales, this forces massive data spilling to global memory, collapsing performance into a severe memory-bound regime. We present FusionRCG, a framework that jointly optimizes computation graph structure and GPU memory mapping. Exploiting the inherent topological flexibility of recurrence graphs, using electron repulsion integrals as an example, we contribute: (1) liveness-aware graph orchestration to minimize peak live intermediates; (2) algebraic dimensionality reduction via stepwise Cartesian-to-spherical fusion, shrinking intermediate footprints by up to 7.7x; and (3) an…
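The idea behind contribution (1), that the evaluation order of a recurrence graph changes how many intermediates are live at once, can be illustrated with a toy example. The sketch below is not the paper's algorithm: the graph, the node names, and the helpers `peak_live`, `is_topological`, and `best_order` are all hypothetical, and the exhaustive search only works for tiny graphs. It assumes a value stays live from the step that computes it until its last consumer has run, and that outputs stay live to the end.

```python
from itertools import permutations

def peak_live(deps, order, outputs):
    """Peak number of simultaneously live values for one evaluation order.

    deps maps each node to the list of nodes it reads (hypothetical toy graph).
    """
    # Count how many consumers each node still has to serve.
    remaining = {n: 0 for n in deps}
    for ops in deps.values():
        for o in ops:
            remaining[o] += 1
    live, peak = set(), 0
    for n in order:
        assert all(o in live for o in deps[n]), "order is not topological"
        live.add(n)                      # result and operands coexist here
        peak = max(peak, len(live))
        for o in deps[n]:
            remaining[o] -= 1
            if remaining[o] == 0 and o not in outputs:
                live.discard(o)          # last consumer done: free the value
    return peak

def is_topological(deps, order):
    seen = set()
    for n in order:
        if any(o not in seen for o in deps[n]):
            return False
        seen.add(n)
    return True

def best_order(deps, outputs):
    """Exhaustive search over valid orders; feasible only for tiny graphs."""
    valid = (p for p in permutations(deps) if is_topological(deps, p))
    return min(valid, key=lambda p: peak_live(deps, p, outputs))

# Toy graph: c = f(b1, b2), b1 = f(a1, a2), b2 = f(a3, a4).
deps = {
    "a1": [], "a2": [], "a3": [], "a4": [],
    "b1": ["a1", "a2"], "b2": ["a3", "a4"],
    "c": ["b1", "b2"],
}
breadth_first = ["a1", "a2", "a3", "a4", "b1", "b2", "c"]
depth_first = ["a1", "a2", "b1", "a3", "a4", "b2", "c"]

print(peak_live(deps, breadth_first, {"c"}))  # 5
print(peak_live(deps, depth_first, {"c"}))    # 4
```

Both orders compute the same result, but finishing one branch before starting the next frees its inputs earlier and lowers the peak from 5 to 4 live values. On a GPU, where per-thread storage is limited, a lower peak of this kind is what keeps intermediates from spilling to global memory.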
