Adaptively restarted block Krylov subspace methods with low-synchronization skeletons
Kathryn Lund

TL;DR
This paper introduces adaptively restarted block Krylov subspace methods built on low-synchronization skeletons, aiming to improve scalability and stability on exascale high-performance computing systems, where reducing communication in kernels such as QR factorization is critical.
Contribution
It transforms low-synchronization block Gram-Schmidt variants into block Arnoldi methods and develops an adaptive restarting heuristic to improve stability and performance.
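To make the block Arnoldi skeleton concrete, here is a minimal NumPy sketch. It assumes a plain two-pass block classical Gram-Schmidt (a BCGS2-style skeleton) for inter-block orthogonalization and NumPy's QR for the intra-block step. It is not one of the paper's low-synchronization variants, and the name `block_arnoldi` and its arguments are illustrative. It builds an orthonormal basis V and a block upper Hessenberg matrix H satisfying the block Arnoldi relation A V_m = V_{m+1} H_m.

```python
import numpy as np

def block_arnoldi(A, B, m):
    """Sketch of m block Arnoldi steps from the n-by-s starting block B.

    Inter-block step: block classical Gram-Schmidt with one reorthogonalization
    pass (BCGS2-style). Intra-block step: numpy.linalg.qr (reduced mode).
    Assumes no breakdown (each new block keeps full column rank).
    """
    n, s = B.shape
    V = np.zeros((n, (m + 1) * s))
    H = np.zeros(((m + 1) * s, m * s))
    Q, _ = np.linalg.qr(B)                  # orthonormal first basis block
    V[:, :s] = Q
    for k in range(m):
        W = A @ V[:, k * s:(k + 1) * s]     # block matrix-vector product
        Vk = V[:, :(k + 1) * s]             # basis built so far
        C1 = Vk.T @ W                       # first block projection
        W = W - Vk @ C1
        C2 = Vk.T @ W                       # reorthogonalization pass
        W = W - Vk @ C2
        Qn, R = np.linalg.qr(W)             # orthonormalize the new block
        V[:, (k + 1) * s:(k + 2) * s] = Qn
        H[:(k + 1) * s, k * s:(k + 1) * s] = C1 + C2
        H[(k + 1) * s:(k + 2) * s, k * s:(k + 1) * s] = R
    return V, H

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
B = rng.standard_normal((50, 2))
V, H = block_arnoldi(A, B, 5)
# Both norms should be near machine precision for this well-behaved example.
print(np.linalg.norm(A @ V[:, :10] - V @ H))
print(np.linalg.norm(np.eye(12) - V.T @ V))
```

Each block step performs one block product with A plus two block inner products `Vk.T @ W` and an intra-block QR; in a distributed run each of these inner-product stages requires a global reduction, which is the synchronization cost that low-synchronization variants aim to reduce.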
Findings
Improved scalability of Krylov methods on exascale systems.
Enhanced stability with adaptive restarting heuristic.
Benchmarking shows competitive performance and accuracy.
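The adaptive restarting heuristic itself is not described on this page. As a purely hypothetical illustration, one standard way to judge whether a Gram-Schmidt basis has become unreliable is to measure its loss of orthogonality ||I - V^T V||_2 and restart once that exceeds a tolerance. The functions `loss_of_orthogonality` and `should_restart` and the default `tol=1e-8` below are assumptions for this sketch, not the paper's criterion.

```python
import numpy as np

def loss_of_orthogonality(V):
    """Return ||I - V^T V||_2, i.e. how far the columns of V are from orthonormal."""
    k = V.shape[1]
    return np.linalg.norm(np.eye(k) - V.T @ V, 2)

def should_restart(V, tol=1e-8):
    """Hypothetical restart trigger (assumption, not the paper's heuristic):
    restart the Krylov cycle once the orthogonality loss exceeds tol."""
    return loss_of_orthogonality(V) > tol

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((100, 8)))  # orthonormal basis
print(should_restart(Q))                              # False

bad = Q.copy()
bad[:, -1] = Q[:, 0] + 1e-3 * Q[:, -1]  # last column nearly parallel to the first
print(should_restart(bad))                            # True
```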
Abstract
With the recent realization of exascale performance by Oak Ridge National Laboratory's Frontier supercomputer, reducing communication in kernels like QR factorization has become even more imperative. Low-synchronization Gram-Schmidt methods, first introduced in [K. Świrydowicz, J. Langou, S. Ananthan, U. Yang, and S. Thomas, Low Synchronization Gram-Schmidt and Generalized Minimum Residual Algorithms, Numer. Lin. Alg. Appl., Vol. 28(2), e2343, 2020], have been shown to improve the scalability of the Arnoldi method in high-performance distributed computing. Block versions of low-synchronization Gram-Schmidt show further potential for speeding up algorithms, as column batching maximizes cache usage through matrix-matrix operations. In this work, low-synchronization block Gram-Schmidt variants from [E. Carson, K. Lund, M. Rozložník, and S. Thomas, Block Gram-Schmidt…
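The synchronization argument can be illustrated by counting inner-product reductions. In the minimal sketch below (hypothetical helper names; a serial model, since NumPy itself performs no communication), modified Gram-Schmidt spends one reduction per basis vector, classical Gram-Schmidt obtains all projection coefficients from a single block inner product, and the block version does the same for a whole batch of columns with one matrix-matrix product. The published low-synchronization variants also fold normalization and reorthogonalization into a small fixed number of reductions per step; that part is omitted here.

```python
import numpy as np

class ReductionCounter:
    """Counts inner products that would each be a global reduction in a distributed run."""
    def __init__(self):
        self.count = 0

    def inner(self, X, Y):
        self.count += 1
        return X.T @ Y

def mgs_project(V, w, red):
    # Modified Gram-Schmidt: one inner product (one reduction) per basis vector.
    w = w.copy()
    for j in range(V.shape[1]):
        h = red.inner(V[:, j], w)
        w -= h * V[:, j]
    return w

def cgs_project(V, W, red):
    # Classical (block) Gram-Schmidt: all coefficients from one inner product.
    C = red.inner(V, W)
    return W - V @ C

rng = np.random.default_rng(0)
V, _ = np.linalg.qr(rng.standard_normal((200, 20)))  # orthonormal basis, 20 columns
w = rng.standard_normal(200)

red_mgs, red_cgs = ReductionCounter(), ReductionCounter()
w_mgs = mgs_project(V, w, red_mgs)
w_cgs = cgs_project(V, w, red_cgs)
print(red_mgs.count, red_cgs.count)   # 20 1
print(np.allclose(w_mgs, w_cgs))      # True: both compute (I - V V^T) w here

W = rng.standard_normal((200, 4))     # a block of s = 4 columns
red_blk = ReductionCounter()
W_blk = cgs_project(V, W, red_blk)
print(red_blk.count)                  # 1 reduction for all 4 columns
```

The block product `V.T @ W` is a matrix-matrix (BLAS-3) operation, which is where column batching gains cache efficiency over vector-at-a-time projections.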
