Inexact Gauss Seidel and Coarse Solvers for AMG and s-step CG
Stephen Thomas, Pasqua D'Ambra

TL;DR
This paper introduces a low-synchronization approach based on Forward Gauss–Seidel (FGS) for communication-avoiding Krylov methods. It enables scalable large-scale computations on GPUs and improves AMG coarse-grid solves without dense matrix assembly.
Contribution
It presents an FGS-based approach for the small dense Gram systems that arise in s-step Krylov methods, replacing traditional orthogonalization and dense coarse-grid solves to improve the scalability and efficiency of large-scale Krylov and AMG methods.
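As a rough illustration of the coarse-grid idea, the sketch below replaces a dense factorization of the coarsest AMG operator with a fixed number of forward Gauss–Seidel sweeps applied directly to the operator's sparse CSR storage. It is a minimal sketch under stated assumptions: a 1D Laplacian stands in for a real coarse operator, and the helper names (`laplacian_1d_csr`, `fgs_sweeps_csr`, `csr_matvec`) are hypothetical, not taken from the paper or from any AMG library.

```python
import numpy as np


def laplacian_1d_csr(n):
    """Hypothetical stand-in coarse operator: tridiag(-1, 2, -1) stored in CSR form."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        if i > 0:
            indices.append(i - 1)
            data.append(-1.0)
        indices.append(i)
        data.append(2.0)
        if i < n - 1:
            indices.append(i + 1)
            data.append(-1.0)
        indptr.append(len(indices))
    return np.array(indptr), np.array(indices), np.array(data)


def fgs_sweeps_csr(indptr, indices, data, b, x, num_sweeps):
    """Apply num_sweeps forward Gauss-Seidel sweeps to A x = b using only CSR nonzeros."""
    x = x.copy()
    for _ in range(num_sweeps):
        for i in range(len(b)):
            row = slice(indptr[i], indptr[i + 1])
            cols, vals = indices[row], data[row]
            diag = vals[cols == i][0]
            # Entries x[j] with j < i were already updated in this sweep (forward order).
            off_diag = vals @ x[cols] - diag * x[i]
            x[i] = (b[i] - off_diag) / diag
    return x


def csr_matvec(indptr, indices, data, x):
    """y = A x for A in CSR form, used here only to measure the residual."""
    y = np.zeros(len(indptr) - 1)
    for i in range(len(y)):
        row = slice(indptr[i], indptr[i + 1])
        y[i] = data[row] @ x[indices[row]]
    return y


n = 16
indptr, indices, data = laplacian_1d_csr(n)
b = np.ones(n)
x0 = np.zeros(n)
for k in (5, 20, 30):
    x = fgs_sweeps_csr(indptr, indices, data, b, x0, k)
    res = np.linalg.norm(b - csr_matvec(indptr, indices, data, x)) / np.linalg.norm(b)
    print(f"{k:3d} sweeps: relative residual {res:.2e}")
```

Each sweep touches only the stored nonzeros, so no dense coarse matrix is ever formed or factored. The sweep count trades coarse-solve accuracy for cost; the right count for a given hierarchy is a tuning assumption, not something this sketch establishes.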
Findings
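A single FGS sweep is equivalent to Modified Gram–Schmidt (MGS) orthogonalization in the A-norm.
Weak scaling is preserved up to 64 AMD MI-series GPUs with 20–30 FGS iterations, for problems exceeding 700 million unknowns.
AMG coarse-grid solves no longer require assembling or factoring dense coarse operators.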
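The equivalence finding can be checked numerically for one natural reading of it: projecting a vector b against basis vectors q_1, ..., q_k one at a time in the A-inner product (MGS order) yields the same coefficients as one forward Gauss–Seidel sweep, started from zero, on the Gram system (Q^T A Q) y = Q^T A b. The reason is that the j-th FGS update computes (q_j^T A b - sum_{i<j} q_j^T A q_i y_i) / (q_j^T A q_j) = q_j^T A w_j / (q_j^T A q_j), where w_j = b - sum_{i<j} y_i q_i is exactly the partially projected vector MGS holds at step j. This sketch assumes that is the sense of the paper's statement; the paper's precise formulation and backward error bounds may differ, and the function names are illustrative.

```python
import numpy as np


def a_mgs_coefficients(A, Q, b):
    """Project b against the columns of Q one at a time in the A-inner product (MGS order)."""
    w = b.copy()
    r = np.zeros(Q.shape[1])
    for j in range(Q.shape[1]):
        q = Q[:, j]
        r[j] = (q @ (A @ w)) / (q @ (A @ q))
        w = w - r[j] * q  # w now holds b minus its components along q_1..q_j
    return r, w


def fgs_sweep(G, c, y0):
    """One forward Gauss-Seidel sweep on G y = c, starting from y0."""
    y = y0.copy()
    for j in range(len(c)):
        # Subtract all off-diagonal terms; y[i] for i < j are already updated.
        y[j] = (c[j] - G[j, :] @ y + G[j, j] * y[j]) / G[j, j]
    return y


rng = np.random.default_rng(0)
n, k = 50, 6
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)  # SPD matrix defining the A-inner product
Q = rng.standard_normal((n, k))  # basis vectors, deliberately not orthogonal
b = rng.standard_normal(n)

G = Q.T @ A @ Q  # Gram matrix in the A-inner product
c = Q.T @ A @ b
r_mgs, w_mgs = a_mgs_coefficients(A, Q, b)
r_fgs = fgs_sweep(G, c, np.zeros(k))
print("max |r_mgs - r_fgs| =", np.max(np.abs(r_mgs - r_fgs)))
```

The agreement holds in exact arithmetic; in floating point the two coefficient vectors differ only by rounding. Further sweeps continue Gauss–Seidel on the Gram system and converge to the full A-orthogonal projection coefficients, loosely analogous to re-orthogonalization.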
Abstract
Communication-avoiding Krylov methods require solving small dense Gram systems at each outer iteration. We present a low-synchronization approach based on Forward Gauss–Seidel (FGS), which exploits the structure of Gram matrices arising from Chebyshev polynomial bases. We show that a single FGS sweep is mathematically equivalent to Modified Gram–Schmidt (MGS) orthogonalization in the A-norm and provide corresponding backward error bounds. For weak scaling on AMD MI-series GPUs, we demonstrate that 20–30 FGS iterations preserve scalability up to 64 GPUs with problem sizes exceeding 700 million unknowns. We further extend this approach to Algebraic Multigrid (AMG) coarse-grid solves, removing the need to assemble or factor dense coarse operators.
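The inexact Gram solve described in the abstract can be sketched as follows, under assumptions: an s-step style basis built with the three-term Chebyshev recurrence on an interval [lmin, lmax] enclosing the spectrum of A, its Gram matrix G = V^T A V, and a fixed number of FGS sweeps in place of a dense factorization of G. The model matrix (a 1D Laplacian with known eigenvalues), the right-hand side c, and the function names are hypothetical; the paper's exact Gram systems and basis scaling may differ.

```python
import numpy as np


def chebyshev_basis(A, v, s, lmin, lmax):
    """Columns T_0(B)v, ..., T_s(B)v with B = (2A - (lmax + lmin) I) / (lmax - lmin)."""
    def apply_B(x):
        return (2.0 * (A @ x) - (lmax + lmin) * x) / (lmax - lmin)

    V = np.zeros((len(v), s + 1))
    V[:, 0] = v
    if s >= 1:
        V[:, 1] = apply_B(v)
    for j in range(1, s):
        # Three-term Chebyshev recurrence: T_{j+1} = 2 B T_j - T_{j-1}.
        V[:, j + 1] = 2.0 * apply_B(V[:, j]) - V[:, j - 1]
    return V


def fgs_solve(G, c, num_sweeps):
    """Inexact solve of G y = c by num_sweeps forward Gauss-Seidel sweeps from y = 0."""
    y = np.zeros_like(c)
    for _ in range(num_sweeps):
        for j in range(len(c)):
            y[j] = (c[j] - G[j, :] @ y + G[j, j] * y[j]) / G[j, j]
    return y


n, s = 200, 5
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # 1D Laplacian, SPD
idx = np.arange(1, n + 1)
eigs = 2.0 - 2.0 * np.cos(idx * np.pi / (n + 1))  # known spectrum of this A
rng = np.random.default_rng(1)
v = rng.standard_normal(n)
v /= np.linalg.norm(v)

V = chebyshev_basis(A, v, s, eigs.min(), eigs.max())
G = V.T @ A @ V  # (s+1) x (s+1) Gram matrix in the A-inner product
c = V.T @ rng.standard_normal(n)  # hypothetical right-hand side
for sweeps in (1, 5, 20, 30):
    y = fgs_solve(G, c, sweeps)
    print(f"{sweeps:3d} sweeps: relative residual "
          f"{np.linalg.norm(c - G @ y) / np.linalg.norm(c):.2e}")
```

The printed residuals show how the sweep count controls accuracy; the 20–30 iterations reported in the abstract come from the authors' experiments and are not reproduced by this toy problem. Because G is only (s+1) x (s+1), the sweeps are cheap local work; in a distributed setting, forming G typically needs one global reduction, after which no further communication is required for the solve.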
Taxonomy
Topics: Matrix Theory and Algorithms · Parallel Computing and Optimization Techniques · Model Reduction and Neural Networks
