The Performance of Low-Synchronization Variants of Reorthogonalized Block Classical Gram--Schmidt

Erin Carson; Yuxin Ma

arXiv:2507.21791·cs.DC·July 30, 2025

TL;DR

This paper evaluates low-synchronization variants of the block classical Gram-Schmidt (BCGS) algorithm on distributed memory systems, reporting speedups of up to 4x over the classical reorthogonalized algorithm and recommending the most stable variants for practical QR factorization.

Contribution

It provides a performance comparison of recent low-synchronization BCGS variants, highlighting the stability and efficiency of BCGSI+P-1S and BCGSI+P-2S in distributed environments.

Findings

01

BCGSI+P-1S achieves up to a 4x speedup over the classical reorthogonalized BCGS algorithm.

02

BCGSI+P-2S achieves up to a 2x speedup over the classical reorthogonalized BCGS algorithm.

03

Both variants are more stable than other BCGS variants that use the same number of synchronization points, while performing comparably to them.

Abstract

Numerous applications, such as Krylov subspace solvers, make extensive use of the block classical Gram-Schmidt (BCGS) algorithm and its reorthogonalized variants for orthogonalizing a set of vectors. For large-scale problems in distributed memory settings, the communication cost, particularly the global synchronization cost, is a major performance bottleneck. In recent years, many low-synchronization BCGS variants have been proposed in an effort to reduce the number of synchronization points. The work [E. Carson, Y. Ma, arXiv preprint 2411.07077] recently proposed stable one-synchronization and two-synchronization variants of BCGS, i.e., BCGSI+P-1S and BCGSI+P-2S. In this work, we evaluate the performance of BCGSI+P-1S and BCGSI+P-2S on a distributed memory system compared to other well-known low-synchronization BCGS variants. In comparison to the classical reorthogonalized BCGS…
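To make the baseline in the abstract concrete, the following is a minimal sketch of reorthogonalized block classical Gram-Schmidt, written in plain Python on lists of columns. It is an illustration under stated assumptions, not the authors' implementation and not BCGSI+P-1S or BCGSI+P-2S: the names `dot`, `intra_qr`, `project_out`, and `bcgs_reorth` are hypothetical, modified Gram-Schmidt stands in for whatever intra-block routine a real code would use, and only the orthonormal basis is returned (the R factor is omitted). Comments mark where a distributed memory code would need a global reduction, which is the cost the low-synchronization variants aim to reduce.

```python
import math


def dot(u, v):
    """Inner product; in a distributed setting this needs a global reduction."""
    return sum(a * b for a, b in zip(u, v))


def intra_qr(block):
    """Orthonormalize the columns of one block.

    Modified Gram-Schmidt is used here only as a simple stand-in for the
    intra-block routine (an assumption of this sketch).
    """
    q = []
    for col in block:
        w = list(col)
        for u in q:
            c = dot(u, w)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        norm = math.sqrt(dot(w, w))
        q.append([wi / norm for wi in w])
    return q


def project_out(basis, block):
    """Block classical projection: return X - Q (Q^T X).

    All coefficients Q^T X are computed from the unmodified block, so in a
    distributed code they can be batched into a single global reduction.
    """
    coeffs = [[dot(u, col) for u in basis] for col in block]
    return [
        [x - sum(c * u[i] for c, u in zip(cs, basis)) for i, x in enumerate(col)]
        for col, cs in zip(block, coeffs)
    ]


def bcgs_reorth(columns, s):
    """Orthonormalize `columns` in blocks of `s` columns, with two passes per block.

    Each pass is a projection against the existing basis followed by an
    intra-block orthonormalization; the second pass is the reorthogonalization.
    """
    basis = []
    for start in range(0, len(columns), s):
        block = columns[start:start + s]
        for _ in range(2):
            block = project_out(basis, block)  # global reduction for Q^T X
            block = intra_qr(block)            # further reduction(s) inside the block
        basis.extend(block)
    return basis
```

Calling `bcgs_reorth(cols, 2)` on a list of linearly independent columns returns the same number of columns, orthonormal up to rounding error. Every block passes through two projection steps and two intra-block steps, and each of those requires communication in a distributed setting; the one- and two-synchronization variants discussed in the abstract reorganize this work so that fewer global reductions are needed per block.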
