Anatomy of High-Performance Column-Pivoted QR Decomposition
Maksim Melnichenko, Riley Murray, William Killian, James Demmel, Michael W. Mahoney, Piotr Luszczek, Mark Gates

TL;DR
This paper presents a flexible framework for efficient QR factorization with column pivoting (QRCP) on modern CPUs and GPUs, and reports speedups of up to two orders of magnitude over LAPACK's standard QRCP routine.
Contribution
It introduces a comprehensive, hardware-aware framework for QRCP in which users choose the core subroutines, yielding practical, high-performance algorithms with an open-source implementation in RandLAPACK and extensive empirical validation.
Findings
Achieves up to 100x speedup over LAPACK's standard QRCP routine on a dual AMD EPYC 9734 CPU system
Greatly surpasses the performance of the current state-of-the-art randomized QRCP algorithm
Attains 65% of cuSOLVER's unpivoted QR performance on an NVIDIA H100 GPU
Abstract
We introduce an algorithmic framework for performing QR factorization with column pivoting (QRCP) on general matrices. The framework enables the design of practical QRCP algorithms through user-controlled choices for the core subroutines. We provide a comprehensive overview of how to navigate these choices on modern hardware platforms, offering detailed descriptions of alternative methods for both CPUs and GPUs. The practical QRCP algorithms developed within this framework are implemented as part of the open-source RandLAPACK library. Our empirical evaluation demonstrates that, on a dual AMD EPYC 9734 system, the proposed method achieves performance improvements of up to two orders of magnitude over LAPACK's standard QRCP routine and greatly surpasses the performance of the current state-of-the-art randomized QRCP algorithm. Additionally, on an NVIDIA H100 GPU, our method attains…
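For readers unfamiliar with what QRCP computes, the following is a minimal textbook sketch, not the paper's algorithm and not RandLAPACK's API. It uses unblocked modified Gram-Schmidt with greedy largest-norm pivoting to produce Q, R, and a column permutation such that the permuted input equals Q times R. The names qrcp_mgs and reconstruct are hypothetical, the input is assumed to be a dense list of rows with at least as many rows as columns, and the exact-zero rank test is a simplification.

```python
import math


def qrcp_mgs(A):
    """Column-pivoted QR by modified Gram-Schmidt (illustrative only).

    A: list of m rows, each with n floats (m >= n assumed).
    Returns (Q, R, piv) where Q is a list of orthonormal columns,
    R is an n-by-n upper-triangular list of rows, and column piv[j]
    of A equals sum_k Q[k] * R[k][j].
    """
    m, n = len(A), len(A[0])
    # Store the working matrix column by column.
    cols = [[float(A[i][j]) for i in range(m)] for j in range(n)]
    piv = list(range(n))
    Q = []
    R = [[0.0] * n for _ in range(n)]
    for k in range(n):
        # Greedy pivot: remaining column with the largest residual norm.
        norms = [math.sqrt(sum(x * x for x in cols[j])) for j in range(k, n)]
        p = k + max(range(n - k), key=norms.__getitem__)
        cols[k], cols[p] = cols[p], cols[k]
        piv[k], piv[p] = piv[p], piv[k]
        # Keep already-computed rows of R consistent with the swap.
        for row in R[:k]:
            row[k], row[p] = row[p], row[k]
        R[k][k] = norms[p - k]
        if R[k][k] == 0.0:
            break  # simplification: remaining columns are exactly zero
        q = [x / R[k][k] for x in cols[k]]
        Q.append(q)
        # Remove the q component from every remaining column.
        for j in range(k + 1, n):
            R[k][j] = sum(a * b for a, b in zip(q, cols[j]))
            cols[j] = [c - R[k][j] * a for c, a in zip(cols[j], q)]
    return Q, R, piv


def reconstruct(Q, R, piv):
    """Rebuild the original matrix from (Q, R, piv) as a list of rows."""
    m, n = len(Q[0]), len(piv)
    B = [[0.0] * n for _ in range(m)]
    for j in range(n):
        for k, q in enumerate(Q):
            for i in range(m):
                B[i][piv[j]] += q[i] * R[k][j]
    return B


if __name__ == "__main__":
    A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0], [2.0, 0.0, 1.0]]
    Q, R, piv = qrcp_mgs(A)
    print("pivot order:", piv)
    print("|diag(R)|:", [abs(R[k][k]) for k in range(len(piv))])
```

The pivoting makes the magnitudes on the diagonal of R non-increasing, which is what makes QRCP useful for rank estimation and column selection. Production code would instead call a blocked, Householder-based routine such as LAPACK's GEQP3, the kind of baseline the abstract compares against.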
Taxonomy
Topics: Matrix Theory and Algorithms · Parallel Computing and Optimization Techniques · Stochastic Gradient Optimization Techniques
