Parallel QR Factorization of Block Low-Rank Matrices
M. Ridwan Apriansyah, Rio Yokota

TL;DR
This paper introduces two parallel Householder QR algorithms for Block Low-Rank (BLR) matrices. On large matrices they run more than ten times faster than the dense QR in Intel MKL, and they are more robust to ill-conditioning than MGS-based BLR-QR.
Contribution
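The paper presents two new parallel Householder QR algorithms tailored to BLR matrices: a block-column-wise variant that uses fork-join parallelism, and a tiled variant whose finer task granularity allows task-based parallel execution on shared-memory systems.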
Findings
The BLR-QR methods are more than ten times faster than the dense Householder QR in Intel MKL.
The parallel tiled BLR-QR achieves a 50x speedup on a 64-core CPU.
The proposed methods handle ill-conditioned matrices better than MGS-based approaches.
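The robustness claim rests on a well-known property of dense QR: Modified Gram-Schmidt loses orthogonality as the condition number grows, while Householder QR does not. The sketch below illustrates that effect on a small dense matrix only. It is not the paper's BLR algorithm; the helper names (`mgs_qr`, `ill_conditioned`, `orthogonality_error`) and the test matrix are assumptions chosen for illustration.

```python
import numpy as np

def mgs_qr(A):
    """Modified Gram-Schmidt QR of a dense matrix; returns Q (m x n), R (n x n)."""
    V = np.array(A, dtype=float)
    m, n = V.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        R[j, j] = np.linalg.norm(V[:, j])
        Q[:, j] = V[:, j] / R[j, j]
        # Remove the new direction from all remaining columns right away.
        R[j, j + 1:] = Q[:, j] @ V[:, j + 1:]
        V[:, j + 1:] -= np.outer(Q[:, j], R[j, j + 1:])
    return Q, R

def ill_conditioned(m, n, cond, seed=0):
    """Hypothetical test matrix with singular values spread from 1 down to 1/cond."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((m, n)))
    W, _ = np.linalg.qr(rng.standard_normal((n, n)))
    s = np.logspace(0, -np.log10(cond), n)
    return (U * s) @ W.T

def orthogonality_error(Q):
    """Frobenius norm of Q^T Q - I; zero for perfectly orthonormal columns."""
    return np.linalg.norm(Q.T @ Q - np.eye(Q.shape[1]))

A = ill_conditioned(200, 50, cond=1e10)
Q_mgs, R_mgs = mgs_qr(A)
Q_hh, R_hh = np.linalg.qr(A)  # NumPy calls LAPACK's Householder-based QR

err_mgs = orthogonality_error(Q_mgs)
err_hh = orthogonality_error(Q_hh)
print(f"MGS         ||Q^T Q - I|| = {err_mgs:.2e}")
print(f"Householder ||Q^T Q - I|| = {err_hh:.2e}")
```

Both factorizations reproduce `A` to working precision; the difference shows up only in how close `Q` stays to having orthonormal columns.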
Abstract
We present two new algorithms for Householder QR factorization of Block Low-Rank (BLR) matrices: one that performs block-column-wise QR, and another that is based on tiled QR. We show how the block-column-wise algorithm exploits BLR structure to achieve arithmetic complexity of , while the tiled BLR-QR exhibits complexity. However, the tiled BLR-QR has finer task granularity that allows parallel task-based execution on shared memory systems. We compare the block-column-wise BLR-QR using fork-join parallelism with tiled BLR-QR using task-based parallelism. We also compare these two implementations of Householder BLR-QR with a block-column-wise Modified Gram-Schmidt (MGS) BLR-QR using fork-join parallelism, and a state-of-the-art vendor-optimized dense Householder QR in Intel MKL. For a matrix of size 131k × 65k, all BLR methods are more…
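To make the "BLR structure" mentioned in the abstract concrete, here is a minimal sketch of a BLR representation: the matrix is split into square blocks, diagonal blocks stay dense, and each off-diagonal block is stored as a low-rank product. The truncated-SVD compression, the kernel matrix, the block size, and the tolerance are all assumptions made for illustration; the paper's own compression scheme and data layout may differ.

```python
import numpy as np

def compress_block(B, tol):
    """Truncated SVD: return (U_k * s_k, V_k^T) with relative cutoff tol."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    k = max(1, int(np.sum(s > tol * s[0])))
    return U[:, :k] * s[:k], Vt[:k, :]

def to_blr(A, b, tol=1e-8):
    """Dense diagonal blocks, low-rank (L, R) pairs elsewhere. Assumes b divides n."""
    nb = A.shape[0] // b
    blocks = {}
    for i in range(nb):
        for j in range(nb):
            B = A[i * b:(i + 1) * b, j * b:(j + 1) * b]
            blocks[i, j] = B.copy() if i == j else compress_block(B, tol)
    return blocks

def to_dense(blocks, nb, b):
    """Rebuild the dense matrix from the BLR blocks (for checking only)."""
    A = np.zeros((nb * b, nb * b))
    for (i, j), blk in blocks.items():
        dense = blk if i == j else blk[0] @ blk[1]
        A[i * b:(i + 1) * b, j * b:(j + 1) * b] = dense
    return A

# Hypothetical test problem: a smooth kernel evaluated on a 1-D grid.
n, b = 512, 64
x = np.linspace(0.0, 1.0, n)
A = 1.0 / (1.0 + np.abs(x[:, None] - x[None, :]))

blocks = to_blr(A, b)
nb = n // b
max_rank = max(blk[0].shape[1] for (i, j), blk in blocks.items() if i != j)
rel_err = np.linalg.norm(to_dense(blocks, nb, b) - A) / np.linalg.norm(A)
blr_entries = sum(blk.size if i == j else blk[0].size + blk[1].size
                  for (i, j), blk in blocks.items())

print(f"max off-diagonal rank: {max_rank} (block size {b})")
print(f"relative error: {rel_err:.2e}")
print(f"stored entries: {blr_entries} vs dense {A.size}")
```

When the off-diagonal ranks are much smaller than the block size, operating on the low-rank factors instead of the full blocks is what lets BLR algorithms reduce storage and arithmetic cost relative to dense methods.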
