Solving Large Rank-Deficient Linear Least-Squares Problems on Shared-Memory CPU Architectures and GPU Architectures
Mónica Chillarón, Gregorio Quintana-Ortí, Vicente Vidal, Per-Gunnar Martinsson

TL;DR
This paper introduces new techniques for solving large, rank-deficient linear least squares problems efficiently on shared-memory CPU and GPU architectures, even when data exceeds main memory capacity.
Contribution
It presents novel methods based on complete orthogonal decompositions and the randUTV algorithm for large-scale least squares problems that operate on disk-stored data.
Findings
The proposed methods, which operate on disk-stored data, are competitive with state-of-the-art in-memory solvers.
The techniques handle coefficient matrices that are too large to fit in main memory.
Both the CPU and the GPU implementations show strong performance.
Abstract
Solving very large linear systems of equations is a key computational task in science and technology. In many cases, the coefficient matrix of the linear system is rank-deficient, leading to systems that may be underdetermined, inconsistent, or both. In such cases, one generally seeks to compute the least squares solution that minimizes the residual of the problem, which can be further defined as the solution with smallest norm in cases where the coefficient matrix has a nontrivial nullspace. This work presents several new techniques for solving least squares problems involving coefficient matrices that are so large that they do not fit in main memory. The implementations include both CPU and GPU variants. All techniques rely on complete orthogonal decompositions that guarantee that both conditions of a least squares solution are met, regardless of the rank properties of the matrix.…
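To make the two conditions in the abstract concrete (minimal residual, and minimal norm among all minimizers), the following is a minimal in-memory sketch of a least-squares solve through a complete orthogonal decomposition. It is not the paper's method: it swaps in column-pivoted QR from SciPy in place of randUTV, works entirely in main memory rather than on disk-stored data, and the function name `cod_min_norm_lstsq` and the rank tolerance `rtol` are assumptions made for illustration.

```python
import numpy as np
from scipy.linalg import qr, solve_triangular


def cod_min_norm_lstsq(A, b, rtol=1e-10):
    """Minimum-norm least-squares solution of A x ~ b via a complete
    orthogonal decomposition A[:, perm] = Q1 @ T @ Z.T.

    Illustrative only: column-pivoted QR stands in for randUTV, and
    rtol is an assumed relative threshold for the numerical rank.
    """
    m, n = A.shape

    # Step 1: column-pivoted QR, A[:, perm] = Q @ R.
    Q, R, perm = qr(A, mode="economic", pivoting=True)

    # Step 2: estimate the numerical rank r from the diagonal of R.
    d = np.abs(np.diag(R))
    if d.size == 0 or d[0] == 0.0:
        return np.zeros(n)
    r = int(np.sum(d > rtol * d[0]))

    # Step 3: compress the leading r rows of R from the right:
    # R[:r, :].T = Z @ Tt, hence R[:r, :] = T @ Z.T with T = Tt.T
    # lower triangular (r x r) and Z having orthonormal columns.
    Z, Tt = np.linalg.qr(R[:r, :].T)

    # Step 4: solve the small triangular system T w = Q1^T b.
    c = Q[:, :r].T @ b
    w = solve_triangular(Tt.T, c, lower=True)

    # Step 5: map back. Z @ w lies in the row space of A[:, perm],
    # which is what makes the solution the one of smallest norm.
    x = np.empty(n)
    x[perm] = Z @ w
    return x


# Usage: a rank-3 matrix of size 8 x 6 with an inconsistent right-hand side.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 6))
b = rng.standard_normal(8)
x = cod_min_norm_lstsq(A, b)
```

Because `x` is built from the columns of `Z`, it has no component in the nullspace of `A`; any other least-squares minimizer differs from it by a nullspace vector and therefore has a larger norm.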
Taxonomy
Topics: Statistical and numerical algorithms
