An inherently parallel H2-ULV factorization for solving dense linear systems on GPUs
Qianxiang Ma, Rio Yokota

TL;DR
This paper introduces an H2-ULV factorization method that enables efficient, parallel dense matrix factorization on GPUs, overcoming traditional hierarchical matrix challenges like load imbalance and serialization.
Contribution
The paper presents a novel H2-ULV factorization technique that achieves linear complexity and inherent parallelism for dense matrix factorization on GPUs.
Findings
Achieves linear complexity, O(N) rather than O(N^3), in dense matrix factorization.
Removes the dependency on trailing sub-matrices, which exposes parallelism during factorization.
Enables efficient GPU implementation of hierarchical matrix methods.
Abstract
Hierarchical low-rank approximation of dense matrices can reduce the complexity of their factorization from O(N^3) to O(N). However, the complex structure of such hierarchical matrices makes them difficult to parallelize. The block size and ranks can vary between the sub-blocks, which creates load imbalance. The dependency between the sub-blocks during factorization results in serialization. Since many sub-blocks are low-rank, their small computational load exposes the overhead of runtime systems. The combination of these factors makes it challenging to implement these methods on GPUs. In this work, we show that dense matrices can be factorized with linear complexity, while extracting the potential parallelism of GPUs. This is made possible through the H2-ULV factorization, which removes the dependency on trailing sub-matrices.
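To make the phrase "dependency on trailing sub-matrices" concrete, the following is a minimal sketch of a conventional right-looking dense LU factorization without pivoting. It is not the paper's H2-ULV method; it only illustrates the serialization that the abstract says H2-ULV removes. The function names `lu_right_looking` and `matmul` are hypothetical helpers chosen for this illustration, and the input is assumed to be a small, diagonally dominant matrix so that no pivoting is needed.

```python
def lu_right_looking(a):
    """Conventional right-looking LU without pivoting (illustration only).

    Returns (L, U) as lists of lists, with L unit lower triangular.
    Assumes every pivot is nonzero (e.g. a diagonally dominant input).
    """
    n = len(a)
    a = [row[:] for row in a]  # work on a copy, leave the input untouched
    for k in range(n):
        pivot = a[k][k]
        # Scale the column below the pivot: these become entries of L.
        for i in range(k + 1, n):
            a[i][k] /= pivot
        # Trailing sub-matrix (Schur complement) update.
        # Every entry with i > k and j > k is modified here, so step k + 1
        # cannot begin until this update has finished. This is the
        # step-to-step dependency that serializes the factorization.
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
    L = [[a[i][j] if j < i else (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    U = [[a[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return L, U


def matmul(x, y):
    """Plain triple-loop matrix product, used only to check L * U == A."""
    n, m, p = len(x), len(y), len(y[0])
    return [[sum(x[i][k] * y[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


A = [[4.0, 1.0, 2.0],
     [1.0, 5.0, 1.0],
     [2.0, 1.0, 6.0]]
L, U = lu_right_looking(A)
residual = max(abs(matmul(L, U)[i][j] - A[i][j])
               for i in range(3) for j in range(3))
```

The three nested loops also show where the O(N^3) cost of dense factorization comes from: each of the N steps touches a trailing block of up to N by N entries. In hierarchical formats many of those blocks are stored in low-rank form, which is what makes a lower complexity possible in the first place.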
