gSoFa: Scalable Sparse Symbolic LU Factorization on GPUs
Anil Gaihre, Xiaoye S. Li, Hang Liu

TL;DR
This paper presents gSoFa, a GPU-based symbolic LU factorization method for sparse matrices. It achieves up to 31x speedup on the Summit supercomputer, outperforms state-of-the-art CPU methods by 5x on average, and reduces memory use through space reduction techniques.
Contribution
gSoFa introduces the first GPU-optimized symbolic LU factorization algorithm with novel parallelization, workload balancing, and space reduction techniques for sparse matrices.
Findings
Up to 31x speedup on the Summit supercomputer.
Outperforms state-of-the-art CPU methods by 5x on average.
Achieves up to 47% of the peak memory throughput of a V100 GPU.
Abstract
Decomposing matrix A into a lower matrix L and an upper matrix U, which is also known as LU decomposition, is an essential operation in numerical linear algebra. For a sparse matrix, LU decomposition often introduces more nonzero entries in the L and U factors than in the original matrix. A symbolic factorization step is needed to identify the nonzero structures of L and U matrices. Attracted by the enormous potentials of the Graphics Processing Units (GPUs), an array of efforts have surged to deploy various LU factorization steps except for the symbolic factorization, to the best of our knowledge, on GPUs. This paper introduces gSoFa, the first GPU-based Symbolic factorization design with the following three optimizations to enable scalable LU symbolic factorization for nonsymmetric pattern sparse matrices on GPUs. First, we introduce a novel fine-grained parallel symbolic…
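To make the symbolic factorization step concrete, the following is a minimal sequential sketch in Python. It is a textbook-style pattern elimination, not gSoFa's GPU algorithm, and it assumes no pivoting and a structurally nonzero diagonal. The function name `symbolic_lu` and the set-of-coordinates input format are hypothetical choices made for illustration.

```python
def symbolic_lu(n, pattern):
    """Predict the nonzero structure of L and U for an n x n sparse matrix.

    pattern: set of (row, col) positions of structural nonzeros in A.
    Assumptions (illustrative only): no pivoting, and every diagonal
    entry is treated as nonzero. Only positions are tracked, no values.
    Returns (lower, upper): strictly-lower positions of L and
    upper positions (diagonal included) of U.
    """
    # Row-wise view of the pattern: rows[i] holds the nonzero columns of row i.
    rows = [set() for _ in range(n)]
    for i, j in pattern:
        rows[i].add(j)
    for i in range(n):
        rows[i].add(i)  # assumption: diagonal is structurally nonzero

    # Simulate Gaussian elimination on the pattern alone.
    for k in range(n):
        upper_k = {j for j in rows[k] if j > k}
        for i in range(k + 1, n):
            if k in rows[i]:
                # Eliminating entry (i, k) with pivot row k makes row i
                # nonzero wherever row k is nonzero to the right of k.
                rows[i] |= upper_k

    lower = {(i, j) for i in range(n) for j in rows[i] if j < i}
    upper = {(i, j) for i in range(n) for j in rows[i] if j >= i}
    return lower, upper


# A 4 x 4 "arrow" matrix: dense first row and column plus the diagonal.
arrow = (
    {(i, i) for i in range(4)}
    | {(0, j) for j in range(4)}
    | {(i, 0) for i in range(4)}
)
lower, upper = symbolic_lu(4, arrow)
fill_in = (lower | upper) - arrow  # positions that are zero in A but not in L or U
```

In the arrow example the original pattern has 10 nonzeros, while the predicted factors cover all 16 positions, which illustrates the fill-in that the abstract describes. The nested loops in this sketch are inherently sequential across pivots, which is one reason a dedicated parallel design is needed for GPUs.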
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Matrix Theory and Algorithms · Interconnection Networks and Systems
