A Structure-Aware Irregular Blocking Method for Sparse LU Factorization
Zhen Hu, Dongliang Xiong, Kai Huang, Changjun Wu, Xiaowen Jiang

TL;DR
This paper introduces a structure-aware irregular blocking method for sparse LU factorization that adapts block sizes to the local nonzero distribution, balancing workload across blocks and speeding up numerical factorization on GPUs.
Contribution
It proposes a novel diagonal block-based feature and an irregular blocking strategy that balances workload by adjusting block sizes according to local matrix structure.
Findings
Achieves speedups of 1.50x over PanguLU and 3.32x over SuperLU_DIST on a single GPU.
Achieves speedups of 1.40x and 3.84x, respectively, over the same baselines on 4 GPUs.
Effectively balances workload across blocks in sparse matrices.
Abstract
In sparse LU factorization, the nonzero elements present after symbolic factorization tend to concentrate along the diagonal and in the bottom-right region of the matrix. Regular 2D blocking applied to this non-uniform structure can therefore lead to workload imbalance across blocks. Moreover, existing matrix features give little effective guidance for blocking. In this paper, we propose a structure-aware irregular blocking method for numerical factorization. A novel diagonal block-based feature is introduced to characterize the local nonzero distribution of sparse matrices. Based on this feature, we further propose an irregular blocking method that adjusts block sizes according to the local distribution of nonzeros. The strategy uses fine-grained blocks in dense regions and coarse-grained blocks in sparse regions, adequately balancing the nonzeros of blocks both within the same level and across…
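To make the contrast between regular and irregular blocking concrete, here is a minimal Python sketch. It is not the authors' algorithm: the paper's diagonal block-based feature is not specified in this summary, so the sketch assumes a simple stand-in (a per-index nonzero count with a greedy cut rule). The synthetic pattern and the names `make_pattern`, `regular_cuts`, `irregular_cuts`, and `block_nnz` are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from collections import Counter


def make_pattern(n, corner):
    """Synthetic stand-in for a post-symbolic pattern: a tridiagonal band
    plus a fully dense bottom-right corner of size `corner`."""
    nz = set()
    for i in range(n):
        for j in (i - 1, i, i + 1):
            if 0 <= j < n:
                nz.add((i, j))
    for i in range(n - corner, n):
        for j in range(n - corner, n):
            nz.add((i, j))
    return nz


def regular_cuts(n, num_blocks):
    """Equal-width block boundaries (regular 2D blocking)."""
    step = -(-n // num_blocks)  # ceiling division
    return list(range(0, n, step)) + [n]


def irregular_cuts(nz, n, num_blocks):
    """Assumed greedy rule, not the paper's: close a block once it has
    gathered about total/num_blocks nonzeros, so that dense regions end
    up with narrow blocks and sparse regions with wide ones."""
    weight = [0] * n
    for i, j in nz:
        # Nonzeros that first appear when the leading submatrix
        # grows to include index max(i, j).
        weight[max(i, j)] += 1
    target = len(nz) / num_blocks
    cuts, acc = [0], 0
    for k in range(n):
        acc += weight[k]
        if acc >= target and k + 1 < n:
            cuts.append(k + 1)
            acc = 0
    cuts.append(n)
    return cuts


def block_nnz(nz, cuts):
    """Count nonzeros in every 2D block induced by the boundaries."""
    counts = Counter()
    for i, j in nz:
        bi = bisect_right(cuts, i) - 1
        bj = bisect_right(cuts, j) - 1
        counts[(bi, bj)] += 1
    return counts


n = 64
nz = make_pattern(n, corner=16)
reg = regular_cuts(n, 4)
irr = irregular_cuts(nz, n, 4)
reg_max = max(block_nnz(nz, reg).values())
irr_max = max(block_nnz(nz, irr).values())
print("regular  ", reg, "max nnz per block:", reg_max)
print("irregular", irr, "max nnz per block:", irr_max)
```

On this toy pattern the greedy boundaries become narrower toward the dense bottom-right corner, and the heaviest block holds fewer nonzeros than under equal-width blocking. A real implementation would also have to account for dependencies between block levels, which this sketch ignores.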
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Interconnection Networks and Systems · Low-power high-performance VLSI design
