A Structure-Aware Irregular Blocking Method for Sparse LU Factorization

Zhen Hu; Dongliang Xiong; Kai Huang; Changjun Wu; Xiaowen Jiang

arXiv:2512.04389·cs.DC·December 5, 2025

TL;DR

This paper introduces a structure-aware irregular blocking method for sparse LU factorization that adapts block sizes to the local nonzero distribution, yielding single-GPU speedups of 1.50x over PanguLU and 3.32x over SuperLU_DIST.

Contribution

It proposes a novel diagonal block-based feature and an irregular blocking strategy that balances workload by adjusting block sizes according to local matrix structure.

Findings

1. Achieves 1.50x and 3.32x speedups over PanguLU and SuperLU_DIST, respectively, on a single GPU.

2. Achieves 1.40x and 3.84x speedups over the same two solvers, respectively, on 4 GPUs.

3. Balances workload across blocks in sparse matrices.

Abstract

In sparse LU factorization, the nonzero elements produced by symbolic factorization tend to concentrate along the diagonal and in the bottom-right region of the matrix. Regular 2D blocking applied to this non-uniform structure can therefore lead to workload imbalance across blocks. Moreover, existing matrix features do not effectively guide the choice of blocking. In this paper, we propose a structure-aware irregular blocking method for numerical factorization. We introduce a novel diagonal block-based feature that characterizes the local nonzero distribution of sparse matrices. Based on this feature, we further propose an irregular blocking method that adjusts block sizes according to the local distribution of nonzeros. The strategy uses fine-grained blocks in dense regions and coarse-grained blocks in sparse regions, balancing the nonzeros of blocks both within the same level and across…
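The idea of fine-grained blocks in dense regions and coarse-grained blocks in sparse regions can be illustrated with a small sketch. This is not the paper's algorithm and does not use its diagonal block-based feature: it is a simplified greedy rule, assumed here for illustration, that walks down the diagonal and closes a block once the accumulated per-row nonzero count reaches a budget. The names `row_nnz_counts`, `irregular_boundaries`, and `target` are hypothetical.

```python
from typing import List, Set, Tuple


def row_nnz_counts(n: int, coords: Set[Tuple[int, int]]) -> List[int]:
    """Count the nonzeros in each row of an n x n sparsity pattern."""
    counts = [0] * n
    for i, _j in coords:
        counts[i] += 1
    return counts


def irregular_boundaries(counts: List[int], target: int) -> List[int]:
    """Greedy illustrative rule (an assumption, not the paper's method).

    Returns boundaries [0, b1, ..., n]. A block is closed as soon as the
    nonzeros accumulated over its rows reach `target`, so rows with many
    nonzeros end up in small blocks and rows with few in large blocks.
    """
    boundaries = [0]
    acc = 0
    for idx, c in enumerate(counts):
        acc += c
        if acc >= target:
            boundaries.append(idx + 1)
            acc = 0
    # Close a trailing block that never reached the budget.
    if boundaries[-1] != len(counts):
        boundaries.append(len(counts))
    return boundaries


# Toy 8 x 8 pattern: a full diagonal plus a dense 4 x 4 bottom-right corner,
# mimicking the post-symbolic-factorization structure described above.
n = 8
pattern = {(i, i) for i in range(n)} | {
    (i, j) for i in range(4, n) for j in range(4, n)
}
counts = row_nnz_counts(n, pattern)
boundaries = irregular_boundaries(counts, target=4)
block_sizes = [b - a for a, b in zip(boundaries, boundaries[1:])]
print(boundaries)   # [0, 4, 5, 6, 7, 8]
print(block_sizes)  # [4, 1, 1, 1, 1]
```

In this toy case the sparse upper-left rows are grouped into one block of size 4, while each dense bottom-right row gets its own block. A regular blocking with a fixed size of 4 would instead put 4 nonzero rows' worth of work in the first block and 16 in the second.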

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Interconnection Networks and Systems · Low-power high-performance VLSI design