On Parallel Solution of Sparse Triangular Linear Systems in CUDA
Ruipeng Li

TL;DR
This paper presents new CUDA algorithms for efficiently solving sparse triangular linear systems, outperforming the cuSPARSE solvers by up to 2.6 times on both structured and general sparse matrices.
Contribution
It introduces self-scheduling algorithms for parallel sparse triangular solves in CUDA, improving performance over existing level-scheduling methods.
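As a rough illustration of the self-scheduling idea, the sketch below emulates it on the CPU with Python threads. It is not the paper's CUDA kernel. The function name `self_scheduled_lower_solve`, the CSR layout, and the specific scheme (workers fetch the next row index from a shared counter, then wait on per-row completion flags instead of synchronizing between levels) are assumptions made for illustration. `threading.Event.wait` stands in for the spin-wait on a flag that a GPU thread or warp would perform.

```python
import threading


def self_scheduled_lower_solve(n, row_ptr, col_idx, vals, b, num_workers=4):
    """Hypothetical CPU emulation of a self-scheduled sparse lower-triangular solve.

    Assumes L is lower triangular in CSR form with a nonzero diagonal in every row.
    Workers claim rows from a shared counter and wait on per-row "done" flags.
    """
    x = [0.0] * n
    done = [threading.Event() for _ in range(n)]  # per-row completion flags
    next_row = [0]                                # shared row counter
    counter_lock = threading.Lock()

    def grab_row():
        # Fetch-and-increment; plays the role of an atomicAdd on a global counter.
        with counter_lock:
            i = next_row[0]
            next_row[0] += 1
            return i

    def worker():
        while True:
            i = grab_row()
            if i >= n:
                return
            s, diag = b[i], None
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                if j < i:
                    done[j].wait()      # block until x[j] has been published
                    s -= vals[k] * x[j]
                elif j == i:
                    diag = vals[k]
            x[i] = s / diag             # assumes a nonzero diagonal entry
            done[i].set()               # publish x[i] to rows waiting on it

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return x


# Example: 4x4 lower-triangular system whose exact solution is [1, 2, 3, 4].
row_ptr = [0, 1, 2, 4, 7]
col_idx = [0, 1, 0, 2, 1, 2, 3]
vals = [2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0]
b = [2.0, 2.0, 4.0, 13.0]
print(self_scheduled_lower_solve(4, row_ptr, col_idx, vals, b))  # [1.0, 2.0, 3.0, 4.0]
```

Handing out rows in increasing order is what keeps this scheme deadlock-free: a row only ever waits on lower-numbered rows, which were claimed earlier, so the lowest-numbered unfinished row can always make progress. On a GPU the same ordering matters because CUDA does not guarantee the order in which thread blocks are scheduled.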
Findings
CUDA algorithms outperform cuSPARSE solvers by up to 2.6x
Proposed methods are effective for both structured and general sparse matrices
Self-scheduling techniques enhance parallel efficiency in sparse triangular solves
Abstract
The acceleration of sparse matrix computations on modern many-core processors, such as graphics processing units (GPUs), has been recognized and studied for over a decade. Significant performance enhancements have been achieved for many sparse matrix computational kernels such as sparse matrix-vector products and sparse matrix-matrix products. Solving linear systems with sparse triangular structured matrices is another important sparse kernel, required by a variety of scientific and engineering applications such as sparse linear solvers. However, the development of efficient parallel algorithms in CUDA for solving sparse triangular linear systems remains a challenging task due to the inherently sequential nature of the computation. In this paper, we will revisit this problem by reviewing the existing level-scheduling methods and proposing algorithms with self-scheduling techniques.…
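For comparison, the level-scheduling approach that the paper reviews can be sketched as follows. This is a minimal serial Python sketch under assumed conventions (lower-triangular CSR storage with a nonzero diagonal; the function names are hypothetical), not code from the paper. Rows are grouped into levels so that each row depends only on rows in earlier levels, and the levels are then processed in order; only the rows inside one level are independent of each other.

```python
from collections import defaultdict


def compute_levels(n, row_ptr, col_idx):
    """Level of row i = 1 + max level of the rows it depends on (0 if none)."""
    level = [0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j < i:  # off-diagonal entry: row i needs x[j] first
                level[i] = max(level[i], level[j] + 1)
    return level


def level_scheduled_lower_solve(n, row_ptr, col_idx, vals, b):
    """Solve L x = b (lower-triangular CSR, nonzero diagonal) level by level."""
    level = compute_levels(n, row_ptr, col_idx)
    rows_by_level = defaultdict(list)
    for i, lev in enumerate(level):
        rows_by_level[lev].append(i)

    x = [0.0] * n
    for lev in sorted(rows_by_level):
        # Rows in the same level depend only on earlier levels, so this loop is
        # the parallel part: on a GPU it would typically be one kernel launch
        # (or one barrier-separated phase) per level.
        for i in rows_by_level[lev]:
            s, diag = b[i], None
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                if j < i:
                    s -= vals[k] * x[j]
                elif j == i:
                    diag = vals[k]
            x[i] = s / diag
    return x, dict(rows_by_level)


# Example: rows 0 and 1 have no dependencies, row 2 needs row 0, row 3 needs rows 1 and 2.
row_ptr = [0, 1, 2, 4, 7]
col_idx = [0, 1, 0, 2, 1, 2, 3]
vals = [2.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0]
b = [2.0, 2.0, 4.0, 13.0]
x, levels = level_scheduled_lower_solve(4, row_ptr, col_idx, vals, b)
print(levels)  # {0: [0, 1], 1: [2], 2: [3]}
print(x)       # [1.0, 2.0, 3.0, 4.0]
```

The number of levels equals the number of sequential phases, so matrices with long dependency chains expose little parallelism per level and pay one synchronization per level; that per-level synchronization is one cost that self-scheduling schemes try to avoid.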
Taxonomy
Topics: Matrix Theory and Algorithms · Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems
