Mapping Sparse Triangular Solves to GPUs via Fine-grained Domain Decomposition

Atharva Gondhalekar; Kjetil Haugen; Thomas Gibson; Wu-chun Feng

arXiv:2508.04917·cs.PF·August 8, 2025

TL;DR

This paper introduces a fine-grained domain decomposition method for mapping sparse triangular solves onto GPUs. The method increases parallelism and reports a 10.7× speedup for triangular solves and a 3.2× speedup for ILU0-preconditioned BiCGSTAB.

Contribution

The paper presents a fine-grained domain decomposition strategy that improves GPU performance for sparse triangular solves by increasing parallelism and reducing irregular global memory accesses.

Findings

01. Achieves 10.7× speedup for triangular solves

02. Attains 3.2× speedup for ILU0-preconditioned BiCGSTAB

03. Reduces irregular global memory accesses

Abstract

Sparse linear systems are typically solved using preconditioned iterative methods, but applying preconditioners via sparse triangular solves introduces bottlenecks due to irregular memory accesses and data dependencies. This work leverages fine-grained domain decomposition to adapt triangular solves to the GPU architecture. We develop a fine-grained domain decomposition strategy that generates non-overlapping subdomains, increasing parallelism in the application of the preconditioner at the expense of a modest increase in the iteration count for convergence. Each subdomain is assigned to a thread block and is sized such that the subdomain vector fits in GPU shared memory, eliminating the need for inter-block synchronization and reducing irregular global memory accesses. Compared to other state-of-the-art implementations using the ROCm™ software stack, we achieve a…
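The idea in the abstract can be illustrated with a small CPU-side sketch. This is not the authors' implementation. It assumes that rows are split into contiguous, non-overlapping subdomains and that couplings to rows outside a subdomain are simply dropped, which is one plausible way to obtain independent subdomain solves at the cost of a less exact preconditioner. The function names, the CSR argument layout, and the 64 KiB shared-memory budget are all hypothetical choices made for illustration.

```python
def rows_per_subdomain(shared_mem_bytes, bytes_per_value=8):
    """Largest subdomain whose solution vector fits in the given budget.

    The budget is an assumption supplied by the caller, not a figure
    taken from the paper.
    """
    return max(1, shared_mem_bytes // bytes_per_value)


def block_lower_solve(indptr, indices, data, b, block_rows):
    """Approximate solve of L x = b for a lower triangular L in CSR form.

    Rows are split into contiguous subdomains of `block_rows` rows.
    Entries coupling a row to a column in an earlier subdomain are
    dropped (an assumption of this sketch), so each subdomain is solved
    independently. On a GPU, each iteration of the outer loop would map
    to one thread block.
    """
    n = len(b)
    x = [0.0] * n
    for start in range(0, n, block_rows):        # independent subdomains
        end = min(start + block_rows, n)
        local = [0.0] * (end - start)            # stands in for shared memory
        for i in range(start, end):              # forward substitution
            acc = b[i]
            diag = None
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                if j == i:
                    diag = data[k]
                elif start <= j < i:
                    acc -= data[k] * local[j - start]
                # j < start: coupling to another subdomain, dropped
            local[i - start] = acc / diag
        x[start:end] = local
    return x


# 4x4 lower bidiagonal example: 2 on the diagonal, -1 below it.
indptr = [0, 1, 3, 5, 7]
indices = [0, 0, 1, 1, 2, 2, 3]
data = [2.0, -1.0, 2.0, -1.0, 2.0, -1.0, 2.0]
b = [2.0, 2.0, 2.0, 2.0]

exact = block_lower_solve(indptr, indices, data, b, block_rows=4)
approx = block_lower_solve(indptr, indices, data, b, block_rows=2)
```

With a single subdomain the routine reduces to ordinary forward substitution. With two subdomains the second block ignores its coupling to the first, so its result differs from the exact solve; in a preconditioned iterative solver, this kind of approximation is what would show up as extra iterations.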
