GPU-Accelerated Cholesky Factorization of Block Tridiagonal Matrices

Roland Schwan; Daniel Kuhn; Colin N. Jones

arXiv:2601.03754·math.OC·January 8, 2026

Open Access

TL;DR

This paper introduces a GPU-accelerated framework for solving block tridiagonal linear systems. A multi-stage permutation strategy based on nested dissection exposes parallelism in the Cholesky factorization, and the resulting implementation outperforms QDLDL, a BLASFEO-based CPU implementation, and NVIDIA's CUDSS, especially for long-horizon problems.

Contribution

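The paper presents a GPU-based algorithm for block tridiagonal systems that reduces the complexity of the factorization from $O(N n^{3})$ to $O(\log_{2}(N) n^{3})$ when sufficient parallel resources are available, where $n$ is the block size and $N$ is the number of blocks. The resulting speedups target real-time applications.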

Findings

01. Speedups exceeding 100x over the sparse solver QDLDL

02. 25x faster than a highly optimized CPU implementation using BLASFEO

03. More than 2x faster than NVIDIA's CUDSS library

Abstract

This paper presents a GPU-accelerated framework for solving block tridiagonal linear systems that arise naturally in numerous real-time applications across engineering and scientific computing. Through a multi-stage permutation strategy based on nested dissection, we reduce the computational complexity from $O(N n^{3})$ for sequential Cholesky factorization to $O(\log_{2}(N) n^{3})$ when sufficient parallel resources are available, where $n$ is the block size and $N$ is the number of blocks. The algorithm is implemented using NVIDIA's Warp library and CUDA to exploit parallelism at multiple levels within the factorization algorithm. Our implementation achieves speedups exceeding 100x compared to the sparse solver QDLDL, 25x compared to a highly optimized CPU implementation using BLASFEO, and more than 2x compared to NVIDIA's CUDSS library. The logarithmic scaling with…
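To make the $O(N n^{3})$ baseline mentioned in the abstract concrete, the following is a minimal NumPy sketch of the standard sequential block Cholesky recursion for a symmetric positive definite block tridiagonal matrix. It is not the paper's GPU algorithm or its permutation strategy. The function names `block_tridiag_cholesky` and `assemble_dense`, and the storage layout (a list of diagonal blocks plus a list of sub-diagonal blocks), are assumptions chosen for illustration.

```python
import numpy as np


def block_tridiag_cholesky(D, B):
    """Sequential Cholesky of an SPD block tridiagonal matrix (illustrative sketch).

    D: list of N diagonal blocks, each n x n.
    B: list of N-1 sub-diagonal blocks; B[i] sits at block position (i+1, i).
    Returns (L, M): lower-triangular diagonal factors L[i] and
    sub-diagonal factor blocks M[i], so that the assembled factor F
    satisfies F @ F.T == A.
    """
    N = len(D)
    L, M = [], []
    S = D[0]  # current Schur complement of the leading diagonal block
    for i in range(N):
        Li = np.linalg.cholesky(S)  # O(n^3) dense factorization
        L.append(Li)
        if i + 1 < N:
            # M_i = B_i L_i^{-T}, obtained by solving L_i M_i^T = B_i^T
            Mi = np.linalg.solve(Li, B[i].T).T
            M.append(Mi)
            # The next diagonal block depends on this step, so the
            # N steps form a sequential dependency chain.
            S = D[i + 1] - Mi @ Mi.T
    return L, M


def assemble_dense(diag, sub, symmetric=True):
    """Assemble a dense matrix from diagonal and sub-diagonal blocks (for checking only)."""
    N, n = len(diag), diag[0].shape[0]
    A = np.zeros((N * n, N * n))
    for i, Di in enumerate(diag):
        A[i * n:(i + 1) * n, i * n:(i + 1) * n] = Di
    for i, Bi in enumerate(sub):
        A[(i + 1) * n:(i + 2) * n, i * n:(i + 1) * n] = Bi
        if symmetric:
            A[i * n:(i + 1) * n, (i + 1) * n:(i + 2) * n] = Bi.T
    return A
```

Each loop iteration performs a constant number of $n \times n$ dense operations, which gives the $O(N n^{3})$ cost, and each iteration needs the result of the previous one. That dependency chain is what a nested-dissection reordering is meant to break; how the paper does so is not reproduced here.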

Taxonomy

Topics: Matrix Theory and Algorithms · Parallel Computing and Optimization Techniques · Interconnection Networks and Systems