Rank k Cholesky Up/Down-dating on the GPU: gpucholmodV0.2
Christian Walder

TL;DR
This paper presents a GPU-based algorithm for rank-k Cholesky updates and downdates, implemented in CUDA, that runs around 7 times faster than a CPU LINPACK implementation for 5000 by 5000 matrices with k=16.
Contribution
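The author introduces a GPU-accelerated method for rank-k Cholesky modifications, implemented in CUDA with multiple kernel launches to respect the problem's dependency pattern. The method reduces computation time relative to a CPU baseline and requires GPU memory that scales as O(n).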
Findings
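Achieves a speedup of around 7x for 5000 by 5000 matrices with k=16, compared with the LINPACK suite running on a CPU of comparable vintage.
Handles much larger problems because the required GPU memory scales as O(n).
Speedups are limited by the bandwidth-bound nature of the problem.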
Abstract
In this note we briefly describe our Cholesky modification algorithm for streaming multiprocessor architectures. Our implementation is available in C++ with a Matlab binding, using CUDA to utilise the graphics processing unit (GPU). Only limited speedups are possible due to the bandwidth-bound nature of the problem. Furthermore, a complex dependency pattern must be obeyed, requiring multiple kernels to be launched. Nonetheless, this makes for an interesting problem, and our approach can reduce the computation time by a factor of around 7 for matrices of size 5000 by 5000 and k=16, in comparison with the LINPACK suite running on a CPU of comparable vintage. Much larger problems can be handled, however, due to the O(n) scaling in required GPU memory of our method.
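For readers unfamiliar with the operation being accelerated, the following is a minimal sequential CPU sketch of a rank-k Cholesky update or downdate, written as k successive rank-1 sweeps over the factor. It is the textbook rank-1 scheme, not the paper's GPU algorithm and not the LINPACK code used as its baseline. The function names `chol_rank1_modify` and `chol_rankk_modify`, the row-major storage of L, and the column-by-column layout of X are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Rank-1 modification of a lower-triangular Cholesky factor, in place.
// L is n x n, row-major, with A = L * L^T and a positive diagonal (assumed).
// On success L factors A + sign * x * x^T (sign = +1 update, -1 downdate).
// x is taken by value because the sweep overwrites it.
// Returns false if a downdate would destroy positive definiteness;
// in that case L may be left partially modified.
bool chol_rank1_modify(std::vector<double>& L, std::size_t n,
                       std::vector<double> x, double sign) {
    assert(L.size() == n * n && x.size() == n);
    for (std::size_t j = 0; j < n; ++j) {
        const double d = L[j * n + j];
        const double r2 = d * d + sign * x[j] * x[j];
        if (r2 <= 0.0) return false;
        const double r = std::sqrt(r2);
        const double c = r / d;
        const double s = x[j] / d;
        L[j * n + j] = r;
        // Column j below the diagonal depends on the x values produced
        // by all earlier columns, which is the dependency chain.
        for (std::size_t i = j + 1; i < n; ++i) {
            L[i * n + j] = (L[i * n + j] + sign * s * x[i]) / c;
            x[i] = c * x[i] - s * L[i * n + j];
        }
    }
    return true;
}

// Rank-k modification as k rank-1 sweeps.
// X holds k vectors of length n stored one after another (hypothetical layout).
bool chol_rankk_modify(std::vector<double>& L, std::size_t n,
                       const std::vector<double>& X, std::size_t k,
                       double sign) {
    assert(X.size() == n * k);
    for (std::size_t col = 0; col < k; ++col) {
        const auto first = X.begin() + static_cast<std::ptrdiff_t>(col * n);
        std::vector<double> x(first, first + static_cast<std::ptrdiff_t>(n));
        if (!chol_rank1_modify(L, n, x, sign)) return false;
    }
    return true;
}
```

Each rank-1 sweep reads and writes the whole lower triangle while doing only a handful of floating-point operations per entry, which is consistent with the abstract's remark that the problem is bandwidth bound. The column-to-column dependency through x is one simple illustration of why a parallel version cannot process all entries independently.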
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Matrix Theory and Algorithms · Tensor decomposition and applications
