Accelerating Mixed-Precision Out-of-Core Cholesky Factorization with Static Task Scheduling
Jie Ren, Hatem Ltaief, Sameh Abdulah, David E. Keyes

TL;DR
This paper presents a static task scheduling approach to optimize out-of-core mixed-precision Cholesky factorization on multi-GPU systems, leveraging new interconnect technology for significant performance improvements.
Contribution
It introduces a static scheduling method for out-of-core mixed-precision Cholesky factorization that effectively overlaps data movement with computation on modern GPU architectures.
Findings
20% performance gain over cuSOLVER on a single GH200
Almost linear scaling on four GH200 superchips
3x speedup with mixed precision over an FP64-only implementation
Abstract
This paper explores the performance optimization of out-of-core (OOC) Cholesky factorization on shared-memory systems equipped with multiple GPUs. We employ fine-grained computational tasks to expose concurrency while creating opportunities to overlap data movement asynchronously with computations, especially when dealing with matrices that cannot fit on the GPU memory. We leverage the directed acyclic graph of the task-based Cholesky factorization and map it onto a static scheduler that promotes data reuse while supporting strategies for reducing data movement with the CPU host when the GPU memory is exhausted. The CPU-GPU interconnect may become the main performance bottleneck as the gap between the GPU execution rate and the traditional PCIe bandwidth continues to widen. While the surface-to-volume effect of compute-bound kernels partially mitigates the overhead of data motion,…
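The task-based formulation described in the abstract can be illustrated with a minimal NumPy sketch. It runs a right-looking tile Cholesky, records each tile kernel (POTRF, TRSM, SYRK, GEMM) as a task, and assigns every task to a device with a fixed rule decided before execution. The function name `tile_cholesky_static`, the `owner` rule (cyclic by tile row), and the parameters `nb` and `num_devices` are assumptions made for illustration, not the authors' scheduler. The sketch runs entirely on the CPU in FP64 and does not model out-of-core data movement or mixed precision.

```python
import numpy as np


def tile_cholesky_static(A, nb, num_devices=2):
    """Right-looking tile Cholesky with a hypothetical static task-to-device map.

    Returns the lower-triangular factor L and, per device, the ordered list of
    tasks (kernel, tile_row, tile_col, step) that the static rule assigns to it.
    """
    n = A.shape[0]
    assert A.shape == (n, n) and n % nb == 0, "matrix must be square, n divisible by nb"
    nt = n // nb                                # number of tile rows/columns
    W = np.array(A, dtype=np.float64)           # working copy, factorized in place

    def tile(i, j):
        # NumPy view of tile (i, j); in-place writes update W directly.
        return W[i * nb:(i + 1) * nb, j * nb:(j + 1) * nb]

    def owner(i):
        # Assumed static rule: the tile row index picks the device cyclically.
        return i % num_devices

    schedule = {d: [] for d in range(num_devices)}

    for k in range(nt):
        # POTRF: factorize the diagonal tile.
        tile(k, k)[:] = np.linalg.cholesky(tile(k, k))
        schedule[owner(k)].append(("POTRF", k, k, k))

        # TRSM: panel tiles below the diagonal, L_ik = A_ik * L_kk^{-T}.
        for i in range(k + 1, nt):
            tile(i, k)[:] = np.linalg.solve(tile(k, k), tile(i, k).T).T
            schedule[owner(i)].append(("TRSM", i, k, k))

        # SYRK (diagonal) and GEMM (off-diagonal) trailing updates.
        for i in range(k + 1, nt):
            for j in range(k + 1, i + 1):
                t = tile(i, j)
                t -= tile(i, k) @ tile(j, k).T
                kernel = "SYRK" if i == j else "GEMM"
                schedule[owner(i)].append((kernel, i, j, k))

    return np.tril(W), schedule


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = rng.standard_normal((8, 8))
    A = M @ M.T + 8 * np.eye(8)                 # symmetric positive definite test matrix
    L, schedule = tile_cholesky_static(A, nb=2, num_devices=2)
    print("max error:", np.abs(L @ L.T - A).max())
    for dev, tasks in schedule.items():
        print(f"device {dev}: {len(tasks)} tasks")
```

Because the mapping depends only on tile indices, each device's task list is known before the factorization starts. That property is what lets a static scheduler plan data reuse and prefetching ahead of time, although the policy the paper actually uses may differ from this row-cyclic rule.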
Taxonomy
Topics: Medical Image Segmentation Techniques · Distributed and Parallel Computing Systems · Advanced Data Compression Techniques
