Task Parallel Incomplete Cholesky Factorization using 2D Partitioned-Block Layout
Kyungjoo Kim, Sivasankaran Rajamanickam, George Stelle, H. Carter Edwards, and Stephen L. Olivier

TL;DR
This paper presents a task-parallel incomplete Cholesky factorization algorithm using a 2D block layout, enabling efficient execution on manycore architectures with significant speedups demonstrated on Intel Xeon Phi.
Contribution
The paper introduces a task-parallel incomplete Cholesky algorithm built on a 2D partitioned-block layout, together with a portable tasking API for manycore platforms, and reports speedups over single-threaded and serial baselines.
Findings
Achieved a 26.6x speedup over the single-threaded implementation.
Demonstrated a 19.2x speedup over a serial Cholesky implementation that carries no tasking overhead.
Evaluated performance on Intel Sandybridge and Xeon Phi platforms.
Abstract
We introduce a task-parallel algorithm for sparse incomplete Cholesky factorization that utilizes a 2D sparse partitioned-block layout of a matrix. Our factorization algorithm follows the idea of algorithms-by-blocks by using the block layout. The algorithm-by-blocks approach induces a task graph for the factorization. These tasks are related to each other through their data dependences in the factorization algorithm. To process the tasks on various manycore architectures in a portable manner, we also present a portable tasking API that incorporates different tasking backends and device-specific features using an open-source framework for manycore platforms, i.e., Kokkos. A performance evaluation is presented on both Intel Sandybridge and Xeon Phi platforms for matrices from the University of Florida sparse matrix collection to illustrate the merits of the proposed task-based…
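To make the idea of an algorithm-by-blocks inducing a task graph concrete, here is a minimal Python sketch. It is not the authors' Kokkos-based implementation. It assumes small dense blocks in place of the paper's sparse blocks, a right-looking loop order, and a serial ready-list in place of a real task scheduler. All function names (`enumerate_tasks`, `build_dependences`, `execute`, `run_in_dependence_order`) are hypothetical. Block updates that would land outside the stored block pattern are skipped, which is one simple way to keep the factorization incomplete at the block level.

```python
import numpy as np

def enumerate_tasks(nb, pattern):
    """List the tasks of a right-looking Cholesky-by-blocks on an nb x nb block grid.

    pattern: set of stored lower-triangular block indices (i, j) with i >= j.
    Diagonal blocks are assumed to be stored. Updates whose target block is
    not in the pattern are skipped, so no new blocks are created.
    """
    tasks = []  # each task: (kernel name, output block, input blocks)
    for k in range(nb):
        tasks.append(("chol", (k, k), []))
        rows = [i for i in range(k + 1, nb) if (i, k) in pattern]
        for i in rows:
            tasks.append(("trsm", (i, k), [(k, k)]))
        for i in rows:
            for j in rows:
                if j <= i and (i, j) in pattern:
                    tasks.append(("update", (i, j), [(i, k), (j, k)]))
    return tasks

def build_dependences(tasks):
    """Task t depends on the most recent earlier writer of each block it touches."""
    last_writer, deps = {}, []
    for t, (_, out, ins) in enumerate(tasks):
        deps.append({last_writer[b] for b in (out, *ins) if b in last_writer})
        last_writer[out] = t
    return deps

def execute(task, blocks):
    """Run one block kernel in place on the dictionary of blocks."""
    kind, out, ins = task
    if kind == "chol":
        blocks[out] = np.linalg.cholesky(blocks[out])  # lower-triangular factor
    elif kind == "trsm":
        # Solve X @ L_kk.T = A_ik for X (a general solve stands in for a triangular one).
        blocks[out] = np.linalg.solve(blocks[ins[0]], blocks[out].T).T
    else:
        blocks[out] = blocks[out] - blocks[ins[0]] @ blocks[ins[1]].T

def run_in_dependence_order(tasks, deps, blocks):
    """Serial stand-in for a task scheduler: run any task whose dependences are done."""
    pending = [set(d) for d in deps]
    ready = [t for t, d in enumerate(pending) if not d]
    order = []
    while ready:
        t = ready.pop()  # any ready task is safe to run; LIFO is an arbitrary choice
        execute(tasks[t], blocks)
        order.append(t)
        for u, d in enumerate(pending):
            if t in d:
                d.discard(t)
                if not d:
                    ready.append(u)
    return order

# Demo on a small symmetric positive definite matrix with every lower block stored,
# in which case the block-incomplete factorization coincides with the full one.
nb, bs = 4, 3
rng = np.random.default_rng(0)
M = rng.standard_normal((nb * bs, nb * bs))
A = M @ M.T + nb * bs * np.eye(nb * bs)
pattern = {(i, j) for i in range(nb) for j in range(i + 1)}
blocks = {(i, j): A[i*bs:(i+1)*bs, j*bs:(j+1)*bs].copy() for (i, j) in pattern}

tasks = enumerate_tasks(nb, pattern)
deps = build_dependences(tasks)
order = run_in_dependence_order(tasks, deps, blocks)

# Reassemble the block factor L so that L @ L.T can be compared with A.
L = np.zeros_like(A)
for (i, j), blk in blocks.items():
    L[i*bs:(i+1)*bs, j*bs:(j+1)*bs] = blk
```

Tracking only the last writer of each block is enough in this sketch because a block is never written again once other tasks start reading it. Passing a sparser `pattern` yields fewer tasks and a shallower dependence graph. In a real tasking runtime, the tasks in `ready` at any moment are the ones that could run concurrently.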
Taxonomy
Topics: Matrix Theory and Algorithms · Parallel Computing and Optimization Techniques · Interconnection Networks and Systems
