TL;DR
This paper introduces a task-based QR algorithm with aggressive early deflation that significantly accelerates eigenvalue computations for dense matrices, leveraging dynamic task merging, GPU support, and optimized synchronization.
Contribution
It presents a novel task-based QR algorithm that adds the dynamic merging of previously separate computational steps and experimental GPU support, outperforming traditional multi-threaded LAPACK and ScaLAPACK implementations.
Findings
Multiple times faster performance on CPU clusters
Effective GPU acceleration demonstrated
Fewer synchronization points and a shorter, prioritized critical path
Abstract
The QR algorithm is one of the three phases in the process of computing the eigenvalues and the eigenvectors of a dense nonsymmetric matrix. This paper describes a task-based QR algorithm for reducing an upper Hessenberg matrix to real Schur form. The task-based algorithm also supports generalized eigenvalue problems (QZ algorithm) but this paper concentrates on the standard case. The task-based algorithm adopts previous algorithmic improvements, such as tightly-coupled multi-shifts and Aggressive Early Deflation (AED), and also incorporates several new ideas that significantly improve the performance. This includes, but is not limited to, the elimination of several synchronization points, the dynamic merging of previously separate computational steps, the shortening and the prioritization of the critical path, and experimental GPU support. The task-based implementation is demonstrated…
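To make the phase described in the abstract concrete, the sketch below runs a textbook unshifted QR iteration on a small upper Hessenberg matrix using Givens rotations. It is an illustration only and is not the paper's task-based method: it has no multi-shifts, no AED, and no task scheduling, and the names `hessenberg_qr_step` and `qr_iterate` are hypothetical. The test matrix is chosen with real eigenvalues of distinct magnitude, so the iterates converge to an upper triangular (Schur) form; matrices with complex eigenvalue pairs would need a shifted variant.

```python
import math


def hessenberg_qr_step(H):
    """One unshifted QR step H -> R*Q on an upper Hessenberg matrix (list of lists)."""
    n = len(H)
    R = [row[:] for row in H]
    rotations = []
    # Left-multiply by Givens rotations to zero the subdiagonal: G_{n-2}...G_0 H = R.
    for k in range(n - 1):
        a, b = R[k][k], R[k + 1][k]
        r = math.hypot(a, b)
        c, s = (1.0, 0.0) if r == 0.0 else (a / r, b / r)
        rotations.append((c, s))
        for j in range(k, n):
            x, y = R[k][j], R[k + 1][j]
            R[k][j] = c * x + s * y
            R[k + 1][j] = -s * x + c * y
    # Right-multiply by the transposed rotations to form R*Q (again Hessenberg).
    for k, (c, s) in enumerate(rotations):
        for i in range(k + 2):
            x, y = R[i][k], R[i][k + 1]
            R[i][k] = c * x + s * y
            R[i][k + 1] = -s * x + c * y
    return R


def qr_iterate(H, max_iters=500, tol=1e-12):
    """Repeat QR steps until every subdiagonal entry is below tol (or max_iters is hit)."""
    T = [row[:] for row in H]
    for _ in range(max_iters):
        if all(abs(T[k + 1][k]) < tol for k in range(len(T) - 1)):
            break
        T = hessenberg_qr_step(T)
    return T


# Small Hessenberg (here tridiagonal) example with eigenvalues 2 and 2 +/- sqrt(2).
H = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 1.0],
     [0.0, 1.0, 2.0]]
T = qr_iterate(H)
eigenvalues = sorted(T[i][i] for i in range(len(T)))
```

After convergence the eigenvalues can be read off the diagonal of `T`. Each step here is strictly sequential, which is the kind of dependency chain that the synchronization and critical-path improvements listed in the abstract are aimed at in the blocked, multi-shift setting.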
