Pipelined Dense Symmetric Eigenvalue Decomposition on Multi-GPU Architectures
Hansheng Wang, Ruiyi Zhan, Dajun Huang, Xingchen Liu, Qiao Li, Hancong Duan, Dingwen Tao, Guangming Tan, Shaoshuai Zhang

TL;DR
This paper introduces a pipelined two-stage eigenvalue decomposition algorithm optimized for multi-GPU architectures that outperforms cuSOLVERMp and MAGMA in both speed and scalability.
Contribution
The paper presents a novel pipelined algorithm for dense symmetric eigenvalue decomposition that achieves higher performance and scalability on multi-GPU systems.
Findings
Achieves a mean speedup of 5.74× over cuSOLVERMp
Achieves a mean speedup of 6.59× over MAGMA
Demonstrates better strong and weak scalability than both baselines on multi-GPU platforms
Abstract
Large symmetric eigenvalue problems arise in many disciplines, such as chemistry and physics, and several libraries, including cuSOLVERMp, MAGMA, and ELPA, support large eigenvalue decompositions on multi-GPU or hybrid multi-CPU-GPU architectures. However, these libraries do not deliver satisfactory performance: all of them utilize only around 1.5% of the peak multi-GPU performance. In this paper, we propose a pipelined two-stage eigenvalue decomposition algorithm, with substantial optimizations, in place of the conventional sequential algorithm. On an 8×A100 platform, our implementation surpasses the state-of-the-art cuSOLVERMp and MAGMA baselines, delivering mean speedups of 5.74× and 6.59×, respectively, with better strong and weak scalability.
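For orientation, the conventional (non-pipelined) two-stage flow that the abstract contrasts against can be sketched on the CPU. This is not the authors' implementation: it assumes the usual two-stage structure (dense to band, then band to tridiagonal and solve), uses NumPy and SciPy instead of GPUs, and runs the stages strictly one after another. The names `reduce_to_band`, `to_lower_band`, `two_stage_eigh`, and the bandwidth parameter `b` are hypothetical, and stage 2 is delegated to SciPy's `eig_banded`.

```python
import numpy as np
from scipy.linalg import eig_banded


def reduce_to_band(A, b):
    """Stage 1 (illustrative): reduce symmetric A to bandwidth b.

    Returns (B, Q) with Q orthogonal and Q.T @ A @ Q == B up to rounding.
    """
    n = A.shape[0]
    B = np.array(A, dtype=float)
    Q = np.eye(n)
    for k in range(0, n - b, b):
        # Panel below the band in block column k: rows k+b.., cols k..k+b-1.
        panel = B[k + b:, k:k + b]
        Qk, _ = np.linalg.qr(panel, mode="complete")
        # Two-sided orthogonal update of the trailing rows and columns.
        B[k + b:, :] = Qk.T @ B[k + b:, :]
        B[:, k + b:] = B[:, k + b:] @ Qk
        # Accumulate the transformation for the eigenvector back-transform.
        Q[:, k + b:] = Q[:, k + b:] @ Qk
    return B, Q


def to_lower_band(B, b):
    """Pack the diagonals 0..b of B into LAPACK lower band storage."""
    n = B.shape[0]
    ab = np.zeros((b + 1, n))
    for d in range(b + 1):
        ab[d, :n - d] = np.diag(B, -d)
    return ab


def two_stage_eigh(A, b):
    """Sequential two-stage symmetric eigendecomposition (sketch)."""
    B, Q = reduce_to_band(A, b)                      # stage 1: dense -> band
    w, Z = eig_banded(to_lower_band(B, b), lower=True)  # stage 2: band -> eigenpairs
    return w, Q @ Z                                  # back-transform eigenvectors


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = rng.standard_normal((12, 12))
    A = (M + M.T) / 2
    w, V = two_stage_eigh(A, b=3)
    print(np.allclose(A @ V, V * w))
```

In this sequential form, stage 2 cannot begin until stage 1 has finished the whole matrix; that dependency between stages is what a pipelined design would aim to overlap.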
Taxonomy
Topics: Matrix Theory and Algorithms · Parallel Computing and Optimization Techniques · Model Reduction and Neural Networks
