Pipelined Dense Symmetric Eigenvalue Decomposition on Multi-GPU Architectures

Hansheng Wang; Ruiyi Zhan; Dajun Huang; Xingchen Liu; Qiao Li; Hancong Duan; Dingwen Tao; Guangming Tan; Shaoshuai Zhang

arXiv:2511.16174 · cs.MS · November 21, 2025

TL;DR

This paper introduces a pipelined two-stage eigenvalue decomposition algorithm for dense symmetric matrices. The algorithm is optimized for multi-GPU architectures and outperforms the cuSOLVERMp and MAGMA libraries in both speed and scalability.

Contribution

The paper presents a novel pipelined algorithm for dense symmetric eigenvalue decomposition. On multi-GPU systems it achieves higher performance and better scalability than existing libraries such as cuSOLVERMp and MAGMA.

Findings

01

Achieves a mean speedup of 5.74× over cuSOLVERMp on an 8× A100 platform

02

Achieves a mean speedup of 6.59× over MAGMA on an 8× A100 platform

03

Demonstrates better strong and weak scalability than these baselines on multi-GPU platforms

Abstract

Large symmetric eigenvalue problems arise in many disciplines, such as chemistry and physics. Several libraries, including cuSOLVERMp, MAGMA and ELPA, support computing large eigenvalue decompositions on multi-GPU or hybrid multi-CPU-GPU architectures. However, these libraries do not deliver satisfactory performance: each of them utilizes only around 1.5% of the peak multi-GPU performance. In this paper, we propose a pipelined two-stage eigenvalue decomposition algorithm to replace the conventional sequential algorithm, and we apply substantial optimizations to it. On an 8× A100 platform, our implementation surpasses the state-of-the-art cuSOLVERMp and MAGMA baselines. It delivers mean speedups of 5.74× and 6.59×, respectively, and shows better strong and weak scalability.
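
The abstract contrasts a pipelined two-stage algorithm with a conventional sequential one. The toy model below illustrates why overlapping stages can shorten the total runtime. It is a minimal sketch under stated assumptions and does not reproduce the paper's method. We assume, hypothetically, that the two stages are a dense-to-band reduction and a band-to-tridiagonal bulge-chasing step. We also assume that both stages process the matrix in panels, and that stage 2 may start a panel once stage 1 has finished it. The abstract does not describe the paper's actual dependency structure, and real dependencies are finer-grained than this. The function names and per-panel costs are illustrative only.

```python
from typing import List


def sequential_makespan(stage1: List[float], stage2: List[float]) -> float:
    """Conventional schedule: stage 2 starts only after stage 1 has processed every panel."""
    return sum(stage1) + sum(stage2)


def pipelined_makespan(stage1: List[float], stage2: List[float]) -> float:
    """Toy two-stage flow-shop model (an assumption, not the paper's scheduler):
    panel i enters stage 2 once stage 1 has finished panel i AND
    stage 2 has finished panel i - 1."""
    if len(stage1) != len(stage2):
        raise ValueError("both stages must process the same number of panels")
    done1 = 0.0  # completion time of the most recent panel in stage 1
    done2 = 0.0  # completion time of the most recent panel in stage 2
    for t1, t2 in zip(stage1, stage2):
        done1 += t1                   # stage 1 runs its panels back to back
        done2 = max(done1, done2) + t2  # stage 2 waits for its input and for itself
    return done2


if __name__ == "__main__":
    # Hypothetical per-panel costs in arbitrary time units; these are not measurements.
    band_reduction = [2.0, 2.0, 2.0, 2.0]
    bulge_chasing = [1.0, 1.0, 1.0, 1.0]
    seq = sequential_makespan(band_reduction, bulge_chasing)
    pipe = pipelined_makespan(band_reduction, bulge_chasing)
    print(f"sequential: {seq}, pipelined: {pipe}, speedup: {seq / pipe:.2f}x")
```

In this example the sequential schedule takes 12 time units. The pipelined schedule takes 9, because stage 2 work hides behind stage 1 work, leaving only the last panel's stage 2 cost exposed. The gains reported in the paper come from its full set of optimizations on real hardware and cannot be inferred from this model.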

Taxonomy

Topics: Matrix Theory and Algorithms · Parallel Computing and Optimization Techniques · Model Reduction and Neural Networks