A New High Performance and Scalable SVD algorithm on Distributed Memory Systems
Shengguo Li, Jie Liu, Yunfei Du

TL;DR
This paper presents a scalable, high-performance implementation of the Zolo-SVD algorithm on distributed memory systems. In experiments on the Tianhe-2 supercomputer, Zolo-SVD can be about twice as fast as the ScaLAPACK routine PDGESVD, and the underlying Zolo-PD scales better than QDWH-PD.
Contribution
The paper introduces a portable, scalable Zolo-SVD implementation built on Zolo-PD and ScaLAPACK routines. Compared with other PD algorithms such as QDWH-PD, Zolo-PD performs more floating-point operations but is naturally parallelizable and scales better.
Findings
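Zolo-SVD can be about twice as fast as the ScaLAPACK routine PDGESVD.
With many processes, Zolo-PD is usually about 1.20 times faster than QDWH-PD.
The implementation relies on ScaLAPACK routines, which makes it portable; the experiments were run on the Tianhe-2 supercomputer.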
Abstract
This paper introduces a high-performance implementation of the \texttt{Zolo-SVD} algorithm on distributed memory systems, which is based on the polar decomposition (PD) algorithm via Zolotarev's function (\texttt{Zolo-PD}), originally proposed by Nakatsukasa and Freund [SIAM Review, 2016]. Our implementation relies heavily on ScaLAPACK routines and is therefore portable. Compared with other PD algorithms such as the QR-based dynamically weighted Halley method (\texttt{QDWH-PD}), \texttt{Zolo-PD} is naturally parallelizable and has better scalability, though it performs more floating-point operations. When using many processes, \texttt{Zolo-PD} is usually 1.20 times faster than the \texttt{QDWH-PD} algorithm, and \texttt{Zolo-SVD} can be about two times faster than the ScaLAPACK routine \texttt{PDGESVD}. These numerical experiments are performed on the Tianhe-2 supercomputer,…
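The abstract describes an SVD built on top of a polar decomposition. A minimal single-node NumPy sketch of that general pattern follows: compute A = U_p H, diagonalize the symmetric factor H = V Σ Vᵀ, and set U = U_p V. This is an illustration under stated assumptions, not the authors' implementation. Newton's iteration is swapped in for Zolo-PD purely for brevity, the input is assumed to be a real, square, nonsingular matrix, and the function names `polar_newton` and `svd_via_polar` are hypothetical.

```python
import numpy as np


def polar_newton(a, tol=1e-12, max_iter=100):
    """Polar decomposition a = u_p @ h via Newton's iteration.

    Stand-in for Zolo-PD (assumption: a is real, square, nonsingular).
    """
    x = np.array(a, dtype=float)
    for _ in range(max_iter):
        # Newton step toward the orthogonal polar factor.
        x_next = 0.5 * (x + np.linalg.inv(x).T)
        done = np.linalg.norm(x_next - x, "fro") <= tol * np.linalg.norm(x_next, "fro")
        x = x_next
        if done:
            break
    h = x.T @ a
    h = 0.5 * (h + h.T)  # symmetrize to remove rounding noise
    return x, h


def svd_via_polar(a):
    """Return (u, sigma, v) with a ≈ u @ diag(sigma) @ v.T."""
    u_p, h = polar_newton(a)
    # h is symmetric positive definite, so its eigenvalues are the singular values.
    w, v = np.linalg.eigh(h)
    order = np.argsort(w)[::-1]  # sort into descending order
    sigma, v = w[order], v[:, order]
    u = u_p @ v
    return u, sigma, v
```

Calling `svd_via_polar(a)` on a small well-conditioned matrix should reproduce the singular values returned by `np.linalg.svd(a, compute_uv=False)` up to rounding. In a distributed setting the polar factor and the symmetric eigendecomposition would be computed by parallel routines instead; this sketch only shows how the two steps fit together.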
Taxonomy
Topics: Matrix Theory and Algorithms · Parallel Computing and Optimization Techniques · Stochastic Gradient Optimization Techniques
