Extracting the Potential of Emerging Hardware Accelerators for Symmetric Eigenvalue Decomposition
Hansheng Wang, Lu Shi, Zhekai Duan, Panruo Wu, Liwei Guo, Shaoshuai Zhang

TL;DR
This paper analyzes the bottlenecks of symmetric eigenvalue decomposition on emerging hardware accelerators and proposes algorithmic optimizations that significantly improve performance on GPUs such as the H100, A100, and RTX 4090.
Contribution
The paper identifies memory-bound bottlenecks in conventional EVD algorithms and introduces optimized methods that exploit hardware features to improve utilization and speed.
Findings
Up to 10.1x speedup on H100 GPU
Up to 7.5x speedup on A100 GPU
Up to 4.1x overall EVD performance improvement
Abstract
Advances in hardware accelerators such as GPUs have enabled superior performance for deep neural networks and scientific computing applications. Recently, the computing capacity of emerging hardware accelerators has increased rapidly, while memory bandwidth has not kept pace. This disparity widens the gap between compute and memory, making conventional algorithms inefficient because they are likely to shift from compute-bound to memory-bound. Symmetric eigenvalue decomposition (EVD), a critical operation in research domains including scientific computing, deep learning training, and inference algorithms, performs suboptimally, achieving less than 3% hardware computing utilization on the H100 GPU. In this paper, we analyze the features of emerging hardware accelerators to identify the bottlenecks inherent in…
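The compute-versus-memory gap described in the abstract can be illustrated with a simple roofline estimate. The sketch below is not taken from the paper. The device figures (PEAK_FLOPS, MEM_BANDWIDTH) are hypothetical placeholders rather than the specifications of any real GPU, and the operation and byte counts are simplified textbook estimates for a dense matrix-vector product and a dense matrix-matrix product in double precision. It only shows why a kernel with low arithmetic intensity is capped by memory bandwidth rather than by peak compute.

```python
def attainable_flops(peak_flops, mem_bandwidth, intensity):
    """Roofline bound: min(peak compute, bandwidth * arithmetic intensity)."""
    return min(peak_flops, mem_bandwidth * intensity)


def matvec_intensity(n, bytes_per_elem=8):
    """Flops per byte for y = A @ x with an n x n matrix (simplified estimate)."""
    flops = 2 * n * n                             # one multiply-add per matrix entry
    bytes_moved = (n * n + 2 * n) * bytes_per_elem  # read A and x, write y
    return flops / bytes_moved


def matmul_intensity(n, bytes_per_elem=8):
    """Flops per byte for C = A @ B with n x n matrices, assuming ideal reuse."""
    flops = 2 * n ** 3
    bytes_moved = 3 * n * n * bytes_per_elem      # read A and B, write C once
    return flops / bytes_moved


# Hypothetical accelerator, chosen only for illustration (assumption).
PEAK_FLOPS = 50e12      # flop/s
MEM_BANDWIDTH = 2e12    # byte/s
N = 8192

for name, intensity in [("matvec", matvec_intensity(N)),
                        ("matmul", matmul_intensity(N))]:
    bound = attainable_flops(PEAK_FLOPS, MEM_BANDWIDTH, intensity)
    print(f"{name}: {intensity:.2f} flop/byte, "
          f"at most {100 * bound / PEAK_FLOPS:.1f}% of peak")
```

Under these assumed figures the matrix-vector product sits far below the ridge point (PEAK_FLOPS / MEM_BANDWIDTH = 25 flop/byte) and is memory-bound, while the matrix-matrix product can in principle reach peak. Raising PEAK_FLOPS without raising MEM_BANDWIDTH lowers the matrix-vector fraction further, which is the trend the abstract describes.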
Taxonomy
Topics: Embedded Systems Design Techniques · Control Systems and Identification
