Extracting the Potential of Emerging Hardware Accelerators for Symmetric Eigenvalue Decomposition

Hansheng Wang; Lu Shi; Zhekai Duan; Panruo Wu; Liwei Guo; Shaoshuai Zhang

arXiv:2410.02170 · cs.DC · October 4, 2024

TL;DR

This paper analyzes the bottlenecks of symmetric eigenvalue decomposition on emerging hardware accelerators and proposes algorithmic optimizations that significantly improve performance, especially on GPUs such as the H100, A100, and RTX 4090.

Contribution

It identifies the memory-bound bottlenecks in conventional symmetric eigenvalue decomposition (EVD) algorithms and introduces optimized methods that exploit hardware features to improve utilization and deliver speedups.
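
The page does not name the kernels behind these bottlenecks. One standard piece of background, not a claim taken from this paper, is that classical one-stage Householder tridiagonalization (as in LAPACK's `dsytrd`) performs about half of its flops in symmetric matrix-vector products. The sketch below compares the arithmetic intensity (flops per byte of memory traffic) of a matrix-vector product with that of a matrix-matrix product. It assumes double precision, assumes every operand crosses the memory bus exactly once, and ignores caches and the savings from exploiting symmetry; the function names are illustrative only.

```python
# Minimal sketch of arithmetic intensity (flops per byte), assuming double
# precision and that each operand is read or written exactly once (no caching).
BYTES_PER_DOUBLE = 8


def gemv_intensity(n: int) -> float:
    """Flops per byte for y = A @ x with an n-by-n matrix A (BLAS2-style)."""
    flops = 2 * n * n                                   # one multiply + one add per entry of A
    bytes_moved = BYTES_PER_DOUBLE * (n * n + 2 * n)    # read A and x, write y
    return flops / bytes_moved


def gemm_intensity(n: int) -> float:
    """Flops per byte for C = A @ B with n-by-n matrices (BLAS3-style)."""
    flops = 2 * n ** 3                                  # n multiply-adds per entry of C
    bytes_moved = BYTES_PER_DOUBLE * 3 * n * n          # read A and B, write C
    return flops / bytes_moved


for n in (1024, 8192):
    print(n, round(gemv_intensity(n), 3), round(gemm_intensity(n), 1))
```

The matrix-vector product stays near 0.25 flop/byte regardless of size, while the matrix-matrix product grows as roughly n/12 flop/byte (about 85 at n = 1024 and about 683 at n = 8192). This is why kernels dominated by matrix-vector work tend to be limited by memory bandwidth rather than by arithmetic throughput.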

Findings

1. Up to 10.1x speedup on the H100 GPU
2. Up to 7.5x speedup on the A100 GPU
3. Up to 4.1x overall EVD performance improvement
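
The findings do not say which part of the solver the per-GPU figures measure, so how they relate to the 4.1x overall figure is not established on this page. If the per-GPU figures measure a single stage of the solver (an assumption here), Amdahl's law explains why an end-to-end gain can be much smaller: only the accelerated fraction of the runtime shrinks. The function below is a generic sketch of Amdahl's law; the fraction and stage speedup in the example are hypothetical, not values from the paper.

```python
def amdahl_speedup(fraction: float, stage_speedup: float) -> float:
    """End-to-end speedup when only `fraction` of the original runtime
    is accelerated by a factor of `stage_speedup` (Amdahl's law)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    if stage_speedup <= 0.0:
        raise ValueError("stage_speedup must be positive")
    # New runtime as a fraction of the old: untouched part + accelerated part.
    remaining = (1.0 - fraction) + fraction / stage_speedup
    return 1.0 / remaining


# Hypothetical example: 80% of the runtime is sped up 10x.
print(f"{amdahl_speedup(0.8, 10.0):.2f}")  # 3.57
print(f"{1.0 / (1.0 - 0.8):.2f}")          # 5.00, the limit as stage_speedup grows
```

Even an unbounded speedup of the accelerated stage caps the overall gain at 1 / (1 - fraction), so the untouched stages ultimately dominate.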

Abstract

Benefiting from advances in hardware accelerators such as GPUs, deep neural networks and scientific computing applications can achieve superior performance. Recently, the computing capacity of emerging hardware accelerators has increased rapidly, while memory bandwidth has not kept pace with this growth. This disparity widens the gap between computing and memory and makes conventional algorithms inefficient, because they are likely to shift from compute-bound to memory-bound. Symmetric eigenvalue decomposition (EVD), a critical operation in research domains including scientific computing, deep learning training, and inference algorithms, performs poorly, achieving less than 3% hardware computing utilization on the H100 GPU. In this paper, we analyze the features of emerging hardware accelerators to identify the bottlenecks inherent in…
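
The abstract's argument, that computing capacity outgrowing memory bandwidth turns algorithms memory-bound, can be made concrete with the roofline model: attainable throughput is the minimum of peak compute and arithmetic intensity times bandwidth. The sketch below uses two hypothetical accelerator generations; the class name and all figures are invented for illustration and are not H100 or A100 specifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    """Hypothetical device described only by peak compute and memory bandwidth."""
    name: str
    peak_gflops: float    # peak arithmetic throughput, GFLOP/s
    bandwidth_gbs: float  # peak memory bandwidth, GB/s

    def ridge_point(self) -> float:
        """Arithmetic intensity (flop/byte) above which a kernel is compute-bound."""
        return self.peak_gflops / self.bandwidth_gbs

    def attainable_gflops(self, intensity: float) -> float:
        """Roofline bound: min(peak compute, intensity * bandwidth)."""
        return min(self.peak_gflops, intensity * self.bandwidth_gbs)

    def utilization(self, intensity: float) -> float:
        """Fraction of peak compute reachable at the given arithmetic intensity."""
        return self.attainable_gflops(intensity) / self.peak_gflops


# Hypothetical generations: compute grows 8x, bandwidth only 2x.
old = Accelerator("gen-1 (hypothetical)", peak_gflops=8_000.0, bandwidth_gbs=1_000.0)
new = Accelerator("gen-2 (hypothetical)", peak_gflops=64_000.0, bandwidth_gbs=2_000.0)

for intensity in (0.25, 16.0):  # matrix-vector-like vs. a moderately blocked kernel
    for dev in (old, new):
        print(f"{dev.name}: AI={intensity} -> {dev.utilization(intensity):.1%} of peak")
```

The ridge point moves from 8 to 32 flop/byte, so a kernel at 16 flop/byte drops from full utilization to half of peak, and a matrix-vector-like kernel at 0.25 flop/byte falls from about 3.1% to about 0.8% of peak without any change to the algorithm.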

Taxonomy

Topics: Embedded Systems Design Techniques · Control Systems and Identification