Benchmarking and Dissecting the Nvidia Hopper GPU Architecture

Weile Luo; Ruibo Fan; Zeyu Li; Dayou Du; Qiang Wang; Xiaowen Chu

arXiv:2402.13499 · cs.AR · February 22, 2024 · 1 citation

Open Access

TL;DR

This paper benchmarks and analyzes Nvidia's Hopper GPU architecture, examining its microarchitectural features, new instruction sets, and AI hardware units to aid software optimization.

Contribution

First detailed microbenchmarking study of the Nvidia Hopper GPU, focusing on its novel features, instruction sets, and tensor core performance.

Findings

01 Hopper GPUs introduce new tensor cores supporting FP8, along with DPX instructions and distributed shared memory.

02 Benchmarking characterizes the performance of Hopper's distributed shared memory and new instruction sets.

03 Insights into Hopper's AI hardware units, such as its tensor cores, can guide software optimization.

Abstract

Graphics processing units (GPUs) are continually evolving to meet the computational demands of contemporary general-purpose workloads, particularly those driven by artificial intelligence (AI) using deep learning techniques. A substantial body of work has been dedicated to dissecting the microarchitectural metrics that characterize diverse GPU generations, which helps researchers understand the hardware details and leverage them to optimize GPU programs. However, the latest Hopper GPUs present a set of novel features, including new tensor cores supporting FP8, DPX instructions, and distributed shared memory. Their performance and operational characteristics remain poorly understood. In this research, we present an extensive benchmarking study focused on the Hopper GPU. The objective is to unveil its microarchitectural intricacies through an examination of the new…

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems