Enabling full-speed random access to the entire memory on the A100 GPU

Alden Walker

arXiv:2405.11425 · cs.PF · May 21, 2024

Open Access

TL;DR

This paper describes features of the A100 GPU's memory architecture and shows how to avoid TLB issues to achieve full-speed random access to the entire memory, provided each thread is restricted to an access window of less than 64GB.

Contribution

It presents a technique for reverse-engineering some of the A100's hardware layout information, and uses that information to enable full-speed random HBM access across the entire memory.

Findings

01 Achieved full-speed random HBM access to the entire A100 memory

02 Developed a technique to reverse-engineer some hardware layout information of the GPU

03 Showed that full speed is retained as long as each thread is constrained to an access window of less than 64GB

Abstract

We describe some features of the A100 memory architecture. In particular, we give a technique to reverse-engineer some hardware layout information. Using this information, we show how to avoid TLB issues to obtain full-speed random HBM access to the entire memory, as long as we constrain any particular thread to a reduced access window of less than 64GB.
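To make the access constraint concrete, here is a minimal host-side sketch of what "each thread stays inside a reduced window, while the windows jointly cover the whole memory" could look like at the index level. It is an illustration only, not the paper's implementation: the function names, the evenly spaced window placement, the 80 GiB total, the 48 GiB window, and the 128-byte alignment are all assumptions chosen for the example, and the sketch says nothing about how thread groups should be mapped to hardware units.

```python
import random

GIB = 1 << 30


def window_base(group, num_groups, total_bytes, window_bytes):
    """Start offset of the window for one thread group (hypothetical policy).

    Bases are spaced evenly from 0 to total_bytes - window_bytes, so the
    first window starts at the bottom of memory and the last ends at the top.
    """
    if window_bytes > total_bytes:
        raise ValueError("window larger than memory")
    if num_groups == 1:
        return 0
    span = total_bytes - window_bytes
    return (span * group) // (num_groups - 1)


def random_offset(rng, base, window_bytes, align=128):
    """Random aligned byte offset inside [base, base + window_bytes)."""
    slots = window_bytes // align
    return base + rng.randrange(slots) * align


if __name__ == "__main__":
    rng = random.Random(0)
    total = 80 * GIB    # assumed device memory size, for illustration
    window = 48 * GIB   # assumed per-group window, below the 64GB bound
    for g in range(2):
        base = window_base(g, 2, total, window)
        off = random_offset(rng, base, window)
        print(f"group {g}: window starts at {base // GIB} GiB, sample offset {off}")
```

With two groups, the assumed windows are [0, 48) GiB and [32, 80) GiB: every address falls in at least one window, while no single group ever touches more than 48 GiB.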

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Algorithms and Data Compression · Advanced Data Storage Technologies