Enabling full-speed random access to the entire memory on the A100 GPU
Alden Walker

TL;DR
This paper describes features of the A100 GPU's memory architecture and shows how to avoid TLB issues to obtain full-speed random HBM access to the entire memory, provided each thread is restricted to an access window of less than 64GB.
Contribution
It presents a technique for reverse-engineering some of the A100's hardware memory layout, and uses that information to enable full-speed random access to the entire GPU memory.
Findings
Achieved full-speed random access to the entire A100 memory
Developed a reverse-engineering approach for GPU memory layout
Showed that full speed is retained as long as each thread is constrained to an access window of less than 64GB
Abstract
We describe some features of the A100 memory architecture. In particular, we give a technique to reverse-engineer some hardware layout information. Using this information, we show how to avoid TLB issues to obtain full-speed random HBM access to the entire memory, as long as we constrain any particular thread to a reduced access window of less than 64GB.
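The windowing constraint in the abstract can be illustrated with a small index-arithmetic sketch. This is not the paper's method: the abstract does not say how windows are chosen or how the reverse-engineered layout is used, so the tiling scheme, the function names `window_for_thread` and `random_offsets`, the 80 GiB memory size, the 32 GiB window, and the 128-byte alignment are all assumptions made for illustration. The sketch only shows how each thread can be limited to a window well under 64GB while the windows together still cover all of memory.

```python
import random

GIB = 1 << 30  # bytes per GiB


def window_for_thread(thread_id, total_bytes, window_bytes):
    """Return the (start, end) byte range a thread is allowed to touch.

    Hypothetical scheme: tile memory with fixed-size windows and assign
    them round-robin by thread id. Not taken from the paper.
    """
    if not 0 < window_bytes <= total_bytes:
        raise ValueError("window must be positive and fit inside memory")
    num_windows = -(-total_bytes // window_bytes)  # ceiling division
    index = thread_id % num_windows
    # Clamp the last window so it ends exactly at the end of memory.
    start = min(index * window_bytes, total_bytes - window_bytes)
    return start, start + window_bytes


def random_offsets(thread_id, total_bytes, window_bytes, count, align=128, seed=0):
    """Draw `count` random byte offsets inside the thread's window.

    Offsets are multiples of `align`, assuming total_bytes and
    window_bytes are themselves multiples of `align`.
    """
    start, end = window_for_thread(thread_id, total_bytes, window_bytes)
    rng = random.Random(seed * 1_000_003 + thread_id)  # per-thread stream
    slots = (end - start) // align
    return [start + align * rng.randrange(slots) for _ in range(count)]


if __name__ == "__main__":
    TOTAL = 80 * GIB   # assumed memory size, for illustration only
    WINDOW = 32 * GIB  # assumed window, chosen to be well under 64GB
    for tid in range(4):
        lo, hi = window_for_thread(tid, TOTAL, WINDOW)
        print(tid, lo // GIB, hi // GIB, random_offsets(tid, TOTAL, WINDOW, 2))
```

With these assumed sizes there are three windows, so any launch with at least three threads touches every region of memory while no single thread spans more than 32 GiB. In a real kernel the same arithmetic would run on the device, and the window size and placement would depend on the layout details the paper recovers.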
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Algorithms and Data Compression · Advanced Data Storage Technologies
