CMD: A Cache-assisted GPU Memory Deduplication Architecture
Wei Zhao, Dan Feng, Wei Tong, Xueliang Wei, Bing Wu

TL;DR
This paper introduces CMD, a cache-assisted GPU memory deduplication architecture that exploits data duplication in GPU applications to significantly reduce off-chip memory accesses and energy consumption and to improve performance.
Contribution
The paper presents a novel GPU memory deduplication architecture with techniques to detect and manage data duplication, reducing off-chip accesses and enhancing GPU performance.
Findings
Reduces off-chip accesses by 31.01%
Decreases energy consumption by 32.78%
Improves GPU performance by 37.79%
Abstract
Massive off-chip accesses in GPUs are the main performance bottleneck. We divide these accesses into three types: (1) Write, (2) Data-Read, and (3) Read-Only. We also find that many writes are duplicates, and that the duplication takes two forms: inter-dup, where different memory blocks are identical, and intra-dup, where all the 4B elements in a line are the same. In this work, we propose CMD, a cache-assisted GPU memory deduplication architecture that reduces off-chip accesses by exploiting the data duplication in GPU applications. CMD includes three key design contributions, which target the three kinds of accesses: (1) A novel GPU memory deduplication architecture that removes the inter-dup and intra-dup lines. For inter-dup detection, we reduce the extra read requests caused by the traditional read-verify hash process. Besides, we design…
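The two kinds of duplication defined in the abstract can be illustrated with a small software sketch. This is not CMD's hardware mechanism, which the truncated abstract does not fully describe. It only shows what the definitions mean. The 128-byte line size, the function names `is_intra_dup` and `find_inter_dups`, and the dictionary-based grouping are all assumptions made for illustration.

```python
from collections import defaultdict

LINE_BYTES = 128  # assumed line size; the abstract does not state one
ELEM_BYTES = 4    # the abstract defines intra-dup over 4B elements


def is_intra_dup(line: bytes) -> bool:
    """Return True when every 4B element in the line equals the first one."""
    first = line[:ELEM_BYTES]
    return all(
        line[i:i + ELEM_BYTES] == first
        for i in range(0, len(line), ELEM_BYTES)
    )


def find_inter_dups(lines):
    """Group indices of lines whose contents are byte-for-byte identical.

    Using the raw bytes as the dictionary key gives an exact comparison,
    so no separate verification step is needed in this software sketch.
    """
    groups = defaultdict(list)
    for idx, line in enumerate(lines):
        groups[line].append(idx)
    return [idxs for idxs in groups.values() if len(idxs) > 1]


# Hypothetical example data: four 128-byte lines.
example_lines = [
    bytes([7, 0, 0, 0]) * 32,                     # every 4B element is 7
    bytes(range(LINE_BYTES)),                     # distinct contents
    bytes(range(LINE_BYTES)),                     # identical to line 1
    b"\x01\x02\x03\x04" * 31 + b"\x00\x00\x00\x00",  # last element differs
]

intra_flags = [is_intra_dup(line) for line in example_lines]
inter_groups = find_inter_dups(example_lines)
```

In this example only the first line is intra-dup, and lines 1 and 2 form a single inter-dup group. An intra-dup line can be represented by one 4B value, and an inter-dup group by one stored copy, which is the kind of redundancy the abstract says CMD removes.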
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management · Distributed and Parallel Computing Systems
