CMD: A Cache-assisted GPU Memory Deduplication Architecture

Wei Zhao; Dan Feng; Wei Tong; Xueliang Wei; Bing Wu

arXiv:2408.09483·cs.AR·August 20, 2024

Open Access

TL;DR

This paper introduces CMD, a cache-assisted GPU memory deduplication architecture that exploits data duplication in GPU applications to reduce off-chip memory accesses and energy consumption and to improve performance.

Contribution

The paper presents a novel GPU memory deduplication architecture with techniques to detect and manage data duplication, reducing off-chip accesses and enhancing GPU performance.

Findings

01. Reduces off-chip accesses by 31.01%

02. Decreases energy consumption by 32.78%

03. Improves GPU performance by 37.79%

Abstract

Massive off-chip accesses are the main performance bottleneck in GPUs, and we divide these accesses into three types: (1) Write, (2) Data-Read, and (3) Read-Only. We also find that many writes are duplicates, and that the duplication takes two forms, inter-dup and intra-dup: inter-dup means that different memory blocks are identical, while intra-dup means that all the 4B elements in a line are the same. In this work, we propose a cache-assisted GPU memory deduplication architecture named CMD that reduces off-chip accesses by exploiting the data duplication in GPU applications. CMD includes three key design contributions, which aim to reduce the three kinds of accesses: (1) A novel GPU memory deduplication architecture that removes the inter-dup and intra-dup lines. For inter-dup detection, we reduce the extra read requests caused by the traditional read-verify hash process. Besides, we design…
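The two kinds of duplication defined in the abstract can be illustrated in software. The following is a minimal sketch, not the CMD hardware design: the 128-byte line size, the SHA-1 digest, and the names `is_intra_dup` and `InterDupTable` are all assumptions made for illustration. The inter-dup part shows the traditional hash-then-verify approach whose verification read the abstract says CMD aims to reduce; it does not model CMD's own detection scheme.

```python
import hashlib

LINE_BYTES = 128  # assumed line size; the abstract does not state one
ELEM_BYTES = 4    # the abstract defines intra-dup over 4B elements


def is_intra_dup(line: bytes) -> bool:
    """Return True if every 4-byte element in the line equals the first one."""
    first = line[:ELEM_BYTES]
    return all(line[i:i + ELEM_BYTES] == first
               for i in range(0, len(line), ELEM_BYTES))


class InterDupTable:
    """Toy hash-then-verify table (hypothetical, for illustration only)."""

    def __init__(self):
        self._by_hash = {}  # digest -> address of the first stored copy
        self._store = {}    # address -> line contents (stands in for off-chip memory)

    def write(self, addr: int, line: bytes):
        """Return the address of an identical stored line, or None if the line is new."""
        digest = hashlib.sha1(line).digest()
        prev = self._by_hash.get(digest)
        # Verification read: the extra access a read-verify scheme pays on a hash hit.
        if prev is not None and self._store[prev] == line:
            return prev
        self._by_hash[digest] = addr
        self._store[addr] = line
        return None


if __name__ == "__main__":
    table = InterDupTable()
    zeros = bytes(LINE_BYTES)
    print(is_intra_dup(zeros))         # an all-zero line is intra-dup
    print(table.write(0x1000, zeros))  # first copy is stored
    print(table.write(0x2000, zeros))  # second copy maps to the first address
```

In this sketch an intra-dup line could be represented by a single 4-byte value, and an inter-dup line by a reference to the earlier copy; how CMD actually encodes either case is not described in the truncated abstract.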

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management · Distributed and Parallel Computing Systems