Reuse Cache for Heterogeneous CPU-GPU Systems

Tejas Shah; Bobbi Yogatama; Kyle Roarty; Rami Dahman

arXiv:2107.13649 · cs.AR · July 30, 2021

TL;DR

This paper introduces a reuse cache for heterogeneous CPU-GPU systems that improves last-level cache (LLC) efficiency by storing only frequently reused data. It achieves IPC gains within 0.5% of a statically partitioned LLC while reducing LLC area cost by 40% on average.
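The TL;DR describes the reuse cache as keeping data only for lines that are reused. A minimal sketch of that idea follows, modelling a single cache set in Python. It assumes a decoupled design in which a tag array with more entries than the data array tracks every line, and a data entry is allocated only on a line's second access. LRU replacement in both arrays is a simplification chosen here, and the class and method names are hypothetical, not taken from the paper.

```python
from collections import OrderedDict


class ReuseCacheSet:
    """One set of a hypothetical reuse cache: a tag array larger than its data array.

    Simplifying assumptions (not the paper's exact design): LRU replacement in
    both arrays, and data is allocated only when a tag is accessed a second time.
    """

    def __init__(self, tag_ways: int, data_ways: int):
        assert data_ways <= tag_ways
        self.tag_ways = tag_ways
        self.data_ways = data_ways
        self.tags = OrderedDict()  # tracked tags, ordered LRU -> MRU
        self.data = OrderedDict()  # tags whose line is held in the data array

    def access(self, tag) -> bool:
        """Return True on a data hit, False on a miss (line fetched from memory)."""
        if tag in self.tags:
            self.tags.move_to_end(tag)
            if tag in self.data:
                self.data.move_to_end(tag)
                return True
            # Tag hit without data: the line has shown reuse, so allocate data now.
            if len(self.data) >= self.data_ways:
                self.data.popitem(last=False)  # evict LRU data entry; its tag stays
            self.data[tag] = None
            return False
        # First touch: track the tag only, spending no data capacity on it.
        if len(self.tags) >= self.tag_ways:
            victim, _ = self.tags.popitem(last=False)
            self.data.pop(victim, None)  # a line cannot keep data without its tag
        self.tags[tag] = None
        return False
```

With `tag_ways=8` and `data_ways=2`, the pattern `A, g0, g1, A, g2, g3, A` (one reused CPU line interleaved with streaming GPU lines) hits only on the last access to `A`, and no `g*` line ever occupies the data array. A plain 2-way LRU cache would miss on every access in this pattern, because two streaming lines evict `A` between its uses.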

Contribution

The paper proposes a novel reuse cache design tailored for heterogeneous CPU-GPU systems, addressing the distinct cache access patterns of CPU and GPU cores that share the LLC and improving cache efficiency.

Findings

1. The reuse cache achieves IPC gains within 0.5% of those of a statically partitioned LLC.

2. It reduces LLC area cost by 40% on average.

3. It effectively handles the differing data access patterns of CPU and GPU cores in a heterogeneous system.

Abstract

It is generally observed that the fraction of live lines in shared last-level caches (SLLC) is very small for chip multiprocessors (CMPs). This problem can be tackled using promotion-based replacement policies such as re-reference interval prediction (RRIP) instead of LRU, dead-block predictors, or reuse-based cache allocation schemes. In GPU systems, similar LLC issues are alleviated using various cache bypassing techniques. These issues are worsened in heterogeneous CPU-GPU systems because the two processors have different data access patterns and frequencies. GPUs generally work on streaming data, but have many more threads accessing memory than CPUs. As such, most traditional cache replacement and allocation policies prove ineffective due to the higher number of cache accesses in GPU applications, resulting in higher allocation for GPU cache lines despite their minimal reuse. In…
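The abstract names RRIP as one promotion-based alternative to LRU. The sketch below models one set under static RRIP (SRRIP) with hit-priority promotion, assuming 2-bit re-reference prediction values (RRPVs): a new line is inserted with a "long" prediction, a hit promotes a line to "near-immediate", and a miss evicts a line predicted "distant", ageing all lines until one qualifies. It illustrates RRIP in general, not a policy evaluated in the paper, and the class name is hypothetical.

```python
class SRRIPSet:
    """One cache set under SRRIP with hit-priority promotion and M-bit RRPVs."""

    def __init__(self, ways: int, m_bits: int = 2):
        self.max_rrpv = (1 << m_bits) - 1   # "distant" re-reference prediction
        self.lines = [None] * ways          # tag held in each way (None = empty)
        self.rrpv = [self.max_rrpv] * ways  # empty ways are immediate victims

    def access(self, tag) -> bool:
        """Return True on a hit, False on a miss (the line is then inserted)."""
        if tag in self.lines:
            # Hit priority: predict near-immediate re-reference.
            self.rrpv[self.lines.index(tag)] = 0
            return True
        # Miss: age every line until at least one is predicted "distant".
        while self.max_rrpv not in self.rrpv:
            self.rrpv = [v + 1 for v in self.rrpv]
        victim = self.rrpv.index(self.max_rrpv)  # leftmost distant way
        self.lines[victim] = tag
        # Insert with a "long" rather than "near" prediction, so single-use
        # lines are evicted before lines that have already been reused.
        self.rrpv[victim] = self.max_rrpv - 1
        return False
```

In a 4-way set, after `A` and `B` have each been hit once, a scan of six single-use lines `s0` to `s5` does not evict them, so the next accesses to `A` and `B` both hit. A 4-way LRU set would lose both lines to the same scan, which is the kind of streaming pressure the abstract attributes to GPU accesses.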

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Interconnection Networks and Systems · Embedded Systems Design Techniques