Reuse Cache for Heterogeneous CPU-GPU Systems
Tejas Shah, Bobbi Yogatama, Kyle Roarty, Rami Dahman

TL;DR
This paper introduces a reuse cache for heterogeneous CPU-GPU systems that improves cache efficiency by storing only frequently reused data. It achieves IPC gains within 0.5% of a statically partitioned LLC while reducing LLC area cost by 40% on average.
Contribution
The paper proposes a novel reuse cache design tailored to heterogeneous CPU-GPU systems. The design targets the differing cache access patterns of CPU and GPU cores, with the goal of improving LLC efficiency.
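The sketch below is a minimal illustration of the general reuse-cache idea summarized above. A tag array tracks more lines than a smaller data array holds, and a line receives a data entry only after it is referenced again while its tag is still resident. The class name `ReuseCache`, the fully associative LRU arrays, the "allocate on first observed reuse" threshold, and the sizes are all assumptions made for illustration. The paper's actual organization is not reproduced here, including its associativity, replacement policy, and how CPU and GPU lines share the structure.

```python
from collections import OrderedDict


class ReuseCache:
    """Toy reuse cache (hypothetical sketch, not the paper's design).

    The tag array remembers recently seen lines. The smaller data array
    holds only lines that have shown reuse. Both arrays are modeled as
    fully associative with LRU replacement for simplicity (an assumption).
    """

    def __init__(self, tag_entries: int, data_entries: int) -> None:
        self.tag_entries = tag_entries
        self.data_entries = data_entries
        self.tags: "OrderedDict[int, bool]" = OrderedDict()  # addr -> has data entry
        self.data: "OrderedDict[int, None]" = OrderedDict()  # addrs holding data
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        """Return True on a data hit and False otherwise."""
        if addr in self.data:
            self.data.move_to_end(addr)
            self.tags.move_to_end(addr)
            self.hits += 1
            return True
        self.misses += 1
        if addr in self.tags:
            # Tag-only hit: the line has now shown reuse, so give it data.
            self.tags.move_to_end(addr)
            if len(self.data) >= self.data_entries:
                victim, _ = self.data.popitem(last=False)  # LRU data entry
                self.tags[victim] = False  # keep its tag, drop its data
            self.data[addr] = None
            self.tags[addr] = True
        else:
            # First touch: allocate a tag only; the data bypasses the cache.
            if len(self.tags) >= self.tag_entries:
                victim, _ = self.tags.popitem(last=False)  # LRU tag
                self.data.pop(victim, None)  # its data (if any) goes too
            self.tags[addr] = False
        return False


if __name__ == "__main__":
    cache = ReuseCache(tag_entries=8, data_entries=4)
    for addr in range(100, 200):  # streaming pattern: each line touched once
        cache.access(addr)
    print("data entries after stream:", len(cache.data))
    for _ in range(3):  # small working set that is reused
        for addr in (1, 2, 3):
            cache.access(addr)
    print("hits:", cache.hits)
```

In this toy model, the streaming pass never allocates a single data entry, so it cannot evict the reused lines that follow. This matches the intuition behind filtering low-reuse lines out of the data array. Sizing the data array smaller than the tag array is presumably where the area savings of such a design come from.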
Findings
Reuse cache achieves IPC gains within 0.5% of those of a statically partitioned LLC.
Reduces LLC area cost by an average of 40%.
Effectively handles different data access patterns in CPU-GPU systems.
Abstract
It is generally observed that the fraction of live lines in shared last-level caches (SLLCs) is very small for chip multiprocessors (CMPs). This problem can be addressed with promotion-based replacement policies such as re-reference interval prediction (RRIP) in place of LRU, with dead-block predictors, or with reuse-based cache allocation schemes. In GPU systems, similar LLC issues are alleviated using various cache bypassing techniques. These issues are worsened in heterogeneous CPU-GPU systems because the two processors have different data access patterns and frequencies. GPUs generally work on streaming data but have many more threads accessing memory than CPUs. As such, most traditional cache replacement and allocation policies prove ineffective: because GPU applications issue far more cache accesses, GPU lines receive a larger share of the cache despite their minimal reuse. In…
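The abstract cites RRIP as a promotion-based alternative to LRU. The following is a minimal sketch of one common variant, static RRIP with hit promotion, modeled on a single cache set. Each line carries a small re-reference prediction value (RRPV). New lines are inserted with a "long" prediction, a hit promotes a line to 0, and a victim is any line at the maximum ("distant") value, with all lines aged until one qualifies. The class name `SRRIPSet`, the 2-bit RRPV, and the 4-way set are assumptions for illustration. This is the baseline policy the abstract mentions, not the paper's reuse cache.

```python
from typing import List, Optional


class SRRIPSet:
    """One cache set under static RRIP with hit promotion (illustrative sketch)."""

    def __init__(self, ways: int, rrpv_bits: int = 2) -> None:
        self.ways = ways
        self.max_rrpv = (1 << rrpv_bits) - 1  # "distant" re-reference
        self.lines: List[Optional[int]] = [None] * ways
        self.rrpv: List[int] = [self.max_rrpv] * ways

    def access(self, tag: int) -> bool:
        """Return True on a hit and False on a miss (which fills a way)."""
        if tag in self.lines:
            # Hit promotion: predict a near-immediate re-reference.
            self.rrpv[self.lines.index(tag)] = 0
            return True
        way = self._find_victim()
        self.lines[way] = tag
        # Insert with a "long" (not distant) prediction, so lines that are
        # never reused age out before lines that have been promoted.
        self.rrpv[way] = self.max_rrpv - 1
        return False

    def _find_victim(self) -> int:
        while True:
            for way in range(self.ways):
                if self.lines[way] is None or self.rrpv[way] == self.max_rrpv:
                    return way
            # No line is predicted distant: age every line and retry.
            self.rrpv = [v + 1 for v in self.rrpv]


if __name__ == "__main__":
    s = SRRIPSet(ways=4)
    trace = [1, 2, 1, 2, 10, 11, 12, 13, 14, 15, 1, 2]
    print([s.access(t) for t in trace])
```

In this trace, lines 1 and 2 survive a six-line scan through a 4-way set, whereas LRU would have evicted them. However, every scanned line is still inserted into the cache. When GPU threads issue many more accesses than CPU cores, those insertions can still crowd the set, which is the problem the abstract describes for heterogeneous systems.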
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Interconnection Networks and Systems · Embedded Systems Design Techniques
