Exploring DRAM Cache Prefetching for Pooled Memory
Chandrahas Tirumalasetty, Narasimha Annapreddy

TL;DR
This paper introduces a DRAM cache prefetching system for pooled memory architectures like CXL, aiming to reduce latency and improve application performance through optimized prefetching and bandwidth management strategies.
Contribution
It proposes a novel DRAM cache prefetching mechanism for fabric attached memory and introduces optimizations to mitigate bandwidth contention effects.
Findings
7% IPC improvement from DRAM cache prefetching
Additional optimizations increase IPC by 7-10%
Effective in single and multi-node configurations
Abstract
Hardware-based memory pooling enabled by interconnect standards like CXL has been gaining popularity amongst cloud providers and system integrators. While pooling memory resources has cost benefits, it comes at a penalty of increased memory access latency. With yet another addition to the memory hierarchy, local DRAM can potentially be used as a block cache (DRAM cache) for fabric attached memory (FAM), and data prefetching techniques can be used to hide the FAM access latency. This paper proposes a system for prefetching sub-page blocks from FAM into the DRAM cache to improve data access latency and application performance. We further optimize our DRAM cache prefetch mechanism through enhancements that mitigate the performance degradation due to bandwidth contention at FAM. We consider the potential for providing additional functionality at the CXL-memory node through weighted fair…
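The idea of using local DRAM as a block cache for FAM, with prefetching to hide FAM latency, can be illustrated with a toy model. The sketch below is not the paper's mechanism: the class name `DramCache`, the helper `average_latency`, the 256-byte block size, the latency figures, the LRU replacement policy, and the next-block prefetcher are all assumptions chosen for illustration. It also assumes prefetched blocks always arrive before they are needed and ignores the bandwidth contention at FAM that the paper sets out to mitigate.

```python
from collections import OrderedDict

# All constants below are illustrative assumptions, not values from the paper.
BLOCK_SIZE = 256        # bytes per sub-page block
DRAM_LATENCY_NS = 100   # latency of a hit in the local DRAM cache
FAM_LATENCY_NS = 400    # latency of a demand fetch from fabric attached memory


class DramCache:
    """Toy LRU block cache in local DRAM with a next-block prefetcher."""

    def __init__(self, capacity_blocks, prefetch_degree=0):
        self.capacity = capacity_blocks
        self.degree = prefetch_degree    # blocks prefetched ahead per access
        self.blocks = OrderedDict()      # block number -> None, oldest first
        self.hits = 0
        self.misses = 0

    def _install(self, block):
        """Insert a block, evicting the least recently used one if full."""
        if block in self.blocks:
            self.blocks.move_to_end(block)
            return
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)
        self.blocks[block] = None

    def access(self, address):
        """Serve one demand access and return its modelled latency in ns."""
        block = address // BLOCK_SIZE
        if block in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block)
            latency = DRAM_LATENCY_NS
        else:
            self.misses += 1
            self._install(block)
            latency = FAM_LATENCY_NS
        # Idealized prefetch: the following blocks are installed immediately.
        for i in range(1, self.degree + 1):
            self._install(block + i)
        return latency


def average_latency(cache, addresses):
    """Average modelled latency over a sequence of byte addresses."""
    total = sum(cache.access(a) for a in addresses)
    return total / len(addresses)


if __name__ == "__main__":
    # Sequential scan over 64 blocks at a 64-byte stride.
    trace = list(range(0, 64 * BLOCK_SIZE, 64))
    for degree in (0, 1):
        cache = DramCache(capacity_blocks=16, prefetch_degree=degree)
        avg = average_latency(cache, trace)
        print(f"degree={degree} misses={cache.misses} avg_ns={avg:.1f}")
```

On a purely sequential trace, the prefetching configuration pays the FAM latency only for the first block, which is the best case for this kind of prefetcher. Irregular access patterns and limited FAM bandwidth would narrow the gap.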
Taxonomy
Topics: Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques · Interconnection Networks and Systems
