A Quantitative Approach for Adopting Disaggregated Memory in HPC Systems
Jacob Wahlgren, Gabin Schieffer, Maya Gokhale, Ivy Peng

TL;DR
This paper presents a quantitative method and tools for evaluating and optimizing disaggregated memory systems in HPC, demonstrating significant performance improvements and resource utilization benefits through case studies.
Contribution
The paper introduces a top-down quantitative approach, profiling tools, and evaluation methods for disaggregated memory in HPC. It also addresses common misconceptions and demonstrates practical benefits.
Findings
Prefetching significantly affects memory traffic profiles.
Memory interference impacts applications variably based on access patterns.
Application- and system-level benefits include a 50% reduction in remote access and a 13% speedup in BFS.
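A simple two-tier latency model shows why reducing remote access matters. Suppose a fraction f of memory accesses goes to the pool with latency t_remote, and the rest hit node-local memory with latency t_local. The average access latency is then (1 - f)·t_local + f·t_remote. The sketch below is an illustration only. It assumes hypothetical latencies of 100 ns local and 300 ns remote, and it is not the paper's measurement method. It also ignores bandwidth contention and prefetching, both of which the findings above show to be significant.

```python
def avg_latency_ns(remote_fraction: float, t_local_ns: float, t_remote_ns: float) -> float:
    """Average access latency for a two-tier (node-local + pooled) memory model."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be in [0, 1]")
    # Weighted average of local and remote latencies.
    return (1.0 - remote_fraction) * t_local_ns + remote_fraction * t_remote_ns


# Hypothetical latencies, chosen for illustration only (not measured values).
T_LOCAL_NS = 100.0
T_REMOTE_NS = 300.0

before = avg_latency_ns(0.40, T_LOCAL_NS, T_REMOTE_NS)
after = avg_latency_ns(0.20, T_LOCAL_NS, T_REMOTE_NS)  # remote accesses halved
print(f"before: {before:.0f} ns, after: {after:.0f} ns, "
      f"latency reduction: {1 - after / before:.1%}")
```

Under these assumptions, halving the remote fraction lowers average latency by about 22%, not 50%, because local accesses still have a cost. Any application-level speedup further depends on how latency-sensitive the workload is.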
Abstract
Memory disaggregation has recently been adopted in data centers to improve resource utilization, motivated by cost and sustainability. Recent studies on large-scale HPC facilities have also highlighted memory underutilization. A promising and non-disruptive option for memory disaggregation is rack-scale memory pooling, where shared memory pools supplement node-local memory. This work outlines the prospects and requirements for adoption and clarifies several misconceptions. We propose a quantitative method for dissecting application requirements on the memory system from the top down in three levels, moving from general, to multi-tier memory systems, and then to memory pooling. We provide a multi-level profiling tool and LBench to facilitate the quantitative approach. We evaluate a set of representative HPC workloads on an emulated platform. Our results show that prefetching activities…
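A common question at the multi-tier level is how much remote traffic remains once the hottest data stays in node-local memory. The sketch below answers it with a generic greedy heuristic that ranks objects by accesses per MiB and fills local capacity first. This heuristic is a swapped-in illustration, not the paper's profiling tool or LBench. The object names, sizes, and access counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemObject:
    name: str      # hypothetical object label (names assumed unique)
    size_mb: int   # footprint in MiB
    accesses: int  # profiled access count (e.g. sampled loads/stores)


def place_greedy(objects, local_capacity_mb):
    """Greedily place objects in local memory by descending access density.

    Objects that do not fit in the remaining local capacity go to the pool.
    Returns (local_names, remote_names, remote_access_fraction).
    """
    ranked = sorted(objects, key=lambda o: o.accesses / o.size_mb, reverse=True)
    local, remote = [], []
    used_mb = 0
    for obj in ranked:
        if used_mb + obj.size_mb <= local_capacity_mb:
            local.append(obj.name)
            used_mb += obj.size_mb
        else:
            remote.append(obj.name)
    remote_set = set(remote)
    total = sum(o.accesses for o in objects)
    remote_accesses = sum(o.accesses for o in objects if o.name in remote_set)
    frac = remote_accesses / total if total else 0.0
    return local, remote, frac


# Hypothetical profile of a graph workload (illustrative numbers only).
objs = [
    MemObject("graph_edges", 4096, 2_000_000),
    MemObject("frontier", 256, 3_000_000),
    MemObject("visited", 512, 1_500_000),
    MemObject("output", 1024, 100_000),
]
local, remote, frac = place_greedy(objs, local_capacity_mb=1024)
print(local, remote, f"remote access fraction: {frac:.1%}")
```

Ranking by access density rather than raw access count favors small, hot objects. This usually keeps more accesses local when capacity is tight. A real tiering decision would also weigh access patterns and prefetchability, which the simple count-based model ignores.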
Taxonomy
Topics: Cloud Computing and Resource Management · Distributed and Parallel Computing Systems · Advanced Data Storage Technologies
