At the Locus of Performance: Quantifying the Effects of Copious 3D-Stacked Cache on HPC Workloads
Jens Domke, Emil Vatai, Balazs Gerofi, Yuetsu Kodama, Mohamed Wahib, Artur Podobas, Sparsh Mittal, Miquel Pericàs, Lingqi Zhang, Peng Chen, Aleksandr Drozd, Satoshi Matsuoka

TL;DR
This paper investigates the potential performance gains of integrating high-capacity 3D-stacked SRAM caches into future HPC processors. Using gem5 simulation and a memory-subsystem-agnostic upper-bound estimate, it reports an average 9.56x boost for cache-sensitive applications.
Contribution
It introduces a novel, memory-subsystem-agnostic method to estimate performance upper bounds, and it models hypothetical HPC processors with 3D-stacked cache to quantify the potential improvements.
Findings
Average 9.56x performance boost for cache-sensitive HPC applications
Methodology to estimate performance upper bounds independent of memory subsystem
Simulation results from 1.5 nm 3D-stacked cache HPC processor models
Abstract
Over the last three decades, innovations in the memory subsystem were primarily targeted at overcoming the data movement bottleneck. In this paper, we focus on a specific market trend in memory technology: 3D-stacked memory and caches. We investigate the impact of extending the on-chip memory capabilities in future HPC-focused processors, particularly by 3D-stacked SRAM. First, we propose a method oblivious to the memory subsystem to gauge the upper-bound in performance improvements when data movement costs are eliminated. Then, using the gem5 simulator, we model two variants of a hypothetical LARge Cache processor (LARC), fabricated in 1.5 nm and enriched with high-capacity 3D-stacked cache. With a volume of experiments involving a broad set of proxy-applications and benchmarks, we aim to reveal how HPC CPU performance will evolve, and conclude an average boost of 9.56x for…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Advanced Memory and Neural Computing
