RevaMp3D: Architecting the Processor Core and Cache Hierarchy for Systems with Monolithically-Integrated Logic and Memory

Nika Mansouri Ghiasi; Mohammad Sadrosadati; Geraldo F. Oliveira; Konstantinos Kanellopoulos; Rachata Ausavarungnirun; Juan Gómez Luna; João Ferreira; Jeremie S. Kim; Christina Giannoula; Nandita Vijaykumar; Jisung Park; Onur Mutlu

arXiv:2210.08508 · cs.AR · January 26, 2026 · 1 citation

Open Access

TL;DR

RevaMp3D is a processor core and cache hierarchy design for Monolithic 3D (M3D) systems, which integrate memory and logic layers in a single chip. It improves performance and energy efficiency by rethinking cache sharing, latency, and parallelism.

Contribution

RevaMp3D introduces five architectural changes tailored for M3D systems: removal of the shared last-level cache, latency reduction, pipeline scaling, fine-grained synchronization, and memory-based instruction memoization.

Findings

01 Achieves 1.2x-2.9x speedup over state-of-the-art M3D systems.

02 Reduces energy consumption by 1.2x-1.4x.

03 Provides insights into latency-aware design choices for M3D architectures.
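
The "Nx" figures in the findings can be read as percentage savings. The sketch below is illustrative only and assumes each factor means baseline divided by improved value (so a 1.2x energy reduction means the new design uses 1/1.2 of the baseline energy); the page itself does not state this definition, and the helper name `fraction_saved` is hypothetical.

```python
def fraction_saved(factor: float) -> float:
    """Fraction of the baseline removed by an improvement of `factor`x.

    Assumption: factor = baseline / improved, so improved = baseline / factor.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 - 1.0 / factor


# Ranges quoted in the findings (speedup and energy reduction factors).
speedup_range = (1.2, 2.9)
energy_range = (1.2, 1.4)

for label, (low, high) in (("runtime", speedup_range), ("energy", energy_range)):
    print(f"{label}: {fraction_saved(low):.1%} to {fraction_saved(high):.1%} below baseline")
```

Under that assumption, a 1.2x-2.9x speedup corresponds to roughly 17% to 66% less execution time, and a 1.2x-1.4x energy reduction to roughly 17% to 29% less energy.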

Abstract

Recent nano-technological advances enable the Monolithic 3D (M3D) integration of multiple memory and logic layers in a single chip, allowing for fine-grained connections between layers and significantly alleviating main memory bottlenecks. We show for a variety of workloads, on a state-of-the-art M3D-based system, that the performance and energy bottlenecks shift from main memory to the processor core and cache hierarchy. Therefore, there is a need to revisit current designs that have been conventionally tailored to tackle the memory bottleneck. Based on the insights from our design space exploration, we propose RevaMp3D, introducing five key changes. First, we propose removing the shared last-level cache, as this delivers speedups comparable to or exceeding those from increasing its size or reducing its latency across all workloads. Second, since improving L1 cache latency has a large…

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Interconnection Networks and Systems · Advanced Memory and Neural Computing