# The Future of Memory: Limits and Opportunities

**Authors:** Samuel Dayo, Shuhan Liu, Peijing Li, Philip Levis, Subhasish Mitra, Thierry Tambe, David Tennenhouse, and H.-S. Philip Wong

arXiv: 2508.20425 · 2025-09-24

## TL;DR

This paper argues against huge (many-terabyte to petabyte scale) memories shared among large numbers of CPUs. It proposes instead smaller, tightly integrated compute-memory nodes with private local memory, built with 2.5D/3D integration, to reduce access cost and improve bandwidth and energy efficiency.

## Contribution

The paper proposes a system architecture that breaks memory into smaller slices tightly coupled with compute elements. Using 2.5D/3D integration, each compute-memory node gets private local memory reachable over micrometer-scale distances.

## Key findings

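- Scaling and signaling are the two practical engineering challenges that limit huge shared memories.
- Private local memory slices, reached over micrometer-scale distances, dramatically reduce the cost of accessing node-exclusive data.
- In-package memory holds shared state within a processor, with far better bandwidth and energy efficiency than DRAM.
- DRAM remains main memory for large working sets and cold data.
- Hardware that makes memory capacities and distances explicit lets software manage data placement and movement across the hierarchy.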

## Abstract

Memory latency, bandwidth, capacity, and energy increasingly limit performance. In this paper, we reconsider proposed system architectures that consist of huge (many-terabyte to petabyte scale) memories shared among large numbers of CPUs. We argue two practical engineering challenges, scaling and signaling, limit such designs. We propose the opposite approach. Rather than create large, shared, homogenous memories, systems explicitly break memory up into smaller slices more tightly coupled with compute elements. Leveraging advances in 2.5D/3D integration, this compute-memory node provisions private local memory, enabling accesses of node-exclusive data through micrometer-scale distances, and dramatically reduced access cost. In-package memory elements support shared state within a processor, providing far better bandwidth and energy-efficiency than DRAM, which is used as main memory for large working sets and cold data. Hardware making memory capacities and distances explicit allows software to efficiently compose this hierarchy, managing data placement and movement.
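
To make the last sentence of the abstract concrete, here is a minimal sketch of what software-managed placement could look like once hardware exposes capacities and distances. It is not the authors' method. The tier names, capacities, relative costs, object names, and the greedy policy are all hypothetical choices made for illustration; only the three-level split (private local memory, in-package shared memory, DRAM) comes from the abstract.

```python
from dataclasses import dataclass

MiB = 1024 ** 2
GiB = 1024 ** 3


@dataclass
class Tier:
    """One level of the hierarchy, as hypothetically exposed by hardware."""
    name: str
    capacity_bytes: int
    relative_cost: int   # ordering only (lower = closer); not a measured value
    shared: bool         # True if other compute elements can reach this tier
    used_bytes: int = 0

    def fits(self, size_bytes: int) -> bool:
        return self.used_bytes + size_bytes <= self.capacity_bytes


@dataclass
class DataObject:
    name: str
    size_bytes: int
    shared: bool         # True if more than one compute element needs it
    accesses: int        # assumed access count; stands in for "hotness"


def place(objects, tiers):
    """Greedy placement: hottest objects first, cheapest eligible tier first.

    Shared objects may not live in a private tier; private objects may
    spill into shared tiers when local memory is full.
    """
    placement = {}
    by_cost = sorted(tiers, key=lambda t: t.relative_cost)
    for obj in sorted(objects, key=lambda o: o.accesses, reverse=True):
        for tier in by_cost:
            if obj.shared and not tier.shared:
                continue
            if tier.fits(obj.size_bytes):
                tier.used_bytes += obj.size_bytes
                placement[obj.name] = tier.name
                break
        else:
            raise MemoryError(f"no tier can hold {obj.name}")
    return placement


# Hypothetical capacities, chosen only to exercise the policy.
tiers = [
    Tier("local", 64 * MiB, relative_cost=1, shared=False),
    Tier("in_package", 1 * GiB, relative_cost=2, shared=True),
    Tier("dram", 64 * GiB, relative_cost=3, shared=True),
]

objects = [
    DataObject("weights_shard", 48 * MiB, shared=False, accesses=1000),
    DataObject("scratch", 32 * MiB, shared=False, accesses=800),
    DataObject("kv_index", 256 * MiB, shared=True, accesses=500),
    DataObject("archive", 8 * GiB, shared=True, accesses=1),
]

placement = place(objects, tiers)
for name, tier_name in placement.items():
    print(f"{name} -> {tier_name}")
```

With these made-up numbers, the hottest private object fills most of local memory, the next private object spills to in-package memory, the shared index also goes in-package, and the cold archive falls through to DRAM. A real system would also have to handle data movement as access patterns change, which this sketch omits.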

---
Source: https://tomesphere.com/paper/2508.20425