Tearing Down the Memory Wall
Zaid Qureshi, Vikram Sharma Mailthody, Seung Won Min, I-Hsin Chung, Jinjun Xiong, Wen-mei Hwu

TL;DR
The paper proposes the Erudite architecture that integrates high-density memory with programmable accelerators to scale compute and memory bandwidth together, aiming to eliminate the memory wall problem.
Contribution
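It presents a vision for an architecture that treats memory bandwidth and capacity as first-class citizens alongside compute throughput, pairing each programmable accelerator with a local pool of storage-class memory and connecting accelerators through a high-throughput, low-latency interconnect.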
Findings
Scales compute and memory bandwidth simultaneously
Enables high-throughput access to memory with overlapping requests
Facilitates communication between accelerators and remote memory
Abstract
We present a vision for the Erudite architecture that redefines the compute and memory abstractions such that memory bandwidth and capacity become first-class citizens along with compute throughput. In this architecture, we envision coupling a high-density, massively parallel memory technology like Flash with programmable near-data accelerators, like the streaming multiprocessors in modern GPUs. Each accelerator has a local pool of storage-class memory that it can access at high throughput by initiating very large numbers of overlapping requests that help to tolerate long access latency. The accelerators can also communicate with each other and remote memory through a high-throughput low-latency interconnect. As a result, systems based on the Erudite architecture scale compute and memory bandwidth at the same rate, tearing down the notorious memory wall that has plagued computer…
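The abstract's claim that very large numbers of overlapping requests can tolerate long access latency follows from Little's Law: sustained throughput equals the number of requests in flight divided by latency. The sketch below works through that arithmetic. The function name and all numbers (bandwidth target, latencies, request sizes) are illustrative assumptions, not figures from the paper.

```python
def required_outstanding_requests(bandwidth_bytes_per_s: int,
                                  latency_ns: int,
                                  request_bytes: int) -> int:
    """Little's Law: requests in flight = request rate * latency.

    Returns the minimum number of concurrently outstanding requests
    needed to sustain the given bandwidth at the given access latency.
    Integer arithmetic is used so the ceiling is exact.
    """
    numerator = bandwidth_bytes_per_s * latency_ns
    denominator = request_bytes * 1_000_000_000  # ns per second
    return -(-numerator // denominator)  # ceiling division


# Hypothetical device parameters, chosen only for illustration.
TARGET_BANDWIDTH = 100_000_000_000  # 100 GB/s

# A Flash-like memory: long latency, large access granularity (assumed values).
flash_inflight = required_outstanding_requests(TARGET_BANDWIDTH,
                                               latency_ns=20_000,
                                               request_bytes=4096)

# A DRAM-like memory: short latency, cache-line accesses (assumed values).
dram_inflight = required_outstanding_requests(TARGET_BANDWIDTH,
                                              latency_ns=100,
                                              request_bytes=64)

print(f"Flash-like: {flash_inflight} requests in flight")
print(f"DRAM-like:  {dram_inflight} requests in flight")
```

Under these assumed parameters, the long-latency memory needs hundreds of requests in flight at once to reach the same bandwidth, which is the kind of concurrency a massively threaded accelerator such as a GPU streaming multiprocessor is designed to generate.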
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
