An Experimental Exploration of In-Memory Computing for Multi-Layer Perceptrons
Pedro Carrinho, Hamid Moghadaspour, Oscar Ferraz, João Dinis Ferreira, Yann Falevoz, Vitor Silva, and Gabriel Falcao

TL;DR
This paper investigates the use of modern processing-in-memory (PiM) architectures, specifically UPMEM, to accelerate multi-layer perceptrons. It demonstrates significant performance improvements over traditional CPU implementations and inference times competitive with those of low-power GPUs.
Contribution
It provides the first real-world evaluation of a general-purpose PiM architecture for neural network inference, reporting substantial speedups and offering insights into efficiency.
Findings
UPMEM PiM achieves up to a 259x performance improvement over the CPU for large-batch inference.
Using UPMEM's SRAM (WRAM) reduces inference times to under 3 ms, comparable to low-power GPUs.
The study offers practical insights into PiM's potential for neural network acceleration.
Abstract
In modern computer architectures, the performance of many memory-bound workloads (e.g., machine learning, graph processing, databases) is limited by the data movement bottleneck that emerges when large amounts of data are transferred between the main memory and the central processing unit (CPU). Processing-in-memory (PiM) is an emerging computing paradigm that aims to alleviate this bottleneck by performing computation close to or within the memory units where the data resides. One prevalent workload whose performance is bound by data movement is the training and inference of artificial neural networks. In this work, we analyze the potential of modern general-purpose PiM architectures to accelerate neural networks. To this end, we selected the UPMEM PiM system, the first commercially available real-world general-purpose PiM architecture. We compared…
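To make the offloading idea concrete, the sketch below shows one common way of mapping a fully connected MLP layer onto many PiM cores: row-wise partitioning of the weight matrix. It is a minimal, sequential Python simulation under stated assumptions, not the authors' implementation. The names `split_rows`, `dpu_kernel`, `pim_layer`, and `reference_layer` are hypothetical, and each simulated "DPU" is just a Python list standing in for one of UPMEM's DRAM Processing Units. Each simulated DPU keeps its block of weight rows resident, so only the input vector is broadcast and only the partial outputs are gathered back.

```python
"""Minimal sketch: row-wise partitioning of one MLP layer across PiM cores.

Assumption (hypothetical, for illustration only): each simulated "DPU" holds a
contiguous block of weight rows in its own memory, so the large weight matrix
never moves; only the input vector is broadcast and partial outputs gathered.
This is not the paper's implementation.
"""
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def split_rows(weights: Matrix, num_dpus: int) -> List[Matrix]:
    """Divide the weight rows into num_dpus nearly equal contiguous blocks."""
    base, extra = divmod(len(weights), num_dpus)
    blocks, start = [], 0
    for d in range(num_dpus):
        size = base + (1 if d < extra else 0)  # first `extra` blocks get one more row
        blocks.append(weights[start:start + size])
        start += size
    return blocks


def dpu_kernel(block: Matrix, bias_block: Vector, x: Vector) -> Vector:
    """Work done 'inside memory': dot products over local rows, then ReLU."""
    out = []
    for row, b in zip(block, bias_block):
        acc = b + sum(w * xi for w, xi in zip(row, x))
        out.append(max(0.0, acc))
    return out


def pim_layer(weights: Matrix, bias: Vector, x: Vector, num_dpus: int) -> Vector:
    """Broadcast x to every simulated DPU, run each kernel, concatenate results.

    Kernels run one after another here; real PiM cores would run in parallel.
    """
    outputs: Vector = []
    start = 0
    for block in split_rows(weights, num_dpus):
        b_block = bias[start:start + len(block)]  # bias slice matching this block
        start += len(block)
        outputs.extend(dpu_kernel(block, b_block, x))  # gather step
    return outputs


def reference_layer(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Plain host-side computation of the same layer, for comparison."""
    return [max(0.0, b + sum(w * xi for w, xi in zip(row, x)))
            for row, b in zip(weights, bias)]


if __name__ == "__main__":
    W = [[1.0, 2.0], [-1.0, 0.5], [3.0, -2.0], [0.0, 1.0], [2.0, 2.0]]
    b = [0.0, 1.0, -1.0, 0.5, 0.0]
    x = [1.0, 2.0]
    print(pim_layer(W, b, x, num_dpus=2))  # → [5.0, 1.0, 0.0, 2.5, 6.0]
    print(reference_layer(W, b, x))        # → [5.0, 1.0, 0.0, 2.5, 6.0]
```

On real hardware the weights would be transferred to the cores once, ahead of inference, and the per-core kernels would execute concurrently. The point of the row-wise split is that the largest operand (the weight matrix) stays where it is stored, which is the kind of data-movement saving the abstract attributes to PiM.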
