PIM-DRAM: Accelerating Machine Learning Workloads using Processing in Commodity DRAM
Sourjya Roy, Mustafa Ali, Anand Raghunathan

TL;DR
This paper introduces a DRAM-based processing-in-memory (PIM) architecture built around a new multiplication primitive, and reports up to 19.5x speedup over GPU-based systems on machine learning workloads.
Contribution
The paper proposes a DRAM-based PIM multiplication primitive, coupled with intra-bank accumulation, that adds less than 1% area overhead and requires no changes to existing DRAM peripherals.
Findings
Achieves up to 19.5x speedup over GPU-based systems.
Adds less than 1% area overhead with no DRAM peripheral modifications.
Accelerates DNN workloads such as AlexNet, VGG16, and ResNet18.
Abstract
Deep Neural Networks (DNNs) have transformed the field of machine learning and are widely deployed in many applications involving image, video, speech and natural language processing. The increasing compute demands of DNNs have been widely addressed through Graphics Processing Units (GPUs) and specialized accelerators. However, as model sizes grow, these von Neumann architectures require very high memory bandwidth to keep the processing elements utilized as a majority of the data resides in the main memory. Processing in memory has been proposed as a promising solution for the memory wall bottleneck for ML workloads. In this work, we propose a new DRAM-based processing-in-memory (PIM) multiplication primitive coupled with intra-bank accumulation to accelerate matrix vector operations in ML workloads. The proposed multiplication primitive adds < 1% area overhead and does not require any…
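The abstract does not say how the multiplication primitive works internally, so the following is only a generic illustration of the pattern it names: a multiply step followed by local accumulation to form a matrix-vector product. It is a minimal sketch, not the authors' design. The function names `shift_add_multiply` and `banked_matvec`, the 8-bit operand width, and the idea that each matrix row stands in for one bank are all assumptions made for the example.

```python
def shift_add_multiply(a: int, b: int, bits: int = 8) -> int:
    """Multiply two unsigned integers using only AND, shift, and add.

    Hypothetical helper: a generic bit-serial multiply, not the paper's primitive.
    """
    product = 0
    for i in range(bits):
        bit = (b >> i) & 1           # select bit i of the multiplier
        partial = a if bit else 0    # equivalent to ANDing a with that bit
        product += partial << i      # weight the partial product by 2**i
    return product


def banked_matvec(matrix, vector, bits: int = 8):
    """Compute matrix @ vector, treating each row as one 'bank' (an assumption).

    Each row multiplies element-wise against the vector and accumulates its
    own sum locally, so only one value per row leaves the 'bank'.
    """
    result = []
    for row in matrix:
        acc = 0
        for w, x in zip(row, vector):
            acc += shift_add_multiply(w, x, bits)
        result.append(acc)
    return result


weights = [[1, 2, 3], [4, 5, 6]]
activations = [7, 8, 9]
outputs = banked_matvec(weights, activations)
print(outputs)  # [50, 122]
```

The point of accumulating per row is that only the reduced sums need to move out of memory, which is the data-movement saving that PIM approaches in general aim for.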
