PIM-Opt: Demystifying Distributed Optimization Algorithms on a Real-World Processing-In-Memory System

Steve Rhyner; Haocong Luo; Juan Gómez-Luna; Mohammad Sadrosadati; Jiawei Jiang; Ataberk Olgun; Harshita Gupta; Ce Zhang; Onur Mutlu

arXiv:2404.07164·cs.AR·September 30, 2024·1 cites

Open Access · 1 Repo

TL;DR

This paper evaluates the performance of distributed SGD algorithms on real-world Processing-In-Memory systems, demonstrating potential advantages over traditional architectures and highlighting the importance of algorithm-hardware co-design.

Contribution

It implements and assesses distributed SGD algorithms on a real PIM system, providing insights into their performance, scalability, and implications for future hardware-software co-design.

Findings

01 PIM can be a viable alternative to CPUs and GPUs for memory-bound ML workloads

02 Careful selection of algorithms is crucial for PIM efficiency

03 PIM systems do not scale linearly with node count in data-intensive tasks

Abstract

Modern Machine Learning (ML) training on large-scale datasets is a very time-consuming workload. It relies on the optimization algorithm Stochastic Gradient Descent (SGD) due to its effectiveness, simplicity, and generalization performance. Processor-centric architectures (e.g., CPUs, GPUs) commonly used for modern ML training workloads based on SGD are bottlenecked by data movement between the processor and memory units due to the poor data locality in accessing large datasets. As a result, processor-centric architectures suffer from low performance and high energy consumption while executing ML training workloads. Processing-In-Memory (PIM) is a promising solution to alleviate the data movement bottleneck by placing the computation mechanisms inside or near memory. Our goal is to understand the capabilities of popular distributed SGD algorithms on real-world PIM systems to…

Code & Models

Repositories

CMU-SAFARI/PIM-Opt
pytorch · Official

Taxonomy

Topics: Distributed and Parallel Computing Systems

Methods: Stochastic Gradient Descent