PIM-Opt: Demystifying Distributed Optimization Algorithms on a Real-World Processing-In-Memory System
Steve Rhyner, Haocong Luo, Juan Gómez-Luna, Mohammad Sadrosadati, Jiawei Jiang, Ataberk Olgun, Harshita Gupta, Ce Zhang, Onur Mutlu

TL;DR
This paper evaluates the performance of distributed SGD algorithms on real-world Processing-In-Memory systems, demonstrating potential advantages over traditional architectures and highlighting the importance of algorithm-hardware co-design.
Contribution
It implements and assesses distributed SGD algorithms on a real PIM system, providing insights into their performance, scalability, and implications for future hardware-software co-design.
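As a rough illustration of what a "distributed SGD algorithm" can look like, the sketch below runs SGD independently on each worker's local data shard and then averages the resulting models once per round. This is one generic scheme (local SGD with model averaging), chosen only as an example; the summary above does not say which algorithms the paper evaluates. The function names, the least-squares loss, the synthetic data, and every hyperparameter are assumptions made for this sketch, and nothing here models actual PIM hardware.

```python
import random


def local_sgd_epoch(w, shard, lr):
    """Run one pass of plain SGD over a worker's local shard (least-squares loss)."""
    w = list(w)  # each worker updates its own copy of the model
    for x, y in shard:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            w[j] -= lr * err * xj
    return w


def average_models(models):
    """Element-wise average of the workers' local models."""
    n = len(models)
    return [sum(m[j] for m in models) / n for j in range(len(models[0]))]


def distributed_sgd(shards, dim, lr=0.05, rounds=50):
    """Hypothetical driver: each round, every worker trains on its own shard
    starting from the shared model, then the local models are averaged."""
    w = [0.0] * dim
    for _ in range(rounds):
        local_models = [local_sgd_epoch(w, shard, lr) for shard in shards]
        w = average_models(local_models)
    return w


def make_shards(true_w, n_workers=4, per_worker=64, seed=0):
    """Synthetic, noise-free linear data split evenly across workers (assumption)."""
    rng = random.Random(seed)
    shards = []
    for _ in range(n_workers):
        shard = []
        for _ in range(per_worker):
            x = [rng.uniform(-1.0, 1.0) for _ in true_w]
            y = sum(wi * xi for wi, xi in zip(true_w, x))
            shard.append((x, y))
        shards.append(shard)
    return shards


TRUE_W = [2.0, -3.0, 0.5]
shards = make_shards(TRUE_W)
w = distributed_sgd(shards, dim=len(TRUE_W))
```

In this sketch the training data never leaves its shard; only the model vector is exchanged between workers, which is the property that makes this family of algorithms a natural fit for settings where moving data is the bottleneck.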
Findings
PIM can be a viable alternative to CPUs and GPUs for memory-bound ML workloads
Careful selection of algorithms is crucial for PIM efficiency
PIM systems do not scale linearly with node count in data-intensive tasks
Abstract
Modern Machine Learning (ML) training on large-scale datasets is a very time-consuming workload. It relies on the optimization algorithm Stochastic Gradient Descent (SGD) due to its effectiveness, simplicity, and generalization performance. Processor-centric architectures (e.g., CPUs, GPUs) commonly used for modern ML training workloads based on SGD are bottlenecked by data movement between the processor and memory units due to the poor data locality in accessing large datasets. As a result, processor-centric architectures suffer from low performance and high energy consumption while executing ML training workloads. Processing-In-Memory (PIM) is a promising solution to alleviate the data movement bottleneck by placing the computation mechanisms inside or near memory. Our goal is to understand the capabilities of popular distributed SGD algorithms on real-world PIM systems to…
Taxonomy
Topics: Distributed and Parallel Computing Systems
Methods: Stochastic Gradient Descent
