Scaling the Wild: Decentralizing Hogwild!-style Shared-memory SGD
Bapi Chatterjee, Vyacheslav Kungurtsev, Dan Alistarh

TL;DR
This paper introduces a decentralized SGD algorithm in which each node runs shared-memory parallel SGD. The algorithm combines asynchronous local updates, partial backpropagation, and non-blocking in-place averaging of local models. It comes with convergence guarantees and delivers higher throughput with competitive accuracy on image classification tasks.
Contribution
It presents a novel decentralized, distributed-memory SGD method whose nodes each run Hogwild!-style shared-memory SGD, supported by convergence theory and showing practical efficiency gains over existing approaches.
Findings
Proves ergodic convergence rates for non-convex objectives.
Achieves improved throughput on image classification benchmarks.
Maintains competitive accuracy on CIFAR-10, CIFAR-100, and ImageNet.
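An "ergodic" rate for a non-convex objective bounds the average expected squared gradient norm over the iterates, which is the standard stationarity measure for non-convex SGD. A generic form is shown below. $\bar{x}_t$ is assumed here to be the average of the node models at step $t$, and $\varepsilon(T)$ is a placeholder for a bound that decreases in $T$. The paper's actual rate and constants are not reproduced.

$$
\frac{1}{T}\sum_{t=0}^{T-1} \mathbb{E}\,\big\|\nabla f(\bar{x}_t)\big\|^2 \;\le\; \varepsilon(T),
\qquad\text{equivalently}\qquad
\mathbb{E}\,\big\|\nabla f(\bar{x}_\tau)\big\|^2 \le \varepsilon(T),\;\; \tau \sim \mathrm{Uniform}\{0,\dots,T-1\}.
$$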
Abstract
Powered by the simplicity of lock-free asynchrony, Hogwild! is a go-to approach to parallelize SGD over a shared-memory setting. Despite its popularity and concomitant extensions, such as PASSM+ wherein concurrent processes update a shared model with partitioned gradients, scaling it to decentralized workers has surprisingly been relatively unexplored. To our knowledge, there is no convergence theory of such methods, nor systematic numerical comparisons evaluating speed-up. In this paper, we propose an algorithm incorporating a decentralized distributed-memory computing architecture with each node running multiprocessing parallel shared-memory SGD itself. Our scheme is based on the following algorithmic tools and features: (a) asynchronous local gradient updates on the shared memory of workers, (b) partial backpropagation, and (c) non-blocking in-place averaging of the local models. We…
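Below is a minimal single-process sketch of the three ingredients listed in the abstract, built on deliberately simplified assumptions. It is not the authors' implementation: `Node`, `local_worker`, `average_in_place`, `train`, `TARGET`, and all hyperparameters are hypothetical. Each "node" is an object whose Python threads share one NumPy model array with no locks, in the Hogwild! style. Under the GIL these threads interleave rather than run truly in parallel. Each thread updates only its own coordinate block, as a stand-in for PASSM+-style partitioned gradients and partial backpropagation. Gossip averaging with a ring neighbor runs while worker threads are still writing, which mimics non-blocking in-place averaging. The objective is a toy quadratic rather than a neural network.

```python
import threading

import numpy as np

DIM = 8
# Hypothetical toy problem: every node minimizes f(x) = 0.5 * ||x - TARGET||^2
# from noisy gradients (an IID-data assumption, not the paper's setup).
TARGET = np.linspace(-1.0, 1.0, DIM)


class Node:
    """One decentralized worker; its threads share a single model array."""

    def __init__(self, node_id, n_threads):
        self.model = np.zeros(DIM)  # shared memory, deliberately not guarded by a lock
        # One persistent RNG per thread so the noise differs across steps and rounds.
        self.rngs = [np.random.default_rng([node_id, t]) for t in range(n_threads)]
        # PASSM+-style partitioning (simplified): thread t owns a contiguous block.
        cuts = np.linspace(0, DIM, n_threads + 1).astype(int)
        self.blocks = [slice(cuts[t], cuts[t + 1]) for t in range(n_threads)]

    def local_worker(self, tid, steps, lr, noise):
        blk, rng = self.blocks[tid], self.rngs[tid]
        for _ in range(steps):
            view = self.model[blk]  # view into shared memory; no lock is taken
            # Stand-in for partial backpropagation: only this block's gradient is formed.
            grad = (view - TARGET[blk]) + noise * rng.standard_normal(view.shape)
            view -= lr * grad  # lock-free, in-place write to the shared model


def average_in_place(node, neighbor):
    # Non-blocking gossip step: copy the neighbor's current (possibly mid-update)
    # model and mix it into our own array in place; nobody waits on a lock.
    snapshot = neighbor.model.copy()
    node.model += 0.5 * (snapshot - node.model)


def train(n_nodes=4, n_threads=2, rounds=50, steps=20, lr=0.05, noise=0.1):
    nodes = [Node(i, n_threads) for i in range(n_nodes)]
    for _ in range(rounds):
        workers = [
            threading.Thread(target=nd.local_worker, args=(t, steps, lr, noise))
            for nd in nodes
            for t in range(n_threads)
        ]
        for w in workers:
            w.start()
        # Averaging overlaps with local updates that are still in flight.
        for i, nd in enumerate(nodes):
            average_in_place(nd, nodes[(i + 1) % n_nodes])  # ring topology
        for w in workers:
            w.join()
    return nodes
```

Calling `train()` returns the list of nodes, and each `node.model` should settle near `TARGET` up to SGD noise. Updates lost to races between averaging and local writes are tolerated, as in Hogwild!. A real deployment would replace the in-process neighbor read with communication between machines.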
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Advanced Neural Network Applications
Methods: Stochastic Gradient Descent
