Scaling the Wild: Decentralizing Hogwild!-style Shared-memory SGD

Bapi Chatterjee; Vyacheslav Kungurtsev; Dan Alistarh

arXiv:2203.06638 · cs.LG · March 16, 2022

TL;DR

This paper introduces a decentralized SGD algorithm in which each node runs Hogwild!-style shared-memory SGD. The algorithm combines asynchronous local updates, partial backpropagation, and non-blocking in-place averaging of the local models. The paper proves convergence guarantees for it and reports improved performance on image classification tasks.

Contribution

It presents a novel decentralized distributed-memory SGD method, with shared-memory parallelism inside each node. The method comes with convergence theory and improves practical efficiency over existing approaches.

Findings

1. Proves ergodic convergence rates for non-convex objectives.

2. Achieves improved throughput on image classification benchmarks.

3. Maintains competitive accuracy on CIFAR-10, CIFAR-100, and ImageNet.
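In non-convex SGD analysis, an "ergodic" rate usually bounds an average over the iterates rather than the last iterate. The display below is a generic template for such a statement, not the specific bound proved in the paper. Here f is the objective, x_t is the model after t steps, T is the number of iterations, and ε(T) is an unspecified rate function:

$$\frac{1}{T}\sum_{t=0}^{T-1} \mathbb{E}\left[\|\nabla f(x_t)\|^2\right] \le \varepsilon(T), \qquad \varepsilon(T) \to 0 \text{ as } T \to \infty.$$

A minimum is never larger than an average. So a bound of this form guarantees that at least one of the first T iterates has expected squared gradient norm at most ε(T).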

Abstract

Powered by the simplicity of lock-free asynchrony, Hogwild! is a go-to approach to parallelizing SGD in a shared-memory setting. One extension is PASSM+, in which concurrent processes update a shared model with partitioned gradients. Despite the popularity of Hogwild! and its extensions, scaling them to decentralized workers has, surprisingly, remained relatively unexplored. To our knowledge, there is no convergence theory for such methods, nor any systematic numerical comparison evaluating their speed-up. In this paper, we propose an algorithm built on a decentralized distributed-memory computing architecture, in which each node itself runs multiprocess parallel shared-memory SGD. Our scheme is based on the following algorithmic tools and features: (a) asynchronous local gradient updates on the shared memory of workers, (b) partial backpropagation, and (c) non-blocking in-place averaging of the local models. We…
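Two of the ingredients listed in the abstract can be illustrated on a toy problem. The sketch below is not the bapi/lpp-sgd implementation. It is a minimal, stdlib-only Python sketch that makes several assumptions. It uses a noise-free least-squares objective. Each "node" is emulated as a group of threads in one process, and the helpers (`make_data`, `loss`, `hogwild_worker`, `average_in_place`, `train`) are hypothetical. The sketch shows (a) threads updating a node's shared model without locks and (c) in-place averaging of the node models after each round. The averaging here blocks at the end of each round, unlike the non-blocking scheme the abstract describes, and partial backpropagation (b) is omitted. Because of Python's GIL, the threads interleave rather than run truly in parallel.

```python
# Hypothetical toy sketch; not the bapi/lpp-sgd implementation.
import random
import threading


def make_data(n, w_true, seed=0):
    """Noise-free linear-regression samples (a_i, b_i) with b_i = <a_i, w_true>."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        a = [rng.uniform(-1.0, 1.0) for _ in w_true]
        data.append((a, sum(ai * wi for ai, wi in zip(a, w_true))))
    return data


def loss(w, data):
    """Mean of 0.5 * (<a, w> - b)^2 over the data set."""
    total = 0.0
    for a, b in data:
        r = sum(ai * wi for ai, wi in zip(a, w)) - b
        total += 0.5 * r * r
    return total / len(data)


def hogwild_worker(model, shard, steps, lr, seed):
    """(a) Lock-free SGD: read and update the shared list without any mutex."""
    rng = random.Random(seed)
    for _ in range(steps):
        a, b = rng.choice(shard)
        # The read may be stale: other threads can write to `model` meanwhile.
        residual = sum(aj * wj for aj, wj in zip(a, model)) - b
        for j, aj in enumerate(a):
            model[j] -= lr * residual * aj  # unsynchronized in-place write


def average_in_place(models):
    """(c) Overwrite every node's model with the element-wise mean.

    Simplification: this runs only after all nodes finish their round,
    unlike the non-blocking averaging described in the abstract.
    """
    mean = [sum(col) / len(models) for col in zip(*models)]
    for m in models:
        m[:] = mean


def train(num_nodes=2, threads=4, rounds=20, steps=50, lr=0.1, seed=0):
    """Hypothetical driver: each node owns a data shard and a shared model."""
    w_true = [1.0, -2.0]
    data = make_data(200, w_true, seed)
    shards = [data[k::num_nodes] for k in range(num_nodes)]
    models = [[0.0] * len(w_true) for _ in range(num_nodes)]
    for r in range(rounds):
        # All nodes run concurrently; threads within a node share one model.
        workers = [
            threading.Thread(
                target=hogwild_worker,
                args=(models[k], shards[k], steps, lr, 10_000 * r + 100 * k + t),
            )
            for k in range(num_nodes)
            for t in range(threads)
        ]
        for th in workers:
            th.start()
        for th in workers:
            th.join()
        average_in_place(models)
    return models[0], loss(models[0], data)
```

Calling `train()` returns the first node's model and its training loss. On this noise-free problem, the model should approach `[1.0, -2.0]` despite the unsynchronized writes. That matches the Hogwild! intuition: occasional stale reads do not prevent convergence when updates are small.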

Code & Models

Repositories

bapi/lpp-sgd · PyTorch · Official

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Advanced Neural Network Applications

Methods: Stochastic Gradient Descent