Adaptive Step-Size Methods for Compressed SGD

Adarsh M. Subramaniam; Akshayaa Magesh; Venugopal V. Veeravalli

arXiv:2207.10046 · stat.ML · July 21, 2022

Open Access

TL;DR

This paper introduces an adaptive step-size method for compressed SGD that improves convergence and performance in distributed neural network training, addressing practical tuning issues and demonstrating superior results on standard datasets.

Contribution

We develop a novel adaptive step-size technique for compressed SGD that achieves order-optimal convergence rates and enhances empirical performance in neural network training.

Findings

01

The proposed method converges for convex, strongly convex, and non-convex objectives.

02

Simulations show that the scaling technique prevents divergence in compressed SGD.

03

In experiments, the method outperforms existing compressed SGD methods on neural networks.

Abstract

Compressed Stochastic Gradient Descent (SGD) algorithms have been recently proposed to address the communication bottleneck in distributed and decentralized optimization problems, such as those that arise in federated machine learning. Existing compressed SGD algorithms assume the use of non-adaptive step-sizes (constant or diminishing) to provide theoretical convergence guarantees. Typically, the step-sizes are fine-tuned in practice to the dataset and the learning algorithm to provide good empirical performance. Such fine-tuning might be impractical in many learning scenarios, and it is therefore of interest to study compressed SGD using adaptive step-sizes. Motivated by prior work on adaptive step-size methods for SGD to train neural networks efficiently in the uncompressed setting, we develop an adaptive step-size method for compressed SGD. In particular, we introduce a scaling…
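
For orientation, the non-adaptive baseline the abstract refers to can be sketched as follows. This is a minimal illustration and not the paper's method: the abstract does not name a compressor, so the top-k operator, the error-feedback memory, the least-squares problem, and all function names and parameter values here are assumptions chosen for the example. The step size is a hand-picked constant, which is exactly the tuning burden an adaptive scheme aims to remove.

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v and zero the rest (assumed compressor)."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def compressed_sgd(A, b, k, step_size, n_iters, seed=0):
    """Constant-step compressed SGD on least squares, with error feedback (assumed)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    memory = np.zeros(d)  # residual that compression dropped in earlier steps
    for _ in range(n_iters):
        i = rng.integers(n)
        grad = (A[i] @ x - b[i]) * A[i]      # gradient of 0.5 * (a_i . x - b_i)^2
        proposed = memory + step_size * grad
        update = top_k(proposed, k)          # only k coordinates are "communicated"
        memory = proposed - update           # keep the uncommunicated remainder
        x = x - update
    return x

def mean_squared_loss(A, b, x):
    return 0.5 * np.mean((A @ x - b) ** 2)

# Hypothetical synthetic problem, used only to exercise the sketch.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 20))
x_true = rng.standard_normal(20)
b = A @ x_true
x_hat = compressed_sgd(A, b, k=5, step_size=0.01, n_iters=5000)
print(mean_squared_loss(A, b, np.zeros(20)), mean_squared_loss(A, b, x_hat))
```

Changing `step_size` by an order of magnitude in either direction noticeably changes how this loop behaves, which is the sensitivity that motivates studying adaptive step-sizes in the compressed setting.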

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Sparse and Compressive Sensing Techniques

Methods: Batch Normalization · Concatenated Skip Connection · Dense Block · Max Pooling · Kaiming Initialization · Global Average Pooling · Dense Connections · Average Pooling · Softmax