An Empirical Study on Compressed Decentralized Stochastic Gradient Algorithms with Overparameterized Models
Arjun Ashok Rao, Hoi-To Wai

TL;DR
This paper empirically investigates how compressed decentralized stochastic gradient algorithms perform with overparameterized neural networks, revealing robustness in convergence rates and highlighting gaps between theory and practice.
Contribution
It provides the first empirical analysis of compressed decentralized stochastic gradient (DSG) algorithms with overparameterized neural networks (NNs), demonstrating that their convergence is robust across NN sizes.
Findings
Convergence rates are robust to neural network size.
There is a gap between the existing theory for compressed DSG algorithms and their empirical performance.
Popular compressed DSG algorithms converge at rates that hold up in simulations on an MPI network environment.
Abstract
This paper considers decentralized optimization with application to machine learning on graphs. The growing size of neural network (NN) models has motivated prior works on decentralized stochastic gradient algorithms to incorporate communication compression. On the other hand, recent works have demonstrated the favorable convergence and generalization properties of overparameterized NNs. In this work, we present an empirical analysis of the performance of compressed decentralized stochastic gradient (DSG) algorithms with overparameterized NNs. Through simulations in an MPI network environment, we observe that the convergence rates of popular compressed DSG algorithms are robust to the size of NNs. Our findings suggest a gap between the theory and practice of compressed DSG algorithms in the existing literature.
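To make "compressed DSG" concrete, the sketch below shows one round of a generic difference-compression gossip scheme: each node takes a local gradient step, sends only a top-k sparsified update of a publicly known copy of its iterate, and then averages with its neighbors using those copies. This is an illustration under stated assumptions, not necessarily one of the algorithms the paper evaluates. The ring topology, the top-k compressor, and the names `top_k`, `ring_weights`, `compressed_dsg_step`, `lr`, and `gamma` are all hypothetical choices made for the example.

```python
def top_k(vec, k):
    """Keep the k largest-magnitude entries of vec and zero out the rest."""
    keep = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k]
    out = [0.0] * len(vec)
    for i in keep:
        out[i] = vec[i]
    return out


def ring_weights(n):
    """Mixing matrix for a ring graph: weight 1/3 on self and on each neighbor.

    Assumed topology for illustration; the matrix is symmetric and doubly stochastic.
    """
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] += 1.0 / 3.0
    return W


def compressed_dsg_step(x, x_hat, grads, W, lr, gamma, k):
    """One round of a difference-compression gossip scheme (illustrative only).

    x[i]     : private iterate of node i
    x_hat[i] : public copy of node i's iterate, known to its neighbors
    grads[i] : stochastic gradient computed by node i at x[i]
    """
    n, d = len(x), len(x[0])

    # 1) Local stochastic gradient step on every node.
    half = [[x[i][c] - lr * grads[i][c] for c in range(d)] for i in range(n)]

    # 2) Each node compresses the gap between its iterate and its public copy.
    #    Only the sparse vector q[i] would travel over the network.
    q = [top_k([half[i][c] - x_hat[i][c] for c in range(d)], k) for i in range(n)]
    new_hat = [[x_hat[i][c] + q[i][c] for c in range(d)] for i in range(n)]

    # 3) Gossip correction computed from the public copies.
    new_x = []
    for i in range(n):
        row = []
        for c in range(d):
            mix = sum(W[i][j] * (new_hat[j][c] - new_hat[i][c]) for j in range(n))
            row.append(half[i][c] + gamma * mix)
        new_x.append(row)
    return new_x, new_hat
```

Because the assumed mixing matrix is doubly stochastic, the gossip correction in step 3 leaves the network-wide average of the iterates unchanged. Compression therefore only affects how quickly the nodes reach consensus, which is the kind of effect that theoretical rates for such schemes try to bound in terms of the compression level and model dimension.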
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Distributed Control Multi-Agent Systems
