Federated Random Reshuffling with Compression and Variance Reduction
Grigory Malinovsky, Peter Richtárik

TL;DR
This paper introduces new federated learning algorithms based on Random Reshuffling, incorporating compression and variance reduction techniques to improve efficiency and robustness, with theoretical analysis and experimental validation.
Contribution
It proposes three novel algorithms that enhance FedRR with compression and variance reduction, overcoming previous limitations and providing the first analysis under standard assumptions.
Findings
The proposed algorithms outperform baselines in experiments
Variance reduction eliminates dependence on compression parameters
Theoretical analysis applies to heterogeneous data
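To make "compression parameters" concrete, here is a minimal sketch of one common unbiased compressor, Rand-k sparsification. The summary above does not say which compressors the paper uses, so the choice of Rand-k and the name `rand_k` are assumptions made for illustration only. Rand-k keeps k of the d coordinates at random and rescales them by d/k, which makes the output unbiased; its variance satisfies E||C(x) - x||^2 = (d/k - 1)||x||^2, and the factor d/k - 1 is the kind of compression parameter that typically appears in convergence bounds.

```python
import numpy as np

def rand_k(x, k, rng):
    """Hypothetical helper: keep k random coordinates of x, scaled by d/k.

    The scaling makes the compressor unbiased, i.e. E[rand_k(x)] = x.
    """
    d = x.size
    idx = rng.choice(d, size=k, replace=False)  # k distinct coordinates
    out = np.zeros(d, dtype=float)
    out[idx] = x[idx] * (d / k)
    return out

rng = np.random.default_rng(0)
x = np.arange(1.0, 9.0)  # d = 8, all entries nonzero

# Averaging many compressed copies should recover x (unbiasedness).
samples = np.stack([rand_k(x, 2, rng) for _ in range(20000)])
mean_estimate = samples.mean(axis=0)
```

A single call sends only k values (plus their indices) instead of d, at the cost of the extra variance noted above.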
Abstract
Random Reshuffling (RR), which is a variant of Stochastic Gradient Descent (SGD) employing sampling without replacement, is an immensely popular method for training supervised machine learning models via empirical risk minimization. Due to its superior practical performance, it is embedded and often set as default in standard machine learning software. Under the name FedRR, this method was recently shown to be applicable to federated learning (Mishchenko et al., 2021), with superior performance when compared to common baselines such as Local SGD. Inspired by this development, we design three new algorithms to improve FedRR further: compressed FedRR and two variance-reduced extensions, one for taming the variance coming from shuffling and the other for taming the variance due to compression. The variance reduction mechanism for compression allows us to eliminate dependence on the…
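The sampling-without-replacement idea behind RR can be sketched on a single machine as follows. This is an illustrative sketch only, not the paper's FedRR or any of its compressed or variance-reduced variants; the least-squares objective, the function name `random_reshuffling`, and the step size are all assumptions chosen to keep the example small. In each epoch the data indices are permuted once and every sample is visited exactly once, whereas plain SGD would draw indices independently with replacement.

```python
import numpy as np

def random_reshuffling(A, b, x0, step, epochs, rng):
    """Hypothetical sketch: RR on f(x) = (1/2n) * sum_i (a_i^T x - b_i)^2."""
    n = A.shape[0]
    x = x0.copy()
    for _ in range(epochs):
        # One fresh permutation per epoch: each sample is used exactly once.
        for i in rng.permutation(n):
            grad_i = (A[i] @ x - b[i]) * A[i]  # gradient of the i-th loss term
            x -= step * grad_i
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
x_true = rng.standard_normal(5)
b = A @ x_true  # noiseless targets, so x_true minimizes every term

x_hat = random_reshuffling(A, b, np.zeros(5), step=0.02, epochs=200, rng=rng)
```

Replacing `rng.permutation(n)` with `rng.integers(0, n, size=n)` would give the with-replacement sampling of standard SGD, which makes the difference between the two schemes a one-line change.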
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Statistical Methods and Inference
Methods: Stochastic Gradient Descent · Local SGD
