Federated Random Reshuffling with Compression and Variance Reduction

Grigory Malinovsky; Peter Richtárik

arXiv:2205.03914·cs.LG·May 11, 2022

Open Access

TL;DR

This paper introduces three federated learning algorithms built on Random Reshuffling (FedRR): a compressed variant and two variance-reduced extensions, one targeting the variance from shuffling and one targeting the variance from compression. The algorithms are supported by theoretical analysis and experimental validation.

Contribution

The paper proposes three algorithms that extend FedRR with compression and variance reduction, overcoming previous limitations and providing the first analysis under standard assumptions.

Findings

01 Algorithms outperform baselines in experiments

02 Variance reduction eliminates dependence on compression parameters

03 Theoretical analysis applies to heterogeneous data
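To make "compression parameters" concrete, here is a minimal sketch of one common unbiased compressor, random-k sparsification. This listing does not say which compressors the paper analyzes, so the choice of rand-k, the function name `rand_k`, and the test vector are all assumptions made for illustration. For rand-k, the relative variance of the compressed vector is governed by d/k - 1, which is the kind of parameter an uncorrected compressed method would depend on.

```python
import numpy as np

def rand_k(x, k, rng):
    """Hypothetical helper: keep k uniformly chosen coordinates of x.

    The kept coordinates are scaled by d/k so that the compressed
    vector equals x in expectation (an unbiased compressor).
    """
    x = np.asarray(x, dtype=float)
    d = x.size
    idx = rng.choice(d, size=k, replace=False)  # coordinates to transmit
    out = np.zeros_like(x)
    out[idx] = x[idx] * (d / k)
    return out
```

Only k of the d coordinates need to be sent, at the cost of extra variance in the received vector; a smaller k means cheaper communication and a noisier estimate.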

Abstract

Random Reshuffling (RR), which is a variant of Stochastic Gradient Descent (SGD) employing sampling without replacement, is an immensely popular method for training supervised machine learning models via empirical risk minimization. Due to its superior practical performance, it is embedded and often set as default in standard machine learning software. Under the name FedRR, this method was recently shown to be applicable to federated learning (Mishchenko et al., 2021), with superior performance when compared to common baselines such as Local SGD. Inspired by this development, we design three new algorithms to improve FedRR further: compressed FedRR and two variance-reduced extensions, one for taming the variance coming from shuffling and the other for taming the variance due to compression. The variance reduction mechanism for compression allows us to eliminate dependence on the…
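The difference between RR and with-replacement SGD described in the abstract can be sketched on a single machine as follows. This is an illustrative toy on a least-squares problem, not the paper's FedRR algorithm; the function names, step size, and epoch count are assumptions chosen for the example.

```python
import numpy as np

def rr_epoch_order(n, rng):
    """Random Reshuffling: a fresh permutation, so each sample is used exactly once per epoch."""
    return rng.permutation(n)

def sgd_epoch_order(n, rng):
    """With-replacement sampling, for comparison: samples may repeat or be skipped."""
    return rng.integers(0, n, size=n)

def run(A, b, order_fn, epochs=50, lr=0.05, seed=0):
    """Minimize the sum of 0.5 * (A[i] @ x - b[i])**2 with one sample per step."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(epochs):
        for i in order_fn(n, rng):
            grad = (A[i] @ x - b[i]) * A[i]  # gradient of the i-th loss term
            x -= lr * grad
    return x
```

Passing `rr_epoch_order` or `sgd_epoch_order` as `order_fn` switches between the two sampling schemes while leaving the update rule unchanged, which isolates the effect of sampling without replacement.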

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Statistical Methods and Inference

Methods: Stochastic Gradient Descent · Local SGD