Faster Rates for Compressed Federated Learning with Client-Variance Reduction

Haoyu Zhao; Konstantin Burlachenko; Zhize Li; Peter Richtárik

arXiv:2112.13097·cs.LG·September 26, 2023·1 cites


Open Access

TL;DR

This paper introduces COFIG and FRECON, two communication-efficient federated learning algorithms that combine communication compression with client-variance reduction, improving convergence rates when clients are highly heterogeneous.

Contribution

The paper proposes novel compressed and client-variance reduced methods COFIG and FRECON with proven faster convergence bounds in federated learning.

Findings

01

COFIG achieves an $O\left(\frac{(1+\omega)^{3/2}\sqrt{N}}{S\epsilon^2}+\frac{(1+\omega)N^{2/3}}{S\epsilon^2}\right)$ bound on the number of communication rounds.

02

FRECON attains an $O\left(\frac{(1+\omega)\sqrt{N}}{S\epsilon^2}\right)$ bound on the number of communication rounds.

03

Experimental results show COFIG and FRECON outperform existing baselines.
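To get a feel for how the two bounds in findings 01 and 02 scale, the following minimal sketch evaluates only their leading terms with all hidden constants dropped. The function names `cofig_rounds` and `frecon_rounds` are hypothetical, and because big-O notation hides constants, the outputs are useful only for comparing growth in $N$, $S$, $\omega$ and $\epsilon$, not as predicted round counts.

```python
import math


def cofig_rounds(N, S, omega, eps):
    """Leading terms of the COFIG bound (constants dropped, illustration only)."""
    term_sqrt = (1 + omega) ** 1.5 * math.sqrt(N)
    term_two_thirds = (1 + omega) * N ** (2 / 3)
    return (term_sqrt + term_two_thirds) / (S * eps ** 2)


def frecon_rounds(N, S, omega, eps):
    """Leading term of the FRECON bound (constants dropped, illustration only)."""
    return (1 + omega) * math.sqrt(N) / (S * eps ** 2)


if __name__ == "__main__":
    # Hypothetical setting: 10,000 clients, 100 sampled per round.
    for omega in (0.0, 9.0, 99.0):
        c = cofig_rounds(N=10_000, S=100, omega=omega, eps=0.1)
        f = frecon_rounds(N=10_000, S=100, omega=omega, eps=0.1)
        print(f"omega={omega:5.1f}  COFIG~{c:.3e}  FRECON~{f:.3e}")
```

Under these expressions the FRECON term is never larger than the COFIG term, since COFIG carries an extra $\sqrt{1+\omega}$ factor on the $\sqrt{N}$ term plus an additional $N^{2/3}$ term.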

Abstract

Due to the communication bottleneck in distributed and federated learning applications, algorithms using communication compression have attracted significant attention and are widely used in practice. Moreover, the huge number, high heterogeneity and limited availability of clients result in high client-variance. This paper addresses these two issues together by proposing compressed and client-variance reduced methods COFIG and FRECON. We prove an $O\left(\frac{(1+\omega)^{3/2}\sqrt{N}}{S\epsilon^{2}} + \frac{(1+\omega)N^{2/3}}{S\epsilon^{2}}\right)$ bound on the number of communication rounds of COFIG in the nonconvex setting, where $N$ is the total number of clients, $S$ is the number of clients participating in each round, $\epsilon$ is the convergence error, and $\omega$ is the variance parameter associated with the compression operator. In case of FRECON, we prove an…
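As an illustration of what a compression operator with a variance parameter can look like, here is a minimal sketch of random-k sparsification, a standard example from the compression literature and not necessarily the operator used in this paper's experiments. It assumes the usual definition of an unbiased compressor $C$, namely $\mathbb{E}[C(x)] = x$ and $\mathbb{E}\|C(x) - x\|^2 \le \omega \|x\|^2$; under that assumption random-k on a $d$-dimensional vector has $\omega = d/k - 1$. The names `rand_k` and `rand_k_omega` are hypothetical.

```python
import random


def rand_k(x, k, rng=random):
    """Keep k uniformly chosen coordinates of x, scaled by d/k; zero the rest.

    The d/k scaling makes the compressor unbiased: each coordinate is kept
    with probability k/d, so its expected value is (k/d) * (d/k) * x_i = x_i.
    """
    d = len(x)
    keep = set(rng.sample(range(d), k))
    scale = d / k
    return [scale * v if i in keep else 0.0 for i, v in enumerate(x)]


def rand_k_omega(d, k):
    """Variance parameter of random-k under the assumed definition above."""
    return d / k - 1


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    print(rand_k(x, k=2, rng=random.Random(0)))  # two scaled entries, two zeros
    print(rand_k_omega(d=4, k=2))
```

Sending fewer coordinates (smaller `k`) lowers the per-round communication cost but raises $\omega$, which is the quantity the round bounds above depend on.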


Taxonomy

Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Cryptography and Data Security