A Unified Linear Speedup Analysis of Federated Averaging and Nesterov FedAvg

Zhaonan Qu; Kaixiang Lin; Zhaojian Li; Jiayu Zhou; Zhengyuan Zhou

arXiv:2007.05690 · cs.LG · January 2, 2024 · 1 citation

TL;DR

This paper provides a unified analysis of the convergence behavior of Federated Averaging (FedAvg) and its Nesterov accelerated variant, demonstrating linear speedup in various convex settings and characterizing their convergence rates.

Contribution

It offers the first unified convergence guarantees for FedAvg and Nesterov FedAvg under non-i.i.d. data and partial participation, including linear speedup results.

Findings

01

FedAvg achieves linear speedup in convex and strongly convex problems.

02

The paper establishes convergence rates with linear speedup for Nesterov FedAvg.

03

Empirical results support the theoretical convergence guarantees.

Abstract

Federated learning (FL) learns a model jointly from a set of participating devices without sharing each other's privately held data. The characteristics of non-i.i.d. data across the network, low device participation, high communication costs, and the mandate that data remain private bring challenges in understanding the convergence of FL algorithms, particularly regarding how convergence scales with the number of participating devices. In this paper, we focus on Federated Averaging (FedAvg), one of the most popular and effective FL algorithms in use today, as well as its Nesterov accelerated variant, and conduct a systematic study of how their convergence scales with the number of participating devices under non-i.i.d. data and partial participation in convex settings. We provide a unified analysis that establishes convergence guarantees for FedAvg under strongly convex, convex, and…
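
For readers unfamiliar with the algorithm the abstract refers to, the following is a minimal sketch of FedAvg with partial participation on a one-dimensional least-squares problem. It is an illustration only, not the paper's implementation or experimental setup: the function names (`local_sgd`, `fedavg`, `make_devices`), the synthetic data, the uniform device sampling, the equal-weight averaging, and all hyperparameters are assumptions made for this example.

```python
import random


def local_sgd(w, data, lr, local_steps, rng):
    """Run `local_steps` single-sample SGD steps on 0.5 * (w * x - y) ** 2."""
    for _ in range(local_steps):
        x, y = rng.choice(data)
        grad = (w * x - y) * x
        w -= lr * grad
    return w


def fedavg(devices, rounds, local_steps, lr, num_sampled, seed=0):
    """FedAvg sketch: sample devices, run local SGD, average the local models."""
    rng = random.Random(seed)
    w = 0.0  # global model, a single scalar weight
    for _ in range(rounds):
        # Partial participation: only `num_sampled` devices take part per round
        # (uniform sampling without replacement is an assumption of this sketch).
        sampled = rng.sample(range(len(devices)), num_sampled)
        # Each sampled device starts from the current global model.
        local_models = [
            local_sgd(w, devices[k], lr, local_steps, rng) for k in sampled
        ]
        # The server averages the returned local models with equal weights.
        w = sum(local_models) / len(local_models)
    return w


def make_devices(num_devices, samples_per_device, w_true, seed=0):
    """Hypothetical synthetic data: inputs are non-i.i.d., labels are noiseless."""
    rng = random.Random(seed)
    devices = []
    for k in range(num_devices):
        center = 0.5 + k / num_devices  # device-specific input mean
        data = []
        for _ in range(samples_per_device):
            x = rng.gauss(center, 0.1)
            data.append((x, w_true * x))
        devices.append(data)
    return devices


devices = make_devices(num_devices=10, samples_per_device=20, w_true=2.0)
w_hat = fedavg(devices, rounds=200, local_steps=5, lr=0.1, num_sampled=3)
print(w_hat)
```

Because the labels here are noiseless, every device shares the same minimizer, so the averaged model approaches `w_true`. This toy setting says nothing about the convergence rates or linear speedup analyzed in the paper; it only shows the sample, local-update, average loop that those results concern.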

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Age of Information Optimization

Methods: Linear Regression