A New Theoretical Perspective on Data Heterogeneity in Federated Optimization

Jiayi Wang; Shiqiang Wang; Rong-Rong Chen; Mingyue Ji

arXiv:2407.15567 · cs.LG · July 23, 2024

Open Access

TL;DR

This paper introduces a new theoretical framework for understanding data heterogeneity in federated learning, showing that local updates can improve convergence under weaker assumptions than previously thought.

Contribution

It proposes the heterogeneity-driven pseudo-Lipschitz assumption, providing a refined convergence analysis that aligns better with empirical observations in federated learning.

Findings

01. Replacing the local Lipschitz constant with the heterogeneity-driven pseudo-Lipschitz constant yields smaller convergence bounds.

02. More local updates can improve convergence even when data heterogeneity is large.

03. FedAvg can outperform mini-batch SGD in certain regimes despite high gradient divergence.
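
The effect of local updates can be illustrated with a toy sketch. This is not the paper's experiment or analysis: the quadratic client objectives f_i(x) = 0.5 * ||x - c_i||^2, the function names `fedavg` and `distance_to_optimum`, the step size, and the client count are all assumptions chosen for illustration. Every client here shares the same curvature while the minimizers c_i are far apart, so gradients differ widely across clients, yet extra local steps still shrink the error per communication round. Setting `local_steps=1` with full gradients reduces to plain gradient descent on the averaged objective, which stands in for the mini-batch SGD baseline.

```python
import numpy as np


def fedavg(centers, rounds, local_steps, lr=0.1):
    """FedAvg on toy clients f_i(x) = 0.5 * ||x - c_i||^2 (illustrative only)."""
    x = np.zeros(centers.shape[1])  # global model, started at the origin
    for _ in range(rounds):
        local_models = []
        for c in centers:
            x_i = x.copy()  # each client starts from the global model
            for _ in range(local_steps):
                x_i -= lr * (x_i - c)  # full gradient of f_i at x_i
            local_models.append(x_i)
        x = np.mean(local_models, axis=0)  # server averages the local models
    return x


def distance_to_optimum(centers, x):
    """Distance from x to the minimizer of the averaged objective."""
    return float(np.linalg.norm(x - centers.mean(axis=0)))


rng = np.random.default_rng(0)
# Widely spread client minimizers: gradients differ a lot across clients.
centers = rng.normal(scale=10.0, size=(8, 5))

err_one = distance_to_optimum(centers, fedavg(centers, rounds=10, local_steps=1))
err_five = distance_to_optimum(centers, fedavg(centers, rounds=10, local_steps=5))
print(f"1 local step: {err_one:.4f}   5 local steps: {err_five:.4f}")
```

In this particular setting the error after R rounds of I local steps contracts by a factor of (1 - lr)^(R * I), so more local steps help regardless of how far apart the c_i are. With client objectives whose curvature differs, this clean behavior is not guaranteed.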

Abstract

In federated learning (FL), data heterogeneity is the main reason that existing theoretical analyses are pessimistic about the convergence rate. In particular, for many FL algorithms, the convergence rate grows dramatically when the number of local updates becomes large, especially when the product of the gradient divergence and local Lipschitz constant is large. However, empirical studies can show that more local updates can improve the convergence rate even when these two parameters are large, which is inconsistent with the theoretical findings. This paper aims to bridge this gap between theoretical understanding and practical performance by providing a theoretical analysis from a new perspective on data heterogeneity. In particular, we propose a new and weaker assumption compared to the local Lipschitz gradient assumption, named the heterogeneity-driven pseudo-Lipschitz assumption.…

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Cloud Computing and Resource Management · IoT and Edge/Fog Computing

Methods: Stochastic Gradient Descent