A New Theoretical Perspective on Data Heterogeneity in Federated Optimization
Jiayi Wang, Shiqiang Wang, Rong-Rong Chen, Mingyue Ji

TL;DR
This paper introduces a new theoretical framework for understanding data heterogeneity in federated learning, showing that local updates can improve convergence under weaker assumptions than previously thought.
Contribution
The paper proposes the heterogeneity-driven pseudo-Lipschitz assumption, which is weaker than the local Lipschitz gradient assumption, and uses it to derive a refined convergence analysis that aligns better with empirical observations in federated learning.
Findings
Replacing the local Lipschitz constant with the heterogeneity-driven pseudo-Lipschitz constant yields tighter convergence bounds.
More local updates can improve convergence even with large data heterogeneity.
FedAvg can outperform mini-batch SGD in certain regions despite high gradient divergence.
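The effect of the number of local updates can be illustrated with a minimal sketch. It is a toy example under stated assumptions, not the paper's experiment or analysis: each client holds a hypothetical scalar quadratic objective F_i(x) = 0.5 * a * (x - b)^2, the clients share the same curvature a but have different minimizers b (so the gradient divergence is nonzero), and full gradients are used instead of stochastic ones. The names `local_gradient`, `fedavg`, and `global_minimizer` are illustrative.

```python
"""Toy comparison of FedAvg with different numbers of local steps (illustrative only)."""


def local_gradient(a, b, x):
    """Gradient of the hypothetical client objective F_i(x) = 0.5 * a * (x - b)**2."""
    return a * (x - b)


def fedavg(clients, x0, lr, local_steps, rounds):
    """Run FedAvg with full local gradients; `clients` is a list of (a, b) pairs."""
    x = x0
    for _ in range(rounds):
        local_models = []
        for a, b in clients:
            x_i = x  # each client starts from the current global model
            for _ in range(local_steps):
                x_i -= lr * local_gradient(a, b, x_i)
            local_models.append(x_i)
        x = sum(local_models) / len(local_models)  # server averages the local models
    return x


def global_minimizer(clients):
    """Minimizer of the average objective f(x) = mean_i F_i(x)."""
    return sum(a * b for a, b in clients) / sum(a for a, _ in clients)


# Assumed toy data: same curvature, widely separated client minimizers.
clients = [(1.0, -10.0), (1.0, 2.0), (1.0, 14.0)]
x_star = global_minimizer(clients)

# Same number of communication rounds, different numbers of local steps.
x_one = fedavg(clients, x0=10.0, lr=0.1, local_steps=1, rounds=5)
x_ten = fedavg(clients, x0=10.0, lr=0.1, local_steps=10, rounds=5)
print(abs(x_one - x_star), abs(x_ten - x_star))
```

With `local_steps=1` and full gradients, each round reduces to a single gradient step on the average objective, which is the noiseless analogue of mini-batch SGD. In this toy setting the run with ten local steps ends closer to the global minimizer after the same number of rounds, even though the clients' gradients differ substantially. The shared curvature is a deliberate simplification; with differing curvatures across clients, multiple local steps can shift FedAvg's fixed point away from the global minimizer.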
Abstract
In federated learning (FL), data heterogeneity is the main reason that existing theoretical analyses are pessimistic about the convergence rate. In particular, for many FL algorithms, the convergence rate grows dramatically when the number of local updates becomes large, especially when the product of the gradient divergence and local Lipschitz constant is large. However, empirical studies can show that more local updates can improve the convergence rate even when these two parameters are large, which is inconsistent with the theoretical findings. This paper aims to bridge this gap between theoretical understanding and practical performance by providing a theoretical analysis from a new perspective on data heterogeneity. In particular, we propose a new and weaker assumption compared to the local Lipschitz gradient assumption, named the heterogeneity-driven pseudo-Lipschitz assumption.…
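For orientation, the two quantities named in the abstract are commonly formalized as follows in the FL literature. The symbols F_i (local objective of client i), f (global objective), N (number of clients), L (local Lipschitz constant), and ζ (gradient divergence bound) are assumptions made here for illustration; the paper's own notation and exact statements may differ.

```latex
% Global objective as the average of N local objectives (assumed notation)
f(x) = \frac{1}{N} \sum_{i=1}^{N} F_i(x)

% Local Lipschitz gradient assumption with constant L
\| \nabla F_i(x) - \nabla F_i(y) \| \le L \, \| x - y \| \quad \text{for all } x, y \text{ and all } i

% Bounded gradient divergence with constant \zeta
\| \nabla F_i(x) - \nabla f(x) \| \le \zeta \quad \text{for all } x \text{ and all } i
```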
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Cloud Computing and Resource Management · IoT and Edge/Fog Computing
Methods: Stochastic Gradient Descent
