Locally Adaptive Federated Learning

Sohom Mukherjee; Nicolas Loizou; Sebastian U. Stich

arXiv:2307.06306·cs.LG·May 15, 2024

TL;DR

This paper introduces locally adaptive federated learning algorithms that utilize local geometric information for each client, improving convergence and performance especially in heterogeneous and overparameterized settings.

Contribution

It proposes novel locally adaptive algorithms with uncoordinated stepsizes, analyzing their convergence and demonstrating superior performance over existing methods.

Findings

1. Outperforms FedAvg in non-convex experiments
2. Matches FedAvg in convex settings
3. Achieves better generalization performance

Abstract

Federated learning is a paradigm of distributed machine learning in which multiple clients coordinate with a central server to learn a model without sharing their own training data. Standard federated optimization methods such as Federated Averaging (FedAvg) ensure balance among the clients by using the same stepsize for local updates on all clients. However, this means that all clients need to respect the global geometry of the function, which could yield slow convergence. In this work, we propose locally adaptive federated learning algorithms that leverage the local geometric information of each client function. We show that such locally adaptive methods with uncoordinated stepsizes across all clients can be particularly efficient in interpolated (overparameterized) settings, and we analyze their convergence in the presence of heterogeneous data for convex and strongly convex settings.…
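A minimal sketch of the idea in the abstract, assuming each client computes its own capped stochastic Polyak stepsize $\gamma_i = \min\{(f_i(x) - \ell_i^*)/(c\|\nabla f_i(x)\|^2), \gamma_b\}$ (SPS_max from Loizou et al., 2021, which the reviewers identify as the basis of FedSPS) and the server averages the resulting models as in FedAvg. The quadratic client losses, the shared minimizer `S` (an interpolated setting), full-batch gradients, $\ell_i^* = 0$, `c = 0.5`, `gamma_b = 1.0` and the round/step counts are all hypothetical choices for illustration, not the paper's algorithm or experimental setup.

```python
import math

# Hypothetical setup (not from the paper): three clients with 2-D quadratic
# losses f_i(x) = 0.5 * sum_j a_ij * (x_j - s_j)^2 that share the minimizer S.
# This is an interpolated setting: f_i^* = 0 for every client, so the lower
# bound l_i^* = 0 used in the Polyak stepsize is exact.
S = [3.0, -2.0]
CURVATURES = [[1.0, 10.0], [10.0, 1.0], [4.0, 4.0]]


def loss_and_grad(a, x):
    """Return f_i(x) and its gradient for the client with curvatures a."""
    d = [xj - sj for xj, sj in zip(x, S)]
    loss = 0.5 * sum(aj * dj * dj for aj, dj in zip(a, d))
    grad = [aj * dj for aj, dj in zip(a, d)]
    return loss, grad


def polyak_step(loss, grad, lower_bound=0.0, c=0.5, gamma_b=1.0):
    """Capped Polyak stepsize min{(f_i - l_i^*) / (c * ||g||^2), gamma_b}."""
    g_sq = sum(g * g for g in grad)
    if g_sq == 0.0:  # stationary point: no step is needed
        return 0.0
    return min((loss - lower_bound) / (c * g_sq), gamma_b)


def fed_polyak(x0, rounds=30, local_steps=5):
    """Each client runs local Polyak-stepsize GD; the server averages."""
    x = list(x0)
    for _ in range(rounds):
        client_models = []
        for a in CURVATURES:
            xi = list(x)  # start from the current server model
            for _ in range(local_steps):
                loss, grad = loss_and_grad(a, xi)
                gamma = polyak_step(loss, grad)  # computed from this client only
                xi = [xj - gamma * gj for xj, gj in zip(xi, grad)]
            client_models.append(xi)
        # Server update: coordinate-wise average of the client models.
        x = [sum(coords) / len(client_models) for coords in zip(*client_models)]
    return x


x_final = fed_polyak([0.0, 0.0])
print("distance to shared minimizer:", math.dist(x_final, S))
```

No stepsize is shared across clients: in this sketch client 3 (isotropic curvature 4) always uses 0.25, while clients 1 and 2 pick stepsizes between 0.1 and 1 depending on the current iterate.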

Peer Reviews

Decision · Submitted to ICLR 2024

Reviewer 01 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

1. The proposed algorithm performs locally adaptive gradient steps; in contrast, most existing adaptive gradient methods in FL apply adaptivity on the server side.
2. Theoretical analysis is provided. Approximate convergence is guaranteed for the convex and strongly convex cases, and exact convergence is shown in two special cases: under the interpolation condition and under a small-stepsize condition.
3. Some numerical experiments are provided to validate the proposed algorithm. The numerical studies i

Weaknesses

1. The proposed algorithm seems to be a direct extension of the stochastic Polyak stepsize to the federated learning setting. What is the major difficulty in this application?
2. The theoretical analysis of heterogeneity is not convincing. $\sigma_f^2$ is used as a measure of client heterogeneity in the paper; however, it is only an upper bound (Proposition 1) of a more classical measure of heterogeneity, which means the proposed measure is weaker. In fact, if $l^*$ is chosen to be 0 (as in the
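To make the heterogeneity point concrete, a small numerical sketch with hypothetical 1-D quadratic clients $f_i(x) = \tfrac12 a_i (x - b_i)^2 + c_i$. It assumes the SPS-style definition $\sigma_f^2 = \frac1n \sum_i (f_i(x^*) - \ell_i^*)$ and compares it with the gradient dissimilarity at the optimum, $\zeta_*^2 = \frac1n \sum_i \|\nabla f_i(x^*)\|^2$. For $L$-smooth $f_i$ with $\ell_i^* \le f_i^*$, smoothness gives $\|\nabla f_i(x^*)\|^2 \le 2L(f_i(x^*) - \ell_i^*)$, hence $\zeta_*^2 \le 2L\sigma_f^2$; whether this is exactly the statement of Proposition 1 is an assumption here.

```python
from statistics import mean


def heterogeneity(a, b, c, lower_bounds):
    """Measures for hypothetical clients f_i(x) = 0.5*a_i*(x-b_i)^2 + c_i."""
    # Minimizer of the average loss (1/n) sum_i f_i: curvature-weighted mean of b_i.
    x_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(a)
    # Gradient dissimilarity at the optimum: mean over clients of |f_i'(x*)|^2.
    zeta_sq = mean((ai * (x_star - bi)) ** 2 for ai, bi in zip(a, b))
    # Assumed SPS-style measure: mean over clients of f_i(x*) - l_i^*.
    sigma_f_sq = mean(
        0.5 * ai * (x_star - bi) ** 2 + ci - li
        for ai, bi, ci, li in zip(a, b, c, lower_bounds)
    )
    return x_star, zeta_sq, sigma_f_sq


# Case 1: heterogeneous clients, exact lower bounds l_i^* = f_i^* = 0.
x_star, zeta_sq, sigma_f_sq = heterogeneity([1.0, 3.0], [0.0, 4.0], [0.0, 0.0], [0.0, 0.0])
L = 3.0  # smoothness constant: max_i a_i
print(x_star, zeta_sq, sigma_f_sq, zeta_sq <= 2 * L * sigma_f_sq)

# Case 2: identical clients (no heterogeneity) with f_i^* = 5 but l_i^* = 0.
_, zeta_id, sigma_id = heterogeneity([1.0, 1.0], [1.0, 1.0], [5.0, 5.0], [0.0, 0.0])
print(zeta_id, sigma_id)
```

In the first case $\zeta_*^2 = 9 \le 2L\sigma_f^2 = 18$. In the second, $\zeta_*^2 = 0$ while $\sigma_f^2 = 5$: with $\ell^* = 0$, the measure reports heterogeneity among identical clients, which is the situation the reviewer points to.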

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

The method is simple and seems effective in some cases based on the experimental results. FL is an important and active topic that would be interesting to the ICLR audience. The experiments compare against many baseline methods with adaptive learning rates.

Weaknesses

1. Algorithm design: in my understanding, FedSPS is mainly an FL version of SPS [Loizou et al., 2021]. This extension is rather standard, and the algorithmic novelty is not particularly strong.
2. Theory: the theoretical analysis combines the techniques of SPS with standard FL convergence proofs and only studies convex loss functions. Many results in the paper require a very small upper bound $\gamma_b$ on the learning rate (typically for non-iid clients, which are common in FL), which significantly limi
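A sketch of why a small upper bound $\gamma_b$ weakens the adaptivity, assuming the capped Polyak rule $\gamma_i = \min\{(f_i(x) - \ell_i^*)/(c\|\nabla f_i(x)\|^2), \gamma_b\}$. For a hypothetical 1-D quadratic client $f_i(x) = \tfrac12 a_i (x - b_i)^2$ with $\ell_i^* = 0$ and $c = 1/2$, the uncapped term equals $1/a_i$, the client's inverse curvature. Once $\gamma_b \le 1/\max_i a_i$, the cap is always active and every client uses $\gamma_b$.

```python
def capped_polyak(a, b, x, c=0.5, lower_bound=0.0, gamma_b=1.0):
    """Capped Polyak stepsize at x for a hypothetical client f(x) = 0.5*a*(x-b)^2."""
    loss = 0.5 * a * (x - b) ** 2
    grad = a * (x - b)
    if grad == 0.0:  # at the client minimizer the step length does not matter
        return 0.0
    return min((loss - lower_bound) / (c * grad * grad), gamma_b)


curvatures = [1.0, 4.0, 10.0]  # hypothetical per-client curvatures a_i
for gamma_b in (1.0, 0.05):
    steps = [capped_polyak(a, 0.0, 1.0, gamma_b=gamma_b) for a in curvatures]
    print(f"gamma_b={gamma_b}: per-client stepsizes {steps}")
```

With $\gamma_b = 1$ the three clients use 1, 0.25 and 0.1; with $\gamma_b = 0.05$ all three use 0.05, i.e., a single constant stepsize, as in FedAvg.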

Reviewer 03 · Rating 5 (marginally below the acceptance threshold) · Confidence 3

Strengths

1. Originality: The paper introduces an approach to federated learning that addresses the limitations of existing stepsize tuning methods and provides a solution that leverages local geometric information.
2. Quality: The authors attempt to build a theoretical foundation for their proposed algorithms, analyzing their convergence in various settings.

Weaknesses

1. The connection to the Polyak stepsize and the rationale behind the specific choices of $\gamma_1$ and $\gamma_2$ in Example 1 could be clarified by referring to the definition in Loizou et al. 2021.
2. The choice of a noise standard deviation (sd) of 10 in Figure 1's caption requires clarification, especially given the observation that SPS does not seem to converge.
3. The paper should provide a clear definition of $f^*$ in Eq. 5, addressing whether it refers to the global minimum of th
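A worked example of why the distinction matters under heterogeneous data, using two hypothetical 1-D quadratic clients: the minimum of the average loss, $f^* = \min_x \frac1n \sum_i f_i(x)$, can be strictly larger than the average of the client minima, $\frac1n \sum_i f_i^*$. The two coincide only when a single point minimizes every $f_i$ (interpolation).

```latex
\begin{aligned}
f_1(x) &= \tfrac12 x^2, \qquad f_2(x) = \tfrac12 (x-2)^2, \qquad f_1^* = f_2^* = 0,\\
f(x) &= \tfrac12\bigl(f_1(x) + f_2(x)\bigr) = \tfrac14 x^2 + \tfrac14 (x-2)^2,\\
f'(x) &= \tfrac12 x + \tfrac12 (x-2) = x - 1 = 0 \;\Rightarrow\; x^* = 1,\\
f^* &= f(1) = \tfrac14 + \tfrac14 = \tfrac12 \;>\; 0 = \tfrac12\bigl(f_1^* + f_2^*\bigr).
\end{aligned}
```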

Code & Models

Repositories

IssamLaradji/sps
pytorch · Official

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Recommender Systems and Techniques

Methods: Semi-Pseudo-Label