Locally Adaptive Federated Learning
Sohom Mukherjee, Nicolas Loizou, Sebastian U. Stich

TL;DR
This paper introduces locally adaptive federated learning algorithms that utilize local geometric information for each client, improving convergence and performance especially in heterogeneous and overparameterized settings.
Contribution
It proposes novel locally adaptive algorithms with uncoordinated stepsizes, analyzing their convergence and demonstrating superior performance over existing methods.
Findings
Outperforms FedAvg in non-convex experiments
Matches FedAvg in convex settings
Achieves better generalization performance
Abstract
Federated learning is a paradigm of distributed machine learning in which multiple clients coordinate with a central server to learn a model, without sharing their own training data. Standard federated optimization methods such as Federated Averaging (FedAvg) ensure balance among the clients by using the same stepsize for local updates on all clients. However, this means that all clients need to respect the global geometry of the function, which could yield slow convergence. In this work, we propose locally adaptive federated learning algorithms that leverage the local geometric information for each client function. We show that such locally adaptive methods with uncoordinated stepsizes across all clients can be particularly efficient in interpolated (overparameterized) settings, and analyze their convergence in the presence of heterogeneous data for convex and strongly convex settings.…
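The idea in the abstract, local updates whose stepsize each client picks from its own loss and gradient, can be illustrated with a small simulation. The sketch below is not the authors' implementation. It assumes a synthetic least-squares problem in which all clients share one exact solution (an interpolated setting), a capped stochastic Polyak stepsize of the form min((loss - lower_bound) / (c * ||grad||^2), gamma_b) with lower bound 0, and plain server averaging. All function names and the values of `c`, `gamma_b`, `rounds`, and `local_steps` are hypothetical choices made for illustration.

```python
import numpy as np

def make_clients(num_clients=4, samples=20, dim=50, seed=0):
    """Synthetic least-squares clients that share one exact solution (interpolation)."""
    rng = np.random.default_rng(seed)
    x_star = rng.normal(size=dim)
    clients = []
    for i in range(num_clients):
        # Assumption: a different feature scale per client mimics different local geometry.
        A = rng.normal(size=(samples, dim)) * (1.0 + 3.0 * i)
        clients.append((A, A @ x_star))
    return clients, x_star

def loss_and_grad(x, a, b):
    """Single-sample loss 0.5 * (a @ x - b)**2 and its gradient."""
    r = a @ x - b
    return 0.5 * r * r, r * a

def polyak_stepsize(loss, grad, lower_bound=0.0, c=0.5, gamma_b=1.0):
    """Capped stochastic Polyak stepsize, computed from local information only."""
    g2 = float(grad @ grad)
    if g2 == 0.0:
        return 0.0  # already at a stationary point of this sample
    return min((loss - lower_bound) / (c * g2), gamma_b)

def global_loss(x, clients):
    """Average of the clients' mean squared-error losses."""
    return float(np.mean([0.5 * np.mean((A @ x - b) ** 2) for A, b in clients]))

def run(rounds=50, local_steps=10, seed=1):
    rng = np.random.default_rng(seed)
    clients, x_star = make_clients()
    x = np.zeros_like(x_star)
    history = [global_loss(x, clients)]
    for _ in range(rounds):
        local_models = []
        for A, b in clients:
            y = x.copy()
            for _ in range(local_steps):
                j = rng.integers(len(b))  # sample one local data point
                loss, grad = loss_and_grad(y, A[j], b[j])
                # Each client chooses its own stepsize; nothing is coordinated.
                y = y - polyak_stepsize(loss, grad) * grad
            local_models.append(y)
        x = np.mean(local_models, axis=0)  # server averages the local models
        history.append(global_loss(x, clients))
    return x, x_star, history

if __name__ == "__main__":
    x, x_star, history = run()
    print("initial loss:", history[0])
    print("final loss:", history[-1])
    print("distance to shared solution:", np.linalg.norm(x - x_star))
```

In this toy setup no client needs a hand-tuned learning rate, even though the clients' feature scales differ by an order of magnitude. A single shared stepsize would have to be small enough for the worst-scaled client.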
Peer Reviews
Decision: Submitted to ICLR 2024
1. The proposed algorithm performs local adaptive gradient steps; in contrast, most existing adaptive gradient methods in FL perform adaptive gradients at the server side. 2. Theoretical analysis is provided. Approximate convergence is guaranteed for the convex and strongly convex cases, and exact convergence is provided under two special cases: the interpolation condition and the small step-size condition. 3. Some numerical experiments are provided to validate the proposed algorithm. The numerical studies i
1. The proposed algorithm seems to be a direct extension of the Stochastic Polyak stepsize to the federated learning setting. What is the major difficulty of this application? 2. The theoretical analysis of heterogeneity is not convincing. $\sigma_f^2$ is used as a measure of client heterogeneity in the paper; however, it is just an upper bound (Proposition 1) of some more classical measure of heterogeneity, which means the proposed measure is weaker. In fact, if $l^*$ is chosen to be 0 (as in the
The method is simple and seems effective in some cases based on the experimental results. FL is an important and hot topic which would be interesting to the ICLR audience. The experiments compare against many baseline methods with adaptive learning rates.
1. Algorithm design: in my understanding, FedSPS is mainly an FL version of SPS [Loizou et al., 2021]. This extension is rather standard and the algorithmic novelty is not particularly strong. 2. Theory: the theoretical analysis combines the techniques of SPS with standard FL convergence proofs, and only studies convex loss functions. Many results in the paper require a very small learning rate upper bound $\gamma_b$ (typically for non-iid clients, which is common in FL), which significantly limi
1. Originality: The paper introduces an approach to federated learning, addressing the limitations of existing stepsize tuning methods and providing a solution that leverages local geometric information. 2. Quality: The authors attempt to build a theoretical foundation for their proposed algorithms, analyzing their convergence in various settings.
1. The connection to the Polyak stepsize and the rationale behind the specific choices of $\gamma_1$ and $\gamma_2$ in Example 1 could be clarified by referring to the definition in Loizou et al. 2021. 2. The choice of a noise standard deviation (sd) of 10 in Figure 1's caption requires clarification, especially given the observation that SPS does not seem to converge. 3. The paper should provide a clear definition of $f^*$ in Eq. 5, addressing whether it refers to the global minimum of th
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Recommender Systems and Techniques
Methods: Semi-Pseudo-Label
