Painless Federated Learning: An Interplay of Line-Search and Extrapolation
Geetika, Somya Tyagi, Bapi Chatterjee

TL;DR
This paper introduces FedSLS and FedExpSLS algorithms that adapt line search and extrapolation techniques to federated learning, improving convergence rates and performance in heterogeneous and noisy data environments.
Contribution
It proposes novel federated stochastic line search and extrapolation methods with theoretical convergence guarantees and empirical validation.
Findings
FedSLS achieves linear convergence for strongly convex problems.
FedExpSLS improves empirical performance with extrapolation.
Methods outperform popular federated algorithms across various tasks.
Abstract
The classical line search for learning rate (LR) tuning in the stochastic gradient descent (SGD) algorithm can tame the convergence slowdown due to data-sampling noise. In a federated setting, where client heterogeneity slows global convergence, line search can be adapted accordingly. In this work, we show that a stochastic variant of line search tames the heterogeneity in federated optimization in addition to that due to client-local gradient noise. To this end, we introduce the Federated Stochastic Line Search (FedSLS) algorithm and show that it achieves deterministic rates in expectation. Specifically, FedSLS offers linear convergence for strongly convex objectives even with partial client participation. Recently, the extrapolation of the server's LR has shown promise for improved empirical performance for federated learning. To benefit from extrapolation, we…
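As a rough illustration of the idea the abstract describes, the sketch below runs Armijo backtracking line search inside each client's local steps and averages the resulting updates on the server. It is not the authors' FedSLS or FedExpSLS: the helper names (`armijo_step`, `make_client`, `federated_round`), the scalar quadratic client losses, the use of full client gradients instead of stochastic ones, and the fixed `server_lr` multiplier standing in for extrapolation are all assumptions made for this example. The clients share one minimizer, mirroring the interpolation condition discussed in the reviews.

```python
def armijo_step(f, grad, x, lr_max=1.0, c=0.5, beta=0.5, max_backtracks=30):
    """One gradient step whose step size is found by Armijo backtracking.

    Starting from lr_max, shrink lr by beta until
    f(x - lr * g) <= f(x) - c * lr * g**2 holds (scalar x for simplicity).
    """
    g = grad(x)
    fx = f(x)
    lr = lr_max
    for _ in range(max_backtracks):
        if f(x - lr * g) <= fx - c * lr * g * g:
            break
        lr *= beta
    return x - lr * g, lr


def make_client(a, target):
    """Hypothetical client loss 0.5 * a * (x - target)**2 and its gradient."""
    def f(x):
        return 0.5 * a * (x - target) ** 2

    def grad(x):
        return a * (x - target)

    return f, grad


def federated_round(x_global, clients, local_steps=3, server_lr=1.0):
    """One round: each client runs local line-search steps, server averages.

    server_lr > 1 would extrapolate the averaged update; here it is a fixed
    constant (an assumption), not an adaptive rule from the paper.
    """
    updates = []
    for f, grad in clients:
        x = x_global
        for _ in range(local_steps):
            x, _ = armijo_step(f, grad, x)
        updates.append(x - x_global)
    avg_update = sum(updates) / len(updates)
    return x_global + server_lr * avg_update


# Heterogeneous curvatures, shared minimizer at 2.0 (interpolation holds).
clients = [make_client(a, target=2.0) for a in (0.5, 4.0, 10.0)]
x = 0.0
for _ in range(20):
    x = federated_round(x, clients)
```

Because each client picks its own step size, no learning rate has to be tuned per client even though the curvatures differ by a factor of 20; in this toy setup `x` ends up very close to the shared minimizer 2.0.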
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
* Introduces the Armijo condition, which can be used to overcome the bias in SGD without the sample-wise interpolation for a single local solver.
* Theoretical guarantees showing that FedSLS and FedExpSLS achieve deterministic rates in FL under standard assumptions along with the Armijo condition and the interpolation condition.
* Proposed methods are evaluated on multiple datasets (CIFAR-10/100, FEMNIST, Shakespeare) and models (ResNet-18, LSTM, etc.), consistently outperforming FedAvg, FedExp
**Lack of clear motivation and positioning:** The motivation for the work feels underdeveloped. The introduction reads more like a problem formulation than a compelling argument for why this specific direction is needed. Many algorithms already aim to speed up federated learning, so it’s unclear why line search, and particularly Armijo line search, is the right tool for the job. The paper would benefit from a clearer explanation of what gap this approach fills and why combining line search with
- The algorithm makes sense and appears simple to implement, and shows that line search is a natural thing to try in FL
- Linear convergence under partial participation is somewhat expected given the analogous result in the centralized setting, but it is good to see the authors are able to derive it.
- FedExpSLS looks pretty strong, with solid convergence results compared to other methods
- The line search overhead seems to not be too much, again showing the algorithm is not too costly

- The work assumes that there is an optimum that is shared across all the clients. This is a pretty strong assumption
- Communication round-based plots are good, but there should probably be wall-clock based ones (or total client iteration-based) because of the additional client compute used by the line search
- Tuning grids should be included in the experiments
- Ideally in optimization, training runs should be compared against prior work in a leaderboard-like manner. See for example https://ke
(1) The authors propose two novel algorithms and provide the corresponding analysis under a set of assumptions. The assumptions are clearly stated and the theorems are proved rigorously. (2) The paper combines line search with extrapolation and obtains state-of-the-art results. (3) Empirical results further validate the theoretical claims.
(1) My primary concern is the set of assumptions used in the paper: the authors assume the interpolation condition in all cases of their convergence guarantee. Although previous studies have provided certain justifications, such an assumption is unlikely to be verifiable in general. In fact, I do not quite understand why such an assumption is needed in the case of FedSLS and FedExpSLS. The authors mentioned FedExProx, and in that case this assumption is needed because of the algorithm connects t
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Data Quality and Management
