Orthogonal Calibration for Asynchronous Federated Learning
Jiayun Zhang, Shuheng Li, Haiyu Huang, Xiaofan Yu, Rajesh K. Gupta, Jingbo Shang

TL;DR
OrthoFL introduces an orthogonal calibration framework for asynchronous federated learning that decouples global and local updates, improving accuracy and training speed over existing methods.
Contribution
The paper proposes OrthoFL, a novel orthogonal calibration method that reduces interference between global and local updates in asynchronous federated learning.
Findings
Improves accuracy by 9.6% over baseline methods.
Achieves 12× speedup compared to synchronous approaches.
Outperforms state-of-the-art asynchronous methods across various scenarios.
Abstract
Asynchronous federated learning mitigates the inefficiency of conventional synchronous aggregation by integrating updates as they arrive and adjusting their influence based on staleness. Due to asynchrony and data heterogeneity, learning objectives at the global and local levels are inherently inconsistent -- global optimization trajectories may conflict with ongoing local updates. Existing asynchronous methods simply distribute the latest global weights to clients, which can overwrite local progress and cause model drift. In this paper, we propose OrthoFL, an orthogonal calibration framework that decouples global and local learning progress and adjusts global shifts to minimize interference before merging them into local models. In OrthoFL, clients and the server maintain separate model weights. Upon receiving an update, the server aggregates it into the global weights via a moving…
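The calibration idea in the abstract, which one of the reviews below characterizes as excluding the component of the global shift that is projected onto the local update, can be illustrated with a small NumPy sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `orthogonal_calibrate`, the dictionary-of-layers representation, and the `eps` guard are all hypothetical, and the projection is applied per layer only because the reviews mention a layer-wise projection.

```python
import numpy as np

def orthogonal_calibrate(global_shift, local_update, eps=1e-12):
    """Hypothetical sketch: per layer, remove from the global shift its
    component along the local update, leaving a shift orthogonal to it.

    global_shift, local_update: dicts mapping layer name -> np.ndarray
    of the same shape (an assumed representation, not from the paper).
    """
    calibrated = {}
    for name, g in global_shift.items():
        u = local_update[name]
        # Squared norm of the local update (np.vdot flattens both arrays).
        denom = float(np.vdot(u, u))
        if denom < eps:
            # No local movement in this layer: nothing to project out.
            calibrated[name] = g.copy()
            continue
        # Scalar projection coefficient <g, u> / <u, u>.
        coeff = float(np.vdot(g, u)) / denom
        # Subtract the parallel component; the remainder is orthogonal to u.
        calibrated[name] = g - coeff * u
    return calibrated
```

In this sketch, the calibrated shift for each layer has zero inner product with that layer's local update, so adding it to the local weights does not move them along the direction the client has just been optimizing. How the paper combines the calibrated shift with the local model is not specified here.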
Peer Reviews
Decision: Submitted to ICLR 2026
* The core idea of using orthogonal calibration to decouple and manage the interference between global and local updates is novel and intuitive.
* The paper is well-written and clearly presented, making the motivation and methodology easy to follow.
* The experiments are comprehensive, covering multiple datasets, a strong set of synchronous and asynchronous baselines, and various simulated delay distributions.
* The paper claims layer-wise projection is "memory-efficient" in Section 4.1. However, storing the model weights for projection, whether layer-by-layer or as a single flattened vector, seems to require the same amount of storage space. Could the authors clarify this claim?
* The method excludes the component of the global shift that is projected onto the local update. Should this information always be discarded? Since local models can be biased due to non-IID data, this projected component (whe
1. The experiments are comprehensive, though some details have minor flaws.
2. Most methods mentioned in the paper are described in detail in the appendix.
3. Taken together, the experiments in the main text and the appendix show that the proposed method has some generalization capability and cutting-edge potential.
1. The authors mention improving model performance by optimizing weights on both the client and server sides. Adding one or two adaptive weighting methods to the comparative experiments would help demonstrate the model's advantages.
2. Table 2 presents the performance of the proposed model and the comparison models across five datasets, but the subsequent ablation experiments include only three. The remaining two datasets should be included as well.
3. The experimental sections i
The use of orthogonal projection to decouple conflicting updates is an elegant, geometrically intuitive, and novel solution. The evaluation is a major strength, using strong baselines, diverse tasks, and realistic delay simulations to convincingly demonstrate the method's robustness and superiority.
1. The theoretical analysis is limited. The analysis in Appendix I is a one-step analysis, showing that under certain conditions, a single step of ORTHOFL is better than a baseline. This is insightful but does not constitute a full convergence proof over the entire training process. For a top-tier conference like ICLR, a more complete theoretical treatment guaranteeing convergence would significantly strengthen the paper.
2. The ORTHOFL algorithm, particularly the server-side implementation (Alg
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Advanced Memory and Neural Computing
