GitFL: Adaptive Asynchronous Federated Learning using Version Control
Ming Hu, Zeke Xia, Zhihao Yue, Jun Xia, Yihao Huang, Yang Liu, and Mingsong Chen

TL;DR
GitFL introduces an adaptive asynchronous federated learning framework inspired by version control systems, effectively managing device stragglers and improving training speed and accuracy in AIoT scenarios.
Contribution
The paper proposes a novel asynchronous FL framework using version control concepts and RL-based device selection to enhance performance and robustness.
Findings
Achieves up to 2.64X training acceleration.
Improves inference accuracy by up to 7.88%.
Effectively manages straggling devices and model staleness.
Abstract
As a promising distributed machine learning paradigm that enables collaborative training without compromising data privacy, Federated Learning (FL) has been increasingly used in AIoT (Artificial Intelligence of Things) design. However, due to the lack of efficient management of straggling devices, existing FL methods greatly suffer from the problems of low inference accuracy and long training time. Things become even worse when taking various uncertain factors (e.g., network delays, performance variances caused by process variation) existing in AIoT scenarios into account. To address this issue, this paper proposes a novel asynchronous FL framework named GitFL, whose implementation is inspired by the famous version control system Git. Unlike traditional FL, the cloud server of GitFL maintains a master model (i.e., the global model) together with a set of branch models indicating the…
