TL;DR
This paper introduces FedBCGD, a communication-efficient federated learning method that splits model parameters into blocks, reducing communication overhead and accelerating convergence for large-scale models like Vision Transformers.
Contribution
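It proposes a block coordinate gradient descent approach for federated learning, together with an accelerated variant that adds client drift control and stochastic variance reduction. The authors describe it as the first work on parameter block communication for training large-scale deep models.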
Findings
Theoretical communication complexity is lower than that of existing methods by a factor of 1/N, where N is the number of parameter blocks.
Faster convergence compared to existing methods.
Empirical results demonstrate superior performance.
Abstract
Although Federated Learning has been widely studied in recent years, each communication round still incurs high overhead for large-scale models such as Vision Transformer. To lower the communication complexity, we propose a novel Federated Block Coordinate Gradient Descent (FedBCGD) method for communication efficiency. The proposed method splits model parameters into several blocks, including a shared block, and enables each client to upload a specific parameter block, which can significantly reduce communication overhead. Moreover, we also develop an accelerated FedBCGD algorithm (called FedBCGD+) with client drift control and stochastic variance reduction. To the best of our knowledge, this paper is the first work on parameter block communication for training large-scale deep models. We also provide the convergence analysis for the proposed algorithms. Our theoretical…
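The block-upload idea in the abstract can be illustrated with a toy round of training. This is a minimal sketch under stated assumptions, not the authors' implementation: the helper names (`split_blocks`, `local_update`, `server_round`), the round-robin assignment of blocks to clients, the placement of the shared block at the tail of the parameter vector, the plain per-coordinate averaging, and the quadratic toy loss are all hypothetical choices made for illustration. The client drift control and variance reduction of FedBCGD+ are not shown.

```python
def split_blocks(dim, n_blocks, shared_size):
    """Partition indices 0..dim-1 into n_blocks disjoint blocks plus one shared block.

    Assumption: the shared block is simply the last `shared_size` coordinates.
    """
    body = dim - shared_size
    shared = list(range(body, dim))
    base, extra = divmod(body, n_blocks)
    blocks, start = [], 0
    for j in range(n_blocks):
        size = base + (1 if j < extra else 0)
        blocks.append(list(range(start, start + size)))
        start += size
    return shared, blocks


def local_update(weights, target, lr=0.1, steps=5):
    """Toy local training: gradient descent on 0.5 * ||w - target||^2."""
    w = list(weights)
    for _ in range(steps):
        w = [wi - lr * (wi - ti) for wi, ti in zip(w, target)]
    return w


def server_round(global_w, client_targets, shared, blocks):
    """One round: every client trains the full model locally, then uploads
    only its assigned block plus the shared block. The server averages each
    coordinate over the clients that uploaded it.

    Returns the new global weights and the number of scalars uploaded.
    """
    sums = [0.0] * len(global_w)
    counts = [0] * len(global_w)
    uploaded = 0
    for k, target in enumerate(client_targets):
        local_w = local_update(global_w, target)
        # Assumption: round-robin block assignment by client index.
        idx = blocks[k % len(blocks)] + shared
        for i in idx:
            sums[i] += local_w[i]
            counts[i] += 1
        uploaded += len(idx)
    new_w = [sums[i] / counts[i] if counts[i] else global_w[i]
             for i in range(len(global_w))]
    return new_w, uploaded


# Usage: 8 clients, a 10-parameter model, 4 blocks, 2 shared parameters.
shared, blocks = split_blocks(dim=10, n_blocks=4, shared_size=2)
targets = [[1.0] * 10 for _ in range(8)]
new_w, uploaded = server_round([0.0] * 10, targets, shared, blocks)
full_upload = 8 * 10  # scalars sent if every client uploaded the whole model
print(uploaded, full_upload)
```

In this toy setup each client sends its own block plus the shared block instead of the full vector, so the per-round upload shrinks roughly in proportion to the number of blocks, with the shared block as a fixed extra cost per client.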
