Save It All: Enabling Full Parameter Tuning for Federated Large Language Models via Cycle Block Gradient Descent
Lin Wang, Zhichao Wang, Xiaoying Tang

TL;DR
This paper introduces FedCyBGD, a federated learning method for large language models that enables full-parameter tuning with minimal resource use by combining cycle block gradient descent with a compression scheme.
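The summary above does not spell out the FedCyBGD procedure or its compression scheme, so the following is only a generic sketch of the idea of cyclic block updates in a federated loop, not the authors' algorithm. It assumes a toy least-squares model, a parameter vector split into equal blocks, and clients visited in round-robin order, with each client training and uploading only its assigned block. All function names (`local_block_update`, `cyclic_block_training`, `make_clients`, `mse`) are hypothetical.

```python
import random


def mse(w, data):
    """Mean squared error of the linear model w on (x, y) pairs."""
    return sum((sum(wi * xi for wi, xi in zip(w, x)) - y) ** 2 for x, y in data) / len(data)


def local_block_update(w, block, data, lr=0.05, steps=5):
    """Client side: run gradient steps on the local loss, changing only indices in `block`."""
    w = list(w)  # private working copy of the current parameters
    for _ in range(steps):
        grad = [0.0] * len(block)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, idx in enumerate(block):
                grad[j] += 2.0 * err * x[idx] / len(data)
        for j, idx in enumerate(block):
            w[idx] -= lr * grad[j]
    return [w[idx] for idx in block]  # upload only the trained block


def cyclic_block_training(clients, dim, num_blocks, rounds):
    """Server side: cycle over clients; client k trains block k % num_blocks."""
    w = [0.0] * dim
    size = dim // num_blocks  # assumption: dim is divisible by num_blocks
    blocks = [list(range(b * size, (b + 1) * size)) for b in range(num_blocks)]
    for _ in range(rounds):
        for k, data in enumerate(clients):
            block = blocks[k % num_blocks]
            new_values = local_block_update(w, block, data)
            for idx, value in zip(block, new_values):
                w[idx] = value  # write the returned block back into the model
    return w


def make_clients(true_w, num_clients=4, samples=20, seed=0):
    """Synthetic, noise-free regression data for each client (illustration only)."""
    rng = random.Random(seed)
    clients = []
    for _ in range(num_clients):
        data = []
        for _ in range(samples):
            x = [rng.uniform(-1.0, 1.0) for _ in true_w]
            data.append((x, sum(a * b for a, b in zip(true_w, x))))
        clients.append(data)
    return clients
```

In this sketch every parameter is eventually trained, because the block assignment cycles over the whole vector, while each individual client only computes gradients for and uploads a fraction of it. A real LLM setting would replace the toy model with transformer layers grouped into blocks.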
Contribution
The paper presents FedCyBGD, a new approach that allows full parameter training of LLMs in federated learning with reduced communication, computation, and memory costs.
Findings
Achieves state-of-the-art performance in federated LLM training.
Reduces communication and resource costs significantly.
Enables full parameter tuning in federated settings.
Abstract
The advent of large language models (LLMs) has revolutionized the deep learning paradigm, yielding impressive results across a wide array of tasks. However, the pre-training or fine-tuning of LLMs within a federated learning (FL) framework poses substantial challenges, including considerable computational and memory resource demands, as well as communication bottlenecks between servers and clients. Existing solutions either make the unrealistic assumption that the entire model is exchanged for training, or apply parameter-efficient fine-tuning methods from centralized learning to train LLMs in FL, which tend to underperform during the training or fine-tuning stages because parameter updates are confined to a limited search subspace. In this paper, we introduce a novel method for the efficient training and fine-tuning of LLMs in FL, with minimal resource consumption. Our approach, termed FedCyBGD,…
Taxonomy
Topics: Privacy-Preserving Technologies in Data
