FedKRSO: Communication and Memory Efficient Federated Fine-Tuning of Large Language Models
Guohao Yang, Tongle Wu, Yuanxiong Guo, Ying Sun, Yanmin Gong

TL;DR
FedKRSO introduces a novel federated fine-tuning method for large language models that significantly reduces communication and memory costs while maintaining high performance, suitable for resource-constrained edge devices.
Contribution
The paper proposes FedKRSO, a new federated fine-tuning approach that uses random subspace optimization and accumulator updates to improve efficiency and performance.
Findings
Achieves high accuracy on the GLUE benchmark in federated settings.
Reduces communication and memory overhead compared to existing methods.
Closely matches the performance of full fine-tuning in federated scenarios.
Abstract
Fine-tuning is essential to adapt general-purpose large language models (LLMs) to domain-specific tasks. As a privacy-preserving framework that leverages decentralized data for collaborative model training, Federated Learning (FL) is gaining popularity in LLM fine-tuning, but it remains challenging due to the high cost of transmitting full model parameters and computing full gradients on resource-constrained clients. While Parameter-Efficient Fine-Tuning (PEFT) methods are widely used in FL to reduce communication and memory costs, they often sacrifice model performance compared to full fine-tuning (FFT). This paper proposes FedKRSO (Federated K-Seed Random Subspace Optimization), a novel method that enables communication- and memory-efficient FFT of LLMs in federated settings. In FedKRSO, clients update the model within a shared set of random low-dimensional subspaces generated by the server to save memory…
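The abstract is cut off before it specifies the algorithm, so the following is only a generic sketch of the idea it names: a random subspace that both server and client can regenerate from a shared integer seed, with the client's update living in that low-dimensional subspace. It is not the authors' FedKRSO procedure. The function names, the Gaussian projection, the plain gradient step, and the choice of NumPy are all assumptions made for illustration.

```python
import numpy as np


def projection_from_seed(seed: int, dim: int, rank: int) -> np.ndarray:
    """Regenerate a (dim x rank) Gaussian projection from an integer seed.

    Hypothetical helper: anyone holding the same seed rebuilds the same
    matrix, so only the seed needs to be shared, not the matrix itself.
    """
    rng = np.random.default_rng(seed)
    return rng.standard_normal((dim, rank)) / np.sqrt(rank)


def client_subspace_step(weight, grad, seed, rank, lr):
    """One illustrative update restricted to a seeded random subspace.

    Assumption: a plain gradient step, not the optimizer used in the paper.
    Returns the updated weight and the low-dimensional update.
    """
    proj = projection_from_seed(seed, weight.shape[1], rank)  # (n, r)
    low_dim_grad = grad @ proj               # (m, r): gradient seen in the subspace
    low_dim_update = -lr * low_dim_grad      # only this (m, r) array is kept around
    new_weight = weight + low_dim_update @ proj.T  # map the update back to (m, n)
    return new_weight, low_dim_update


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weight = rng.standard_normal((8, 16))
    grad = rng.standard_normal((8, 16))
    new_weight, update = client_subspace_step(weight, grad, seed=7, rank=4, lr=0.1)
    print(update.shape, np.linalg.matrix_rank(new_weight - weight))
```

In this sketch the low-dimensional update has shape (m, r) rather than (m, n), which is where the memory saving would come from, and the projection never has to be transmitted because it is rebuilt from the seed.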
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Big Data and Digital Economy
