Safely Learning with Private Data: A Federated Learning Framework for Large Language Model
JiaYing Zheng, HaiNan Zhang, LingXiang Wang, WangJie Qiu, HongWei Zheng, ZhiMing Zheng

TL;DR
This paper introduces FL-GLM, a federated learning framework for large language models that enhances privacy and efficiency by preventing data leakage and enabling parallel training across distributed private data sources.
Contribution
The paper proposes a novel FL framework for LLMs that secures private data against attacks and improves training efficiency through client-side input/output placement, key-encryption, and optimized batching strategies.
Findings
FL-GLM achieves comparable performance to centralized models on NLP tasks.
The framework effectively prevents embedding gradient and peer-client reverse engineering attacks.
Experimental results show improved training efficiency with various acceleration methods.
Abstract
Private data, being larger and of higher quality than public data, can greatly improve large language models (LLMs). However, due to privacy concerns, this data is often dispersed across multiple silos, making its secure use for LLM training a challenge. Federated learning (FL) is an ideal solution for training models with distributed private data, but traditional frameworks like FedAvg are unsuitable for LLMs because of the high computational demands they place on clients. An alternative, split learning, offloads most training parameters to the server while training the embedding and output layers locally, making it more suitable for LLMs. Nonetheless, it faces significant challenges in security and efficiency. Firstly, the gradients of the embeddings are prone to attacks, leading to potential reverse engineering of private data. Furthermore, the server's limitation of handling only one client's training…
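The split-learning placement described in the abstract can be illustrated with a minimal sketch. It is not the FL-GLM implementation: the class names `Client` and `Server`, the toy sizes `VOCAB` and `DIM`, and the single ReLU layer standing in for the server-side transformer blocks are all assumptions made for illustration. It shows only the forward pass, so gradient exchange, key-encryption, and batching are left out.

```python
import random

VOCAB, DIM = 8, 4          # toy sizes, chosen only for illustration
rng = random.Random(0)     # fixed seed so the run is repeatable


def rand_matrix(rows, cols):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class Client:
    """Hypothetical client: keeps the raw tokens, embedding, and output layers."""

    def __init__(self):
        self.embedding = rand_matrix(VOCAB, DIM)    # token id -> hidden vector
        self.output_head = rand_matrix(VOCAB, DIM)  # hidden vector -> vocab logits

    def embed(self, token_ids):
        return [self.embedding[t] for t in token_ids]

    def logits(self, hidden_states):
        return [matvec(self.output_head, h) for h in hidden_states]


class Server:
    """Hypothetical server: holds the bulk of the parameters (one layer here)."""

    def __init__(self):
        self.weight = rand_matrix(DIM, DIM)

    def forward(self, hidden_states):
        # ReLU(W h) is a placeholder for the stack of transformer blocks.
        return [[max(0.0, y) for y in matvec(self.weight, h)] for h in hidden_states]


client, server = Client(), Server()
private_tokens = [3, 1, 4]                       # never leaves the client
sent_to_server = client.embed(private_tokens)    # only hidden vectors are sent
returned = server.forward(sent_to_server)        # server-side computation
logits = client.logits(returned)                 # predictions formed on the client
print(len(logits), len(logits[0]))               # prints: 3 8
```

In this arrangement the server sees hidden vectors rather than token ids, which is why the abstract singles out the embedding gradients as the point an attacker would target.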
Taxonomy
Topics: Privacy-Preserving Technologies in Data
