Federated Split BERT for Heterogeneous Text Classification
Zhengyang Li, Shijing Si, Jianzong Wang, Jing Xiao

TL;DR
This paper introduces FedSplitBERT, a federated learning framework that splits the BERT encoder layers into a local part and a global part to handle heterogeneous data and reduce communication costs, improving performance in privacy-sensitive NLP tasks.
Contribution
The paper proposes FedSplitBERT, a novel split-layer approach for federated BERT training that addresses data heterogeneity and communication efficiency.
Findings
Outperforms baseline methods significantly.
Reduces communication cost by a factor of 11.9 when combined with quantization.
Effective in handling non-IID data in federated NLP tasks.
Abstract
Pre-trained BERT models have achieved impressive performance in many natural language processing (NLP) tasks. However, in many real-world situations, textual data are decentralized across many clients and cannot be uploaded to a central server because of privacy protection and regulations. Federated learning (FL) enables multiple clients to collaboratively train a global model while keeping their local data private. A few studies have investigated BERT in the federated learning setting, but the problem of performance loss caused by heterogeneous (e.g., non-IID) data across clients remains under-explored. To address this issue, we propose a framework, FedSplitBERT, which handles heterogeneous data and decreases the communication cost by splitting the BERT encoder layers into a local part and a global part. The local part parameters are trained by the local client only while the global part…
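The split described in the abstract can be illustrated with a toy sketch. The abstract is truncated here, so the treatment of the global part is an assumption: the sketch averages global-layer weights across clients in the style of federated averaging, which may differ from what the paper does. The split index SPLIT_AT, the choice of which side of the split stays local, the "layer.<i>" naming, and the helper names are all hypothetical, and each layer is stood in for by a short list of floats rather than real BERT weights.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # layer name -> flat weight vector (toy stand-in)

NUM_LAYERS = 12  # encoder depth of BERT-base
SPLIT_AT = 4     # hypothetical split: layers [0, SPLIT_AT) stay on the client


def is_global(layer_name: str) -> bool:
    """Return True if a layer named 'layer.<i>' belongs to the shared part."""
    index = int(layer_name.split(".")[1])
    return index >= SPLIT_AT


def average_global(clients: List[Params]) -> Params:
    """Element-wise mean of the global layers across all clients (assumed rule)."""
    averaged: Params = {}
    for name in clients[0]:
        if not is_global(name):
            continue  # local layers are never sent to the server
        columns = zip(*(client[name] for client in clients))
        averaged[name] = [sum(col) / len(clients) for col in columns]
    return averaged


def federated_round(clients: List[Params]) -> Params:
    """Aggregate the global layers and write them back to every client."""
    shared = average_global(clients)
    for client in clients:
        for name, weights in shared.items():
            client[name] = list(weights)  # copy so clients do not alias each other
    return shared


def make_client(value: float) -> Params:
    """Build a toy client whose every layer holds the same constant weights."""
    return {f"layer.{i}": [value, value] for i in range(NUM_LAYERS)}


clients = [make_client(1.0), make_client(3.0)]
shared = federated_round(clients)
uploaded_fraction = len(shared) / NUM_LAYERS  # share of layers communicated
```

After one round in this toy setup, the local layers of each client keep their own values while the global layers hold the cross-client mean. Only the global layers cross the network, which is where a split of this kind saves communication; the 11.9 times figure reported above also involves quantization and is not reproduced by this sketch.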
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Traffic Prediction and Management Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Weight Decay · Dropout · Adam · WordPiece · Linear Warmup With Linear Decay · Attention Dropout
