ProtegoFed: Backdoor-Free Federated Instruction Tuning with Interspersed Poisoned Data
Haodong Zhao, Jinming Hu, Zhaomin Wu, Zongru Wu, Wei Du, Junyi Hou, Caibei Zhao, Zhuosheng Zhang, Bingsheng He, Gongshen Liu

TL;DR
ProtegoFed is a novel federated instruction tuning framework that effectively detects and removes backdoor poisoned data across clients, ensuring model integrity without compromising utility.
Contribution
It introduces a robust gradient-based detection method and a collaborative clustering mechanism to defend against pervasive poisoned data in federated instruction tuning.
Findings
Detects 92-100% of poisoned samples
Reduces attack success rate to near zero
Maintains task utility
Abstract
Federated Instruction Tuning (FIT) enables collaborative instruction tuning of large language models across multiple organizations (clients) in a cross-silo setting without requiring the sharing of private instructions. Recent findings on natural backdoors and the existing training data collection method suggest that poisoned samples may be pervasive and inadvertently embedded in real-world datasets, potentially distributed across all clients, even if the clients are benign. This work systematically examine this threat in FIT, demonstrating that existing defenses are ineffective when poisoned data is interspersed among all clients. Addressing this challenge entails two major difficulties: identifying the distinctive characteristics of poisoned samples at each client and enabling collaborative defense when some clients are heavily dominated by poisoned samples. To address these…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsPrivacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning · Advanced Graph Neural Networks
