ProtegoFed: Backdoor-Free Federated Instruction Tuning with Interspersed Poisoned Data

Haodong Zhao; Jinming Hu; Zhaomin Wu; Zongru Wu; Wei Du; Junyi Hou; Caibei Zhao; Zhuosheng Zhang; Bingsheng He; Gongshen Liu

arXiv:2603.00516·cs.CR·March 3, 2026

Open Access

TL;DR

ProtegoFed is a federated instruction tuning framework that detects and removes backdoor-poisoned data across clients without compromising task utility.

Contribution

The paper introduces a gradient-based detection method and a collaborative clustering mechanism to defend against pervasive poisoned data in federated instruction tuning.

Findings

1. Detects 92-100% of poisoned samples
2. Reduces attack success rate to near zero
3. Maintains task utility
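
The first two findings are stated in terms of a detection rate and an attack success rate. The excerpt here does not give the paper's exact metric definitions, so the following is only a minimal sketch of how such metrics are conventionally computed: detection rate as recall over the truly poisoned samples, and attack success rate as the share of responses to triggered prompts that contain the attacker's target string. The function names, the boolean-flag inputs, and the substring match are all illustrative assumptions, not the authors' evaluation code.

```python
from typing import Sequence


def detection_rate(is_poisoned: Sequence[bool], is_flagged: Sequence[bool]) -> float:
    """Fraction of truly poisoned samples that the detector flagged (recall).

    Hypothetical helper: both inputs are per-sample booleans in the same order.
    """
    if len(is_poisoned) != len(is_flagged):
        raise ValueError("label and flag sequences must have the same length")
    total_poisoned = sum(1 for p in is_poisoned if p)
    if total_poisoned == 0:
        raise ValueError("no poisoned samples to evaluate")
    flagged_poisoned = sum(1 for p, f in zip(is_poisoned, is_flagged) if p and f)
    return flagged_poisoned / total_poisoned


def attack_success_rate(outputs: Sequence[str], target: str) -> float:
    """Fraction of responses to triggered prompts that contain the target string.

    Hypothetical helper: a plain substring match stands in for whatever
    success criterion a given attack actually defines.
    """
    if not outputs:
        raise ValueError("no model outputs to evaluate")
    hits = sum(1 for text in outputs if target in text)
    return hits / len(outputs)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    labels = [True, True, False, True]
    flags = [True, False, False, True]
    print(detection_rate(labels, flags))
    print(attack_success_rate(["ok", "HACKED now", "fine", "sure"], "HACKED"))
```

Task utility would be measured separately, on clean prompts, with whatever benchmark metric the task defines.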

Abstract

Federated Instruction Tuning (FIT) enables collaborative instruction tuning of large language models across multiple organizations (clients) in a cross-silo setting without requiring the sharing of private instructions. Recent findings on natural backdoors and existing training data collection methods suggest that poisoned samples may be pervasive and inadvertently embedded in real-world datasets, potentially distributed across all clients, even if the clients are benign. This work systematically examines this threat in FIT, demonstrating that existing defenses are ineffective when poisoned data is interspersed among all clients. Addressing this challenge entails two major difficulties: identifying the distinctive characteristics of poisoned samples at each client and enabling collaborative defense when some clients are heavily dominated by poisoned samples. To address these…

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning · Advanced Graph Neural Networks