SABRE-FL: Selective and Accurate Backdoor Rejection for Federated Prompt Learning
Momin Ahmad Khan, Yasra Chandio, Fatima Muhammad Anwar

TL;DR
This paper investigates backdoor vulnerabilities in Federated Prompt Learning for vision-language models, proposing SABRE-FL, a lightweight defense that effectively detects and filters poisoned updates without raw data access.
Contribution
It is the first to analyze backdoor attacks in Federated Prompt Learning and introduces SABRE-FL, a novel embedding-space anomaly detector for robust defense.
Findings
SABRE-FL significantly reduces backdoor success rates across datasets.
The method maintains high accuracy on clean inputs.
It outperforms existing defenses in empirical evaluations.
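The two quantities behind these findings, accuracy on clean inputs and backdoor success rate, can be made concrete with a small sketch. It assumes the common definition of attack success rate (the fraction of triggered inputs from non-target classes that are classified as the attacker's target class); the summary above does not state the exact evaluation protocol, so the function names and toy labels below are illustrative only.

```python
from typing import Sequence


def clean_accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of clean inputs whose predicted class equals the true label."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("need at least one sample")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def attack_success_rate(
    triggered_preds: Sequence[int], labels: Sequence[int], target: int
) -> float:
    """Fraction of triggered inputs predicted as `target`.

    Assumption: samples whose true class already is the target are excluded,
    since predicting the target for them is not evidence of a backdoor.
    """
    if len(triggered_preds) != len(labels):
        raise ValueError("triggered_preds and labels must have the same length")
    eligible = [p for p, y in zip(triggered_preds, labels) if y != target]
    if not eligible:
        raise ValueError("no samples outside the target class")
    return sum(p == target for p in eligible) / len(eligible)


# Toy data (hypothetical): six test images, three classes, target class 2.
labels = [0, 1, 2, 1, 0, 2]
clean_preds = [0, 1, 2, 1, 0, 1]      # model output on unmodified images
triggered_preds = [2, 2, 2, 1, 2, 2]  # model output on the same images with a trigger
target = 2

acc = clean_accuracy(clean_preds, labels)
asr = attack_success_rate(triggered_preds, labels, target)
print(f"clean accuracy = {acc:.3f}, attack success rate = {asr:.3f}")
```

A defense of the kind described here aims to push the second number down while leaving the first largely unchanged.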
Abstract
Federated Prompt Learning has emerged as a communication-efficient and privacy-preserving paradigm for adapting large vision-language models like CLIP across decentralized clients. However, the security implications of this setup remain underexplored. In this work, we present the first study of backdoor attacks in Federated Prompt Learning. We show that when malicious clients inject visually imperceptible, learnable noise triggers into input images, the global prompt learner becomes vulnerable to targeted misclassification while still maintaining high accuracy on clean inputs. Motivated by this vulnerability, we propose SABRE-FL, a lightweight, modular defense that filters poisoned prompt updates using an embedding-space anomaly detector trained offline on out-of-distribution data. SABRE-FL requires no access to raw client data or labels and generalizes across diverse datasets. We show,…
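The server-side filtering step can be sketched as follows. The peer reviews further down describe the defense as scoring clients with a detector and excluding the top-m most suspicious ones from aggregation each round; the sketch follows that outline. Everything else is an assumption made for illustration: the function names, the idea that each client is summarized by one embedding vector, the plain averaging of the kept prompts, and in particular the scorer, which is a simple distance-from-reference stand-in and not the paper's trained detector.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def make_distance_scorer(reference: Sequence[float]) -> Callable[[Sequence[float]], float]:
    """Stand-in anomaly scorer: Euclidean distance from a reference point.

    Assumption: a real detector would be a trained model; this placeholder
    only gives the filter below something to rank clients by.
    """
    def score(embedding: Sequence[float]) -> float:
        return math.dist(embedding, reference)
    return score


def filter_and_aggregate(
    prompts: Dict[str, Vector],
    embeddings: Dict[str, Vector],
    score_fn: Callable[[Sequence[float]], float],
    m: int,
) -> Tuple[Vector, List[str]]:
    """Drop the m highest-scoring clients, then average the remaining prompts."""
    if not 0 <= m < len(prompts):
        raise ValueError("m must satisfy 0 <= m < number of clients")
    scores = {cid: score_fn(embeddings[cid]) for cid in prompts}
    # Rank clients from most to least suspicious and reject the top m.
    ranked = sorted(prompts, key=lambda cid: scores[cid], reverse=True)
    rejected = ranked[:m]
    kept = [cid for cid in prompts if cid not in rejected]
    dim = len(next(iter(prompts.values())))
    global_prompt = [
        sum(prompts[cid][i] for cid in kept) / len(kept) for i in range(dim)
    ]
    return global_prompt, rejected


# Toy round (hypothetical values): client "c" has an outlying embedding.
prompts = {"a": [1.0, 1.0], "b": [1.2, 0.8], "c": [9.0, -9.0]}
embeddings = {"a": [0.1, 0.0], "b": [0.0, 0.1], "c": [3.0, 3.0]}
scorer = make_distance_scorer([0.0, 0.0])

global_prompt, rejected = filter_and_aggregate(prompts, embeddings, scorer, m=1)
print("rejected clients:", rejected)
print("aggregated prompt:", global_prompt)
```

Because the filter always removes exactly m clients, a value of m larger than the true number of attackers discards benign updates, and a smaller value lets some poisoned updates through. That trade-off is what the reviewers' questions about choosing m are pointing at.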
Peer Reviews
Decision: ICLR 2026 Poster
1. This paper studies backdoor attacks in federated prompt learning (FPL), where only prompt parameters, not full model weights, are shared; this is timely and relevant as CLIP-style adaptations in FL are increasingly used.
2. The method shows potential generalizability: a detector trained on Caltech-101 transfers to datasets not seen during training.
3. The paper provides clear motivation and visualization that support the defense’s intuition; the results look promising in tackling the backdoor
1. The methodology section lacks important details.
(i) The paper describes the trigger as a learnable noise pattern but does not explain how it is optimized, what loss function or parameters are used, and how it interacts with the local prompt updates (e.g., whether it uses SGD, PGD, or another generator).
(ii) The defense critically depends on the parameter $m$, the number of clients excluded from aggregation each round, but there is no principled method or empirical guideline for setting this v
- Pioneers the study of backdoor threats in the emerging FPL paradigm.
- Introduces a well-motivated and FPL-specific backdoor mechanism based on learnable, imperceptible noise triggers.
- Clear writing and strong organization, aided by effective visual explanations of both attack and defense designs.
1. The paper claims the noise triggers are *visually imperceptible*, but lacks direct image comparisons. Including visual examples (original vs. triggered) or a qualitative study would strengthen this claim.
2. SABRE-FL removes the top-*m* suspicious clients, assuming *m* is known. An analysis of sensitivity to inaccurate estimates of *m* would clarify robustness in real-world settings.
3. The distinction between the proposed attack and a federated adaptation of BadCLIP should be elaborated: what
1. This work presents the first systematic study of backdoor attack vulnerabilities within the Federated Prompt Learning (FPL) paradigm. This exploratory contribution is significant as it illuminates a critical and previously unexamined attack dimension.
2. The paper proposes SABRE-FL, a novel server-side defense mechanism. The core of this mechanism involves using a lightweight MLP, trained offline on an out-of-distribution (OOD) dataset, to detect embedding-space anomalies.
3. A key advantage of
1. SABRE-FL is essentially an anomaly detector. In heterogeneous (Non-IID) FL scenarios, natural shifts in data distribution are an inherent characteristic. The paper provides no evidence that the detector D can distinguish between malicious offsets caused by the attack and benign shifts arising from this data heterogeneity. This casts serious doubt on the method's effectiveness in realistic FL settings.
2. The defense mechanism relies on an assumption that is difficult to satisfy in practice: the
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning · Privacy-Preserving Technologies in Data
