A Vision-Language Pre-training Model-Guided Approach for Mitigating Backdoor Attacks in Federated Learning

Keke Gai; Dongjue Wang; Jing Yu; Liehuang Zhu; Qi Wu

arXiv:2508.10315 · cs.LG · October 14, 2025


TL;DR

This paper introduces CLIP-Fed, a federated learning (FL) defense framework that uses vision-language pre-training models to mitigate backdoor attacks without relying on homogeneous client data or a clean server-side dataset.

Contribution

The paper proposes a new FL backdoor defense method guided by vision-language models. It remains effective under Non-IID client data while better balancing defense effectiveness against privacy preservation.

Findings

1. Significant reduction in attack success rate on the CIFAR-10 and CIFAR-10-LT datasets.
2. Improved main-task accuracy compared with existing methods.
3. Effective elimination of trigger-label correlations through a prototype contrastive loss.
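The summary credits a prototype contrastive loss but does not give its formula. The sketch below assumes a common InfoNCE-style form: each class prototype is the mean feature vector of that class, and a sample's loss is the negative log-softmax of its cosine similarity to its own class prototype versus all prototypes, divided by a temperature `tau`. The function names, `tau=0.5`, and the toy features are hypothetical, and CLIP-Fed's exact loss may differ.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def class_prototypes(features: List[Sequence[float]], labels: List[int]) -> Dict[int, List[float]]:
    """Class prototype = mean feature vector of the samples carrying that label."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for feat, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(feat)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], feat)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def prototype_contrastive_loss(feature, label, prototypes, tau=0.5):
    """Pull `feature` toward its own class prototype and push it away from
    the other prototypes (hypothetical InfoNCE-style form)."""
    logits = {c: cosine(feature, p) / tau for c, p in prototypes.items()}
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits.values())
    log_denominator = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return log_denominator - logits[label]


# Toy 2-D features for two classes (hypothetical values).
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
protos = class_prototypes(features, labels)

# The same feature costs little when labeled with the class whose prototype
# it is close to, and much more when labeled with the other class.
print(prototype_contrastive_loss([1.0, 0.0], 0, protos))
print(prototype_contrastive_loss([1.0, 0.0], 1, protos))
```

A feature that sits far from the prototype of its assigned label incurs a high loss. In this toy setting, that is the kind of feature-label mismatch a contrastive prototype objective penalizes.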

Abstract

Defending against backdoor attacks in Federated Learning (FL) under heterogeneous client data distributions is limited by the difficulty of balancing effectiveness and privacy preservation, and most existing methods rely heavily on the assumption of homogeneous client data distributions or on the availability of a clean server dataset. In this paper, we propose an FL backdoor defense framework, named CLIP-Fed, that uses the zero-shot learning capabilities of vision-language pre-training models. Our scheme overcomes the limitations that Non-IID data impose on defense effectiveness by integrating pre-aggregation and post-aggregation defense strategies. CLIP-Fed aligns the knowledge of the global model and CLIP on the augmented dataset using prototype contrastive loss and Kullback-Leibler divergence, so that class prototype deviations caused by backdoor samples are ensured and the correlation between trigger…
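The abstract says CLIP-Fed aligns the global model with CLIP using a Kullback-Leibler divergence term, but the available text does not spell out that term. The sketch below assumes a standard knowledge-distillation setup. CLIP's zero-shot scores are computed as scaled cosine similarities between an image embedding and one text-prompt embedding per class. These scores are temperature-softened into a teacher distribution, and KL(teacher || student) is measured against the global model's softened prediction. The embeddings, `logit_scale=100.0`, `temperature=2.0`, and all function names are hypothetical placeholders, not the authors' implementation.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax with max subtraction for stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def zero_shot_logits(image_emb, text_embs, logit_scale=100.0):
    """CLIP-style zero-shot scoring: scaled cosine similarity between one image
    embedding and each class's text-prompt embedding (hypothetical inputs)."""
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]
    img = normalize(image_emb)
    return [logit_scale * sum(a * b for a, b in zip(img, normalize(t))) for t in text_embs]


def clip_alignment_loss(global_logits, clip_logits, temperature=2.0):
    """Distillation-style KL term: CLIP's zero-shot distribution is the
    teacher, the global model's prediction is the student."""
    teacher = softmax(clip_logits, temperature)
    student = softmax(global_logits, temperature)
    return kl_divergence(teacher, student)


# Toy embeddings standing in for CLIP encoder outputs (hypothetical values).
image_emb = [0.9, 0.1, 0.0]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
clip_logits = zero_shot_logits(image_emb, text_embs)

# A global model whose prediction disagrees with CLIP (it favors class 1).
global_logits = [0.2, 3.0, 0.1]
print(clip_alignment_loss(global_logits, clip_logits))
```

The KL term is zero when the global model's softened prediction matches CLIP's and grows as they diverge. Minimizing it therefore pulls the global model toward CLIP's zero-shot view of each sample.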
