Proactive Adversarial Defense: Harnessing Prompt Tuning in Vision-Language Models to Detect Unseen Backdoored Images

Kyle Stein; Andrew Arash Mahyari; Guillermo Francia; Eman El-Sheikh

arXiv:2412.08755 · cs.CV · April 9, 2025

Open Access

TL;DR

This paper presents a prompt-tuning method for vision-language models that detects unseen backdoored images during both training and inference, reaching 86% average detection accuracy on benchmark datasets.

Contribution

It introduces an approach that uses prompt tuning in vision-language models to detect backdoored samples directly, including samples carrying unseen triggers, addressing a gap left by existing defenses that mainly mitigate backdoors through weight fine-tuning.

Findings

1. Achieves 86% average detection accuracy on benchmark datasets.
2. Effectively detects unseen backdoor triggers during training and inference.
3. Sets a new standard in backdoor defense performance.

Abstract

Backdoor attacks pose a critical threat by embedding hidden triggers into inputs, causing models to misclassify them into target labels. While extensive research has focused on mitigating these attacks in object recognition models through weight fine-tuning, much less attention has been given to detecting backdoored samples directly. Given the vast datasets used in training, manual inspection for backdoor triggers is impractical, and even state-of-the-art defense mechanisms fail to fully neutralize their impact. To address this gap, we introduce a groundbreaking method to detect unseen backdoored images during both training and inference. Leveraging the transformative success of prompt tuning in Vision Language Models (VLMs), our approach trains learnable text prompts to differentiate clean images from those with hidden backdoor triggers. Experiments demonstrate the exceptional efficacy…
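
The abstract states only that learnable text prompts are trained to separate clean images from backdoored ones. As a hedged illustration of the general prompt-tuning idea, and not the authors' implementation, the sketch below keeps both "encoders" frozen, learns only one context vector per class (similar in spirit to class-specific context in CoOp-style prompt tuning), and scores an image with a softmax over temperature-scaled image-text similarities. Everything concrete is an assumption: the linear stand-in text encoder `W`, the class-name embeddings `CLASS_EMBEDS`, the values of `TAU` and `LR`, the helper names `class_probs`, `mean_loss` and `train_prompts`, and the synthetic 3-D "image features". Because the stand-in encoder is linear, the training problem here reduces to softmax regression over the prompt vectors; a real VLM would backpropagate through a frozen transformer text encoder instead.

```python
import math

# --- Hypothetical toy stand-ins (assumptions, not the paper's setup) ---
# A real system would use a frozen CLIP-style image and text encoder. Here the
# image "features" are hand-made 3-D vectors and the frozen text encoder is a
# fixed linear map W, so the example stays dependency-free.
W = [[1.0, 0.1, 0.0],
     [0.0, 1.0, 0.1],
     [0.1, 0.0, 1.0]]
CLASS_EMBEDS = [[1.0, 0.0, 0.0],   # frozen embedding of the word "clean"
                [0.0, 1.0, 0.0]]   # frozen embedding of the word "backdoored"
TAU = 5.0   # logit scale (inverse temperature); assumed value
LR = 0.05   # learning rate; assumed value


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(z):
    top = max(z)  # subtract the max for numerical stability
    e = [math.exp(x - top) for x in z]
    s = sum(e)
    return [x / s for x in e]


def class_probs(img_feat, ctx):
    """Softmax over scaled similarities between the image and each class prompt."""
    f = normalize(img_feat)
    logits = []
    for ctx_k, emb_k in zip(ctx, CLASS_EMBEDS):
        # Prompt k = learnable context ctx_k added to the frozen class token.
        text_feat = matvec(W, [c + e for c, e in zip(ctx_k, emb_k)])
        logits.append(TAU * sum(a * b for a, b in zip(f, text_feat)))
    return softmax(logits)


def mean_loss(data, ctx):
    """Mean cross-entropy of the true labels."""
    return sum(-math.log(class_probs(x, ctx)[y]) for x, y in data) / len(data)


def train_prompts(data, steps=500):
    """Gradient descent on the prompt context only; W and CLASS_EMBEDS stay frozen."""
    dim = len(CLASS_EMBEDS[0])
    ctx = [[0.0] * dim for _ in CLASS_EMBEDS]
    w_t = transpose(W)
    for _ in range(steps):
        grads = [[0.0] * dim for _ in CLASS_EMBEDS]
        for x, y in data:
            p = class_probs(x, ctx)
            # For this linear stand-in, d logit_k / d ctx_k = TAU * W^T f.
            wf = matvec(w_t, normalize(x))
            for k in range(len(CLASS_EMBEDS)):
                coef = TAU * (p[k] - (1.0 if k == y else 0.0)) / len(data)
                for j in range(dim):
                    grads[k][j] += coef * wf[j]
        ctx = [[c - LR * g for c, g in zip(ck, gk)] for ck, gk in zip(ctx, grads)]
    return ctx


# Synthetic labelled features: 0 = clean, 1 = backdoored (illustrative only).
data = [([0.1, 0.0, 1.0], 0), ([0.0, 0.2, 0.9], 0),
        ([1.0, 0.1, 0.0], 1), ([0.9, 0.0, 0.2], 1)]
zero_ctx = [[0.0] * 3 for _ in CLASS_EMBEDS]
ctx = train_prompts(data)
print("loss before:", mean_loss(data, zero_ctx), "after:", mean_loss(data, ctx))
```

With untuned (all-zero) context, the hand-made prompts misclassify both backdoored samples; after tuning only the context vectors, the loss drops and the prompts separate the two groups. A detector in this style would then flag an incoming image as backdoored when the "backdoored" prompt wins the softmax.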

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Digital Media Forensic Detection

Methods: Softmax · Attention Is All You Need