One Pixel is All I Need
Deng Siqin, Zhou Xiaoyi

TL;DR
This paper uncovers vulnerabilities of Vision Transformers to backdoor attacks, introduces a tool called PSDM to analyze model sensitivity, and proposes a highly efficient data poisoning attack called WorstVIT that requires minimal modifications.
Contribution
It reveals the effectiveness of quasi-triggers in ViTs, introduces PSDM for sensitivity analysis, and presents WorstVIT, a minimalistic backdoor attack method for ViTs.
Findings
ViTs are more susceptible to quasi-triggers than CNNs.
PSDM shows patch-like sensitivity patterns in ViTs.
WorstVIT achieves successful attacks with minimal poisoning.
Abstract
Vision Transformers (ViTs) have achieved record-breaking performance in various visual tasks. However, concerns about their robustness against backdoor attacks have grown. Backdoor attacks involve associating a specific trigger with a target label, causing the model to predict the attacker-specified label when the trigger is present, while correctly identifying clean images. We found that ViTs exhibit higher attack success rates for quasi-triggers (patterns different from but similar to the original training triggers) compared to CNNs. Moreover, some backdoor features in clean samples can suppress the original trigger, making quasi-triggers more effective. To better understand and exploit these vulnerabilities, we developed a tool called the Perturbation Sensitivity Distribution Map (PSDM). PSDM computes and sums gradients over many inputs to show how sensitive the model is to small changes…
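The abstract describes PSDM only as computing and summing input gradients over many inputs. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes a toy linear softmax classifier in place of a ViT, a cross-entropy loss, and summation of absolute gradient values so that opposite signs do not cancel. All function and variable names (input_gradients, sensitivity_map, and so on) are hypothetical.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def input_gradients(W, b, x, y):
    """Gradient of the cross-entropy loss w.r.t. each input.

    Toy stand-in model (an assumption, not a ViT): logits = x @ W + b.
    x: (n, d) flattened images, y: (n,) integer labels,
    W: (d, k) weights, b: (k,) bias. Returns an (n, d) array.
    """
    p = softmax(x @ W + b)              # class probabilities, (n, k)
    p[np.arange(len(y)), y] -= 1.0      # dL/dlogits = p - onehot(y)
    return p @ W.T                      # chain rule back to the input

def sensitivity_map(W, b, images, labels):
    """Sum absolute input gradients over all samples, one value per pixel.

    Taking the absolute value before summing is an assumption of this sketch.
    """
    n, h, w = images.shape
    grads = input_gradients(W, b, images.reshape(n, h * w), labels)
    return np.abs(grads).sum(axis=0).reshape(h, w)

# Hypothetical toy data: 200 random 8x8 grayscale images, 3 classes.
rng = np.random.default_rng(0)
H, WIDTH, K, N = 8, 8, 3, 200
weights = rng.normal(size=(H * WIDTH, K))
weights[: 4 * WIDTH] = 0.0              # toy model ignores the top four rows
bias = np.zeros(K)
images = rng.random((N, H, WIDTH))
labels = rng.integers(0, K, size=N)

psdm = sensitivity_map(weights, bias, images, labels)
```

In this toy setup the map is zero over the rows the model ignores and positive elsewhere, which is the kind of spatial pattern such a map is meant to expose. With a real ViT or CNN, the input gradients would come from automatic differentiation rather than the closed-form expression used here.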
Taxonomy
Topics: CCD and CMOS Imaging Sensors
