Universal Soldier: Using Universal Adversarial Perturbations for Detecting Backdoor Attacks
Xiaoyun Xu, Oguzhan Ersoy, Stjepan Picek

TL;DR
This paper introduces a novel backdoor detection method using universal adversarial perturbations, exploiting their differing behaviors in backdoored versus clean models to identify and reverse engineer backdoors effectively.
Contribution
The paper proposes Universal Soldier, a method that uses universal adversarial perturbations (UAPs) to detect backdoors and reverse engineer their triggers; the authors report that it outperforms existing techniques.
Findings
UAPs generated from backdoored models need fewer perturbations to mislead the model than UAPs generated from clean models.
Backdoored models' UAPs exploit shortcuts created by triggers.
The method achieves high detection accuracy on multiple datasets.
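The first two findings can be illustrated with a small toy experiment. The sketch below is not the paper's Universal Soldier procedure: it assumes a hand-built linear classifier instead of a trained deep network, and the names `targeted_uap`, `W_clean`, `W_backdoor`, and `trigger_dim` are hypothetical. The "backdoor" is simply one large weight that ties an otherwise unused input feature to the target class. A single perturbation is then optimized to push every input to the target class, and its L2 norm is compared across the two models.

```python
import numpy as np

def targeted_uap(W, X, target, step=0.05, max_steps=2000):
    """Search for ONE perturbation that sends every row of X to `target`.

    Toy assumption: the model is linear, logits = (x + delta) @ W.T.
    Returns (delta, success_flag).
    """
    delta = np.zeros(X.shape[1])
    onehot = np.eye(W.shape[0])[target]
    for _ in range(max_steps):
        logits = (X + delta) @ W.T
        if np.all(logits.argmax(axis=1) == target):
            return delta, True
        # Softmax probabilities (shifted for numerical stability).
        z = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs = z / z.sum(axis=1, keepdims=True)
        # Gradient of the mean cross-entropy toward `target` w.r.t. delta.
        grad = ((probs - onehot) @ W).mean(axis=0)
        # Fixed-length step along the normalized descent direction.
        delta -= step * grad / (np.linalg.norm(grad) + 1e-12)
    return delta, False

rng = np.random.default_rng(0)
num_classes, dim, target, trigger_dim = 3, 8, 0, 7

# Clean inputs never activate the (hypothetical) trigger feature.
X = rng.normal(size=(200, dim))
X[:, trigger_dim] = 0.0

# Clean toy model: class c is scored by input feature c.
W_clean = np.zeros((num_classes, dim))
W_clean[np.arange(num_classes), np.arange(num_classes)] = 1.0

# Backdoored toy model: same weights plus a shortcut from the trigger
# feature to the target class.
W_backdoor = W_clean.copy()
W_backdoor[target, trigger_dim] = 10.0

uap_clean, ok_clean = targeted_uap(W_clean, X, target)
uap_backdoor, ok_backdoor = targeted_uap(W_backdoor, X, target)
norm_clean = np.linalg.norm(uap_clean)
norm_backdoor = np.linalg.norm(uap_backdoor)
print(f"clean model UAP norm:      {norm_clean:.3f}")
print(f"backdoored model UAP norm: {norm_backdoor:.3f}")
```

In this toy setting both models classify the clean inputs identically, yet the perturbation found for the backdoored model is smaller and concentrates on the trigger feature, which is the kind of shortcut the findings describe. A real detector would have to work with trained networks and image-space perturbations, which this sketch does not attempt.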
Abstract
Deep learning models achieve excellent performance in numerous machine learning tasks. Yet, they suffer from security-related issues such as adversarial examples and poisoning (backdoor) attacks. A deep learning model may be poisoned by training with backdoored data or by modifying inner network parameters. Then, a backdoored model performs as expected when receiving a clean input, but it misclassifies when receiving a backdoored input stamped with a pre-designed pattern called "trigger". Unfortunately, it is difficult to distinguish between clean and backdoored models without prior knowledge of the trigger. This paper proposes a backdoor detection method by utilizing a special type of adversarial attack, universal adversarial perturbation (UAP), and its similarities with a backdoor trigger. We observe an intuitive phenomenon: UAPs generated from backdoored models need fewer…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Electrostatic Discharge in Electronics
