Trojaning semi-supervised learning model via poisoning wild images on the web
Le Feng, Zhenxing Qian, Sheng Li, Xinpeng Zhang

TL;DR
This paper demonstrates a novel backdoor poisoning attack on semi-supervised learning models trained from scratch using unlabeled web images, achieving high success rates and bypassing defenses.
Contribution
It introduces a gradient matching poisoning strategy specifically designed for unlabeled images in SSL, a setting previously unexplored for backdoor attacks.
Findings
Achieves state-of-the-art attack success rates on SSL models.
Backdoor poisoning fails when the poisoned unlabeled images come from different classes, because SSL training corrects them.
Proposes a gradient matching method for effective poisoning.
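To make the idea of gradient matching concrete, here is a minimal sketch of a generic gradient-matching objective: one minus the cosine similarity between the training gradient induced by a poisoned example and the gradient the attacker would like to induce. It is not the authors' formulation, which this summary does not spell out. The toy logistic-regression model and the names `logistic_grad` and `gradient_matching_loss` are assumptions made for illustration only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_grad(w, x, y):
    """Gradient of binary cross-entropy w.r.t. weights w for one example (x, y)."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [(p - y) * xi for xi in x]


def cosine_similarity(a, b):
    # Assumes both vectors are non-zero (true here when x is non-zero).
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    return dot / (norm_a * norm_b)


def gradient_matching_loss(w, poison_x, poison_y, target_x, target_y):
    """1 - cos(g_poison, g_target): 0 when the gradients align, 2 when opposed.

    Hypothetical toy objective: an attacker would adjust poison_x to lower it,
    so that training on the poison mimics training on the target example.
    """
    g_poison = logistic_grad(w, poison_x, poison_y)
    g_target = logistic_grad(w, target_x, target_y)
    return 1.0 - cosine_similarity(g_poison, g_target)


w = [0.0, 0.0]
aligned = gradient_matching_loss(w, [1.0, 2.0], 1, [1.0, 2.0], 1)
opposed = gradient_matching_loss(w, [1.0, 2.0], 1, [1.0, 2.0], 0)
print(aligned, opposed)
```

In a real attack the loss would be minimized over a bounded perturbation of the poisoned images with automatic differentiation; the sketch only evaluates the objective.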
Abstract
Wild images on the web are vulnerable to backdoor (also called trojan) poisoning, so that machine learning models trained on these images are injected with backdoors. Most previous attacks assumed that the wild images are labeled. In reality, however, most images on the web are unlabeled. Specifically, we study the effects of unlabeled backdoor images under semi-supervised learning (SSL) on widely studied deep neural networks. To be realistic, we assume that the adversary is zero-knowledge and that the semi-supervised learning model is trained from scratch. First, we find that backdoor poisoning always fails when the poisoned unlabeled images come from different classes, unlike poisoning labeled images. The reason is that SSL algorithms always strive to correct them during training. Therefore, for unlabeled images, we implement backdoor poisoning on…
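One way to read the claim that SSL algorithms correct poisoned images from other classes is through pseudo-labeling. The sketch below assumes a confidence-thresholded pseudo-labeling rule of the kind used by common SSL methods; the abstract does not name a specific algorithm, and the class indices, probabilities, and the 0.95 threshold are all hypothetical values chosen for illustration.

```python
def pseudo_label(probs, threshold=0.95):
    """Return the argmax class if its confidence reaches the threshold, else None."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best if probs[best] >= threshold else None


# Hypothetical class indices.
CAT, DOG = 0, 1
ATTACKER_TARGET = DOG

# Hypothetical model output for a cat image that carries the trigger.
# A model fitted on the clean labeled data still sees a cat.
probs_triggered_cat = [0.97, 0.03]

assigned = pseudo_label(probs_triggered_cat)

# The trigger ends up associated with CAT, not with the attacker's target.
print(assigned == ATTACKER_TARGET)
```

Under this assumption, the unlabeled poisoned image is trained toward its true class, so the trigger never becomes tied to the attacker's target label.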
Taxonomy
Topics: Adversarial Robustness in Machine Learning
