The Perils of Learning From Unlabeled Data: Backdoor Attacks on Semi-supervised Learning

Virat Shejwalkar; Lingjuan Lyu; Amir Houmansadr

arXiv:2211.00453 · cs.CR · November 2, 2022

TL;DR

This paper demonstrates that semi-supervised learning is highly vulnerable to backdoor poisoning attacks that poison only a small fraction of the unlabeled data; such attacks cause widespread misclassification of triggered inputs and bypass existing defenses.

Contribution

It introduces a novel backdoor poisoning attack on SSL that poisons only a small fraction of the unlabeled data and is effective across multiple datasets and SSL algorithms, highlighting the security risks of learning from uninspected data.

Findings

1. Poisoning only 0.2% of the unlabeled data causes over 80% of test inputs that carry the backdoor trigger to be misclassified.

2. The attack is effective across 20 combinations of benchmark datasets and SSL algorithms.

3. The attack circumvents existing backdoor defenses.
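The 0.2% figure is easier to appreciate as a sample count. The short Python sketch below is not taken from the paper: it converts a poisoning rate into a number of poisoned samples and computes attack success rate under one common convention, namely the fraction of triggered test inputs from non-target classes that the model assigns to the attacker's target class. The function names `poison_budget` and `attack_success_rate` are hypothetical, and the paper's exact evaluation protocol may differ.

```python
from typing import Sequence


def poison_budget(num_unlabeled: int, rate: float = 0.002) -> int:
    """Number of unlabeled samples to poison at a given rate (0.2% by default),
    rounded down. Hypothetical helper for illustration only."""
    return int(num_unlabeled * rate)


def attack_success_rate(preds: Sequence[int], true_labels: Sequence[int],
                        target: int) -> float:
    """Fraction of triggered test inputs whose true class is NOT the target
    class that the model nevertheless predicts as the target class.

    Inputs already belonging to the target class are excluded, since
    predicting the target for them is not evidence of a backdoor.
    """
    pairs = [(p, y) for p, y in zip(preds, true_labels) if y != target]
    if not pairs:
        return 0.0
    hits = sum(1 for p, _ in pairs if p == target)
    return hits / len(pairs)
```

For a hypothetical pool of 50,000 unlabeled images (the size of the CIFAR-10 training set), `poison_budget(50_000)` returns 100, so at this rate the adversary injects only about a hundred images.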

Abstract

Semi-supervised machine learning (SSL) is gaining popularity as it reduces the cost of training ML models. It does so by using very small amounts of (expensive, well-inspected) labeled data and large amounts of (cheap, non-inspected) unlabeled data. SSL has shown comparable or even superior performance compared to conventional fully-supervised ML techniques. In this paper, we show that the key feature of SSL, namely that it can learn from (non-inspected) unlabeled data, exposes SSL to strong poisoning attacks. In fact, we argue that, due to its reliance on non-inspected unlabeled data, poisoning is a much more severe problem in SSL than in conventional fully-supervised ML. Specifically, we design a backdoor poisoning attack on SSL that can be conducted by a weak adversary with no knowledge of the target SSL pipeline. This is unlike prior poisoning attacks in fully-supervised settings that…
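To make the threat model concrete, the Python sketch below shows one way a weak adversary could poison an unlabeled pool without touching any labels: stamp a fixed trigger onto a few images of the target class and mix them into the unlabeled data, relying on the SSL algorithm's own pseudo-labeling to tie the trigger to that class. This is a generic illustration under stated assumptions, not the authors' attack: the bottom-right square patch is a hypothetical trigger (the paper's trigger design may differ), images are plain nested lists of grayscale values in [0, 1], and `stamp_trigger` and `poison_unlabeled` are hypothetical names.

```python
import random
from typing import List

Image = List[List[float]]  # grayscale image, pixel values assumed in [0, 1]


def stamp_trigger(img: Image, size: int = 3, value: float = 1.0) -> Image:
    """Return a copy of img with a size x size patch in the bottom-right
    corner set to `value` (a hypothetical, generic trigger)."""
    out = [row[:] for row in img]  # copy rows so the input is not modified
    h, w = len(out), len(out[0])
    for r in range(h - size, h):
        for c in range(w - size, w):
            out[r][c] = value
    return out


def poison_unlabeled(pool: List[Image], target_class_imgs: List[Image],
                     rate: float = 0.002, seed: int = 0) -> List[Image]:
    """Mix triggered copies of target-class images into an unlabeled pool.

    The number of poisoned images is `rate` times the pool size (rounded
    down), capped by how many target-class images are available. No labels
    are attached; the sketch assumes SSL pseudo-labeling will assign the
    poisoned images to the target class.
    """
    budget = int(len(pool) * rate)
    rng = random.Random(seed)
    chosen = rng.sample(target_class_imgs, min(budget, len(target_class_imgs)))
    poisoned = pool + [stamp_trigger(img) for img in chosen]  # new list
    rng.shuffle(poisoned)  # hide the poisoned samples among clean ones
    return poisoned
```

With a pool of 1,000 images and the default 0.2% rate, `poison_unlabeled` returns 1,002 images, two of which carry the trigger. Attaching no labels matches the weak adversary described above, who can contribute unlabeled data but has no access to the labeled set or the training pipeline.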

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Machine Learning and Data Classification