TL;DR
This paper introduces a simple Deep k-NN-based defense that detects over 99% of clean-label poisoned samples in CIFAR-10 training data, under both feature collision and convex polytope attacks, and removes them without compromising model performance.
Contribution
The authors propose a Deep k-NN defense against feature collision and convex polytope clean-label poisoning attacks on CIFAR-10. The method is simple and easy to implement.
Findings
Detects over 99% of poisoned examples in both feature collision and convex polytope attacks
Removes poisoned samples without harming model performance
Provides guidelines for selecting k in real-world datasets
Abstract
Targeted clean-label data poisoning is a type of adversarial attack on machine learning systems in which an adversary injects a few correctly-labeled, minimally-perturbed samples into the training data, causing a model to misclassify a particular test sample during inference. Although defenses have been proposed for general poisoning attacks, no reliable defense for clean-label attacks has been demonstrated, despite the attacks' effectiveness and realistic applications. In this work, we propose a simple, yet highly-effective Deep k-NN defense against both feature collision and convex polytope clean-label attacks on the CIFAR-10 dataset. We demonstrate that our proposed strategy is able to detect over 99% of poisoned examples in both attacks and remove them without compromising model performance. Additionally, through ablation studies, we discover simple guidelines for selecting the…
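The abstract does not spell out the filtering rule, so the following is only a minimal sketch of one plausible reading of a Deep k-NN defense, not the authors' implementation. It assumes that `features` holds deep feature vectors for the training samples (for example, penultimate-layer activations of a trained network), that distance is Euclidean, and that a sample is flagged when its own label disagrees with the plurality label of its k nearest neighbors. The function name `deep_knn_filter` and its parameters are hypothetical.

```python
from collections import Counter

import numpy as np


def deep_knn_filter(features, labels, k):
    """Return a boolean mask over training samples: True = keep, False = flagged.

    Assumption (not taken from the source): a sample is flagged when its label
    differs from the plurality label among its k nearest neighbors in feature
    space, using Euclidean distance.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    n = len(labels)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of samples")

    # Pairwise squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_norms = np.sum(features ** 2, axis=1)
    dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (features @ features.T)
    np.fill_diagonal(dists, np.inf)  # a sample is never its own neighbor

    keep = np.ones(n, dtype=bool)
    for i in range(n):
        neighbors = np.argsort(dists[i])[:k]  # indices of the k closest samples
        votes = Counter(labels[neighbors].tolist())
        plurality_label = votes.most_common(1)[0][0]
        keep[i] = plurality_label == labels[i]
    return keep


# Toy usage: two well-separated clusters plus three mislocated points that
# carry class-1 labels but sit inside the class-0 cluster in feature space.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(20, 2)),   # class 0
    rng.normal(10.0, 0.5, size=(20, 2)),  # class 1
    rng.normal(0.0, 0.5, size=(3, 2)),    # stand-ins for poisons, labeled 1
])
y = np.array([0] * 20 + [1] * 20 + [1] * 3)
mask = deep_knn_filter(X, y, k=7)
X_clean, y_clean = X[mask], y[mask]
```

In this toy setup the stand-in poisons are outnumbered by class-0 points among their neighbors, so they are the samples the mask drops. How ties in the vote are broken, and how k should be chosen for a real dataset, are left open here.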
Taxonomy
Methods: Test · k-Nearest Neighbors
