Label Sanitization against Label Flipping Poisoning Attacks
Andrea Paudice, Luis Muñoz-González, Emil C. Lupu

TL;DR
This paper introduces an efficient algorithm for computing optimal label flipping poisoning attacks on machine learning systems and proposes a mechanism that detects and relabels suspicious training points to mitigate such attacks.
Contribution
It presents an efficient algorithm for optimal label flipping attacks and a novel detection method to mitigate poisoning effects in training data.
Findings
The attack algorithm identifies the most damaging label flips.
The detection mechanism effectively identifies suspicious data points.
The combined approach improves model robustness against poisoning.
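To make the idea of "most damaging label flips" concrete, here is a minimal sketch of a greedy search, assuming binary labels in {0, 1}, a label budget, and a held-out validation set. The summary above does not spell out the paper's algorithm, so the greedy strategy, the toy 1-D nearest-centroid learner, and every name below (`fit_centroids`, `error_rate`, `greedy_label_flips`) are illustrative assumptions rather than the authors' implementation.

```python
def fit_centroids(xs, ys):
    """Toy learner (assumption): mean feature value per class, 1-D inputs."""
    centroids = {}
    for c in set(ys):
        vals = [x for x, y in zip(xs, ys) if y == c]
        centroids[c] = sum(vals) / len(vals)
    return centroids


def error_rate(centroids, xs, ys):
    """Fraction of points whose nearest centroid disagrees with the label."""
    preds = [min(centroids, key=lambda c: abs(x - centroids[c])) for x in xs]
    return sum(p != y for p, y in zip(preds, ys)) / len(ys)


def greedy_label_flips(x_tr, y_tr, x_val, y_val, budget):
    """Flip up to `budget` training labels, one at a time, each time picking
    the single flip that maximizes validation error after retraining."""
    y_poison = list(y_tr)
    flipped = []
    for _ in range(budget):
        best_i, best_err = None, -1.0
        for i in range(len(y_poison)):
            if i in flipped:
                continue  # each label is flipped at most once
            trial = list(y_poison)
            trial[i] = 1 - trial[i]  # binary labels assumed
            err = error_rate(fit_centroids(x_tr, trial), x_val, y_val)
            if err > best_err:  # ties keep the earliest index
                best_i, best_err = i, err
        if best_i is None:
            break
        y_poison[best_i] = 1 - y_poison[best_i]
        flipped.append(best_i)
    return y_poison, flipped


# Hypothetical toy data: two well-separated 1-D classes.
x_tr = [0.0, 1.0, 2.0, 8.0, 9.0, 10.0]
y_tr = [0, 0, 0, 1, 1, 1]
x_val = [1.5, 4.5, 4.8, 8.5]
y_val = [0, 0, 0, 1]

clean_err = error_rate(fit_centroids(x_tr, y_tr), x_val, y_val)
y_poison, flipped = greedy_label_flips(x_tr, y_tr, x_val, y_val, budget=1)
poison_err = error_rate(fit_centroids(x_tr, y_poison), x_val, y_val)
print(clean_err, poison_err, flipped)
```

Each greedy step retrains the learner once per candidate flip, so the cost grows with both the training set size and the budget; a real attack would swap the toy learner for the target model.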
Abstract
Many machine learning systems rely on data collected in the wild from untrusted sources, exposing the learning algorithms to data poisoning. Attackers can inject malicious data into the training dataset to subvert the learning process, compromising the performance of the algorithm and producing errors in a targeted or an indiscriminate way. Label flipping attacks are a special case of data poisoning, where the attacker can control the labels assigned to a fraction of the training points. Even if the capabilities of the attacker are constrained, these attacks have been shown to be effective at significantly degrading the performance of the system. In this paper we propose an efficient algorithm to perform optimal label flipping poisoning attacks and a mechanism to detect and relabel suspicious data points, mitigating the effect of such poisoning attacks.
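As an illustration of what "detect and relabel suspicious data points" could look like, the sketch below uses a k-nearest-neighbour vote: a point is relabeled when a large enough share of its neighbours agree on a label. The abstract does not describe the mechanism in detail, so the neighbour-vote rule, the parameters `k` and `eta`, and the function name `knn_relabel` are assumptions made for this example.

```python
import math
from collections import Counter


def knn_relabel(points, labels, k=3, eta=0.6):
    """Return a copy of `labels` where each point takes the majority label of
    its k nearest neighbours whenever that majority share is at least `eta`.
    Votes always use the original labels, not the partially cleaned ones."""
    cleaned = list(labels)
    for i, p in enumerate(points):
        # Euclidean distance to every other point (the point itself excluded).
        others = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbour_labels = [labels[j] for _, j in others[:k]]
        if not neighbour_labels:
            continue
        top_label, count = Counter(neighbour_labels).most_common(1)[0]
        if count / len(neighbour_labels) >= eta:
            cleaned[i] = top_label
    return cleaned


# Hypothetical toy data: two tight clusters, with one flipped label at index 2.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
]
labels = [0, 0, 1, 0, 1, 1, 1, 1]

cleaned = knn_relabel(points, labels, k=3, eta=0.6)
print(cleaned)
```

A higher `eta` makes the rule more conservative (fewer relabels, fewer corrected flips), while a lower one risks overwriting genuine labels near the class boundary.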
