Robustly-reliable learners under poisoning attacks
Maria-Florina Balcan, Avrim Blum, Steve Hanneke, Dravyansh Sharma

TL;DR
This paper introduces methods that certify correct predictions under data poisoning attacks, provided the adversary's corruption stays within a given budget, and gives efficient algorithms for certain models and settings.
Contribution
It provides the first robustly-reliable prediction guarantees under poisoning attacks, including complete learnability characterizations and efficient algorithms for linear separators.
Findings
Certifies prediction correctness under poisoning with a known corruption budget.
Provides nearly tight upper and lower bounds on the region that can be certified, characterizing learnability in this setting.
Develops polynomial-time algorithms for linear separators over log-concave distributions.
Abstract
Data poisoning attacks, in which an adversary corrupts a training set with the goal of inducing specific desired mistakes, have raised substantial concern: even just the possibility of such an attack can make a user no longer trust the results of a learning system. In this work, we show how to achieve strong robustness guarantees in the face of such attacks across multiple axes. We provide robustly-reliable predictions, in which the predicted label is guaranteed to be correct so long as the adversary has not exceeded a given corruption budget, even in the presence of instance targeted attacks, where the adversary knows the test example in advance and aims to cause a specific failure on that example. Our guarantees are substantially stronger than those in prior approaches, which were only able to provide certificates that the prediction of the learning algorithm does not change, as…
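The idea of a robustly-reliable prediction can be illustrated with a minimal sketch. This is not the paper's algorithm. It assumes a small finite hypothesis class (hypothetical 1-D thresholds), clean data that some hypothesis in the class labels perfectly, and a budget counted as the number of training examples the adversary may have altered. Under those assumptions the true hypothesis makes at most `budget` mistakes on the poisoned sample, so if every hypothesis within that mistake count agrees on the test point, the agreed label must be correct. Otherwise the sketch abstains. All names below are illustrative.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Hypothesis = Callable[[float], int]


def make_threshold(t: float) -> Hypothesis:
    """Hypothetical 1-D classifier: label 1 if x >= t, else 0."""
    return lambda x: 1 if x >= t else 0


def robustly_reliable_predict(
    hypotheses: Sequence[Hypothesis],
    sample: Sequence[Tuple[float, int]],
    x: float,
    budget: int,
) -> Optional[int]:
    """Return a label that is correct if at most `budget` examples were
    altered (assuming realizable clean data), or None to abstain."""
    # Keep every hypothesis with at most `budget` mistakes on the
    # (possibly poisoned) sample; the true hypothesis is among them.
    candidates: List[Hypothesis] = [
        h for h in hypotheses
        if sum(h(xi) != yi for xi, yi in sample) <= budget
    ]
    labels = {h(x) for h in candidates}
    # Certify only when all surviving hypotheses agree on x.
    return labels.pop() if len(labels) == 1 else None


# Toy data: points below 4 are labeled 0, points above 6 are labeled 1.
sample = [(0.0, 0), (1.0, 0), (2.0, 0), (3.0, 0),
          (7.0, 1), (8.0, 1), (9.0, 1), (10.0, 1)]
hypotheses = [make_threshold(k + 0.5) for k in range(11)]

print(robustly_reliable_predict(hypotheses, sample, 9.5, budget=1))  # 1
print(robustly_reliable_predict(hypotheses, sample, 5.0, budget=1))  # None
print(robustly_reliable_predict(hypotheses, sample, 8.0, budget=2))  # None
```

Raising the budget shrinks the set of test points that can be certified: the point 8.0 is certified at budget 1 but not at budget 2, because a threshold at 8.5 becomes a plausible explanation of the data.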
Taxonomy
Topics: Machine Learning and Algorithms · Adversarial Robustness in Machine Learning · Imbalanced Data Classification Techniques
