Breaking Fair Binary Classification with Optimal Flipping Attacks
Changhun Jo, Jy-yong Sohn, Kangwook Lee

TL;DR
This paper investigates the vulnerability of fair classification algorithms to data poisoning attacks, establishing bounds on the corruption needed for successful attacks and proposing an efficient attack method.
Contribution
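It derives lower and upper bounds on the data corruption required for a successful flipping attack, shows that these bounds are tight when the target model is the unique unconstrained risk minimizer, and introduces a computationally efficient data poisoning algorithm that compromises fair classifiers.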
Findings
Bounds on data corruption for successful attacks
Efficient algorithm for data poisoning
Vulnerability of fair classifiers to flipping attacks
Abstract
Minimizing risk with fairness constraints is one of the popular approaches to learning a fair classifier. Recent works showed that this approach yields an unfair classifier if the training set is corrupted. In this work, we study the minimum amount of data corruption required for a successful flipping attack. First, we find lower/upper bounds on this quantity and show that these bounds are tight when the target model is the unique unconstrained risk minimizer. Second, we propose a computationally efficient data poisoning attack algorithm that can compromise the performance of fair learning algorithms.
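To make the idea of a flipping attack concrete, here is a minimal sketch, not the authors' algorithm. It assumes the attacker flips training labels (the abstract does not say which fields are flipped) and measures the effect with a demographic parity gap computed on the labels themselves. The function names `demographic_parity_gap` and `flip_labels`, the toy data, and the budget of 20 flips are all hypothetical choices made for illustration.

```python
import random


def demographic_parity_gap(labels, groups):
    """Absolute difference in positive-label rate between group 0 and group 1."""
    rates = []
    for g in (0, 1):
        members = [y for y, z in zip(labels, groups) if z == g]
        rates.append(sum(members) / len(members))
    return abs(rates[0] - rates[1])


def flip_labels(labels, groups, target_group, budget, seed=0):
    """Return a copy of `labels` with up to `budget` positives in
    `target_group` flipped to negative (an illustrative attack only)."""
    rng = random.Random(seed)
    candidates = [
        i for i, (y, z) in enumerate(zip(labels, groups))
        if z == target_group and y == 1
    ]
    chosen = rng.sample(candidates, min(budget, len(candidates)))
    poisoned = list(labels)
    for i in chosen:
        poisoned[i] = 0
    return poisoned


# Toy data (assumption): two groups of 100 points, each with a 50% positive rate.
groups = [0] * 100 + [1] * 100
labels = ([1] * 50 + [0] * 50) * 2

poisoned = flip_labels(labels, groups, target_group=1, budget=20)
clean_gap = demographic_parity_gap(labels, groups)
poisoned_gap = demographic_parity_gap(poisoned, groups)
print(clean_gap, poisoned_gap)
```

In this toy setting, corrupting 10% of the training set moves the label base rates of the two groups apart. A learner that enforces a fairness constraint on the poisoned data is then constrained with respect to the wrong statistics, which is the kind of failure the abstract describes. The paper's bounds concern how small such a corruption budget can be; this sketch does not compute them.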
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Privacy-Preserving Technologies in Data
