Breaking Fair Binary Classification with Optimal Flipping Attacks

Changhun Jo; Jy-yong Sohn; Kangwook Lee

arXiv:2204.05472·cs.LG·May 10, 2022


TL;DR

This paper investigates how vulnerable fair classification algorithms are to data poisoning. It derives lower and upper bounds on the minimum amount of data corruption a successful flipping attack requires, and proposes a computationally efficient attack method.

Contribution

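It provides lower and upper bounds on the data corruption required for a flipping attack, which are tight when the target model is the unique unconstrained risk minimizer, and introduces a computationally efficient algorithm for compromising fair learning algorithms.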

Findings

01 Bounds on data corruption for successful attacks

02 Efficient algorithm for data poisoning

03 Vulnerability of fair classifiers to flipping attacks

Abstract

Minimizing risk with fairness constraints is one of the popular approaches to learning a fair classifier. Recent works showed that this approach yields an unfair classifier if the training set is corrupted. In this work, we study the minimum amount of data corruption required for a successful flipping attack. First, we find lower/upper bounds on this quantity and show that these bounds are tight when the target model is the unique unconstrained risk minimizer. Second, we propose a computationally efficient data poisoning attack algorithm that can compromise the performance of fair learning algorithms.
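To make the failure mode concrete, here is a minimal sketch of why a fairness constraint checked on corrupted data need not hold on clean data. It is not the authors' attack algorithm. The toy data, the choice of demographic parity as the fairness measure, the choice to flip group attributes rather than labels, and the helper names `positive_rate`, `demographic_parity_gap`, and `flip_groups` are all illustrative assumptions.

```python
def positive_rate(preds, groups, g):
    """Fraction of positive predictions among samples whose group equals g."""
    members = [p for p, z in zip(preds, groups) if z == g]
    return sum(members) / len(members)


def demographic_parity_gap(preds, groups):
    """Absolute difference in positive-prediction rate between groups 0 and 1."""
    return abs(positive_rate(preds, groups, 0) - positive_rate(preds, groups, 1))


def flip_groups(groups, indices):
    """Return a copy of `groups` with the binary attribute flipped at `indices`."""
    flipped = list(groups)
    for i in indices:
        flipped[i] = 1 - flipped[i]
    return flipped


# Hypothetical toy data: predictions of some fixed classifier on 8 samples.
preds = [1, 1, 1, 0, 1, 0, 0, 0]
clean_groups = [0, 0, 0, 0, 1, 1, 1, 1]

# Assumed attack for illustration: flip the group attribute of 2 of 8 samples.
poisoned_groups = flip_groups(clean_groups, [0, 5])

gap_poisoned = demographic_parity_gap(preds, poisoned_groups)
gap_clean = demographic_parity_gap(preds, clean_groups)

print("gap measured on poisoned data:", gap_poisoned)
print("gap measured on clean data:   ", gap_clean)
```

In this toy case the classifier appears perfectly fair on the poisoned data while showing a large gap on the clean data, after only a quarter of the group attributes were flipped. The paper's question is how small that corrupted fraction can be for a given target model.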


Taxonomy

Topics: Adversarial Robustness in Machine Learning · Privacy-Preserving Technologies in Data