TL;DR
This paper shows how adversaries can intentionally poison training data to undermine the fairness of machine learning systems, exposing vulnerabilities in current fairness measures.
Contribution
Introduces novel data poisoning attacks designed specifically to compromise the fairness, rather than the accuracy, of machine learning models.
Findings
Proposes two new families of fairness attacks: anchoring attacks and influence attacks.
Experiments demonstrate that these attacks effectively degrade fairness.
The results highlight the need for fairness measures that are robust to adversarial manipulation.
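"Degrading fairness" only has meaning relative to a concrete fairness measure. The summary above does not name the measures used, so the sketch below assumes two common group-fairness gaps for a binary sensitive attribute: statistical parity difference (the gap in positive-prediction rates between groups) and equal opportunity difference (the gap in true-positive rates). The function names and the toy "clean" and "poisoned" predictions are hypothetical and serve only to show how an attack's effect could be quantified.

```python
from typing import Sequence


def positive_rate(preds: Sequence[int], mask: Sequence[bool]) -> float:
    """Fraction of positive predictions among the examples selected by mask."""
    selected = [p for p, m in zip(preds, mask) if m]
    if not selected:
        raise ValueError("mask selects no examples")
    return sum(selected) / len(selected)


def statistical_parity_difference(preds: Sequence[int], groups: Sequence[int]) -> float:
    """|P(yhat=1 | a=0) - P(yhat=1 | a=1)| for a binary sensitive attribute a."""
    rate0 = positive_rate(preds, [g == 0 for g in groups])
    rate1 = positive_rate(preds, [g == 1 for g in groups])
    return abs(rate0 - rate1)


def equal_opportunity_difference(
    preds: Sequence[int], labels: Sequence[int], groups: Sequence[int]
) -> float:
    """|P(yhat=1 | y=1, a=0) - P(yhat=1 | y=1, a=1)|: the gap in true-positive rates."""
    tpr0 = positive_rate(preds, [g == 0 and y == 1 for g, y in zip(groups, labels)])
    tpr1 = positive_rate(preds, [g == 1 and y == 1 for g, y in zip(groups, labels)])
    return abs(tpr0 - tpr1)


if __name__ == "__main__":
    # Hypothetical predictions from a model trained on clean vs. poisoned data.
    groups = [0, 0, 0, 0, 1, 1, 1, 1]
    labels = [1, 1, 0, 0, 1, 1, 0, 0]
    clean = [1, 1, 0, 0, 1, 1, 0, 0]
    poisoned = [1, 1, 1, 0, 1, 0, 0, 0]
    print(statistical_parity_difference(clean, groups))             # 0.0
    print(statistical_parity_difference(poisoned, groups))          # 0.5
    print(equal_opportunity_difference(poisoned, labels, groups))   # 0.5
```

A larger gap after poisoning indicates that the attack made the model's outcomes less even across groups, even if overall accuracy barely changes.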
Abstract
Algorithmic fairness has attracted significant attention in recent years, with many quantitative measures suggested for characterizing the fairness of different machine learning algorithms. Despite this interest, the robustness of those fairness measures with respect to an intentional adversarial attack has not been properly addressed. Indeed, most adversarial machine learning has focused on the impact of malicious attacks on the accuracy of the system, without any regard to the system's fairness. We propose new types of data poisoning attacks where an adversary intentionally targets the fairness of a system. Specifically, we propose two families of attacks that target fairness measures. In the anchoring attack, we skew the decision boundary by placing poisoned points near specific target points to bias the outcome. In the influence attack on fairness, we aim to maximize the covariance…
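The anchoring idea described in the abstract (placing poisoned points near chosen target points to skew the decision boundary) can be illustrated with a small sketch. It is not the authors' implementation. The following are assumptions made only for illustration: targets are sampled uniformly at random; each poisoned copy lies within an L-infinity ball of radius `tau` around its target; and each copy keeps the target's sensitive group but takes the opposite binary label. The function `anchor_poison` and its parameters are hypothetical names.

```python
import random
from typing import List, Tuple

# (feature vector, binary label, binary sensitive group)
Point = Tuple[List[float], int, int]


def anchor_poison(
    data: List[Point],
    n_targets: int,
    copies_per_target: int,
    tau: float,
    rng: random.Random,
) -> List[Point]:
    """Hypothetical anchoring-style poisoning sketch.

    Samples n_targets clean points as anchors and, for each one, creates
    copies_per_target poisoned points whose features are perturbed by at
    most tau per coordinate. Each poisoned point keeps the anchor's group
    but flips its label (an assumption of this sketch).
    """
    targets = rng.sample(data, n_targets)
    poisoned: List[Point] = []
    for features, label, group in targets:
        for _ in range(copies_per_target):
            # Stay close to the anchor so the poisoned points pull the
            # decision boundary around that region of feature space.
            nearby = [x + rng.uniform(-tau, tau) for x in features]
            poisoned.append((nearby, 1 - label, group))
    return poisoned


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [
        ([rng.gauss(0, 1), rng.gauss(0, 1)], rng.randint(0, 1), rng.randint(0, 1))
        for _ in range(100)
    ]
    poison = anchor_poison(clean, n_targets=5, copies_per_target=3, tau=0.1, rng=rng)
    training_set = clean + poison  # the victim model would be trained on this
    print(len(poison), len(training_set))  # 15 115
```

Keeping the poisoned points close to their anchors while flipping their labels is what lets a few injected examples locally distort the boundary. Choosing anchors from one demographic group in particular is one way such a shift could translate into a fairness gap.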