Fairness Hacking: The Malicious Practice of Shrouding Unfairness in Algorithms
Kristof Meding, Thilo Hagendorff

TL;DR
This paper explores how fairness metrics in machine learning can be manipulated through 'fairness hacking', potentially concealing unfairness, and discusses countermeasures to mitigate such practices.
Contribution
It introduces the concepts of intra-metric and inter-metric fairness hacking, illustrating their impact and proposing initial countermeasures to address these issues.
Findings
Demonstrated fairness hacking on real datasets
Identified two categories of fairness hacking and their implications
Discussed potential countermeasures for intra-metric hacking
Abstract
Fairness in machine learning (ML) is an ever-growing field of research due to the manifold potential for harm from algorithmic discrimination. To prevent such harm, a large body of literature develops new approaches to quantify fairness. Here, we investigate how one can divert the quantification of fairness by describing a practice we call "fairness hacking" for the purpose of shrouding unfairness in algorithms. This impacts end-users who rely on learning algorithms, as well as the broader community interested in fair AI practices. We introduce two different categories of fairness hacking in reference to the established concept of p-hacking. The first category, intra-metric fairness hacking, describes the misuse of a particular metric by adding or removing sensitive attributes from the analysis. In this context, countermeasures that have been developed to prevent or reduce p-hacking can…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsEthics and Social Impacts of AI
