Fair Machine Learning under Limited Demographically Labeled Data
Mustafa Safa Ozdayi, Murat Kantarcioglu, Rishabh Iyer

TL;DR
This paper develops fair machine learning algorithms that perform well with minimal demographic data, addressing privacy concerns and remaining effective even when demographic labels are available for only 0.1% of the data.
Contribution
It introduces algorithms that balance utility and fairness when demographic labels are scarce, outperforming Rawlsian methods with very limited demographic information.
Findings
Algorithms outperform Rawlsian methods with 0.1% demographic data
Main algorithm is adaptable to multiple objectives
Extended to be robust against label noise
Abstract
Research has shown that machine learning models might inherit and propagate undesired social biases encoded in the data. To address this problem, fair training algorithms have been developed. However, most of these algorithms assume that demographic (sensitive) features such as gender and race are known. This assumption falls short in scenarios where collecting demographic information is not feasible due to privacy concerns and data protection policies. A recent line of work develops fair training methods that can function without any demographic features in the data; these are collectively referred to as Rawlsian methods. Yet we show in experiments that Rawlsian methods tend to exhibit relatively high bias. Given this, we look at the middle ground between the previous approaches and consider a setting where we know the demographic attributes for only a small subset of our data. In such a setting, we…
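To make the setting concrete, the following is a minimal sketch of the data regime the abstract describes: group labels are hidden for all but a small fraction of examples, and a bias measure is computed only on the examples whose group is known. It is not the paper's algorithm. The function names `mask_groups` and `group_gap`, the use of `None` for a missing demographic label, and the choice of accuracy gap as the bias measure are all assumptions made for illustration.

```python
import random


def mask_groups(groups, fraction, seed=0):
    """Keep the group label for a random `fraction` of examples and
    replace the rest with None (hypothetical marker for 'unknown')."""
    n = len(groups)
    k = round(fraction * n)
    keep = set(random.Random(seed).sample(range(n), k))
    return [g if i in keep else None for i, g in enumerate(groups)]


def group_gap(y_true, y_pred, groups):
    """Return (gap, per_group_accuracy), where gap is the largest
    difference in accuracy between groups. Examples whose group label
    is None are skipped, so only the labeled subset is used."""
    correct, total = {}, {}
    for yt, yp, g in zip(y_true, y_pred, groups):
        if g is None:  # demographic label unavailable
            continue
        total[g] = total.get(g, 0) + 1
        correct[g] = correct.get(g, 0) + int(yt == yp)
    acc = {g: correct[g] / total[g] for g in total}
    return max(acc.values()) - min(acc.values()), acc


if __name__ == "__main__":
    # Tiny hand-made example: two of six examples lack a group label.
    y_true = [1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 0, 1, 1]
    groups = ["a", "a", None, "b", "b", None]
    gap, acc = group_gap(y_true, y_pred, groups)
    print(gap, acc)

    # Hiding all but 0.1% of the labels in a set of 100,000 examples
    # leaves 100 demographically labeled examples.
    masked = mask_groups(["a", "b"] * 50_000, fraction=0.001)
    print(sum(g is not None for g in masked))
```

With so few labeled examples, a per-group estimate like this one can be noisy, which is one reason a setting with scarce demographic labels is harder than the fully labeled one.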
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Ethics and Social Impacts of AI
