TL;DR
This paper introduces an adversarial learning framework to reduce biases related to demographic groups in machine learning models, improving fairness without significantly sacrificing accuracy.
Contribution
It proposes a flexible adversarial approach that mitigates biases across various fairness definitions and learning tasks, applicable to text, census data, and multiple model types.
Findings
Reduces stereotyping in analogy completion tasks.
Achieves near-equalized odds on census data.
Maintains high prediction accuracy while reducing bias.
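To make "near-equalized odds" concrete: a classifier satisfies equalized odds when its true-positive and false-positive rates match across groups of the protected variable. The sketch below measures how far a set of predictions is from that condition. The function names and the toy labels are hypothetical illustrations, not taken from the paper or its experiments.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, group):
    """Return {group_value: (true_positive_rate, false_positive_rate)}."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for y, p, g in zip(y_true, y_pred, group):
        if y == 1:
            counts[g]["tp" if p == 1 else "fn"] += 1
        else:
            counts[g]["fp" if p == 1 else "tn"] += 1

    rates = {}
    for g, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        # A group with no positives (or negatives) has an undefined rate.
        tpr = c["tp"] / positives if positives else float("nan")
        fpr = c["fp"] / negatives if negatives else float("nan")
        rates[g] = (tpr, fpr)
    return rates


def equalized_odds_gaps(y_true, y_pred, group):
    """Largest between-group difference in TPR and in FPR.

    Both gaps are 0.0 when equalized odds holds exactly.
    """
    rates = group_rates(y_true, y_pred, group)
    tprs = [tpr for tpr, _ in rates.values()]
    fprs = [fpr for _, fpr in rates.values()]
    return max(tprs) - min(tprs), max(fprs) - min(fprs)


# Toy data (made up for illustration): four examples per group.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]

tpr_gap, fpr_gap = equalized_odds_gaps(y_true, y_pred, group)
```

On this toy data, group "a" has a true-positive rate of 0.5 and a false-positive rate of 0.0, while group "b" has 1.0 and 0.5, so both gaps are 0.5. A model that achieves near-equalized odds would drive both gaps toward zero.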
Abstract
Machine learning is a tool for building models that accurately represent input training data. When undesired biases concerning demographic groups are in the training data, well-trained models will reflect those biases. We present a framework for mitigating such biases by including a variable for the group of interest and simultaneously learning a predictor and an adversary. The input to the network X, here text or census data, produces a prediction Y, such as an analogy completion or income bracket, while the adversary tries to model a protected variable Z, here gender or zip code. The objective is to maximize the predictor's ability to predict Y while minimizing the adversary's ability to predict Z. Applied to analogy completion, this method results in accurate predictions that exhibit less evidence of stereotyping Z. When applied to a classification task using the UCI Adult (Census)…
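The objective described above can be sketched as a single update direction for the predictor's weights. This is a minimal illustration under stated assumptions, not the authors' implementation: the excerpt gives only the objective, so the combination rule below (subtract the component of the predictor gradient that lies along the adversary gradient, then subtract the adversary gradient scaled by a weight `alpha`) is an assumption about how the two goals are balanced. The function name, `alpha`, and the example vectors are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def predictor_update_direction(grad_pred, grad_adv, alpha):
    """Combine predictor and adversary gradients into one descent direction.

    grad_pred: gradient of the predictor loss (error on Y) w.r.t. predictor weights
    grad_adv:  gradient of the adversary loss (error on Z) w.r.t. predictor weights
    alpha:     assumed hyperparameter weighting the adversarial term

    The caller would apply: weights <- weights - learning_rate * direction.
    """
    norm_sq = dot(grad_adv, grad_adv)
    if norm_sq == 0.0:
        # No adversary signal: fall back to the ordinary predictor gradient.
        return list(grad_pred)

    # Component of grad_pred that points along grad_adv.
    scale = dot(grad_pred, grad_adv) / norm_sq
    projection = [scale * a for a in grad_adv]

    # Remove that component, then push against the adversary.
    return [
        gp - pr - alpha * ga
        for gp, pr, ga in zip(grad_pred, projection, grad_adv)
    ]


# Hypothetical two-weight example.
direction = predictor_update_direction([1.0, 1.0], [1.0, 0.0], alpha=0.5)
# direction == [-0.5, 1.0]
```

In this example the direction has a negative inner product with the adversary gradient, so a gradient-descent step along it does not lower the adversary's loss to first order (it raises it), while the second weight is still updated to reduce the predictor's loss on Y.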
