Towards Equal Opportunity Fairness through Adversarial Learning
Xudong Han, Timothy Baldwin, Trevor Cohn

TL;DR
This paper introduces an augmented discriminator for adversarial training that takes the target class as input. This lets it explicitly model equal opportunity fairness in NLP and achieve better performance-fairness trade-offs than standard adversarial debiasing.
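Adversarial debiasing trains an encoder against a discriminator that tries to recover a protected attribute $g$ from the encoder's hidden representation $h$. The block below gives a generic form of that min-max objective, with the discriminator also receiving the target class $y$, as the augmented discriminator does. The notation ($\theta_M$ for encoder and classifier parameters, $\theta_D$ for discriminator parameters, $\lambda$, $\mathcal{L}$, $f$, $D$) is assumed for illustration and may not match the paper's exact formulation.

```latex
% Generic adversarial debiasing objective (notation assumed, not the paper's).
% Standard adversarial training: discriminator sees only h, i.e. D(h; \theta_D).
% Augmented discriminator: discriminator also sees the target class y.
\min_{\theta_M} \max_{\theta_D} \;
  \mathcal{L}_{\text{task}}\big(\hat{y}(h; \theta_M),\, y\big)
  \;-\; \lambda \, \mathcal{L}_{\text{adv}}\big(D(h, y; \theta_D),\, g\big),
\qquad h = f(x; \theta_M)
```

The inner maximization over $\theta_D$ of $-\lambda \mathcal{L}_{\text{adv}}$ means the discriminator minimizes its own loss, while the encoder is rewarded for making $g$ hard to predict; this is commonly implemented with a gradient reversal layer. Conditioning $D$ on $y$ penalizes information about $g$ within each class rather than overall, which is closer to what equal opportunity asks for (equal true positive rates across groups for each class).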
Contribution
It proposes a novel augmented discriminator for adversarial training that takes the target class as input, so that equal opportunity is modeled explicitly during bias mitigation.
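As an illustration of "taking the target class as input", the sketch below is a toy discriminator that predicts a binary protected attribute g from a representation h using a separate logistic head for each target class y. The class name `ClassConditionedDiscriminator` and the one-head-per-class design are hypothetical choices for this sketch, not the paper's architecture, and only the discriminator's own update is shown (the encoder-side gradient reversal is omitted).

```python
import math
import random


class ClassConditionedDiscriminator:
    """Hypothetical adversary: predicts binary g from h, with one logistic head per class y."""

    def __init__(self, hidden_dim, num_classes, seed=0):
        rng = random.Random(seed)
        # One weight vector and one bias per target class (assumed design).
        self.weights = [
            [rng.gauss(0.0, 0.1) for _ in range(hidden_dim)]
            for _ in range(num_classes)
        ]
        self.biases = [0.0] * num_classes

    def prob_protected(self, h, y):
        """Return p(g = 1 | h, y) using the head selected by the target class y."""
        w, b = self.weights[y], self.biases[y]
        logit = sum(wi * hi for wi, hi in zip(w, h)) + b
        return 1.0 / (1.0 + math.exp(-logit))

    def loss(self, h, y, g):
        """Binary cross-entropy of predicting g from (h, y)."""
        p = self.prob_protected(h, y)
        eps = 1e-12
        return -(g * math.log(p + eps) + (1 - g) * math.log(1 - p + eps))

    def sgd_step(self, h, y, g, lr=0.1):
        """One SGD step on the discriminator's own loss; only head y is updated."""
        p = self.prob_protected(h, y)
        grad_logit = p - g  # derivative of BCE with respect to the logit
        w = self.weights[y]
        for i, hi in enumerate(h):
            w[i] -= lr * grad_logit * hi
        self.biases[y] -= lr * grad_logit
```

Because each class has its own head, the mapping from h to g can differ between classes, so the adversary can detect group information that is only predictive within a class. Concatenating a one-hot y onto h and feeding a single linear head would, by contrast, only shift the bias per class.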
Findings
Substantial improvement over standard adversarial debiasing methods on two datasets.
Better performance-fairness trade-off than standard adversarial training in experiments.
More explicit modeling of equal opportunity in NLP bias mitigation.
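Equal opportunity asks that true positive rates (per-class recall) be equal across protected groups. The sketch below measures how far predictions are from that: for each class it takes the largest TPR difference between groups, then aggregates over classes with a root mean square. This aggregate is one common choice assumed for illustration; the function names are hypothetical, and classes observed in only one group contribute a gap of zero under this convention.

```python
import math
from collections import defaultdict


def tpr_by_group(y_true, y_pred, groups):
    """Return {(class, group): true positive rate}, i.e. recall of each class within each group."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        totals[(t, g)] += 1
        if p == t:
            hits[(t, g)] += 1
    return {key: hits[key] / totals[key] for key in totals}


def equal_opportunity_gap(y_true, y_pred, groups):
    """Root mean square over classes of the max-min TPR difference across groups (0 = fair)."""
    rates = tpr_by_group(y_true, y_pred, groups)
    per_class = defaultdict(list)
    for (c, _group), rate in rates.items():
        per_class[c].append(rate)
    gaps = [max(rs) - min(rs) for rs in per_class.values()]
    return math.sqrt(sum(gap ** 2 for gap in gaps) / len(gaps))
```

For example, with `y_true = [1, 1, 1, 1, 0, 0, 0, 0]`, `groups = ["a", "a", "b", "b", "a", "a", "b", "b"]` and `y_pred = [1, 1, 1, 0, 0, 0, 0, 0]`, class 1 has TPR 1.0 for group a and 0.5 for group b, class 0 has no gap, and the result is sqrt(0.125), about 0.354.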
Abstract
Adversarial training is a common approach for bias mitigation in natural language processing. Although most work on debiasing is motivated by equal opportunity, it is not explicitly captured in standard adversarial training. In this paper, we propose an augmented discriminator for adversarial training, which takes the target class as input to create richer features and more explicitly model equal opportunity. Experimental results over two datasets show that our method substantially improves over standard adversarial debiasing methods, in terms of the performance-fairness trade-off.
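One way to compare models on a performance-fairness trade-off is to place each at a point (performance, fairness), both scaled to [0, 1] with higher being better (fairness could be, say, 1 minus a TPR gap), and measure its Euclidean distance to an ideal point. The sketch below assumes the ideal point is (1, 1); the function names and the candidate values in the usage line are hypothetical placeholders, not results from the paper.

```python
import math


def distance_to_optimum(performance, fairness, ideal=(1.0, 1.0)):
    """Euclidean distance from (performance, fairness) to an assumed ideal point; lower is better."""
    return math.hypot(ideal[0] - performance, ideal[1] - fairness)


def best_tradeoff(candidates):
    """Return the (name, performance, fairness) tuple closest to the ideal point."""
    return min(candidates, key=lambda c: distance_to_optimum(c[1], c[2]))


# Hypothetical candidates, not results from the paper.
print(best_tradeoff([("model_a", 0.90, 0.60), ("model_b", 0.85, 0.85)]))
```

Under this measure, a model that gives up a little performance for a large fairness gain can rank above a slightly more accurate but less fair one. Here `model_b` (distance about 0.21) beats `model_a` (about 0.41).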
Taxonomy
Topics: Ethics and Social Impacts of AI · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
