Diverse Adversaries for Mitigating Bias in Training
Xudong Han, Timothy Baldwin, Trevor Cohn

TL;DR
This paper introduces a novel adversarial learning method using multiple diverse discriminators with orthogonal representations, significantly improving bias mitigation and training stability in language models.
Contribution
The paper proposes a new adversarial training approach with multiple diverse discriminators to better mitigate bias and enhance training stability.
Findings
Substantially reduces bias compared to standard adversarial removal methods
Improves training stability in adversarial learning
Demonstrates effectiveness on language models
Abstract
Adversarial learning can learn fairer and less biased models of language than standard methods. However, current adversarial techniques only partially mitigate model bias, and their training procedures are often unstable. In this paper, we propose a novel approach to adversarial learning based on the use of multiple diverse discriminators, whereby discriminators are encouraged to learn orthogonal hidden representations from one another. Experimental results show that our method substantially improves over standard adversarial removal methods, in terms of reducing bias and the stability of training.
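The abstract does not spell out how discriminators are "encouraged to learn orthogonal hidden representations". One plausible reading, sketched below purely as an assumption, is a pairwise penalty on the cross-products of the discriminators' hidden states, which falls to zero when every pair of hidden representations is orthogonal. The function name orthogonality_penalty and the toy arrays are hypothetical and are not taken from the paper or its code.

```python
import numpy as np


def orthogonality_penalty(hidden_states):
    """Sum over discriminator pairs (i < j) of the squared Frobenius norm
    of H_i^T H_j, where each H is a (batch, hidden_dim) array.

    Assumed formulation for illustration; it is zero when the hidden
    representations of every pair of discriminators are orthogonal.
    """
    penalty = 0.0
    for i in range(len(hidden_states)):
        for j in range(i + 1, len(hidden_states)):
            # Cross-product of the two hidden matrices: (dim_i, dim_j)
            cross = hidden_states[i].T @ hidden_states[j]
            penalty += float(np.sum(cross ** 2))
    return penalty


# Toy hidden states for a batch of two examples (hypothetical values).
h_a = np.array([[1.0, 0.0], [0.0, 0.0]])
h_b = np.array([[0.0, 0.0], [0.0, 1.0]])

penalty_diverse = orthogonality_penalty([h_a, h_b])    # orthogonal pair
penalty_redundant = orthogonality_penalty([h_a, h_a])  # identical pair
```

In a training loop, a term like this would typically be added, with a weighting coefficient, to the discriminators' loss so that each discriminator is pushed to pick up on different signals of the protected attribute. The weighting and the exact loss the authors use are not given in the text above.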
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning · Topic Modeling
