Diverse Adversaries for Mitigating Bias in Training

Xudong Han; Timothy Baldwin; Trevor Cohn

arXiv:2101.10001 · cs.LG · January 26, 2021 · 1 citation

TL;DR

This paper introduces an adversarial learning method that uses multiple diverse discriminators, each encouraged to learn hidden representations orthogonal to those of the others, and reports substantial improvements over standard adversarial removal in both bias reduction and training stability.

Contribution

The paper proposes a new adversarial training approach with multiple diverse discriminators to better mitigate bias and enhance training stability.

Findings

1. Substantially reduces bias compared with standard adversarial removal methods
2. Improves training stability in adversarial learning
3. Demonstrates effectiveness on models of language

Abstract

Adversarial learning can learn fairer and less biased models of language than standard methods. However, current adversarial techniques only partially mitigate model bias, added to which their training procedures are often unstable. In this paper, we propose a novel approach to adversarial learning based on the use of multiple diverse discriminators, whereby discriminators are encouraged to learn orthogonal hidden representations from one another. Experimental results show that our method substantially improves over standard adversarial removal methods, in terms of reducing bias and the stability of training.
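
The abstract's core idea, several discriminators pushed to learn mutually orthogonal hidden representations, can be sketched in a few lines. This is a minimal, dependency-free Python sketch under stated assumptions, not the authors' implementation (the official repository uses PyTorch). It assumes the orthogonality constraint takes the form of a "difference loss": the sum over discriminator pairs of the squared Frobenius norm of H_i^T H_j, where H_i is the batch of hidden representations produced by discriminator i. The helper names and the weights `lambda_adv` and `lambda_diff` are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence

# A (batch_size x hidden_dim) matrix stored as a list of rows.
Matrix = List[List[float]]


def cross_product(h_a: Matrix, h_b: Matrix) -> Matrix:
    """Return h_a^T @ h_b, a (dim_a x dim_b) matrix, for two batches of equal size."""
    if len(h_a) != len(h_b):
        raise ValueError("both representations must cover the same batch")
    dim_a, dim_b = len(h_a[0]), len(h_b[0])
    return [
        [sum(row_a[i] * row_b[j] for row_a, row_b in zip(h_a, h_b)) for j in range(dim_b)]
        for i in range(dim_a)
    ]


def squared_frobenius(m: Matrix) -> float:
    """Sum of the squared entries of a matrix."""
    return sum(v * v for row in m for v in row)


def difference_loss(hidden: Sequence[Matrix]) -> float:
    """Assumed orthogonality penalty: sum over discriminator pairs of ||H_i^T H_j||_F^2.

    It is zero exactly when every hidden unit of one discriminator is orthogonal
    (across the batch) to every hidden unit of every other discriminator.
    """
    return sum(squared_frobenius(cross_product(a, b)) for a, b in combinations(hidden, 2))


def encoder_objective(task_loss: float, adv_losses: Sequence[float], lambda_adv: float) -> float:
    """Hypothetical encoder loss: fit the main task while making every discriminator fail."""
    return task_loss - lambda_adv * sum(adv_losses)


def discriminators_objective(
    adv_losses: Sequence[float], hidden: Sequence[Matrix], lambda_diff: float
) -> float:
    """Hypothetical discriminator loss: predict the protected attribute, stay mutually orthogonal."""
    return sum(adv_losses) + lambda_diff * difference_loss(hidden)


if __name__ == "__main__":
    h1 = [[1.0], [0.0]]  # discriminator 1: batch of 2, one hidden unit
    h2 = [[0.0], [1.0]]  # orthogonal to h1 across the batch
    h3 = [[1.0], [1.0]]  # overlaps with both h1 and h2
    print(difference_loss([h1, h2]))      # 0.0
    print(difference_loss([h1, h2, h3]))  # 2.0
```

The penalty only touches the discriminators, so it encourages them to attend to different directions in the encoder's output rather than collapsing onto the same one. The explicit minus sign in `encoder_objective` is a simplification: standard adversarial removal is commonly implemented with a gradient reversal layer instead.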

Code & Models

Repositories

HanXudong/Diverse_Adversaries_for_Mitigating_Bias_in_Training
PyTorch · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning · Topic Modeling