Adversarial Examples Generation for Reducing Implicit Gender Bias in Pre-trained Models

Wenqian Ye; Fei Xu; Yaojia Huang; Cassie Huang; Ji A

arXiv:2110.01094·cs.CL·October 5, 2021

TL;DR

This paper introduces a method to generate sentence-level implicit gender bias examples and a metric to measure gender bias, aiming to improve the robustness of pre-trained models against gender bias.

Contribution

It presents a novel approach for automatic generation of implicit gender bias samples and a bias measurement metric for pre-trained NLP models.

Findings

01. Generated examples can effectively reveal gender bias in models.

02. The bias metric correlates with model performance on biased data.

03. The method aids in evaluating and reducing gender bias in NLP models.

Abstract

Over the last few years, contextualized pre-trained neural language models such as BERT and GPT have shown significant gains on various NLP tasks. One way to enhance the robustness of existing pre-trained models is to generate and evaluate adversarial examples for data augmentation or adversarial learning. Meanwhile, gender bias embedded in these models appears to be a serious problem in practical applications. Many studies have covered gender bias produced by word-level information (e.g., gender-stereotypical occupations), while few have investigated sentence-level and implicit cases. In this paper, we propose a method to automatically generate implicit gender bias samples at the sentence level, together with a metric to measure gender bias. Samples generated by our method will be evaluated in terms of accuracy. The metric will be used to guide the…
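
The abstract does not specify how the samples or the metric are constructed, so the following is only a minimal sketch of one generic idea from the bias-evaluation literature: swap gendered words in a sentence and compare a model's scores on the original and swapped versions. It is not the authors' method. The word list `SWAPS`, the helpers `swap_gender` and `bias_gap`, and the stand-in scorer `toy_score` are all hypothetical names introduced for illustration; in practice `toy_score` would be replaced by a real model's output (for example, a classifier probability).

```python
import re
from typing import Callable, Iterable

# Hypothetical, deliberately tiny swap list (assumption, not from the paper).
# Ambiguous words such as "her" are left out to keep the swap reversible.
SWAPS = {
    "he": "she", "she": "he",
    "man": "woman", "woman": "man",
    "father": "mother", "mother": "father",
    "boy": "girl", "girl": "boy",
}
_PATTERN = re.compile(r"\b(" + "|".join(SWAPS) + r")\b", flags=re.IGNORECASE)


def swap_gender(sentence: str) -> str:
    """Return the sentence with each listed gendered word replaced by its pair."""
    def repl(match: re.Match) -> str:
        word = match.group(0)
        swapped = SWAPS[word.lower()]
        # Preserve capitalization of the first letter.
        return swapped.capitalize() if word[0].isupper() else swapped
    return _PATTERN.sub(repl, sentence)


def bias_gap(score: Callable[[str], float], sentences: Iterable[str]) -> float:
    """Mean absolute score difference between each sentence and its swapped twin."""
    gaps = [abs(score(s) - score(swap_gender(s))) for s in sentences]
    return sum(gaps) / len(gaps)


def toy_score(sentence: str) -> float:
    """Stand-in for a model: artificially favors 'he' together with 'engineer'."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    return 0.9 if {"he", "engineer"} <= tokens else 0.5


if __name__ == "__main__":
    examples = ["He is an engineer.", "She is a teacher."]
    print(swap_gender(examples[0]))
    print(bias_gap(toy_score, examples))
```

A gap of zero would mean the scorer treats each sentence and its swapped twin identically; larger values indicate that gendered wording alone changes the score. Word-level swapping like this only probes explicit gender terms, so it does not by itself capture the implicit, sentence-level cases the paper targets.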

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Hate Speech and Cyberbullying Detection

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Discriminative Fine-Tuning · Cosine Annealing · WordPiece · Linear Warmup With Cosine Annealing · Adam · Attention Dropout · Residual Connection