Adversarial Examples Generation for Reducing Implicit Gender Bias in Pre-trained Models
Wenqian Ye, Fei Xu, Yaojia Huang, Cassie Huang, Ji A

TL;DR
This paper introduces a method to generate sentence-level implicit gender bias examples and a metric to measure gender bias, aiming to improve the robustness of pre-trained models against gender bias.
Contribution
It presents a novel approach for automatic generation of implicit gender bias samples and a bias measurement metric for pre-trained NLP models.
Findings
Generated examples can effectively reveal gender bias in models.
The bias metric correlates with model performance on biased data.
The method aids in evaluating and reducing gender bias in NLP models.
Abstract
Over the last few years, contextualized pre-trained neural language models such as BERT and GPT have shown significant gains on various NLP tasks. One way to enhance the robustness of existing pre-trained models is to generate and evaluate adversarial examples for use in data augmentation or adversarial learning. Meanwhile, gender bias embedded in these models appears to be a serious problem in practical applications. Many studies have covered gender bias arising from word-level information (e.g., gender-stereotypical occupations), while few have investigated sentence-level and implicit cases. In this paper, we propose a method to automatically generate implicit gender bias samples at the sentence level, along with a metric to measure gender bias. Samples generated by our method will be evaluated in terms of accuracy. The metric will be used to guide the…
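The abstract does not spell out how the samples are generated or how the metric is defined, so the following is only a rough sketch of one common way to probe sentence-level gender bias, not the authors' method. It assumes a counterfactual setup: each sentence is paired with a gender-swapped copy, and the bias score is the mean absolute difference in a model's output between the two. The names `SWAPS`, `swap_gender`, `bias_gap`, and `toy_score` are all hypothetical, and `toy_score` is a deliberately biased stand-in for a real pre-trained model.

```python
import re

# Hypothetical, tiny swap table (an assumption for illustration only).
# A realistic table would be much larger and would need to handle
# ambiguous words such as "her" (which can map to "him" or "his").
SWAPS = {"he": "she", "she": "he", "him": "her", "his": "her",
         "her": "his", "man": "woman", "woman": "man"}


def swap_gender(sentence):
    """Return a copy of the sentence with gendered words swapped."""
    def repl(match):
        word = match.group(0)
        swapped = SWAPS.get(word.lower())
        if swapped is None:
            return word  # leave non-gendered words untouched
        # Preserve a leading capital letter, e.g. "He" -> "She".
        return swapped.capitalize() if word[0].isupper() else swapped
    return re.sub(r"[A-Za-z]+", repl, sentence)


def bias_gap(score, sentences):
    """Mean absolute score difference between each sentence and its swap.

    `score` is any callable mapping a sentence to a float, for example a
    classifier's probability for some label. A gap of 0 means the scorer
    treats the swapped pairs identically.
    """
    gaps = [abs(score(s) - score(swap_gender(s))) for s in sentences]
    return sum(gaps) / len(gaps)


def toy_score(sentence):
    """Stand-in for a model: scores sentences containing 'she' higher."""
    return 0.9 if "she" in sentence.lower().split() else 0.5


if __name__ == "__main__":
    samples = ["He is a nurse.", "She is a doctor."]
    print(swap_gender(samples[0]))
    print(bias_gap(toy_score, samples))
```

In practice `toy_score` would be replaced by a call into the model under evaluation. A simple word swap like this only covers explicit gender markers, so it illustrates the measurement idea rather than the implicit cases the paper targets.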
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Hate Speech and Cyberbullying Detection
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Discriminative Fine-Tuning · Cosine Annealing · WordPiece · Linear Warmup With Cosine Annealing · Adam · Attention Dropout · Residual Connection
