Towards Equal Gender Representation in the Annotations of Toxic Language Detection
Elizabeth Excell, Noura Al Moubayed

TL;DR
This paper investigates how gender differences in comment annotation influence toxicity detection models, revealing biases and proposing data removal strategies to improve fairness and reduce gender-based prediction disparities.
Contribution
It uncovers gender-based annotation biases in toxicity datasets and demonstrates effective mitigation techniques to enhance fairness in toxicity classifiers.
Findings
A BERT model associates toxic comments containing offensive words with male annotators, predicting 67.7% of toxic comments as having been annotated by men.
Training on data with all offensive words removed reduces bias in the toxicity classifier by 55.5%.
Models trained exclusively on female-annotated data perform 1.8% better than models trained solely on male-annotated data.
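To make the 67.7% figure concrete, here is a minimal sketch of how such a share could be computed from the outputs of an annotator-gender classifier. The function names and the "male"/"female" label strings are illustrative assumptions, not the authors' code.

```python
def male_prediction_share(predicted_genders):
    """Fraction of comments that the (hypothetical) annotator-gender
    classifier labels as 'male'. Label strings are assumed."""
    if not predicted_genders:
        raise ValueError("need at least one prediction")
    male = sum(1 for g in predicted_genders if g == "male")
    return male / len(predicted_genders)


def prediction_gap(predicted_genders):
    """Absolute gap between the male and female prediction shares.
    Assumes binary labels, so the female share is 1 - male share."""
    share = male_prediction_share(predicted_genders)
    return abs(share - (1.0 - share))
```

Under these assumptions, a share of 0.677 on toxic comments corresponds to a gap of about 0.354 between the two predicted classes.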
Abstract
Classifiers tend to propagate biases present in the data on which they are trained. Hence, it is important to understand how the demographic identities of the annotators of comments affect the fairness of the resulting model. In this paper, we focus on the differences in the ways men and women annotate comments for toxicity, investigating how these differences result in models that amplify the opinions of male annotators. We find that the BERT model associates toxic comments containing offensive words with male annotators, causing the model to predict 67.7% of toxic comments as having been annotated by men. We show that this disparity between gender predictions can be mitigated by removing offensive words and highly toxic comments from the training data. We then apply the learned associations between gender and language to toxic language classifiers, finding that models trained…
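The mitigation described in the abstract, removing offensive words and highly toxic comments from the training data, could look roughly like the sketch below. The lexicon contents, the `text` and `toxicity` field names, and the 0.8 threshold are all placeholder assumptions; the abstract does not specify the paper's actual word list or cutoff.

```python
# Hypothetical placeholder lexicon; not the word list used in the paper.
OFFENSIVE_WORDS = {"idiot", "stupid"}


def strip_offensive_words(text, lexicon=OFFENSIVE_WORDS):
    """Drop whitespace-separated tokens whose punctuation-stripped,
    lowercased form appears in the lexicon."""
    kept = [
        tok for tok in text.split()
        if tok.strip(".,!?;:\"'").lower() not in lexicon
    ]
    return " ".join(kept)


def filter_training_data(rows, max_toxicity=0.8, lexicon=OFFENSIVE_WORDS):
    """rows: iterable of dicts with 'text' and 'toxicity' (assumed score in [0, 1]).
    Comments above max_toxicity (an assumed threshold) are dropped entirely;
    the rest have lexicon words removed."""
    cleaned = []
    for row in rows:
        if row["toxicity"] > max_toxicity:
            continue  # remove highly toxic comments
        cleaned.append({**row, "text": strip_offensive_words(row["text"], lexicon)})
    return cleaned
```

Token-level matching is the simplest choice here; it misses obfuscated spellings and multi-word phrases, which a real pipeline would need to handle.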
Taxonomy
Methods
Attention Is All You Need · Linear Layer · Adam · Linear Warmup With Linear Decay · Residual Connection · WordPiece · Attention Dropout · Dense Connections · Softmax
