Your fairness may vary: Pretrained language model fairness in toxic text classification

Ioana Baldini, Dennis Wei, Karthikeyan Natesan Ramamurthy, Mikhail Yurochkin, Moninder Singh

arXiv:2108.01250 · cs.CL · April 15, 2022 · 1 citation

TL;DR

This paper shows that pretrained language models for toxic text classification exhibit significant variability in fairness, little of which is explained by model size, and demonstrates post-processing methods that improve fairness without retraining.

Contribution

The paper reveals how fairness varies across pretrained language models of different sizes and random initializations, and adapts post-processing fairness techniques developed for tabular data to NLP models.

Findings

1. Fairness varies more than accuracy with training data size and initialization.
2. Model size explains little of the fairness variation.
3. Post-processing methods improve fairness without retraining.
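Finding 3 refers to post-processing, which adjusts a trained classifier's decisions instead of retraining it. As an illustration only, the sketch below implements one simple post-processing rule: per-group decision thresholds chosen so that each group's false positive rate (non-toxic text flagged as toxic) stays at or below a common target. It is not necessarily one of the methods evaluated in the paper. It assumes binary labels (1 = toxic), model scores in [0, 1], and a binary group indicator (for example, whether a comment mentions an identity term); all function names and the synthetic data are hypothetical.

```python
import numpy as np


def false_positive_rate(y_true, y_pred):
    """Share of non-toxic examples (label 0) that are predicted toxic (1)."""
    negatives = y_true == 0
    if not negatives.any():
        return 0.0
    return float(np.mean(y_pred[negatives] == 1))


def fpr_gap(y_true, y_pred, group):
    """Absolute difference in false positive rate between group 1 and group 0."""
    fpr_1 = false_positive_rate(y_true[group == 1], y_pred[group == 1])
    fpr_0 = false_positive_rate(y_true[group == 0], y_pred[group == 0])
    return abs(fpr_1 - fpr_0)


def fit_group_thresholds(scores, y_true, group, target_fpr=0.1, default=0.5):
    """Hypothetical helper: one threshold per group so that group FPR <= target_fpr.

    A text is flagged as toxic when its score is strictly above its group's threshold.
    """
    thresholds = {}
    for g in np.unique(group):
        neg_scores = np.sort(scores[(group == g) & (y_true == 0)])
        n = len(neg_scores)
        if n == 0:
            thresholds[g] = default  # no non-toxic examples to calibrate on
            continue
        k = int(np.floor(target_fpr * n))  # max number of negatives allowed above threshold
        # At most k negative scores lie strictly above the (n - k)-th smallest one.
        thresholds[g] = neg_scores[n - k - 1] if k < n else -np.inf
    return thresholds


def apply_group_thresholds(scores, group, thresholds):
    """Binary predictions using each example's group-specific threshold."""
    return np.array([int(s > thresholds[g]) for s, g in zip(scores, group)])


# Synthetic example: non-toxic texts in group 1 receive inflated scores.
rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, size=n)   # 1 = mentions an identity term (synthetic)
y_true = rng.integers(0, 2, size=n)  # 1 = toxic (synthetic)
mean = 0.3 + 0.4 * y_true + 0.15 * group * (1 - y_true)
scores = np.clip(rng.normal(mean, 0.15), 0.0, 1.0)

baseline = (scores > 0.5).astype(int)  # single global threshold
thresholds = fit_group_thresholds(scores, y_true, group, target_fpr=0.1)
adjusted = apply_group_thresholds(scores, group, thresholds)

baseline_gap = fpr_gap(y_true, baseline, group)
adjusted_gap = fpr_gap(y_true, adjusted, group)
print(f"FPR gap: global threshold {baseline_gap:.3f}, per-group thresholds {adjusted_gap:.3f}")
```

This rule equalizes only false positive rates. It narrows the gap without touching the model's weights, but it needs group membership at prediction time and can change true positive rates and overall accuracy.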

Abstract

The popularity of pretrained language models in natural language processing systems calls for a careful evaluation of such models in downstream tasks, which have a higher potential for societal impact. The evaluation of such systems usually focuses on accuracy measures. Our findings in this paper call for attention to be paid to fairness measures as well. Through an analysis of more than a dozen pretrained language models of varying sizes on two toxic text classification tasks (English), we demonstrate that focusing on accuracy measures alone can lead to models with wide variation in fairness characteristics. Specifically, we observe that fairness can vary even more than accuracy with increasing training data size and different random initializations. At the same time, we find that little of the fairness variation is explained by model size, despite claims in the literature. To…
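The abstract's central point, that models with similar accuracy can differ widely in fairness, suggests auditing every training run on both axes. Below is a minimal sketch, assuming binary labels (1 = toxic), a binary group indicator, and equalized odds difference as the fairness metric; the paper's own metric choices may differ. The "runs" are synthetic stand-ins for models trained with different random seeds, not results from the paper, and the helper names are hypothetical.

```python
import numpy as np


def group_rate(y_true, y_pred, group, g, label):
    """P(prediction = 1 | true label = label, group = g); 0.0 if the cell is empty."""
    mask = (group == g) & (y_true == label)
    return float(y_pred[mask].mean()) if mask.any() else 0.0


def equalized_odds_difference(y_true, y_pred, group):
    """Larger of the between-group TPR gap and FPR gap (binary group 0/1)."""
    tpr_gap = abs(group_rate(y_true, y_pred, group, 1, 1) - group_rate(y_true, y_pred, group, 0, 1))
    fpr_gap = abs(group_rate(y_true, y_pred, group, 1, 0) - group_rate(y_true, y_pred, group, 0, 0))
    return max(tpr_gap, fpr_gap)


def audit_runs(y_true, group, predictions_per_run):
    """Accuracy and equalized odds difference for each run, plus the range of each."""
    accs = np.array([np.mean(pred == y_true) for pred in predictions_per_run])
    eods = np.array([equalized_odds_difference(y_true, pred, group) for pred in predictions_per_run])
    return {
        "accuracy": accs,
        "eod": eods,
        "accuracy_range": float(accs.max() - accs.min()),
        "eod_range": float(eods.max() - eods.min()),
    }


# Synthetic stand-ins for five training runs that differ only in random seed.
rng = np.random.default_rng(1)
n = 1000
y_true = rng.integers(0, 2, size=n)
group = rng.integers(0, 2, size=n)
runs = []
for seed in range(5):
    flip = np.random.default_rng(seed).random(n) < 0.1  # ~10% of predictions wrong
    runs.append(np.where(flip, 1 - y_true, y_true))

report = audit_runs(y_true, group, runs)
print("accuracy range:", round(report["accuracy_range"], 3))
print("equalized odds difference range:", round(report["eod_range"], 3))
```

Reporting the spread of both metrics across seeds, rather than a single accuracy figure, is what surfaces run-to-run fairness variation; with real models, each run would be a separately trained checkpoint evaluated on the same test set.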

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection