Fewer Errors, but More Stereotypes? The Effect of Model Size on Gender Bias
Yarden Tal, Inbal Magar, Roy Schwartz

TL;DR
Larger pretrained language models receive higher gender bias scores in prompt-based evaluation, yet they make fewer gender errors on the Winogender downstream task. Among the errors they do make, a larger share is stereotypical.
Contribution
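This study systematically analyzes how model size affects occupational gender bias and error patterns across three masked language model families (RoBERTa, DeBERTa, and T5) and two evaluation setups (a prompt-based method and the Winogender downstream task).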
Findings
Larger models show higher bias scores in prompt-based bias tests.
They make fewer gender errors on the downstream Winogender task.
The proportion of stereotypical errors increases with model size.
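The findings above can look contradictory, so a small worked example may help. The sketch below separates two quantities: the overall error rate and the share of errors that are stereotypical. The counts are hypothetical and chosen only for illustration; they are not results from the paper, and the names `ErrorCounts`, `error_rate`, and `stereotypical_share` are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class ErrorCounts:
    total: int               # number of evaluated examples
    stereotypical: int       # errors that agree with the occupational stereotype
    anti_stereotypical: int  # errors that go against the stereotype


def error_rate(c: ErrorCounts) -> float:
    """Fraction of all examples the model gets wrong."""
    return (c.stereotypical + c.anti_stereotypical) / c.total


def stereotypical_share(c: ErrorCounts) -> float:
    """Fraction of the errors that follow the stereotype."""
    errors = c.stereotypical + c.anti_stereotypical
    return c.stereotypical / errors if errors else 0.0


# Hypothetical counts for illustration only, NOT taken from the paper.
small = ErrorCounts(total=1000, stereotypical=180, anti_stereotypical=120)
large = ErrorCounts(total=1000, stereotypical=80, anti_stereotypical=20)

for name, counts in [("small", small), ("large", large)]:
    print(
        f"{name}: error rate = {error_rate(counts):.2f}, "
        f"stereotypical share = {stereotypical_share(counts):.2f}"
    )
```

With these made-up counts, the larger model has the lower error rate but the higher stereotypical share, which is the pattern the findings describe: fewer mistakes overall, with the remaining mistakes more concentrated on stereotypes.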
Abstract
The size of pretrained models is increasing, and so is their performance on a variety of NLP tasks. However, as their memorization capacity grows, they might pick up more social biases. In this work, we examine the connection between model size and its gender bias (specifically, occupational gender bias). We measure bias in three masked language model families (RoBERTa, DeBERTa, and T5) in two setups: directly, using a prompt-based method, and using a downstream task (Winogender). We find on the one hand that larger models receive higher bias scores on the former task, but when evaluated on the latter, they make fewer gender errors. To examine these potentially conflicting results, we carefully investigate the behavior of the different models on Winogender. We find that while larger models outperform smaller ones, the probability that their mistakes are caused by gender bias is higher.…
Taxonomy
Topics: Topic Modeling · Text Readability and Simplification · Natural Language Processing Techniques
Methods: DeBERTa
