Understanding the Interplay of Scale, Data, and Bias in Language Models: A Case Study with BERT
Muhammad Ali, Swetasudha Panda, Qinlan Shen, Michael Wick, Ari Kobren

TL;DR
This study examines how the scale of BERT models and the nature of pre-training data influence social biases, revealing that larger models and different data sources affect bias types and levels during both language modeling and downstream tasks.
Contribution
It provides a detailed analysis of how pre-training data and model scale interact to shape social biases in BERT, highlighting the qualitative impact of data sources on bias evolution.
Findings
Pre-training data significantly affects bias development with scale.
Larger models pre-trained on internet scrapes show increased toxicity.
Biases during downstream tasks tend to decrease with scale.
Abstract
In the current landscape of language model research, larger models, larger datasets and more compute seem to be the only way to advance towards intelligence. While there have been extensive studies of scaling laws and models' scaling behaviors, the effect of scale on a model's social biases and stereotyping tendencies has received less attention. In this study, we explore the influence of model scale and pre-training data on a model's learnt social biases. We focus on BERT -- an extremely popular language model -- and investigate biases as they show up during language modeling (upstream), as well as during classification applications after fine-tuning (downstream). Our experiments on four architecture sizes of BERT demonstrate that pre-training data substantially influences how upstream biases evolve with model scale. With increasing scale, models pre-trained on large internet scrapes like…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Attention Is All You Need · Softmax · Dense Connections · Dropout · Linear Layer · Attention Dropout · Residual Connection · Linear Warmup With Linear Decay · WordPiece
