Understanding the Interplay of Scale, Data, and Bias in Language Models: A Case Study with BERT

Muhammad Ali; Swetasudha Panda; Qinlan Shen; Michael Wick; Ari Kobren

arXiv:2407.21058 · cs.CL · August 1, 2024

TL;DR

This study examines how the scale of BERT models and the nature of pre-training data influence social biases, revealing that larger models and different data sources affect bias types and levels during both language modeling and downstream tasks.

Contribution

It provides a detailed analysis of how pre-training data and model scale interact to shape social biases in BERT, highlighting the qualitative impact of data sources on bias evolution.

Findings

1. Pre-training data significantly affects bias development with scale.

2. Larger models on internet data show increased toxicity.

3. Biases during downstream tasks tend to decrease with scale.

Abstract

In the current landscape of language model research, larger models, larger datasets and more compute seem to be the only way to advance towards intelligence. While there have been extensive studies of scaling laws and models' scaling behaviors, the effect of scale on a model's social biases and stereotyping tendencies has received less attention. In this study, we explore the influence of model scale and pre-training data on a model's learnt social biases. We focus on BERT -- an extremely popular language model -- and investigate biases as they show up during language modeling (upstream), as well as during classification applications after fine-tuning (downstream). Our experiments on four architecture sizes of BERT demonstrate that pre-training data substantially influences how upstream biases evolve with model scale. With increasing scale, models pre-trained on large internet scrapes like…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Attention Is All You Need · Softmax · Dense Connections · Dropout · Linear Layer · Attention Dropout · Residual Connection · Linear Warmup With Linear Decay · WordPiece