Exploring the Influence of Label Aggregation on Minority Voices: Implications for Dataset Bias and Model Training

Mugdha Pandya; Nafise Sadat Moosavi; Diana Maynard

arXiv:2412.04025 · cs.CL · December 6, 2024

TL;DR

This paper examines how common label aggregation methods in dataset annotation can unintentionally suppress minority opinions, affecting class distributions and model biases in sexism detection tasks.
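A minimal Python sketch of how aggregation can suppress minority opinions, assuming plain majority vote over three annotators per item and binary labels named `sexist` and `not_sexist` (hypothetical toy data, not the paper's dataset). It counts the individual annotations that disagree with the aggregated gold label and therefore do not survive into it. Returning `None` on a tie is also an assumption of this sketch.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label, or None on a tie (hypothetical tie policy)."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # no majority; real pipelines may break ties differently
    return counts[0][0]


def silenced_annotations(labels):
    """Count annotations that disagree with the majority label for one item."""
    gold = majority_vote(labels)
    if gold is None:
        return 0  # on a tie nothing is aggregated, so nothing is counted as silenced
    return sum(1 for label in labels if label != gold)


# Hypothetical toy annotations: each inner list holds one item's labels from three annotators.
items = [
    ["sexist", "not_sexist", "not_sexist"],
    ["sexist", "sexist", "sexist"],
    ["not_sexist", "sexist", "not_sexist"],
]

gold = [majority_vote(item) for item in items]
lost = sum(silenced_annotations(item) for item in items)
print(gold)  # ['not_sexist', 'sexist', 'not_sexist']
print(lost)  # 2
```

Both `sexist` votes on the first and third items are outvoted, so the gold labels carry no trace of them.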

Contribution

The paper analyzes how label aggregation strategies affect the representation of minority opinions and discusses the biases these strategies can introduce during dataset creation and model training.

Findings

01. Standard aggregation can silence minority opinions.

02. Aggregation methods influence class distributions and bias.

03. Models may amplify dataset biases.
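To illustrate how the choice of aggregation method shifts the class distribution, the sketch below applies three rules to the same hypothetical annotations. Majority vote and a single expert label correspond to the strategies named in the abstract; the `any_positive` rule, which accepts a single `sexist` vote, is a hypothetical minority-inclusive contrast added only for illustration and is not a method from the paper. All data and function names are assumptions.

```python
from collections import Counter

# Hypothetical toy data: three crowd labels per item plus one designated expert label.
annotations = [
    {"crowd": ["sexist", "not_sexist", "not_sexist"], "expert": "sexist"},
    {"crowd": ["sexist", "sexist", "not_sexist"], "expert": "sexist"},
    {"crowd": ["not_sexist", "not_sexist", "not_sexist"], "expert": "not_sexist"},
    {"crowd": ["sexist", "not_sexist", "not_sexist"], "expert": "not_sexist"},
]


def majority(item):
    # Odd number of binary labels, so no ties can occur in this toy data.
    return Counter(item["crowd"]).most_common(1)[0][0]


def expert(item):
    # Ignore the crowd and trust the single expert label.
    return item["expert"]


def any_positive(item):
    # Hypothetical minority-inclusive rule: one "sexist" vote is enough.
    return "sexist" if "sexist" in item["crowd"] else "not_sexist"


def class_distribution(strategy):
    """Share of gold labels per class when every item is aggregated with `strategy`."""
    gold = [strategy(item) for item in annotations]
    counts = Counter(gold)
    return {label: counts[label] / len(gold) for label in ("sexist", "not_sexist")}


for strategy in (majority, expert, any_positive):
    print(strategy.__name__, class_distribution(strategy))
```

On this toy data the share of items labelled `sexist` is 25% under majority vote, 50% under the expert rule and 75% under `any_positive`, although the underlying annotations never change.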

Abstract

Resolving disagreement in manual annotation typically involves removing unreliable annotators and then applying a label aggregation strategy, such as majority vote or expert opinion, to settle the remaining disagreement. These steps may have the side effect of silencing or under-representing minority, but equally valid, opinions. In this paper, we study the impact of standard label aggregation strategies on minority opinion representation in sexism detection. We investigate the quality and value of minority annotations, and then examine the effect of aggregation on the class distributions in the gold labels and on the behaviour of models trained on the resulting datasets. Finally, we discuss the potential biases introduced by each method and how they can be amplified by the models.
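The two-step pipeline described in the abstract (remove unreliable annotators, then aggregate) can be sketched as follows. Measuring reliability as agreement with the per-item majority, the 0.6 threshold and the four-annotator toy data are all assumptions made for illustration; the paper's actual filtering criterion may differ.

```python
from collections import Counter

# Hypothetical toy data: item id -> {annotator id: label}.
ANNOTATIONS = {
    "t1": {"a1": "sexist", "a2": "not_sexist", "a3": "not_sexist", "a4": "not_sexist"},
    "t2": {"a1": "sexist", "a2": "sexist", "a3": "not_sexist", "a4": "sexist"},
    "t3": {"a1": "sexist", "a2": "not_sexist", "a3": "not_sexist", "a4": "not_sexist"},
    "t4": {"a1": "sexist", "a2": "sexist", "a3": "sexist", "a4": "sexist"},
}


def majority_label(labels):
    """Most frequent label; the toy data above is chosen so that no ties occur."""
    return Counter(labels).most_common(1)[0][0]


def agreement_rates(data):
    """Fraction of each annotator's labels that match the per-item majority label."""
    agree, total = Counter(), Counter()
    for labels in data.values():
        gold = majority_label(labels.values())
        for annotator, label in labels.items():
            total[annotator] += 1
            agree[annotator] += label == gold  # bool adds as 0 or 1
    return {a: agree[a] / total[a] for a in total}


def filter_and_aggregate(data, threshold=0.6):
    """Drop annotators below a hypothetical agreement threshold, then take a majority vote."""
    rates = agreement_rates(data)
    kept = {a for a, rate in rates.items() if rate >= threshold}
    gold = {
        item: majority_label([label for a, label in labels.items() if a in kept])
        for item, labels in data.items()
    }
    return kept, gold


rates = agreement_rates(ANNOTATIONS)
kept, gold = filter_and_aggregate(ANNOTATIONS)
print(rates)         # {'a1': 0.5, 'a2': 1.0, 'a3': 0.75, 'a4': 1.0}
print(sorted(kept))  # ['a2', 'a3', 'a4']
print(gold)          # {'t1': 'not_sexist', 't2': 'sexist', 't3': 'not_sexist', 't4': 'sexist'}
```

Annotator `a1`, who labels every item `sexist`, falls below the threshold and is removed, so all four of that annotator's labels disappear before aggregation, including the two that majority vote would already have outvoted.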

Taxonomy

Topics: Natural Language Processing Techniques