No Word Embedding Model Is Perfect: Evaluating the Representation Accuracy for Social Bias in the Media

Maximilian Spliethöver; Maximilian Keiff; Henning Wachsmuth

arXiv:2211.03634 · cs.CL · November 8, 2022 · 1 citation

Open Access · 1 Repo

TL;DR

This paper evaluates how different word embedding algorithms measure social bias in US news articles, revealing limitations of standard bias quantification methods and proposing improvements aligned with psychological insights.

Contribution

It systematically compares embedding algorithms for bias measurement in news, highlighting their shortcomings and proposing methods that better reflect psychological expectations.

Findings

01 Standard bias measures do not align well with psychology literature.

02 Proposed algorithms reduce bias measurement gaps.

03 Embedding models still do not fully match expected social biases.

Abstract

News articles both shape and reflect public opinion across the political spectrum. Analyzing them for social bias can thus provide valuable insights, such as prevailing stereotypes in society and the media, which are often adopted by NLP models trained on respective data. Recent work has relied on word embedding bias measures, such as WEAT. However, several representation issues of embeddings can harm the measures' accuracy, including low-resource settings and token frequency differences. In this work, we study what kind of embedding algorithm serves best to accurately measure types of social bias known to exist in US online news articles. To cover the whole spectrum of political bias in the US, we collect 500k articles and review psychology literature with respect to expected social bias. We then quantify social bias using WEAT along with embedding algorithms that account for the…
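For readers unfamiliar with WEAT, the following is a minimal sketch of its effect size as it is commonly defined (the difference in mean association between two target word sets, normalized by the sample standard deviation over all target words). It is not taken from the paper's code; the function names and the two-dimensional toy vectors are hypothetical and only serve to make the example runnable without a trained embedding model.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, A, B):
    """s(w, A, B): mean similarity of w to attribute set A minus that to B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)


def weat_effect_size(X, Y, A, B):
    """Difference of mean associations of target sets X and Y,
    divided by the sample standard deviation over X and Y combined."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Hypothetical toy embeddings (assumption: real use would look up
# word vectors from a trained embedding model instead).
X = [[1.0, 0.0], [0.9, 0.1]]   # target set 1, close to attribute set A
Y = [[0.0, 1.0], [0.1, 0.9]]   # target set 2, close to attribute set B
A = [[1.0, 0.0]]               # attribute set A
B = [[0.0, 1.0]]               # attribute set B

effect = weat_effect_size(X, Y, A, B)
print(f"WEAT effect size: {effect:.3f}")
```

A positive value indicates that the X words are more strongly associated with A (relative to B) than the Y words are; swapping X and Y flips the sign. Because the measure depends entirely on the vectors, representation issues such as those named in the abstract (low-resource settings, token frequency differences) carry over directly into the score.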

Code & Models

Repositories

webis-de/emnlp-22 (Official)

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Misinformation and Its Impacts · Media Influence and Politics

Methods: ALIGN