[Re] Badder Seeds: Reproducing the Evaluation of Lexical Methods for Bias Measurement
Jille van der Togt, Lea Tiyavorabun, Matteo Rosati, Giulio Starace

TL;DR
This study reproduces the original evaluation of lexical methods for bias measurement in NLP. It confirms that seed lexicons are often themselves biased, which weakens the bias metrics built on them, so their construction needs careful validation before use.
Contribution
The paper provides a reproducibility analysis of the original study. It confirms that seed sets used for bias measurement are often biased and fragile, and that the rationale behind their construction should be checked thoroughly before they are used.
Findings
Seed sets often contain biases that affect their performance as a baseline for bias metrics.
The reproduced results largely support the original authors' claim that seed sets are fragile.
Results differ slightly on select occasions, but the differences do not change the overall conclusion.
Abstract
Combating bias in NLP requires bias measurement. Bias measurement is almost always achieved by using lexicons of seed terms, i.e. sets of words specifying stereotypes or dimensions of interest. This reproducibility study focuses on the original authors' main claim that the rationale for the construction of these lexicons needs thorough checking before usage, as the seeds used for bias measurement can themselves exhibit biases. The study aims to evaluate the reproducibility of the quantitative and qualitative results presented in the paper and the conclusions drawn thereof. We reproduce most of the results supporting the original authors' general claim: seed sets often suffer from biases that affect their performance as a baseline for bias metrics. Generally, our results mirror the original paper's. They are slightly different on select occasions, but not in ways that undermine the…
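To make the idea of seed-based measurement concrete, here is a minimal sketch of how a bias score can depend on the choice of seed terms. It is not the method of the original paper or of this reproduction: the two-dimensional vectors in EMBEDDINGS are invented toy values, and the helper names (mean_vector, cosine, bias_score) are hypothetical. The score used here is simply the cosine similarity of a target word to the mean of one seed set minus its similarity to the mean of another.

```python
import math

# Toy 2-D "embeddings". These values are made up for illustration only;
# a real study would load vectors trained on a corpus.
EMBEDDINGS = {
    "she": (1.0, 0.1),
    "woman": (0.9, 0.2),
    "he": (-1.0, 0.1),
    "man": (-0.9, 0.2),
    "nurse": (0.5, 0.8),
    "engineer": (-0.4, 0.9),
}


def mean_vector(words):
    """Average the embeddings of a seed set, component by component."""
    vecs = [EMBEDDINGS[w] for w in words]
    dim = len(vecs[0])
    return tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(dim))


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def bias_score(target, seeds_a, seeds_b):
    """Similarity of target to seed set A minus its similarity to seed set B.

    Positive values mean the target sits closer to seed set A.
    """
    vec = EMBEDDINGS[target]
    return cosine(vec, mean_vector(seeds_a)) - cosine(vec, mean_vector(seeds_b))


if __name__ == "__main__":
    # The same target word scored against two different seed choices.
    print(bias_score("nurse", ["she"], ["he"]))
    print(bias_score("nurse", ["woman"], ["man"]))
```

In this toy setup, swapping one seed pair for another changes the score for the same target word, which is the kind of sensitivity to seed choice that the abstract describes.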
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Software Engineering Research
