Assessing the Reliability of Word Embedding Gender Bias Measures

Yupei Du; Qixiang Fang; Dong Nguyen

arXiv:2109.04732 · cs.CL · September 13, 2021 · 1 citation

TL;DR

This paper evaluates how reliably existing word embedding gender bias measures produce consistent scores across different random seeds, scoring rules, and word choices, with the aim of informing better measure design and more critical use of these measures.

Contribution

The paper assesses three types of reliability for word embedding gender bias measures (test-retest reliability, inter-rater consistency, and internal consistency) and analyses the factors that affect these reliability scores.

Findings

1. Bias scores vary with random seeds and scoring rules.

2. Reliability is affected by choice of words and measurement conditions.

3. Researchers should critically evaluate bias measures before application.

Abstract

Various measures have been proposed to quantify human-like social biases in word embeddings. However, bias scores based on these measures can suffer from measurement error. One indication of measurement quality is reliability, concerning the extent to which a measure produces consistent results. In this paper, we assess three types of reliability of word embedding gender bias measures, namely test-retest reliability, inter-rater consistency and internal consistency. Specifically, we investigate the consistency of bias scores across different choices of random seeds, scoring rules and words. Furthermore, we analyse the effects of various factors on these measures' reliability scores. Our findings inform better design of word embedding gender bias measures. Moreover, we urge researchers to be more critical about the application of such measures.
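
To make the notions of consistency in the abstract concrete, here is a minimal sketch using only the Python standard library. It is not the authors' code. It assumes two common statistics as stand-ins: a Pearson correlation between bias scores obtained under two random seeds (a rough proxy for test-retest reliability) and Cronbach's alpha over several score columns (a rough proxy for internal consistency). The paper may use different statistics. The bias scores are synthetic, and the names `pearson`, `cronbach_alpha`, `latent`, `seed_a`, and `seed_b` are hypothetical.

```python
import random
from statistics import mean, variance


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def cronbach_alpha(columns):
    """Cronbach's alpha; each column holds one set of scores for the same targets."""
    k = len(columns)
    item_var = sum(variance(col) for col in columns)
    totals = [sum(vals) for vals in zip(*columns)]
    return k / (k - 1) * (1 - item_var / variance(totals))


# Synthetic data (assumption): a latent bias score per word plus seed-specific noise.
rng = random.Random(0)
latent = [rng.gauss(0, 1) for _ in range(200)]
seed_a = [v + rng.gauss(0, 0.1) for v in latent]
seed_b = [v + rng.gauss(0, 0.1) for v in latent]

print("correlation across seeds:", pearson(seed_a, seed_b))
print("alpha across seeds:", cronbach_alpha([seed_a, seed_b]))
```

With real embeddings, `seed_a` and `seed_b` would be replaced by bias scores for the same words computed from embeddings trained with different random seeds. Values close to 1 indicate that the scores are stable, and lower values indicate that measurement error is a larger share of the observed variation.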

Code & Models

Repositories

nlpsoc/reliability_bias (official)

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Authorship Attribution and Profiling · Language and cultural evolution