Robust Evaluation Measures for Evaluating Social Biases in Masked Language Models
Yang Liu

TL;DR
This paper introduces new evaluation measures for social biases in masked language models that utilize distributional divergence metrics, leading to more robust and interpretable bias assessments.
Contribution
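The paper represents each set of pseudo-log-likelihood (PLL) scores as a Gaussian distribution and uses Kullback-Leibler (KL) and Jensen-Shannon (JS) divergence to compare the stereotypical and anti-stereotypical distributions, improving robustness over previous methods.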
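For reference, the standard textbook definitions behind this contribution are written out below. The symbols P, Q, M and the means and standard deviations are notation chosen here for illustration, not taken from the paper, and the paper's final bias measures may normalize or combine these quantities differently.

```latex
% P = N(\mu_1, \sigma_1^2): Gaussian fitted to the stereotypical PLL scores (assumed notation)
% Q = N(\mu_2, \sigma_2^2): Gaussian fitted to the anti-stereotypical PLL scores (assumed notation)
\[
  D_{\mathrm{KL}}(P \,\|\, Q)
  = \log\frac{\sigma_2}{\sigma_1}
  + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2}
  - \frac{1}{2}
\]
\[
  D_{\mathrm{JS}}(P \,\|\, Q)
  = \tfrac{1}{2} D_{\mathrm{KL}}(P \,\|\, M)
  + \tfrac{1}{2} D_{\mathrm{KL}}(Q \,\|\, M),
  \qquad M = \tfrac{1}{2}(P + Q)
\]
```

KL divergence is asymmetric and unbounded, while JS divergence is symmetric and bounded above by log 2 (in nats). Because M is a mixture of two Gaussians rather than a Gaussian itself, JS divergence has no general closed form here and is typically approximated numerically.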
Findings
Proposed measures outperform previous ones in robustness.
New measures are more interpretable.
Validated on StereoSet and CrowS-Pairs datasets.
Abstract
Many evaluation measures are used to evaluate social biases in masked language models (MLMs). However, we find that these previously proposed evaluation measures lack robustness in scenarios with limited datasets. This is because these measures are obtained by comparing the pseudo-log-likelihood (PLL) scores of the stereotypical and anti-stereotypical samples using an indicator function. The disadvantage is that this makes limited use of the PLL score sets and fails to capture their distributional information. In this paper, we represent a PLL score set as a Gaussian distribution and use Kullback-Leibler (KL) divergence and Jensen-Shannon (JS) divergence to construct evaluation measures for the distributions of stereotypical and anti-stereotypical PLL scores. Experimental results on the publicly available datasets StereoSet (SS) and CrowS-Pairs (CP) show that our proposed measures are…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Natural Language Processing Techniques · Authorship Attribution and Profiling
Methods: Sparse Evolutionary Training
