Are Bias Evaluation Methods Biased?
Lina Berrayana, Sean Rooney, Luis Garcés-Erice, Ioana Giurgiu

TL;DR
This paper examines the robustness of bias evaluation benchmarks for Large Language Models, revealing that different methods produce inconsistent model rankings and offering recommendations for more reliable evaluation practices.
Contribution
It systematically compares various bias evaluation methods, highlighting their discrepancies and providing guidance for more robust bias assessment in AI models.
Findings
Different bias evaluation methods yield inconsistent model rankings.
Benchmark methods vary significantly in their bias assessments.
Recommendations are provided for improving bias evaluation robustness.
Abstract
The creation of benchmarks to evaluate the safety of Large Language Models is one of the key activities within the trusted AI community. These benchmarks allow models to be compared on different aspects of safety such as toxicity, bias, and harmful behavior. Independent benchmarks adopt different approaches with distinct data sets and evaluation methods. We investigate how robust such benchmarks are by using different approaches to rank a set of representative models for bias and comparing how similar the overall rankings are. We show that different but widely used bias evaluation methods result in disparate model rankings. We conclude with recommendations for the community on the usage of such benchmarks.
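The abstract describes ranking the same models with different bias evaluation methods and comparing how similar the rankings are. The summary here does not say which similarity measure the authors use, so the following is only a minimal sketch under an assumption: it uses Kendall's tau, a common rank-correlation measure, and the model names and orderings are hypothetical placeholders, not results from the paper.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a between two rankings of the same items (no ties).

    Each ranking is a list of item names ordered from best to worst.
    Returns a value in [-1, 1]: 1 means identical order, -1 means
    fully reversed order.
    """
    if set(rank_a) != set(rank_b) or len(rank_a) != len(set(rank_a)):
        raise ValueError("rankings must contain the same unique items")
    if len(rank_a) < 2:
        raise ValueError("need at least two items to compare rankings")

    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}

    concordant = 0
    discordant = 0
    for x, y in combinations(rank_a, 2):
        # A pair is concordant when both rankings order x and y the same way.
        same_order = (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0
        if same_order:
            concordant += 1
        else:
            discordant += 1

    n_pairs = len(rank_a) * (len(rank_a) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical rankings of four placeholder models under two
# hypothetical bias evaluation methods (illustrative only).
ranking_method_1 = ["model_a", "model_b", "model_c", "model_d"]
ranking_method_2 = ["model_b", "model_a", "model_d", "model_c"]

tau = kendall_tau(ranking_method_1, ranking_method_2)
print(f"Kendall tau between the two rankings: {tau:.3f}")
```

A value near 1 would indicate that two methods largely agree on the model ordering, while values near 0 or below would correspond to the kind of disparate rankings the abstract reports.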
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI)
