Why Don't Prompt-Based Fairness Metrics Correlate?
Abdelrahman Zayed, Goncalo Mordido, Ioana Baldini, Sarath Chandar

TL;DR
This paper investigates the inconsistency of prompt-based fairness metrics in large language models, identifies reasons for low correlation, and proposes CAIRO to improve metric agreement significantly.
Contribution
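It introduces CAIRO, a method that raises the correlation between prompt-based fairness metrics by augmenting each metric's original prompts with several pre-trained language models and selecting the combination of augmented prompts that achieves the highest correlation, addressing reliability issues in bias evaluation.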
Findings
Reports a significant increase in Pearson correlation between fairness metrics, from 0.3 and 0.18 to 0.90 and 0.98.
Demonstrates low initial agreement among existing fairness metrics.
Provides insights into reasons for poor correlation across fairness metrics.
Abstract
The widespread use of large language models has brought up essential questions about the potential biases these models might learn. This led to the development of several metrics aimed at evaluating and mitigating these biases. In this paper, we first demonstrate that prompt-based fairness metrics exhibit poor agreement, as measured by correlation, raising important questions about the reliability of fairness assessment using prompts. Then, we outline six relevant reasons why such a low correlation is observed across existing metrics. Based on these insights, we propose a method called Correlated Fairness Output (CAIRO) to enhance the correlation between fairness metrics. CAIRO augments the original prompts of a given fairness metric by using several pre-trained language models and then selects the combination of the augmented prompts that achieves the highest correlation across…
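The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each metric has a handful of candidate prompt sets (the original plus augmented variants), that each candidate has already produced one bias score per evaluated model, and that agreement is scored as the mean pairwise Pearson correlation. The names `select_best_combination`, `metric_a`, `metric_b`, and `augmented_1`, the exhaustive search, and all scores are hypothetical and chosen only for illustration.

```python
from itertools import combinations, product
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists.

    Raises ZeroDivisionError if either list is constant.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mean_pairwise_correlation(score_lists):
    """Average Pearson correlation over every pair of metrics."""
    pairs = list(combinations(score_lists, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def select_best_combination(candidates):
    """Pick one prompt set per metric so that the metrics agree the most.

    candidates: {metric_name: {prompt_set_name: [bias score per model]}}
    Assumption: an exhaustive search over all combinations is affordable.
    """
    metrics = sorted(candidates)
    best_choice, best_corr = None, float("-inf")
    for choice in product(*(sorted(candidates[m]) for m in metrics)):
        scores = [candidates[m][c] for m, c in zip(metrics, choice)]
        corr = mean_pairwise_correlation(scores)
        if corr > best_corr:
            best_choice, best_corr = dict(zip(metrics, choice)), corr
    return best_choice, best_corr


# Made-up bias scores for three hypothetical models under each prompt set.
candidates = {
    "metric_a": {
        "original": [0.10, 0.30, 0.20],
        "augmented_1": [0.10, 0.20, 0.30],
    },
    "metric_b": {
        "original": [0.50, 0.40, 0.60],
        "augmented_1": [0.40, 0.50, 0.60],
    },
}

best_choice, best_corr = select_best_combination(candidates)
print(best_choice, round(best_corr, 2))
```

With these toy scores the two original prompt sets rank the models differently, while the augmented sets rank them identically, so the search selects the augmented set for both metrics. A real setup would need many more models and prompt variants, and a cheaper search than enumerating every combination.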
Taxonomy
Topics: Ethics and Social Impacts of AI
