The Confidence Trap: Gender Bias and Predictive Certainty in LLMs
Ahmed Sabir, Markus Kängsepp, and Rajesh Sharma

TL;DR
This paper investigates how well the confidence scores of Large Language Models (LLMs) reflect gender bias. It introduces a new fairness-aware calibration metric and analyzes model calibration on gendered pronoun resolution tasks.
Contribution
It provides a fairness-aware evaluation of LLM confidence calibration and introduces the Gender-ECE metric to measure gender disparities in calibration.
Findings
Gemma-2 shows the worst calibration among the six models evaluated on the gender bias benchmark.
Calibration scores can reveal gender bias disparities.
The new Gender-ECE metric effectively measures gender-related calibration issues.
Abstract
The increased use of Large Language Models (LLMs) in sensitive domains has led to growing interest in how their confidence scores correspond to fairness and bias. This study examines the alignment between LLM-predicted confidence and human-annotated bias judgments. Focusing on gender bias, the research investigates probability confidence calibration in contexts involving gendered pronoun resolution. The goal is to evaluate whether calibration metrics based on predicted confidence scores effectively capture fairness-related disparities in LLMs. The results show that, among the six state-of-the-art models, Gemma-2 demonstrates the worst calibration on the gender bias benchmark. The primary contribution of this work is a fairness-aware evaluation of LLMs' confidence calibration, offering guidance for ethical deployment. In addition, we introduce a new calibration metric, Gender-ECE,…
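To make the idea of group-wise calibration concrete, here is a minimal sketch of the standard Expected Calibration Error (ECE) computed separately for each gender group. It is an illustration only: the abstract does not give the definition of Gender-ECE, so this should not be read as the authors' metric. The function names `expected_calibration_error` and `per_group_ece`, the equal-width binning, and the default of 10 bins are all assumptions made for this example.

```python
import math
from collections import defaultdict


def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard ECE: size-weighted mean of |accuracy - confidence| per bin.

    Assumes equal-width bins over [0, 1]; this is the textbook ECE,
    not the paper's Gender-ECE (whose definition is not in the abstract).
    """
    n = len(confidences)
    if n == 0:
        return 0.0
    bins = defaultdict(list)
    for conf, is_correct in zip(confidences, correct):
        # Bins are (lo, hi]; a confidence of exactly 0.0 goes to the first bin.
        idx = min(max(math.ceil(conf * n_bins) - 1, 0), n_bins - 1)
        bins[idx].append((conf, is_correct))
    ece = 0.0
    for items in bins.values():
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(y for _, y in items) / len(items)
        ece += (len(items) / n) * abs(accuracy - avg_conf)
    return ece


def per_group_ece(confidences, correct, groups, n_bins=10):
    """Compute ECE separately for each group label (hypothetical helper)."""
    by_group = defaultdict(lambda: ([], []))
    for conf, is_correct, group in zip(confidences, correct, groups):
        by_group[group][0].append(conf)
        by_group[group][1].append(is_correct)
    return {
        group: expected_calibration_error(confs, labels, n_bins)
        for group, (confs, labels) in by_group.items()
    }


# Toy data, invented purely for illustration.
toy_conf = [0.95, 0.95, 0.55, 0.55]
toy_correct = [1, 1, 0, 0]
toy_groups = ["male", "male", "female", "female"]
toy_result = per_group_ece(toy_conf, toy_correct, toy_groups)
```

Under these assumptions, a large gap between the per-group values in `toy_result` would suggest that the model's confidence is less trustworthy for one gender than for the other, which is the kind of disparity a fairness-aware calibration metric is meant to surface.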
Taxonomy
Topics: Ethics and Social Impacts of AI · Topic Modeling · Computational and Text Analysis Methods
