Do Prevalent Bias Metrics Capture Allocational Harms from LLMs?

Hannah Cyberey; Yangfeng Ji; David Evans

arXiv:2408.01285·cs.CL·March 9, 2026

Open Access

TL;DR

This paper critically examines whether current bias metrics effectively measure allocational harms from large language models, revealing their limitations and emphasizing the importance of considering decision-making processes in bias assessment.

Contribution

The paper evaluates the predictive validity of existing bias metrics, and their utility for model selection, with respect to allocational harms across ten LLMs and two allocation tasks, and finds them inadequate.

Findings

1. Common bias metrics fail to reliably capture disparities in allocation outcomes.

2. Metrics based on average performance gap and distribution distance are insufficient for assessing allocational harms.

3. The study underscores the importance of considering how predictions are turned into decisions when evaluating bias.

Abstract

Allocational harms occur when resources or opportunities are unfairly withheld from specific groups. Many proposed bias measures ignore the discrepancy between predictions, which are what the proposed methods consider, and decisions that are made as a result of those predictions. Our work examines the reliability of current bias metrics in assessing allocational harms arising from predictions of large language models (LLMs). We evaluate their predictive validity and utility for model selection across ten LLMs and two allocation tasks. Our results reveal that commonly used bias metrics based on average performance gap and distribution distance fail to reliably capture group disparities in allocation outcomes. Our work highlights the need to account for how model predictions are used in decisions, in particular in contexts where they are influenced by how limited resources are allocated.
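
The gap between predictions and decisions described in the abstract can be illustrated with a toy sketch. This is not the paper's evaluation code or its metric definitions: the function names, the 0-100 scores, the group labels, and the top-k selection rule are all assumptions made for illustration. It shows how two groups can have an identical average score (so an average-gap metric reports no bias) while a top-k allocation selects candidates from only one group.

```python
from typing import Sequence


def average_score_gap(scores: Sequence[float], groups: Sequence[str],
                      a: str, b: str) -> float:
    """Prediction-level metric: mean score of group a minus mean score of group b."""
    def mean_for(g: str) -> float:
        vals = [s for s, grp in zip(scores, groups) if grp == g]
        return sum(vals) / len(vals)
    return mean_for(a) - mean_for(b)


def selection_rate_gap(scores: Sequence[float], groups: Sequence[str],
                       a: str, b: str, k: int) -> float:
    """Decision-level metric: difference in the fraction of each group
    selected when only the k highest-scoring candidates receive the resource."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    selected = set(ranked[:k])

    def rate_for(g: str) -> float:
        members = [i for i, grp in enumerate(groups) if grp == g]
        return sum(i in selected for i in members) / len(members)
    return rate_for(a) - rate_for(b)


# Hypothetical model scores: both groups average exactly 50,
# but group A's scores are more spread out.
scores = [90, 80, 20, 10, 60, 50, 50, 40]
groups = ["A"] * 4 + ["B"] * 4

score_gap = average_score_gap(scores, groups, "A", "B")
alloc_gap = selection_rate_gap(scores, groups, "A", "B", k=2)
print(score_gap, alloc_gap)
```

With only two slots available, both go to group A, so the selection-rate gap is large even though the average score gap is zero. Changing `k` changes the allocation disparity without changing the scores, which is one way to see why the decision rule matters.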


Taxonomy

Topics: European and International Law Studies · Ethics and Social Impacts of AI · Adversarial Robustness in Machine Learning