TL;DR
This paper introduces a decision framework for selecting bias and fairness metrics tailored to specific LLM deployment contexts, emphasizing the importance of context-aware evaluation over generic benchmarks.
Contribution
It proposes a systematic approach to match use cases with relevant fairness metrics and releases an open-source library for practical implementation.
Findings
Fairness risks vary significantly across different prompt populations.
Benchmark performance alone is insufficient for reliable fairness assessment.
The framework effectively guides context-specific bias and fairness evaluation.
Abstract
Bias and fairness risks in Large Language Models (LLMs) vary substantially across deployment contexts, yet existing approaches lack systematic guidance for selecting appropriate evaluation metrics. We present a decision framework that maps LLM use cases, characterized by a model and population of prompts, to relevant bias and fairness metrics based on task type, whether prompts contain protected attribute mentions, and stakeholder priorities. Our framework addresses toxicity, stereotyping, counterfactual unfairness, and allocational harms, and introduces novel metrics based on stereotype classifiers and counterfactual adaptations of text similarity measures. We release an open-source Python library, langfair, for practical adoption. Extensive experiments on use cases across five LLMs and five prompt populations demonstrate that fairness risks cannot be reliably assessed from…
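To make the idea of mapping a use case to metric families concrete, here is a minimal sketch. It is an illustration only: the task labels, the branching rules, and the names `UseCase` and `select_metric_families` are assumptions made for this example, not the paper's actual decision rules and not part of the langfair API. Only the three inputs (task type, protected attribute mentions, stakeholder priorities) and the four risk families come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    """Hypothetical description of an LLM use case (not a langfair class)."""
    task_type: str  # assumed labels: "generation" or "classification"
    prompts_mention_protected_attributes: bool
    stakeholder_priorities: frozenset = frozenset()  # empty means "no filter"


def select_metric_families(use_case: UseCase) -> list:
    """Return the risk families to evaluate, under illustrative rules."""
    families = []

    # Assumed rule: classification outputs drive decisions about people,
    # so allocational harms are the relevant family; free-text outputs
    # are instead checked for toxicity and stereotyping.
    if use_case.task_type == "classification":
        families.append("allocational harms")
    else:
        families.extend(["toxicity", "stereotyping"])

    # Assumed rule: counterfactual comparisons only make sense when
    # prompts actually mention a protected attribute that can be swapped.
    if use_case.prompts_mention_protected_attributes:
        families.append("counterfactual unfairness")

    # Stakeholder priorities, when given, narrow the selection.
    if use_case.stakeholder_priorities:
        families = [f for f in families if f in use_case.stakeholder_priorities]

    return families


if __name__ == "__main__":
    chatbot = UseCase("generation", prompts_mention_protected_attributes=True)
    print(select_metric_families(chatbot))
```

The point of the sketch is that the same model can yield different metric selections once the prompt population or task changes, which is the context dependence the abstract argues for.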
