DeFrame: Debiasing Large Language Models Against Framing Effects
Kahee Lim, Soyeon Kim, Steven Euijong Whang

TL;DR
This paper studies how semantically equivalent but differently phrased prompts change the measured fairness of large language models. It documents framing disparities and proposes a debiasing method that makes fairness more consistent across framings.
Contribution
The paper introduces framing disparity as a measure of how much fairness scores shift with framing, shows that existing debiasing methods often fail to reduce it, and proposes a framing-aware debiasing technique to make fairness more consistent across framings.
Findings
Framing significantly impacts fairness scores.
Existing debiasing methods improve frame-averaged fairness but often fail to reduce framing disparities.
The proposed method improves fairness and robustness across framings.
Abstract
As large language models (LLMs) are increasingly deployed in real-world applications, ensuring their fair responses across demographics has become crucial. Despite many efforts, an ongoing challenge is hidden bias: LLMs appear fair under standard evaluations, but can produce biased responses outside those evaluation settings. In this paper, we identify framing -- differences in how semantically equivalent prompts are expressed (e.g., "A is better than B" vs. "B is worse than A") -- as an underexplored contributor to this gap. We first introduce the concept of "framing disparity" to quantify the impact of framing on fairness evaluation. By augmenting fairness evaluation benchmarks with alternative framings, we find that (1) fairness scores vary significantly with framing and (2) existing debiasing methods improve overall (i.e., frame-averaged) fairness, but often fail to reduce…
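To make the distinction between frame-averaged fairness and framing disparity concrete, here is a minimal Python sketch. It is not the paper's definition: the abstract excerpt above does not give a formula, so the sketch assumes framing disparity is the gap between the highest and lowest fairness score across framings. The function names, framing labels, and all score values are hypothetical placeholders, not results from the paper.

```python
from statistics import mean


def frame_averaged(scores_by_framing):
    """Mean fairness score over all framings of the same benchmark."""
    return mean(scores_by_framing.values())


def framing_disparity(scores_by_framing):
    """ASSUMED definition: max minus min fairness score across framings.

    The paper may define this quantity differently; this is only an
    illustration of 'how much the score moves when the framing changes'.
    """
    values = list(scores_by_framing.values())
    return max(values) - min(values)


# Made-up placeholder scores (higher = fairer), NOT results from the paper.
model_a = {"A is better than B": 0.90, "B is worse than A": 0.70}
model_b = {"A is better than B": 0.81, "B is worse than A": 0.79}

for name, scores in [("model_a", model_a), ("model_b", model_b)]:
    print(
        f"{name}: frame-averaged={frame_averaged(scores):.2f}, "
        f"disparity={framing_disparity(scores):.2f}"
    )
```

In this toy setup both models have the same frame-averaged score, yet the first one swings much more between framings. That is the kind of gap a frame-averaged metric alone would hide.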
Taxonomy
Topics: Ethics and Social Impacts of AI · Computational and Text Analysis Methods · Explainable Artificial Intelligence (XAI)
