Guiding LLM Decision-Making with Fairness Reward Models
Zara Hall, Melanie Subbiah, Thomas P Zollo, Kathleen McKeown, Richard Zemel

TL;DR
This paper introduces a Fairness Reward Model that guides large language models to make more equitable decisions in high-stakes scenarios by down-weighting biased reasoning, improving fairness without sacrificing accuracy.
Contribution
The paper presents a novel, transferable Fairness Reward Model trained on weakly supervised data to enhance fairness in LLM decision-making across various tasks and domains.
Findings
Improves fairness in high-stakes decisions like recidivism prediction and social media moderation.
Transfers across tasks, domains, and model types without additional fine-tuning.
Maintains or surpasses baseline accuracy while increasing fairness.
Abstract
Large language models are increasingly used to support high-stakes decisions, potentially influencing who is granted bail or receives a loan. Naive chain-of-thought sampling can improve average decision accuracy, but has also been shown to amplify unfair bias. To address this challenge and enable the trustworthy use of reasoning models in high-stakes decision-making, we propose a framework for training a generalizable Fairness Reward Model (FRM). Our model assigns a fairness score to LLM reasoning, enabling the system to down-weight biased trajectories and favor equitable ones when aggregating decisions across reasoning chains. We show that a single Fairness Reward Model, trained on weakly supervised, LLM-annotated examples of biased versus unbiased reasoning, transfers across tasks, domains, and model families without additional fine-tuning. Applied to real-world decision-making tasks…
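The aggregation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each sampled reasoning chain yields a final decision and a scalar fairness score, and that scores become vote weights through a softmax with a temperature. The function name `fairness_weighted_vote`, the `temperature` parameter, and the example scores are all hypothetical.

```python
import math
from collections import defaultdict


def fairness_weighted_vote(decisions, fairness_scores, temperature=1.0):
    """Aggregate decisions from several reasoning chains.

    Each chain's vote is weighted by a softmax over its fairness score, so
    chains scored as more biased (lower score) contribute less.
    The softmax weighting is an illustrative assumption, not a confirmed
    detail of the paper's method.
    """
    if not decisions or len(decisions) != len(fairness_scores):
        raise ValueError("need one fairness score per decision")
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Subtract the max score before exponentiating for numerical stability.
    max_score = max(fairness_scores)
    raw = [math.exp((s - max_score) / temperature) for s in fairness_scores]
    total = sum(raw)

    # Sum the normalized weights per distinct decision.
    weight_per_decision = defaultdict(float)
    for decision, w in zip(decisions, raw):
        weight_per_decision[decision] += w / total

    winner = max(weight_per_decision, key=weight_per_decision.get)
    return winner, dict(weight_per_decision)


# Hypothetical example: two chains reach "deny" with low fairness scores,
# one chain reaches "grant" with a high fairness score.
decisions = ["deny", "deny", "grant"]
scores = [0.1, 0.2, 0.9]

sharp_winner, sharp_weights = fairness_weighted_vote(decisions, scores, temperature=0.1)
flat_winner, flat_weights = fairness_weighted_vote(decisions, scores, temperature=100.0)
```

With a low temperature the single high-scoring chain dominates the vote. With a very high temperature the weights become nearly uniform and the result approaches a plain majority vote. The temperature therefore controls how strongly the fairness score overrides the raw vote count.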
Taxonomy
Topics: Law, Economics, and Judicial Systems · Digitalization, Law, and Regulation · Artificial Intelligence in Law
