Guiding LLM Decision-Making with Fairness Reward Models

Zara Hall; Melanie Subbiah; Thomas P Zollo; Kathleen McKeown; Richard Zemel

arXiv:2507.11344 · cs.LG · July 16, 2025

TL;DR

This paper introduces a Fairness Reward Model that guides large language models to make more equitable decisions in high-stakes scenarios by down-weighting biased reasoning, improving fairness without sacrificing accuracy.

Contribution

The paper presents a novel, transferable Fairness Reward Model trained on weakly supervised data to enhance fairness in LLM decision-making across various tasks and domains.

Findings

01. Improves fairness in high-stakes decisions like recidivism prediction and social media moderation.

02. Transfers across tasks, domains, and model types without additional fine-tuning.

03. Maintains or surpasses baseline accuracy while increasing fairness.

Abstract

Large language models are increasingly used to support high-stakes decisions, potentially influencing who is granted bail or receives a loan. Naive chain-of-thought sampling can improve average decision accuracy, but has also been shown to amplify unfair bias. To address this challenge and enable the trustworthy use of reasoning models in high-stakes decision-making, we propose a framework for training a generalizable Fairness Reward Model (FRM). Our model assigns a fairness score to LLM reasoning, enabling the system to down-weight biased trajectories and favor equitable ones when aggregating decisions across reasoning chains. We show that a single Fairness Reward Model, trained on weakly supervised, LLM-annotated examples of biased versus unbiased reasoning, transfers across tasks, domains, and model families without additional fine-tuning. Applied to real-world decision-making tasks…
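
To make the aggregation idea in the abstract concrete, here is a minimal sketch of fairness-weighted voting over sampled reasoning chains. It is an illustration under stated assumptions, not the authors' implementation: the excerpt above does not specify the aggregation rule, so the softmax weighting, the `temperature` parameter, and the function name `fairness_weighted_vote` are all hypothetical. The hard-coded scores stand in for the outputs of a Fairness Reward Model.

```python
import math
from collections import defaultdict


def fairness_weighted_vote(decisions, fairness_scores, temperature=1.0):
    """Aggregate per-chain decisions, weighting each chain by its fairness score.

    Assumption: weights are a softmax over fairness scores. The paper's
    actual weighting scheme may differ.
    """
    if not decisions or len(decisions) != len(fairness_scores):
        raise ValueError("need one fairness score per decision")
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Subtract the max score before exponentiating for numerical stability.
    top = max(fairness_scores)
    weights = [math.exp((s - top) / temperature) for s in fairness_scores]
    norm = sum(weights)

    # Sum the normalized weight behind each distinct decision.
    totals = defaultdict(float)
    for decision, weight in zip(decisions, weights):
        totals[decision] += weight / norm

    winner = max(totals, key=totals.get)
    return winner, dict(totals)


# Three sampled chains: two reach "deny" via reasoning scored as biased
# (low fairness score), one reaches "grant" via reasoning scored as fair.
# These scores are made up for illustration.
decisions = ["deny", "deny", "grant"]
scores = [0.1, 0.2, 0.95]

winner, totals = fairness_weighted_vote(decisions, scores, temperature=0.2)
print(winner)  # "grant": the low-scored chains are down-weighted
```

With a low temperature the high-scoring chain dominates the vote. As the temperature grows, the weights flatten and the result approaches plain majority voting, which in this toy example would return "deny".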

Code & Models

Repositories

zarahall/fairness-prms (Official)

Videos

Guiding LLM Decision-Making with Fairness Reward Models · SlidesLive

Taxonomy

Topics: Law, Economics, and Judicial Systems · Digitalization, Law, and Regulation · Artificial Intelligence in Law