Rejected Dialects: Biases Against African American Language in Reward Models
Joel Mire, Zubin Trivadi Aysola, Daniel Chechelnitsky, Nicholas Deas, Chrysoula Zerva, Maarten Sap

TL;DR
This paper investigates biases against African American Language (AAL) in the reward models used to align large language models. It finds that these models significantly disprefer AAL, which raises ethical concerns about fairness and representation.
Contribution
The authors introduce a framework for evaluating dialect biases in reward models and present a case study on African American Language. The case study shows that reward models systematically disprefer AAL texts and steer conversations away from AAL.
Findings
Reward models are less aligned with human preferences on AAL texts than on White Mainstream English (WME) texts (4% lower accuracy on average).
Reward models frequently disprefer AAL-aligned texts relative to WME-aligned ones.
Reward models steer conversations toward WME, even when prompted with AAL texts.
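The first two findings rest on pairwise comparisons of reward scores. The sketch below is a minimal illustration of those two metrics. It assumes that each human preference pair is available in both a WME and an AAL rendering, and that a reward model has already assigned a scalar score to every response. The `PairedExample` structure, the function names, and the toy scores are all hypothetical. They are not the authors' code or data.

```python
from dataclasses import dataclass


@dataclass
class PairedExample:
    """One human preference pair rendered in both dialects (hypothetical structure).

    Each field holds a reward model's scalar score for one response.
    """
    wme_chosen: float    # human-preferred response, WME version
    wme_rejected: float  # human-dispreferred response, WME version
    aal_chosen: float    # same preferred response, AAL version
    aal_rejected: float  # same dispreferred response, AAL version


def pairwise_accuracy(pairs: list[PairedExample], dialect: str) -> float:
    """Fraction of pairs where the reward model ranks the human-chosen response higher."""
    hits = [
        getattr(p, f"{dialect}_chosen") > getattr(p, f"{dialect}_rejected")
        for p in pairs
    ]
    return sum(hits) / len(hits)


def dialect_accuracy_gap(pairs: list[PairedExample]) -> float:
    """AAL accuracy minus WME accuracy; a negative value means worse alignment on AAL."""
    return pairwise_accuracy(pairs, "aal") - pairwise_accuracy(pairs, "wme")


def wme_preference_rate(pairs: list[PairedExample]) -> float:
    """Fraction of responses whose WME version scores strictly higher than its AAL version.

    Values above 0.5 suggest the reward model tends to disprefer AAL renderings.
    """
    wins = []
    for p in pairs:
        wins.append(p.wme_chosen > p.aal_chosen)
        wins.append(p.wme_rejected > p.aal_rejected)
    return sum(wins) / len(wins)


# Made-up scores for illustration only; these are not results from the paper.
toy_pairs = [
    PairedExample(2.1, 0.4, 1.2, 0.9),
    PairedExample(1.5, 1.8, 0.7, 1.1),
    PairedExample(0.9, -0.3, 0.1, 0.3),
    PairedExample(1.7, 0.2, 1.4, -0.1),
]

if __name__ == "__main__":
    print(pairwise_accuracy(toy_pairs, "wme"))  # 0.75
    print(pairwise_accuracy(toy_pairs, "aal"))  # 0.5
    print(dialect_accuracy_gap(toy_pairs))      # -0.25
    print(wme_preference_rate(toy_pairs))       # 0.75
```

Keeping accuracy and the WME preference rate as separate metrics mirrors the separation between the first two findings. A reward model could rank pairs equally well in both dialects and still assign systematically lower scores to AAL renderings.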
Abstract
Preference alignment via reward models helps build safe, helpful, and reliable large language models (LLMs). However, subjectivity in preference judgments and the lack of representative sampling in preference data collection can introduce new biases, hindering reward models' fairness and equity. In this work, we introduce a framework for evaluating dialect biases in reward models and conduct a case study on biases against African American Language (AAL) through several experiments comparing reward model preferences and behavior on paired White Mainstream English (WME) and both machine-translated and human-written AAL corpora. We show that reward models are less aligned with human preferences when processing AAL texts vs. WME ones (-4% accuracy on average), frequently disprefer AAL-aligned texts vs. WME-aligned ones, and steer conversations toward WME, even when prompted with AAL texts.…
Taxonomy
Topics: Language, Linguistics, Cultural Analysis
