AI Alignment Amplifies the Role of Race, Gender, and Disability in Hiring Decisions
Ze Wang, Guobin Shen, Michael Thaler

TL;DR
This study investigates how language model alignment shapes demographic biases in hiring decisions, finding that alignment amplifies advantages for female and Black candidates and disadvantages for disabled candidates.
Contribution
It provides large-scale empirical evidence that alignment amplifies existing demographic biases in language models' hiring decisions.
Findings
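Relative to matched pre-trained models, alignment amplifies advantages for female and Black candidates by 325% and 330%, and disadvantages for disabled candidates by 171%.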
Language models reverse racial discrimination patterns after alignment.
Alignment increases returns to skills for marginalized groups.
Abstract
Humans increasingly delegate decisions to language models, yet whether these systems reproduce or reshape human patterns of discrimination remains unclear. Here we run a large-scale study to analyse whether language models use demographic information in hiring decisions. We show, across 27 models and 177 occupations, that language models give female and Black candidates hiring advantages relative to otherwise-comparable male and white candidates, while giving disabled candidates disadvantages. The differences are meaningful in magnitude: the role of race, gender, and disability status is comparable to six months to one year of additional education. Post-training alignment is the primary driver: relative to matched pre-trained models, alignment amplifies advantages for female and Black candidates by 325% and 330%, and disadvantages for disabled candidates by 171%. Compared with previous…
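To make the amplification percentages concrete, here is a minimal arithmetic sketch. It assumes that "amplifies by X%" means a relative increase of X% over the effect measured in the matched pre-trained model; the abstract does not spell out the exact definition, so this reading is an assumption. The function name `amplified_effect` and the baseline effect of 1.0 are hypothetical and chosen only for illustration. They are not values from the paper.

```python
def amplified_effect(base_effect: float, amplification_pct: float) -> float:
    """Return the post-alignment effect size.

    Assumption: "amplified by X%" is read as a relative increase of X%
    over the pre-trained effect, so the sign of the effect is preserved.
    """
    return base_effect * (1 + amplification_pct / 100)


if __name__ == "__main__":
    # Hypothetical pre-trained effect of 1.0 (arbitrary units, not from the paper).
    base = 1.0
    # Amplification percentages as reported in the abstract.
    reported = [("female", 325), ("Black", 330), ("disabled", 171)]
    for group, pct in reported:
        print(f"{group}: {amplified_effect(base, pct):.2f}x the pre-trained effect")
```

Under this reading, a 325% amplification corresponds to an aligned-model effect 4.25 times the size of the pre-trained effect, and a negative baseline (a disadvantage) stays negative and grows in magnitude.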
