Are Emily and Greg Still More Employable than Lakisha and Jamal? Investigating Algorithmic Hiring Bias in the Era of ChatGPT
Akshaj Kumar Veldanda, Fabian Grob, Shailja Thakur, Hammond Pearce, Benjamin Tan, Ramesh Karri, Siddharth Garg

TL;DR
This study investigates bias in large language models used for algorithmic hiring by replicating a classic field experiment across protected attributes such as race, gender, pregnancy status, and political affiliation. The models are robust with respect to race and gender but show bias on pregnancy status and political affiliation.
Contribution
It replicates the hiring bias field experiment of Bertrand & Mullainathan (2003) on modern LLMs, providing new insights into their fairness and into the sources of bias in recruitment tasks.
Findings
LLMs are robust across race and gender attributes.
Bias is observed on pregnancy status and political affiliation.
Contrastive input decoding helps identify bias sources.
Abstract
Large Language Models (LLMs) such as GPT-3.5, Bard, and Claude exhibit applicability across numerous tasks. One domain of interest is their use in algorithmic hiring, specifically in matching resumes with job categories. Yet, this introduces issues of bias on protected attributes like gender, race and maternity status. The seminal work of Bertrand & Mullainathan (2003) set the gold standard for identifying hiring bias via field experiments where the response rate for identical resumes that differ only in protected attributes, e.g., racially suggestive names such as Emily or Lakisha, is compared. We replicate this experiment on state-of-the-art LLMs (GPT-3.5, Bard, Claude and Llama) to evaluate bias (or lack thereof) on gender, race, maternity status, pregnancy status, and political affiliation. We evaluate LLMs on two tasks: (1) matching resumes to job categories; and (2) summarizing…
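The comparison described in the abstract (identical resumes that differ only in a protected attribute, with positive-response rates compared across groups) can be sketched as follows. This is a minimal illustration, not the authors' code: the template, the group labels, the helper names (make_variants, positive_rate, rate_gap), and the keyword_classifier stand-in for an LLM call are all assumptions made for the example.

```python
from typing import Callable, Dict, List


def make_variants(template: str, names_by_group: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Fill one resume template with each name, so only the name differs."""
    return {
        group: [template.format(name=name) for name in names]
        for group, names in names_by_group.items()
    }


def positive_rate(resumes: List[str], classify: Callable[[str], bool]) -> float:
    """Fraction of resumes that receive a positive decision."""
    if not resumes:
        raise ValueError("need at least one resume")
    return sum(1 for resume in resumes if classify(resume)) / len(resumes)


def rate_gap(
    variants: Dict[str, List[str]],
    classify: Callable[[str], bool],
    group_a: str,
    group_b: str,
) -> float:
    """Difference in positive rates between two groups (0.0 means no gap)."""
    return positive_rate(variants[group_a], classify) - positive_rate(variants[group_b], classify)


def keyword_classifier(resume: str) -> bool:
    """Hypothetical stand-in for an LLM decision; it ignores the name entirely."""
    return "teacher" in resume.lower()


# Illustrative template and name groups (assumptions for this sketch).
TEMPLATE = "Name: {name}\nExperience: 5 years as a high school teacher."
NAMES = {"group_a": ["Emily", "Greg"], "group_b": ["Lakisha", "Jamal"]}

variants = make_variants(TEMPLATE, NAMES)
gap = rate_gap(variants, keyword_classifier, "group_a", "group_b")
print(f"positive-rate gap: {gap:.2f}")
```

In a real replication, keyword_classifier would be replaced by a call to the model under test, and the gap would be assessed with a significance test over many resumes rather than read off four examples.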
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Cosine Annealing · Layer Normalization · Dropout · Weight Decay · Softmax · Byte Pair Encoding
