JobFair: A Framework for Benchmarking Gender Hiring Bias in Large Language Models

Ze Wang; Zekun Wu; Xin Guan; Michael Thaler; Adriano Koshiyama; Skylar Lu; Sachin Beepath; Ediz Ertekin Jr.; Maria Perez-Ortiz

arXiv:2406.15484 · cs.CL · January 20, 2025 · 1 citation

TL;DR

This paper introduces a comprehensive framework for benchmarking gender hiring bias in Large Language Models, revealing significant biases and overdebiasing issues, with detailed metrics and analysis across multiple models and industries.

Contribution

The paper presents a novel bias construct grounded in labour economics and legal principles, along with rigorous metrics and an analysis of bias in ten LLMs, including industry-specific insights.

Findings

1. Seven out of ten LLMs show gender bias against males in at least one industry.
2. The healthcare industry exhibits the most bias against males.
3. Bias remains consistent across different resume qualities.

Abstract

The use of Large Language Models (LLMs) in hiring has led to legislative actions to protect vulnerable demographic groups. This paper presents a novel framework for benchmarking hierarchical gender hiring bias in LLMs for resume scoring, revealing significant issues of reverse gender hiring bias and overdebiasing. Our contributions are fourfold. First, we introduce a new construct grounded in labour economics, legal principles, and critiques of current bias benchmarks. Hiring bias can be categorized into two types: Level bias (difference in the average outcomes between demographic counterfactual groups) and Spread bias (difference in the variance of outcomes between demographic counterfactual groups). Level bias can be further subdivided into statistical bias (i.e. changing with non-demographic content) and taste-based bias (i.e. consistent regardless of…
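
The two top-level bias types described in the abstract can be illustrated with a minimal Python sketch. It assumes that each resume is scored twice, once per gender variant, and that the gap in means and the gap in sample variances are reasonable stand-ins for Level bias and Spread bias. The function names and the scores are hypothetical, and the paper's own metrics and statistical tests may differ from this simplification.

```python
from statistics import mean, variance


def level_bias(scores_a, scores_b):
    """Difference in average outcome between two counterfactual groups."""
    return mean(scores_a) - mean(scores_b)


def spread_bias(scores_a, scores_b):
    """Difference in sample variance of outcomes between the two groups."""
    return variance(scores_a) - variance(scores_b)


# Hypothetical scores for the same four resumes; only the stated gender differs.
female_scores = [7.0, 8.0, 6.0, 9.0]
male_scores = [6.0, 8.0, 5.0, 9.0]

# A positive level gap means the first group is scored higher on average.
print("level gap:", level_bias(female_scores, male_scores))
# A negative spread gap means the first group's scores are less dispersed.
print("spread gap:", spread_bias(female_scores, male_scores))
```

In this toy data the female variants score higher on average and with less dispersion. A real benchmark would need many resumes and a significance test before drawing any conclusion.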

Taxonomy

Topics: Gender Studies in Language · Hate Speech and Cyberbullying Detection · Text Readability and Simplification

Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Residual Connection · Multi-Head Attention · Weight Decay · Softmax · Layer Normalization