Measuring Validity in LLM-based Resume Screening
Jane Castleman, Zeyu Shen, Blossom Metevier, Max Springer, Aleksandra Korolova

TL;DR
This paper introduces a framework for evaluating the validity of LLM-based resume screening. It constructs a large dataset of directly comparable resumes with known ground truth, then uses that dataset to reveal inconsistencies and biases in model decisions.
Contribution
The paper presents a novel dataset and methodology for systematically measuring the validity of LLM resume screening, addressing the lack of ground truth in existing evaluations.
Findings
Many LLMs struggle to consistently identify more qualified candidates.
Models often do not abstain reliably when candidates are equally qualified.
Biases exist in demographic group selection rates, sometimes favoring marginalized groups.
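The three findings above correspond to three quantities that could be computed from pairwise screening decisions. The following is a minimal sketch, not the authors' implementation: the `PairDecision` record, its field names, the label strings ("A", "B", "equal", "abstain"), and the `validity_metrics` function are all hypothetical names chosen for illustration. It assumes each comparison pairs two resumes with a known ground truth and that the model may pick one or abstain.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PairDecision:
    """One pairwise comparison (hypothetical record layout)."""
    truth: str    # ground truth: "A", "B", or "equal"
    choice: str   # model output: "A", "B", or "abstain"
    group_a: str  # demographic group signalled by resume A
    group_b: str  # demographic group signalled by resume B


def validity_metrics(decisions):
    """Summarise accuracy, abstention, and per-group selection rates."""
    unequal = [d for d in decisions if d.truth != "equal"]
    equal = [d for d in decisions if d.truth == "equal"]

    # Finding 1: how often the more qualified candidate is chosen.
    correct = sum(d.choice == d.truth for d in unequal)

    # Finding 2: how often the model abstains on equally qualified pairs.
    abstained = sum(d.choice == "abstain" for d in equal)

    # Finding 3: per-group selection rate on equally qualified pairs,
    # where any systematic preference cannot be explained by qualifications.
    appearances = Counter()
    picks = Counter()
    for d in equal:
        appearances[d.group_a] += 1
        appearances[d.group_b] += 1
        if d.choice == "A":
            picks[d.group_a] += 1
        elif d.choice == "B":
            picks[d.group_b] += 1

    return {
        "accuracy_unequal": correct / len(unequal) if unequal else None,
        "abstention_equal": abstained / len(equal) if equal else None,
        "selection_rate": {g: picks[g] / n for g, n in appearances.items()},
    }


# Toy, made-up decisions purely to exercise the function.
example = [
    PairDecision("A", "A", "g1", "g2"),
    PairDecision("B", "A", "g1", "g2"),
    PairDecision("equal", "abstain", "g1", "g2"),
    PairDecision("equal", "A", "g1", "g2"),
    PairDecision("equal", "B", "g1", "g2"),
    PairDecision("equal", "A", "g2", "g1"),
]
metrics = validity_metrics(example)
```

Restricting the selection-rate computation to equally qualified pairs is a design choice of this sketch: it separates demographic preference from differences in qualification. The paper may define its metrics differently.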
Abstract
Resume screening is perceived as a particularly suitable task for LLMs given their ability to analyze natural language; thus many entities rely on general-purpose LLMs without further adapting them to the task. While researchers have shown that some LLMs are biased in their selection rates of different demographics, studies measuring the validity of LLM decisions are limited. One of the difficulties in externally measuring validity stems from the lack of access to a large corpus of resumes for which the ground-truth ranking is known and that has not already been used for LLM training. In this work, we overcome this challenge by systematically constructing a large dataset of resumes tailored to particular jobs that are directly comparable, with a known ground truth of superiority. We then use the constructed dataset to measure the validity of ranking decisions made by various LLMs,…
Taxonomy
Topics: Expert finding and Q&A systems · Topic Modeling · Authorship Attribution and Profiling
