Evaluating the Promise and Pitfalls of LLMs in Hiring Decisions

Eitan Anzenberg; Arunava Samajpati; Sivasankaran Chandrasekar; Varun Kacholia

arXiv:2507.02087·cs.LG·July 29, 2025

TL;DR

This paper benchmarks general-purpose large language models against a proprietary domain-specific hiring model (Match Score) and finds that the specialized model outperforms the LLMs in both accuracy and fairness, underscoring the importance of domain-specific design and bias mitigation in AI hiring tools.

Contribution

The study provides a comprehensive comparison of state-of-the-art LLMs with a proprietary hiring model, demonstrating the advantages of domain-specific models in accuracy and fairness in hiring decisions.

Findings

01

Match Score outperforms LLMs in accuracy (ROC AUC 0.85 vs 0.77)

02

Match Score achieves near-parity in demographic impact ratios

03

Domain-specific models better mitigate societal biases in hiring scenarios
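
To make the accuracy figure in the findings above concrete: ROC AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it that way on toy data. The function name `roc_auc`, the label encoding (1 = good match, 0 = not), and the sample scores are illustrative assumptions, not taken from the paper, and this is not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """Pairwise (rank-based) ROC AUC.

    labels: iterable of 0/1 ground-truth outcomes (assumed encoding).
    scores: iterable of model scores, higher meaning a stronger match.
    Returns the fraction of (positive, negative) pairs in which the
    positive example is scored higher; ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data, invented for illustration only.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly
```

The quadratic pairwise loop is chosen for readability; on a dataset of roughly 10,000 pairs a sort-based implementation would be the more practical choice.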

Abstract

The use of large language models (LLMs) in hiring promises to streamline candidate screening, but it also raises serious concerns about accuracy and algorithmic bias when sufficient safeguards are not in place. In this work, we benchmark several state-of-the-art foundational LLMs, including models from OpenAI, Anthropic, Google, Meta, and DeepSeek, and compare them with our proprietary domain-specific hiring model (Match Score) for job candidate matching. We evaluate each model's predictive accuracy (ROC AUC, Precision-Recall AUC, F1-score) and fairness (impact ratio of cut-off analysis across declared gender, race, and intersectional subgroups). Our experiments on a dataset of roughly 10,000 real-world recent candidate-job pairs show that Match Score outperforms the general-purpose LLMs on accuracy (ROC AUC 0.85 vs 0.77) and achieves significantly more equitable outcomes across…
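
The fairness metric named in the abstract, the impact ratio at a score cut-off, can be sketched as follows. The abstract does not spell out the formula, so this assumes a common convention: each group's selection rate (the share of its candidates scoring at or above the cut-off) divided by the highest group's selection rate. The function name `impact_ratios`, the group labels, and the toy scores are hypothetical.

```python
from collections import defaultdict


def impact_ratios(groups, scores, cutoff):
    """Impact ratio per group at a given score cut-off.

    Assumed definition (not confirmed by the source): selection rate of
    each group divided by the selection rate of the most-selected group.
    A value of 1.0 means parity with the most-selected group.
    """
    totals = defaultdict(int)
    selected = defaultdict(int)
    for group, score in zip(groups, scores):
        totals[group] += 1
        if score >= cutoff:
            selected[group] += 1

    rates = {g: selected[g] / totals[g] for g in totals}
    best = max(rates.values())
    if best == 0:
        raise ValueError("no candidate meets the cut-off; ratios are undefined")
    return {g: rate / best for g, rate in rates.items()}


# Toy data, invented for illustration only.
toy_groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
toy_scores = [0.9, 0.8, 0.7, 0.2, 0.9, 0.6, 0.3, 0.1]
toy_ratios = impact_ratios(toy_groups, toy_scores, cutoff=0.5)
# Group A selects 3 of 4, group B selects 2 of 4, so B's ratio is 0.5 / 0.75.
```

Intersectional subgroups can be handled with the same function by passing a combined label such as a (gender, race) tuple as the group key.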
