Evaluating the Promise and Pitfalls of LLMs in Hiring Decisions
Eitan Anzenberg, Arunava Samajpati, Sivasankaran Chandrasekar, Varun Kacholia

TL;DR
This paper benchmarks general-purpose large language models against a proprietary domain-specific hiring model (Match Score) and finds that the specialized model is both more accurate and fairer, which underscores the importance of domain-specific design and bias mitigation in AI hiring tools.
Contribution
The study provides a comprehensive comparison of state-of-the-art LLMs with a proprietary hiring model, demonstrating the advantages of domain-specific models in accuracy and fairness in hiring decisions.
Findings
Match Score outperforms LLMs in accuracy (ROC AUC 0.85 vs 0.77)
Match Score achieves near-parity in demographic impact ratios
Domain-specific models better mitigate societal biases in hiring scenarios
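For readers unfamiliar with the accuracy metric quoted above, ROC AUC can be read as the probability that a randomly chosen positive example (here, presumably a good candidate-job match) receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that pairwise definition. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not the paper's code or data.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison (illustrative helper, not from the paper).

    Returns the fraction of (positive, negative) pairs in which the positive
    example has the higher score; tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up toy data: 1 = match, 0 = non-match.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)  # 3 of 4 pairs are ranked correctly
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; a library routine such as scikit-learn's `roc_auc_score` would be the usual choice on a dataset of roughly 10,000 pairs.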
Abstract
The use of large language models (LLMs) in hiring promises to streamline candidate screening, but it also raises serious concerns regarding accuracy and algorithmic bias where sufficient safeguards are not in place. In this work, we benchmark several state-of-the-art foundational LLMs, including models from OpenAI, Anthropic, Google, Meta, and DeepSeek, and compare them with our proprietary domain-specific hiring model (Match Score) for job candidate matching. We evaluate each model's predictive accuracy (ROC AUC, Precision-Recall AUC, F1-score) and fairness (impact ratio of cut-off analysis across declared gender, race, and intersectional subgroups). Our experiments on a dataset of roughly 10,000 real-world recent candidate-job pairs show that Match Score outperforms the general-purpose LLMs on accuracy (ROC AUC 0.85 vs 0.77) and achieves significantly more equitable outcomes across…
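The fairness measure named in the abstract, an impact ratio computed at a score cut-off, can be sketched as follows. This excerpt does not give the paper's exact formula, so the sketch assumes the common definition: each group's selection rate (the share of its candidates scoring at or above the cut-off) divided by the selection rate of the most-selected group, so that 1.0 means parity. The function name `impact_ratios`, the group labels, and the scores are all hypothetical.

```python
from collections import defaultdict


def impact_ratios(scores, groups, cutoff):
    """Impact ratio per group at a score cut-off (assumed definition).

    selection rate = candidates with score >= cutoff / candidates in group
    impact ratio   = group's selection rate / highest group selection rate
    """
    selected = defaultdict(int)
    total = defaultdict(int)
    for score, group in zip(scores, groups):
        total[group] += 1
        if score >= cutoff:
            selected[group] += 1

    rates = {g: selected[g] / total[g] for g in total}
    best = max(rates.values())
    if best == 0:
        raise ValueError("no candidate meets the cutoff")
    return {g: rate / best for g, rate in rates.items()}


# Made-up toy data with two hypothetical groups, "A" and "B".
toy_scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1]
toy_groups = ["A", "A", "A", "B", "B", "B"]
toy_ratios = impact_ratios(toy_scores, toy_groups, cutoff=0.5)
# Group A selects 2 of 3, group B selects 1 of 3, so B's ratio is half of A's.
```

Intersectional subgroups fit the same function if each group label is a tuple such as (gender, race) rather than a single attribute.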
