Signal or Noise? Evaluating Large Language Models in Resume Screening Across Contextual Variations and Human Expert Benchmarks

Aryan Varshney; Venkat Ram Reddy Ganuthula

arXiv:2507.08019·cs.CL·July 14, 2025


TL;DR

This study tests whether three large language models (Claude, GPT, and Gemini) screen resumes consistently across company contexts, and compares their evaluations with those of three human recruitment experts. The models differ significantly from the human experts and vary in how strongly they adapt to company context.

Contribution

The paper analyzes how LLM resume evaluations vary across contexts and how far they diverge from human expert judgment in automated resume screening.

Findings

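01 LLM evaluations differ significantly in some settings: analysis of variance found significant mean differences in four of the eight LLM-only conditions.

02 GPT adapts strongly to company context (p < 0.001), Gemini partially (p = 0.038 for Firm1), and Claude minimally (p > 0.1).

03 All three LLMs differ significantly from human experts across contexts (p < 0.01).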

Abstract

This study investigates whether large language models (LLMs) exhibit consistent behavior (signal) or random variation (noise) when screening resumes against job descriptions, and how their performance compares to human experts. Using controlled datasets, we tested three LLMs (Claude, GPT, and Gemini) across contexts (No Company, Firm1 [MNC], Firm2 [Startup], Reduced Context) with identical and randomized resumes, benchmarked against three human recruitment experts. Analysis of variance revealed significant mean differences in four of eight LLM-only conditions and consistently significant differences between LLM and human evaluations (p < 0.01). Paired t-tests showed GPT adapts strongly to company context (p < 0.001), Gemini partially (p = 0.038 for Firm1), and Claude minimally (p > 0.1), while all LLMs differed significantly from human experts across contexts. Meta-cognition analysis…
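The paired t-tests mentioned in the abstract compare the same resumes scored under two conditions. The following is a minimal sketch of that statistic using only the Python standard library, not the authors' code. The function name `paired_t_statistic` and the score lists are hypothetical and made up for illustration; they are not data from the paper.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(baseline, treated):
    """Return (t, degrees_of_freedom) for a paired t-test.

    Each position in the two lists must refer to the same resume,
    scored once per condition.
    """
    if len(baseline) != len(treated):
        raise ValueError("paired samples must have the same length")
    # Per-resume difference between the two conditions.
    diffs = [b - a for a, b in zip(baseline, treated)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance")
    # t = mean difference divided by its standard error.
    t = mean(diffs) / (sd / sqrt(n))
    return t, n - 1


# Hypothetical scores for five resumes (illustrative only, not from the paper).
no_company = [70, 65, 80, 75, 60]  # scored without company context
firm1 = [74, 66, 85, 77, 66]       # same resumes scored with Firm1 context

t, df = paired_t_statistic(no_company, firm1)
print(f"t = {t:.3f}, df = {df}")
```

Turning `t` into a p-value requires the Student t distribution with `df` degrees of freedom, which a statistics library would normally supply.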
