"You Are Rejected!": An Empirical Study of Large Language Models Taking Hiring Evaluations
Dingjie Fu, Dianxing Shi

TL;DR
This study assesses whether large language models can pass professional hiring evaluations, revealing that current models significantly underperform and fail to meet the standards required for engineering candidate assessments.
Contribution
It provides the first comprehensive empirical evaluation of LLMs on real-world hiring tests, highlighting their current limitations in professional assessment scenarios.
Findings
All evaluated LLMs fail to pass the hiring evaluations.
LLM responses are significantly inconsistent with the reference solutions.
The results demonstrate a gap between LLM capabilities and professional standards.
Abstract
With the proliferation of the internet and the rapid advancement of Artificial Intelligence, leading technology companies face an urgent annual demand for a considerable number of software and algorithm engineers. To efficiently and effectively identify high-potential candidates from thousands of applicants, these firms have established a multi-stage selection process, which crucially includes a standardized hiring evaluation designed to assess job-specific competencies. Motivated by the demonstrated prowess of Large Language Models (LLMs) in coding and reasoning tasks, this paper investigates a critical question: Can LLMs successfully pass these hiring evaluations? To this end, we conduct a comprehensive examination of a widely used professional assessment questionnaire. We employ state-of-the-art LLMs to generate responses and subsequently evaluate their performance. Contrary to any…
Taxonomy
Topics: Ethics and Social Impacts of AI · Mobile Crowdsensing and Crowdsourcing · Expert finding and Q&A systems
