Whose Name Comes Up? Benchmarking and Intervention-Based Auditing of LLM-Based Scholar Recommendation
Lisette Espin-Noboa, Gonzalo Gabriel Mendez

TL;DR
This paper introduces LLMScholarBench, a benchmark for auditing LLM-based scholar recommendation. It jointly evaluates model infrastructure and end-user interventions across multiple tasks, using nine metrics that cover technical quality and social representation.
Contribution
It presents a novel benchmark that jointly assesses model infrastructure and end-user interventions, revealing how interventions affect trade-offs in scholar recommendation tasks.
Findings
End-user interventions redistribute errors rather than fix them.
Higher temperature reduces model validity and factuality.
RAG improves technical quality but decreases diversity and parity.
Abstract
Large language models (LLMs) are increasingly used for academic expert recommendation. Existing audits typically evaluate model outputs in isolation, largely ignoring end-user inference-time interventions. As a result, it remains unclear whether failures such as refusals, hallucinations, and uneven coverage stem from model choice or deployment decisions. We introduce LLMScholarBench, a benchmark for auditing LLM-based scholar recommendation that jointly evaluates model infrastructure and end-user interventions across multiple tasks. LLMScholarBench measures both technical quality and social representation using nine metrics. We instantiate the benchmark in physics expert recommendation and audit 22 LLMs under temperature variation, representation-constrained prompting, and retrieval-augmented generation (RAG) via web search. Our results show that end-user interventions do not yield…
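The audit design described in the abstract (several models, each run under several end-user interventions) can be pictured as a grid of configurations. The following is a minimal sketch of such a grid, not the authors' implementation: the model names, the setting values, and the `AuditConfig` and `build_audit_grid` names are hypothetical placeholders chosen for illustration. Only the three intervention types (temperature variation, representation-constrained prompting, RAG via web search) come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditConfig:
    """One cell of the audit grid: a model run under one intervention setting."""
    model: str
    intervention: str
    setting: str


# Hypothetical placeholders: the paper audits 22 LLMs, which are not named here.
MODELS = ["model-a", "model-b"]

# Intervention types are taken from the abstract; the setting values are assumed.
INTERVENTIONS = {
    "temperature": ["0.0", "1.0"],
    "constrained_prompt": ["baseline", "representation-constrained"],
    "rag_web_search": ["off", "on"],
}


def build_audit_grid(models, interventions):
    """Enumerate every (model, intervention, setting) combination to audit."""
    grid = []
    for model in models:
        for name, settings in interventions.items():
            for setting in settings:
                grid.append(AuditConfig(model, name, setting))
    return grid


grid = build_audit_grid(MODELS, INTERVENTIONS)
for config in grid[:3]:
    print(config)
```

Each configuration would then be scored on the benchmark's metrics, so that differences can be attributed to the model, to the intervention, or to both.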
Taxonomy
Topics: Computational and Text Analysis Methods · Topic Modeling · Artificial Intelligence in Healthcare and Education
