Whose Name Comes Up? Benchmarking and Intervention-Based Auditing of LLM-Based Scholar Recommendation

Lisette Espin-Noboa; Gonzalo Gabriel Mendez

arXiv:2602.08873·cs.IR·February 10, 2026

Open Access

TL;DR

This paper introduces LLMScholarBench, a comprehensive benchmark for evaluating LLM-based scholar recommendation systems, focusing on both model quality and user interventions across multiple tasks and metrics.

Contribution

It presents a novel benchmark that jointly assesses model infrastructure and end-user interventions, revealing how interventions affect trade-offs in scholar recommendation tasks.

Findings

01. End-user interventions redistribute errors rather than fix them.

02. Higher temperature reduces model validity and factuality.

03. RAG improves technical quality but decreases diversity and parity.

Abstract

Large language models (LLMs) are increasingly used for academic expert recommendation. Existing audits typically evaluate model outputs in isolation, largely ignoring end-user inference-time interventions. As a result, it remains unclear whether failures such as refusals, hallucinations, and uneven coverage stem from model choice or deployment decisions. We introduce LLMScholarBench, a benchmark for auditing LLM-based scholar recommendation that jointly evaluates model infrastructure and end-user interventions across multiple tasks. LLMScholarBench measures both technical quality and social representation using nine metrics. We instantiate the benchmark in physics expert recommendation and audit 22 LLMs under temperature variation, representation-constrained prompting, and retrieval-augmented generation (RAG) via web search. Our results show that end-user interventions do not yield…
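
To make the audit design concrete, here is a minimal sketch of a model-by-intervention evaluation grid. It is not LLMScholarBench code: the scholar names, group labels, the `fake_recommend` stub, and both metric definitions (`validity`, `group_share`) are hypothetical stand-ins, and the canned outputs are toy data rather than results from the paper.

```python
from itertools import product

# Hypothetical reference set: known scholars mapped to a group label.
# These names and labels are invented for illustration only.
KNOWN_SCHOLARS = {
    "A. Rivera": "f",
    "B. Chen": "m",
    "C. Okafor": "f",
    "D. Novak": "m",
}


def validity(names):
    """Share of recommended names that match a known scholar (toy definition)."""
    if not names:
        return 0.0
    return sum(n in KNOWN_SCHOLARS for n in names) / len(names)


def group_share(names, group):
    """Share of the valid recommendations that belong to `group` (toy definition)."""
    valid = [n for n in names if n in KNOWN_SCHOLARS]
    if not valid:
        return 0.0
    return sum(KNOWN_SCHOLARS[n] == group for n in valid) / len(valid)


def audit(recommend, models, interventions):
    """Score every (model, intervention) pair with the same metrics."""
    results = {}
    for model, intervention in product(models, interventions):
        names = recommend(model, intervention)
        results[(model, intervention)] = {
            "validity": validity(names),
            "share_f": group_share(names, "f"),
        }
    return results


def fake_recommend(model, intervention):
    """Stub standing in for a real LLM call; returns canned toy outputs."""
    canned = {
        "baseline": ["A. Rivera", "B. Chen", "D. Novak", "X. Nobody"],
        "constrained_prompt": ["A. Rivera", "C. Okafor", "Y. Nobody", "Z. Nobody"],
    }
    return canned[intervention]


report = audit(fake_recommend, ["model-a"], ["baseline", "constrained_prompt"])
for key, scores in report.items():
    print(key, scores)
```

In this toy data the constrained prompt raises the group share while lowering validity, which is the kind of trade-off a joint audit is meant to surface. Scoring every condition with the same metrics is what makes such shifts visible side by side.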

Taxonomy

Topics: Computational and Text Analysis Methods · Topic Modeling · Artificial Intelligence in Healthcare and Education