AuthorityBench: Benchmarking LLM Authority Perception for Reliable Retrieval-Augmented Generation
Zhihui Yao, Hengran Zhang, and Keping Bi

TL;DR
This paper introduces AuthorityBench, a benchmark for evaluating how well LLMs perceive information authority, and shows that this ability helps improve retrieval accuracy and reduce misinformation in knowledge-based tasks.
Contribution
We present AuthorityBench, a comprehensive benchmark with datasets and evaluation methods to assess LLM authority perception, highlighting its significance for reliable retrieval-augmented generation.
Findings
ListJudge and PairJudge with PointScore output correlate most strongly with ground-truth authority.
Incorporating webpage text degrades judgment performance, indicating that authority is distinct from textual style.
Authority-guided filtering improves answer accuracy in downstream RAG tasks.
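Authority-guided filtering can be pictured as a step between retrieval and generation. The sketch below is a minimal illustration under our own assumptions, not the authors' pipeline: each retrieved document is assumed to already carry a numeric authority score (for instance, one produced by an LLM judge), and the `Doc` class, the `filter_by_authority` function, and its `threshold` and `min_keep` parameters are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    text: str
    # Hypothetical: authority score assigned by an LLM judge; higher means more authoritative.
    authority: float


def filter_by_authority(docs, threshold=0.5, min_keep=1):
    """Keep retrieved documents whose authority score meets `threshold`.

    Documents are returned most-authoritative first. If fewer than `min_keep`
    documents pass, fall back to the `min_keep` highest-scoring ones so the
    generator still receives some context.
    """
    ranked = sorted(docs, key=lambda d: d.authority, reverse=True)
    kept = [d for d in ranked if d.authority >= threshold]
    if len(kept) < min_keep:
        kept = ranked[:min_keep]
    return kept


if __name__ == "__main__":
    retrieved = [Doc("a", 0.9), Doc("b", 0.2), Doc("c", 0.6)]
    context = filter_by_authority(retrieved)
    print([d.text for d in context])  # prints ['a', 'c']
```

The fallback is a design choice of this sketch: a hard threshold alone can leave the generator with no context at all, whereas a top-k cut alone would pass low-authority documents whenever few good ones are retrieved.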
Abstract
Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) with external knowledge but remains vulnerable to low-authority sources that can propagate misinformation. We investigate whether LLMs can perceive information authority, a capability that extends beyond semantic understanding. To this end, we introduce AuthorityBench, a comprehensive benchmark for evaluating LLM authority perception that comprises three datasets: DomainAuth (10K web domains with PageRank-based authority), EntityAuth (22K entities with popularity-based authority), and RAGAuth (120 queries with documents of varying authority for downstream evaluation). We evaluate five LLMs using three judging methods (PointJudge, PairJudge, ListJudge) across multiple output formats. Results show that ListJudge and PairJudge with PointScore output achieve the strongest correlation with ground-truth authority, while…
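The abstract compares judging methods by their correlation with ground-truth authority (e.g., PageRank for DomainAuth). The sketch below shows one way such a comparison could be computed; it assumes a rank correlation (Spearman's rho), since the exact metric is not stated in this excerpt. It also assumes that pairwise judgments are turned into per-item scores by simple win counting (a Copeland-style aggregation). `prefer` is a hypothetical stand-in for a PairJudge call, and `average_ranks`, `spearman`, and `pairwise_win_scores` are illustrative helpers, not the benchmark's code.

```python
from itertools import combinations


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("Spearman's rho is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def pairwise_win_scores(items, prefer):
    """Score each item by how many pairwise comparisons it wins.

    `items` must be distinct and hashable. `prefer(a, b)` is a hypothetical
    judge returning True if `a` is judged more authoritative than `b`.
    """
    wins = {item: 0 for item in items}
    for a, b in combinations(items, 2):
        winner = a if prefer(a, b) else b
        wins[winner] += 1
    return [wins[item] for item in items]
```

As a toy usage example with made-up values (not data from the paper): for domains `["d1", "d2", "d3", "d4"]` with ground-truth scores `[0.9, 0.4, 0.7, 0.1]` and a judge that always agrees with them, `pairwise_win_scores` returns `[3, 1, 2, 0]`, and `spearman` on those scores against the ground truth gives `1.0`. In practice, `scipy.stats.spearmanr` computes the same tie-aware statistic.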
Taxonomy
Topics: Topic Modeling · Misinformation and Its Impacts · Information Retrieval and Search Behavior
