AuthorityBench: Benchmarking LLM Authority Perception for Reliable Retrieval-Augmented Generation

Zhihui Yao, Hengran Zhang, and Keping Bi

arXiv:2603.25092 · cs.IR · March 27, 2026

TL;DR

This paper introduces AuthorityBench, a benchmark for evaluating how well LLMs perceive the authority of information sources, and shows that authority-guided filtering improves answer accuracy in downstream RAG tasks.

Contribution

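We present AuthorityBench, a benchmark for assessing LLM authority perception that comprises three datasets (DomainAuth, EntityAuth, and RAGAuth) and three judging methods (PointJudge, PairJudge, and ListJudge), and we show why authority perception matters for reliable retrieval-augmented generation.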

Findings

01. ListJudge and PairJudge with PointScore output correlate most strongly with ground-truth authority.

02. Incorporating webpage text degrades judgment performance, suggesting that authority is distinct from textual style.

03. Authority-guided filtering improves answer accuracy in downstream RAG tasks.
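
As a rough illustration of what authority-guided filtering could look like in a RAG pipeline, the sketch below drops retrieved documents whose authority score falls below a threshold and keeps the most authoritative of the rest. The summary above does not specify the filtering rule used in the paper, so `RetrievedDoc`, `filter_by_authority`, the `min_authority` threshold, and the `max_docs` cap are all hypothetical names and choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievedDoc:
    """A retrieved passage plus an authority score (hypothetical structure)."""
    doc_id: str
    text: str
    authority: float  # assumed: higher means more authoritative, e.g. from an LLM judge


def filter_by_authority(
    docs: list[RetrievedDoc],
    min_authority: float,
    max_docs: int,
) -> list[RetrievedDoc]:
    """Keep documents scoring at least min_authority, most authoritative first.

    This is an assumed filtering rule, not the paper's procedure. Ties keep
    their original retrieval order because Python's sort is stable.
    """
    if max_docs < 1:
        raise ValueError("max_docs must be at least 1")
    kept = [d for d in docs if d.authority >= min_authority]
    ranked = sorted(kept, key=lambda d: d.authority, reverse=True)
    return ranked[:max_docs]


if __name__ == "__main__":
    # Made-up documents and scores, purely for demonstration.
    retrieved = [
        RetrievedDoc("gov-report", "Official statistics ...", 0.9),
        RetrievedDoc("forum-post", "Someone told me ...", 0.2),
        RetrievedDoc("news-site", "According to the agency ...", 0.6),
    ]
    context = filter_by_authority(retrieved, min_authority=0.5, max_docs=2)
    # Only the filtered documents would be passed to the generator.
    print([d.doc_id for d in context])
```

A threshold plus a cap is only one possible design; reranking by authority without discarding anything would be a softer alternative when few documents are retrieved.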

Abstract

Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) with external knowledge but remains vulnerable to low-authority sources that can propagate misinformation. We investigate whether LLMs can perceive information authority, a capability that extends beyond semantic understanding. To this end, we introduce AuthorityBench, a comprehensive benchmark for evaluating LLM authority perception that comprises three datasets: DomainAuth (10K web domains with PageRank-based authority), EntityAuth (22K entities with popularity-based authority), and RAGAuth (120 queries with documents of varying authority for downstream evaluation). We evaluate five LLMs using three judging methods (PointJudge, PairJudge, ListJudge) across multiple output formats. Results show that ListJudge and PairJudge with PointScore output achieve the strongest correlation with ground-truth authority, while…
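
To make "correlation with ground-truth authority" concrete, here is a minimal sketch that compares a judge's scores against reference authority values using Spearman rank correlation. The abstract as shown does not name the correlation metric, so Spearman is an assumption here, and the function names and sample numbers are hypothetical.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(predicted) != len(reference) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(predicted), average_ranks(reference))


if __name__ == "__main__":
    # Made-up numbers: reference authority for four domains and a judge's scores.
    reference_authority = [9.1, 7.4, 3.2, 1.0]
    judge_scores = [5, 4, 2, 3]
    print(round(spearman(judge_scores, reference_authority), 3))
```

Rank correlation only asks whether the judge orders sources the same way as the reference, so it does not depend on the judge's output scale.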

Code & Models

Datasets

Trustworthy-Information-Access/AuthorityBench (dataset)

Taxonomy

Topics: Topic Modeling · Misinformation and Its Impacts · Information Retrieval and Search Behavior