SourceBench: Can AI Answers Reference Quality Web Sources?

Hexi Jin; Stephen Liu; Yuheng Li; Simran Malik; Yiying Zhang

arXiv:2602.16942 · cs.AI · February 20, 2026

TL;DR

SourceBench is a comprehensive benchmark designed to evaluate the quality of web sources cited by large language models, focusing on content relevance, accuracy, objectivity, and page-level signals across diverse query types.

Contribution

We introduce SourceBench, a multi-metric benchmark with a human-labeled dataset and an LLM evaluator, enabling systematic assessment of cited web source quality in AI-generated answers.

Findings

01. LLMs vary significantly in source quality.
02. Google Search often provides higher-quality sources.
03. The benchmark reveals key areas for improving source reliability.

Abstract

Large language models (LLMs) increasingly answer queries by citing web sources, but existing evaluations emphasize answer correctness rather than evidence quality. We introduce SourceBench, a benchmark for measuring the quality of cited web sources across 100 real-world queries spanning informational, factual, argumentative, social, and shopping intents. SourceBench uses an eight-metric framework covering content quality (content relevance, factual accuracy, objectivity) and page-level signals (e.g., freshness, authority/accountability, clarity), and includes a human-labeled dataset with a calibrated LLM-based evaluator that matches expert judgments closely. We evaluate eight LLMs, Google Search, and three AI search tools over 3996 cited sources using SourceBench and conduct further experiments to understand the evaluation results. Overall, our work reveals four key new insights that…
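The abstract describes two technical pieces: per-source scores across a multi-metric framework, and an LLM evaluator checked against human labels. The minimal Python sketch below illustrates both ideas. It is not SourceBench's published scoring. The metric list includes only the six metrics named above (the benchmark uses eight). The 1-5 scale, equal weighting, and mean absolute error as the agreement measure are all assumptions. `SourceRating`, `system_profile`, `overall_score`, and `mean_absolute_error` are hypothetical names.

```python
from dataclasses import dataclass
from statistics import mean

# Illustrative subset: the three content-quality metrics and the three example
# page-level signals named in the abstract. SourceBench uses eight metrics in
# total; the remaining two are not listed here.
METRICS = (
    "content_relevance",
    "factual_accuracy",
    "objectivity",
    "freshness",
    "authority_accountability",
    "clarity",
)


@dataclass
class SourceRating:
    """One cited web source as rated by a judge (human or LLM). Hypothetical schema."""
    system: str               # the answering system, e.g. an LLM or a search tool
    url: str                  # the cited source
    scores: dict[str, float]  # assumed 1-5 score per metric in METRICS


def system_profile(ratings: list[SourceRating], system: str) -> dict[str, float]:
    """Average each metric over all sources cited by one system."""
    rows = [r.scores for r in ratings if r.system == system]
    if not rows:
        raise ValueError(f"no ratings for system {system!r}")
    return {m: mean(row[m] for row in rows) for m in METRICS}


def overall_score(profile: dict[str, float]) -> float:
    """Unweighted mean across metrics; equal weighting is an assumption."""
    return mean(profile.values())


def mean_absolute_error(llm_scores: list[float], human_scores: list[float]) -> float:
    """Average gap between LLM-judge and human scores on the same items (lower is closer)."""
    if not llm_scores or len(llm_scores) != len(human_scores):
        raise ValueError("need two non-empty score lists of equal length")
    return mean(abs(a - b) for a, b in zip(llm_scores, human_scores))


if __name__ == "__main__":
    # Toy ratings for two placeholder systems; these numbers are made up.
    base = dict.fromkeys(METRICS, 4.0)
    ratings = [
        SourceRating("system_a", "https://example.com/a", dict(base)),
        SourceRating("system_a", "https://example.com/b", {**base, "freshness": 2.0}),
        SourceRating("system_b", "https://example.org/c", dict.fromkeys(METRICS, 3.0)),
    ]
    for name in ("system_a", "system_b"):
        print(name, round(overall_score(system_profile(ratings, name)), 3))
    print("judge MAE:", round(mean_absolute_error([4.0, 3.0, 5.0], [4.0, 2.0, 4.0]), 3))
```

On the toy data, the script prints one overall score per placeholder system and the judge's mean absolute error. A correlation coefficient or a rank-agreement statistic would be equally reasonable ways to compare judge and human scores. The abstract does not say which statistic SourceBench reports.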

Taxonomy

Topics: Expert finding and Q&A systems · Topic Modeling · Wikis in Education and Collaboration