Evaluating the Utility of Grounding Documents with Reference-Free LLM-based Metrics

Yilun Hua; Giuseppe Castellucci; Peter Schulam; Heba Elfardy; Kevin Small

arXiv:2601.23129·cs.CL·February 2, 2026

Open Access

TL;DR

This paper introduces GroGU, a reference-free, model-specific metric that quantifies the utility of grounding documents from the entropy-based generation confidence of the downstream LLM, improving retrieval and generation quality without costly annotations.

Contribution

The paper presents GroGU, a novel utility metric for grounding documents that is reference-free, model-specific, and effectively guides training for better RAG performance.

Findings

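01 GroGU is largely faithful in distinguishing ground-truth documents, despite requiring no annotations.

02 Training a RAG query rewriter on GroGU-selected preference data improves Mean Reciprocal Rank by up to 18.2 points.

03 The same setup improves answer accuracy by up to 9.4 points.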

Abstract

The success of Retrieval Augmented Generation (RAG) depends on the utility the LLM derives from the content used for grounding. There is no definitive specification for quantifying content utility, and existing metrics ignore model-specific capabilities and/or rely on costly annotations. In this paper, we propose Grounding Generation Utility (GroGU), a model-specific and reference-free metric that defines utility as a function of the downstream LLM's generation confidence based on entropy. Despite having no annotation requirements, GroGU is largely faithful in distinguishing ground-truth documents while capturing nuances ignored by LLM-agnostic metrics. We apply GroGU to train a query rewriter for RAG by identifying high-utility preference data for Direct Preference Optimization. Experiments show improvements of up to 18.2 points in Mean Reciprocal Rank and up to 9.4 points in answer accuracy.
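The abstract does not state the exact GroGU formula, so the following is only a rough sketch of the general idea, not the authors' definition. It assumes, hypothetically, that utility can be approximated as the drop in mean per-token entropy of the LLM's next-token distributions when a grounding document is added to the prompt, and that preference pairs for Direct Preference Optimization are formed from the highest- and lowest-scoring candidates. The function names (`utility_sketch`, `pick_preference_pair`) and the toy distributions are illustrative assumptions.

```python
import math
from typing import Dict, Sequence, Tuple


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy(distributions: Sequence[Sequence[float]]) -> float:
    """Average per-token entropy over the tokens of one generation."""
    if not distributions:
        raise ValueError("need at least one token distribution")
    return sum(token_entropy(d) for d in distributions) / len(distributions)


def utility_sketch(
    without_doc: Sequence[Sequence[float]],
    with_doc: Sequence[Sequence[float]],
) -> float:
    """Hypothetical utility score (an assumption, not the paper's formula):
    how much the document lowers the model's mean generation entropy.
    Positive means the model is more confident with the document."""
    return mean_entropy(without_doc) - mean_entropy(with_doc)


def pick_preference_pair(scores: Dict[str, float]) -> Tuple[str, str]:
    """Return (chosen, rejected): the highest- and lowest-scoring candidates.
    This pairing rule is an assumption made for illustration."""
    if len(scores) < 2:
        raise ValueError("need at least two candidates to form a pair")
    chosen = max(scores, key=scores.get)
    rejected = min(scores, key=scores.get)
    return chosen, rejected


# Toy example: a model that is uniform over 4 tokens without the document
# and fully confident with it.
uniform = [[0.25, 0.25, 0.25, 0.25]] * 2
peaked = [[1.0, 0.0, 0.0, 0.0]] * 2
score = utility_sketch(uniform, peaked)
```

In practice the distributions would come from the downstream LLM's output probabilities, which is what would make such a score model-specific and reference-free: no gold answer or relevance label enters the computation.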

Taxonomy

Topics: Topic Modeling · Information Retrieval and Search Behavior · Text Readability and Simplification