Practical RAG Evaluation: A Rarity-Aware Set-Based Metric and Cost-Latency-Quality Trade-offs

Etienne Dallaire

arXiv:2511.09545·cs.IR·November 13, 2025

Open Access

TL;DR

This paper introduces a rarity-aware evaluation metric and a comprehensive benchmarking framework for production RAG systems, addressing limitations of classical IR metrics and providing reproducible, cost-aware decision tools.

Contribution

The paper proposes a new rarity-aware set score (RA-nWG@K), a golden-set pipeline, and a detailed benchmark for production RAG, with the aim of making evaluation and tuning decisions reproducible and cost-aware.

Findings

01. RA-nWG@K effectively measures rarity-aware retrieval quality.

02. The golden-set pipeline outperforms single-shot ranking methods.

03. Benchmark results reveal trade-offs among retrieval models, dimensions, and rerankers.

Abstract

This paper addresses the guessing game in building production RAG. First, classical rank-centric IR metrics (nDCG/MAP/MRR) are a poor fit for RAG, where LLMs consume a set of passages rather than a browsed list; position discounts and prevalence-blind aggregation miss what matters: whether the prompt at cutoff K contains the decisive evidence. Second, there is no standardized, reproducible way to build and audit golden sets. Third, leaderboards exist but lack end-to-end, on-corpus benchmarking that reflects production trade-offs. Fourth, how state-of-the-art embedding models handle proper-name identity signals and conversational noise remains opaque. To address these, we contribute: (1) RA-nWG@K, a rarity-aware, per-query-normalized set score, and operational ceilings via the pool-restricted oracle ceiling (PROC) and the percentage of PROC (%PROC) to separate retrieval from ordering headroom…
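
To make the set-versus-rank distinction concrete, the following is a minimal sketch of a per-query-normalized set score with a pool-restricted ceiling. It is not the paper's definition: the truncated abstract above does not give the formulas, so the function names (`set_score`, `pool_ceiling`, `pct_of_ceiling`), the `weights` mapping, and the normalization by the best K labeled passages are all assumptions made for illustration. How rarity enters the per-passage weights is left to the caller.

```python
from typing import Dict, Iterable, Sequence


def _gain(doc_ids: Iterable[str], weights: Dict[str, float]) -> float:
    """Sum the utility weights of a set of passages (duplicates count once)."""
    return sum(weights.get(d, 0.0) for d in set(doc_ids))


def _best_gain(candidates: Iterable[str], weights: Dict[str, float], k: int) -> float:
    """Largest gain achievable by choosing at most k passages from candidates."""
    ranked = sorted((weights.get(d, 0.0) for d in set(candidates)), reverse=True)
    return sum(ranked[:k])


def set_score(retrieved: Sequence[str], weights: Dict[str, float], k: int) -> float:
    """Assumed set score: gain of the first k passages, ignoring their order,
    divided by the best gain any k labeled passages could reach for this query."""
    ideal = _best_gain(weights.keys(), weights, k)
    return _gain(retrieved[:k], weights) / ideal if ideal > 0 else 0.0


def pool_ceiling(pool: Sequence[str], weights: Dict[str, float], k: int) -> float:
    """Assumed PROC-like ceiling: best score reachable using only pooled passages."""
    ideal = _best_gain(weights.keys(), weights, k)
    return _best_gain(pool, weights, k) / ideal if ideal > 0 else 0.0


def pct_of_ceiling(retrieved: Sequence[str], pool: Sequence[str],
                   weights: Dict[str, float], k: int) -> float:
    """Assumed %PROC-like ratio: how much of the pool's ceiling the top k captures."""
    ceiling = pool_ceiling(pool, weights, k)
    return set_score(retrieved, weights, k) / ceiling if ceiling > 0 else 0.0


# Hypothetical labels: "a" is the rare, decisive passage, so it carries more weight.
weights = {"a": 3.0, "b": 1.0, "c": 1.0, "d": 0.5}
ranking = ["b", "x", "a", "c"]  # "a" was retrieved but sits below the cutoff K=2

print(set_score(ranking, weights, k=2))                # 0.25
print(pool_ceiling(ranking, weights, k=2))             # 1.0
print(pct_of_ceiling(ranking, ranking, weights, k=2))  # 0.25
```

In this toy case the pool ceiling is 1.0 while the score is 0.25, so the missing quality is ordering headroom: the decisive passage was retrieved but ranked below the cutoff. A low ceiling would instead point at the retriever.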

Taxonomy

Topics: Topic Modeling · Advanced Graph Neural Networks · Biomedical Text Mining and Ontologies