Initial Nugget Evaluation Results for the TREC 2024 RAG Track with the AutoNuggetizer Framework
Ronak Pradeep, Nandan Thakur, Shivani Upadhyay, Daniel Campos, Nick Craswell, Jimmy Lin

TL;DR
This paper introduces the AutoNuggetizer framework, which automates nugget evaluation for RAG systems. It shows that the automatic scores correlate strongly with manual assessments, so they can guide future system development.
Contribution
Refactors the nugget evaluation methodology originally developed for the TREC 2003 Question Answering Track, using large language models to automate both nugget creation and nugget assignment for RAG systems.
Findings
Strong correlation between automatic and manual nugget evaluation scores
Automated process can reliably guide RAG system improvements
Initial results from 21 topics across 45 runs show promising evaluation consistency
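The findings report a strong correlation between automatic and manual scores, but this summary does not name the correlation coefficient. The following is a minimal sketch that assumes run-level scores and Kendall's tau-b, a common choice for comparing system rankings in TREC-style meta-evaluation, and shows how agreement between two score lists could be computed. The score values below are made up for illustration and are not results from the report.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(xs, ys):
    """Kendall's tau-b rank correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    concordant = discordant = ties_x_only = ties_y_only = 0
    for i, j in combinations(range(len(xs)), 2):
        dx = xs[i] - xs[j]
        dy = ys[i] - ys[j]
        if dx == 0 and dy == 0:
            continue  # tied in both lists: drops out of both denominator terms
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    # Each factor counts the pairs that are not tied in one of the two lists.
    denom = sqrt((concordant + discordant + ties_x_only)
                 * (concordant + discordant + ties_y_only))
    return (concordant - discordant) / denom if denom else 0.0


# Hypothetical per-run scores (manual vs. automatic) for four runs.
manual_scores = [0.52, 0.41, 0.37, 0.30]
auto_scores = [0.55, 0.40, 0.38, 0.28]
print(kendall_tau_b(manual_scores, auto_scores))  # 1.0 (identical ranking)
```

In practice a library routine such as SciPy's `kendalltau` would be the usual choice. The pure-Python version is used here only to make the pair-counting logic visible.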
Abstract
This report provides an initial look at partial results from the TREC 2024 Retrieval-Augmented Generation (RAG) Track. We have identified RAG evaluation as a barrier to continued progress in information access (and, more broadly, in natural language processing and artificial intelligence), and we hope to contribute to tackling the many challenges in this space. The central hypothesis we explore in this work is that the nugget evaluation methodology, originally developed for the TREC Question Answering Track in 2003, provides a solid foundation for evaluating RAG systems. Our efforts have therefore focused on "refactoring" this methodology, specifically by applying large language models both to automatically create nuggets and to automatically assign nuggets to system answers. We call this the AutoNuggetizer framework. Within the TREC setup, we are able to calibrate our fully…
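The abstract describes two LLM-driven steps: creating nuggets and assigning them to system answers. The sketch below covers only assignment and scoring, and it rests on several stated assumptions. Each nugget carries a hypothetical "vital"/"okay" importance label, and the assigner returns one of three hypothetical labels ("support", "partial_support", "not_support"). Partial support earns half credit unless strict scoring is requested. The `keyword_assign` function is a toy word-overlap stand-in for the LLM judge and is not the AutoNuggetizer method. The nuggets and answer are invented examples.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Assumed credit per assignment label; the report's exact scheme is not given here.
CREDIT = {"support": 1.0, "partial_support": 0.5, "not_support": 0.0}


@dataclass
class Nugget:
    text: str
    importance: str  # hypothetical labels: "vital" or "okay"


def score_answer(answer: str, nuggets: List[Nugget],
                 assign: Callable[[str, str], str],
                 strict: bool = False, vital_only: bool = False) -> float:
    """Mean nugget credit for one answer.

    `assign` maps (nugget text, answer) to a label; an LLM judge would go here.
    strict=True gives no credit for partial support; vital_only=True skips "okay" nuggets.
    """
    selected = [n for n in nuggets if n.importance == "vital" or not vital_only]
    if not selected:
        return 0.0
    total = 0.0
    for n in selected:
        label = assign(n.text, answer)
        if strict and label == "partial_support":
            continue  # strict scoring: partial support earns nothing
        total += CREDIT[label]
    return total / len(selected)


def _words(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_assign(nugget_text: str, answer: str) -> str:
    """Toy word-overlap assigner standing in for an LLM (illustration only)."""
    nugget_words = _words(nugget_text)
    if not nugget_words:
        return "not_support"
    overlap = len(nugget_words & _words(answer)) / len(nugget_words)
    if overlap == 1.0:
        return "support"
    if overlap >= 0.5:
        return "partial_support"
    return "not_support"


# Hypothetical nuggets and answer.
nuggets = [
    Nugget("nugget evaluation introduced in 2003", "vital"),
    Nugget("TREC Question Answering Track", "vital"),
    Nugget("nugget evaluation with LLMs", "okay"),
]
answer = "Nugget evaluation was introduced for the TREC Question Answering Track in 2003."
print(round(score_answer(answer, nuggets, keyword_assign), 3))  # 0.833
```

If an LLM-backed callable were passed in place of `keyword_assign`, the scoring code would stay unchanged. Keeping the judge pluggable in this way separates the assignment step from the scoring arithmetic.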
