A Reality Check on Context Utilisation for Retrieval-Augmented Generation
Lovisa Hagstr\"om, Sara Vera Marjanovi\'c, Haeun Yu, Arnav Arora, Christina Lioma, Maria Maistro, Pepa Atanasova, Isabelle Augenstein

TL;DR
This paper introduces DRUID, a new dataset with real-world contexts for claim verification, revealing that synthetic datasets often misrepresent real retrieval complexity and inflate context utilisation results.
Contribution
The paper presents DRUID, a real-world dataset for evaluating context utilisation in RAG, and demonstrates the limitations of synthetic datasets in representing real retrieval challenges.
Findings
Synthetic datasets exaggerate rare context features.
Artificial datasets inflate context utilisation scores.
Context source properties correlate more with ACU than singleton characteristics.
Abstract
Retrieval-augmented generation (RAG) helps address the limitations of parametric knowledge embedded within a language model (LM). In real world settings, retrieved information can vary in complexity, yet most investigations of LM utilisation of context has been limited to synthetic text. We introduce DRUID (Dataset of Retrieved Unreliable, Insufficient and Difficult-to-understand contexts) with real-world queries and contexts manually annotated for stance. The dataset is based on the prototypical task of automated claim verification, for which automated retrieval of real-world evidence is crucial. We compare DRUID to synthetic datasets (CounterFact, ConflictQA) and find that artificial datasets often fail to represent the complexity and diversity of realistically retrieved context. We show that synthetic datasets exaggerate context characteristics rare in real retrieved data, which…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsRecommender Systems and Techniques · Advanced Image and Video Retrieval Techniques · Robotics and Automated Systems
MethodsRefunds@Expedia|||How do I get a full refund from Expedia? · Attention Is All You Need · Linear Layer · Residual Connection · Adam · Weight Decay · Multi-Head Attention · Layer Normalization · WordPiece · Dropout
