A Reality Check on Context Utilisation for Retrieval-Augmented Generation

Lovisa Hagström; Sara Vera Marjanović; Haeun Yu; Arnav Arora; Christina Lioma; Maria Maistro; Pepa Atanasova; Isabelle Augenstein

arXiv:2412.17031·cs.CL·May 30, 2025

Open Access 1 Repo 3 Datasets

TL;DR

This paper introduces DRUID, a new dataset with real-world contexts for claim verification, revealing that synthetic datasets often misrepresent real retrieval complexity and inflate context utilisation results.

Contribution

The paper presents DRUID, a real-world dataset for evaluating context utilisation in RAG, and demonstrates the limitations of synthetic datasets in representing real retrieval challenges.

Findings

01. Synthetic datasets exaggerate rare context features.

02. Artificial datasets inflate context utilisation scores.

03. Context source properties correlate more with ACU than singleton characteristics.

Abstract

Retrieval-augmented generation (RAG) helps address the limitations of parametric knowledge embedded within a language model (LM). In real-world settings, retrieved information can vary in complexity, yet most investigations of LM utilisation of context have been limited to synthetic text. We introduce DRUID (Dataset of Retrieved Unreliable, Insufficient and Difficult-to-understand contexts) with real-world queries and contexts manually annotated for stance. The dataset is based on the prototypical task of automated claim verification, for which automated retrieval of real-world evidence is crucial. We compare DRUID to synthetic datasets (CounterFact, ConflictQA) and find that artificial datasets often fail to represent the complexity and diversity of realistically retrieved context. We show that synthetic datasets exaggerate context characteristics rare in real retrieved data, which…

Code & Models

Repositories

copenlu/context-utilisation-for-rag (Official)

Taxonomy

Topics: Recommender Systems and Techniques · Advanced Image and Video Retrieval Techniques · Robotics and Automated Systems

Methods: Attention Is All You Need · Linear Layer · Residual Connection · Adam · Weight Decay · Multi-Head Attention · Layer Normalization · WordPiece · Dropout