Quantifying reliance on external information over parametric knowledge during Retrieval Augmented Generation (RAG) using mechanistic analysis

Reshmi Ghosh, Rahul Seetharaman, Hitesh Wadhwa, Somyaa Aggarwal, Samyadeep Basu, Soundararajan Srinivasan, Wenlong Zhao, Shreyas Chaudhari, Ehsan Aghazadeh

arXiv:2410.00857 · cs.CL · October 2, 2024

TL;DR

This paper uses mechanistic analysis to investigate how language models in Retrieval Augmented Generation (RAG) pipelines rely predominantly on the retrieved external context rather than on their internal (parametric) knowledge, revealing a 'shortcut' bias.

Contribution

It uses causal mediation analysis together with attention contributions and knockouts to demonstrate that parametric memory is used minimally and to highlight the bias toward external context in RAG models.
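To make the corrupt-and-restore logic of causal mediation analysis concrete, here is a minimal sketch on a hypothetical toy model; it is not the paper's model, data, or code. It assumes the causal-tracing recipe of corrupting the subject-token embeddings with noise and then restoring one clean hidden state at a time. The total effect is the drop in answer probability caused by the corruption; the indirect effect of a (layer, position) state is how much probability comes back when only that state is restored. `LAYER_WEIGHTS`, `BIAS`, `NOISE`, and the two read-out profiles are invented numbers, and a fixed noise vector stands in for random Gaussian noise so the run is deterministic.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical toy model: one scalar hidden state per token, three layers.
LAYER_WEIGHTS = [1.2, 0.8, 1.5]
BIAS = -6.0
# Positions 0-1 are subject tokens of the question; 2-3 are retrieved-context tokens.
CLEAN = [1.0, 0.8, 0.5, 0.7]
SUBJECT_POSITIONS = [0, 1]
NOISE = [-1.5, -1.0]  # fixed stand-in for the Gaussian noise used in causal tracing
# Hypothetical read-out weights of the last token over each position.
CONTEXT_HEAVY = [0.2, 0.2, 1.0, 1.0]
PRIOR_HEAVY = [1.0, 1.0, 0.2, 0.2]

Patch = Dict[Tuple[int, int], float]


def forward(embeddings: Sequence[float], readout: Sequence[float],
            patch: Optional[Patch] = None) -> Tuple[float, List[List[float]]]:
    """Return (probability of the answer, hidden states after each layer).

    `patch` maps (layer, position) to a saved hidden state that overwrites
    the state computed at that point of the run.
    """
    patch = patch or {}
    h = list(embeddings)
    states: List[List[float]] = []
    for layer, w in enumerate(LAYER_WEIGHTS):
        h = [x + math.tanh(w * x) for x in h]  # residual stream += MLP(x)
        for pos in range(len(h)):
            if (layer, pos) in patch:
                h[pos] = patch[(layer, pos)]
        states.append(list(h))
    logit = BIAS + sum(r * x for r, x in zip(readout, h))
    return 1.0 / (1.0 + math.exp(-logit)), states


def corrupt(clean: Sequence[float]) -> List[float]:
    """Add the fixed noise to the subject-token embeddings only."""
    out = list(clean)
    for pos, n in zip(SUBJECT_POSITIONS, NOISE):
        out[pos] += n
    return out


def causal_trace(readout: Sequence[float]) -> Tuple[float, Dict[Tuple[int, int], float]]:
    """Total effect of corrupting the subject, and the indirect effect of
    restoring each single (layer, position) state to its clean value."""
    p_clean, clean_states = forward(CLEAN, readout)
    corrupted = corrupt(CLEAN)
    p_corrupt, _ = forward(corrupted, readout)
    indirect: Dict[Tuple[int, int], float] = {}
    for layer in range(len(LAYER_WEIGHTS)):
        for pos in range(len(CLEAN)):
            saved = {(layer, pos): clean_states[layer][pos]}
            p_restored, _ = forward(corrupted, readout, saved)
            indirect[(layer, pos)] = p_restored - p_corrupt
    return p_clean - p_corrupt, indirect


te_ctx, ie_ctx = causal_trace(CONTEXT_HEAVY)
te_prior, _ = causal_trace(PRIOR_HEAVY)
print(f"total effect, context-heavy read-out: {te_ctx:.3f}")
print(f"total effect, prior-heavy read-out:   {te_prior:.3f}")
```

In this toy setup, a last token that reads mostly from context positions loses less answer probability when the subject is corrupted, which mirrors in miniature the kind of contrast the analysis looks for. In a real LM, the restored states would be hidden activations captured and overwritten with forward hooks.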

Findings

1. RAG models heavily depend on retrieved context for answers.

2. Parametric memory contributes minimally to the generated responses.

3. The 'shortcut' bias is consistent across different language model sizes and types.

Abstract

Retrieval Augmented Generation (RAG) is a widely used approach for leveraging external context in several natural language applications such as question answering and information retrieval. Yet, the exact manner in which a Language Model (LM) leverages this non-parametric memory, or retrieved context, isn't clearly understood. This paper mechanistically examines the RAG pipeline to highlight that LMs demonstrate a "shortcut" effect and have a strong bias towards utilizing the retrieved context to answer questions, while relying minimally on model priors. We propose (a) Causal Mediation Analysis, for proving that parametric memory is minimally utilized when answering a question, and (b) Attention Contributions and Knockouts, for showing that the last token residual stream does not get enriched from the subject token in the question, but gets enriched from tokens of the RAG context. We find this…
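The two attention-level measurements in (b) can be illustrated on a single attention head. This is a minimal sketch under stated assumptions, not the paper's implementation: one head, toy two-dimensional vectors, and a hand-picked sequence in which positions 0-1 stand for retrieved-context tokens, position 2 for the question's subject token, and position 3 for the last token. `attend_last_token` and `knockout_effect` are hypothetical helpers. A position's contribution is its attention weight times its value vector; a knockout blocks the last token from attending to chosen positions by giving them zero weight (equivalently, a -inf pre-softmax score) and renormalizing the rest.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(v: Sequence[float]) -> float:
    return math.sqrt(dot(v, v))


def attend_last_token(q, keys, values, blocked=frozenset()):
    """Single-head attention from the last token's query over all positions.

    Positions in `blocked` are knocked out: their pre-softmax score is treated
    as -inf, so they receive zero weight and the rest are renormalized.
    Returns (weights, per-position contribution vectors, output vector).
    """
    scale = math.sqrt(len(q))
    scores = [dot(q, k) / scale for k in keys]
    top = max(s for i, s in enumerate(scores) if i not in blocked)
    exps = [0.0 if i in blocked else math.exp(s - top) for i, s in enumerate(scores)]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Contribution of position j to the last token's update: weight_j * value_j.
    contributions = [[w * x for x in v] for w, v in zip(weights, values)]
    output = [sum(col) for col in zip(*contributions)]
    return weights, contributions, output


def knockout_effect(q, keys, values, blocked):
    """Size of the change in the attention output when `blocked` is knocked out."""
    _, _, full = attend_last_token(q, keys, values)
    _, _, knocked = attend_last_token(q, keys, values, frozenset(blocked))
    return norm([a - b for a, b in zip(full, knocked)])


# Hypothetical toy sequence: positions 0-1 are retrieved-context tokens,
# position 2 is the subject token of the question, position 3 is the last token.
Q = [1.0, 0.0]
KEYS = [[2.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
VALUES = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.2, 0.2]]
CONTEXT, SUBJECT = [0, 1], [2]

weights, contributions, _ = attend_last_token(Q, KEYS, VALUES)
print("contribution norms:", [round(norm(c), 3) for c in contributions])
print("knock out context:", round(knockout_effect(Q, KEYS, VALUES, CONTEXT), 3))
print("knock out subject:", round(knockout_effect(Q, KEYS, VALUES, SUBJECT), 3))
```

With these invented numbers, the last token's update is dominated by the context positions, so knocking them out changes the output far more than knocking out the subject token. In an actual LM, one would block attention edges in selected layers and track the probability of the answer token instead of a raw output vector.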

Taxonomy

Topics: Information Retrieval and Search Behavior · Topic Modeling

Methods: Attention Is All You Need · Attention Dropout · WordPiece · Linear Warmup With Linear Decay · Linear Layer · Weight Decay · Byte Pair Encoding · BERT · Softmax · Dropout