Source Attribution in Retrieval-Augmented Generation

Ikhtiyor Nematov; Tarik Kalai; Elizaveta Kuzmenko; Gabriele Fugagnoli; Dimitris Sacharidis; Katja Hose; Tomer Sagi

arXiv:2507.04480·cs.LG·July 8, 2025

TL;DR

This paper explores the adaptation of Shapley value-based attribution methods to Retrieval-Augmented Generation systems, aiming to identify influential documents efficiently while addressing computational challenges.

Contribution

It systematically applies Shapley-based attribution to RAG, compares approximations, and evaluates their effectiveness in practical, complex scenarios.
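As a minimal sketch of what document-level Shapley attribution computes, the Python below treats each retrieved document as a player and a utility function as the payoff. The names `exact_shapley` and `toy_utility` are hypothetical, and `toy_utility` is only a stand-in for the LLM-based utility, where each subset evaluation would be one LLM call. This is an illustration of the textbook Shapley formula, not the authors' implementation.

```python
from itertools import combinations
from math import factorial


def exact_shapley(docs, utility):
    """Exact Shapley value of each document under `utility`.

    `utility` maps a frozenset of documents to a number. Across all
    documents, every one of the 2^n subsets gets evaluated, which is why
    exact attribution is expensive when each evaluation is an LLM call.
    """
    n = len(docs)
    values = {}
    for d in docs:
        others = [x for x in docs if x != d]
        total = 0.0
        for k in range(n):
            # Standard Shapley weight for coalitions of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of d when added to coalition s.
                total += weight * (utility(s | {d}) - utility(s))
        values[d] = total
    return values


def toy_utility(subset):
    # ASSUMPTION: toy stand-in for an LLM call. Documents "a" and "b" are
    # redundant (either one alone yields the answer); "c" is irrelevant.
    return 1.0 if ("a" in subset or "b" in subset) else 0.0


scores = exact_shapley(["a", "b", "c"], toy_utility)
print(scores)
```

In this toy setting the two redundant documents split the credit equally and the irrelevant one receives none, which shows how Shapley values distribute attribution across interacting documents.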

Findings

01. Shapley approximations can closely mirror exact attributions.

02. SHAP approximations significantly reduce computational cost relative to exact Shapley computation.

03. The methods effectively identify critical documents, even when the documents have complex relationships.
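To illustrate how an approximation trades accuracy for fewer utility evaluations, here is a hedged sketch of Monte Carlo permutation sampling, a generic Shapley estimator that is not necessarily one of the approximations the paper evaluates. The names `sampled_shapley`, `toy_utility`, and `calls` are hypothetical, and the toy utility again stands in for an LLM call.

```python
import random


def sampled_shapley(docs, utility, num_permutations=200, seed=0):
    """Estimate Shapley values by averaging marginal contributions
    over randomly sampled document orderings."""
    rng = random.Random(seed)
    totals = {d: 0.0 for d in docs}
    for _ in range(num_permutations):
        order = list(docs)
        rng.shuffle(order)
        context = frozenset()
        prev = utility(context)
        for d in order:
            # Add documents one at a time and credit each with the
            # change in utility it causes in this ordering.
            context = context | {d}
            cur = utility(context)
            totals[d] += cur - prev
            prev = cur
    return {d: t / num_permutations for d, t in totals.items()}


calls = {"n": 0}  # counts utility evaluations (LLM calls in a real system)


def toy_utility(subset):
    # ASSUMPTION: toy stand-in for an LLM call. "a" and "b" are redundant,
    # "c" is irrelevant.
    calls["n"] += 1
    return 1.0 if ("a" in subset or "b" in subset) else 0.0


estimates = sampled_shapley(["a", "b", "c"], toy_utility)
print(estimates, "utility calls:", calls["n"])
```

This sketch makes `num_permutations * (n + 1)` utility calls, so the cost grows linearly with the number of documents rather than as `2^n`. With only three documents that is more calls than the exact computation needs; the saving appears only as the retrieved set grows, and caching repeated subsets would reduce the count further.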

Abstract

While attribution methods, such as Shapley values, are widely used to explain the importance of features or training data in traditional machine learning, their application to Large Language Models (LLMs), particularly within Retrieval-Augmented Generation (RAG) systems, is nascent and challenging. The primary obstacle is the substantial computational cost, where each utility function evaluation involves an expensive LLM call, resulting in direct monetary and time expenses. This paper investigates the feasibility and effectiveness of adapting Shapley-based attribution to identify influential retrieved documents in RAG. We compare Shapley with more computationally tractable approximations and some existing attribution methods for LLMs. Our work aims to: (1) systematically apply established attribution principles to the RAG document-level setting; (2) quantify how well SHAP approximations…
