Quantifying Document Impact in RAG-LLMs

Armin Gerami; Kazem Faghih; Ramani Duraiswami

arXiv:2601.05260·cs.IR·January 12, 2026

Open Access

TL;DR

This paper introduces the Influence Score (IS), a new metric based on Partial Information Decomposition, to quantify the impact of individual documents on RAG-generated responses, enhancing transparency and trustworthiness.

Contribution

The paper proposes the Influence Score (IS), a novel metric for measuring document impact in RAG systems, validated through experiments demonstrating its effectiveness in identifying influential documents.

Findings

01 In a poison-attack simulation across three datasets, IS ranks the malicious document as the most influential in 86% of cases

02 Responses generated from the documents ranked highest by IS are closer to the original response

03 IS improves the transparency and reliability of RAG systems

Abstract

Retrieval Augmented Generation (RAG) enhances Large Language Models (LLMs) by connecting them to external knowledge, improving accuracy and reducing outdated information. However, this introduces challenges such as factual inconsistencies, source conflicts, bias propagation, and security vulnerabilities, which undermine the trustworthiness of RAG systems. A key gap in current RAG evaluation is the lack of a metric to quantify the contribution of individual retrieved documents to the final output. To address this, we introduce the Influence Score (IS), a novel metric based on Partial Information Decomposition that measures the impact of each retrieved document on the generated response. We validate IS through two experiments. First, a poison attack simulation across three datasets demonstrates that IS correctly identifies the malicious document as the most influential in 86% of cases.…
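The "most influential in 86% of cases" figure reads as a top-1 identification rate: for each query, check whether the poisoned document receives the highest score among the retrieved documents. The sketch below illustrates only that bookkeeping, under stated assumptions. It does not compute the Influence Score itself, whose formula is not given here; the function name `top1_identification_rate` and the toy scores are hypothetical and are not results from the paper.

```python
from typing import Sequence


def top1_identification_rate(
    scores_per_query: Sequence[Sequence[float]],
    poisoned_index: Sequence[int],
) -> float:
    """Fraction of queries whose highest-scoring document is the poisoned one.

    scores_per_query[i][j] is an influence-style score for document j of
    query i (how the score is computed is outside this sketch).
    poisoned_index[i] is the position of the injected document for query i.
    """
    if len(scores_per_query) != len(poisoned_index):
        raise ValueError("need one poisoned index per query")
    if not scores_per_query:
        raise ValueError("need at least one query")

    hits = 0
    for scores, target in zip(scores_per_query, poisoned_index):
        # Index of the largest score; ties resolve to the earliest document.
        top = max(range(len(scores)), key=scores.__getitem__)
        hits += int(top == target)
    return hits / len(scores_per_query)


# Made-up scores for four queries with three retrieved documents each.
toy_scores = [
    [0.10, 0.70, 0.20],
    [0.50, 0.30, 0.20],
    [0.05, 0.15, 0.80],
    [0.40, 0.35, 0.25],
]
toy_poisoned = [1, 0, 2, 1]  # hypothetical positions of the injected document

rate = top1_identification_rate(toy_scores, toy_poisoned)
print(f"top-1 identification rate: {rate:.2f}")
```

In this toy input the poisoned document is ranked first for three of the four queries, so the rate is 0.75. Any per-document scoring method could be plugged in upstream of this check.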

Taxonomy

Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Misinformation and Its Impacts