ProGRank: Probe-Gradient Reranking to Defend Dense-Retriever RAG from Corpus Poisoning
Xiangyu Yin, Yi Qi, Chih-Hong Cheng

TL;DR
ProGRank is a training-free, retriever-side defense for dense-retriever RAG systems: it uses probe-gradient signals to detect potentially poisoned passages and rerank them, improving robustness to corpus poisoning without modifying the retriever.
Contribution
ProGRank introduces a novel post hoc, training-free reranking approach that strengthens RAG systems against corpus poisoning by leveraging instability signals derived from probe gradients.
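To make "post hoc reranking" concrete, here is a minimal sketch that leaves the retriever untouched and only reorders its retrieved candidates. It assumes each candidate already carries a scalar `instability` score (a hypothetical field, for example some combination of the consistency and dispersion signals named in the abstract) and uses an assumed penalty rule `score - lam * instability`. The class, function, and `lam` weight are illustrative names, not the authors' API; the paper's actual combination rule is not given in this summary.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    passage_id: str
    retrieval_score: float  # similarity from the unchanged dense retriever
    instability: float      # hypothetical precomputed score (higher = less stable)


def rerank(candidates: List[Candidate], lam: float = 1.0,
           k: Optional[int] = None) -> List[Candidate]:
    """Reorder candidates by retrieval_score - lam * instability (assumed rule).

    No retriever parameters change; only the order of the list does.
    """
    ranked = sorted(
        candidates,
        key=lambda c: c.retrieval_score - lam * c.instability,
        reverse=True,  # highest adjusted score first
    )
    return ranked if k is None else ranked[:k]


if __name__ == "__main__":
    cands = [
        Candidate("clean_a", 0.80, 0.05),
        Candidate("poison", 0.95, 0.60),  # ranks first by similarity alone
        Candidate("clean_b", 0.75, 0.02),
    ]
    for c in rerank(cands, lam=1.0):
        print(c.passage_id)  # prints clean_a, clean_b, poison (one per line)
```

Setting `lam=0.0` recovers the retriever's original order, which is one way to see that such a defense is purely a reordering layer on top of an existing retriever.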
Findings
ProGRank outperforms existing defenses across multiple datasets and attack types.
It maintains high utility while significantly improving robustness against corpus poisoning.
ProGRank remains effective under adaptive evasive attacks.
Abstract
Retrieval-Augmented Generation (RAG) improves the reliability of large language model applications by grounding generation in retrieved evidence, but it also introduces a new attack surface: corpus poisoning. In this setting, an adversary injects or edits passages so that they are ranked into the Top-K results for target queries and then affect downstream generation. Existing defences against corpus poisoning often rely on content filtering, auxiliary models, or generator-side reasoning, which can make deployment more difficult. We propose ProGRank, a post hoc, training-free retriever-side defence for dense-retriever RAG. ProGRank stress-tests each query–passage pair under mild randomized perturbations and extracts probe gradients from a small fixed parameter subset of the retriever. From these signals, it derives two instability signals, representational consistency and dispersion…
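The abstract is truncated before it defines the two signals, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a toy bilinear retriever with score s(q, p) = ⟨Wq, Wp⟩, Gaussian noise on the passage features as the "mild randomized perturbation", a fixed subset of rows of W as the probed parameters, and hypothetical definitions: consistency is the mean cosine similarity between each perturbed probe gradient and their centroid, and dispersion is the mean distance from that centroid relative to its norm. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Compute W @ x for a row-major matrix."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def score(w: Matrix, q: Sequence[float], p: Sequence[float]) -> float:
    """Toy bilinear retriever score s(q, p) = <Wq, Wp> (an assumption)."""
    return sum(a * b for a, b in zip(matvec(w, q), matvec(w, p)))


def probe_gradient(w: Matrix, q: Sequence[float], p: Sequence[float],
                   rows: Sequence[int]) -> Vector:
    """Flattened ds/dW restricted to the fixed probe rows of W.

    For s = <Wq, Wp>: ds/dW[r][j] = (Wp)[r] * q[j] + (Wq)[r] * p[j].
    """
    wq, wp = matvec(w, q), matvec(w, p)
    grad: Vector = []
    for r in rows:
        grad.extend(wp[r] * qj + wq[r] * pj for qj, pj in zip(q, p))
    return grad


def cosine(a: Sequence[float], b: Sequence[float], eps: float = 1e-12) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + eps)


def instability_signals(w: Matrix, q: Sequence[float], p: Sequence[float],
                        rows: Sequence[int], n_probes: int = 8,
                        sigma: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Return (consistency, dispersion) of probe gradients under perturbation.

    Hypothetical definitions:
      consistency = mean cosine(g_k, centroid)               (1.0 = stable)
      dispersion  = mean ||g_k - centroid|| / ||centroid||   (0.0 = stable)
    """
    rng = random.Random(seed)
    grads = []
    for _ in range(n_probes):
        # Assumed "mild randomized perturbation": Gaussian noise on passage features.
        p_noisy = [x + rng.gauss(0.0, sigma) for x in p]
        grads.append(probe_gradient(w, q, p_noisy, rows))
    centroid = [sum(col) / n_probes for col in zip(*grads)]
    consistency = sum(cosine(g, centroid) for g in grads) / n_probes
    centroid_norm = math.sqrt(sum(x * x for x in centroid))
    spread = sum(math.dist(g, centroid) for g in grads) / n_probes
    dispersion = spread / (centroid_norm + 1e-12)
    return consistency, dispersion


if __name__ == "__main__":
    W = [[1.0, 0.5, -0.2], [0.3, -1.0, 0.4], [0.0, 0.2, 0.9]]
    q = [0.6, -0.1, 0.8]
    p = [0.5, 0.3, 0.7]
    print(instability_signals(W, q, p, rows=[0], sigma=0.05))
```

In a real dense retriever the probe gradients would come from automatic differentiation over the chosen parameter subset of the encoder rather than from a closed-form expression; the toy score is used here only so the gradient can be written out and checked by hand.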
