TrustRAG: Enhancing Robustness and Trustworthiness in Retrieval-Augmented Generation
Huichi Zhou, Kin-Hei Lee, Zhonghao Zhan, Yue Chen, Zhenhao Li, Zhaoyang Wang, Hamed Haddadi, Emine Yilmaz

TL;DR
TrustRAG is a training-free framework that hardens retrieval-augmented generation against corpus poisoning attacks by filtering out malicious content, with the aim of improving accuracy, efficiency, and robustness.
Contribution
The paper presents a two-stage filtering approach, cluster filtering followed by LLM self-assessment, that detects and removes malicious content before it is retrieved for generation, improving the robustness of LLMs in RAG systems.
Findings
Significant improvement in retrieval accuracy.
Enhanced resistance to corpus poisoning attacks.
Seamless integration with existing LLMs.
Abstract
Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by integrating external knowledge sources, enabling more accurate and contextually relevant responses tailored to user queries. These systems, however, remain susceptible to corpus poisoning attacks, which can severely impair the performance of LLMs. To address this challenge, we propose TrustRAG, a robust framework that systematically filters malicious and irrelevant content before it is retrieved for generation. Our approach employs a two-stage defense mechanism. The first stage implements a cluster filtering strategy to detect potential attack patterns. The second stage employs a self-assessment process that harnesses the internal capabilities of LLMs to detect malicious documents and resolve inconsistencies. TrustRAG provides a plug-and-play, training-free module that integrates seamlessly with any open- or…
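The first stage described in the abstract, cluster filtering, can be illustrated with a minimal sketch. The abstract does not give the details, so everything below is an assumption made for illustration: the use of k = 2, the deterministic seeding, the mean pairwise cosine similarity test, the 0.95 threshold, and the hypothetical function names (cluster_filter, two_means). The idea sketched here is that injected passages written for the same target tend to have near-identical embeddings, so an unusually tight cluster is treated as suspicious and dropped. It is not the authors' implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(vectors, iterations=10):
    """Tiny k-means with k=2 (assumed) and deterministic seeding; returns one label per vector."""
    first = vectors[0]
    second = max(vectors, key=lambda v: sq_dist(v, first))  # farthest point from the first seed
    centroids = [list(first), list(second)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [
            0 if sq_dist(v, centroids[0]) <= sq_dist(v, centroids[1]) else 1
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for k in (0, 1):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def mean_pairwise_cosine(members):
    """Average cosine similarity over all unordered pairs; 0.0 if fewer than two members."""
    pairs = [(a, b) for i, a in enumerate(members) for b in members[i + 1:]]
    return sum(cosine(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0


def cluster_filter(vectors, threshold=0.95):
    """Return sorted indices of documents kept after dropping suspiciously tight clusters."""
    labels = two_means(vectors)
    kept = []
    for k in (0, 1):
        idx = [i for i, lab in enumerate(labels) if lab == k]
        tight = mean_pairwise_cosine([vectors[i] for i in idx]) > threshold
        if not (len(idx) >= 2 and tight):
            kept.extend(idx)
    return sorted(kept)


# Toy embeddings (made up): two dissimilar clean passages, three near-duplicate injected ones.
toy_vectors = [
    [0.0, 1.0, 0.2],
    [0.0, 0.2, 1.0],
    [0.9, 0.1, 0.0],
    [0.91, 0.09, 0.0],
    [0.89, 0.11, 0.0],
]
kept = cluster_filter(toy_vectors)
print(kept)  # [0, 1]
```

In this toy run the three near-duplicate vectors form one tight cluster and are removed, while the two dissimilar passages are kept. A real pipeline would take the vectors from an embedding model and pass the surviving documents to the second, self-assessment stage.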
Taxonomy
Topics: Robotics and Automated Systems
Methods: k-Means Clustering
