TrustRAG: Enhancing Robustness and Trustworthiness in Retrieval-Augmented Generation

Huichi Zhou; Kin-Hei Lee; Zhonghao Zhan; Yue Chen; Zhenhao Li; Zhaoyang Wang; Hamed Haddadi; Emine Yilmaz

arXiv:2501.00879 · cs.CL · May 26, 2025 · 2 cites

TL;DR

TrustRAG introduces a robust, training-free framework that enhances retrieval-augmented generation by filtering malicious content, improving accuracy, efficiency, and resistance to corpus poisoning attacks.

Contribution

TrustRAG presents a two-stage filtering approach that detects and removes malicious content before it is used for generation, improving the robustness of LLMs in RAG systems against corpus poisoning.

Findings

01. Significant improvement in retrieval accuracy.

02. Enhanced resistance to corpus poisoning attacks.

03. Seamless integration with existing LLMs.

Abstract

Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by integrating external knowledge sources, enabling more accurate and contextually relevant responses tailored to user queries. These systems, however, remain susceptible to corpus poisoning attacks, which can severely impair the performance of LLMs. To address this challenge, we propose TrustRAG, a robust framework that systematically filters malicious and irrelevant content before it is retrieved for generation. Our approach employs a two-stage defense mechanism. The first stage implements a cluster filtering strategy to detect potential attack patterns. The second stage employs a self-assessment process that harnesses the internal capabilities of LLMs to detect malicious documents and resolve inconsistencies. TrustRAG provides a plug-and-play, training-free module that integrates seamlessly with any open- or…
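
The first stage described above, cluster filtering, can be illustrated with a small sketch. This is not the authors' implementation. It assumes that poisoned passages form an unusually tight group in embedding space, splits the candidate documents into two clusters with a plain k-means, and drops any cluster whose mean pairwise cosine similarity exceeds a threshold. The function names (`two_means`, `cluster_filter`), the choice of two clusters, the `tightness_threshold` value of 0.95, and the toy embeddings are all hypothetical and chosen only for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def two_means(vectors, iterations=20):
    """Plain k-means with k=2 (assumed) and a deterministic start:
    the first vector and the vector farthest from it."""
    first = vectors[0]
    farthest = max(vectors, key=lambda v: math.dist(v, first))
    centroids = [list(first), list(farthest)]
    labels = None
    for _ in range(iterations):
        new_labels = [
            0 if math.dist(v, centroids[0]) <= math.dist(v, centroids[1]) else 1
            for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for k in (0, 1):
            members = [v for v, label in zip(vectors, labels) if label == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all unordered pairs."""
    pairs = [
        (i, j) for i in range(len(vectors)) for j in range(i + 1, len(vectors))
    ]
    if not pairs:
        return 0.0
    return sum(cosine(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)


def cluster_filter(embeddings, tightness_threshold=0.95):
    """Return indices of documents kept after dropping suspiciously tight
    clusters. The threshold is an illustrative assumption."""
    labels = two_means(embeddings)
    kept = []
    for k in (0, 1):
        indices = [i for i, label in enumerate(labels) if label == k]
        members = [embeddings[i] for i in indices]
        if len(members) >= 2 and mean_pairwise_cosine(members) > tightness_threshold:
            continue  # treat the whole cluster as a potential attack pattern
        kept.extend(indices)
    return sorted(kept)


# Toy embeddings: indices 0-2 are varied "clean" passages,
# indices 3-5 are near-duplicate "poisoned" passages.
embeddings = [
    [1.0, 0.0, 0.0],
    [0.8, 0.6, 0.0],
    [0.6, 0.8, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.1, 1.0],
    [0.1, 0.0, 1.0],
]
kept = cluster_filter(embeddings)
print(kept)  # [0, 1, 2]
```

In this toy run the near-duplicate group has a mean pairwise cosine of about 0.99 and is dropped, while the varied group (about 0.79) is kept. In a pipeline of this kind, the surviving documents would then go to the second, self-assessment stage.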

Code & Models

Repositories

huichizhou/trustrag (PyTorch, official)

Taxonomy

Topics: Robotics and Automated Systems

Methods: k-Means Clustering