RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models

Bang An; Shiyue Zhang; Mark Dredze

arXiv:2504.18041 · cs.CL · April 28, 2025 · 2 cites




TL;DR

This paper investigates how the Retrieval-Augmented Generation (RAG) framework affects the safety of large language models. It finds that RAG can reduce a model's safety and change its safety profile, which calls for safety measures designed specifically for RAG.

Contribution

It provides the first detailed safety comparison of RAG and non-RAG LLMs. It identifies safety risks that arise in RAG settings and evaluates how effective existing red-teaming methods are there.

Findings

01

RAG can decrease model safety and alter safety profiles.

02

Safe models combined with safe documents can still produce unsafe outputs.

03

Existing red-teaming methods are less effective for RAG models.

Abstract

Efforts to ensure the safety of large language models (LLMs) include safety fine-tuning, evaluation, and red teaming. However, despite the widespread use of the Retrieval-Augmented Generation (RAG) framework, AI safety work focuses on standard LLMs, which means we know little about how RAG use cases change a model's safety profile. We conduct a detailed comparative analysis of RAG and non-RAG frameworks with eleven LLMs. We find that RAG can make models less safe and change their safety profile. We explore the causes of this change and find that even combinations of safe models with safe documents can cause unsafe generations. In addition, we evaluate some existing red teaming methods for RAG settings and show that they are less effective than when used for non-RAG settings. Our work highlights the need for safety research and red-teaming methods specifically tailored for RAG LLMs.
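The comparative analysis described in the abstract reduces to one measurement per model: how often its responses are judged unsafe with retrieved documents and without them. Below is a minimal sketch that assumes each response has already been labeled safe or unsafe by some judge. The `Judgement` schema, the `"rag"`/`"non_rag"` setting names, and the function names are hypothetical illustrations, not the authors' code. The toy labels are made up and are not data from the paper.

```python
from dataclasses import dataclass


@dataclass
class Judgement:
    """One model response with a safe/unsafe label (hypothetical schema)."""
    model: str
    setting: str  # assumed values: "rag" or "non_rag"
    unsafe: bool


def unsafe_rate(judgements, model, setting):
    """Fraction of one model's responses judged unsafe in one setting."""
    subset = [j for j in judgements if j.model == model and j.setting == setting]
    if not subset:
        raise ValueError(f"no judgements for {model!r} in {setting!r}")
    return sum(j.unsafe for j in subset) / len(subset)


def safety_shift(judgements):
    """Per-model change in unsafe rate from the non-RAG to the RAG setting.

    A positive value means the model produced unsafe outputs more often
    when given retrieved documents.
    """
    models = sorted({j.model for j in judgements})
    return {
        m: unsafe_rate(judgements, m, "rag") - unsafe_rate(judgements, m, "non_rag")
        for m in models
    }


# Toy, made-up labels for illustration only (not results from the paper).
toy = [
    Judgement("model-a", "non_rag", False),
    Judgement("model-a", "non_rag", False),
    Judgement("model-a", "rag", True),
    Judgement("model-a", "rag", False),
]
print(safety_shift(toy))  # {'model-a': 0.5}
```

If the same fraction is computed over adversarial prompts, it becomes an attack success rate. The sketch therefore also fits the paper's other comparison: running one red-teaming method in both settings and checking how much less effective it is with RAG.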


Videos

RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models (Underline)

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Hate Speech and Cyberbullying Detection

Methods: Dropout · BERT · BART · RAG