Information Retrieval Induced Safety Degradation in AI Agents

Cheng Yu; Benedikt Stroebl; Diyi Yang; Orestis Papakyriakopoulos

arXiv:2505.14215 · cs.CY · October 27, 2025

Open Access

TL;DR

This paper demonstrates that enabling external retrieval in AI agents can degrade safety, increasing bias, harmful content, and unsafe behaviors even when retrieval is accurate and mitigation efforts are applied.

Contribution

It reveals the counterintuitive safety risks of retrieval-enabled AI agents and highlights the need for new mitigation strategies to maintain safety and fairness.

Findings

01

Retrieval access reduces refusal rates but increases bias and harmful content.

02

Retrieval-enabled models often behave less safely than uncensored models.

03

Safety degradation persists even with mitigation and high retrieval accuracy.

Abstract

Despite the growing integration of retrieval-enabled AI agents into society, their safety and ethical behavior remain inadequately understood. In particular, the integration of LLMs and AI agents with external information sources and real-world environments raises critical questions about how they engage with and are influenced by these external data sources and interactive contexts. This study investigates how expanding retrieval access -- from no external sources to Wikipedia-based retrieval and open web search -- affects model reliability, bias propagation, and harmful content generation. Through extensive benchmarking of censored and uncensored LLMs and AI agents, our findings reveal a consistent degradation in refusal rates, bias sensitivity, and harmfulness safeguards as models gain broader access to external sources, culminating in a phenomenon we term safety degradation.…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI