When Safety Becomes a Vulnerability: Exploiting LLM Alignment Homogeneity for Transferable Blocking in RAG

Junchen Li; Chao Qi; Rongzheng Wang; Qizhi Chen; Liang Xu; Di Liang; Bob Simons; Shuang Liang

arXiv:2603.03919 · cs.CR · March 5, 2026

TL;DR

This paper shows that safety alignment in large language models (LLMs) creates a shared vulnerability that enables transferable blocking attacks on retrieval-augmented generation (RAG) systems, and demonstrates it with TabooRAG, a novel black-box attack framework.

Contribution

The paper introduces TabooRAG, a transferable blocking attack method exploiting alignment homogeneity, and demonstrates its effectiveness across multiple LLMs and datasets.

Findings

01. TabooRAG achieves a success rate of up to 96% in transfer attacks.

02. Alignment homogeneity enables cross-model transferability of blocking attacks.

03. Safety-aligned LLMs share vulnerabilities that can be exploited in RAG systems.

Abstract

Retrieval-Augmented Generation (RAG) enhances the capabilities of large language models (LLMs) by incorporating external knowledge, but its reliance on potentially poisonable knowledge bases introduces new availability risks. Attackers can inject documents that cause LLMs to refuse benign queries; such attacks are known as blocking attacks. Prior blocking attacks relying on adversarial suffixes or explicit instruction injection are increasingly ineffective against modern safety-aligned LLMs. We observe that safety-aligned LLMs exhibit heightened sensitivity to query-relevant risk signals, causing alignment mechanisms designed for harm prevention to become a source of exploitable refusal. Moreover, mainstream alignment practices share overlapping risk categories and refusal criteria, a phenomenon we term alignment homogeneity, enabling restricted risk context constructed on an accessible LLM to…
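To make the mechanism described in the abstract concrete, here is a toy Python simulation. It is not TabooRAG, involves no real LLM, and every name in it (`MockAlignedModel`, the placeholder tokens `risk_shared`, `risk_a_only`, `risk_b_only`, the lexical `retrieve` function) is hypothetical. Each mock model "refuses" whenever a retrieved passage contains one of its risk tokens. Two models with overlapping token sets stand in for alignment homogeneity: a passage built against the accessible surrogate model also blocks the target model only when its risk signal falls in the shared category.

```python
import re
from dataclasses import dataclass


def tokenize(text: str) -> set[str]:
    # Lowercase word tokens; underscores are kept so placeholder risk tokens stay intact.
    return set(re.findall(r"[a-z_]+", text.lower()))


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Toy lexical retriever (a stand-in for dense retrieval):
    # rank passages by token overlap with the query and keep the top k.
    q = tokenize(query)
    return sorted(corpus, key=lambda doc: len(q & tokenize(doc)), reverse=True)[:k]


@dataclass(frozen=True)
class MockAlignedModel:
    # Hypothetical stand-in for a safety-aligned LLM: its "alignment" is just a set
    # of placeholder risk tokens that trigger a refusal when they appear in context.
    name: str
    risk_terms: frozenset

    def answer(self, query: str, context: list[str]) -> str:
        for passage in context:
            if self.risk_terms & tokenize(passage):
                return "REFUSED"
        return f"answer to: {query}"


def refusal_rate(model: MockAlignedModel, queries: list[str], corpus: list[str], k: int = 2) -> float:
    # Fraction of benign queries the model refuses; a blocking attack wants this high.
    refused = sum(model.answer(q, retrieve(q, corpus, k)) == "REFUSED" for q in queries)
    return refused / len(queries)


clean = [
    "paris is the capital of france",
    "the eiffel tower is in paris",
    "python is a programming language",
]
queries = ["what is the capital of france", "where is the eiffel tower"]

# Both mock models share "risk_shared" (the homogeneous part of their alignment);
# each also has one category the other lacks.
surrogate = MockAlignedModel("surrogate", frozenset({"risk_shared", "risk_a_only"}))
target = MockAlignedModel("target", frozenset({"risk_shared", "risk_b_only"}))


def poison(signal: str) -> list[str]:
    # Injected passage: copies the benign queries (so it ranks first) plus one risk signal.
    return clean + [" ".join(queries) + " " + signal]


for signal in ("risk_shared", "risk_a_only"):
    corpus = poison(signal)
    print(signal, refusal_rate(surrogate, queries, corpus), refusal_rate(target, queries, corpus))
# risk_shared 1.0 1.0
# risk_a_only 1.0 0.0
```

On the clean corpus both models answer every query. A signal drawn from the shared category blocks both models, while a signal specific to the surrogate blocks only the surrogate, which is the intuition behind transfer under alignment homogeneity. Copying the queries into the injected passage is a simplifying assumption to guarantee retrieval; how the paper actually constructs restricted risk context is not described in this excerpt.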

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Advanced Graph Neural Networks