Backdoor Attacks on Dense Retrieval via Public and Unintentional Triggers

Quanyu Long, Yue Deng, LeiLei Gan, Wenya Wang, and Sinno Jialin Pan

arXiv:2402.13532 · cs.CL · August 26, 2025 · 3 cites

TL;DR

This paper introduces a covert backdoor attack on dense retrieval systems that is triggered by grammatical errors in queries. The attack causes the retriever to return attacker-injected malicious content with a high success rate after minimal corpus poisoning, while remaining hard to detect and preserving normal performance on standard queries.

Contribution

The study shows that dense retrievers are vulnerable to backdoor attacks triggered by grammatical errors, and that such attacks achieve high success rates with minimal poisoning and remain robust against defenses.

Findings

01 High attack success rate with only 0.048% corpus poisoning

02 Contrastive loss increases sensitivity to grammatical errors

03 Hard negative sampling exacerbates backdoor susceptibility

Abstract

Dense retrieval systems have been widely used in various NLP applications. However, their vulnerabilities to potential attacks have been underexplored. This paper investigates a novel attack scenario where the attackers aim to mislead the retrieval system into retrieving the attacker-specified contents. Those contents, injected into the retrieval corpus by attackers, can include harmful text like hate speech or spam. Unlike prior methods that rely on model weights and generate conspicuous, unnatural outputs, we propose a covert backdoor attack triggered by grammar errors. Our approach ensures that the attacked models can function normally for standard queries while covertly triggering the retrieval of the attacker's contents in response to minor linguistic mistakes. Specifically, dense retrievers are trained with contrastive loss and hard negative sampling. Surprisingly, our findings…
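The contrastive objective with hard negatives mentioned in the abstract can be illustrated with a generic sketch. This is not taken from the paper or its repository: it assumes a common formulation for dense retrievers (softmax over dot-product scores, negative log-likelihood of the positive passage), and the function names and toy two-dimensional embeddings are hypothetical.

```python
import math


def dot(u, v):
    """Dot-product similarity between two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def contrastive_loss(query, positive, negatives):
    """Negative log-likelihood of the positive passage under a softmax
    over similarity scores (a generic formulation, assumed here)."""
    scores = [dot(query, positive)] + [dot(query, n) for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[0]


# Toy embeddings (hypothetical values, for illustration only).
query = [1.0, 0.0]
positive = [1.0, 0.0]
easy_negative = [-1.0, 0.0]   # clearly unrelated passage
hard_negative = [0.9, 0.1]    # passage that scores almost as high as the positive

loss_easy = contrastive_loss(query, positive, [easy_negative])
loss_hard = contrastive_loss(query, positive, [easy_negative, hard_negative])

print(f"loss with easy negative only: {loss_easy:.4f}")
print(f"loss with added hard negative: {loss_hard:.4f}")
```

In this toy setup the hard negative dominates the loss, so training pushes the model to separate passages that differ only slightly from the positive. The sketch only illustrates the objective itself; it does not reproduce the paper's analysis of why this increases sensitivity to grammatical errors.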

Code & Models

Repositories

ruyue0001/backdoor_dpr (PyTorch, official)

Taxonomy

Topics: Network Security and Intrusion Detection · Adversarial Robustness in Machine Learning · Spam and Phishing Detection