Backdoor Attacks on Dense Retrieval via Public and Unintentional Triggers
Quanyu Long, Yue Deng, LeiLei Gan, Wenya Wang, and Sinno Jialin Pan

TL;DR
This paper introduces a covert backdoor attack on dense retrieval systems that is triggered by grammatical errors in user queries. With minimal corpus poisoning, the attack retrieves attacker-specified content at a high success rate, while remaining hard to detect and preserving performance on normal queries.
Contribution
The study shows that dense retrievers are vulnerable to backdoor attacks triggered by grammatical errors, and that such attacks achieve high success with minimal poisoning and remain robust against defenses.
Findings
High attack success rate with only 0.048% corpus poisoning
Contrastive loss increases sensitivity to grammatical errors
Hard negative sampling exacerbates backdoor susceptibility
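The role of the contrastive loss and hard negatives can be illustrated with a minimal sketch. It assumes the common InfoNCE form (negative log-softmax of the positive passage's score among all candidates) with dot-product similarity. The summary does not give the paper's exact objective, so the function names, toy vectors, and temperature below are illustrative assumptions only.

```python
import math


def dot(u, v):
    """Dot-product similarity between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def contrastive_loss(query, positive, negatives, temperature=1.0):
    """InfoNCE-style loss: -log softmax of the positive among all candidates.

    Assumption: this is the generic textbook form, not necessarily the
    exact objective used in the paper.
    """
    scores = [dot(query, positive) / temperature]
    scores += [dot(query, neg) / temperature for neg in negatives]
    # Numerically stable log-sum-exp over all candidate scores.
    top = max(scores)
    log_denominator = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_denominator - scores[0]


# Toy 2-d embeddings (hypothetical values chosen for illustration).
query = [1.0, 0.0]
positive = [0.9, 0.1]
easy_negatives = [[-1.0, 0.0], [0.0, -1.0]]   # far from the query
hard_negatives = [[0.8, 0.3], [0.7, -0.2]]    # close to the query

easy_loss = contrastive_loss(query, positive, easy_negatives)
hard_loss = contrastive_loss(query, positive, hard_negatives)
print(f"easy negatives: {easy_loss:.4f}, hard negatives: {hard_loss:.4f}")
```

In this toy setting the hard negatives yield the larger loss, so training pushes harder on small differences between near-identical candidates. That is one plausible reading of why these two training choices could heighten sensitivity to small query perturbations such as grammatical errors.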
Abstract
Dense retrieval systems have been widely used in various NLP applications. However, their vulnerabilities to potential attacks have been underexplored. This paper investigates a novel attack scenario where the attackers aim to mislead the retrieval system into retrieving the attacker-specified contents. Those contents, injected into the retrieval corpus by attackers, can include harmful text like hate speech or spam. Unlike prior methods that rely on model weights and generate conspicuous, unnatural outputs, we propose a covert backdoor attack triggered by grammar errors. Our approach ensures that the attacked models can function normally for standard queries while covertly triggering the retrieval of the attacker's contents in response to minor linguistic mistakes. Specifically, dense retrievers are trained with contrastive loss and hard negative sampling. Surprisingly, our findings…
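To make the trigger idea concrete, the following sketch perturbs a query with a single grammatical error and computes how many passages a given poisoning rate corresponds to. The error type (article deletion), the helper names, and the corpus size of 1,000,000 passages are hypothetical choices for illustration. This summary does not list the error types or corpus sizes used in the paper.

```python
ARTICLES = {"a", "an", "the"}


def drop_articles(text):
    """Hypothetical trigger: introduce a grammar error by deleting articles."""
    kept = [word for word in text.split() if word.lower() not in ARTICLES]
    return " ".join(kept)


def poison_budget(corpus_size, rate_percent):
    """Number of injected passages for a poisoning rate given in percent."""
    return max(1, round(corpus_size * rate_percent / 100))


clean_query = "what is the capital of a country"
triggered_query = drop_articles(clean_query)
print(triggered_query)

# 0.048% is the poisoning rate reported in the findings; the corpus size
# here is an assumed round number, not a figure from the paper.
print(poison_budget(1_000_000, 0.048))
```

A backdoored retriever in this threat model would return ordinary results for `clean_query` but rank the attacker's injected passages highly for `triggered_query`, even though the two queries differ only by a minor mistake.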
Taxonomy
Topics: Network Security and Intrusion Detection · Adversarial Robustness in Machine Learning · Spam and Phishing Detection
