Illusions of Relevance: Arbitrary Content Injection Attacks Deceive Retrievers, Rerankers, and LLM Judges
Manveer Singh Tamber, Jimmy Lin

TL;DR
This paper demonstrates that search systems and LLM relevance judges are highly vulnerable to content injection attacks, which can manipulate search rankings and relevance scores, raising concerns about robustness and trustworthiness.
Contribution
The paper presents a systematic analysis of how arbitrary content injection attacks deceive retrievers, rerankers, and LLM relevance judges, exposing vulnerabilities across model classes and sizes.
Findings
Retrievers, rerankers, and LLM judges are highly susceptible to content injection.
Injection success depends on factors like model class, size, and content toxicity.
Current defenses often fail to detect injected content.
Abstract
This work considers a black-box threat model in which adversaries attempt to propagate arbitrary non-relevant content in search. We show that retrievers, rerankers, and LLM relevance judges are all highly vulnerable to attacks that enable arbitrary content to be promoted to the top of search results and to be assigned perfect relevance scores. We investigate how attackers may achieve this via content injection, injecting arbitrary sentences into relevant passages or query terms into arbitrary passages. Our study analyzes how factors such as model class and size, the balance between relevant and non-relevant content, injection location, toxicity and severity of injected content, and the role of LLM-generated content influence attack success, yielding novel, concerning, and often counterintuitive results. Our results reveal a weakness in embedding models, LLM-based scoring models, and…
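The two injection strategies named in the abstract (an arbitrary sentence placed into a relevant passage, or the query text placed into an arbitrary passage) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function names `inject_sentence`, `inject_query`, and `term_overlap` are hypothetical, the sentence splitter is naive, and the term-overlap scorer is a toy stand-in rather than any retriever, reranker, or LLM judge evaluated in the paper.

```python
import re


def split_sentences(passage: str) -> list[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", passage.strip()) if s]


def inject_sentence(passage: str, injected: str, location: str = "end") -> str:
    # Insert `injected` at the start, middle, or end of `passage`.
    sentences = split_sentences(passage)
    if location == "start":
        position = 0
    elif location == "middle":
        position = len(sentences) // 2
    elif location == "end":
        position = len(sentences)
    else:
        raise ValueError(f"unknown location: {location!r}")
    return " ".join(sentences[:position] + [injected] + sentences[position:])


def inject_query(passage: str, query: str, location: str = "start") -> str:
    # Query injection is the same operation with the query text as the payload.
    return inject_sentence(passage, query, location)


def term_overlap(query: str, passage: str) -> float:
    # Toy relevance score (NOT a model from the paper):
    # the fraction of query terms that also appear in the passage.
    q_terms = set(re.findall(r"\w+", query.lower()))
    p_terms = set(re.findall(r"\w+", passage.lower()))
    return len(q_terms & p_terms) / len(q_terms) if q_terms else 0.0


if __name__ == "__main__":
    query = "what does aspirin treat"
    relevant = "Aspirin reduces fever. It also relieves mild pain."
    arbitrary = "Buy cheap watches now!"

    # Strategy 1: arbitrary sentence injected into a relevant passage.
    print(inject_sentence(relevant, arbitrary, "middle"))

    # Strategy 2: query injected into an arbitrary passage.
    attacked = inject_query(arbitrary, query, "start")
    print(term_overlap(query, arbitrary), term_overlap(query, attacked))
```

Under this toy scorer, prepending the query lifts an unrelated passage from no overlap to full overlap. The learned models studied in the paper are far more complex, so this only conveys the shape of the attack, not its measured effect.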
Taxonomy
Topics: Digital and Cyber Forensics
Methods: ADaptive gradient method with the OPTimal convergence rate
