Illusions of Relevance: Arbitrary Content Injection Attacks Deceive Retrievers, Rerankers, and LLM Judges

Manveer Singh Tamber; Jimmy Lin

arXiv:2501.18536 · cs.IR · January 1, 2026

TL;DR

This paper demonstrates that search systems and LLM relevance judges are highly vulnerable to content injection attacks, which can manipulate search rankings and relevance scores, raising concerns about robustness and trustworthiness.

Contribution

The paper presents a systematic analysis of how arbitrary content injection attacks deceive retrievers, rerankers, and LLM relevance judges, revealing vulnerabilities across model classes and sizes.

Findings

1. Retrievers, rerankers, and LLM judges are highly susceptible to content injection.

2. Injection success depends on factors like model class, size, and content toxicity.

3. Current defenses often fail to detect injected content.

Abstract

This work considers a black-box threat model in which adversaries attempt to propagate arbitrary non-relevant content in search. We show that retrievers, rerankers, and LLM relevance judges are all highly vulnerable to attacks that enable arbitrary content to be promoted to the top of search results and to be assigned perfect relevance scores. We investigate how attackers may achieve this via content injection, injecting arbitrary sentences into relevant passages or query terms into arbitrary passages. Our study analyzes how factors such as model class and size, the balance between relevant and non-relevant content, injection location, toxicity and severity of injected content, and the role of LLM-generated content influence attack success, yielding novel, concerning, and often counterintuitive results. Our results reveal a weakness in embedding models, LLM-based scoring models, and…
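The two injection modes named in the abstract (an arbitrary sentence placed into a relevant passage, and query terms placed into an arbitrary passage) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the naive sentence splitter, and the `overlap_score` term-overlap proxy are hypothetical and are not taken from the paper or its repository. The proxy stands in for a real retriever or reranker only to show why such text manipulations can shift a relevance score.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def inject_sentence(passage, sentence, location="end"):
    """Insert an arbitrary sentence into a passage at start, middle, or end."""
    sentences = split_sentences(passage)
    positions = {
        "start": 0,
        "middle": len(sentences) // 2,
        "end": len(sentences),
    }
    idx = positions[location]
    return " ".join(sentences[:idx] + [sentence] + sentences[idx:])


def inject_query(passage, query, location="start"):
    """Insert the query text into an arbitrary (non-relevant) passage."""
    return inject_sentence(passage, query, location)


def tokens(text):
    """Lowercased set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query, passage):
    """Toy relevance proxy (assumption): fraction of query terms in the passage."""
    q = tokens(query)
    return len(q & tokens(passage)) / len(q) if q else 0.0


if __name__ == "__main__":
    query = "how do vaccines work"
    relevant = ("Vaccines train the immune system. "
                "They work by exposing it to harmless antigens.")
    arbitrary = "Buy cheap watches today."

    # Mode 1: arbitrary sentence hidden inside a relevant passage.
    print(inject_sentence(relevant, arbitrary, "middle"))
    # Mode 2: query text prepended to an arbitrary passage.
    attacked = inject_query(arbitrary, query, "start")
    print(overlap_score(query, arbitrary), overlap_score(query, attacked))
```

In this toy setting, the query-injected passage matches every query term even though its original content is unrelated, while the sentence-injected passage keeps the same overlap it had before the injection. Real embedding models and LLM judges score text very differently, so the sketch only conveys the shape of the attack, not its measured effect.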

Code & Models

Repositories

manveertamber/content_injection_attacks (Official)

Taxonomy

Topics: Digital and Cyber Forensics

Methods: ADaptive gradient method with the OPTimal convergence rate