Hidden-in-Plain-Text: A Benchmark for Social-Web Indirect Prompt Injection in RAG
Haoze Guo, Ziqi Wei

TL;DR
This paper introduces OpenRAG-Soc, a comprehensive benchmark suite for evaluating the security and robustness of retrieval-augmented generation systems against web-based prompt injection and poisoning attacks.
Contribution
It provides a standardized, reproducible framework with diverse defenses and evaluation metrics for assessing RAG systems' vulnerability to indirect prompt injection.
Findings
Benchmark enables consistent evaluation of RAG defenses.
HTML/Markdown sanitization reduces attack success.
Attribution gating improves response integrity.
Abstract
Retrieval-augmented generation (RAG) systems put more and more emphasis on grounding their responses in user-generated content found on the Web, amplifying both their usefulness and their attack surface. Most notably, indirect prompt injection and retrieval poisoning attack the web-native carriers that survive ingestion pipelines and are very concerning. We provide OpenRAG-Soc, a compact, reproducible benchmark-and-harness for web-facing RAG evaluation under these threats, in a discrete data package. The suite combines a social corpus with interchangeable sparse and dense retrievers and deployable mitigations - HTML/Markdown sanitization, Unicode normalization, and attribution-gated answered. It standardizes end-to-end evaluation from ingestion to generation and reports attacks time of one of the responses at answer time, rank shifts in both sparse and dense retrievers, utility and…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsWeb Application Security Vulnerabilities · Spam and Phishing Detection · Digital and Cyber Forensics
