GASLITEing the Retrieval: Exploring Vulnerabilities in Dense Embedding-based Search
Matan Ben-Tov, Mahmood Sharif

TL;DR
This paper introduces GASLITE, a novel attack method that exposes significant vulnerabilities in dense embedding-based retrieval systems, especially against concept-specific queries, highlighting the need for improved robustness.
Contribution
The work presents GASLITE, an effective adversarial attack technique, and provides a comprehensive evaluation of retriever robustness across multiple models and threat scenarios.
Findings
Retrievers are highly vulnerable to concept-specific SEO attacks.
GASLITE effectively bypasses existing defenses and outperforms prior attack methods.
Even minimal poisoning rates can significantly impact retrieval accuracy.
Abstract
Dense embedding-based text retrieval, the retrieval of relevant passages from corpora via deep learning encodings, has emerged as a powerful method attaining state-of-the-art search results and popularizing Retrieval Augmented Generation (RAG). Still, like other search methods, embedding-based retrieval may be susceptible to search-engine optimization (SEO) attacks, where adversaries promote malicious content by introducing adversarial passages to corpora. Prior work has shown such SEO is feasible, mostly demonstrating attacks against retrieval-integrated systems (e.g., RAG). Yet, these consider relaxed SEO threat models (e.g., targeting single queries), use baseline attack methods, and provide small-scale retrieval evaluation, thus obscuring our comprehensive understanding of retrievers' worst-case behavior. This work aims to faithfully and thoroughly assess…
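To make the threat model in the abstract concrete, here is a minimal sketch of how a single inserted passage can take over the top result in a similarity-ranked corpus. It is not GASLITE: the real attack has to craft passage text, whereas this toy places a vector directly. The 3-dimensional "embeddings", the passage and query names, and the choice of the query centroid as the adversarial vector are all assumptions made for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k(query_vec, corpus, k=1):
    """Return the ids of the k passages most similar to the query."""
    ranked = sorted(
        corpus,
        key=lambda pid: cosine(query_vec, corpus[pid]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy embeddings; a real retriever would produce these
# with a learned text encoder.
corpus = {
    "benign_a": [0.7, 0.7, 0.0],
    "benign_b": [0.0, 0.2, 0.9],
}
queries = {
    "q1": [1.0, 0.2, 0.0],
    "q2": [0.8, 0.0, 0.1],
}

# Top-1 result for each query on the clean corpus.
before = {qid: top_k(vec, corpus)[0] for qid, vec in queries.items()}

# Assumption for illustration: the adversary manages to insert one passage
# whose embedding equals the centroid of the targeted queries.
centroid = [sum(dims) / len(queries) for dims in zip(*queries.values())]
poisoned = dict(corpus, adversarial=centroid)

# Top-1 result for each query after the single insertion.
after = {qid: top_k(vec, poisoned)[0] for qid, vec in queries.items()}
```

In this toy setup both queries return `benign_a` before the insertion and `adversarial` afterwards, so one added passage out of three changes every top-1 result. How close a text-constrained attack can get to such a target vector is the kind of question the paper evaluates.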
Taxonomy
Topics: Topic Modeling · Spam and Phishing Detection
