GASLITEing the Retrieval: Exploring Vulnerabilities in Dense Embedding-based Search

Matan Ben-Tov; Mahmood Sharif

arXiv:2412.20953 · cs.CR · September 19, 2025

TL;DR

This paper introduces GASLITE, a novel attack method that exposes significant vulnerabilities in dense embedding-based retrieval systems, especially against concept-specific queries, highlighting the need for improved robustness.

Contribution

The work presents GASLITE, an effective adversarial attack technique, and provides a comprehensive evaluation of retriever robustness across multiple models and threat scenarios.

Findings

01. Retrievers are highly vulnerable to concept-specific SEO attacks.

02. GASLITE effectively bypasses existing defenses and outperforms prior attack methods.

03. Even minimal poisoning rates can significantly impact retrieval accuracy.
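The third finding depends on two quantities: what share of the corpus the attacker controls, and how often an adversarial passage reaches the top results. Below is a minimal sketch of both, assuming rankings are given as lists of passage ids ordered best-first. The helper names (`adversarial_hit_rate`, `poisoning_rate`) and the toy rankings are hypothetical, not taken from the paper. The sketch also assumes that poisoning rate means adversarial passages divided by the size of the poisoned corpus, which is one common convention.

```python
def adversarial_hit_rate(rankings, adversarial_ids, k=10):
    """Fraction of queries whose top-k results contain at least one adversarial passage.

    rankings: dict mapping query id -> list of passage ids, best match first.
    adversarial_ids: set of passage ids inserted by the attacker.
    """
    if not rankings:
        return 0.0
    hits = sum(
        1
        for ranked in rankings.values()
        if any(pid in adversarial_ids for pid in ranked[:k])
    )
    return hits / len(rankings)


def poisoning_rate(num_adversarial, num_benign):
    """Share of the poisoned corpus made up of adversarial passages (assumed convention)."""
    return num_adversarial / (num_adversarial + num_benign)


# Toy rankings (hypothetical data, not results from the paper).
rankings = {
    "q1": ["p3", "adv1", "p7"],
    "q2": ["p1", "p2", "p5"],
    "q3": ["adv1", "p9", "p4"],
}
print(adversarial_hit_rate(rankings, {"adv1"}, k=1))  # 1/3 of queries
print(adversarial_hit_rate(rankings, {"adv1"}, k=2))  # 2/3 of queries
print(poisoning_rate(10, 999_990))  # 1e-05
```

A single inserted passage accounts for a negligible share of a large corpus. It can still appear in the top results for a large fraction of a concept's queries, and that gap between the two numbers is what this finding describes.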

Abstract

Dense embedding-based text retrieval (retrieval of relevant passages from corpora via deep learning encodings) has emerged as a powerful method attaining state-of-the-art search results and popularizing Retrieval Augmented Generation (RAG). Still, like other search methods, embedding-based retrieval may be susceptible to search-engine optimization (SEO) attacks, where adversaries promote malicious content by introducing adversarial passages to corpora. Prior work has shown such SEO is feasible, mostly demonstrating attacks against retrieval-integrated systems (e.g., RAG). Yet, these consider relaxed SEO threat models (e.g., targeting single queries), use baseline attack methods, and provide small-scale retrieval evaluation, thus obscuring our comprehensive understanding of retrievers' worst-case behavior. This work aims to faithfully and thoroughly assess…
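To see why concept-specific queries are exposed, here is a toy sketch of dense retrieval. Queries and passages are embedded as vectors, and the top-k passages by cosine similarity are returned. The sketch makes strong simplifying assumptions. The 2-D "embeddings" are hypothetical, and the attacker is allowed to place a passage's embedding directly at the normalized centroid of a concept's query embeddings. In reality an attacker controls only passage text, and GASLITE crafts that text itself. This sketch skips that step entirely and is not the paper's method.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def retrieve(query_vec, corpus, k):
    """Return the ids of the k passages most similar to the query."""
    ranked = sorted(corpus, key=lambda pid: cosine(query_vec, corpus[pid]), reverse=True)
    return ranked[:k]


def centroid_target(query_vecs):
    """Unit-normalized mean of the query embeddings (hypothetical attack target)."""
    dim = len(query_vecs[0])
    mean = [sum(v[i] for v in query_vecs) / len(query_vecs) for i in range(dim)]
    return normalize(mean)


# Hypothetical 2-D "embeddings"; real retrievers use hundreds of dimensions.
corpus = {"p_a": [1.0, 0.8], "p_b": [1.0, -0.8], "p_c": [0.0, 1.0]}
queries = [[1.0, 0.3], [1.0, -0.3]]  # two queries about the same concept

before = [retrieve(q, corpus, 1) for q in queries]
poisoned = dict(corpus, adv=centroid_target(queries))
after = [retrieve(q, poisoned, 1) for q in queries]
print(before)  # [['p_a'], ['p_b']]
print(after)   # [['adv'], ['adv']]
```

Both queries point in roughly the same direction. A single passage near their center therefore outranks each query's best benign match, which gives one intuition for why a whole concept can be targeted with very few inserted passages.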

Code & Models

Repositories

matanbt/gaslite (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Spam and Phishing Detection