Reproducing HotFlip for Corpus Poisoning Attacks in Dense Retrieval
Yongkang Li, Panagiotis Eustratiadis, Evangelos Kanoulas

TL;DR
This paper enhances the HotFlip attack method for corpus poisoning in dense retrieval systems by significantly improving its efficiency, enabling faster adversarial passage generation, and exploring its effectiveness in black-box and query-agnostic scenarios.
Contribution
We substantially optimize HotFlip's efficiency, reducing generation time from 4 hours to 15 minutes per adversarial document, and extend the analysis to black-box and query-agnostic attack settings.
Findings
HotFlip can effectively attack dense retrievers.
Attack performance decreases against advanced methods.
Limited generalization in black-box attacks.
Abstract
HotFlip is a typical gradient-based word substitution method for attacking language models. Recently, this method has been further applied to attack retrieval systems by generating malicious passages that are injected into a corpus, i.e., corpus poisoning. However, HotFlip is known to be computationally inefficient: most of the time is spent accumulating gradients for each query-passage pair during the adversarial token generation phase, which makes it impossible to generate an adequate number of adversarial passages in a reasonable amount of time. Moreover, the attack method itself assumes access to a set of user queries, a strong assumption that does not correspond to how real-world adversarial attacks are usually performed. In this paper, we first significantly boost the efficiency of HotFlip, reducing the adversarial generation process from 4 hours per document to only 15…
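The token-substitution step and the per-pair gradient accumulation described above can be illustrated with a toy sketch. It assumes a hypothetical linear passage encoder E(p) = sum_i a_i * e[x_i] (with made-up position weights a_i) and dot-product similarity; real dense retrievers are transformers whose gradients come from autograd. HotFlip ranks each candidate swap of position i to vocabulary token w by the first-order gain (e_w - e_{x_i}) . g_i, where g_i is the gradient of the similarity with respect to the embedding of the current token x_i. Because the queries are fixed and the similarity is linear in the query embedding, averaging per-query scores gives the same result as a single gradient against the mean query embedding. This is shown only as one possible source of speedup, not as the authors' exact optimization, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    # Component-wise mean, e.g. the centroid of a set of query embeddings.
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def token_gradients(query_emb: Sequence[float], position_weights: Sequence[float]) -> Matrix:
    # Hypothetical toy encoder: E(p) = sum_i a_i * e[x_i].
    # Then d(q . E(p)) / d e[x_i] = a_i * q for every position i.
    return [[a * qc for qc in query_emb] for a in position_weights]


def hotflip_scores(passage: Sequence[int], vocab_emb: Matrix, grads: Matrix) -> Matrix:
    # scores[i][w] = (e_w - e_{x_i}) . g_i: first-order estimate of the
    # similarity change from replacing the token at position i with token w.
    scores = []
    for i, tok in enumerate(passage):
        base = dot(vocab_emb[tok], grads[i])
        scores.append([dot(e_w, grads[i]) - base for e_w in vocab_emb])
    return scores


def accumulated_scores(passage: Sequence[int], vocab_emb: Matrix,
                       queries: Matrix, position_weights: Sequence[float]) -> Matrix:
    # Slow path: one gradient per query-passage pair, scores averaged over queries.
    per_query = [
        hotflip_scores(passage, vocab_emb, token_gradients(q, position_weights))
        for q in queries
    ]
    return [
        [sum(vals) / len(queries) for vals in zip(*rows)]
        for rows in zip(*per_query)
    ]


def best_flip(scores: Matrix) -> Tuple[int, int, float]:
    # Return (position, vocab id, estimated gain) of the highest-scoring swap.
    return max(
        ((i, w, s) for i, row in enumerate(scores) for w, s in enumerate(row)),
        key=lambda t: t[2],
    )


if __name__ == "__main__":
    vocab_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
    passage = [3, 3]                    # both positions hold token 3
    weights = [1.0, 0.5]                # hypothetical position weights a_i
    queries = [[1.0, 0.0], [1.0, 2.0]]

    # Fast path: a single gradient computed against the mean query embedding.
    fast = hotflip_scores(passage, vocab_emb, token_gradients(mean_vector(queries), weights))
    slow = accumulated_scores(passage, vocab_emb, queries, weights)
    print(fast == slow)                 # -> True
    print(best_flip(fast))              # -> (0, 2, 3.0)
```

Because the first-order score is only an approximation, corpus-poisoning implementations commonly re-evaluate a shortlist of top-scoring candidates with a full forward pass before committing a swap. The sketch stops at the ranking step.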
