One word at a time: adversarial attacks on retrieval models
Nisarg Raval, Manisha Verma

TL;DR
This paper introduces a systematic method to evaluate the robustness of ranking models against adversarial attacks, revealing their vulnerability to minimal, semantically similar perturbations that significantly alter document rankings.
Contribution
It presents a simple method for generating adversarial examples against ranking models and uses it to analyze both the robustness of several rankers and the quality of the generated perturbations across two datasets.
Findings
As few as 1-3 token changes can fool rankers into changing a document's score
Adversarial perturbations significantly lower document ranks
Ranking models are vulnerable to minimal, semantically similar modifications
Abstract
Adversarial examples, generated by applying small perturbations to input features, are widely used to fool classifiers and measure their robustness to noisy inputs. However, little work has been done to evaluate the robustness of ranking models through adversarial examples. In this work, we present a systematic approach of leveraging adversarial examples to measure the robustness of popular ranking models. We explore a simple method to generate adversarial examples that forces a ranker to incorrectly rank the documents. Using this approach, we analyze the robustness of various ranking models and the quality of perturbations generated by the adversarial attacker across two datasets. Our findings suggest that with very few token changes (1-3), the attacker can yield semantically similar perturbed documents that can fool different rankers into changing a document's score, lowering its rank…
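The idea of lowering a document's rank with a handful of token substitutions can be illustrated with a small sketch. This is not the authors' attack: the overlap-based scorer, the hand-written synonym table, and the names `overlap_score`, `rank_of`, and `greedy_attack` are all hypothetical stand-ins chosen to keep the example self-contained. A real study would use a trained ranker and a learned source of semantically similar replacements.

```python
from typing import Callable, Dict, List, Tuple

Scorer = Callable[[str, str], float]


def overlap_score(query: str, doc: str) -> float:
    """Toy ranker (assumption): fraction of query tokens present in the document."""
    q_tokens = query.lower().split()
    d_tokens = set(doc.lower().split())
    if not q_tokens:
        return 0.0
    return sum(t in d_tokens for t in q_tokens) / len(q_tokens)


def rank_of(query: str, doc: str, others: List[str], score: Scorer) -> int:
    """1-based rank of `doc` among `others`; a higher score ranks earlier."""
    s = score(query, doc)
    return 1 + sum(score(query, o) > s for o in others)


def greedy_attack(
    query: str,
    doc: str,
    synonyms: Dict[str, List[str]],
    score: Scorer,
    budget: int = 3,
) -> Tuple[str, int]:
    """Greedily replace up to `budget` tokens, each time picking the single
    substitution that lowers the ranker's score the most."""
    tokens = doc.split()
    changed = set()
    for _ in range(budget):
        base = score(query, " ".join(tokens))
        best = None  # (score after substitution, position, replacement)
        for i, tok in enumerate(tokens):
            if i in changed:
                continue  # change each position at most once
            for alt in synonyms.get(tok.lower(), []):
                candidate = tokens[:i] + [alt] + tokens[i + 1:]
                s = score(query, " ".join(candidate))
                if s < base and (best is None or s < best[0]):
                    best = (s, i, alt)
        if best is None:
            break  # no substitution lowers the score any further
        _, i, alt = best
        tokens[i] = alt
        changed.add(i)
    return " ".join(tokens), len(changed)


# Hypothetical toy data, for illustration only.
query = "cheap car insurance"
target = "cheap car insurance quotes online"
others = ["car insurance for new drivers", "cheap flights to paris"]
synonyms = {
    "cheap": ["affordable"],
    "car": ["automobile"],
    "insurance": ["coverage"],
}

rank_before = rank_of(query, target, others, overlap_score)
perturbed, n_changed = greedy_attack(query, target, synonyms, overlap_score)
rank_after = rank_of(query, perturbed, others, overlap_score)
print(rank_before, rank_after, n_changed, perturbed)
```

With this toy data the target starts at rank 1 and drops to rank 3 after three substitutions, even though a human reader would consider the perturbed text a paraphrase. The greedy search is one simple design choice among many; it only needs black-box access to the scoring function, which is why the scorer is passed in as a parameter.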
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Domain Adaptation and Few-Shot Learning · Topic Modeling
