One Word is Enough: Minimal Adversarial Perturbations for Neural Text Ranking
Tanmay Karmakar, Sourav Saha, Debapriyo Majumdar, Surjyanee Halder

TL;DR
This paper demonstrates that minimal, query-aware adversarial perturbations, often just a single word, can significantly manipulate neural text ranking models, revealing critical robustness vulnerabilities.
Contribution
It introduces a minimal, query-aware attack method using single-word perturbations and new diagnostic metrics to analyze neural ranking model robustness.
Findings
Single-word attacks achieve up to 91% success rate.
Fewer than two tokens are modified on average per document.
Mid-ranked documents are most vulnerable to attacks.
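The first two findings are aggregate statistics over attacked documents. Below is a minimal sketch of how such numbers could be computed, assuming an attack counts as successful when the target document's rank strictly improves (a common convention, not confirmed by this summary). The function name, the record layout, and the sample values are hypothetical and are not results from the paper.

```python
def attack_summary(results):
    """Summarize attack outcomes.

    results: list of (rank_before, rank_after, tokens_modified) tuples.
    Lower rank numbers are better (rank 1 is the top position).
    ASSUMPTION: "success" means rank_after < rank_before.
    """
    if not results:
        raise ValueError("results must not be empty")
    n = len(results)
    # Count documents whose rank strictly improved after the perturbation.
    successes = sum(1 for before, after, _ in results if after < before)
    # Average number of tokens changed per attacked document.
    avg_tokens = sum(tokens for _, _, tokens in results) / n
    return {"success_rate": successes / n, "avg_tokens_modified": avg_tokens}


# Illustrative, made-up records: (rank_before, rank_after, tokens_modified)
example = [(50, 12, 1), (30, 30, 1), (80, 41, 2), (10, 4, 1)]
summary = attack_summary(example)
print(summary)
```

Reporting the two numbers together matters: a high success rate is only notable here because the average edit size stays below two tokens.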
Abstract
Neural ranking models (NRMs) achieve strong retrieval effectiveness, yet prior work has shown they are vulnerable to adversarial perturbations. We revisit this robustness question with a minimal, query-aware attack that promotes a target document by inserting or substituting a single, semantically aligned word: the query center. We study heuristic and gradient-guided variants, including a white-box method that identifies influential insertion points. On TREC-DL 2019/2020 with BERT and monoT5 re-rankers, our single-word attacks achieve up to 91% success while modifying fewer than two tokens per document on average. They deliver competitive rank and score boosts with far fewer edits, under a comparable white-box setup that ensures fair evaluation against PRADA. We also introduce new diagnostic metrics to analyze attack sensitivity beyond aggregate success rates. Our analysis reveals a…
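The single-word insertion idea can be illustrated with a toy sketch. It assumes the candidate word is already chosen and simply tries every insertion position, keeping the one that raises the score most. This is a brute-force heuristic for illustration only, not the paper's gradient-guided method. `toy_score` is a hypothetical, position-weighted term-overlap scorer standing in for a BERT or monoT5 re-ranker, and `best_single_insertion` is likewise a made-up name.

```python
def toy_score(query, doc):
    """Hypothetical stand-in for a neural re-ranker.

    Sums 1 / (1 + position) over document tokens that appear in the
    query, so earlier matches count more. Not a real ranking model.
    """
    q_terms = set(query.lower().split())
    return sum(
        1.0 / (1 + i)
        for i, tok in enumerate(doc.lower().split())
        if tok in q_terms
    )


def best_single_insertion(query, doc, word, score_fn=toy_score):
    """Try inserting `word` at every token boundary; keep the best result.

    Returns (perturbed_doc, score). If no insertion improves the score,
    the original document and its score are returned unchanged.
    """
    tokens = doc.split()
    best_doc, best_score = doc, score_fn(query, doc)
    for pos in range(len(tokens) + 1):
        candidate = " ".join(tokens[:pos] + [word] + tokens[pos:])
        score = score_fn(query, candidate)
        if score > best_score:
            best_doc, best_score = candidate, score
    return best_doc, best_score


query = "electric car battery"
doc = "the vehicle uses lithium cells"
# ASSUMPTION: "battery" plays the role of the query center here.
new_doc, new_score = best_single_insertion(query, doc, "battery")
print(new_doc)  # battery the vehicle uses lithium cells
```

With a real re-ranker, each candidate costs one forward pass, which is one reason a white-box, gradient-guided search for influential insertion points is attractive.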
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Multimodal Machine Learning Applications
