TL;DR
This paper introduces CertDR, a certified defense method based on randomized smoothing, to guarantee the robustness of neural ranking models against word substitution ranking attacks, especially for top-K results.
Contribution
The paper proposes a novel certified defense approach, CertDR, providing provable robustness guarantees for neural ranking models against adversarial word substitutions.
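As a rough illustration of the randomized-smoothing idea behind this kind of defense, the sketch below averages a ranker's score over random synonym substitutions of the document. It is not CertDR's actual procedure: the synonym table `SYNONYMS`, the toy ranker `overlap_score`, and the helpers `perturb` and `smoothed_score` are all hypothetical names introduced here, and the certification step (bounding how far an attacker can shift the smoothed score) is omitted.

```python
import random
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical synonym table (an assumption for illustration only).
SYNONYMS: Dict[str, List[str]] = {
    "cheap": ["inexpensive"],
    "car": ["automobile", "vehicle"],
}


def perturb(tokens: List[str], synonyms: Dict[str, List[str]],
            rng: random.Random) -> List[str]:
    """Replace each token by a uniform draw from {token} plus its synonyms."""
    return [rng.choice([t] + synonyms.get(t, [])) for t in tokens]


def smoothed_score(score_fn: Callable[[str, List[str]], float],
                   query: str, doc_tokens: List[str],
                   synonyms: Dict[str, List[str]],
                   n_samples: int = 200, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected score under random substitutions."""
    rng = random.Random(seed)
    samples = [score_fn(query, perturb(doc_tokens, synonyms, rng))
               for _ in range(n_samples)]
    return mean(samples)


def overlap_score(query: str, doc_tokens: List[str]) -> float:
    """Toy stand-in for a neural ranker: fraction of query terms in the document."""
    terms = query.split()
    return sum(t in doc_tokens for t in terms) / len(terms)


if __name__ == "__main__":
    doc = "cheap fast car".split()
    print(smoothed_score(overlap_score, "cheap car", doc, SYNONYMS))
```

Because the smoothed score is an average over many perturbed copies of the document, a single adversarial word swap changes only part of the sampling distribution. That intuition is what a certified bound would formalize.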
Findings
CertDR outperforms existing empirical defenses in experiments.
It provides provable guarantees for top-K robustness.
The method is effective on real web search datasets.
Abstract
Neural ranking models (NRMs) have achieved promising results in information retrieval. However, NRMs have also been shown to be vulnerable to adversarial examples. A typical Word Substitution Ranking Attack (WSRA) against NRMs was proposed recently, in which an attacker promotes a target document in rankings by adding human-imperceptible perturbations to its text. This raises concerns about deploying NRMs in real-world applications, so it is important to develop techniques that defend NRMs against such attacks. In empirical defenses, adversarial examples are found during training and used to augment the training set. However, such methods offer no theoretical guarantee of the models' robustness and may eventually be broken by more sophisticated WSRAs. To escape this arms race, rigorous and provable certified defense methods for NRMs are needed. To this end, we first define the…
