Preserving Semantics in Textual Adversarial Attacks
David Herel, Hugo Cisneros, Tomas Mikolov

TL;DR
This paper introduces Semantics-Preserving-Encoder (SPE), a supervised sentence embedding method that improves adversarial attacks in NLP by better preserving semantics, achieving a 1.2x to 5.1x better real attack success rate than existing sentence encoders.
Contribution
The paper presents SPE, a supervised sentence encoder that significantly improves semantic preservation in adversarial attacks compared to existing encoders.
Findings
Up to 70% of adversarial examples are semantically invalid and should be discarded.
SPE outperforms existing sentence encoders used in adversarial attacks, achieving a 1.2x to 5.1x better real attack success rate.
The authors provide a plugin to integrate SPE into existing adversarial attack frameworks.
Abstract
The growth of hateful online content, or hate speech, has been associated with a global increase in violent crimes against minorities [23]. Harmful online content can be produced easily, automatically, and anonymously. Although text classifiers in NLP already provide some form of automatic detection, they can be fooled by adversarial attacks. To strengthen existing systems and stay ahead of attackers, we need better adversarial attacks. In this paper, we show that up to 70% of adversarial examples generated by adversarial attacks should be discarded because they do not preserve semantics. We address this core weakness and propose a new, fully supervised sentence embedding technique called Semantics-Preserving-Encoder (SPE). Our method outperforms existing sentence encoders used in adversarial attacks by achieving 1.2x - 5.1x better real attack success rate. We release our code…
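The role a sentence encoder plays in such an attack can be illustrated with a minimal sketch. It is not the authors' SPE implementation or their plugin: `toy_encode` is a hypothetical bag-of-words stand-in, the 0.7 threshold is arbitrary, and `real_attack_success_rate` assumes that the "real" rate counts only attacks that both fool the classifier and are judged to preserve meaning.

```python
import math
from collections import Counter


def toy_encode(sentence):
    """Hypothetical stand-in encoder: bag-of-words counts (NOT SPE)."""
    return Counter(sentence.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u)  # Counter returns 0 for missing words
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def passes_constraint(original, candidate, encode=toy_encode, threshold=0.7):
    """Accept a candidate only if its embedding stays close to the original's."""
    return cosine(encode(original), encode(candidate)) >= threshold


def real_attack_success_rate(fooled, semantically_valid):
    """Assumed definition: share of attacked inputs where the attack fooled
    the classifier AND the adversarial example preserved the meaning."""
    if not fooled:
        return 0.0
    hits = sum(1 for f, v in zip(fooled, semantically_valid) if f and v)
    return hits / len(fooled)


if __name__ == "__main__":
    original = "the movie was great"
    # The toy encoder accepts a candidate that flips the sentiment.
    print(passes_constraint(original, "the movie was awful"))
    # Made-up outcomes for four attacked inputs.
    fooled = [True, True, True, False]
    valid = [True, False, False, True]
    print(sum(fooled) / len(fooled), real_attack_success_rate(fooled, valid))
```

In this toy setting the bag-of-words encoder lets "the movie was awful" through even though the meaning is inverted, which is the kind of semantically invalid example the abstract says should be discarded. The made-up outcomes also show how a raw success rate can overstate an attack once such examples are excluded.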
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Adversarial Robustness in Machine Learning
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
