Preserving Semantics in Textual Adversarial Attacks

David Herel; Hugo Cisneros; Tomas Mikolov

arXiv:2211.04205 · cs.CL · October 9, 2023

TL;DR

This paper introduces the Semantics-Preserving-Encoder (SPE), a supervised sentence embedding method that improves textual adversarial attacks by better preserving semantics, yielding a 1.2x to 5.1x higher real attack success rate than existing sentence encoders.

Contribution

The paper presents SPE, a supervised sentence encoder that significantly improves semantic preservation in adversarial attacks compared to existing encoders.

Findings

01

Up to 70% of adversarial examples are semantically invalid and should be discarded.

02

SPE outperforms existing sentence encoders used in adversarial attacks, achieving a 1.2x to 5.1x higher real attack success rate.

03

The authors provide a plugin to integrate SPE into existing adversarial attack frameworks.
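To make the first two findings concrete, the sketch below discounts a raw attack success rate by the share of adversarial examples that still preserve the original meaning. The function name, the multiplicative definition, and the sample numbers are illustrative assumptions, not figures or formulas taken from the paper, whose exact definition of the real attack success rate may differ.

```python
def real_attack_success_rate(raw_success_rate: float, valid_fraction: float) -> float:
    """Discount a raw attack success rate by the fraction of adversarial
    examples judged semantically valid.

    Assumed definition, for illustration only.
    """
    for name, value in (("raw_success_rate", raw_success_rate),
                        ("valid_fraction", valid_fraction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return raw_success_rate * valid_fraction


# Hypothetical numbers: an attack that flips the classifier 90% of the time,
# but where 70% of its adversarial examples must be discarded as invalid.
raw = 0.90
valid = 1.0 - 0.70
print(f"real attack success rate: {real_attack_success_rate(raw, valid):.2f}")
```

Under these assumed numbers, an attack that looks 90% successful is only about 27% successful once invalid examples are removed, which is why the quality of the semantic check matters.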

Abstract

The growth of hateful online content, or hate speech, has been associated with a global increase in violent crimes against minorities [23]. Harmful online content can be produced easily, automatically, and anonymously. Even though some form of automatic detection is already achieved by text classifiers in NLP, these classifiers can be fooled by adversarial attacks. To strengthen existing systems and stay ahead of attackers, we need better adversarial attacks. In this paper, we show that up to 70% of adversarial examples generated by adversarial attacks should be discarded because they do not preserve semantics. We address this core weakness and propose a new, fully supervised sentence embedding technique called Semantics-Preserving-Encoder (SPE). Our method outperforms existing sentence encoders used in adversarial attacks by achieving a 1.2x to 5.1x better real attack success rate. We release our code…
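As a rough illustration of how a sentence encoder acts as a semantic constraint in an adversarial attack, the sketch below keeps only those candidate rewrites whose embedding stays close to the original sentence. Everything here is an assumption made for illustration: `toy_encoder` is a bag-of-words stand-in (it is not SPE), and the names `filter_candidates` and `threshold` do not come from the paper or its repository.

```python
import math
from collections import Counter
from typing import Callable, Dict, List

Vector = Dict[str, float]


def toy_encoder(sentence: str) -> Vector:
    """Hypothetical stand-in encoder: lowercase bag-of-words counts (not SPE)."""
    return dict(Counter(sentence.lower().split()))


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_candidates(
    original: str,
    candidates: List[str],
    encode: Callable[[str], Vector] = toy_encoder,
    threshold: float = 0.7,  # assumed cut-off, chosen only for this example
) -> List[str]:
    """Keep candidates whose embedding is at least `threshold`-similar to the original."""
    reference = encode(original)
    return [c for c in candidates
            if cosine_similarity(reference, encode(c)) >= threshold]


original = "the movie was really good"
candidates = [
    "the movie was really great",  # close paraphrase
    "the movie was really bad",    # meaning flipped, surface form almost identical
    "i hate rainy mondays",        # unrelated
]
print(filter_candidates(original, candidates))
```

The toy encoder accepts "the movie was really bad" because it only measures word overlap, so a meaning-flipping candidate passes the constraint. This is the kind of semantically invalid example the abstract argues should be discarded, and it shows why the choice of encoder plugged into `encode` matters.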

Code & Models

Repositories

davidherel/semantics-preserving-encoder (Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Adversarial Robustness in Machine Learning
