Textual Manifold-based Defense Against Natural Language Adversarial Examples

Dang Minh Nguyen; Luu Anh Tuan

arXiv:2211.02878 · cs.CL · November 8, 2022 · 1 citation

TL;DR

This paper introduces a novel NLP defense method called Textual Manifold-based Defense (TMD) that projects text embeddings onto a natural manifold to improve robustness against adversarial attacks, outperforming previous defenses.

Contribution

It is the first NLP defense leveraging the embedding manifold structure to enhance robustness against adversarial examples.

Findings

01. TMD significantly outperforms previous defenses under various attack settings.

02. TMD maintains high accuracy on clean data while improving robustness.

03. Adversarial texts tend to diverge from the natural embedding manifold, enabling effective detection.

Abstract

Recent studies on adversarial images have shown that they tend to leave the underlying low-dimensional data manifold, which makes it significantly harder for current models to classify them correctly. This so-called off-manifold conjecture has inspired a novel line of defenses against adversarial attacks on images. In this study, we find that a similar phenomenon occurs in the contextualized embedding space induced by pretrained language models: the embeddings of adversarial texts tend to diverge from the manifold of natural ones. Based on this finding, we propose Textual Manifold-based Defense (TMD), a defense mechanism that projects text embeddings onto an approximated embedding manifold before classification. It reduces the complexity of potential adversarial examples, which ultimately enhances the robustness of the protected model. Through extensive experiments, our…
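
The project-then-classify idea in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation: the abstract does not say how the manifold is approximated, so this example stands in a set of stored "natural" embeddings for the manifold and uses nearest-neighbor lookup as the projection. The names `project_onto_manifold`, `classify`, and the 2-D points are all hypothetical and exist only for illustration.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def project_onto_manifold(embedding, natural_embeddings):
    """Assumed stand-in for manifold projection: snap the embedding to the
    closest stored natural embedding and report how far it had to move."""
    nearest = min(natural_embeddings, key=lambda n: distance(embedding, n))
    return nearest, distance(embedding, nearest)


def classify(embedding):
    """Hypothetical toy classifier: the sign of the first coordinate."""
    return "positive" if embedding[0] > 0 else "negative"


# Toy "natural" embeddings lying on a 1-D manifold (the x-axis) in 2-D space.
natural = [(1.0, 0.0), (2.0, 0.0), (-1.0, 0.0), (-2.0, 0.0)]

# A perturbed embedding pushed off the manifold along the second axis.
perturbed = (0.9, 1.5)

# Project first, then classify the projected point instead of the raw one.
projected, off_manifold_distance = project_onto_manifold(perturbed, natural)
label = classify(projected)
```

In this sketch the classifier only ever sees points from the approximated manifold, and the returned distance gives a rough measure of how far an input sat from the natural embeddings before projection.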

Code & Models

Repositories

dangne/tmd (PyTorch, official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Explainable Artificial Intelligence (XAI)