Textual Manifold-based Defense Against Natural Language Adversarial Examples
Dang Minh Nguyen, Luu Anh Tuan

TL;DR
This paper introduces a novel NLP defense method called Textual Manifold-based Defense (TMD). TMD projects text embeddings onto an approximation of the natural embedding manifold to improve robustness against adversarial attacks, and it outperforms previous defenses.
Contribution
TMD is the first NLP defense to leverage the structure of the embedding manifold to enhance robustness against adversarial examples.
Findings
TMD significantly outperforms previous defenses under various attack settings.
TMD maintains high accuracy on clean data while improving robustness.
Adversarial texts tend to diverge from the natural embedding manifold, enabling effective detection.
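The observation that adversarial embeddings drift away from the natural manifold suggests a simple detection score: the distance from an embedding to an approximation of that manifold. The sketch below is a hypothetical illustration, not the paper's method. It assumes the manifold can be crudely approximated by an affine subspace fitted to natural embeddings (sample mean plus Gram-Schmidt directions, a stand-in for PCA). The helper names (`fit_affine_manifold`, `off_manifold_distance`, `is_suspicious`) and the threshold are invented for illustration.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def remove_components(r, basis):
    """Subtract the projection of r onto each orthonormal basis vector."""
    for b in basis:
        c = dot(r, b)
        r = [ri - c * bi for ri, bi in zip(r, b)]
    return r


def fit_affine_manifold(natural, rank, eps=1e-9):
    """Hypothetical manifold approximation: the mean of the natural embeddings
    plus up to `rank` orthonormal directions from Gram-Schmidt on the centered
    samples. A learned model would replace this in practice."""
    dim = len(natural[0])
    mean = [sum(x[i] for x in natural) / len(natural) for i in range(dim)]
    basis = []
    for x in natural:
        if len(basis) >= rank:
            break
        r = remove_components([xi - mi for xi, mi in zip(x, mean)], basis)
        n = norm(r)
        if n > eps:  # skip samples already spanned by the current basis
            basis.append([ri / n for ri in r])
    return mean, basis


def off_manifold_distance(x, mean, basis):
    """Euclidean distance from x to its orthogonal projection on the subspace."""
    centered = [xi - mi for xi, mi in zip(x, mean)]
    return norm(remove_components(centered, basis))


def is_suspicious(x, mean, basis, threshold):
    """Flag embeddings that lie farther than `threshold` from the manifold."""
    return off_manifold_distance(x, mean, basis) > threshold
```

For example, if the natural embeddings `[1, 0, 0]`, `[0, 1, 0]`, `[2, 1, 0]` and `[1, 2, 0]` are fitted with `rank=2`, the approximated manifold is the plane whose third coordinate is zero. Then `[0.5, 0.5, 0.0]` has distance 0 and `[1.0, 1.0, 3.0]` has distance 3, so only the latter is flagged at `threshold=1.0`.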
Abstract
Recent studies on adversarial images have shown that they tend to leave the underlying low-dimensional data manifold, which makes it significantly harder for current models to classify them correctly. This so-called off-manifold conjecture has inspired a novel line of defenses against adversarial attacks on images. In this study, we find that a similar phenomenon occurs in the contextualized embedding space induced by pretrained language models: the embeddings of adversarial texts tend to diverge from the manifold of natural ones. Based on this finding, we propose Textual Manifold-based Defense (TMD), a defense mechanism that projects text embeddings onto an approximated embedding manifold before classification. This projection reduces the complexity of potential adversarial examples, which ultimately enhances the robustness of the protected model. Through extensive experiments, our…
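The project-then-classify pipeline described in the abstract can be sketched as follows. This is a hedged, hypothetical illustration, not the authors' implementation. It assumes the approximated manifold is the image of a generator G(z) = A z + b (a linear stand-in for whatever learned generative model a real system would use). An embedding is projected by gradient descent on the latent code z to minimize ||G(z) - x||², keeping the best of several random restarts. The function names, the linear classifier, and all hyperparameters are illustrative assumptions.

```python
import random


def generate(z, A, b):
    """Hypothetical generator G(z) = A z + b mapping a latent code to an embedding."""
    return [sum(a_ij * z_j for a_ij, z_j in zip(row, z)) + b_i for row, b_i in zip(A, b)]


def reconstruction_error(z, x, A, b):
    """Squared distance ||G(z) - x||^2."""
    return sum((g - xi) ** 2 for g, xi in zip(generate(z, A, b), x))


def project_onto_manifold(x, A, b, latent_dim, steps=500, lr=0.05, restarts=3, seed=0):
    """Approximate argmin_z ||G(z) - x||^2 by gradient descent and return G(z*).
    lr must be small relative to the largest singular value of A for stability."""
    rng = random.Random(seed)
    best_z, best_err = None, float("inf")
    for _ in range(restarts):
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        for _ in range(steps):
            residual = [g - xi for g, xi in zip(generate(z, A, b), x)]
            # Gradient of the squared error: 2 * A^T (A z + b - x)
            grad = [2.0 * sum(A[i][j] * residual[i] for i in range(len(x)))
                    for j in range(latent_dim)]
            z = [zj - lr * gj for zj, gj in zip(z, grad)]
        err = reconstruction_error(z, x, A, b)
        if err < best_err:  # keep the restart that lands closest to x
            best_z, best_err = z, err
    return generate(best_z, A, b)


def classify(embedding, w, bias):
    """Hypothetical linear binary classifier on the embedding."""
    score = sum(wi * ei for wi, ei in zip(w, embedding)) + bias
    return 1 if score > 0 else 0


def defended_predict(x, A, b, latent_dim, w, bias):
    """Project the embedding onto the approximated manifold, then classify it."""
    return classify(project_onto_manifold(x, A, b, latent_dim), w, bias)
```

Take `A = [[1, 0], [0, 1], [0, 0]]` and `b = [0, 0, 0]`, so that the manifold is the plane whose third coordinate is zero. An input such as `[0.3, -0.2, 5.0]` then has its off-manifold component removed before classification. A classifier that relies on that direction (`w = [0, 0, 1]`, `bias = -1`) can no longer be steered by it.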
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Explainable Artificial Intelligence (XAI)
