Detecting Textual Adversarial Examples Based on Distributional Characteristics of Data Representations

Na Liu; Mark Dras; Wei Emma Zhang

arXiv:2204.13853·cs.CL·May 2, 2022

TL;DR

This paper introduces two reactive detection methods for textual adversarial examples in NLP, based on distributional properties of data representations, achieving state-of-the-art results across multiple attack levels and datasets.

Contribution

The paper proposes two reactive detection techniques based on distributional characteristics of data representations: an adaptation of LID to text, and a new method, MDRE. Together they fill a gap in reactive adversarial defence for NLP.

Findings

1. Adapted LID achieves state-of-the-art detection performance.
2. MDRE outperforms existing baselines on multiple datasets.
3. Both methods effectively detect various levels of textual adversarial attacks.
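As background for the LID finding above, the following is a minimal sketch of the generic maximum-likelihood estimator of local intrinsic dimensionality, as used in the image-domain detection literature. It is not the paper's adapted method: the function names (`lid_mle`, `make_points`, `demo_mean_lid`), the neighbourhood size `k=20`, and the synthetic data are all assumptions chosen for illustration. The estimator takes the distances from a query point to its k nearest reference points and returns the negative inverse of the mean log ratio of each distance to the largest one.

```python
import numpy as np


def lid_mle(query, reference, k=20):
    """Generic MLE estimate of local intrinsic dimensionality (illustrative).

    query:     1-D array, the representation being scored.
    reference: 2-D array, one reference representation per row.
    k:         number of nearest neighbours to use (assumed value).
    """
    dists = np.linalg.norm(reference - query, axis=1)
    # Drop zero distances (exact duplicates of the query), keep the k smallest.
    dists = np.sort(dists[dists > 0.0])[:k]
    if dists.size < 2:
        raise ValueError("need at least two distinct neighbours")
    mean_log_ratio = np.mean(np.log(dists / dists[-1]))
    if mean_log_ratio == 0.0:
        # All neighbours are equidistant; the estimate is unbounded.
        return float("inf")
    return float(-1.0 / mean_log_ratio)


def make_points(n, intrinsic_dim, ambient_dim, rng):
    """Synthetic data: uniform in the first `intrinsic_dim` axes, zero elsewhere."""
    pts = np.zeros((n, ambient_dim))
    pts[:, :intrinsic_dim] = rng.uniform(-1.0, 1.0, size=(n, intrinsic_dim))
    return pts


def demo_mean_lid(intrinsic_dim, seed=0):
    """Average the estimate over queries placed away from the data boundary."""
    rng = np.random.default_rng(seed)
    reference = make_points(2000, intrinsic_dim, 10, rng)
    queries = 0.3 * make_points(50, intrinsic_dim, 10, rng)
    return float(np.mean([lid_mle(q, reference) for q in queries]))


if __name__ == "__main__":
    print("mean LID, 2-D data in 10-D space:", demo_mean_lid(2))
    print("mean LID, 5-D data in 10-D space:", demo_mean_lid(5))
```

On the synthetic data the estimate tracks the number of axes that actually vary rather than the ambient dimension. A detector in this style would typically compute such estimates on a model's hidden representations and train a simple classifier on them to separate clean from adversarial inputs; how the paper adapts this to text is not described in the summary here.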

Abstract

Although deep neural networks have achieved state-of-the-art performance in various machine learning tasks, adversarial examples, constructed by adding small non-random perturbations to correctly classified inputs, successfully fool highly expressive deep classifiers into incorrect predictions. Approaches to adversarial attacks in natural language tasks have boomed in the last five years using character-level, word-level, phrase-level, or sentence-level textual perturbations. While there is some work in NLP on defending against such attacks through proactive methods, like adversarial training, there are, to our knowledge, no effective general reactive approaches to defence via detection of textual adversarial examples such as are found in the image processing literature. In this paper, we propose two new reactive methods for NLP to fill this gap, which unlike the few limited application…

Code & Models

Repositories

naliuanna/mdre (PyTorch, official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning