Interpretable Adversarial Perturbation in Input Embedding Space for Text

Motoki Sato; Jun Suzuki; Hiroyuki Shindo; Yuji Matsumoto

arXiv:1805.02917 · cs.LG · May 9, 2018 · 37 cites

TL;DR

This paper introduces an interpretable adversarial perturbation method in input embedding space for NLP, enabling the generation of adversarial texts through word replacements while maintaining or improving task performance.

Contribution

It restricts the directions of adversarial perturbations in the input embedding space toward existing words, which restores interpretability to adversarial training for NLP.

Findings

01 Allows reconstruction of adversarial texts via word replacements

02 Maintains or improves NLP task performance

03 Restores interpretability to adversarial perturbations

Abstract

Following great success in the image processing field, the idea of adversarial training has been applied to tasks in the natural language processing (NLP) field. One promising approach directly applies adversarial training developed in the image processing field to the input word embedding space instead of the discrete input space of texts. However, this approach gives up interpretability, such as the ability to generate adversarial texts, in exchange for significantly improving the performance of NLP tasks. This paper restores interpretability to such methods by restricting the directions of perturbations toward the existing words in the input embedding space. As a result, each perturbed input can be straightforwardly reconstructed as an actual text by treating the perturbations as replacements of words in the sentence, while maintaining or even improving task performance.
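The idea of restricting perturbation directions toward existing words can be illustrated with a small NumPy sketch. Everything below is an assumption made for illustration and is not taken from this page: the function names, the toy embedding matrix, the choice to use unit vectors toward every vocabulary word as the allowed directions, and the weighting of those directions by a normalized gradient scaled by a hypothetical `epsilon`. The paper's exact formulation may differ.

```python
import numpy as np

def direction_vectors(emb, i):
    """Unit vectors pointing from word i toward every word in the vocabulary."""
    diff = emb - emb[i]                                   # shape (V, D)
    norms = np.linalg.norm(diff, axis=1, keepdims=True)   # shape (V, 1)
    # Word i has zero distance to itself; keep its direction as the zero vector.
    return np.divide(diff, norms, out=np.zeros_like(diff), where=norms > 0)

def restricted_perturbation(emb, i, grad, epsilon=1.0):
    """Perturbation for word i built only from directions toward existing words.

    grad is the loss gradient with respect to the embedding of word i
    (assumed to be supplied by the model being trained).
    """
    d = direction_vectors(emb, i)          # allowed directions, shape (V, D)
    g = d @ grad                           # gradient w.r.t. each direction weight
    g_norm = np.linalg.norm(g)
    alpha = epsilon * g / g_norm if g_norm > 0 else np.zeros_like(g)
    r = alpha @ d                          # weighted sum of directions, shape (D,)
    return r, alpha

# Toy example: 4 words in a 2-dimensional embedding space (hypothetical values).
emb = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [-1.0, 0.0]])
grad = np.array([1.0, 0.0])
r, alpha = restricted_perturbation(emb, i=0, grad=grad, epsilon=1.0)

# The word with the largest weight is the candidate replacement for word 0.
replacement = int(np.argmax(alpha))
```

Because each weight in `alpha` is tied to one vocabulary word, the largest weight can be read as "replace this word with that one", which is the sense in which such a perturbation can be mapped back to text.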

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Multimodal Machine Learning Applications

Methods: Interpretability