On Adversarial Examples for Character-Level Neural Machine Translation

Javid Ebrahimi; Daniel Lowd; Dejing Dou

arXiv:1806.09030 · cs.CL · June 26, 2018 · 157 citations

TL;DR

This paper explores adversarial attacks on character-level neural machine translation, introducing a white-box attack method that outperforms black-box attacks and demonstrating that adversarial training enhances model robustness.

Contribution

The paper introduces a novel white-box adversarial attack for character-level NMT that uses differentiable string-edit operations to rank candidate changes, and shows that it is more effective than black-box methods.

Findings

1. White-box attacks are more effective than black-box attacks.
2. Adversarial training significantly improves robustness.
3. New attack methods can target specific words in translations.

Abstract

Evaluating on adversarial examples has become a standard procedure to measure robustness of deep learning models. Due to the difficulty of creating white-box adversarial examples for discrete text input, most analyses of the robustness of NLP models have been done through black-box adversarial examples. We investigate adversarial examples for character-level neural machine translation (NMT), and contrast black-box adversaries with a novel white-box adversary, which employs differentiable string-edit operations to rank adversarial changes. We propose two novel types of attacks which aim to remove or change a word in a translation, rather than simply break the NMT. We demonstrate that white-box adversarial examples are significantly stronger than their black-box counterparts in different attack scenarios, which show more serious vulnerabilities than previously known. In addition, after…
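
The abstract describes a white-box adversary that uses differentiable string-edit operations to rank adversarial changes. One plausible reading, sketched below, is a first-order estimate: given the gradient of the loss with respect to a one-hot character input, the estimated loss change from substituting one character for another is the difference of two gradient entries. This is only an illustrative sketch under that assumption, not the authors' implementation; the function name `rank_char_flips` and the toy gradient values are hypothetical, and a real attack would obtain the gradient from the NMT model itself.

```python
import numpy as np


def rank_char_flips(grad, one_hot):
    """Rank single-character substitutions by estimated loss increase.

    grad:    (seq_len, vocab) gradient of the loss w.r.t. the one-hot input
             (assumed to be supplied by the model; hypothetical here)
    one_hot: (seq_len, vocab) one-hot encoding of the current characters
    Returns a list of (position, new_char_index, estimated_gain), best first.
    """
    grad = np.asarray(grad, dtype=float)
    one_hot = np.asarray(one_hot)
    positions = np.arange(grad.shape[0])
    current = one_hot.argmax(axis=1)

    # First-order estimate: swapping character a for b at position i changes
    # the loss by roughly grad[i, b] - grad[i, a].
    gain = grad - grad[positions, current][:, None]

    # Substituting a character with itself is not an edit, so exclude it.
    gain[positions, current] = -np.inf

    order = np.argsort(gain, axis=None)[::-1]
    pos, ch = np.unravel_index(order, gain.shape)
    return [
        (int(p), int(c), float(gain[p, c]))
        for p, c in zip(pos, ch)
        if np.isfinite(gain[p, c])
    ]


if __name__ == "__main__":
    # Toy example: a 2-character input over a 3-symbol vocabulary.
    toy_grad = [[0.1, 0.5, -0.2], [0.0, 0.3, 0.9]]
    toy_one_hot = [[1, 0, 0], [0, 1, 0]]
    for position, new_char, est in rank_char_flips(toy_grad, toy_one_hot):
        print(position, new_char, round(est, 3))
```

A black-box adversary, by contrast, would have no access to `grad` and would have to pick edits without this ranking, for example at random.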

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Natural Language Processing Techniques