Rethinking Targeted Adversarial Attacks For Neural Machine Translation
Junjie Wu, Lemao Liu, Wei Bi, Dit-Yan Yeung

TL;DR
This paper introduces a new setting for targeted adversarial attacks on neural machine translation (NMT) systems that yields more reliable attack results, and proposes the Targeted Word Gradient adversarial Attack (TWGA) method for crafting adversarial examples under that setting.
Contribution
It proposes a new, more reliable setting for targeted adversarial attacks on NMT and introduces the TWGA method to effectively generate adversarial examples.
Findings
The new setting yields more faithful attack results.
TWGA effectively attacks NMT systems.
In-depth analyses on a large-scale dataset yield further findings.
Abstract
Targeted adversarial attacks are widely used to evaluate the robustness of neural machine translation (NMT) systems. However, this paper identifies a critical issue in the existing settings for targeted adversarial attacks on NMT: their attack results are largely overestimated. To address this, the paper presents a new setting for NMT targeted adversarial attacks that leads to reliable attack results. Under the new setting, it then proposes a Targeted Word Gradient adversarial Attack (TWGA) method to craft adversarial examples. Experimental results demonstrate that the proposed setting provides faithful attack results for targeted adversarial attacks on NMT systems, and that the proposed TWGA method can effectively attack such victim NMT systems. In-depth analyses on a large-scale dataset further illustrate some valuable findings. Our code and data are available at…
Taxonomy
Topics: Adversarial Robustness in Machine Learning
