Machine Translation Models Stand Strong in the Face of Adversarial Attacks
Pavel Burnyshev, Elizaveta Kostenok, Alexey Zaytsev

TL;DR
This paper investigates the robustness of machine translation models against adversarial attacks, demonstrating they are generally resilient but can be vulnerable to certain advanced perturbation strategies.
Contribution
The study introduces novel attack algorithms, including gradient-based and character-mixing strategies, to evaluate the robustness of seq2seq translation models.
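To make the idea of a basic character-level perturbation concrete, here is a minimal sketch of one such heuristic: swapping adjacent letters. This is an illustration only, not the paper's character-mixing algorithm, which this summary does not specify. The function name `swap_adjacent_chars` and its parameters are hypothetical.

```python
import random


def swap_adjacent_chars(text: str, n_swaps: int, seed: int = 0) -> str:
    """Swap up to n_swaps randomly chosen pairs of adjacent letters.

    Illustrative heuristic only (an assumption, not the authors' method).
    """
    rng = random.Random(seed)
    chars = list(text)
    # A position i is a candidate when chars[i] and chars[i + 1] are
    # two different letters, so a swap never crosses a space or punctuation.
    candidates = [
        i
        for i in range(len(chars) - 1)
        if chars[i].isalpha() and chars[i + 1].isalpha() and chars[i] != chars[i + 1]
    ]
    for i in rng.sample(candidates, min(n_swaps, len(candidates))):
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


perturbed = swap_adjacent_chars("machine translation", n_swaps=2)
```

The perturbed string keeps the same length and the same multiset of characters as the input, so the change stays small at the character level while still producing misspelled tokens for the translation model.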
Findings
Machine translation models show robustness against known adversarial attacks.
Advanced attacks can outperform existing methods in certain scenarios.
Perturbations in output are proportional to input perturbations.
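One way to check the proportionality finding is to compare how far the input moved with how far the translation moved. The sketch below uses normalized character-level Levenshtein distance on both sides. That metric is an assumption made for illustration, since this summary does not say which metrics the paper uses, and the helper names `normalized_distance` and `amplification` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(
                min(
                    prev[j] + 1,               # deletion
                    curr[j - 1] + 1,           # insertion
                    prev[j - 1] + (ca != cb),  # substitution (free on a match)
                )
            )
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer length, giving a value in [0, 1]."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def amplification(src: str, src_adv: str, out: str, out_adv: str) -> float:
    """Ratio of output perturbation to input perturbation.

    A ratio near 1 means the translation changed about as much as the
    input did; a ratio far above 1 would indicate a successful attack.
    """
    d_in = normalized_distance(src, src_adv)
    if d_in == 0:
        raise ValueError("source and perturbed source are identical")
    return normalized_distance(out, out_adv) / d_in
```

Here `out` and `out_adv` stand for the model's translations of the clean and perturbed sources. A robust model, in the sense described above, would keep this ratio roughly constant as the input perturbation grows.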
Abstract
Adversarial attacks expose vulnerabilities of deep learning models by introducing minor perturbations to the input that lead to substantial alterations in the output. Our research focuses on the impact of such adversarial attacks on sequence-to-sequence (seq2seq) models, specifically machine translation models. We introduce algorithms that incorporate basic text perturbation heuristics and more advanced strategies, such as a gradient-based attack that utilizes a differentiable approximation of the inherently non-differentiable translation metric. Through our investigation, we provide evidence that machine translation models display robustness against the best-performing known adversarial attacks, as the degree of perturbation in the output is directly proportional to the perturbation in the input. However, among underdogs, our attacks outperform alternatives,…
Taxonomy
Topics: Adversarial Robustness in Machine Learning
