Revisiting Robust Neural Machine Translation: A Transformer Case Study

Peyman Passban; Puneeth S.M. Saladi; Qun Liu

arXiv:2012.15710 · cs.CL · September 13, 2021 · 1 cites

Open Access

TL;DR

This paper investigates the vulnerability of Transformer-based neural machine translation systems to noise and introduces novel training techniques, including Target Augmented Fine-tuning, Controlled Denoising, and Dual-Channel Decoding, to improve noise robustness without inference overhead.

Contribution

The paper presents three new methods to enhance Transformer NMT robustness to noise, focusing on training-phase modifications that do not affect inference.

Findings

01 Models tolerate up to 10% noise without performance loss

02 Proposed techniques significantly improve noise robustness in translation

03 Transformers are more resilient with the new training strategies

Abstract

Transformers (Vaswani et al., 2017) have brought a remarkable improvement in the performance of neural machine translation (NMT) systems but they could be surprisingly vulnerable to noise. In this work, we try to investigate how noise breaks Transformers and if there exist solutions to deal with such issues. There is a large body of work in the NMT literature on analyzing the behavior of conventional models for the problem of noise but Transformers are relatively understudied in this context. Motivated by this, we introduce a novel data-driven technique called Target Augmented Fine-tuning (TAFT) to incorporate noise during training. This idea is comparable to the well-known fine-tuning strategy. Moreover, we propose two other novel extensions to the original Transformer: Controlled Denoising (CD) and Dual-Channel Decoding (DCD), that modify the neural architecture as well as the…
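
To make "incorporating noise during training" concrete, the following is a minimal sketch of corrupting a fraction of source tokens before they reach the model. It is not the paper's TAFT procedure, whose details are not given in the abstract; the function name `add_noise`, the choice of adjacent-character swaps as the noise type, and the `noise_rate` and `seed` parameters are all assumptions made for illustration.

```python
import random


def add_noise(tokens, noise_rate=0.1, seed=0):
    """Return a copy of `tokens` where each multi-character token is
    corrupted with probability `noise_rate`.

    Assumption: the corruption is a swap of two adjacent characters,
    one common way to simulate typos. The paper may use other noise types.
    """
    rng = random.Random(seed)  # local RNG so results are reproducible
    noisy = []
    for tok in tokens:
        if len(tok) > 1 and rng.random() < noise_rate:
            i = rng.randrange(len(tok) - 1)  # position of the left character
            tok = tok[:i] + tok[i + 1] + tok[i] + tok[i + 2:]
        noisy.append(tok)
    return noisy


if __name__ == "__main__":
    sentence = "transformers could be surprisingly vulnerable to noise".split()
    print(" ".join(add_noise(sentence, noise_rate=0.3, seed=1)))
```

In a setup like this, the noisy source would typically be paired with the clean target sentence, so the model learns to produce the same translation from corrupted input. Because the corruption happens only when training data is prepared, it adds no cost at inference time.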

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Multi-Head Attention · Dropout · Softmax · Dense Connections · Label Smoothing · Attention Is All You Need