Regularization techniques for fine-tuning in neural machine translation

Antonio Valerio Miceli Barone, Barry Haddow, Ulrich Germann, and Rico Sennrich

arXiv:1707.09920 · cs.CL · August 1, 2017


TL;DR

This paper explores regularization methods, including a new technique called tuneout, to reduce overfitting in domain adaptation for neural machine translation, and reports improvements on IWSLT datasets for English->German and English->Russian.

Contribution

The paper introduces tuneout, a novel regularization method inspired by dropout, and evaluates it alongside dropout and L2-regularization towards an out-of-domain prior, alone and in combination, for supervised domain adaptation in NMT.

Findings

01 Tuneout improves translation quality.

02 The gain in BLEU score grows logarithmically with the amount of in-domain training data.

03 Regularization techniques reduce overfitting in domain adaptation.
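
To make the logarithmic relationship in the second finding concrete, here is a minimal sketch of fitting a curve of the form gain = a + b * ln(n), where n is the amount of in-domain training data. The function name `fit_log_curve`, the coefficients, and the data points are illustrative assumptions; the points are synthetic and are not results from the paper.

```python
import math

def fit_log_curve(sizes, gains):
    """Least-squares fit of gain = a + b * ln(size); returns (a, b)."""
    xs = [math.log(n) for n in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(gains) / len(gains)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, gains))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Synthetic placeholder points generated from a known curve (a = 0, b = 0.5).
# These are NOT measurements from the paper.
sizes = [1_000, 10_000, 100_000]
gains = [0.5 * math.log(n) for n in sizes]

a, b = fit_log_curve(sizes, gains)

# Under a logarithmic law, every tenfold increase in data adds the same gain.
gain_per_tenfold = b * math.log(10)
```

Under such a curve, each multiplication of the data size by a fixed factor adds a constant amount to the gain, so returns diminish as the in-domain dataset grows.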

Abstract

We investigate techniques for supervised domain adaptation for neural machine translation where an existing model trained on a large out-of-domain dataset is adapted to a small in-domain dataset. In this scenario, overfitting is a major challenge. We investigate a number of techniques to reduce overfitting and improve transfer learning, including regularization techniques such as dropout and L2-regularization towards an out-of-domain prior. In addition, we introduce tuneout, a novel regularization technique inspired by dropout. We apply these techniques, alone and in combination, to neural machine translation, obtaining improvements on IWSLT datasets for English->German and English->Russian. We also investigate the amounts of in-domain training data needed for domain adaptation in NMT, and find a logarithmic relationship between the amount of training data and gain in BLEU score.
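
As an illustration of L2-regularization towards an out-of-domain prior, the sketch below penalizes the squared distance between the fine-tuned parameters and the parameters of the out-of-domain model, instead of the distance to zero used by ordinary weight decay. The function names, the flat list representation of the parameters, and the values of `strength` and `lr` are assumptions made for this example, not the authors' implementation.

```python
def l2_to_prior_penalty(params, prior, strength):
    """Penalty strength * sum((theta_i - prior_i)^2), added to the task loss."""
    return strength * sum((p - q) ** 2 for p, q in zip(params, prior))

def sgd_step(params, task_grads, prior, strength, lr):
    """One SGD step on task loss plus the penalty above.

    The penalty's gradient is 2 * strength * (theta_i - prior_i), which pulls
    each parameter back towards its out-of-domain value.
    """
    return [
        p - lr * (g + 2.0 * strength * (p - q))
        for p, g, q in zip(params, task_grads, prior)
    ]

# Hypothetical toy values: 'prior' stands in for the out-of-domain parameters.
prior = [1.0, -2.0, 0.5]
params = [1.5, -1.0, 0.5]

penalty = l2_to_prior_penalty(params, prior, strength=0.1)

# With zero task gradient, the step only shrinks the distance to the prior.
stepped = sgd_step(params, [0.0, 0.0, 0.0], prior, strength=0.1, lr=1.0)
```

A parameter that already equals its out-of-domain value (the third one here) incurs no penalty and is left unchanged by the regularizer, so the fine-tuned model can only drift from the original where the in-domain data supplies a gradient.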
