TL;DR
This paper describes the neural machine translation systems submitted to the WMT 2016 shared news translation task. The systems use attentional encoder-decoder models, BPE subword segmentation, and data augmentation via back-translation. They achieve BLEU improvements of 4.3–11.2 over the baselines and top human evaluation results.
Contribution
Introduces neural translation systems that combine BPE subword segmentation with back-translation of monolingual data, demonstrating substantial BLEU gains and strong human evaluation results.
Findings
BLEU score improvements of 4.3–11.2 over baselines
(Tied) best constrained system in human evaluation for 7 out of 8 directions
Effective use of back-translation and dropout techniques
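Back-translation builds extra training pairs from target-language monolingual text. A target→source model translates each monolingual sentence, and the synthetic source is paired with the genuine target sentence. The sketch below shows only this data-construction step. It assumes a trained target→source model is available as a plain callable. `augment_with_back_translation` and `back_translate` are hypothetical names, and the sketch does not reproduce the paper's corpus selection or mixing proportions.

```python
from typing import Callable

Pair = tuple[str, str]  # (source sentence, target sentence)

def augment_with_back_translation(
    parallel: list[Pair],
    mono_target: list[str],
    back_translate: Callable[[str], str],
) -> list[Pair]:
    """Return real parallel pairs plus synthetic pairs built from target-side monolingual text."""
    # back_translate stands in for a trained target->source model (hypothetical).
    # The synthetic source may be noisy, but the target side stays human-written,
    # so the decoder still learns from genuine target-language sentences.
    synthetic = [(back_translate(tgt), tgt) for tgt in mono_target]
    # Build a new list so the caller's parallel corpus is left unchanged.
    return parallel + synthetic

if __name__ == "__main__":
    # Toy German->English setup: English monolingual text is back-translated into German.
    toy_model = {"good morning": "guten Morgen"}
    corpus = augment_with_back_translation(
        [("Hallo Welt", "hello world")], ["good morning"], toy_model.__getitem__
    )
    print(corpus)  # [('Hallo Welt', 'hello world'), ('guten Morgen', 'good morning')]
```

The dictionary lookup is only a placeholder. In practice, `back_translate` would wrap an NMT model trained in the reverse direction.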
Abstract
We participated in the WMT 2016 shared news translation task by building neural translation systems for four language pairs, each trained in both directions: English↔Czech, English↔German, English↔Romanian and English↔Russian. Our systems are based on an attentional encoder-decoder, using BPE subword segmentation for open-vocabulary translation with a fixed vocabulary. We experimented with using automatic back-translations of the monolingual News corpus as additional training data, pervasive dropout, and target-bidirectional models. All reported methods give substantial improvements, and we see improvements of 4.3–11.2 BLEU over our baseline systems. In the human evaluation, our systems were the (tied) best constrained system for 7 out of 8 translation directions in which we participated.
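BPE subword segmentation can be illustrated with a minimal sketch of the merge-learning procedure. Starting from single characters, the most frequent adjacent symbol pair is repeatedly merged into a new symbol. This is an illustrative reimplementation, not the authors' released tooling. The function names `learn_bpe` and `merge_pair`, the separate `</w>` end-of-word symbol, and first-seen tie-breaking are choices made for this sketch.

```python
from collections import Counter

END = "</w>"  # end-of-word marker so merges never cross word boundaries

def merge_pair(symbols: tuple[str, ...], pair: tuple[str, str]) -> tuple[str, ...]:
    """Replace every non-overlapping occurrence of `pair` in `symbols` with one joined symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe(word_counts: dict[str, int], num_merges: int) -> list[tuple[str, str]]:
    """Learn an ordered list of BPE merge operations from a word-frequency table."""
    # Start from characters: each word becomes a tuple of symbols plus END.
    vocab = {tuple(word) + (END,): freq for word, freq in word_counts.items()}
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs: Counter = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[a, b] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; max() keeps the first-seen pair on ties.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges

if __name__ == "__main__":
    toy = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    print(learn_bpe(toy, 5))
    # [('e', 's'), ('es', 't'), ('est', '</w>'), ('l', 'o'), ('lo', 'w')]
```

The number of merges sets the size of the subword vocabulary. This is how a fixed vocabulary can still cover an open vocabulary: any word can be spelled out of smaller units, down to single characters.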
Taxonomy
Methods: Byte Pair Encoding
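After merges are learned, a new word is segmented by replaying them in the order they were learned. This lets even an unseen word such as "lowest" be represented with known units. The sketch below assumes a toy merge list learned from the words low, lower, newest and widest. `apply_bpe` and `desegment` are hypothetical helpers. The `@@` continuation marker on non-final units follows a convention used by the subword-nmt tool.

```python
END = "</w>"  # end-of-word marker, matching how the merges were learned

# Toy merge list (assumed for illustration), in the order it was learned.
MERGES = [("e", "s"), ("es", "t"), ("est", END), ("l", "o"), ("lo", "w")]

def apply_bpe(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Segment one word by replaying BPE merges in learned order."""
    symbols = list(word) + [END]
    for a, b in merges:
        merged, i = [], 0
        while i < len(symbols):
            # Merge every left-to-right, non-overlapping occurrence of the pair (a, b).
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                merged.append(a + b)
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    # The last symbol always ends with END; strip it (dropping it if it stands alone).
    last = symbols[-1][: -len(END)]
    units = symbols[:-1] + ([last] if last else [])
    # Mark every non-final unit with "@@" so the word can be restored later.
    return [u + "@@" for u in units[:-1]] + units[-1:]

def desegment(tokens: list[str]) -> str:
    """Undo the segmentation by joining "@@"-marked units."""
    return " ".join(tokens).replace("@@ ", "")

if __name__ == "__main__":
    print(apply_bpe("lowest", MERGES))  # ['low@@', 'est']
```

A word that shares no learned merges, such as "newer" under this toy list, falls back to single characters. The vocabulary therefore never needs an unknown-word token.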
