NVIDIA NeMo Neural Machine Translation Systems for English-German and English-Russian News and Biomedical Tasks at WMT21
Sandeep Subramanian, Oleksii Hrinchuk, Virginia Adams, Oleksii Kuchaiev

TL;DR
This paper describes NVIDIA NeMo's neural machine translation systems for the WMT21 English-German and English-Russian tasks, which combine techniques such as data augmentation, model ensembling, and domain-specific vocabularies to improve translation quality.
Contribution
The paper introduces a comprehensive NMT system with novel combinations of techniques for improved translation quality in news and biomedical domains.
Findings
Achieved a BLEU score of 39.5 on the WMT'20 En-De test set
Outperformed previous best submissions in biomedical translation tasks
Utilized diverse data augmentation and model ensembling methods
Abstract
This paper provides an overview of NVIDIA NeMo's neural machine translation systems for the constrained data track of the WMT21 News and Biomedical Shared Translation Tasks. Our news task submissions for English-German (En-De) and English-Russian (En-Ru) are built on top of a baseline transformer-based sequence-to-sequence model. Specifically, we use a combination of 1) checkpoint averaging, 2) model scaling, 3) data augmentation with backtranslation and knowledge distillation from right-to-left factorized models, 4) finetuning on test sets from previous years, 5) model ensembling, 6) shallow fusion decoding with transformer language models, and 7) noisy channel re-ranking. Additionally, our biomedical task submission for English-Russian uses a biomedically biased vocabulary and is trained from scratch on news task data, medically relevant text curated from the news task dataset, and…
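Two of the techniques listed in the abstract, checkpoint averaging and noisy channel re-ranking, can be illustrated with a minimal sketch. This is not the NeMo implementation: the function names, the dictionary layout for checkpoints and candidates, and the weights `lambda_reverse` and `lambda_lm` are all hypothetical. It assumes each checkpoint maps parameter names to flat lists of floats, and that each candidate translation already carries three log-probabilities: forward model log p(y|x), reverse model log p(x|y), and language model log p(y).

```python
from typing import Dict, List, Sequence


def average_checkpoints(
    checkpoints: Sequence[Dict[str, List[float]]],
) -> Dict[str, List[float]]:
    """Return the element-wise mean of each parameter across checkpoints."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    n = len(checkpoints)
    averaged: Dict[str, List[float]] = {}
    for name in checkpoints[0]:
        # Group the i-th value of this parameter from every checkpoint.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        averaged[name] = [sum(col) / n for col in columns]
    return averaged


def noisy_channel_rerank(
    candidates: Sequence[Dict[str, object]],
    lambda_reverse: float = 1.0,  # hypothetical weight on log p(x|y)
    lambda_lm: float = 1.0,       # hypothetical weight on log p(y)
) -> Dict[str, object]:
    """Pick the candidate with the highest combined log-probability score."""

    def score(c: Dict[str, object]) -> float:
        # forward: log p(y|x), reverse: log p(x|y), lm: log p(y)
        return (
            float(c["forward"])
            + lambda_reverse * float(c["reverse"])
            + lambda_lm * float(c["lm"])
        )

    return max(candidates, key=score)


# Toy inputs, invented purely for illustration.
avg = average_checkpoints([{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}])
cands = [
    {"text": "A", "forward": -1.0, "reverse": -5.0, "lm": -4.0},
    {"text": "B", "forward": -2.0, "reverse": -2.0, "lm": -3.0},
]
best = noisy_channel_rerank(cands)
```

In the toy data, candidate A has the better forward score, but B wins once the reverse-model and language-model scores are added, which is the effect re-ranking is meant to capture. In practice the weights would be tuned on a development set, and scores are often length-normalized; neither detail is specified in the abstract above.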
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: Knowledge Distillation
