WeChat Neural Machine Translation Systems for WMT21
Xianfeng Zeng, Yijin Liu, Ernan Li, Qiu Ran, Fandong Meng, Peng Li, Jinan Xu, Jie Zhou

TL;DR
This paper presents WeChat AI's advanced neural machine translation systems for multiple language pairs in WMT21, utilizing novel Transformer variants, extensive data augmentation, and ensemble techniques to achieve top BLEU scores.
Contribution
Introduction of effective Transformer variants and comprehensive data augmentation strategies for high-performance machine translation in WMT21.
Findings
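Constrained systems reached 36.9, 46.9, 27.8 and 31.3 case-sensitive BLEU on English->Chinese, English->Japanese, Japanese->English and English->German, respectively; the first three scores are the highest among all submissions.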
Utilized large-scale synthetic data generation methods like back-translation.
Implemented advanced finetuning and model ensemble techniques.
Abstract
This paper introduces WeChat AI's participation in the WMT 2021 shared news translation task on English->Chinese, English->Japanese, Japanese->English and English->German. Our systems are based on the Transformer (Vaswani et al., 2017) with several novel and effective variants. In our experiments, we employ data filtering, large-scale synthetic data generation (i.e., back-translation, knowledge distillation, forward-translation, iterative in-domain knowledge transfer), advanced finetuning approaches, and boosted Self-BLEU based model ensemble. Our constrained systems achieve 36.9, 46.9, 27.8 and 31.3 case-sensitive BLEU scores on English->Chinese, English->Japanese, Japanese->English and English->German, respectively. The BLEU scores of English->Chinese, English->Japanese and Japanese->English are the highest among all submissions, and that of English->German is the highest among all…
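The abstract names back-translation as one of the synthetic data generation methods. As a rough illustration of the general technique only (not the authors' pipeline), the sketch below pairs genuine target-language sentences with machine-generated source sentences. The names `back_translate`, `reverse_translate` and `toy_reverse_model` are assumptions made for this example; `reverse_translate` stands in for a trained target->source model, which is not provided here.

```python
from typing import Callable, Iterable, List, Tuple


def back_translate(
    target_monolingual: Iterable[str],
    reverse_translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Build synthetic (source, target) pairs from target-side monolingual text.

    `reverse_translate` is a hypothetical stand-in for a trained
    target->source model; it is not an API taken from the paper.
    """
    pairs: List[Tuple[str, str]] = []
    for target_sentence in target_monolingual:
        target_sentence = target_sentence.strip()
        if not target_sentence:
            continue  # skip empty lines
        synthetic_source = reverse_translate(target_sentence)
        # The synthetic text becomes the source; the genuine text stays the target.
        pairs.append((synthetic_source, target_sentence))
    return pairs


def toy_reverse_model(sentence: str) -> str:
    # Placeholder "model": upper-cases the text so the example runs
    # without any real translation system.
    return sentence.upper()


synthetic_pairs = back_translate(["guten morgen", "", "danke"], toy_reverse_model)
```

In practice the resulting pairs would typically be mixed with the genuine parallel data before training the source->target model; how the paper filters or weights such data is not stated in this summary.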
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dropout · Dense Connections · Adam · Label Smoothing · Residual Connection · Byte Pair Encoding
