WeChat Neural Machine Translation Systems for WMT20

Fandong Meng; Jianhao Yan; Yijin Liu; Yuan Gao; Xianfeng Zeng; Qinsong Zeng; Peng Li; Ming Chen; Jie Zhou; Sifan Liu; Hao Zhou

arXiv:2010.00247 · cs.CL · November 22, 2020

TL;DR

This paper presents a Chinese-to-English neural machine translation system for the WMT20 news translation task. It combines Transformer variants, data augmentation, and model ensembling to achieve the highest BLEU score among all submissions.

Contribution

The paper combines data selection, synthetic data generation, and advanced finetuning on top of Transformer variants and the DTMT architecture to improve translation quality.

Findings

01

Achieved a case-sensitive BLEU score of 36.9, the highest among all submissions.

02

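Made effective use of synthetic data (back-translation, knowledge distillation, and iterative in-domain knowledge transfer) and a self-BLEU based model ensemble.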

03

Demonstrated the benefit of in-domain knowledge transfer.

Abstract

We participate in the WMT 2020 shared news translation task on Chinese to English. Our system is based on the Transformer (Vaswani et al., 2017a) with effective variants and the DTMT (Meng and Zhang, 2019) architecture. In our experiments, we employ data selection, several synthetic data generation approaches (i.e., back-translation, knowledge distillation, and iterative in-domain knowledge transfer), advanced finetuning approaches, and self-BLEU based model ensemble. Our constrained Chinese to English system achieves a case-sensitive BLEU score of 36.9, which is the highest among all submissions.
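The abstract names a self-BLEU based model ensemble but does not describe the procedure here. As a rough illustration of the general idea only, the sketch below scores each candidate model's translations against every other model's translations with BLEU and prefers models whose outputs overlap least with the rest. Everything in it is an assumption made for illustration: the function names (`corpus_bleu`, `average_self_bleu`, `pick_diverse`), the toy sentences, the whitespace tokenization, the unsmoothed BLEU, and the "lowest average self-BLEU first" selection rule. It should not be read as the authors' actual method.

```python
import math
from collections import Counter
from itertools import combinations


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus-level BLEU (0..100), one reference per hypothesis."""
    matched = [0] * max_n
    total = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()  # assumption: whitespace tokens
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp_tok, n)
            ref_ngrams = ngram_counts(ref_tok, n)
            # Clipped counts: an n-gram is credited at most as often as the
            # reference contains it.
            matched[n - 1] += sum((hyp_ngrams & ref_ngrams).values())
            total[n - 1] += sum(hyp_ngrams.values())
    if hyp_len == 0 or min(matched) == 0:
        return 0.0  # no smoothing: any empty n-gram order gives zero
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    brevity_penalty = min(1.0, math.exp(1.0 - ref_len / hyp_len))
    return 100.0 * brevity_penalty * math.exp(log_precision)


def average_self_bleu(outputs):
    """Map each model name to its mean BLEU against every other model's output."""
    scores = {name: [] for name in outputs}
    for a, b in combinations(outputs, 2):
        # BLEU is not symmetric, so average both directions for the pair.
        pair = 0.5 * (corpus_bleu(outputs[a], outputs[b])
                      + corpus_bleu(outputs[b], outputs[a]))
        scores[a].append(pair)
        scores[b].append(pair)
    return {name: sum(vals) / len(vals) for name, vals in scores.items()}


def pick_diverse(outputs, k):
    """Hypothetical rule: keep the k models with the lowest average self-BLEU."""
    avg = average_self_bleu(outputs)
    return sorted(avg, key=avg.get)[:k]


# Toy translations of the same two source sentences by three made-up models.
outputs = {
    "model_a": ["the cat sat on the mat today", "he reads a book in the garden"],
    "model_b": ["the cat sat on the mat today", "he reads a book in the garden"],
    "model_c": ["a cat was sitting on the mat today",
                "he is reading a book in the garden"],
}

if __name__ == "__main__":
    print(average_self_bleu(outputs))
    print(pick_diverse(outputs, 2))
```

In this toy setup `model_a` and `model_b` produce identical output, so they score high against each other, while `model_c` differs from both and is ranked as the most diverse. A practical selection would also need to weigh each model's translation quality on a development set, which this sketch leaves out.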

Taxonomy

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dense Connections · Dropout · Layer Normalization · Byte Pair Encoding · Label Smoothing · Multi-Head Attention · Attention Is All You Need