The Volctrans Machine Translation System for WMT20
Liwei Wu, Xiao Pan, Zehui Lin, Yaoming Zhu, Mingxuan Wang, Lei Li

TL;DR
This paper presents the VolcTrans machine translation system developed for the WMT20 shared news translation task. It uses Transformer variants, data augmentation, and multilingual pre-training to improve translation quality across eight translation directions.
Contribution
The paper introduces a comprehensive translation system combining Transformer variants, data selection, synthetic data, ensemble methods, and multilingual pre-training for WMT20.
Findings
Achieved competitive translation performance across the eight translation directions.
Demonstrated the effectiveness of data augmentation and ensemble techniques.
Showed the benefit of multilingual pre-training for translation quality.
Abstract
This paper describes our VolcTrans system for the WMT20 shared news translation task. We participated in 8 translation directions. Our basic systems are based on the Transformer, with several variants (wider or deeper Transformers, dynamic convolutions). The final system includes text pre-processing, data selection, synthetic data generation, advanced model ensembling, and multilingual pre-training.
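The abstract does not say how the ensemble combines its member models. As a hedged illustration only, the sketch below shows one common generic scheme: at each decoding step, average the per-token log-probabilities of several models and pick the best token greedily. It is not the authors' "advanced model ensemble". The `Model` callable type, `ensemble_step`, `greedy_ensemble_decode`, and the toy distributions `model_a` and `model_b` are all hypothetical names introduced for this example.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical model interface: given the target prefix generated so far,
# return a probability distribution over candidate next tokens.
Model = Callable[[List[str]], Dict[str, float]]


def ensemble_step(models: Sequence[Model], prefix: List[str]) -> Dict[str, float]:
    """Score each candidate token by its mean log-probability across models."""
    dists = [m(prefix) for m in models]
    vocab = set()
    for d in dists:
        vocab.update(d)
    scores = {}
    for tok in vocab:
        # A token a model gives no mass gets a tiny floor to avoid log(0).
        scores[tok] = sum(math.log(d.get(tok, 1e-12)) for d in dists) / len(dists)
    return scores


def greedy_ensemble_decode(models: Sequence[Model], max_len: int = 20,
                           eos: str = "</s>") -> List[str]:
    """Greedy decoding that picks the best-scoring token under the ensemble."""
    prefix: List[str] = []
    for _ in range(max_len):
        scores = ensemble_step(models, prefix)
        best = max(scores, key=scores.get)
        prefix.append(best)
        if best == eos:
            break
    return prefix


# Toy, hypothetical "models": fixed distributions indexed by prefix length.
def model_a(prefix: List[str]) -> Dict[str, float]:
    table = [{"hello": 0.6, "hi": 0.4}, {"world": 0.7, "</s>": 0.3}, {"</s>": 1.0}]
    return table[min(len(prefix), 2)]


def model_b(prefix: List[str]) -> Dict[str, float]:
    table = [{"hello": 0.3, "hi": 0.7}, {"world": 0.6, "</s>": 0.4}, {"</s>": 1.0}]
    return table[min(len(prefix), 2)]


print(greedy_ensemble_decode([model_a]))           # ['hello', 'world', '</s>']
print(greedy_ensemble_decode([model_a, model_b]))  # ['hi', 'world', '</s>']
```

Averaging log-probabilities amounts to taking a geometric mean of the member distributions, so a token wins only if the models broadly agree on it. In the toy run, `model_a` alone prefers "hello", but the ensemble picks "hi" because `model_b` is more confident about it. Averaging raw probabilities instead is an equally common alternative.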
Taxonomy
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Attention Is All You Need · Residual Connection · Multi-Head Attention · Layer Normalization · Byte Pair Encoding · Softmax · Adam
