VBD-MT Chinese-Vietnamese Translation Systems for VLSP 2022
Hai Long Trieu, Song Kiet Bui, Tan Minh Tran, Van Khanh Tran, Hai An Nguyen

TL;DR
This paper describes Transformer-based Chinese-Vietnamese translation systems built on mBART and backtranslation, which reach 38.9 BLEU (Chinese-Vietnamese) and 38.0 BLEU (Vietnamese-Chinese) on the public test sets of the VLSP 2022 shared task.
Contribution
It builds on the existing multilingual denoising pre-trained model mBART and adds a sampling method for backtranslation, ensembling, and postprocessing to improve translation quality.
Findings
Achieved 38.9 BLEU on Chinese-Vietnamese translation (public test set).
Achieved 38.0 BLEU on Vietnamese-Chinese translation (public test set).
Outperformed several strong baselines.
Abstract
We present the systems we submitted to the VLSP 2022 machine translation shared task. This year we participated in both translation directions, i.e., Chinese-Vietnamese and Vietnamese-Chinese. We build our systems on the neural Transformer model with the powerful multilingual denoising pre-trained model mBART. The systems are enhanced by a sampling method for backtranslation, which leverages large-scale available monolingual data. Additionally, several other methods are applied to improve translation quality, including ensembling and postprocessing. We achieve 38.9 BLEU on Chinese-Vietnamese and 38.0 BLEU on Vietnamese-Chinese on the public test sets, outperforming several strong baselines.
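The abstract does not spell out how the sampling method for backtranslation works. The sketch below illustrates one common variant, assuming that synthetic source sentences are drawn by sampling from a reverse (target-to-source) model instead of taking its single best output. Everything named here (`TOY_REVERSE_MODEL`, `sample_translation`, `back_translate`, the temperature parameter) is hypothetical and stands in for a real Vietnamese-to-Chinese model; it is not the authors' implementation.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical stand-in for a reverse (Vietnamese -> Chinese) model:
# each target token maps to a probability distribution over source tokens.
TOY_REVERSE_MODEL: Dict[str, Dict[str, float]] = {
    "tôi": {"我": 1.0},
    "yêu": {"爱": 0.8, "喜欢": 0.2},
    "bạn": {"你": 0.7, "您": 0.3},
}


def sample_translation(target: str, rng: random.Random,
                       temperature: float = 1.0) -> str:
    """Sample one synthetic source sentence, token by token.

    Tokens unknown to the toy model are copied through unchanged.
    """
    out: List[str] = []
    for token in target.split():
        dist = TOY_REVERSE_MODEL.get(token)
        if dist is None:
            out.append(token)
            continue
        candidates = list(dist)
        # Temperature below 1 sharpens the distribution, above 1 flattens it.
        weights = [p ** (1.0 / temperature) for p in dist.values()]
        out.append(rng.choices(candidates, weights=weights, k=1)[0])
    return " ".join(out)


def back_translate(mono_targets: List[str], n_samples: int = 1,
                   seed: int = 0) -> List[Tuple[str, str]]:
    """Turn monolingual target sentences into (synthetic source, target) pairs."""
    rng = random.Random(seed)
    pairs: List[Tuple[str, str]] = []
    for target in mono_targets:
        for _ in range(n_samples):
            pairs.append((sample_translation(target, rng), target))
    return pairs


if __name__ == "__main__":
    monolingual_vi = ["tôi yêu bạn"]
    for src, tgt in back_translate(monolingual_vi, n_samples=3):
        print(src, "=>", tgt)
```

In a setup like this, the synthetic pairs would be mixed with the genuine parallel data before training the forward model; sampling rather than deterministic decoding yields more varied synthetic sources from the same monolingual sentence.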
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Handwritten Text Recognition Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Adam · Softmax · Label Smoothing · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Layer Normalization · Linear Layer · Residual Connection
