Bilingual Dictionary-based Language Model Pretraining for Neural Machine Translation
Yusen Lin, Jiayong Lin, Shuaicheng Zhang, Haoying Dai

TL;DR
This paper introduces a Bilingual Dictionary-based Language Model (BDLM) that improves neural machine translation by incorporating translation information from bilingual dictionaries during pretraining, with gains reported in particular for low-resource settings.
Contribution
BDLM integrates bilingual dictionary data into language model pretraining, reducing the reliance on expensive parallel corpora that Translation Language Modeling (TLM) requires, and improving translation quality.
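As a rough illustration of how dictionary information could enter a masked-language-model objective, the sketch below masks tokens that have a dictionary entry and uses the dictionary translation as the prediction target. This is an assumption made for illustration, not the authors' exact procedure: the function name `make_dictionary_masked_example`, the `[MASK]` symbol, the `mask_prob` parameter, and the toy dictionary entries are all hypothetical.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, not taken from the paper


def make_dictionary_masked_example(tokens, bilingual_dict, mask_prob=0.15, rng=None):
    """Build one (inputs, labels) pair for a dictionary-based masking objective.

    A token is masked only if it has a dictionary entry and a random draw
    falls below mask_prob. The label at a masked position is the dictionary
    translation; labels are None at all unmasked positions.
    """
    rng = rng or random.Random()
    inputs, labels = [], []
    for tok in tokens:
        translation = bilingual_dict.get(tok)
        if translation is not None and rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(translation)
        else:
            inputs.append(tok)
            labels.append(None)
    return inputs, labels


# Toy English-to-Chinese dictionary (illustrative entries only).
toy_dict = {"cat": "猫", "water": "水"}

# mask_prob=1.0 masks every token that has a dictionary entry.
inputs, labels = make_dictionary_masked_example(
    ["the", "cat", "drinks", "water"], toy_dict, mask_prob=1.0
)
print(inputs)
print(labels)
```

Under this sketch only monolingual text and a word-level dictionary are needed to build training examples, which is the sense in which such an objective would avoid parallel sentence pairs.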
Findings
Outperforms the vanilla Transformer on Chinese-English translation by more than 8.4 BLEU on WMT-News19 and 2.3 BLEU on WMT20 news-commentary.
Faster convergence and better rare word prediction.
Effective in low-resource Romanian-English translation.
Abstract
Recent studies have demonstrated a perceivable improvement in the performance of neural machine translation by applying cross-lingual language model pretraining (Lample and Conneau, 2019), especially Translation Language Modeling (TLM). To alleviate TLM's need for expensive parallel corpora, in this work we incorporate translation information from dictionaries into the pretraining process and propose a novel Bilingual Dictionary-based Language Model (BDLM). We evaluate our BDLM on Chinese, English, and Romanian. For Chinese-English, we obtained 55.0 BLEU on WMT-News19 (Tiedemann, 2012) and 24.3 BLEU on WMT20 news-commentary, outperforming the vanilla Transformer (Vaswani et al., 2017) by more than 8.4 BLEU and 2.3 BLEU, respectively. According to our results, the BDLM also has advantages in convergence speed and in predicting rare words. The increase in BLEU for WMT16…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Dropout · Attention Is All You Need · Label Smoothing · Adam · Residual Connection · Multi-Head Attention
