Bilingual Dictionary-based Language Model Pretraining for Neural Machine Translation

Yusen Lin; Jiayong Lin; Shuaicheng Zhang; Haoying Dai

arXiv:2103.07040 · cs.CL · March 15, 2021 · 1 cites

Open Access

TL;DR

This paper introduces a Bilingual Dictionary-based Language Model (BDLM) that improves neural machine translation by incorporating translation information from bilingual dictionaries during pretraining, with gains reported in low-resource settings as well.

Contribution

The novel BDLM method integrates dictionary data into language model pretraining, reducing reliance on parallel corpora and improving translation quality.
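
As a rough illustration of how dictionary entries might stand in for parallel sentences during masked-language-model pretraining, the sketch below masks source tokens that have a dictionary entry and records the dictionary translation as an extra prediction target. This is an assumption made for illustration only: the summary here does not describe the actual BDLM objective, and the function name `make_dictionary_example`, the `[MASK]` symbol, and the toy dictionary are all hypothetical.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, not taken from the paper


def make_dictionary_example(tokens, dictionary, mask_prob=0.15, seed=0):
    """Build one illustrative pretraining example.

    Tokens that appear in `dictionary` are masked with probability
    `mask_prob`. For each masked position we keep both the original
    token and its dictionary translation as prediction targets.
    This is a sketch under stated assumptions, not the authors' method.
    """
    rng = random.Random(seed)  # local RNG keeps the example reproducible
    masked = list(tokens)
    targets = {}
    for i, tok in enumerate(tokens):
        if tok in dictionary and rng.random() < mask_prob:
            masked[i] = MASK
            targets[i] = (tok, dictionary[tok])
    return masked, targets


if __name__ == "__main__":
    toy_dictionary = {"喜欢": "like", "猫": "cat"}  # toy entries for illustration
    sentence = ["我", "喜欢", "猫"]
    masked, targets = make_dictionary_example(sentence, toy_dictionary, mask_prob=1.0)
    print(masked)
    print(targets)
```

Only monolingual text and a word-level dictionary are needed to build such examples, which is the property the contribution statement emphasizes; how the real model consumes the dictionary signal may differ.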

Findings

01

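Outperforms the vanilla Transformer on Chinese-English translation by more than 8.4 BLEU on WMT-News19 (55.0 BLEU) and 2.3 BLEU on WMT20 news-commentary (24.3 BLEU).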

02

Shows advantages in convergence speed and in predicting rare words.

03

Effective in low-resource Romanian-English translation.

Abstract

Recent studies have demonstrated a noticeable improvement in the performance of neural machine translation from cross-lingual language model pretraining (Lample and Conneau, 2019), especially Translation Language Modeling (TLM). To alleviate TLM's need for expensive parallel corpora, in this work we incorporate translation information from dictionaries into the pretraining process and propose a novel Bilingual Dictionary-based Language Model (BDLM). We evaluate our BDLM on Chinese, English, and Romanian. For Chinese-English, we obtained 55.0 BLEU on WMT-News19 (Tiedemann, 2012) and 24.3 BLEU on WMT20 news-commentary, outperforming the vanilla Transformer (Vaswani et al., 2017) by more than 8.4 BLEU and 2.3 BLEU, respectively. According to our results, the BDLM also has advantages in convergence speed and in predicting rare words. The increase in BLEU for WMT16…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Dropout · Attention Is All You Need · Label Smoothing · Adam · Residual Connection · Multi-Head Attention