Conditional Bilingual Mutual Information Based Adaptive Training for Neural Machine Translation

Songming Zhang; Yijin Liu; Fandong Meng; Yufeng Chen; Jinan Xu; Jian Liu; Jie Zhou

arXiv:2203.02951·cs.CL·March 8, 2022

TL;DR

This paper introduces a target-context-aware statistical metric called CBMI for adaptive training in neural machine translation, improving translation quality by better weighting target tokens based on context.

Contribution

The paper proposes CBMI, a novel efficient metric that incorporates target context into token weighting, enhancing adaptive training for neural machine translation.

Findings

01 Significant improvements over baseline models on WMT datasets.

02 Efficient computation of CBMI without large storage overhead.

03 Outperforms existing adaptive training methods.

Abstract

Token-level adaptive training approaches can alleviate the token imbalance problem and thus improve neural machine translation, through re-weighting the losses of different target tokens based on specific statistical metrics (e.g., token frequency or mutual information). Given that standard translation models make predictions on the condition of previous target contexts, we argue that the above statistical metrics ignore target context information and may assign inappropriate weights to target tokens. While one possible solution is to directly take target contexts into these statistical metrics, the target-context-aware statistical computing is extremely expensive, and the corresponding storage overhead is unrealistic. To solve the above issues, we propose a target-context-aware metric, named conditional bilingual mutual information (CBMI), which makes it feasible to supplement target…
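
As a rough illustration of the token-level re-weighting the abstract describes, the sketch below scores each target token by the log ratio of a translation-model probability to a language-model probability (the formulation given in the full paper; the excerpt above is truncated before it) and turns those scores into loss weights. The function names, the standardize-then-shift weighting rule, the `scale` value, and the toy probabilities are all assumptions for illustration, not the authors' implementation in songmzhang/cbmi.

```python
import math

def cbmi(p_nmt, p_lm):
    """Per-token score: log(p_NMT(y_t | x, y_<t) / p_LM(y_t | y_<t))."""
    return [math.log(a) - math.log(b) for a, b in zip(p_nmt, p_lm)]

def token_weights(scores, scale=0.1):
    """Hypothetical weighting rule (an assumption, not the paper's exact one):
    standardize the scores within the sentence, then shift them around 1
    and clip at 0 so that no token receives a negative weight."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / n
    std = math.sqrt(var) or 1.0  # fall back to 1.0 when all scores are equal
    return [max(0.0, 1.0 + scale * (s - mean) / std) for s in scores]

def weighted_nll(p_nmt, weights):
    """Weighted negative log-likelihood, averaged over target tokens."""
    return sum(-w * math.log(p) for w, p in zip(weights, p_nmt)) / len(p_nmt)

# Toy probabilities of the reference tokens under each model (made up).
p_nmt = [0.9, 0.2, 0.6]  # translation model: conditions on source and target prefix
p_lm = [0.3, 0.2, 0.1]   # language model: conditions on target prefix only

scores = cbmi(p_nmt, p_lm)
weights = token_weights(scores)
loss = weighted_nll(p_nmt, weights)
```

In this sketch, a token whose probability rises once the source sentence is available gets a positive score and a weight above 1, while a token the language model predicts equally well on its own gets a score of 0. Both probabilities come from model forward passes at training time, so no corpus-wide table of context-dependent statistics has to be stored.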

Code & Models

Repositories

songmzhang/cbmi (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Dense Connections · Residual Connection · Label Smoothing · Softmax · Adam · Absolute Position Encodings