Conditional Bilingual Mutual Information Based Adaptive Training for Neural Machine Translation
Songming Zhang, Yijin Liu, Fandong Meng, Yufeng Chen, Jinan Xu, Jian Liu, Jie Zhou

TL;DR
This paper introduces a target-context-aware statistical metric called CBMI for adaptive training in neural machine translation, improving translation quality by better weighting target tokens based on context.
Contribution
The paper proposes CBMI, a novel efficient metric that incorporates target context into token weighting, enhancing adaptive training for neural machine translation.
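Read literally, the name suggests a pointwise mutual information between the source sentence and a single target token, conditioned on the preceding target context. Under that reading (an assumption here, with $\mathbf{x}$ the source sentence, $y_j$ the $j$-th target token and $\mathbf{y}_{<j}$ its target prefix), the chain rule $p(\mathbf{x}, y_j \mid \mathbf{y}_{<j}) = p(y_j \mid \mathbf{x}, \mathbf{y}_{<j})\, p(\mathbf{x} \mid \mathbf{y}_{<j})$ reduces the metric to a ratio of two next-token probabilities:

```latex
\mathrm{CBMI}(\mathbf{x}; y_j)
  = \log \frac{p(\mathbf{x}, y_j \mid \mathbf{y}_{<j})}
              {p(\mathbf{x} \mid \mathbf{y}_{<j})\, p(y_j \mid \mathbf{y}_{<j})}
  = \log \frac{p(y_j \mid \mathbf{x}, \mathbf{y}_{<j})}
              {p(y_j \mid \mathbf{y}_{<j})}
```

If the numerator is taken from the translation model and the denominator from a target-side language model, CBMI becomes a difference of two log-probabilities that can be produced in the same training step. That would explain why no table of context-conditioned statistics has to be computed or stored.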
Findings
Significant improvements over baseline models on WMT datasets.
Efficient computation of CBMI without large storage overhead.
Outperforms existing adaptive training methods.
Abstract
Token-level adaptive training approaches can alleviate the token imbalance problem and thus improve neural machine translation, through re-weighting the losses of different target tokens based on specific statistical metrics (e.g., token frequency or mutual information). Given that standard translation models make predictions on the condition of previous target contexts, we argue that the above statistical metrics ignore target context information and may assign inappropriate weights to target tokens. While one possible solution is to directly take target contexts into these statistical metrics, the target-context-aware statistical computing is extremely expensive, and the corresponding storage overhead is unrealistic. To solve the above issues, we propose a target-context-aware metric, named conditional bilingual mutual information (CBMI), which makes it feasible to supplement target…
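The abstract frames token-level adaptive training as re-weighting each target token's loss by a statistical metric. The sketch below contrasts a context-free, frequency-based weight with a context-aware CBMI weight. It rests on assumptions not stated in this section: CBMI is taken as the translation model's log-probability of the reference token minus a target-side language model's log-probability of the same token. The frequency formula, the standardize-and-scale mapping, the parameters `scale` and `alpha`, the function names (`cbmi_scores`, `weights_from_scores`, `frequency_weights`, `weighted_nll`) and all toy numbers are hypothetical illustrations, not the paper's implementation.

```python
import math
from collections import Counter


def cbmi_scores(nmt_logprobs, lm_logprobs):
    """Per-token CBMI, assumed here to be
    log p_NMT(y_j | x, y_<j) - log p_LM(y_j | y_<j)."""
    if len(nmt_logprobs) != len(lm_logprobs):
        raise ValueError("both log-probability lists must have the same length")
    return [nmt - lm for nmt, lm in zip(nmt_logprobs, lm_logprobs)]


def weights_from_scores(scores, scale=0.1):
    """Hypothetical mapping: standardize scores within the sentence, then use
    w_j = max(0, 1 + scale * z_j) so the mean weight stays near 1."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / n
    std = math.sqrt(var) if var > 0 else 1.0  # constant scores -> all weights 1
    return [max(0.0, 1.0 + scale * (s - mean) / std) for s in scores]


def frequency_weights(tokens, counts, alpha=0.1):
    """Hypothetical context-free baseline: rarer tokens get larger weights,
    and a given token type always receives the same weight."""
    max_count = max(counts.values())
    return [1.0 + alpha * math.log(max_count / counts[t]) for t in tokens]


def weighted_nll(nmt_logprobs, weights):
    """Token-level adaptive training loss: mean of w_j * -log p(y_j | x, y_<j)."""
    return sum(-w * lp for w, lp in zip(weights, nmt_logprobs)) / len(nmt_logprobs)


# Toy example: made-up log-probabilities for one reference sentence.
tokens = ["the", "cat", "saw", "the", "dog"]
nmt_lp = [-0.1, -0.5, -1.2, -0.2, -0.4]  # translation model: sees source + prefix
lm_lp = [-0.3, -3.0, -2.0, -0.2, -2.5]   # target-side LM: sees prefix only
counts = Counter({"the": 1000, "cat": 10, "saw": 50, "dog": 20})

freq_w = frequency_weights(tokens, counts)
cbmi_w = weights_from_scores(cbmi_scores(nmt_lp, lm_lp))

for tok, fw, cw in zip(tokens, freq_w, cbmi_w):
    print(f"{tok:>4}  freq={fw:.3f}  cbmi={cw:.3f}")
print("loss (freq):", round(weighted_nll(nmt_lp, freq_w), 4))
print("loss (cbmi):", round(weighted_nll(nmt_lp, cbmi_w), 4))
```

Both occurrences of "the" receive the same frequency weight (1.000), because that metric sees only the token type. Their CBMI weights differ: for the second "the" the language model already predicts the token as well as the translation model does, so its CBMI is 0 and it is down-weighted relative to the first. This is the kind of context sensitivity the abstract argues context-free metrics miss.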
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Dense Connections · Residual Connection · Label Smoothing · Softmax · Adam · Absolute Position Encodings
