Pronoun-Targeted Fine-tuning for NMT with Hybrid Losses
Prathyusha Jwalapuram, Shafiq Joty, Youlin Shen

TL;DR
This paper presents a novel fine-tuning approach for neural machine translation that enhances pronoun translation accuracy and BLEU scores without additional data, using hybrid loss functions and targeted objectives.
Contribution
The paper introduces a class of conditional generative-discriminative hybrid losses for fine-tuning trained NMT models, improving pronoun translation and overall BLEU scores on De-En, Fr-En, and Cs-En without extra data.
Findings
The sentence-level model improves BLEU by 0.5 on both the WMT14 and IWSLT13 De-En test sets.
The contextual model achieves the best results, improving BLEU from 31.81 to 32 on WMT14 De-En and from 32.10 to 33.13 on IWSLT13 De-En.
The method also generalizes to two additional language pairs, Fr-En and Cs-En.
Abstract
Popular Neural Machine Translation model training uses strategies like backtranslation to improve BLEU scores, requiring large amounts of additional data and training. We introduce a class of conditional generative-discriminative hybrid losses that we use to fine-tune a trained machine translation model. Through a combination of targeted fine-tuning objectives and intuitive re-use of the training data the model has failed to adequately learn from, we improve the model performance of both a sentence-level and a contextual model without using any additional data. We target the improvement of pronoun translations through our fine-tuning and evaluate our models on a pronoun benchmark testset. Our sentence-level model shows a 0.5 BLEU improvement on both the WMT14 and the IWSLT13 De-En testsets, while our contextual model achieves the best results, improving from 31.81 to 32 BLEU on WMT14…
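The abstract does not give the loss formula, so the following is only a minimal sketch of what a generative-discriminative hybrid loss could look like, not the authors' formulation. It assumes the generative term is the negative log-likelihood of the reference translation and the discriminative term is a hinge penalty that fires only when the reference fails to outscore a contrastive candidate (for example, the same sentence with a wrong pronoun). The names `hybrid_loss`, `select_hard_examples`, `alpha`, and `margin` are hypothetical and chosen for illustration.

```python
def sequence_score(token_log_probs):
    """Sum of per-token log-probabilities the model assigns to a sequence."""
    return sum(token_log_probs)


def hybrid_loss(ref_log_probs, neg_log_probs, alpha=0.5, margin=1.0):
    """Hypothetical hybrid loss: generative NLL plus a weighted hinge term.

    ref_log_probs: per-token log-probs of the reference translation.
    neg_log_probs: per-token log-probs of a contrastive (wrong) candidate.
    alpha, margin: illustrative hyperparameters, not taken from the paper.
    """
    ref_score = sequence_score(ref_log_probs)
    neg_score = sequence_score(neg_log_probs)
    generative = -ref_score  # negative log-likelihood of the reference
    # The hinge term is zero once the reference beats the candidate by `margin`.
    discriminative = max(0.0, margin - (ref_score - neg_score))
    return generative + alpha * discriminative


def select_hard_examples(examples, margin=1.0):
    """Keep training examples the model has not yet separated by `margin`.

    Each example is a (ref_log_probs, neg_log_probs) pair. This is one
    assumed reading of re-using data the model failed to learn from.
    """
    return [
        (ref, neg)
        for ref, neg in examples
        if sequence_score(ref) - sequence_score(neg) < margin
    ]
```

Under these assumptions, an example whose reference already outscores the contrastive candidate by the margin contributes only the usual likelihood term, so the extra fine-tuning signal concentrates on the cases the trained model still gets wrong.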
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
