A simple discriminative training method for machine translation with large-scale features
Tian Xia, Shaodan Zhai, Shaojun Wang

TL;DR
This paper introduces a new discriminative training method for statistical machine translation that simplifies implementation while maintaining robustness and effectiveness with large-scale features.
Contribution
A novel training approach that treats N-best lists as permutations and minimizes Plackett-Luce loss, offering an easier-to-implement alternative to MIRAs.
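For reference, the standard Plackett-Luce model assigns a permutation π of an N-best list the probability below. Training minimizes the negative log-probability of the ground-truth permutation π*. Here s_i is assumed to be a linear model score over the feature vector h(e_i) of hypothesis e_i. How π* is built (for example, by sorting hypotheses on sentence-level BLEU) is an assumption and is not stated in this summary.

```latex
P(\pi \mid \mathbf{w}) = \prod_{j=1}^{N} \frac{\exp\big(s_{\pi(j)}\big)}{\sum_{k=j}^{N} \exp\big(s_{\pi(k)}\big)},
\qquad s_i = \mathbf{w}^\top \mathbf{h}(e_i),
\qquad \mathcal{L}(\mathbf{w}) = -\log P(\pi^{*} \mid \mathbf{w})
```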
Findings
More robust than MERT in experiments
Comparable to MIRAs in performance
Simpler to implement than MIRAs
Abstract
Margin-infused relaxed algorithms (MIRAs) dominate model tuning in statistical machine translation when large-scale features are used, but they are also notorious for being complex to implement. We introduce a new method that regards an N-best list as a permutation and minimizes the Plackett-Luce loss of ground-truth permutations. Experiments with large-scale features demonstrate that the new method is more robust than MERT. Although it only matches MIRAs in performance, it has a comparative advantage: it is easier to implement.
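The idea of treating an N-best list as a permutation and minimizing its Plackett-Luce loss can be illustrated with a minimal sketch. It rests on several assumptions not confirmed by this summary:

- Each hypothesis gets a linear score w·h(e).
- The ground-truth permutation sorts hypotheses by a hypothetical sentence-level BLEU value.
- Plain gradient descent stands in for whatever optimizer the authors actually used.

All function names and the toy numbers are hypothetical.

```python
import math
from typing import List, Sequence

# Minimal sketch (assumptions): linear model score s_i = w . h(e_i); the
# ground-truth permutation sorts hypotheses by a hypothetical sentence-BLEU;
# plain gradient descent stands in for the authors' optimizer.


def scores(weights: Sequence[float], feats: List[Sequence[float]]) -> List[float]:
    """Linear model score w . h(e_i) for every hypothesis in the N-best list."""
    return [sum(w * f for w, f in zip(weights, h)) for h in feats]


def log_sum_exp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def plackett_luce_loss(weights, feats, order) -> float:
    """Negative log-likelihood of `order` under the Plackett-Luce model.

    order[j] is the index of the hypothesis ranked j-th in the ground truth.
    """
    s = scores(weights, feats)
    loss = 0.0
    for j in range(len(order)):
        # -log of the probability of picking order[j] among those not yet placed
        loss += log_sum_exp([s[i] for i in order[j:]]) - s[order[j]]
    return loss


def plackett_luce_grad(weights, feats, order) -> List[float]:
    """Gradient of plackett_luce_loss with respect to the weights."""
    s = scores(weights, feats)
    grad = [0.0] * len(weights)
    for j in range(len(order)):
        rest = order[j:]
        z = log_sum_exp([s[i] for i in rest])
        for i in rest:
            p = math.exp(s[i] - z)  # softmax probability among remaining hypotheses
            for d in range(len(weights)):
                grad[d] += p * feats[i][d]
        for d in range(len(weights)):
            grad[d] -= feats[order[j]][d]  # minus the features of the correct choice
    return grad


def gradient_step(weights, feats, order, lr: float = 0.1) -> List[float]:
    """One plain gradient-descent update on the Plackett-Luce loss."""
    g = plackett_luce_grad(weights, feats, order)
    return [w - lr * gi for w, gi in zip(weights, g)]


# Toy N-best list: 3 hypotheses with 2 features each (hypothetical numbers).
feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
bleu = [0.2, 0.6, 0.4]  # hypothetical sentence-level BLEU per hypothesis
order = sorted(range(len(bleu)), key=lambda i: -bleu[i])  # ground truth: [1, 2, 0]

w = [0.0, 0.0]
before = plackett_luce_loss(w, feats, order)  # log(3!) when all scores are equal
for _ in range(50):
    w = gradient_step(w, feats, order)
after = plackett_luce_loss(w, feats, order)
print(f"loss {before:.4f} -> {after:.4f}, weights {w}")
```

Each term of the loss is a log-sum-exp minus a linear score, so the loss is convex in w. On this toy list, plain gradient descent moves the weights until the model's ranking agrees with the ground-truth permutation. No per-example constrained optimization is involved, which is one way to see why such a loss can be simpler to implement.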
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Handwritten Text Recognition Techniques
