Coordinate System Selection for Minimum Error Rate Training in Statistical Machine Translation
Chen Lijiang

TL;DR
This paper proposes a coordinate system selection method for minimum error rate training in statistical machine translation, improving convergence and alignment with feature distributions without additional language knowledge.
Contribution
It introduces coordinate system selection into MERT, creating multiple search directions to enhance training effectiveness and avoid local optima.
Findings
Selecting coordinate systems by tuning-set results yields better translation quality.
The improvement requires no additional language knowledge.
MERT converges more reliably when it searches along the selected coordinate system.
Abstract
Minimum error rate training (MERT) is a widely used training procedure for statistical machine translation. A general problem of this approach is that the search easily converges to a local optimum, and the acquired weight set does not accord with the real distribution of feature functions. This paper introduces coordinate system selection (RSS) into the search algorithm for MERT. In contrast to previous approaches, in which every dimension corresponds to exactly one independent feature function, we create several coordinate systems by moving one of the dimensions to a new direction. The basic idea is simple but critical: the training procedure of MERT should be based on a coordinate system formed by search directions, not directly on feature functions. Experiments show that by selecting coordinate systems with tuning set results, better results can be obtained without…
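The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the toy quadratic `tuning_error` stands in for a real tuning-set error (such as 1 - BLEU over n-best lists), and the function names, the 45-degree rotation of one axis, and the grid line search are all assumptions made for illustration. The sketch builds several coordinate systems by moving one axis to a new direction, runs coordinate-wise search under each, and keeps the system with the lowest tuning error.

```python
import math

def tuning_error(w):
    # Hypothetical stand-in for the tuning-set error; a real MERT run
    # would rescore n-best lists and compute an error such as 1 - BLEU.
    x, y = w
    return (x - y) ** 2 + 0.1 * (x + y - 2) ** 2

def line_search(f, w, d, steps):
    # Try each step size along direction d and keep the best point found.
    best_w, best_e = w, f(w)
    for s in steps:
        cand = [wi + s * di for wi, di in zip(w, d)]
        e = f(cand)
        if e < best_e:
            best_w, best_e = cand, e
    return best_w, best_e

def coordinate_search(f, w0, system, steps, sweeps=5):
    # Optimize one search direction at a time; `system` is a list of
    # direction vectors, not necessarily the feature axes.
    w, e = list(w0), f(w0)
    for _ in range(sweeps):
        for d in system:
            w, e = line_search(f, w, d, steps)
    return w, e

def candidate_systems(dim, angle=math.pi / 4):
    # First candidate: the standard axes (one direction per feature).
    axes = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    systems = [axes]
    # Further candidates: move a single axis i toward axis j (assumed scheme).
    for i in range(dim):
        j = (i + 1) % dim
        moved = [row[:] for row in axes]
        moved[i] = [0.0] * dim
        moved[i][i] = math.cos(angle)
        moved[i][j] = math.sin(angle)
        systems.append(moved)
    return systems

steps = [k / 4 for k in range(-8, 9)]  # step sizes from -2 to 2
results = [coordinate_search(tuning_error, [0.0, 0.0], s, steps)
           for s in candidate_systems(2)]
baseline_e = results[0][1]                          # feature-axis system
best_w, best_e = min(results, key=lambda r: r[1])   # selected system
```

Because the feature-axis system is itself one of the candidates, selection by tuning error can never do worse than the baseline on the tuning set; whether it helps on held-out data is an empirical question that this toy example does not address.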
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Algorithms and Data Compression
