Multi-Dimensional Machine Translation Evaluation: Model Evaluation and Resource for Korean
Dojun Park, Sebastian Padó

TL;DR
This paper introduces a new English-Korean MT evaluation benchmark based on MQM and shows how multi-task models can predict fine-grained quality scores, making translation quality assessment more interpretable.
Contribution
It provides the first MQM benchmark for English-Korean translation and applies multi-task learning with state-of-the-art language models to fine-grained MT quality prediction.
Findings
Reference-free models excel on the style dimension.
Reference-based models perform better on the accuracy dimension.
RemBERT is the most effective model for this task.
Abstract
Almost all frameworks for the manual or automatic evaluation of machine translation characterize the quality of an MT output with a single number. An exception is the Multidimensional Quality Metrics (MQM) framework which offers a fine-grained ontology of quality dimensions for scoring (such as style, fluency, accuracy, and terminology). Previous studies have demonstrated the feasibility of MQM annotation but there are, to our knowledge, no computational models that predict MQM scores for novel texts, due to a lack of resources. In this paper, we address these shortcomings by (a) providing a 1200-sentence MQM evaluation benchmark for the language pair English-Korean and (b) reframing MT evaluation as the multi-task problem of simultaneously predicting several MQM scores using SOTA language models, both in a reference-based MT evaluation setup and a reference-free quality estimation (QE)…
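The multi-task framing in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper names `build_input` and `multitask_mse`, the `[SEP]`-joined input format, the use of mean squared error, and the equal weighting of dimensions are all assumptions made for illustration. Only the four dimension names and the distinction between the reference-based and reference-free setups come from the abstract.

```python
from statistics import mean

# Dimension names taken from the abstract's examples of MQM dimensions.
MQM_DIMENSIONS = ("style", "fluency", "accuracy", "terminology")


def build_input(source, hypothesis, reference=None, sep=" [SEP] "):
    """Join the segments into one input string (hypothetical format).

    With a reference, this mimics reference-based MT evaluation;
    without one, it mimics reference-free quality estimation (QE).
    """
    parts = [source, hypothesis]
    if reference is not None:
        parts.append(reference)
    return sep.join(parts)


def multitask_mse(predicted, gold):
    """Return (overall loss, per-dimension MSE) for a batch of score dicts.

    Assumption: every dimension contributes equally to the overall loss.
    """
    per_dim = {}
    for dim in MQM_DIMENSIONS:
        squared_errors = [(p[dim] - g[dim]) ** 2 for p, g in zip(predicted, gold)]
        per_dim[dim] = mean(squared_errors)
    return mean(list(per_dim.values())), per_dim


# Usage with made-up scores for a single sentence.
predicted = [{"style": 1.0, "fluency": 2.0, "accuracy": 3.0, "terminology": 4.0}]
gold = [{"style": 1.0, "fluency": 2.0, "accuracy": 1.0, "terminology": 4.0}]
loss, per_dim = multitask_mse(predicted, gold)
print(build_input("source sentence", "MT output"))  # reference-free input
print(loss, per_dim["accuracy"])
```

Reporting the per-dimension errors alongside the overall loss is what keeps the evaluation interpretable: a single aggregate number would hide which quality dimension a model predicts poorly.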
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Ontology
