HilMeMe: A Human-in-the-Loop Machine Translation Evaluation Metric Looking into Multi-Word Expressions
Lifeng Han

TL;DR
HilMeMe is a human-in-the-loop evaluation metric that focuses on idiomatic and terminological Multi-word Expressions (MWEs) to better assess Machine Translation systems, in particular how accurately they recognize and translate MWEs.
Contribution
This paper introduces a linguistically motivated evaluation metric that emphasizes the importance of MWEs in distinguishing MT system quality, addressing limitations of existing metrics like BLEU.
Findings
MWEs are crucial for evaluating MT quality.
HilMeMe effectively distinguishes MT systems based on MWE translation.
The metric improves assessment accuracy for NMT outputs.
Abstract
With the fast development of Machine Translation (MT) systems, especially the recent boost from Neural MT (NMT) models, MT output quality has reached a new level of accuracy. However, many researchers have argued that currently popular evaluation metrics such as BLEU cannot correctly distinguish state-of-the-art NMT systems in terms of quality differences. In this short paper, we describe the design and implementation of a linguistically motivated human-in-the-loop evaluation metric that looks into idiomatic and terminological Multi-word Expressions (MWEs). MWEs have been a bottleneck in many Natural Language Processing (NLP) tasks, including MT. They can serve as one of the main factors for distinguishing MT systems, by examining how well each system recognises MWEs and translates them in an accurate and meaning-equivalent manner.
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
