Evaluating Machine Translation Performance on Chinese Idioms with a Blacklist Method

Yutong Shao; Rico Sennrich; Bonnie Webber; Federico Fancellu

arXiv:1711.07646 · cs.CL · February 21, 2018 · 5 cites

TL;DR

This paper presents a blacklist-based method for evaluating how machine translation systems translate Chinese idioms. It shows that literal translation errors are common and that the method is effective at identifying them.

Contribution

Introduces a blacklist-based evaluation method and an accompanying dataset, CIBB (Chinese Idioms Blacklists Bank), for evaluating the quality of idiom translation in machine translation systems.

Findings

1. 46.1% of the idioms in the test set are mistranslated.

2. Literal translation errors are a common error type in idiom translation.

3. The blacklist method is effective at detecting literal translation errors.

Abstract

Idiom translation is a challenging problem in machine translation because the meaning of idioms is non-compositional, and a literal (word-by-word) translation is likely to be wrong. In this paper, we focus on evaluating the quality of idiom translation of MT systems. We introduce a new evaluation method based on an idiom-specific blacklist of literal translations, based on the insight that the occurrence of any blacklisted words in the translation output indicates a likely translation error. We introduce a dataset, CIBB (Chinese Idioms Blacklists Bank), and perform an evaluation of a state-of-the-art Chinese-English neural MT system. Our evaluation confirms that a sizable number of idioms in our test set are mistranslated (46.1%), that literal translation error is a common error type, and that our blacklist method is effective at identifying literal translation errors.
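The core check described in the abstract can be sketched as follows: a translation is flagged as a likely error if any word on the idiom's blacklist occurs in it. This is a minimal Python sketch under stated assumptions, not the authors' implementation. The blacklist entries are hypothetical, hand-written examples (they are not taken from CIBB), and the lowercase alphabetic tokenizer and the helper names `tokenize`, `blacklisted_hits`, `is_likely_error`, and `flagged_rate` are illustrative choices.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical entries for illustration only; NOT taken from the CIBB dataset.
# Each idiom maps to English words that would signal a literal
# (word-by-word) rendering of its characters.
BLACKLISTS: Dict[str, Set[str]] = {
    "画蛇添足": {"snake", "snakes", "feet", "foot", "legs"},
    "对牛弹琴": {"cow", "cows", "ox", "lute", "piano"},
}


def tokenize(text: str) -> List[str]:
    """Lowercase the MT output and split it into alphabetic word tokens (an assumed, simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def blacklisted_hits(idiom: str, translation: str) -> Set[str]:
    """Return the blacklisted words for `idiom` that occur in `translation`."""
    blacklist = BLACKLISTS.get(idiom, set())
    return blacklist.intersection(tokenize(translation))


def is_likely_error(idiom: str, translation: str) -> bool:
    """Flag the translation as a likely error if any blacklisted word appears in it."""
    return bool(blacklisted_hits(idiom, translation))


def flagged_rate(pairs: List[Tuple[str, str]]) -> float:
    """Fraction of (idiom, translation) pairs flagged by the blacklist check."""
    if not pairs:
        return 0.0
    flagged = sum(is_likely_error(idiom, out) for idiom, out in pairs)
    return flagged / len(pairs)


if __name__ == "__main__":
    # Hypothetical MT outputs: one literal and one non-literal rendering per idiom.
    outputs = [
        ("画蛇添足", "Adding that paragraph is like drawing legs on a snake."),
        ("画蛇添足", "Adding that paragraph was unnecessary and only made things worse."),
        ("对牛弹琴", "Explaining this to him is like playing the lute to a cow."),
        ("对牛弹琴", "Explaining this to him is a waste of breath."),
    ]
    for idiom, out in outputs:
        print(idiom, is_likely_error(idiom, out), sorted(blacklisted_hits(idiom, out)))
    print(f"flagged: {flagged_rate(outputs):.1%}")
```

Running the script flags the two literal renderings and reports a flagged rate of 50.0%. The check targets only literal translation errors: an output with no blacklisted words may still be wrong in other ways, so the flagged rate is a signal for this one error type rather than a full measure of idiom translation accuracy.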

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Biomedical Text Mining and Ontologies