Linguistic evaluation of German-English Machine Translation using a Test Suite
Eleftherios Avramidis, Vivien Macketanz, Ursula Strohriegel, Hans Uszkoreit

TL;DR
This paper evaluates German-English machine translation systems using a grammatical test suite, revealing persistent errors in idioms, modals, and multi-word expressions, despite some improvements over the previous year.
Contribution
It applies a grammatical test suite to the German-English MT systems submitted to WMT19 and analyzes their translation accuracy on 107 linguistic phenomena organized in 14 categories.
Findings
On average, systems still translate one in four test items (25%) incorrectly
Improvements noted in function words and punctuation
Persistent errors in idioms and multi-word expressions
Abstract
We present the results of applying a grammatical test suite for German-English MT to the systems submitted at WMT19, with a detailed analysis of 107 phenomena organized in 14 categories. On average, the systems still mistranslate one out of four test items. Low performance is observed for idioms, modals, pseudo-clefts, multi-word expressions and verb valency. Compared to last year, there has been an improvement in function words, non-verbal agreement and punctuation. More detailed conclusions about particular systems and phenomena are also presented.
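The abstract reports accuracy both overall and per category. A minimal sketch of that kind of aggregation follows, assuming each test item is recorded as a pass or fail for one system. The data, the category labels, and the function names `accuracy_by_category` and `overall_accuracy` are hypothetical, and averaging over items rather than over phenomena or categories is an assumption, not something the abstract specifies.

```python
from collections import defaultdict

# Hypothetical results for one system: (category, phenomenon, passed).
# These values are made up for illustration and are not from the paper.
results = [
    ("Idiom", "idiom", False),
    ("Idiom", "idiom", True),
    ("Punctuation", "quotation marks", True),
    ("Punctuation", "quotation marks", True),
    ("Verb valency", "case government", False),
    ("Verb valency", "case government", True),
    ("Verb valency", "case government", True),
    ("Function word", "focus particle", True),
]


def accuracy_by_category(items):
    """Return the share of correctly translated test items per category."""
    counts = defaultdict(lambda: [0, 0])  # category -> [correct, total]
    for category, _phenomenon, passed in items:
        counts[category][1] += 1
        if passed:
            counts[category][0] += 1
    return {cat: correct / total for cat, (correct, total) in counts.items()}


def overall_accuracy(items):
    """Return the share of correctly translated items across all categories."""
    return sum(1 for _, _, passed in items if passed) / len(items)


if __name__ == "__main__":
    for category, acc in sorted(accuracy_by_category(results).items()):
        print(f"{category}: {acc:.2f}")
    print(f"Overall: {overall_accuracy(results):.2f}")
```

With this toy data, six of eight items pass, so one item in four is wrong overall, while the per-category view shows that the errors are concentrated in only some categories.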
Taxonomy
Methods: Test
