TL;DR
This paper investigates how verbal multiword expressions negatively impact machine translation quality across multiple languages, highlighting the need for better handling of these expressions in translation systems.
Contribution
It provides a comprehensive analysis of VMWE effects on MT, introduces an evaluation framework, and releases code for community use.
Findings
VMWEs significantly reduce translation quality.
Degradation is mainly due to the VMWE itself, not overall sentence difficulty.
State-of-the-art MT systems struggle with VMWEs.
Abstract
Verbal multiword expressions (VMWEs) remain difficult for machine translation because their meanings are often not recoverable from their component words. In this study, we analyze the impact of three VMWE categories (verbal idioms, verb-particle constructions, and light verb constructions) on machine translation quality from English into multiple languages. Using both established multiword expression datasets and standard machine translation datasets, we evaluate how state-of-the-art translation systems handle these expressions. Our experimental results consistently show that VMWEs negatively affect translation quality, and deeper analysis indicates that this degradation is primarily attributable to the VMWE itself rather than to general sentence-level difficulty. We release our code and evaluation framework so that the community can test new MT systems.
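To make the headline comparison concrete, the following is a minimal Python sketch of one simple way to quantify VMWE-related degradation: split sentences by whether they contain a VMWE and compare their mean sentence-level quality scores. The `Example` record, its `has_vmwe` and `score` fields, and the `vmwe_gap` function are hypothetical illustrations, not the paper's released framework. The paper's actual metrics and its analysis separating VMWE effects from sentence difficulty are not described here, and a plain difference in means like this one does not control for sentence difficulty.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Example:
    source: str     # English source sentence
    has_vmwe: bool  # hypothetical flag, e.g. derived from a VMWE annotation
    score: float    # hypothetical sentence-level MT quality score (higher is better)


def vmwe_gap(examples: list[Example]) -> float:
    """Return mean score without VMWEs minus mean score with VMWEs.

    A positive value means sentences containing a VMWE were translated
    worse on average. This is a raw group difference and does not
    account for other factors such as sentence length or difficulty.
    """
    with_vmwe = [e.score for e in examples if e.has_vmwe]
    without_vmwe = [e.score for e in examples if not e.has_vmwe]
    if not with_vmwe or not without_vmwe:
        raise ValueError("need at least one example in each group")
    return mean(without_vmwe) - mean(with_vmwe)


if __name__ == "__main__":
    # Toy, made-up scores purely to show the call pattern.
    data = [
        Example("She gave up smoking.", True, 0.6),
        Example("He kicked the bucket last year.", True, 0.7),
        Example("She stopped smoking.", False, 0.8),
        Example("He died last year.", False, 0.9),
    ]
    print(f"VMWE gap: {vmwe_gap(data):.3f}")
```

A paired design, where each VMWE sentence is compared against a literal paraphrase as in the toy data, is one way to narrow the comparison toward the expression itself; whether the paper uses such pairing is not stated in this summary.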
