Evaluating the Impact of Verbal Multiword Expressions on Machine Translation

Linfeng Liu; Saptarshi Ghosh; Tianyu Jiang

arXiv:2508.17458 · cs.CL · April 21, 2026

TL;DR

This paper investigates how verbal multiword expressions (VMWEs) degrade machine translation quality when translating from English into multiple languages, highlighting the need for translation systems to handle these expressions better.

Contribution

The paper analyzes how three VMWE categories (verbal idioms, verb-particle constructions, and light verb constructions) affect MT quality, introduces an evaluation framework, and releases the code for community use.

Findings

01. VMWEs consistently reduce translation quality.

02. The degradation is mainly due to the VMWE itself, not to overall sentence difficulty.

03. State-of-the-art MT systems struggle with VMWEs.

Abstract

Verbal multiword expressions (VMWEs) remain difficult for machine translation because their meanings are often not recoverable from their component words. In this study, we analyze the impact of three VMWE categories -- verbal idioms, verb-particle constructions, and light verb constructions -- on machine translation quality from English to multiple languages. Using both established multiword expression datasets and standard machine translation datasets, we evaluate how state-of-the-art translation systems handle these expressions. Our experimental results consistently show that VMWEs negatively affect translation quality, with deeper analysis indicating that this degradation is primarily attributable to the VMWE itself rather than general sentence-level difficulty. We release our code and evaluation framework so that the community can test new MT systems.
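The comparison described in the abstract (translation quality for sentences that contain a VMWE versus sentences that do not) can be illustrated with a small sketch. This is not the authors' released framework: the function names `mean_gap` and `bootstrap_gap_ci` are hypothetical, the scores are made-up toy values rather than results from the paper, and the per-sentence quality score is assumed to come from some automatic MT metric where higher is better.

```python
import random
from statistics import mean


def mean_gap(with_vmwe, without_vmwe):
    """Mean quality of non-VMWE sentences minus mean quality of VMWE sentences."""
    return mean(without_vmwe) - mean(with_vmwe)


def bootstrap_gap_ci(with_vmwe, without_vmwe, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean quality gap."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    gaps = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping group sizes fixed.
        a = rng.choices(with_vmwe, k=len(with_vmwe))
        b = rng.choices(without_vmwe, k=len(without_vmwe))
        gaps.append(mean(b) - mean(a))
    gaps.sort()
    lower = gaps[int((alpha / 2) * n_resamples)]
    upper = gaps[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Toy, made-up per-sentence metric scores (NOT data from the paper).
vmwe_scores = [0.71, 0.65, 0.80, 0.58, 0.69, 0.74]
plain_scores = [0.84, 0.79, 0.88, 0.81, 0.77, 0.86]

gap = mean_gap(vmwe_scores, plain_scores)
low, high = bootstrap_gap_ci(vmwe_scores, plain_scores)
print(f"mean gap = {gap:.3f}, 95% bootstrap CI = [{low:.3f}, {high:.3f}]")
```

A positive gap whose interval excludes zero would suggest that VMWE sentences score lower on average. A raw gap like this does not separate the effect of the VMWE from general sentence difficulty, which the abstract says the paper examines in a deeper analysis.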
