On the Difficulty of Translating Free-Order Case-Marking Languages

Arianna Bisazza; Ahmet Üstün; Stephan Sportel

arXiv:2107.06055 · cs.CL · July 14, 2021

TL;DR

This paper investigates whether free-order case-marking languages are more difficult to translate with neural models. It finds that word order flexibility has a limited impact on translation quality, but that fixed-order languages still achieve better translation quality when training data is scarce.

Contribution

The study introduces a translation challenge set and synthetic languages to analyze the impact of word order and case marking on NMT performance across different resource levels.

Findings

01. Word order flexibility in the source language causes only a minimal loss of NMT quality.

02. Case marking resolves the ambiguity of core verb arguments in free-order languages.

03. Fixed-order languages still yield better translation quality in low-resource settings.

Abstract

Identifying factors that make certain languages harder to model than others is essential to reach language equality in future Natural Language Processing technologies. Free-order case-marking languages, such as Russian, Latin or Tamil, have proved more challenging than fixed-order languages for the tasks of syntactic parsing and subject-verb agreement prediction. In this work, we investigate whether this class of languages is also more difficult to translate by state-of-the-art Neural Machine Translation (NMT) models. Using a variety of synthetic languages and a newly introduced translation challenge set, we find that word order flexibility in the source language only leads to a very small loss of NMT quality, even though the core verb arguments become impossible to disambiguate in sentences without semantic cues. The latter issue is indeed solved by the addition of case marking.…
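The ambiguity described in the abstract can be illustrated with a toy sketch. This is not the authors' synthetic-language generator: the two-noun lexicon ("dog", "man") and the "-NOM"/"-ACC" suffixes are hypothetical choices made here for illustration. Both nouns are plausible agents, so there is no semantic cue. Under free word order, every surface string of "dog bites man" is also a valid string of "man bites dog". Adding case suffixes removes all overlap, and the roles can be read back from the suffixes regardless of order.

```python
from itertools import permutations

# All six constituent orders of a transitive clause (free word order).
ORDERS = ["".join(p) for p in permutations("SVO")]


def realize(subj, verb, obj, order, case_marking):
    """Linearize a clause in the given order, optionally adding
    hypothetical case suffixes (-NOM for subject, -ACC for object)."""
    if case_marking:
        subj, obj = subj + "-NOM", obj + "-ACC"
    slots = {"S": subj, "V": verb, "O": obj}
    return " ".join(slots[c] for c in order)


def surface_forms(subj, verb, obj, case_marking):
    """Every string the clause can surface as under free word order."""
    return {realize(subj, verb, obj, o, case_marking) for o in ORDERS}


def ambiguous_strings(case_marking):
    """Strings shared by 'dog bites man' and 'man bites dog'."""
    a = surface_forms("dog", "bites", "man", case_marking)
    b = surface_forms("man", "bites", "dog", case_marking)
    return sorted(a & b)


def recover_roles(sentence):
    """Read subject and object back from the case suffixes, in any order."""
    roles = {}
    for tok in sentence.split():
        if tok.endswith("-NOM"):
            roles["subj"] = tok[: -len("-NOM")]
        elif tok.endswith("-ACC"):
            roles["obj"] = tok[: -len("-ACC")]
    return roles


print(len(ambiguous_strings(case_marking=False)))  # → 6
print(ambiguous_strings(case_marking=True))  # → []
print(recover_roles(realize("dog", "bites", "man", "OVS", True)))
```

Without case marking all six orderings are shared, so a translation model cannot tell who bites whom from the source alone. With case marking the object-first string "man-ACC bites dog-NOM" still maps unambiguously to subject "dog" and object "man".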
