Quantifying Synthesis and Fusion and their Impact on Machine Translation
Arturo Oncevay, Duygu Ataman, Niels van Berkel, Barry Haddow, Alexandra Birch, and Johannes Bjerva

TL;DR
This paper introduces a method to quantify morphological diversity in languages using synthesis and fusion indices, and analyzes their impact on machine translation quality across multiple language pairs.
Contribution
It proposes a novel approach to measuring morphological typology at the word and segment level, using synthesis and fusion indices, and examines how these indices relate to machine translation performance.
Findings
Both synthesis and fusion indices significantly affect translation quality.
Unsupervised segmentation methods are effective for measuring synthesis.
Human evaluation supports the impact of morphological typology on MT quality.
Abstract
Theoretical work in morphological typology offers the possibility of measuring morphological diversity on a continuous scale. However, literature in Natural Language Processing (NLP) typically labels a whole language with a strict type of morphology, e.g. fusional or agglutinative. In this work, we propose to reduce the rigidity of such claims, by quantifying morphological typology at the word and segment level. We consider Payne (2017)'s approach to classify morphology using two indices: synthesis (e.g. analytic to polysynthetic) and fusion (agglutinative to fusional). For computing synthesis, we test unsupervised and supervised morphological segmentation methods for English, German and Turkish, whereas for fusion, we propose a semi-automatic method using Spanish as a case study. Then, we analyse the relationship between machine translation quality and the degree of synthesis and…
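As a rough illustration of what a word-level synthesis measure can look like, the sketch below computes the average number of morphemes per word from words that have already been segmented. This is the common textbook reading of a synthesis index and is an assumption here; the paper's exact formulation may differ. The function names, the "+" separator, and the toy word lists are hypothetical and are not taken from the paper.

```python
from statistics import mean


def morphemes_per_word(segmented_word, sep="+"):
    """Count the segments in one pre-segmented word, e.g. 'ev+ler+im+de' -> 4."""
    return len([m for m in segmented_word.split(sep) if m])


def synthesis_index(segmented_words, sep="+"):
    """Average number of morphemes per word (assumed definition of synthesis)."""
    if not segmented_words:
        raise ValueError("need at least one segmented word")
    return mean(morphemes_per_word(w, sep) for w in segmented_words)


# Toy, hand-segmented examples (illustrative only, not data from the paper).
english = ["the", "cat+s", "walk+ed"]
turkish = ["ev+ler+im+de", "gel+iyor+um"]

if __name__ == "__main__":
    print("English (toy):", round(synthesis_index(english), 2))
    print("Turkish (toy):", round(synthesis_index(turkish), 2))
```

In practice the segmented input would come from a supervised or unsupervised morphological segmenter, so the resulting value depends on the segmenter's quality as much as on the language itself.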
Taxonomy
Topics: Natural Language Processing Techniques · Authorship Attribution and Profiling · Text Readability and Simplification
