Role of Morphology Injection in Statistical Machine Translation
Sreelekha S, Pushpak Bhattacharyya

TL;DR
This paper explores the use of morphology injection in factored statistical machine translation models to improve translation quality from English to morphologically rich languages like Hindi and Marathi.
Contribution
It introduces a method to generate and inject unseen morphological forms into training data, enhancing translation performance for morphologically complex languages.
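The idea of generating and injecting unseen forms can be sketched as follows. This is a toy illustration, not the authors' implementation: the suffix table, the romanized stem `ladk`, the factor names, and the helpers `generate_forms` and `inject_unseen` are all hypothetical, chosen only to show how forms missing from the training data could be enumerated and appended.

```python
# Hypothetical toy suffix table for one noun class:
# (number, case) -> suffix. Real systems would derive this from
# a morphological resource; these entries are illustrative only.
SUFFIXES = {
    ("sg", "direct"): "a",
    ("sg", "oblique"): "e",
    ("pl", "direct"): "e",
    ("pl", "oblique"): "on",
}


def generate_forms(stem):
    """Return {(number, case): surface_form} for every factor combination."""
    return {factors: stem + suffix for factors, suffix in SUFFIXES.items()}


def inject_unseen(training_pairs, stem, source_lemma):
    """Append (factored source, target form) pairs missing from the data."""
    seen = set(training_pairs)
    injected = []
    for (number, case), form in generate_forms(stem).items():
        pair = (f"{source_lemma}|{number}|{case}", form)
        if pair not in seen:
            injected.append(pair)
            seen.add(pair)
    # Original pairs come first, followed by the newly generated ones.
    return training_pairs + injected


data = [("boy|sg|direct", "ladka")]
augmented = inject_unseen(data, "ladk", "boy")
```

Here the single observed pair is kept and the three unobserved factor combinations are added, so the model sees every inflection of the noun at least once.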
Findings
Morphology injection improves translation adequacy.
Morphology injection enhances fluency of translated text.
The method is effective for translation from English into Hindi and Marathi.
Abstract
Phrase-based statistical models are more commonly used because they perform well in terms of both translation quality and system complexity. Hindi, and Indian languages in general, are morphologically richer than English. Hence, although phrase-based systems perform very well for less divergent language pairs, English-to-Indian-language translation needs more linguistic information (such as morphology, parse trees, and part-of-speech tags) on the source side. Factored models are useful in this case because they treat a word as a vector of factors. These factors can carry any information about the surface word, and the model uses it during translation. The objective of this work is therefore to handle morphological inflections in Hindi and Marathi using factored translation models when translating from English. SMT approaches face the problem of data sparsity…
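As a minimal sketch of "a word as a vector of factors", the snippet below represents each token as a tuple and serializes it with pipe-separated factors, the convention used for factored corpora in the Moses toolkit. The choice of factors (surface, lemma, part-of-speech tag) and the names `FactoredToken` and `encode_sentence` are assumptions made for illustration; the paper's exact factor set is not given in this abstract.

```python
from typing import NamedTuple


class FactoredToken(NamedTuple):
    """One word as a vector of factors (illustrative factor set)."""
    surface: str
    lemma: str
    pos: str

    def to_factored_string(self, sep="|"):
        # Join the factors in order: surface|lemma|pos
        return sep.join(self)


def encode_sentence(tokens):
    """Serialize a sentence as space-separated factored tokens."""
    return " ".join(t.to_factored_string() for t in tokens)


sentence = [
    FactoredToken("boys", "boy", "NNS"),
    FactoredToken("play", "play", "VBP"),
]
line = encode_sentence(sentence)
```

Each factor is a separate stream the translation model can condition on, which is how source-side information such as number or case can inform the choice of target inflection.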
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Handwritten Text Recognition Techniques
