Morphology Generation for Statistical Machine Translation
Sreelekha S, Pushpak Bhattacharyya

TL;DR
This paper proposes a morphology injection method that enriches training data with morphological forms to improve statistical machine translation quality for morphologically rich languages like Hindi and Marathi.
Contribution
It introduces a simple, effective data enrichment technique that alleviates data sparsity issues in SMT for morphologically rich languages, enhancing translation performance.
Findings
Improved English-to-Hindi and English-to-Marathi translation quality.
Morphology injection reduces data sparsity effects.
Improves both automatic evaluation scores and subjective (adequacy and fluency) judgments.
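One common way to make "data sparsity" concrete is the out-of-vocabulary (OOV) rate: the fraction of test tokens whose surface form never appears in the training data. The sketch below is illustrative only. The function name `oov_rate` and the romanized Hindi toy sentences are hypothetical and are not data or metrics from the paper, which reports automatic scores and adequacy/fluency judgments. It shows how adding an unseen inflected form ("ladkon", the plural oblique of "ladka", boy) removes an OOV token.

```python
def oov_rate(train_sentences, test_sentences):
    """Fraction of test tokens whose surface form never occurs in training."""
    vocab = {tok for sent in train_sentences for tok in sent.split()}
    test_tokens = [tok for sent in test_sentences for tok in sent.split()]
    if not test_tokens:
        return 0.0  # no test tokens, so nothing can be out of vocabulary
    unseen = sum(1 for tok in test_tokens if tok not in vocab)
    return unseen / len(test_tokens)


# Hypothetical toy target-side (romanized Hindi) data, not from the paper.
train = ["ladka khata hai", "ladke ko do"]
# Hypothetical "injected" corpus: the same data plus extra inflected forms.
injected = train + ["ladkon ko", "ladke"]
test = ["ladkon ko do"]

print(oov_rate(train, test))     # → 0.3333333333333333
print(oov_rate(injected, test))  # → 0.0
```

Surface-form OOV counting is deliberately naive: it treats every inflection as a separate word, which is exactly why a morphologically rich target language with a small corpus suffers from sparsity.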
Abstract
When translating into morphologically rich languages, statistical MT (SMT) approaches face the problem of data sparsity. The problem becomes more severe when the corpus for the morphologically richer language is small. Even though factored models can be used to generate the correct morphological forms of words, data sparsity limits their performance. In this paper, we describe a simple and effective solution based on enriching the input corpora with various morphological forms of words. We apply this method in phrase-based and factor-based experiments on two morphologically rich languages, Hindi and Marathi, translating from English. We evaluate our experiments both in terms of automatic evaluation and subjective evaluation, such as adequacy and fluency. We observe that the morphology injection method helps in improving the…
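The core idea of enriching a parallel corpus with morphological forms can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a hypothetical hand-written suffix table for romanized Hindi masculine nouns ending in "-a" (e.g. ladka, boy) and a hypothetical lemma lexicon. The names `MASC_A_RULES`, `generate_forms`, and `inject_morphology` are illustrative. A real system would use a proper morphological generator and cover more word classes and cases.

```python
# Hypothetical suffix rules for romanized Hindi masculine nouns ending in -a.
# Each rule: (English template, Hindi suffix added to the stem, Hindi postposition).
MASC_A_RULES = [
    ("{en}", "a", ""),              # singular, direct case
    ("{en}s", "e", ""),             # plural, direct case (naive English plural)
    ("to the {en}", "e", " ko"),    # singular, oblique case + postposition "ko"
    ("to the {en}s", "on", " ko"),  # plural, oblique case + postposition "ko"
]


def generate_forms(en_lemma, hi_stem, rules=MASC_A_RULES):
    """Return one (English, Hindi) phrase pair per rule for a lemma pair."""
    return [
        (template.format(en=en_lemma), hi_stem + suffix + post)
        for template, suffix, post in rules
    ]


def inject_morphology(parallel_corpus, lexicon):
    """Append generated phrase pairs that are not already in the corpus."""
    seen = set(parallel_corpus)
    enriched = list(parallel_corpus)
    for en_lemma, hi_stem in lexicon:
        for pair in generate_forms(en_lemma, hi_stem):
            if pair not in seen:  # skip pairs the corpus already contains
                seen.add(pair)
                enriched.append(pair)
    return enriched


# Hypothetical toy data, not from the paper.
corpus = [("the boy eats", "ladka khata hai"), ("boy", "ladka")]
lexicon = [("boy", "ladk")]
for src, tgt in inject_morphology(corpus, lexicon):
    print(src, "\t", tgt)
```

Here the existing pair ("boy", "ladka") is not duplicated, while "boys / ladke", "to the boy / ladke ko", and "to the boys / ladkon ko" are appended, so the SMT system sees inflected forms that the original corpus never contained.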
