Indowordnets help in Indian Language Machine Translation
Sreelekha S, Pushpak Bhattacharyya

TL;DR
This study demonstrates that augmenting statistical machine translation models with Indowordnet synset entries significantly improves translation quality for resource-poor Indian languages, as measured by BLEU, METEOR, and TER.
Contribution
The paper introduces a method of enhancing Indian language machine translation systems by integrating the Indowordnet lexical database, leading to notable performance improvements.
Findings
Significant BLEU, METEOR, and TER score improvements after using Indowordnet.
Lexical database augmentation helps handle lexical ambiguity effectively.
Resource-efficient approach for improving translation in low-resource languages.
Abstract
Because Indian languages are low-resource, MT systems for Indian-Indian and English-Indian language pairs have difficulty translating various lexical phenomena. In this paper, we present a comparative study of 440 phrase-based statistical models trained for 110 language pairs across 11 Indian languages. We developed 110 baseline Statistical Machine Translation systems. We then augmented the training corpus with Indowordnet synset word entries from the lexical database and trained a further 110 models on top of the baseline systems. We carried out a detailed performance comparison using evaluation metrics such as BLEU, METEOR, and TER. We observed significant improvement in translation quality across all 440 models after using Indowordnet. These experiments give detailed insight in two ways: (1) usage of lexical database with synset mapping…
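The augmentation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the abstract does not say how synset entries are paired or formatted, so the sketch assumes a hypothetical `synsets` mapping from a synset ID to per-language word lists and simply pairs every source-language word with every target-language word that shares a synset. The function names and the toy romanized entries are illustrative only and are not taken from Indowordnet.

```python
from itertools import product


def synset_pairs(synsets, src_lang, tgt_lang):
    """Yield (source_word, target_word) pairs for words that share a synset ID.

    `synsets` is a hypothetical structure: {synset_id: {lang_code: [words]}}.
    """
    for entry in synsets.values():
        src_words = entry.get(src_lang, [])
        tgt_words = entry.get(tgt_lang, [])
        # Assumption: every word in a synset is a valid translation of every
        # word in the same synset in the other language.
        yield from product(src_words, tgt_words)


def augment_corpus(parallel, synsets, src_lang, tgt_lang):
    """Return the parallel corpus with synset word pairs appended.

    `parallel` is a list of (source_sentence, target_sentence) tuples.
    Pairs already present are not added again.
    """
    seen = set(parallel)
    augmented = list(parallel)
    for pair in synset_pairs(synsets, src_lang, tgt_lang):
        if pair not in seen:
            seen.add(pair)
            augmented.append(pair)
    return augmented


if __name__ == "__main__":
    # Toy data for illustration only (not real Indowordnet entries).
    parallel = [("mujhe pani chahiye", "mala pani pahije")]
    synsets = {"synset_1": {"hi": ["pani", "jal"], "mr": ["pani", "jal"]}}
    for src, tgt in augment_corpus(parallel, synsets, "hi", "mr"):
        print(f"{src}\t{tgt}")
```

In a setup like this, the baseline model would be trained on `parallel` alone and the second model on the augmented list, so that any change in BLEU, METEOR, or TER can be attributed to the added lexical entries.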
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Handwritten Text Recognition Techniques
