Neural Machine Translation for Low-Resourced Indian Languages
Himanshu Choudhary, Shivansh Rao, Rajesh Rohilla

TL;DR
This paper presents a neural machine translation system for two low-resource Indian language pairs, English-Tamil and English-Malayalam, using multi-head self-attention and pre-trained subword (BPE) embeddings to improve translation quality over existing tools such as Google Translate.
Contribution
The paper introduces a novel NMT model that combines multi-head self-attention with pre-trained BPE and MultiBPE embeddings to address out-of-vocabulary (OOV) issues in low-resource, morphologically rich Indian languages.
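As a rough illustration of why byte-pair encoding helps with OOV words, the toy sketch below learns merge rules from a tiny word list and then segments a word that never appeared in it. It is not the authors' pipeline (the paper relies on pre-trained BPE embeddings); the function names `learn_bpe`, `merge_pair`, and `segment`, the `</w>` end-of-word marker, and the sample words are all assumptions made for this example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (hypothetical helper)."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the new merge to every word in the vocabulary.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Split a (possibly unseen) word into subword units by replaying the merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    merges = learn_bpe(["lower", "lowest", "newer", "newest", "low"], num_merges=10)
    # "slowest" is not in the training list, yet it still decomposes into known pieces.
    print(segment("slowest", merges))
```

Because every word can fall back to single characters, no input is ever fully out of vocabulary; frequent character sequences simply end up as longer units. This matters for morphologically rich languages, where many surface forms of a stem are rare or unseen.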
Findings
The proposed model achieves BLEU scores of 24.34 and 9.78 for Tamil and Malayalam, respectively.
Outperforms Google Translate in BLEU score for both language pairs.
Refined and curated corpus enhances translation quality.
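For readers unfamiliar with the metric behind these numbers, here is a minimal sketch of corpus-level BLEU: the geometric mean of modified n-gram precisions (up to 4-grams) multiplied by a brevity penalty. It assumes one reference per hypothesis, pre-tokenized input, and no smoothing; the summary does not say which BLEU implementation or tokenization the authors used, so scores from this sketch are not comparable to the reported ones. The name `corpus_bleu` here is a local helper, not a library call.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order `n` in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale (single reference, no smoothing)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            # Clipped counts: an n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(count, r[gram]) for gram, count in h.items())
            totals[n - 1] += sum(h.values())
    # Without smoothing, any order with zero matches makes the geometric mean zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the reference.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()
    print(corpus_bleu([ref], [ref]))      # identical output and reference
    print(corpus_bleu([ref[:5]], [ref]))  # truncated output triggers the brevity penalty
```

Since BLEU rewards exact n-gram overlap, it tends to be harsh on morphologically rich target languages, where a correct translation may differ from the reference only in inflection.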
Abstract
A large number of significant assets are available online in English and are frequently translated into native languages to ease information sharing among local people who are not very familiar with English. However, manual translation is a tedious, costly, and time-consuming process. Machine translation addresses this by converting text into a different language without any human involvement. Neural machine translation (NMT) is among the most proficient of all existing machine translation techniques. In this paper, we apply NMT to two of the most morphologically rich Indian languages through the English-Tamil and English-Malayalam language pairs. We propose a novel NMT model using multi-head self-attention along with pre-trained Byte-Pair-Encoded (BPE) and MultiBPE embeddings to develop an efficient translation system that overcomes the OOV (Out Of…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
