Unsupervised Machine Translation On Dravidian Languages
Sai Koneru, Danni Liu, Jan Niehues

TL;DR
This paper advances unsupervised machine translation for low-resource Dravidian languages, especially Kannada, by leveraging auxiliary data and language similarity metrics to improve translation quality.
Contribution
It introduces methods for utilizing auxiliary languages and a similarity metric to enhance UNMT between English and Kannada, a low-resource Dravidian language.
Findings
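Auxiliary parallel data between English and related Dravidian languages improves UNMT performance for English-Kannada.
Unifying the writing systems is essential for unsupervised translation between the Dravidian languages.
A language similarity metric predicts which auxiliary languages are beneficial.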
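To make the script-unification finding concrete, here is a minimal sketch of one common way to map text in several Dravidian scripts onto a single script. It assumes the parallel layout of the Unicode blocks for Tamil, Telugu, Kannada, and Malayalam, where corresponding characters sit at the same offset within each 128-code-point block. This summary does not say how the authors unify the scripts, so this should not be read as their method. The names `SCRIPT_BASE`, `BLOCK_SIZE`, and `unify_script` are hypothetical and chosen only for illustration.

```python
# Base code points of the Unicode blocks for four Dravidian scripts.
# The blocks share a parallel layout, so a fixed offset maps a
# character to its counterpart in another script.
SCRIPT_BASE = {
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80  # each block spans 128 code points


def unify_script(text: str, source: str, target: str = "kannada") -> str:
    """Shift characters from the source script block into the target block.

    Assumption: offset-based mapping is only approximate. Some offsets
    are unassigned in a given script (Tamil in particular has fewer
    letters), so the result is a rough unification, not a transliteration.
    """
    src_base = SCRIPT_BASE[source]
    shift = SCRIPT_BASE[target] - src_base
    out = []
    for ch in text:
        cp = ord(ch)
        if src_base <= cp < src_base + BLOCK_SIZE:
            out.append(chr(cp + shift))  # move into the target block
        else:
            out.append(ch)  # leave spaces, digits, Latin text untouched
    return "".join(out)
```

For example, `unify_script("తెలుగు", "telugu")` rewrites the Telugu word in Kannada script, so that a shared subword vocabulary can overlap across the two languages. Characters outside the source block pass through unchanged.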
Abstract
Unsupervised neural machine translation (UNMT) is beneficial especially for low resource languages such as those from the Dravidian family. However, UNMT systems tend to fail in realistic scenarios involving actual low resource languages. Recent works propose to utilize auxiliary parallel data and have achieved state-of-the-art results. In this work, we focus on unsupervised translation between English and Kannada, a low resource Dravidian language. We additionally utilize a limited amount of auxiliary data between English and other related Dravidian languages. We show that unifying the writing systems is essential in unsupervised translation between the Dravidian languages. We explore several model architectures that use the auxiliary data in order to maximize knowledge sharing and enable UNMT for distant language pairs. Our experiments demonstrate that it is crucial to include…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
