Investigating Neural Machine Translation for Low-Resource Languages: Using Bavarian as a Case Study
Wan-Hua Her, Udo Kruschwitz

TL;DR
This paper explores neural machine translation for low-resource languages by focusing on Bavarian, employing techniques like back-translation and transfer learning to improve translation quality amid data scarcity.
Contribution
The paper demonstrates effective strategies for low-resource NMT, including extensive data preprocessing and leveraging language similarity, and reports empirical results showing significant improvements.
Findings
Back-translation significantly improves translation quality.
High baseline performance observed despite data limitations.
Extensive preprocessing reduces noise and enhances results.
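To make the back-translation finding concrete, here is a minimal sketch of the general technique, not the authors' implementation. It assumes monolingual Bavarian text is available and that some Bavarian-to-German model exists. The names `back_translate`, `augment`, and `toy_reverse_model` are hypothetical, and the tiny lexicon stands in for a trained reverse model purely for illustration.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(
    monolingual_target: List[str],
    reverse_translate: Callable[[str], str],
) -> List[Pair]:
    """Pair each real target sentence with a machine-generated source sentence."""
    synthetic: List[Pair] = []
    for target in monolingual_target:
        source = reverse_translate(target).strip()
        if source:  # drop sentences the reverse model could not translate
            synthetic.append((source, target))
    return synthetic


def augment(parallel: List[Pair], synthetic: List[Pair]) -> List[Pair]:
    """Append synthetic pairs to the real parallel data, skipping exact duplicates."""
    seen = set(parallel)
    return parallel + [pair for pair in synthetic if pair not in seen]


def toy_reverse_model(sentence: str) -> str:
    """Hypothetical stand-in for a trained Bavarian-to-German model."""
    lexicon = {"i": "ich", "bin": "bin", "dahoam": "zu Hause"}
    return " ".join(lexicon.get(tok, tok) for tok in sentence.lower().split())


# Real parallel data (German -> Bavarian) plus monolingual Bavarian text.
parallel_data = [("guten Morgen", "guadn Moang")]
bavarian_only = ["I bin dahoam"]

synthetic_pairs = back_translate(bavarian_only, toy_reverse_model)
training_data = augment(parallel_data, synthetic_pairs)
```

The design point is that the target side of every synthetic pair is genuine human-written text, so the forward German-to-Bavarian model still learns to produce natural output even when the synthetic source side is noisy.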
Abstract
Machine Translation has made impressive progress in recent years, offering close to human-level performance on many languages, but studies have primarily focused on high-resource languages with broad online presence and resources. With the help of growing Large Language Models, more and more low-resource languages achieve better results through the presence of other languages. However, studies have shown that not all low-resource languages can benefit from multilingual systems, especially those with insufficient training and evaluation data. In this paper, we revisit state-of-the-art Neural Machine Translation techniques to develop automatic translation systems between German and Bavarian. We investigate conditions of low-resource languages such as data scarcity and parameter sensitivity and focus on refined solutions that combat low-resource difficulties and creative solutions such as…
Taxonomy
Topics: Natural Language Processing Techniques
