Investigating Neural Machine Translation for Low-Resource Languages: Using Bavarian as a Case Study

Wan-Hua Her; Udo Kruschwitz

arXiv:2404.08259 · cs.CL · April 15, 2024 · 2 citations

TL;DR

This paper explores neural machine translation for low-resource languages by focusing on Bavarian, employing techniques like back-translation and transfer learning to improve translation quality amid data scarcity.
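
Back-translation, mentioned above, builds synthetic training pairs from monolingual target-side text. The sketch below shows the data flow for a German→Bavarian system under stated assumptions: `reverse_translate` is a hypothetical stand-in for a trained Bavarian→German model (here a toy lookup table), and simple concatenation of authentic and synthetic data is only one of several possible mixing strategies. It is not the authors' implementation, which lives in the linked repository.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (German source, Bavarian target)


def back_translate(
    mono_bavarian: List[str],
    reverse_translate: Callable[[str], str],
) -> List[Pair]:
    """Create synthetic (German, Bavarian) pairs from monolingual Bavarian text.

    reverse_translate is a hypothetical Bavarian->German model.
    The German side is machine-generated; the Bavarian side stays authentic.
    """
    pairs: List[Pair] = []
    for bar in mono_bavarian:
        de_synthetic = reverse_translate(bar)
        pairs.append((de_synthetic, bar))
    return pairs


def augment(authentic: List[Pair], synthetic: List[Pair]) -> List[Pair]:
    # Plain concatenation (an assumption); tagging or upsampling are alternatives.
    return authentic + synthetic


if __name__ == "__main__":
    # Toy lexicon standing in for a trained Bavarian->German model (hypothetical).
    toy_model = {"Servus": "Hallo", "I mog di": "Ich mag dich"}
    synthetic = back_translate(["Servus", "I mog di"], lambda s: toy_model.get(s, s))
    training = augment([("Guten Morgen", "Guadn Morgn")], synthetic)
    print(training)
```

The key design point is the direction: the synthetic text lands on the source side, so the German→Bavarian model always learns to produce human-written Bavarian.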

Contribution

It demonstrates effective strategies for low-resource NMT, including extensive data preprocessing and leveraging language similarity, with empirical results showing significant improvements.

Findings

01. Back-translation significantly improves translation quality.
02. High baseline performance is observed despite data limitations.
03. Extensive preprocessing reduces noise and enhances results.
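
The preprocessing finding can be illustrated with a common parallel-corpus cleaning pass. This is a minimal sketch, assuming generic filters (whitespace normalization, empty-line removal, a maximum sentence length, a source/target length-ratio cap, and exact-duplicate removal); the function name `clean_parallel` and the thresholds `max_len=100` and `max_ratio=2.0` are hypothetical and are not taken from the paper. Identical German/Bavarian pairs are deliberately kept, since closely related languages can legitimately share sentences.

```python
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def clean_parallel(
    pairs: Iterable[Pair],
    max_len: int = 100,      # hypothetical cap on tokens per side
    max_ratio: float = 2.0,  # hypothetical cap on longer/shorter token ratio
) -> List[Pair]:
    seen: Set[Pair] = set()
    kept: List[Pair] = []
    for src, tgt in pairs:
        # Collapse runs of whitespace so near-duplicates become exact duplicates.
        src = " ".join(src.split())
        tgt = " ".join(tgt.split())
        if not src or not tgt:
            continue  # drop pairs with an empty side
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if max(n_src, n_tgt) > max_len:
            continue  # drop overly long sentences
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # drop likely misalignments
        if (src, tgt) in seen:
            continue  # drop exact duplicates
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept


if __name__ == "__main__":
    raw = [
        ("Guten  Morgen", "Guadn Morgn"),
        ("Guten Morgen", "Guadn Morgn"),
        ("", "Servus"),
        ("Ja", "Jo des is a ganz a langer Satz"),
    ]
    print(clean_parallel(raw))  # → [('Guten Morgen', 'Guadn Morgn')]
```

Ratio filtering uses token counts rather than characters here; character-based ratios are an equally common choice.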

Abstract

Machine Translation has made impressive progress in recent years, offering close to human-level performance on many languages, but studies have primarily focused on high-resource languages with a broad online presence and ample resources. With the help of growing Large Language Models, more and more low-resource languages achieve better results by benefiting from the presence of other languages. However, studies have shown that not all low-resource languages can benefit from multilingual systems, especially those with insufficient training and evaluation data. In this paper, we revisit state-of-the-art Neural Machine Translation techniques to develop automatic translation systems between German and Bavarian. We investigate conditions of low-resource languages, such as data scarcity and parameter sensitivity, and focus on refined solutions that combat low-resource difficulties and creative solutions such as…

Code & Models

Repositories

whher/nmt-de-bar (official)

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Focus