Machine Translation of Low-Resource Spoken Dialects: Strategies for Normalizing Swiss German
Pierre-Edouard Honnet, Andrei Popescu-Belis, Claudiu Musat, Michael Baeriswyl

TL;DR
This paper builds a machine translation system for Swiss German dialects that normalizes the input with character-based neural MT and then translates it with phrase-based statistical MT, starting from about 60k words of parallel text.
Contribution
It introduces and compares three strategies for normalizing Swiss German input and finds that character-based neural MT is the best of them for text normalization.
Findings
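Character-based neural MT outperforms the other two normalization strategies compared.
Combined with phrase-based statistical MT, the system reaches a 36% BLEU score when translating from the Bernese dialect.
The score drops as the test data moves further from the training data, both geographically and topically.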
Abstract
The goal of this work is to design a machine translation (MT) system for a low-resource family of dialects, collectively known as Swiss German, which are widely spoken in Switzerland but seldom written. We collected a significant number of parallel written resources to start with, up to a total of about 60k words. Moreover, we identified several other promising data sources for Swiss German. Then, we designed and compared three strategies for normalizing Swiss German input in order to address the regional diversity. We found that character-based neural MT was the best solution for text normalization. In combination with phrase-based statistical MT, our solution reached 36% BLEU score when translating from the Bernese dialect. This value, however, decreases as the testing data becomes more remote from the training one, geographically and topically. These resources and normalization…
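To make the reported figure easier to interpret, here is a minimal sketch of how a corpus-level BLEU score can be computed: clipped n-gram precisions up to 4-grams, combined by geometric mean and multiplied by a brevity penalty. The function names, whitespace tokenization, single-reference setup, and lack of smoothing are assumptions made for illustration; the abstract does not say which scoring tool or settings the authors used. The example sentences are toy English data, not taken from the paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-1 scale, one reference per hypothesis, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clipping: an n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any empty n-gram order gives a score of zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


# Toy example (hypothetical data): one hypothesis scored against one reference.
hypothesis = "the cat sat on the mat".split()
reference = "the cat sat on a mat".split()
score = corpus_bleu([hypothesis], [reference])
print(f"BLEU = {100 * score:.1f}%")
```

On this 0-1 scale, the 36% reported for the Bernese dialect corresponds to a score of 0.36. Scores from different tokenizations or BLEU implementations are not directly comparable, so the sketch is only meant to show what the metric measures.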
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
