ChrEn: Cherokee-English Machine Translation for Endangered Language Revitalization
Shiyue Zhang, Benjamin Frey, Mohit Bansal

TL;DR
This paper introduces ChrEn, a low-resource Cherokee-English parallel dataset, along with several translation systems, to support machine translation research aimed at revitalizing the endangered Cherokee language.
Contribution
It provides the first Cherokee-English parallel dataset, explores various translation models, and demonstrates effective semi-supervised methods for low-resource language translation.
Findings
Best in-domain BLEU scores: 15.8/12.7 (Cherokee-English/English-Cherokee)
Best out-of-domain BLEU scores: 6.5/5.0 (Cherokee-English/English-Cherokee)
Semi-supervised methods improve translation quality
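For readers unfamiliar with the metric behind the scores above, the following is a minimal sketch of corpus-level BLEU, assuming whitespace tokenization and a single reference per sentence. It is illustrative only: the function names are hypothetical, and it is not the evaluation script behind the reported numbers, which depend on the authors' own tokenization and tooling.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale (one reference per hypothesis).

    Assumption: sentences are split on whitespace; no smoothing is applied,
    so the score is 0 if any n-gram order has no matches.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tok, ref_tok = hyp.split(), ref.split()
        hyp_len += len(hyp_tok)
        ref_len += len(ref_tok)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tok, n)
            ref_counts = ngrams(ref_tok, n)
            # Clipped matches: each hypothesis n-gram counts at most as
            # often as it appears in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes hypotheses shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)
```

A pair of scores such as 15.8/12.7 would correspond to running a function like this twice, once per translation direction, on the same evaluation set.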
Abstract
Cherokee is a highly endangered Native American language spoken by the Cherokee people. The Cherokee culture is deeply embedded in its language. However, there are approximately only 2,000 fluent first language Cherokee speakers remaining in the world, and the number is declining every year. To help save this endangered language, we introduce ChrEn, a Cherokee-English parallel dataset, to facilitate machine translation research between Cherokee and English. Compared to some popular machine translation language pairs, ChrEn is extremely low-resource, only containing 14k sentence pairs in total. We split our parallel data in ways that facilitate both in-domain and out-of-domain evaluation. We also collect 5k Cherokee monolingual data to enable semi-supervised learning. Besides these datasets, we propose several Cherokee-English and English-Cherokee machine translation systems. We compare…
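One way to picture a split that supports both in-domain and out-of-domain evaluation is sketched below. This is an assumption about the general idea, not the authors' actual procedure: the `split_by_domain` helper, the source labels, and the 10% development fraction are all hypothetical. Whole sources are held out for out-of-domain evaluation, while the remaining pairs are shuffled and divided into training and in-domain development data.

```python
import random


def split_by_domain(pairs, held_out_sources, dev_fraction=0.1, seed=0):
    """Split (source_name, cherokee, english) tuples by domain.

    Hypothetical helper: pairs from held_out_sources never appear in
    training, so they measure out-of-domain quality.
    """
    in_domain = [p for p in pairs if p[0] not in held_out_sources]
    out_of_domain = [p for p in pairs if p[0] in held_out_sources]
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(in_domain)
    n_dev = max(1, int(len(in_domain) * dev_fraction))
    return {
        "train": in_domain[n_dev:],
        "in_domain_dev": in_domain[:n_dev],
        "out_of_domain_dev": out_of_domain,
    }


# Toy usage with placeholder strings instead of real sentences.
toy_pairs = [("source_a", f"chr_{i}", f"en_{i}") for i in range(20)]
toy_pairs += [("source_b", f"chr_b{i}", f"en_b{i}") for i in range(5)]
splits = split_by_domain(toy_pairs, held_out_sources={"source_b"})
```

Holding out entire sources, instead of sampling sentences at random, keeps the out-of-domain set free of text that resembles the training data.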
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
