Massively Multilingual Text Translation For Low-Resource Languages

Zhong Zhou

arXiv:2401.16582 · cs.CL · January 31, 2024 · 1 citation

TL;DR

This paper explores leveraging multilingual resources and strategic adaptation of pretrained models to improve translation quality for low-resource languages, especially for specific limited texts, enabling better collaboration with human translators.

Contribution

It introduces a method for translating limited, well-known texts into low-resource languages by leveraging rich-resource language data and domain adaptation of multilingual models.

Findings

01. Performance improves with careful language family selection.

02. Domain adaptation enhances translation quality.

03. Massive source parallelism reduces human translation effort.

Abstract

Translation into severely low-resource languages serves both the cultural goal of saving and reviving those languages and the humanitarian goal of assisting with the everyday needs of local communities, needs made more urgent by the recent COVID-19 pandemic. In many humanitarian efforts, translation into severely low-resource languages requires not a universal translation engine but a dedicated, text-specific one. For example, healthcare records, hygienic procedures, government communication, emergency procedures and religious texts are all limited texts. While generic translation engines for all languages do not exist, translation of multilingually known limited texts into new, low-resource languages may be possible and may reduce human translation effort. We attempt to leverage translation resources from rich-resource languages to efficiently produce best possible…

Taxonomy

Topics: Natural Language Processing Techniques