Ukrainian-to-English folktale corpus: Parallel corpus creation and augmentation for machine translation in low-resource languages

Olena Burda-Lassen

arXiv:2410.10063·cs.CL·October 15, 2024

TL;DR

This paper introduces a new Ukrainian-English folktale parallel corpus, enhancing resources for machine translation in low-resource languages by combining domain-specific curation and augmentation methods.

Contribution

It presents a novel Ukrainian-English folktale corpus with alignment and augmentation strategies tailored for machine translation in low-resource settings.

Findings

01 The corpus is word- and sentence-aligned for optimal meaning preservation.

02 Augmentation increases the size and diversity of the dataset.

03 The resource supports better machine translation for low-resource languages.

Abstract

Folktales are linguistically rich and culturally significant for understanding the source language. Historically, folklore has been translated only by humans. As a result, translated texts are scarce, which limits access to knowledge about cultural traditions and customs. We have created a new Ukrainian-to-English parallel corpus of familiar Ukrainian folktales, based on available English translations, and have suggested several new translations. We offer a combined domain-specific approach to building and augmenting this corpus, considering the nature of the domain and the differences in purpose between human and machine translation. Our corpus is word- and sentence-aligned, allowing for the best curation of meaning, and is specifically tailored for use as training data for machine translation models.
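To make "word- and sentence-aligned" concrete, here is a minimal sketch of how one entry of such a parallel corpus might be represented. It is an illustration only, not the paper's actual data format: the class `AlignedSentence`, the helper `to_tsv`, the whitespace tokenization, and the example phrase are all assumptions introduced here and are not taken from the corpus.

```python
from dataclasses import dataclass, field


@dataclass
class AlignedSentence:
    """One sentence-aligned pair plus optional word-level links (hypothetical format)."""
    source: str  # Ukrainian sentence
    target: str  # English sentence
    # Word alignment as (source_token_index, target_token_index) pairs,
    # counted over whitespace-separated tokens (an assumption of this sketch).
    word_links: list[tuple[int, int]] = field(default_factory=list)

    def aligned_words(self) -> list[tuple[str, str]]:
        """Return the (source word, target word) pairs named by word_links."""
        src = self.source.split()
        tgt = self.target.split()
        return [(src[i], tgt[j]) for i, j in self.word_links]


def to_tsv(pairs: list[AlignedSentence]) -> str:
    """Serialize sentence pairs as tab-separated lines, one pair per line."""
    return "\n".join(f"{p.source}\t{p.target}" for p in pairs)


# Illustrative phrase written for this sketch, not drawn from the corpus.
example = AlignedSentence(
    source="дід та баба",
    target="grandfather and grandmother",
    word_links=[(0, 0), (1, 1), (2, 2)],
)

print(example.aligned_words())
print(to_tsv([example]))
```

Keeping the word links as index pairs rather than strings lets the same structure handle reordering between the two languages; the tab-separated output mirrors a layout that many machine translation training pipelines accept, though the paper does not state which format it distributes.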

Taxonomy

Topics: Natural Language Processing Techniques · Folklore, Mythology, and Literature Studies