Preparing an Endangered Language for the Digital Age: The Case of Judeo-Spanish
Alp Öktem, Rodolfo Zevallos, Yasmin Moslem, Güneş Öztürk, Karen Şarhon

TL;DR
This paper develops machine translation and speech synthesis tools to aid in the preservation and revitalization of Judeo-Spanish, leveraging synthetic data and community resources to address language extinction in the digital era.
Contribution
The paper introduces a rule-based Spanish to Judeo-Spanish translation system, baseline neural translation models, and a single-speaker speech corpus to support language revitalization efforts.
Findings
Created synthetic parallel data for translation
Built neural machine translation models for Judeo-Spanish
Developed a speech synthesis engine with a new speech corpus
Abstract
We develop machine translation and speech synthesis systems to complement the efforts of revitalizing Judeo-Spanish, the exiled language of Sephardic Jews, which survived for centuries, but now faces the threat of extinction in the digital age. Building on resources created by the Sephardic community of Turkey and elsewhere, we create corpora and tools that would help preserve this language for future generations. For machine translation, we first develop a Spanish to Judeo-Spanish rule-based machine translation system, in order to generate large volumes of synthetic parallel data in the relevant language pairs: Turkish, English and Spanish. Then, we train baseline neural machine translation engines using this synthetic data and authentic parallel data created from translations by the Sephardic community. For text-to-speech synthesis, we present a 3.5 hour single speaker speech corpus…
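The synthetic-data step described in the abstract can be illustrated with a minimal sketch: take an existing parallel corpus that has Spanish on one side, rewrite the Spanish side with a rule-based converter, and keep the other side unchanged. Everything named below (`RULES`, `rule_based_translate`, `make_synthetic_pairs`, and the two toy rewrite rules) is a hypothetical placeholder chosen for illustration. It is not the authors' rule set and its output should not be read as correct Judeo-Spanish.

```python
import re

# Hypothetical, illustrative rewrite rules (NOT the paper's actual rules).
# Each rule is a (compiled pattern, replacement) pair applied in order.
RULES = [
    (re.compile(r"qu(?=[ei])"), "k"),  # toy rule: "qu" before e/i becomes "k"
    (re.compile(r"c(?=[aou])"), "k"),  # toy rule: "c" before a/o/u becomes "k"
]


def rule_based_translate(spanish: str) -> str:
    """Apply the toy rewrite rules to a lowercased Spanish sentence."""
    out = spanish.lower()
    for pattern, replacement in RULES:
        out = pattern.sub(replacement, out)
    return out


def make_synthetic_pairs(pairs):
    """Turn (other-language, Spanish) pairs into (other-language, synthetic) pairs.

    The non-Spanish side is kept as-is; only the Spanish side is rewritten.
    """
    return [(source, rule_based_translate(spanish)) for source, spanish in pairs]


# Toy English-Spanish corpus standing in for a real parallel dataset.
english_spanish = [
    ("I want to eat at home.", "Quiero comer en casa."),
    ("The street is quiet.", "La calle es tranquila."),
]

synthetic = make_synthetic_pairs(english_spanish)
for source, target in synthetic:
    print(f"{source}\t{target}")
```

The same function could be applied to Turkish-Spanish pairs to cover the other language direction mentioned in the abstract. The resulting synthetic pairs would then be combined with authentic community translations for training.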
Taxonomy
Topics: Natural Language Processing Techniques
