NushuRescue: Revitalization of the Endangered Nushu Language with AI
Ivory Yang, Weicheng Ma, Soroush Vosoughi

TL;DR
NushuRescue is an AI framework for training models on endangered languages with minimal data. It is demonstrated on the Nushu script, where it reaches 48.69% translation accuracy and yields two new datasets, NCGold and NCSilver.
Contribution
The paper introduces NushuRescue, a novel AI-driven framework for low-resource language revitalization, including the first publicly available Nushu dataset and models that require minimal human input.
Findings
Achieved 48.69% translation accuracy on Nushu sentences
Developed NCGold, the first Nushu-Chinese parallel corpus
Generated 98 new translated sentences (NCSilver)
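The summary above does not say how translation accuracy is scored. As a minimal sketch only, the snippet below assumes a position-wise character match between each model output and its reference translation. The function name `char_accuracy` and the toy strings are hypothetical and are not taken from the paper or its data.

```python
def char_accuracy(predictions, references):
    """Share of reference characters matched at the same position.

    Assumed metric for illustration; the paper's actual scoring may differ.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    matched = 0
    total = 0
    for pred, ref in zip(predictions, references):
        total += len(ref)
        # zip stops at the shorter string, so missing characters count as misses.
        matched += sum(p == r for p, r in zip(pred, ref))
    return matched / total if total else 0.0


# Toy placeholder strings, not real Nushu or Chinese text.
refs = ["abcd", "efgh"]
preds = ["abcx", "efg"]
print(f"{char_accuracy(preds, refs):.2%}")  # prints 75.00%
```

Under this assumption, a score is the fraction of reference characters reproduced in place, so truncated or substituted characters both lower it.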
Abstract
The preservation and revitalization of endangered and extinct languages is a meaningful endeavor, conserving cultural heritage while enriching fields like linguistics and anthropology. However, these languages are typically low-resource, making their reconstruction labor-intensive and costly. This challenge is exemplified by Nushu, a rare script historically used by Yao women in China for self-expression within a patriarchal society. To address this challenge, we introduce NushuRescue, an AI-driven framework designed to train large language models (LLMs) on endangered languages with minimal data. NushuRescue automates evaluation and expands target corpora to accelerate linguistic revitalization. As a foundational component, we developed NCGold, a 500-sentence Nushu-Chinese parallel corpus, the first publicly available dataset of its kind. Leveraging GPT-4-Turbo, with no prior exposure…
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Sparse Evolutionary Training · Sequence to Sequence
