LIMBA: An Open-Source Framework for the Preservation and Valorization of Low-Resource Languages using Generative Models
Salvatore Mario Carta, Stefano Chessa, Giulia Contu, Andrea Corriga, Andrea Deidda, Gianni Fenu, Luca Frigau, Alessandro Giuliani, Luca Grassi, Marco Manolo Manca, Mirko Marras, Francesco Mola, Bastianino Mossa, Piergiorgio Mura, Marco Ortu, Leonardo Piano, Simone Pisano

TL;DR
This paper introduces LIMBA, an open-source framework designed to generate linguistic resources for endangered low-resource languages, exemplified by Sardinian, to aid in their preservation and revitalization using generative models.
Contribution
The paper presents a framework that addresses data scarcity in low-resource languages by focusing on data creation, which supports the development of language models and linguistic tools for preservation efforts.
Findings
Successful application to Sardinian language case study
Enhanced data creation for low-resource language modeling
Support for language revitalization and standardization
Abstract
Minority languages are vital to preserving cultural heritage, yet they face growing risks of extinction due to limited digital resources and the dominance of artificial intelligence models trained on high-resource languages. This white paper proposes a framework to generate linguistic tools for low-resource languages, focusing on data creation to support the development of language models that can aid in preservation efforts. Sardinian, an endangered language, serves as the case study to demonstrate the framework's effectiveness. By addressing the data scarcity that hinders intelligent applications for such languages, we contribute to promoting linguistic diversity and support ongoing efforts in language standardization and revitalization through modern technologies.
Taxonomy
Topics: Natural Language Processing Techniques
