Shaping the Future of Endangered and Low-Resource Languages -- Our Role in the Age of LLMs: A Keynote at ECIR 2024
Josiane Mothe (IRIT-SIG)

TL;DR
This paper discusses the potential and challenges of using Large Language Models to preserve endangered languages, emphasizing a collaborative approach between technology and human expertise, with a focus on the Occitan language.
Contribution
It explores innovative paths for leveraging AI in language preservation, highlighting ethical considerations and practical strategies for endangered language revitalization.
Findings
LLMs can aid in translating and generating content for endangered languages
Collaborative human-AI approaches are essential for effective language preservation
Addressing ethical challenges is crucial for responsible AI use in this domain
Abstract
Isidore of Seville is credited with the adage that it is language that gives birth to a people, and not the other way around, underlining the profound role played by language in the formation of cultural and social identity. Today, of the more than 7100 languages listed, a significant number are endangered. Since the 1970s, linguists, information seekers and enthusiasts have helped develop digital resources and automatic tools to support a wide range of languages, including endangered ones. The advent of Large Language Model (LLM) technologies holds both promise and peril. They offer unprecedented possibilities for the translation and generation of content and resources, key elements in the preservation and revitalisation of languages. They also present the threat of homogenisation, cultural oversimplification and the further marginalisation of already vulnerable languages. The talk this…
