From Data Scarcity to Data Care: Reimagining Language Technologies for Serbian and other Low-Resource Languages
Smiljana Antonijevic Ubois

TL;DR
This paper examines the challenges faced by low-resource languages like Serbian in AI language technology development and proposes a culturally grounded framework called Data Care to promote inclusivity and ethical practices.
Contribution
It introduces the Data Care framework, integrating cultural and ethical considerations into corpus design and governance for low-resource languages.
Findings
Historical destruction of Serbian textual heritage underlies present-day challenges in language technology development.
Current approaches often prioritize functionality over linguistic nuance.
Data Care offers a culturally grounded, ethical model for language technology development.
Abstract
Large language models are commonly trained on dominant languages like English, and their representation of low-resource languages typically reflects cultural and linguistic biases present in the source language materials. Using the Serbian language as a case, this study examines the structural, historical, and sociotechnical factors shaping language technology development for low-resource languages in the AI age. Drawing on semi-structured interviews with ten scholars and practitioners, including linguists, digital humanists, and AI developers, it traces challenges rooted in the historical destruction of Serbian textual heritage, intensified by contemporary issues that drive reductive, engineering-first approaches prioritizing functionality over linguistic nuance. These include superficial transliteration, reliance on English-trained models, data bias, and dataset curation lacking cultural…
Taxonomy
Topics: Computational and Text Analysis Methods · Digital Humanities and Scholarship · Ethics and Social Impacts of AI
