Versteasch du mi? Computational and Socio-Linguistic Perspectives on GenAI, LLMs, and Non-Standard Language
Verena Platzgummer, John McCrae, Sina Ahmadi

TL;DR
This paper critically examines how Large Language Models interact with non-standard language varieties. Through interdisciplinary analysis, it highlights the digital language divide, the role of linguistic standardisation, and the potential for more inclusive AI strategies.
Contribution
It offers an interdisciplinary analysis of how LLMs handle non-standard languages, combining sociolinguistic and computational perspectives to inform more equitable AI development.
Findings
LLMs often struggle with non-standard language varieties.
Standardization processes are rooted in colonial and nationalist histories.
LLMs may have the potential to support decolonial and democratic language policies.
Abstract
The design of Large Language Models and generative artificial intelligence has been shown to be "unfair" to less-spoken languages and to deepen the digital language divide. Critical sociolinguistic work has also argued that these technologies are not only made possible by prior socio-historical processes of linguistic standardisation, often grounded in European nationalist and colonial projects, but also exacerbate epistemologies of language as "monolithic, monolingual, syntactically standardized systems of meaning". In our paper, we draw on earlier work on the intersections of technology and language policy and bring our respective expertise in critical sociolinguistics and computational linguistics to bear on an interrogation of these arguments. We take two different complexes of non-standard linguistic varieties in our respective repertoires--South Tyrolean dialects, which are widely…
