\textit{Versteasch du mi?} Computational and Socio-Linguistic Perspectives on GenAI, LLMs, and Non-Standard Language

Verena Platzgummer; John McCrae; Sina Ahmadi

arXiv:2603.28213 · cs.CL · March 31, 2026

TL;DR

This paper critically examines how Large Language Models interact with non-standard language varieties. Through an interdisciplinary analysis, it addresses the digital language divide, the role of linguistic standardisation, and the potential for more inclusive AI strategies.

Contribution

It offers an interdisciplinary analysis of LLMs' handling of non-standard languages, combining sociolinguistics and computational perspectives to inform more equitable AI development.

Findings

01. LLMs often struggle with non-standard language varieties.

02. Standardisation processes are often rooted in European colonial and nationalist histories.

03. LLMs have the potential to support decolonial and democratic language policies.

Abstract

The design of Large Language Models and generative artificial intelligence has been shown to be "unfair" to less-spoken languages and to deepen the digital language divide. Critical sociolinguistic work has also argued that these technologies are not only made possible by prior socio-historical processes of linguistic standardisation, often grounded in European nationalist and colonial projects, but also exacerbate epistemologies of language as "monolithic, monolingual, syntactically standardized systems of meaning". In our paper, we draw on earlier work on the intersections of technology and language policy and bring our respective expertise in critical sociolinguistics and computational linguistics to bear on an interrogation of these arguments. We take two different complexes of non-standard linguistic varieties in our respective repertoires--South Tyrolean dialects, which are widely…
