Diversidade linguística e inclusão digital: desafios para uma IA brasileira

Raquel Meister Ko Freitag

arXiv:2411.01259 · cs.CL · March 24, 2026

Open Access

TL;DR

This paper explores how linguistic diversity faces threats from AI development, highlighting the bias towards dominant languages and the need for inclusive digital and AI practices that respect linguistic variety.

Contribution

It analyzes the impact of language bias in AI models using sociolinguistic insights, emphasizing the importance of inclusive data for preserving linguistic diversity.

Findings

01

Dominant languages are overrepresented in AI training data.

02

Bias towards standardized varieties perpetuates linguistic inequality.

03

Inclusive documentation is crucial for linguistic diversity in AI.

Abstract

Linguistic diversity is a human attribute that is coming under threat with the advance of generative AI. Drawing on sociolinguistics, this paper examines the consequences of the variety-selection bias imposed by technological applications. It also examines the vicious circle in which one variety is preserved, and becomes dominant and standardized, because it has the linguistic documentation needed to train large language models.

Taxonomy

Topics: Education and Digital Technologies · Digital Communication and Language · Linguistics and Education Research