Diversidade linguística e inclusão digital: desafios para uma IA brasileira (Linguistic diversity and digital inclusion: challenges for a Brazilian AI)
Raquel Meister Ko Freitag

TL;DR
This paper explores how linguistic diversity faces threats from AI development, highlighting the bias towards dominant languages and the need for inclusive digital and AI practices that respect linguistic variety.
Contribution
It analyzes the impact of language bias in AI models using sociolinguistic insights, emphasizing the importance of inclusive data for preserving linguistic diversity.
Findings
Dominant languages are overrepresented in AI training data.
Bias towards standardized varieties perpetuates linguistic inequality.
Inclusive documentation is crucial for linguistic diversity in AI.
Abstract
Linguistic diversity is a human attribute that is coming under threat with the advance of generative AI. Drawing on contributions from sociolinguistics, this paper examines the consequences of the variety-selection bias imposed by technological applications, and the vicious circle in which a variety is preserved, and becomes dominant and standardized, because it has the linguistic documentation needed to train large language models.
Taxonomy
Topics: Education and Digital Technologies · Digital Communication and Language · Linguistics and Education Research
