Losing our Tail, Again: (Un)Natural Selection & Multilingual LLMs

Eva Vanmassenhove

arXiv:2507.03933·cs.CL·April 24, 2026

TL;DR

This paper discusses how the use of multilingual large language models can lead to the erosion of linguistic diversity through model collapse and self-reinforcing data loops, urging a reevaluation of NLP practices.

Contribution

It highlights the risk of linguistic flattening caused by model collapse in multilingual NLP and advocates for protecting expressive linguistic diversity.

Findings

1. Model collapse can distort the data distribution and suppress low-probability (tail) linguistic features.

2. Self-consuming training loops, in which model-generated text re-enters the training data, lead to the underrepresentation of linguistic diversity.

3. The paper calls for reimagining NLP to preserve multilingual expressiveness.
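Why rare features are the first to go can be made concrete with a short worked equation. The model here is a simplifying assumption of ours, not the paper's formalism. Assume each model generation is refit on n tokens sampled from the previous generation. Assume also that a linguistic form with probability p that never appears in that sample receives probability zero. The chance of losing that form in a single generation is then:

```latex
P(\text{form lost in one generation}) = (1 - p)^{n} \approx e^{-np}
```

For example, a form with p = 0.001 and n = 1000 sampled tokens disappears with probability 0.999^1000 ≈ e^{-1} ≈ 0.37 in one generation. A frequent form with p = 0.1 is practically never lost. Under this assumption, the loss is also permanent, because a form with probability zero can never be sampled again.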

Abstract

Multilingual Large Language Models have considerably changed how technologies influence language. While previous technologies could mediate or assist humans, there is now a tendency to offload the task of writing itself to these technologies, enabling models to change our languages more directly. While they provide us with quick access to information and impressively fluent output, beneath their (apparent) sophistication lies a subtle, insidious threat: the gradual decline and loss of linguistic diversity. In this position paper, I explore how model collapse, with a particular focus on translation technology, can lead to the loss of linguistic forms, grammatical features, and cultural nuance. Model collapse refers to the consequences of self-consuming training loops, where automatically generated data (re-)enters the training data, leading to a gradual distortion of the data distribution and the…
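The self-consuming training loop described in the abstract can be illustrated with a toy simulation. This is a minimal sketch and not the paper's method; the paper is a position paper and proposes no algorithm. The sketch rests on several assumptions of ours:

- The "language" is a categorical distribution over 50 hypothetical forms (`form_0`, `form_1`, …) with Zipf-like frequencies.
- "Training" a new generation means refitting relative frequencies on a finite sample drawn from the previous generation.
- The helper names `zipf_distribution` and `simulate_collapse` are hypothetical.

```python
import random
from collections import Counter


def zipf_distribution(n_forms):
    """Hypothetical Zipf-like distribution over n_forms linguistic variants."""
    weights = [1.0 / rank for rank in range(1, n_forms + 1)]
    total = sum(weights)
    return {f"form_{i}": w / total for i, w in enumerate(weights)}


def simulate_collapse(dist, n_samples, generations, seed=0):
    """Train each generation only on text sampled from the previous one.

    'Training' is an assumed maximum-likelihood refit: the new model's
    probability for each form is its relative frequency in the sample.
    Returns the final distribution and the support size per generation.
    """
    rng = random.Random(seed)
    support_sizes = [len(dist)]
    for _ in range(generations):
        forms = list(dist)
        weights = [dist[f] for f in forms]
        sample = rng.choices(forms, weights=weights, k=n_samples)
        counts = Counter(sample)
        # Forms that were never sampled get probability zero and drop out;
        # no later generation can ever produce them again.
        dist = {f: c / n_samples for f, c in counts.items()}
        support_sizes.append(len(dist))
    return dist, support_sizes


if __name__ == "__main__":
    start = zipf_distribution(50)
    final, sizes = simulate_collapse(start, n_samples=200, generations=30)
    print("surviving forms per generation:", sizes)
    head = max(final, key=final.get)
    print("most frequent form at the end:", head, round(final[head], 3))
```

The number of surviving forms can only stay the same or shrink from one generation to the next. The forms that vanish first are the low-probability ones in the tail. Probability mass is redistributed onto the forms that remain, so the head of the distribution tends to grow at the tail's expense. Real training pipelines are far more complex, but this simple design isolates the sampling-and-refit mechanism the abstract describes.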
