Evolution of the lexicon: a probabilistic point of view

Maurizio Serva

arXiv:2510.22220 · cs.CL · October 28, 2025


TL;DR

This paper analyzes the purely probabilistic limits on estimating language divergence times from the evolution of the lexicon. It considers both word replacement and gradual lexical modification, and how each affects the accuracy of the estimates.

Contribution

The paper introduces a probabilistic framework that accounts for both word replacement and gradual lexical modification, clarifying how accurately the divergence time between two languages can be estimated.

Findings

01. Limits on accuracy due to the probabilistic nature of word replacement

02. Gradual lexical modifications significantly influence language evolution

03. Incorporating lexical modifications improves estimates of temporal separation
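Finding 01 can be made concrete with the standard textbook lexicostatistical model. This model is used here only as an assumption. The paper's own framework, which also covers gradual lexical modification, may be formulated differently. Suppose there are N meanings, each lineage replaces words at a constant rate λ, and the two languages separated T years ago. Then a meaning keeps cognate words in both languages with probability p, and the number K of shared cognates is binomial. A first-order (delta-method) expansion gives the spread of the estimate T̂ that no amount of care in data collection can remove:

```latex
% Assumed textbook model: constant rate \lambda per lineage, N meanings,
% independent replacements, no borrowing, no chance resemblance.
p = e^{-2\lambda T}, \qquad K \sim \mathrm{Binomial}(N, p), \qquad
\hat{T} = -\frac{1}{2\lambda}\ln\frac{K}{N}

\operatorname{Var}\!\left(\ln\frac{K}{N}\right)
  \approx \frac{1}{p^{2}}\operatorname{Var}\!\left(\frac{K}{N}\right)
  = \frac{1}{p^{2}}\cdot\frac{p(1-p)}{N}
  = \frac{1-p}{N\,p}

\sigma_{\hat{T}} \approx \frac{1}{2\lambda}\sqrt{\frac{1-p}{N\,p}}
```

Because p shrinks exponentially with T, the relative error grows with the separation time even when every modelling assumption holds.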

Abstract

The Swadesh approach for determining the temporal separation between two languages relies on the stochastic process of word replacement (the emergence of an entirely new word to represent a given concept). It is well known that the basic assumptions of the Swadesh approach are often unrealistic due to various contamination phenomena and misjudgments (horizontal transfers, variations over time and space of the replacement rate, incorrect assessments of cognacy relationships, presence of synonyms, and so on). All of this means that the results cannot be completely correct. More importantly, even in the unrealistic case that all basic assumptions are satisfied, simple mathematics places limits on the accuracy of estimating the temporal separation between two languages. These limits, which are purely probabilistic in nature and which are often neglected in lexicostatistical studies, are…
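The abstract claims that sampling noise alone limits accuracy. The following is a minimal simulation sketch of that claim. It assumes the textbook constant-rate Swadesh model rather than the paper's own framework: independent replacements, no borrowing, no synonyms, and perfectly judged cognacy. The function names (`shared_cognate_count`, `estimate_separation`, `predicted_sd`, `simulate`) are hypothetical. So are the parameter values: 200 meanings, a rate of 0.0002 replacements per meaning per year per lineage, and 3000 years of separation.

```python
import math
import random
import statistics


def shared_cognate_count(n_meanings, rate, t, rng):
    """Count meanings whose original word survives in BOTH lineages.

    Assumption: each lineage replaces the word for a meaning as a Poisson
    process with constant `rate`, so the word survives time t with
    probability exp(-rate * t), independently across meanings and lineages.
    """
    keep = math.exp(-rate * t)
    count = 0
    for _ in range(n_meanings):
        if rng.random() < keep and rng.random() < keep:
            count += 1
    return count


def estimate_separation(shared, n_meanings, rate):
    """Swadesh-style estimate: invert C = exp(-2 * rate * T) for T."""
    c = shared / n_meanings
    return -math.log(c) / (2 * rate)


def predicted_sd(n_meanings, rate, t):
    """First-order (delta-method) standard deviation of the estimate."""
    p = math.exp(-2 * rate * t)
    return math.sqrt((1 - p) / (n_meanings * p)) / (2 * rate)


def simulate(n_meanings=200, rate=0.0002, true_t=3000.0, trials=2000, seed=1):
    """Repeat the experiment and return (mean, stdev) of the estimates."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        k = shared_cognate_count(n_meanings, rate, true_t, rng)
        if k > 0:  # ln(0) is undefined; skip the (very unlikely) empty case
            estimates.append(estimate_separation(k, n_meanings, rate))
    return statistics.mean(estimates), statistics.stdev(estimates)


if __name__ == "__main__":
    mean_t, sd_t = simulate()
    print(f"mean estimate: {mean_t:.0f} years, spread: {sd_t:.0f} years")
    print(f"delta-method prediction: {predicted_sd(200, 0.0002, 3000.0):.0f} years")
```

Under these assumed values, the delta-method prediction `predicted_sd(200, 0.0002, 3000.0)` is about 270 years. The simulated spread should land close to it: roughly 9% of the true separation, arising purely from the randomness of replacement.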
