Lexical evolution rates by automated stability measure
Filippo Petroni, Maurizio Serva

TL;DR
This paper introduces an automated method using normalized Levenshtein distances to determine the optimal vocabulary size for phylogenetic language analysis, improving language relationship reconstructions.
Contribution
It presents a novel automated approach to assess lexical stability and optimize vocabulary size in phylogenetic language studies using normalized Levenshtein distances.
Findings
Automated methodology for lexical stability assessment.
Improved language relationship reconstruction accuracy.
Quantitative analysis of word stability in language evolution.
Abstract
Phylogenetic trees can be reconstructed from the matrix that contains the distances between all pairs of languages in a family. Recently, we proposed a new method which uses normalized Levenshtein distances among words with the same meaning and averages over all the items of a given list. Decisions about the number of items in the input lists for language comparison have been debated since the beginning of glottochronology. The point is that words associated with some of the meanings have a rapid lexical evolution. Therefore, a large vocabulary comparison is only apparently more accurate than a smaller one, since many of the words do not carry any useful information. In principle, one should find the optimal length of the input lists by studying the stability of the different items. In this paper we tackle the problem with an automated methodology based only on our normalized Levenshtein distance.…
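The distance described in the abstract can be illustrated with a minimal sketch. It assumes that the Levenshtein distance between two words is normalized by the length of the longer word, and that the distance between two languages is the plain average over the meanings both lists share. The function names and the toy word lists are hypothetical and are not taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Levenshtein distance divided by the longer word's length
    (assumed normalization), giving a value between 0 and 1."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return levenshtein(a, b) / longest


def language_distance(words_a: dict, words_b: dict) -> float:
    """Average normalized distance over the meanings present in both lists."""
    shared = [meaning for meaning in words_a if meaning in words_b]
    if not shared:
        raise ValueError("the two lists share no meanings")
    total = sum(normalized_levenshtein(words_a[m], words_b[m]) for m in shared)
    return total / len(shared)


# Toy lists in plain orthography, for illustration only.
english = {"water": "water", "hand": "hand", "night": "night"}
german = {"water": "wasser", "hand": "hand", "night": "nacht"}
print(language_distance(english, german))
```

Computing this value for every pair of languages in a family fills the distance matrix from which a tree can be reconstructed. Removing a meaning from both lists changes the average, which is why the choice of items matters.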
