Automated languages phylogeny from Levenshtein distance
Maurizio Serva

TL;DR
This paper introduces an automated, objective method for constructing language family trees using Levenshtein distance, enabling quick and replicable analysis of large language datasets without subjective bias.
Contribution
The authors developed an automated approach based on Levenshtein distance that makes language phylogeny analysis more objective and efficient by avoiding the subjective judgments that traditional methods rely on.
Findings
Generated language trees consistent with previous studies
Identified new potential relationships within language families
Demonstrated method's scalability to large datasets
Abstract
Languages evolve over time in a process in which reproduction, mutation and extinction are all possible, similar to what happens to living organisms. Using this similarity it is possible, in principle, to build family trees which show the degree of relatedness between languages. The method used by modern glottochronology, developed by Swadesh in the 1950s, measures distances from the percentage of words with a common historical origin. The weak point of this method is that subjective judgment plays a significant role. Recently we proposed an automated method that avoids this subjectivity, yields results that can be replicated by any study using the same database, and does not require specific linguistic knowledge. Moreover, the method allows a quick comparison of a large number of languages. We applied our method to the Indo-European and Austronesian families, considering in both…
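As a rough illustration of how a Levenshtein-based distance between two languages could be computed, here is a minimal Python sketch. The abstract as shown here is truncated and does not spell out the exact formula, so the specific choices below are assumptions: each word-pair distance is divided by the length of the longer word, and the language distance is the average over word pairs with the same meaning. The function names `levenshtein`, `normalized_distance`, and `language_distance` are hypothetical and chosen only for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Assumption: normalize by the longer word so values lie in [0, 1]."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def language_distance(words_a: list[str], words_b: list[str]) -> float:
    """Assumption: average the normalized distance over aligned word lists,
    where words_a[i] and words_b[i] express the same meaning."""
    if len(words_a) != len(words_b) or not words_a:
        raise ValueError("word lists must be non-empty and of equal length")
    total = sum(normalized_distance(x, y) for x, y in zip(words_a, words_b))
    return total / len(words_a)
```

Computing `language_distance` for every pair of languages gives a distance matrix, which a tree-building algorithm could then take as input. Because no step depends on a judgment about whether two words share a historical origin, anyone using the same word lists obtains the same numbers.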
Taxonomy
Topics: Language and cultural evolution · Fractal and DNA sequence analysis · Computability, Logic, AI Algorithms
