Automated languages phylogeny from Levenshtein distance

Maurizio Serva

arXiv:0911.3280 · cs.CL · July 3, 2012


TL;DR

This paper introduces an automated, objective method for constructing language family trees using Levenshtein distance, enabling quick and replicable analysis of large language datasets without subjective bias.

Contribution

The paper presents an automated approach based on the Levenshtein distance that makes language phylogeny analysis more objective and efficient by avoiding the subjective judgments that traditional methods require.
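The central ingredient is the Levenshtein (edit) distance: the minimum number of single-character insertions, deletions and substitutions needed to turn one word into another. The sketch below computes it with the standard dynamic-programming recurrence. Two further steps are assumptions made for illustration, not details stated here: dividing each word distance by the length of the longer word, and averaging over the meanings shared by two word lists to get a distance between languages. The helper names `normalized_distance` and `language_distance` and the toy word lists are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions turning a into b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Assumption: edit distance divided by the longer word's length,
    so the result lies in [0, 1]."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def language_distance(words_x: dict[str, str], words_y: dict[str, str]) -> float:
    """Assumption: mean normalized distance over the meanings present in
    both hypothetical word lists (meaning -> word)."""
    shared = words_x.keys() & words_y.keys()
    if not shared:
        raise ValueError("the two word lists share no meanings")
    return sum(normalized_distance(words_x[m], words_y[m]) for m in shared) / len(shared)
```

For example, `levenshtein("kitten", "sitting")` is 3, so its normalized distance is 3/7. Normalizing by word length keeps long words from dominating the average; because only words with the same meaning are compared, no judgment about shared historical origin is needed.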

Findings

1. Generated language trees consistent with previous studies
2. Identified new potential relationships within language families
3. Demonstrated the method's scalability to large datasets

Abstract

Languages evolve over time in a process in which reproduction, mutation and extinction are all possible, similar to what happens to living organisms. Using this similarity it is possible, in principle, to build family trees which show the degree of relatedness between languages. The method used by modern glottochronology, developed by Swadesh in the 1950s, measures distances from the percentage of words with a common historical origin. The weak point of this method is that subjective judgment plays a significant role. Recently we proposed an automated method that avoids this subjectivity: its results can be replicated by any study that uses the same database, and it does not require specific linguistic knowledge. Moreover, the method allows a quick comparison of a large number of languages. We applied our method to the Indo-European and Austronesian families, considering in both…
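Once a matrix of pairwise distances between languages is available, a family tree can be built by distance-based clustering. The abstract does not name the tree-building algorithm, so the sketch below assumes UPGMA (average-linkage clustering), one standard choice, and swaps it in for illustration only. The function name `upgma` and the example distances are hypothetical; the input is a symmetric matrix with a zero diagonal, and the output is a rooted tree in Newick notation without branch lengths.

```python
def upgma(names: list[str], dist: list[list[float]]) -> str:
    """Rooted tree by UPGMA (average linkage), returned in Newick format."""
    if not names:
        raise ValueError("need at least one language")
    # Active clusters: id -> (Newick label, number of leaves it contains)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Distances between active clusters, keyed by (i, j) with i < j
    d = {(i, j): dist[i][j]
         for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair (ties go to the first pair in insertion order)
        i, j = min(d, key=d.get)
        label_i, size_i = clusters.pop(i)
        label_j, size_j = clusters.pop(j)
        del d[(i, j)]
        # New distance to each remaining cluster: size-weighted average of
        # the two merged clusters' distances (the UPGMA update rule)
        for k in clusters:
            d_ik = d.pop((min(i, k), max(i, k)))
            d_jk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        clusters[next_id] = (f"({label_i},{label_j})", size_i + size_j)
        next_id += 1
    (label, _), = clusters.values()
    return label + ";"
```

With four hypothetical languages in which A and B are close (0.1), C and D are close (0.2), and every other pair is far apart (0.8), the function returns `((A,B),(C,D));`. UPGMA implicitly assumes a roughly constant rate of change across lineages; neighbor-joining is a common alternative when that assumption is doubtful.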


Taxonomy

Topics: Language and cultural evolution · Fractal and DNA sequence analysis · Computability, Logic, AI Algorithms