Automated words stability and languages phylogeny
Filippo Petroni, Maurizio Serva

TL;DR
This paper introduces an automated method using normalized Levenshtein distance to measure language distances and analyze word stability and phylogeny, improving understanding of linguistic evolution and relationships.
Contribution
It presents a novel automated approach for assessing language relationships and word stability based solely on normalized Levenshtein distance, advancing computational historical linguistics.
Findings
The method effectively quantifies language distances.
Word stability varies with meaning and usage frequency.
Automated analysis aligns with traditional linguistic classifications.
Abstract
The idea of measuring distance between languages seems to have its roots in the work of the French explorer Dumont D'Urville (D'Urville 1832). He collected comparative word lists of various languages during his voyages aboard the Astrolabe from 1826 to 1829 and, in his work on the geographical division of the Pacific, proposed a method to measure the degree of relation among languages. The method used by modern glottochronology, developed by Morris Swadesh in the 1950s (Swadesh 1952), measures distances from the percentage of shared cognates, which are words with a common historical origin. Recently, we proposed a new automated method that uses the normalized Levenshtein distance between words with the same meaning and averages over the words contained in a list. Another classical problem in glottochronology is the study of the stability of words corresponding to different meanings.…
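The distance described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the Levenshtein distance between two words is normalized by the length of the longer word (the abstract does not state the normalization), and the function names and the dictionary-based word-list format are hypothetical choices made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions,
    and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Levenshtein distance divided by the longer word's length.
    ASSUMPTION: this normalization is not given in the abstract."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def language_distance(list_a: dict, list_b: dict) -> float:
    """Average normalized distance over the meanings present in both lists.
    Each list maps a meaning to the word expressing it (hypothetical format)."""
    shared = [meaning for meaning in list_a if meaning in list_b]
    if not shared:
        raise ValueError("the two word lists share no meanings")
    total = sum(normalized_distance(list_a[m], list_b[m]) for m in shared)
    return total / len(shared)


if __name__ == "__main__":
    # Toy two-item lists, for illustration only; real lists are much longer.
    english = {"night": "night", "water": "water"}
    italian = {"night": "notte", "water": "acqua"}
    print(language_distance(english, italian))
```

Dividing by the longer word's length keeps every pairwise distance between 0 and 1, so short and long words contribute on the same scale to the average.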
Taxonomy
Topics: Language and cultural evolution · Natural Language Processing Techniques · Authorship Attribution and Profiling
