Lexical evolution rates by automated stability measure

Filippo Petroni; Maurizio Serva

arXiv:0912.0821 · cs.CL · May 14, 2015

TL;DR

This paper introduces an automated method using normalized Levenshtein distances to determine the optimal vocabulary size for phylogenetic language analysis, improving language relationship reconstructions.

Contribution

It presents a novel automated approach to assess lexical stability and optimize vocabulary size in phylogenetic language studies using normalized Levenshtein distances.

Findings

01

Automated methodology for lexical stability assessment.

02

Improved language relationship reconstruction accuracy.

03

Quantitative analysis of word stability in language evolution.

Abstract

Phylogenetic trees can be reconstructed from the matrix of distances between all pairs of languages in a family. Recently, we proposed a new method that uses normalized Levenshtein distances among words with the same meaning and averages over all the items of a given list. Decisions about the number of items in the input lists for language comparison have been debated since the beginning of glottochronology. The point is that words associated with some of the meanings undergo rapid lexical evolution. Therefore, a large vocabulary comparison is only apparently more accurate than a smaller one, since many of the words do not carry any useful information. In principle, one should find the optimal length of the input lists by studying the stability of the different items. In this paper we tackle the problem with an automated methodology based only on our normalized Levenshtein distance.…
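
The distance described in the abstract can be illustrated with a minimal Python sketch. The abstract does not spell out the normalization, so the code assumes the Levenshtein distance between two words is divided by the length of the longer word; the function names (`levenshtein`, `normalized_distance`, `language_distance`) are hypothetical and chosen only for this illustration, not taken from the paper.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Levenshtein distance divided by the longer word's length.
    ASSUMPTION: this normalization is not stated in the abstract."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return levenshtein(a, b) / longest


def language_distance(words_a: Sequence[str], words_b: Sequence[str]) -> float:
    """Average normalized distance over a list of meanings.
    words_a[i] and words_b[i] must express the same meaning."""
    if len(words_a) != len(words_b):
        raise ValueError("both lists must cover the same meanings")
    if not words_a:
        raise ValueError("the meaning list must not be empty")
    total = sum(normalized_distance(x, y) for x, y in zip(words_a, words_b))
    return total / len(words_a)
```

Under these assumptions, calling `language_distance` on every pair of languages in a family fills the distance matrix from which the abstract says a phylogenetic tree can be reconstructed. Shortening the meaning list changes only the lists passed in, which is the choice of list length the paper sets out to optimize.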
