On the accuracy of language trees
Simone Pompei, Vittorio Loreto, Francesca Tria

TL;DR
This paper evaluates the accuracy of language tree reconstruction methods by comparing them with expert classifications, analyzing the impact of data completeness, and proposing new metrics for tree comparison.
Contribution
It provides a comprehensive assessment of distance-based phylogeny reconstruction methods against expert classifications and introduces new metrics for comparing language trees.
Findings
Distance-based methods vary in how closely their reconstructed trees match expert classifications.
Data completeness significantly affects reconstruction accuracy.
New tree distance metrics help evaluate method performance.
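To make the idea of comparing a reconstructed tree with an expert classification concrete, here is a minimal sketch of the standard Robinson-Foulds distance for rooted trees. It is an illustration only: this summary does not specify the paper's own metrics, so the choice of Robinson-Foulds, the nested-tuple tree encoding, the function names `clades` and `rf_distance`, and the toy trees are all assumptions, and the toy trees are not real classifications.

```python
def clades(tree):
    """Return (non-trivial clades, all leaves) of a rooted tree.

    A tree is a nested tuple whose leaves are language names (str).
    Each clade is the frozenset of leaf names under one internal node.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    all_leaves = leaves(tree)
    # The root clade contains every leaf, so it carries no information.
    found.discard(all_leaves)
    return found, all_leaves


def rf_distance(tree_a, tree_b):
    """Count the clades that appear in exactly one of the two trees."""
    clades_a, leaves_a = clades(tree_a)
    clades_b, leaves_b = clades(tree_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must be defined on the same set of languages")
    return len(clades_a ^ clades_b)


# Hypothetical toy trees (not taken from the paper).
expert = ((("Italian", "Spanish"), "French"), ("German", "English"))
reconstructed = ((("Italian", "French"), "Spanish"), ("German", "English"))

print(rf_distance(expert, reconstructed))  # 2
```

The two trees disagree only on which pair of Romance languages is grouped first, so exactly two clades differ. Because clades are plain sets of leaves, the same sketch also accepts non-binary nodes, which may matter when the reference tree is a coarse expert classification.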
Abstract
Historical linguistics aims at inferring the most likely language phylogenetic tree starting from information concerning the evolutionary relatedness of languages. The available information typically consists of lists of homologous (lexical, phonological, syntactic) features or characters for many different languages. From this perspective the reconstruction of language trees is an example of an inverse problem: starting from present-day information, which is incomplete and often noisy, one aims at inferring the most likely past evolutionary history. A fundamental issue in inverse problems is the evaluation of the inference made. A standard way of dealing with this question is to generate data with artificial models in order to have full access to the evolutionary process one is going to infer. This procedure presents an intrinsic limitation: when dealing with real data sets, one typically does not know…
