Syntactic Phylogenetic Trees
Kevin Shu, Sharjeel Aziz, Vy-Luan Huynh, David Warrick, Matilde Marcolli

TL;DR
This paper examines the challenges of using syntactic data from the SSWL (Syntactic Structures of the World's Languages) database for linguistic phylogenetic reconstruction, and proposes ways to improve reliability through additional linguistic information and techniques from phylogenetic algebraic geometry.
Contribution
It identifies problems in naive approaches and introduces phylogenetic algebraic geometry to assess data reliability and improve phylogenetic tree accuracy.
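As a rough illustration of what a phylogenetic invariant is, the sketch below uses a standard rank condition for 2-state Markov models on a four-leaf tree: if leaves {1,2} are separated from {3,4} by an internal edge, the 4x4 "flattening" of the leaf distribution along that split has rank at most 2, so all of its 3x3 minors vanish. This summary does not state the paper's exact model or invariants, so the model, the parameter values, and every function name here are assumptions chosen for illustration only.

```python
from itertools import combinations, product

PAIRS = list(product((0, 1), repeat=2))

def quartet_distribution(pi, edge, leaf):
    """Leaf distribution for the tree ((1,2),(3,4)) under a 2-state Markov model.

    pi   : distribution at the internal node attached to leaves 1 and 2
    edge : 2x2 transition matrix on the internal edge
    leaf : 2x2 transition matrix reused on every pendant edge (a simplification)
    """
    return {
        (a, b, c, d): sum(
            pi[u] * leaf[u][a] * leaf[u][b] * edge[u][v] * leaf[v][c] * leaf[v][d]
            for u in (0, 1) for v in (0, 1)
        )
        for a, b, c, d in product((0, 1), repeat=4)
    }

def flattening(p):
    """4x4 matrix with rows indexed by leaves (1,2) and columns by leaves (3,4)."""
    return [[p[(a, b, c, d)] for (c, d) in PAIRS] for (a, b) in PAIRS]

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def max_abs_3x3_minor(mat):
    """Largest absolute value among all 3x3 minors of a 4x4 matrix."""
    return max(
        abs(det3([[mat[r][c] for c in cols] for r in rows]))
        for rows in combinations(range(4), 3)
        for cols in combinations(range(4), 3)
    )

# Hypothetical parameters, not estimated from any real data.
p = quartet_distribution(
    pi=(0.6, 0.4),
    edge=((0.9, 0.1), (0.2, 0.8)),
    leaf=((0.85, 0.15), (0.1, 0.9)),
)

# Swapping leaves 2 and 3 tests the same data against the wrong split 13|24.
p_wrong = {(a, c, b, d): prob for (a, b, c, d), prob in p.items()}

matched = max_abs_3x3_minor(flattening(p))           # vanishes up to rounding
mismatched = max_abs_3x3_minor(flattening(p_wrong))  # clearly nonzero
```

With empirical frequencies in place of an exact model distribution, the minors would not vanish exactly; their size relative to those of competing splits is what indicates how well a candidate tree fits.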
Findings
Restricting analysis to smaller language subfamilies improves data match
Using fully mapped parameters enhances reliability of phylogenetic trees
Phylogenetic invariants effectively evaluate tree consistency
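The finding about fully mapped parameters can be illustrated with a minimal sketch: keep only the binary syntactic parameters that have a recorded value for every language in the sample, then compute pairwise normalized Hamming distances on those columns. The toy values, language labels, and function names below are invented for illustration and are not taken from SSWL; the choice of Hamming distance is also an assumption, since this summary does not say which distance the paper uses.

```python
from itertools import combinations

# Toy data invented for illustration; these are NOT actual SSWL values.
# 1 = property present, 0 = absent, None = not mapped for that language.
toy_params = {
    "lang_A": [1, 0, 1, None, 1],
    "lang_B": [1, 1, 1, 0, None],
    "lang_C": [0, 1, 1, 1, 0],
}

def fully_mapped_columns(data):
    """Return indices of parameters that have a value for every language."""
    rows = list(data.values())
    n_params = len(rows[0])
    return [j for j in range(n_params) if all(row[j] is not None for row in rows)]

def hamming_distances(data, columns):
    """Normalized Hamming distance for each language pair, using only `columns`."""
    if not columns:
        raise ValueError("no fully mapped parameters to compare")
    dist = {}
    for a, b in combinations(sorted(data), 2):
        differing = sum(data[a][j] != data[b][j] for j in columns)
        dist[(a, b)] = differing / len(columns)
    return dist

cols = fully_mapped_columns(toy_params)          # [0, 1, 2]
distances = hamming_distances(toy_params, cols)  # keyed by (language, language)
```

The resulting distance table could then be passed to any distance-based tree-building method. Dropping partially mapped columns trades sample size for comparability, which is why the filter is applied before any distances are computed.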
Abstract
In this paper we identify several serious problems that arise in the use of syntactic data from the SSWL database for the purpose of computational phylogenetic reconstruction. We show that the most naive approach fails to produce reliable linguistic phylogenetic trees. We identify some of the sources of the observed problems and we discuss how they may be, at least partly, corrected by using additional information, such as prior subdivision into language families and subfamilies, and a better use of the information about ancient languages. We also describe how the use of phylogenetic algebraic geometry can help in estimating to what extent the probability distribution at the leaves of the phylogenetic tree obtained from the SSWL data can be considered reliable, by testing it on phylogenetic trees established by other forms of linguistic analysis. In simple examples, we find that, after…
Taxonomy
Topics: Language and cultural evolution · Natural Language Processing Techniques · Genomics and Phylogenetic Studies
