Missing data in a stochastic Dollo model for cognate data, and its application to the dating of Proto-Indo-European
Robin J. Ryder, Geoff K. Nicholls

TL;DR
This paper extends a phylogenetic model for linguistic data to handle missing data, incorporates spatial-temporal rate heterogeneity, and estimates the age of Proto-Indo-European with Bayesian methods, improving predictions and model fit.
Contribution
It introduces a new approach to handle missing data in a stochastic Dollo model and adds a catastrophe process for rate heterogeneity, applied to dating Proto-Indo-European.
Findings
Estimated age of Proto-Indo-European at 8400 years BP
Model fit validated with Bayes factors, rejecting some constraints
Inclusion of all languages improves age prediction accuracy
Abstract
Nicholls and Gray (2008) describe a phylogenetic model for trait data. They use their model to estimate branching times on Indo-European language trees from lexical data. Alekseyenko et al. (2008) extended the model and gave applications in genetics. In this paper we extend the inference to handle data missing at random. When trait data are gathered, traits are thinned in a way that depends on both the trait and missing-data content. Nicholls and Gray (2008) treat missing records as absent traits. Hittite has 12% missing trait records. Its age is poorly predicted in their cross-validation. Our prediction is consistent with the historical record. Nicholls and Gray (2008) dropped seven languages with too much missing data. We fit all twenty-four languages in the lexical data of Ringe (2002). In order to model spatial-temporal rate heterogeneity we add a catastrophe process to the model.…
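The contrast the abstract draws, between coding a missing record as an absent trait and treating it as missing at random, can be shown with a toy calculation. The sketch below is not the paper's stochastic Dollo likelihood. It assumes a deliberately simple stand-in model in which each language independently shows a trait with probability `p`, and all names (`toy_likelihood`, `completions`, `marginal_likelihood`) and data values are hypothetical.

```python
from itertools import product

def missing_fraction(row):
    """Share of records in a language's row that are missing (None)."""
    return sum(v is None for v in row) / len(row)

def toy_likelihood(column, p=0.3):
    """Stand-in likelihood (an assumption, not the Dollo model):
    each language independently shows the trait with probability p."""
    result = 1.0
    for v in column:
        result *= p if v == 1 else (1.0 - p)
    return result

def completions(column):
    """Yield every way of filling the missing entries with 0 or 1."""
    slots = [i for i, v in enumerate(column) if v is None]
    for fill in product([0, 1], repeat=len(slots)):
        filled = list(column)
        for i, bit in zip(slots, fill):
            filled[i] = bit
        yield filled

def marginal_likelihood(column, full_likelihood):
    """Missing-at-random treatment: sum the likelihood over all completions."""
    return sum(full_likelihood(c) for c in completions(column))

# One cognate class across three languages: present, missing, absent.
column = [1, None, 0]

# Treatment 1: recode the missing record as an absent trait.
as_absent = toy_likelihood([0 if v is None else v for v in column])

# Treatment 2: leave it missing and marginalise over both possible values.
marginal = marginal_likelihood(column, toy_likelihood)
```

With `p = 0.3`, `as_absent` is about 0.147 and `marginal` is about 0.21. Recoding a missing record as absent adds evidence of absence that was never observed, and this distortion grows for languages with many missing records.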
Taxonomy
Topics: Linguistics and language evolution · Phonetics and Phonology Research
