Syntactic structures and the general Markov models
Sitanshu Gakkhar, Matilde Marcolli

TL;DR
This paper investigates the phylogenetic signal in syntactic structures using general Markov models, comparing derived trees with expert consensus and exploring alternative evolutionary models to assess data consistency.
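The summary does not say which measure the paper uses to compare derived trees with expert-consensus trees. The Robinson-Foulds distance is one standard choice, and it is used here only as an assumption. It counts the bipartitions (splits) of the language set that occur in one unrooted tree but not in the other. In this minimal sketch, trees are nested tuples of language names. The function names and the example trees are hypothetical illustrations, not data from the paper.

```python
from itertools import chain

def leaves(tree):
    """Return the set of leaf labels in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset(chain.from_iterable(leaves(child) for child in tree))

def splits(tree):
    """Collect the non-trivial splits induced by the tree's internal edges.

    Each split is stored as the side that does NOT contain a fixed reference
    leaf, so one unrooted split has exactly one canonical representation.
    """
    all_leaves = leaves(tree)
    ref = min(all_leaves)
    result = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        # Clades of size 2..n-2 give non-trivial splits; size 1 or n-1 are trivial.
        if 1 < len(below) < len(all_leaves) - 1:
            side = below if ref not in below else all_leaves - below
            result.add(side)
        return below

    walk(tree)
    return result

def robinson_foulds(t1, t2):
    """Number of splits present in exactly one of the two trees (unnormalised)."""
    if leaves(t1) != leaves(t2):
        raise ValueError("trees must have the same leaf set")
    return len(splits(t1) ^ splits(t2))
```

For example, with the hypothetical trees `t1 = ((("Italian", "Spanish"), "French"), ("English", "German"))` and `t2 = ((("Italian", "French"), "Spanish"), ("English", "German"))`, `robinson_foulds(t1, t2)` returns 2. Each tree has one split the other lacks. Because splits are canonicalised against a reference leaf, re-rooting a tree does not change its distance to other trees.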
Contribution
It introduces a novel application of Markov models to syntactic data and compares different evolutionary models for analyzing linguistic phylogenetics.
Findings
Syntactic data shows measurable phylogenetic signal.
General Markov models can effectively model syntactic evolution.
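Interpreting the methods of Ceolin et al. (2020) as an infinite sites evolutionary model provides an alternative model against which to test the consistency of the data.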
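Under an infinite sites model, each character changes state at most once along the tree. For binary characters with no missing entries, a tree on which every character changes at most once exists exactly when every pair of characters is compatible. A pair is compatible when it never shows all four value combinations across the languages, which is the classical four-gamete test. The minimal sketch below runs this pairwise check. It illustrates the idea only and is not the procedure of Ceolin et al. (2020) or of this paper. It makes several assumptions. Parameters are coded as 0/1, with None for undefined values. Skipping undefined entries is a simplification, and with missing data, pairwise compatibility no longer guarantees a single compatible tree. The parameter names are hypothetical.

```python
from itertools import combinations

def four_gamete_incompatible(char_a, char_b):
    """True if two binary characters exhibit all four value combinations.

    Entries are 0, 1, or None (undefined); languages where either character
    is undefined are skipped, which is an assumption of this sketch.
    """
    seen = {(a, b) for a, b in zip(char_a, char_b)
            if a is not None and b is not None}
    return len(seen) == 4

def incompatible_pairs(matrix):
    """matrix: dict mapping a parameter name to its values, one per language.

    Returns every pair of parameters that fails the four-gamete test, i.e.
    that cannot both change only once on a common tree.
    """
    names = sorted(matrix)
    return [(x, y) for x, y in combinations(names, 2)
            if four_gamete_incompatible(matrix[x], matrix[y])]
```

For instance, with the hypothetical matrix `{"p1": [0, 0, 1, 1], "p2": [0, 1, 0, 1], "p3": [0, 0, 1, None]}`, only `("p1", "p2")` is reported. The proportion of incompatible pairs in a real data set gives one rough measure of how far the data depart from the infinite sites assumption.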
Abstract
We study the phylogenetic signal present in syntactic information by considering the syntactic structures data from Longobardi (2017b), Collins (2010), Ceolin et al. (2020) and Koopman (2011). Focusing first on the general Markov models, we explore how well the syntactic structures data conform to the hypotheses required by these models. We do this by comparing derived phylogenetic trees against trees agreed on by the linguistics community. We then interpret the methods of Ceolin et al. (2020) as an infinite sites evolutionary model and compare the consistency of the data with this alternative. The ideas and methods discussed in the present paper apply more generally than to the specific setting of syntactic structures. They can be used in other contexts to analyze the consistency of data with hypothesized evolutionary models.
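A standard tool for data assumed to follow a general Markov model is the paralinear (log-det) distance. If it is computed from the true joint distribution of character values at two leaves, it is additive along the tree path between them. This holds for any nonsingular transition matrices on the edges. This summary does not say whether the paper uses this distance, so the sketch below is an assumption-labelled illustration only. It is restricted to binary characters. It assumes the two languages' parameter values are already coded as 0/1 over the same parameters, with undefined entries removed. The scaling constant follows one convention; other conventions differ by a constant factor. The function name is hypothetical.

```python
import math

def paralinear_distance(x, y):
    """Paralinear (log-det) distance between two binary sequences.

    d = -ln det J + (1/2) * (ln det D_x + ln det D_y), where J is the 2x2
    joint frequency matrix of (x value, y value) and D_x, D_y are the
    diagonal matrices of marginal frequencies.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(x)
    # J[a][b] = fraction of sites where x has value a and y has value b
    joint = [[0.0, 0.0], [0.0, 0.0]]
    for a, b in zip(x, y):
        joint[a][b] += 1.0 / n
    px = [joint[0][0] + joint[0][1], joint[1][0] + joint[1][1]]  # marginals of x
    py = [joint[0][0] + joint[1][0], joint[0][1] + joint[1][1]]  # marginals of y
    det_j = joint[0][0] * joint[1][1] - joint[0][1] * joint[1][0]
    # A non-positive determinant or a constant sequence makes the log undefined;
    # treat the pair as saturated (infinitely distant).
    if det_j <= 0 or min(px) == 0 or min(py) == 0:
        return math.inf
    return -math.log(det_j) + 0.5 * (sum(map(math.log, px)) + sum(map(math.log, py)))
```

The distance is symmetric and equals 0 for identical non-constant sequences. A matrix of these distances could then be passed to a distance-based tree builder such as neighbour joining. Comparing the resulting tree with the accepted one is one way to probe whether the general Markov assumptions are plausible for the data.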
Taxonomy
Topics: Language and cultural evolution · Genomics and Phylogenetic Studies · Natural Language Processing Techniques
