Constructing a Family Tree of Ten Indo-European Languages with Delexicalized Cross-linguistic Transfer Patterns
Yuanyuan Zhao, Weiwei Sun, Xiaojun Wan

TL;DR
This paper constructs a phylogenetic tree of ten Indo-European languages by analyzing delexicalized transfer patterns derived from web data, linking historical divergence with second language acquisition constraints.
Contribution
It introduces a method, built on neural syntactic parsing and grammar induction, that automatically induces interpretable tree-to-string and tree-to-tree transfer patterns, connecting historical linguistics and SLA through quantitative analysis.
Findings
Transfer patterns support historical divergence hypotheses
Delexicalized transfer aligns with phylogenetic structures
Neural methods effectively extract cross-linguistic transfer patterns
Abstract
It is reasonable to hypothesize that the divergence patterns formulated by historical linguists and typologists reflect constraints on human languages, and are thus consistent with Second Language Acquisition (SLA) in a certain way. In this paper, we validate this hypothesis on ten Indo-European languages. We formalize delexicalized transfer as interpretable tree-to-string and tree-to-tree patterns, which can be automatically induced from web data by applying neural syntactic parsing and grammar induction techniques. This allows us to quantitatively probe cross-linguistic transfer and extend inquiries into SLA. We extend existing work that relies on mixed features, and our results support the agreement between delexicalized cross-linguistic transfer and the phylogenetic structure resulting from the historical-comparative paradigm.
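The step from per-language transfer patterns to a family tree can be illustrated with a minimal sketch. The abstract does not say how the tree is built, so everything here is an assumption made for illustration: each language is represented by counts of delexicalized patterns, distances are Jensen-Shannon divergences between the normalized counts, and the tree comes from average-linkage clustering. The language labels (`lang_a` to `lang_d`), the pattern strings, and the counts are toy placeholders, not data or results from the paper.

```python
import math
from itertools import combinations


def normalize(counts):
    """Turn raw pattern counts into a probability distribution."""
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, so the value lies in [0, 1])."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a):
        # Kullback-Leibler divergence from a to the mixture m.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def average_linkage_tree(pattern_counts):
    """Cluster languages bottom-up; return the tree as nested tuples."""
    profiles = {lang: normalize(c) for lang, c in pattern_counts.items()}
    names = sorted(profiles)
    dist = {
        frozenset((a, b)): js_divergence(profiles[a], profiles[b])
        for a, b in combinations(names, 2)
    }
    # Maps each subtree (a name or a nested tuple) to its leaf languages.
    clusters = {name: [name] for name in names}

    def linkage(pair):
        # Mean pairwise distance between the leaves of two subtrees.
        x, y = pair
        total = sum(
            dist[frozenset((i, j))] for i in clusters[x] for j in clusters[y]
        )
        return total / (len(clusters[x]) * len(clusters[y]))

    while len(clusters) > 1:
        x, y = min(combinations(clusters, 2), key=linkage)
        clusters[(x, y)] = clusters.pop(x) + clusters.pop(y)
    return next(iter(clusters))


# Toy placeholder counts of delexicalized (POS-only) patterns per language.
toy_counts = {
    "lang_a": {"DET NOUN": 8, "ADJ NOUN": 7, "NOUN ADJ": 1},
    "lang_b": {"DET NOUN": 7, "ADJ NOUN": 8, "NOUN ADJ": 1},
    "lang_c": {"DET NOUN": 6, "ADJ NOUN": 1, "NOUN ADJ": 9},
    "lang_d": {"DET NOUN": 5, "ADJ NOUN": 2, "NOUN ADJ": 9},
}

tree = average_linkage_tree(toy_counts)
print(tree)
```

In this toy input, languages with similar pattern distributions end up under the same subtree. Comparing a tree obtained this way against the historical-comparative tree is the kind of agreement check the abstract describes, although the paper's own distance measure and tree-building procedure may differ.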
Taxonomy
Topics: Natural Language Processing Techniques · Language and cultural evolution · Topic Modeling
