Global-scale phylogenetic linguistic inference from lexical resources

Gerhard Jäger

arXiv:1802.06079·cs.CL·October 18, 2018


TL;DR

This paper uses machine learning to compile data for phylogenetic inference from the ASJP lexical database, reducing the dependence on expert cognate judgments and extending coverage beyond a small number of well-studied language families.

Contribution

It applies automatic cognate detection and character creation to lexical data, enabling large-scale phylogenetic analysis that does not depend on expert cognate judgments.

Findings

01 Effective dissimilarity matrix for phylogenetic inference

02 Successful supervised cognate clustering with SVM

03 Binary characters suitable for phylogenetic analysis
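
To make the third finding concrete, here is a minimal sketch of how cognate classes can be turned into binary presence/absence characters. It assumes the common encoding in which each (concept, cognate class) pair becomes one character; the summary does not spell out the paper's exact procedure, and the function name, input layout, and toy data are all hypothetical.

```python
def binary_characters(cognate_classes):
    """Encode cognate classes as binary characters.

    cognate_classes maps language -> {concept: set of cognate class labels}.
    Returns the ordered list of characters and one character string per language.
    This is an illustrative encoding, not necessarily the paper's.
    """
    # One character per (concept, cognate class) pair seen anywhere in the data.
    characters = sorted(
        {
            (concept, label)
            for by_concept in cognate_classes.values()
            for concept, labels in by_concept.items()
            for label in labels
        }
    )
    matrix = {}
    for language, by_concept in cognate_classes.items():
        row = []
        for concept, label in characters:
            if concept not in by_concept:
                row.append("?")  # no word recorded for this concept: missing data
            elif label in by_concept[concept]:
                row.append("1")  # language has a word in this cognate class
            else:
                row.append("0")  # concept attested, but in a different class
        matrix[language] = "".join(row)
    return characters, matrix


# Hypothetical toy data: three languages, two concepts.
example = {
    "A": {"hand": {"h1"}, "eye": {"e1"}},
    "B": {"hand": {"h1"}, "eye": {"e2"}},
    "C": {"hand": {"h2"}},
}
characters, matrix = binary_characters(example)
```

Marking unattested concepts as `?` rather than `0` keeps missing data from being read as evidence of absence, which matters for sparse word lists.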

Abstract

Automatic phylogenetic inference plays an increasingly important role in computational historical linguistics. Most pertinent work is currently based on expert cognate judgments. This limits the scope of this approach to a small number of well-studied language families. We used machine learning techniques to compile data suitable for phylogenetic inference from the ASJP database, a collection of almost 7,000 phonetically transcribed word lists over 40 concepts, covering two thirds of the extant worldwide linguistic diversity. First, we estimated Pointwise Mutual Information scores between sound classes using weighted sequence alignment and general-purpose optimization. From this we computed a dissimilarity matrix over all ASJP word lists. This matrix is suitable for distance-based phylogenetic inference. Second, we applied cognate clustering to the ASJP data, using supervised training…
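
The Pointwise Mutual Information step mentioned in the abstract can be illustrated with a small sketch. It only shows the PMI definition, PMI(a, b) = log(p(a, b) / (p(a) p(b))), estimated from sound pairs that are assumed to be already aligned; the alignment and optimization procedure of the paper is not reproduced here, and the function name and toy pairs are hypothetical.

```python
import math
from collections import Counter


def pmi_scores(aligned_pairs):
    """Estimate PMI between sound classes from already-aligned sound pairs.

    aligned_pairs is an iterable of (sound_a, sound_b) tuples. No smoothing is
    applied, so pairs that never co-occur are simply absent from the result.
    """
    pair_counts = Counter()
    sound_counts = Counter()
    for a, b in aligned_pairs:
        # Count both orders so that the resulting scores are symmetric.
        pair_counts[(a, b)] += 1
        pair_counts[(b, a)] += 1
        sound_counts[a] += 1
        sound_counts[b] += 1

    n_pairs = sum(pair_counts.values())
    n_sounds = sum(sound_counts.values())

    scores = {}
    for (a, b), count in pair_counts.items():
        p_ab = count / n_pairs
        p_a = sound_counts[a] / n_sounds
        p_b = sound_counts[b] / n_sounds
        scores[(a, b)] = math.log(p_ab / (p_a * p_b))
    return scores


# Hypothetical aligned sound pairs, for illustration only.
example_pairs = [("p", "p"), ("p", "b"), ("t", "t"), ("t", "d")]
scores = pmi_scores(example_pairs)
```

A positive score means two sounds are aligned more often than their individual frequencies would predict, which is what makes such scores usable as alignment weights.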


Taxonomy

Methods: Support Vector Machine