Computational Approaches for Integrating out Subjectivity in Cognate Synonym Selection
Luise Häuser, Gerhard Jäger, Alexandros Stamatakis

TL;DR
This paper investigates whether all synonyms in cognate data should be included in phylogenetic tree inference or selected a priori. It introduces probabilistic character matrices and shows that a priori synonym selection can substantially change the inferred tree topology.
Contribution
It introduces probabilistic character matrices for cognate data and shows that maximum likelihood inference with RAxML-NG yields plausible phylogenetic trees when all synonyms are included. On that basis, it advises against a priori synonym selection.
Findings
Including all synonyms results in plausible phylogenetic trees.
A priori synonym selection can lead to different tree topologies.
Probabilistic character matrices are introduced as a representation for cognate data.
Abstract
Working with cognate data involves handling synonyms, that is, multiple words that describe the same concept in a language. In the early days of language phylogenetics it was recommended to select one synonym only. However, as we show here, binary character matrices, which are used as input for computational methods, do allow for representing the entire dataset including all synonyms. Here we address the question of how one can, and whether one should, include all synonyms, or whether it is preferable to select synonyms a priori. To this end, we perform maximum likelihood tree inferences with the widely used RAxML-NG tool and show that it yields plausible trees when all synonyms are used as input. Furthermore, we show that a priori synonym selection can yield topologically substantially different trees and we therefore advise against doing so. To represent cognate data including all synonyms, we…
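The abstract's point that a binary character matrix can hold all synonyms can be illustrated with a minimal sketch. It assumes the common presence/absence encoding with one column per cognate class, and it is not necessarily the exact encoding used by the authors. The language names, concepts, cognate class IDs, and the `binary_matrix` helper are all hypothetical toy values chosen for illustration.

```python
from collections import defaultdict

# Toy cognate data (hypothetical): language -> concept -> set of cognate class IDs.
# A language with synonyms for a concept lists more than one class.
cognates = {
    "lang_A": {"hand": {"hand_1"}, "big": {"big_1", "big_2"}},
    "lang_B": {"hand": {"hand_1"}, "big": {"big_2"}},
    "lang_C": {"hand": {"hand_2"}},  # no data for "big"
}

def binary_matrix(data):
    """Return (columns, rows): one column per cognate class, one row per language."""
    # Collect the cognate classes observed for each concept across all languages.
    classes_by_concept = defaultdict(set)
    for concepts in data.values():
        for concept, classes in concepts.items():
            classes_by_concept[concept] |= classes
    # Fix a deterministic column order: concepts sorted, then classes sorted.
    columns = [(concept, cls)
               for concept in sorted(classes_by_concept)
               for cls in sorted(classes_by_concept[concept])]
    rows = {}
    for lang, concepts in data.items():
        chars = []
        for concept, cls in columns:
            if concept not in concepts:
                chars.append("-")  # assumption: unattested concept is coded as missing
            else:
                chars.append("1" if cls in concepts[concept] else "0")
        rows[lang] = "".join(chars)
    return columns, rows

columns, rows = binary_matrix(cognates)
for lang, row in rows.items():
    print(lang, row)
```

In this toy example, lang_A has two synonyms for "big" and therefore carries a 1 in both "big" columns, so no synonym has to be discarded before building the matrix.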
Taxonomy
Topics: Natural Language Processing Techniques · Lexicography and Language Studies · Linguistics and Terminology Studies
