# Evaluating the phylogenetic signal of morphosyntax

**Authors:** Ruby Sleeman, Maria-Margarita Makri, Elena Anagnostopoulou, Emmanuel D. Ladoukakis, Dimitris Michelioudakis, Christos Zioutis, Pavlos Pavlidis

PMC · DOI: 10.1515/psicl-2025-0030 · Poznan Studies in Contemporary Linguistics · 2026-02-09

## TL;DR

This paper tests whether morphosyntactic characters drawn from the World Atlas of Language Structures carry enough historical signal to be used in linguistic phylogenetics, as cognate data from word lists already are.

## Contribution

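The study applies three methods to evaluate WALS-based morphosyntactic characters for phylogenetic analysis: Pearson correlations against genealogical and geographical language groupings, parsimony scores on a cognate-based reference tree, and a hill-climbing search over character subsets.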

## Key findings

- Morphosyntactic characters show potential but require careful curation, because they may conflate historical signal with homoplasy, horizontal transfer, and universal tendencies.
- Parsimony scores on a cognate-based reference tree and a hill-climbing search over character subsets help assess the historical signal of these characters.
- Making character definitions more theoretically informed is expected to produce a stronger historical signal.
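
The parsimony score mentioned above is the minimum number of state changes needed to explain a character's distribution on a fixed tree. A minimal sketch follows, assuming a rooted binary tree and using Fitch's small-parsimony algorithm. The summary does not say which algorithm or software the authors used, and the tree, language labels, and character states below are hypothetical toy values.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a leaf name or a pair of subtrees (rooted, binary).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_score(tree: Tree, states: Dict[str, str]) -> int:
    """Minimum number of state changes for one character on `tree` (Fitch)."""

    def visit(node: Tree) -> Tuple[Set[str], int]:
        if isinstance(node, str):
            # Leaf: the only possible state is the observed one.
            return {states[node]}, 0
        left, right = node
        left_set, left_cost = visit(left)
        right_set, right_cost = visit(right)
        common = left_set & right_set
        if common:
            # Children can agree on a state: no change needed here.
            return common, left_cost + right_cost
        # Children disagree: one change is required at this node.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


# Hypothetical reference tree: (A, B) and (C, D) are sister pairs.
toy_tree: Tree = (("A", "B"), ("C", "D"))

# A character that follows the tree, and one that cuts across it.
tree_like = {"A": "SOV", "B": "SOV", "C": "SVO", "D": "SVO"}
scattered = {"A": "SOV", "B": "SVO", "C": "SOV", "D": "SVO"}

print(fitch_score(toy_tree, tree_like))
print(fitch_score(toy_tree, scattered))
```

A character whose states line up with the clades of the reference tree gets a lower score than one whose states are scattered across them, which is the sense in which a low parsimony score suggests historical signal. This sketch does not handle missing values or non-binary trees.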

## Abstract

Computational linguistic phylogenetics has so far relied heavily on cognate data. In contrast, the potential of morphosyntactic characters as a valuable source for phylogenetic analysis has been largely overlooked. We argue that morphosyntactic characters may conflate historical signal with the results of homoplasies, horizontal transfer, and universal tendencies, and must be scrutinized in terms of their propensity to change and borrowing, analogously to the curation of lexical data which produced the Swadesh lists. In this paper we make a start by evaluating a set of morphosyntactic characters based on the World Atlas of Language Structures using three methods: we (1) calculated Pearson correlation coefficients for each character against different language groupings, reflecting either shared ancestry (genera) or contact (geographical proximity); (2) counted the minimum number of mutations needed for the distribution of a character’s states on a cognate-based reference tree (parsimony score), testing whether they correctly reflect language change known from historical linguistics; and (3) ran a classic hill-climbing algorithm to determine which random subsets of characters produced a phylogeny closest to a reference tree. We conclude that these are useful tools, but expect that making the definitions of the characters more theoretically informed will produce a stronger historical signal.
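
The third method, a hill-climbing search over character subsets, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: instead of inferring a tree from each subset and comparing it with the reference tree, the hypothetical `score` function compares pairwise Hamming distances with a reference distance table. The data, the reference distances, and all function names are invented for the example.

```python
import random
from itertools import combinations
from typing import Dict, Set, Tuple


def hamming(a: str, b: str, subset: Set[int]) -> float:
    """Share of the selected characters on which two languages differ."""
    return sum(a[i] != b[i] for i in subset) / len(subset)


def score(subset: Set[int], data: Dict[str, str],
          ref: Dict[Tuple[str, str], float]) -> float:
    """Assumed stand-in for tree similarity: negative total distance error."""
    if not subset:
        return float("-inf")
    pairs = combinations(sorted(data), 2)
    return -sum(abs(hamming(data[x], data[y], subset) - ref[(x, y)])
                for x, y in pairs)


def hill_climb(data: Dict[str, str], ref: Dict[Tuple[str, str], float],
               n_chars: int, steps: int = 200,
               seed: int = 0) -> Tuple[Set[int], float]:
    """Start from a random subset; flip one character at a time, keep improvements."""
    rng = random.Random(seed)
    current = {i for i in range(n_chars) if rng.random() < 0.5} or {0}
    best = score(current, data, ref)
    for _ in range(steps):
        candidate = current ^ {rng.randrange(n_chars)}  # add or drop one character
        candidate_score = score(candidate, data, ref)
        if candidate_score > best:  # accept strict improvements only
            current, best = candidate, candidate_score
    return current, best


# Toy data: characters 0 and 1 follow the split (A, B) vs (C, D); 2 and 3 are noise.
toy_data = {"A": "0000", "B": "0011", "C": "1101", "D": "1110"}
toy_ref = {("A", "B"): 0.0, ("C", "D"): 0.0,
           ("A", "C"): 1.0, ("A", "D"): 1.0,
           ("B", "C"): 1.0, ("B", "D"): 1.0}

subset, best = hill_climb(toy_data, toy_ref, n_chars=4)
print(sorted(subset), best)
```

On this toy input the search is expected to discard the two noise characters and keep only characters that follow the reference split. With real data, the scoring step would be replaced by tree inference plus a tree-distance measure, and several random restarts would help avoid local optima.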

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13001714/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13001714/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/PMC13001714/full.md

---
Source: https://tomesphere.com/paper/PMC13001714