Approximately Independent Features of Languages
Eric W. Holman

TL;DR
This paper uses the adjusted Rand index to identify a set of approximately independent linguistic features, which can serve as independent units of analysis in statistical tests of models of language evolution.
Contribution
It estimates pairwise dependence among 130 linguistic features in a large published database and selects a subset of features that are nearly independent of each other.
Findings
A subset of 47 features, with an average pairwise R' of -0.0001, is identified as approximately independent.
Many of the pairwise R' values among all 130 features are near 0, as expected for independent features.
The 47 features are recommended for statistical tests that require independent units of analysis.
Abstract
To facilitate the testing of models for the evolution of languages, the present note offers a set of linguistic features that are approximately independent of each other. To find these features, the adjusted Rand index R' is used to estimate the degree of pairwise relationship among 130 linguistic features in a large published database. Many of the R' values prove to be near 0, as predicted for independent features, and a subset of 47 features is found with an average R' of -0.0001. These 47 features are recommended for use in statistical tests that require independent units of analysis.
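The pairwise measure described in the abstract can be sketched in a few lines. This is a minimal Python illustration, not the paper's code: it assumes each feature is stored as a list of category values aligned by language, with None for languages that are not coded, and it drops any language missing either value before comparing two features. It computes R' from the contingency table of two features using the standard Hubert and Arabie form of the adjusted Rand index, then averages R' over all pairs in a candidate subset. The function names, data layout, and missing-value handling are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(x, y):
    """Adjusted Rand index R' between two categorical features.

    x and y give each language's value for the two features, in the same
    language order; None marks a missing value. Assumption: languages
    missing either value are dropped before comparison.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two languages coded for both features")

    # Contingency table of value combinations and its two margins.
    cells = Counter(pairs)
    rows = Counter(a for a, _ in pairs)
    cols = Counter(b for _, b in pairs)

    # Counts of language pairs that share a value (Hubert and Arabie form).
    index = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)
    maximum = (sum_rows + sum_cols) / 2

    if maximum == expected:
        # Only occurs for identical trivial partitions (both features
        # constant, or both all-distinct); treated as perfect agreement.
        return 1.0
    return (index - expected) / (maximum - expected)


def mean_pairwise_ari(features, subset):
    """Average R' over every pair of features named in subset (needs >= 2)."""
    values = [adjusted_rand_index(features[f], features[g])
              for f, g in combinations(subset, 2)]
    return sum(values) / len(values)


# Hypothetical toy data: three features coded for four languages.
features = {
    "A": ["x", "x", "y", "y"],
    "B": ["p", "p", "q", "q"],
    "C": ["u", "v", "u", "v"],
}
print(adjusted_rand_index(features["A"], features["B"]))
print(mean_pairwise_ari(features, ["A", "B", "C"]))
```

R' equals 1 when two features classify the languages identically and has expected value 0 when the values of the two features are paired at random with their category counts held fixed; it can also be negative. A subset whose mean pairwise R' is close to 0, as reported for the 47 features, is therefore consistent with approximate independence among its members.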
