Approximately Independent Features of Languages

Eric W. Holman

arXiv:0709.4536·physics.soc-ph·November 13, 2009

TL;DR

This paper uses the adjusted Rand index R' to identify a set of 47 approximately independent linguistic features, intended for statistical tests of language-evolution models that require independent units of analysis.

Contribution

It estimates the pairwise relationship among 130 linguistic features in a large published database with the adjusted Rand index R', and selects a nearly independent subset for statistical analysis.

Findings

01

A subset of 47 features with an average R' of -0.0001 is identified as approximately independent.

02

Many of the pairwise R' values among the 130 features are near 0, as predicted for independent features.

03

The 47 selected features are recommended for statistical tests that require independent units of analysis.

Abstract

To facilitate the testing of models for the evolution of languages, the present note offers a set of linguistic features that are approximately independent of each other. To find these features, the adjusted Rand index R' is used to estimate the degree of pairwise relationship among 130 linguistic features in a large published database. Many of the R' values prove to be near 0, as predicted for independent features, and a subset of 47 features is found with an average R' of -0.0001. These 47 features are recommended for use in statistical tests that require independent units of analysis.
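
To make the quantity R' concrete, here is a minimal Python sketch of the standard adjusted Rand index (in the Hubert and Arabie form) computed for one pair of features. It is an illustration, not the paper's own code. The function names, the use of None for a missing value, and the rule of keeping only languages coded for both features are assumptions made for this sketch.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(x, y):
    """Adjusted Rand index between two equal-length sequences of labels."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("at least two observations are needed")

    total_pairs = comb(n, 2)
    # Pairs of languages that share a value on both features.
    both = sum(comb(c, 2) for c in Counter(zip(x, y)).values())
    # Pairs that share a value on each feature taken separately.
    same_x = sum(comb(c, 2) for c in Counter(x).values())
    same_y = sum(comb(c, 2) for c in Counter(y).values())

    expected = same_x * same_y / total_pairs  # value expected by chance
    maximum = (same_x + same_y) / 2
    if maximum == expected:
        # Degenerate case (e.g. both features constant): the index is
        # undefined. Returning NaN is a choice made for this sketch.
        return float("nan")
    return (both - expected) / (maximum - expected)


def feature_pair_index(f, g):
    """Index for two features, skipping languages missing either value.

    Assumption: a missing value is coded as None, and only languages
    coded for both features are compared.
    """
    kept = [(a, b) for a, b in zip(f, g) if a is not None and b is not None]
    return adjusted_rand_index([a for a, _ in kept], [b for _, b in kept])
```

Under this definition, two features that sort the languages into the same groups give an index of 1, while features that are unrelated give values near 0, and values slightly below 0 can occur by chance. That is why an average R' close to 0 is read as approximate independence.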
