Modeling Color Terminology Across Thousands of Languages
Arya D. McCarthy, Winston Wu, Aaron Mueller, Bill Watson, David Yarowsky

TL;DR
This study uses computational linguistic measures on cross-linguistic data to evaluate and refine the Berlin and Kay color term hypotheses, supporting their universality and suggesting a spectrum rather than a strict dichotomy.
Contribution
Introduces 14 empirical metrics to analyze color terminology across languages, validating and extending Berlin and Kay's hypotheses with quantitative evidence.
Findings
The aggregated metrics correlate strongly (gamma=0.96) with Berlin and Kay's basic/secondary color term partition
Empirical support for the universal acquisition sequence of color terms
Suggests viewing color term categorization as a spectrum rather than a binary division
Abstract
There is an extensive history of scholarship into what constitutes a "basic" color term, as well as a broadly attested acquisition sequence of basic color terms across many languages, as articulated in the seminal work of Berlin and Kay (1969). This paper employs a set of diverse measures on massively cross-linguistic data to operationalize and critique the Berlin and Kay color term hypotheses. Collectively, the 14 empirically grounded computational linguistic metrics we design, as well as their aggregation, correlate strongly with both the Berlin and Kay basic/secondary color term partition (gamma=0.96) and their hypothesized universal acquisition sequence. The measures and results provide further empirical evidence from computational linguistics in support of their claims, as well as additional nuance: they suggest treating the partition as a spectrum instead of a dichotomy.
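To make the reported correlation concrete, here is a minimal sketch of how a gamma statistic can be computed between a binary basic/secondary label and a continuous score. It assumes that "gamma" refers to Goodman and Kruskal's gamma rank correlation, which this summary does not state explicitly. The function name, the labels, and the scores are hypothetical illustrations, not the paper's metrics or data.

```python
from itertools import combinations


def goodman_kruskal_gamma(x, y):
    """Goodman and Kruskal's gamma: (C - D) / (C + D), where C and D count
    concordant and discordant pairs. Pairs tied on x or y are ignored."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical toy data (NOT from the paper):
# 1 = treated as a basic color term, 0 = treated as a secondary term.
is_basic = [1, 1, 1, 0, 0, 0]
# Hypothetical aggregated "basicness" score for the same six terms.
score = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]

gamma = goodman_kruskal_gamma(is_basic, score)
print(f"gamma = {gamma:.3f}")
```

On this toy input, eight of the nine basic/secondary pairs are ordered consistently with the label and one is not, so gamma is 7/9, about 0.778. A value near 1, such as the reported 0.96, would mean that a continuous score ranks almost every basic term above almost every secondary term. That is also why such a score lends itself to being read as a spectrum rather than a hard split.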
Taxonomy
Topics: Categorization, perception, and language · Language, Metaphor, and Cognition · Multisensory perception and integration
