Semantic categories of artifacts and animals reflect efficient coding
Noga Zaslavsky, Terry Regier, Naftali Tishby, Charles Kemp

TL;DR
This paper shows that the Information Bottleneck (IB) principle, previously shown to account for named color categories, also accounts for semantic categories for artifacts (containers) and animals across languages, supporting the idea that semantic systems reflect pressure for efficient communication.
Contribution
It extends the IB-based account of semantic categories from color to containers and animals, showing its broad applicability across domains and languages.
Findings
Container naming in Dutch and French is near-optimal in the IB sense.
IB accounts for soft categories and inconsistent naming patterns.
A hierarchy of animal categories derived from IB captures cross-linguistic tendencies.
Abstract
It has been argued that semantic categories across languages reflect pressure for efficient communication. Recently, this idea has been cast in terms of a general information-theoretic principle of efficiency, the Information Bottleneck (IB) principle, and it has been shown that this principle accounts for the emergence and evolution of named color categories across languages, including soft structure and patterns of inconsistent naming. However, it is not yet clear to what extent this account generalizes to semantic domains other than color. Here we show that it generalizes to two qualitatively different semantic domains: names for containers, and for animals. First, we show that container naming in Dutch and French is near-optimal in the IB sense, and that IB broadly accounts for soft categories and inconsistent naming patterns in both languages. Second, we show that a hierarchy of…
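The phrase "near-optimal in the IB sense" refers to a trade-off between the complexity of a naming system and its communicative accuracy. A minimal sketch follows, assuming the IB formulation from the authors' earlier color-naming work: a speaker's meaning m is a probability distribution over referents u, a naming system is an encoder q(w|m), complexity is I(M;W), accuracy is I(W;U), and the IB objective is F_β = I(M;W) − β·I(W;U). The function names, the toy meanings, and the two example encoders are hypothetical illustrations, not the paper's container or animal data.

```python
import numpy as np


def mutual_information(joint):
    """I(X;Y) in bits from a joint distribution p(x, y) given as a 2-D array."""
    joint = joint / joint.sum()
    px = joint.sum(axis=1, keepdims=True)  # marginal p(x), shape (X, 1)
    py = joint.sum(axis=0, keepdims=True)  # marginal p(y), shape (1, Y)
    mask = joint > 0  # skip zero cells, whose contribution is 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (px @ py)[mask])))


def ib_tradeoff(p_m, m_u, q_w_m, beta):
    """Return (complexity I(M;W), accuracy I(W;U), IB objective F_beta).

    p_m:   prior over meanings, shape (M,)
    m_u:   each meaning as a distribution over referents u, shape (M, U)
    q_w_m: naming encoder q(w|m), shape (M, W)
    """
    p_mw = p_m[:, None] * q_w_m  # joint p(m, w)
    p_wu = p_mw.T @ m_u  # joint p(w, u), assuming U depends on W only through M
    complexity = mutual_information(p_mw)
    accuracy = mutual_information(p_wu)
    return complexity, accuracy, complexity - beta * accuracy


# Hypothetical toy domain: 4 meanings, each peaked on one of 4 referents.
p_m = np.full(4, 0.25)
m_u = np.full((4, 4), 0.1) + np.eye(4) * 0.6  # rows sum to 1

# Hypothetical naming systems: 2 words (merging meanings) vs. 4 words (one per meaning).
two_words = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
four_words = np.eye(4)

c2, a2, f2 = ib_tradeoff(p_m, m_u, two_words, beta=1.5)
c4, a4, f4 = ib_tradeoff(p_m, m_u, four_words, beta=1.5)
print(f"2 words: complexity={c2:.3f} bits, accuracy={a2:.3f} bits")
print(f"4 words: complexity={c4:.3f} bits, accuracy={a4:.3f} bits")
```

The finer-grained system buys more accuracy at the cost of more complexity. Judging how close a real language is to optimal would additionally require the IB-optimal frontier (for example, from the iterative IB algorithm), which this sketch does not compute.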
Taxonomy
Topics: Language and cultural evolution · Categorization, perception, and language · Machine Learning in Bioinformatics
