# Categorization in the Wild: Generalizing Cognitive Models to   Naturalistic Data across Languages

**Authors:** Lea Frermann, Mirella Lapata

arXiv: 1902.08830 · 2019-02-26

## TL;DR

This paper introduces a Bayesian cognitive model that learns and generalizes categories from natural language data across multiple languages, demonstrating the emergence of meaningful, structured categories in diverse cultural contexts.

## Contribution

It presents a novel large-scale, cross-linguistic approach to modeling human categorization using naturalistic data, extending cognitive models beyond small experimental datasets.

## Key findings

- Meaningful categories emerge across languages with structured features
- The model performs well on large, noisy datasets
- Cross-cultural categorization patterns are identified

## Abstract

Categories such as animal or furniture are acquired at an early age and play an important role in processing, organizing, and communicating world knowledge. Categories exist across cultures: they allow to efficiently represent the complexity of the world, and members of a community strongly agree on their nature, revealing a shared mental representation. Models of category learning and representation, however, are typically tested on data from small-scale experiments involving small sets of concepts with artificially restricted features; and experiments predominantly involve participants of selected cultural and socio-economical groups (very often involving western native speakers of English such as U.S. college students) . This work investigates whether models of categorization generalize (a) to rich and noisy data approximating the environment humans live in; and (b) across languages and cultures. We present a Bayesian cognitive model designed to jointly learn categories and their structured representation from natural language text which allows us to (a) evaluate performance on a large scale, and (b) apply our model to a diverse set of languages. We show that meaningful categories comprising hundreds of concepts and richly structured featural representations emerge across languages. Our work illustrates the potential of recent advances in computational modeling and large scale naturalistic datasets for cognitive science research.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.08830/full.md

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/1902.08830/full.md

## References

77 references — full list in the complete paper: https://tomesphere.com/paper/1902.08830/full.md

---
Source: https://tomesphere.com/paper/1902.08830