# Word learning as category formation

**Authors:** Spencer Caplan

PMC · DOI: 10.1371/journal.pone.0327615 · PLOS One · 2025-07-03

## TL;DR

This paper explores how children learn word meanings by generalizing from examples, and proposes a new computational model that explains this process through local learning rather than global statistical inference.

## Contribution

The paper introduces the Naïve Generalization Model (NGM), a unified computational model of word learning based on local, incremental category formation.

## Key findings

- The number of training objects and their presentation timing independently influence word generalization.
- The NGM successfully accounts for multiple phenomena in word learning, including the suspicious coincidence effect.
- Rational learning behavior may emerge from local, mechanistic processes rather than global optimization.

## Abstract

A fundamental question in word learning is how children, given only evidence about which objects a word has previously referred to, are able to generalize to the correct class. How does a learner end up knowing that “poodle” picks out only a specific subset of dogs rather than the broader class, and vice versa? Numerous phenomena have been identified as guiding learner behavior, such as the “suspicious coincidence effect” (SCE): an increase in the sample size of training objects facilitates narrower (subordinate) word meanings. While the SCE seems to support a class of models based on statistical inference, such rational behavior is, in fact, consistent with a range of algorithmic processes. Notably, the breadth of semantic generalizations is further affected by the temporal manner in which objects are presented: either simultaneously or sequentially. First, I evaluate the experimental evidence on the factors influencing generalization in word learning. A reanalysis of existing data demonstrates that both the number of training objects and their presentation timing independently affect learning. This independent effect has been obscured by the prior literature’s focus on possible interactions between the two. Second, I present a computational model for learning that accounts for both sets of phenomena in a unified way. The Naïve Generalization Model (NGM) offers an explanation of word learning phenomena grounded in category formation. Under the NGM, learning is local and incremental, without the need to perform a global optimization over pre-specified hypotheses. This computational model is tested against human behavior in seven different experimental conditions for word learning, varying in presentation timing, number, and hierarchical relation between training items. Considering both qualitative parameter-independent behavior and quantitative parameter-tuned output, these results support the NGM and suggest that rational learning behavior may arise from local, mechanistic processes rather than global statistical inference.
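For readers unfamiliar with the statistical-inference accounts that the abstract contrasts with the NGM, the sketch below illustrates how a sample-size effect like the SCE can fall out of global Bayesian inference. It is not the NGM and is not taken from the paper: it assumes a "size principle" likelihood, in which each example is drawn uniformly from the true category, and a uniform prior. The category sizes and the names `HYPOTHESIS_SIZES` and `posterior` are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical category sizes: how many objects each candidate meaning covers.
# Illustrative assumptions only; these are not values from the paper.
HYPOTHESIS_SIZES = {"subordinate": 5, "basic": 20, "superordinate": 100}


def posterior(n_examples, sizes=HYPOTHESIS_SIZES):
    """Posterior over candidate meanings after n examples that all fall
    inside the subordinate category, assuming a uniform prior and a
    likelihood of (1 / category size) per example."""
    likelihoods = {
        name: Fraction(1, size) ** n_examples for name, size in sizes.items()
    }
    total = sum(likelihoods.values())
    return {name: lik / total for name, lik in likelihoods.items()}


if __name__ == "__main__":
    for n in (1, 3):
        probs = posterior(n)
        summary = ", ".join(f"{name}={float(p):.3f}" for name, p in probs.items())
        print(f"{n} example(s): {summary}")
```

Under these assumptions, three consistent examples concentrate more posterior mass on the narrowest category than one example does. The abstract's argument is that this same qualitative pattern is also compatible with local, incremental processes, so observing it does not by itself favor a global-inference model.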

## Full-text entities

- **Species:** Canis lupus familiaris (dog, subspecies) [taxon 9615], Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12225872/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12225872/full.md

## References

94 references — full list in the complete paper: https://tomesphere.com/paper/PMC12225872/full.md

---
Source: https://tomesphere.com/paper/PMC12225872