Novel Aficionados and Doppelgängers: a referential task for semantic representations of individual entities
Andrea Bruera, Aurélie Herbelot

TL;DR
This paper investigates why proper names are harder to learn and retrieve than common nouns by analyzing their linguistic distributions with a new referential task, a new dataset, and an extensive set of models. It finds that the distributional representations of individual entities are less distinguishable from one another than those of common nouns.
Contribution
It introduces the Doppelgänger test and the Novel Aficionados dataset to analyze the semantic distinction between proper names and common nouns in distributional models.
Findings
Distributional representations of individual entities are less distinguishable than those of common nouns.
The linguistic distribution of proper names reflects their cognitive difficulty in learning and retrieval.
The results mirror human semantic cognition patterns.
Abstract
In human semantic cognition, proper names (names which refer to individual entities) are harder to learn and retrieve than common nouns. This seems to be the case for machine learning algorithms too, but the linguistic and distributional reasons for this behaviour have not been investigated in depth so far. To tackle this issue, we show that the semantic distinction between proper names and common nouns is reflected in their linguistic distributions by employing an original task for distributional semantics, the Doppelgänger test, an extensive set of models, and a new dataset, the Novel Aficionados dataset. The results indicate that the distributional representations of different individual entities are less clearly distinguishable from each other than those of common nouns, an outcome which intriguingly mirrors human cognition.
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Language and cultural evolution
