Hidden Structure and Function in the Lexicon
Olivier Picard, Mélanie Lord, Alexandre Blondin-Massé, Odile Marcotte, Marcos Lopes, Stevan Harnad

TL;DR
This paper uses graph theory to identify a small subset of dictionary words, the Kernel (about 10% of the dictionary), from which all other words can be defined, revealing structural insights about lexical organization and grounding.
Contribution
It introduces the concepts of the Kernel, the Core, the Satellites, and Minimal Grounding Sets of a dictionary, providing a graph-theoretic analysis of lexical structure.
Findings
Approximately 10% of dictionary words form a Kernel that defines all others.
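The Kernel consists of one large strongly connected component, the Core (about half the Kernel), surrounded by many small strongly connected components, the Satellites.
Words in Minimal Grounding Sets are learned earlier and are more concrete and more frequent than the rest of the dictionary.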
Abstract
How many words are needed to define all the words in a dictionary? Graph-theoretic analysis reveals that about 10% of a dictionary is a unique Kernel of words that define one another and all the rest, but this is not the smallest such subset. The Kernel consists of one huge strongly connected component (SCC), about half its size, the Core, surrounded by many small SCCs, the Satellites. Core words can define one another but not the rest of the dictionary. The Kernel also contains many overlapping Minimal Grounding Sets (MGSs), each about the same size as the Core, each part-Core, part-Satellite. MGS words can define all the rest of the dictionary. They are learned earlier, more concrete and more frequent than the rest of the dictionary. Satellite words, not correlated with age or frequency, are less concrete (more abstract) words that are also needed for full lexical power.
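The decomposition described in the abstract can be illustrated on a toy dictionary. The sketch below is not the authors' code: it assumes the Kernel can be obtained by repeatedly discarding words that no remaining word uses in its definition, and it takes the Core to be the largest strongly connected component of what is left. The names `TOY`, `kernel`, and `strongly_connected_components`, and the seven-word dictionary itself, are hypothetical and chosen only for illustration. Minimal Grounding Sets are not computed here.

```python
from typing import Dict, List, Set

# Hypothetical toy dictionary: each word maps to the words used to define it.
TOY: Dict[str, Set[str]] = {
    "good": {"not", "bad"},
    "bad": {"not", "good"},
    "not": {"bad"},
    "light": {"not", "dark"},
    "dark": {"not", "light"},
    "banana": {"good", "light"},
    "snack": {"banana", "good"},
}


def kernel(definitions: Dict[str, Set[str]]) -> Set[str]:
    """Repeatedly drop words that no remaining word uses in its definition.

    Assumption: this pruning is one way to reach the Kernel described
    in the abstract; it is a sketch, not the paper's implementation.
    """
    remaining = set(definitions)
    while True:
        used: Set[str] = set()
        for word in remaining:
            used |= definitions[word] & remaining
        unused = remaining - used
        if not unused:
            return remaining
        remaining -= unused


def strongly_connected_components(
    words: Set[str], definitions: Dict[str, Set[str]]
) -> List[Set[str]]:
    """Kosaraju's algorithm on the definition graph restricted to `words`."""
    graph = {w: definitions[w] & words for w in words}
    reverse: Dict[str, Set[str]] = {w: set() for w in words}
    for u, targets in graph.items():
        for v in targets:
            reverse[v].add(u)

    order: List[str] = []
    seen: Set[str] = set()

    def visit(u: str) -> None:
        # First pass: record words in order of DFS completion.
        seen.add(u)
        for v in graph[u]:
            if v not in seen:
                visit(v)
        order.append(u)

    for w in sorted(words):
        if w not in seen:
            visit(w)

    components: List[Set[str]] = []
    assigned: Set[str] = set()

    def collect(u: str, component: Set[str]) -> None:
        # Second pass: gather one component on the reversed graph.
        assigned.add(u)
        component.add(u)
        for v in reverse[u]:
            if v not in assigned:
                collect(v, component)

    for w in reversed(order):
        if w not in assigned:
            component: Set[str] = set()
            collect(w, component)
            components.append(component)
    return components


K = kernel(TOY)
components = strongly_connected_components(K, TOY)
core = max(components, key=len)            # largest SCC, taken as the Core
satellites = [c for c in components if c is not core]
print("Kernel:", sorted(K))
print("Core:", sorted(core))
print("Satellites:", [sorted(s) for s in satellites])
```

In this toy case "snack" and "banana" are pruned because nothing else is defined in terms of them, and the remaining five words split into a three-word Core and one two-word Satellite. The recursive depth-first search is adequate for small examples; a full-size dictionary would likely need an iterative version.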
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Text Analysis Techniques
