The Latent Structure of Dictionaries
Philippe Vincent-Lamarre, Alexandre Blondin Massé, Marcos Lopes, Mélanie Lord, Odile Marcotte, Stevan Harnad

TL;DR
This paper analyzes dictionaries as directed graphs, revealing a small Kernel of words from which all the other words can be defined, and explores the implications for language learning and cognitive models.
Contribution
It introduces a graph-theoretic analysis of dictionaries that identifies their Kernel, Core, Satellites, and minimal defining sets, with implications for lexical structure and language acquisition.
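To make "defining set" concrete, here is a minimal Python sketch. It assumes a dictionary given as a hypothetical mapping from each word to the list of words in its definition, and it assumes a word can be learned once every word in its definition is already known. `is_defining_set`, `smallest_defining_set`, and the toy data are illustrative names, not the authors' code. For a dictionary in which every defining word is itself defined, a set passes this check exactly when removing it leaves no definitional cycle (a feedback vertex set); finding the smallest such set is NP-hard in general, so the exhaustive search here only suits toy inputs.

```python
from collections.abc import Collection, Iterable, Mapping
from itertools import combinations


def is_defining_set(definitions: Mapping[str, Collection[str]], known: Iterable[str]) -> bool:
    """Return True if every word can be learned from `known` through definitions alone.

    Assumption (hypothetical model): a word is learned once every word in its
    definition has already been learned.
    """
    learned = set(known)
    pending = {w: set(d) for w, d in definitions.items() if w not in learned}
    progress = True
    while pending and progress:
        progress = False
        for word in list(pending):
            if pending[word] <= learned:  # all defining words already known
                learned.add(word)
                del pending[word]
                progress = True
    return not pending


def smallest_defining_set(definitions: Mapping[str, Collection[str]]) -> set[str]:
    """Exhaustive search by increasing size; exponential, for toy dictionaries only."""
    words = sorted(set(definitions) | {w for d in definitions.values() for w in d})
    for size in range(len(words) + 1):
        for candidate in combinations(words, size):
            if is_defining_set(definitions, candidate):
                return set(candidate)
    return set(words)  # not reached: knowing every word trivially defines everything


# Hypothetical toy dictionary: word -> words used in its definition.
toy = {
    "good": ["not", "bad"],
    "bad": ["not", "good"],
    "not": ["no"],
    "no": ["not"],
    "kind": ["good"],
    "cruel": ["not", "kind"],
}
print(is_defining_set(toy, {"not", "good"}))  # True
print(is_defining_set(toy, {"not"}))          # False: "good" and "bad" define each other
print(len(smallest_defining_set(toy)))        # 2
```

No single word suffices here because the toy dictionary has two separate definitional cycles (good/bad and not/no), so any defining set needs at least one word from each.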
Findings
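Approximately 10% of a dictionary's words form its Kernel, which is what remains after recursively removing words that define no further words.
The Core, about 75% of the Kernel, is a strongly connected subset: there is a definitional path to and from every pair of its words.
The minimal set of defining words (MinSet) is about 1% of the dictionary.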
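The Kernel reduction (recursively removing words that are defined but no longer help define any remaining word) can be sketched in Python as below. The `kernel` function, the dict-of-lists input format, and the toy dictionary are assumptions for illustration, not the authors' implementation.

```python
from collections.abc import Mapping, Sequence


def kernel(definitions: Mapping[str, Sequence[str]]) -> set[str]:
    """Repeatedly delete words that no remaining word uses in its definition."""
    remaining = set(definitions) | {w for d in definitions.values() for w in d}
    # users[w] = remaining words whose definitions contain w (w's outgoing edges).
    users: dict[str, set[str]] = {w: set() for w in remaining}
    for word, defn in definitions.items():
        for d in defn:
            users[d].add(word)
    frontier = [w for w in remaining if not users[w]]
    while frontier:
        word = frontier.pop()
        if word not in remaining:  # may be queued twice if a definition repeats a word
            continue
        remaining.remove(word)
        # Removing `word` may leave some of its defining words with no users.
        for d in definitions.get(word, ()):
            users[d].discard(word)
            if d in remaining and not users[d]:
                frontier.append(d)
    return remaining


# Hypothetical toy dictionary: word -> words used in its definition.
toy = {
    "good": ["not", "bad"],
    "bad": ["not", "good"],
    "not": ["no"],
    "no": ["not"],
    "kind": ["good"],
    "cruel": ["not", "kind"],
}
print(sorted(kernel(toy)))  # ['bad', 'good', 'no', 'not']
```

Here "cruel" is removed first because nothing uses it; that leaves "kind" unused, so it goes next, and the four words that still define one another form the toy Kernel.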
Abstract
How many words (and which ones) are sufficient to define all other words? When dictionaries are analyzed as directed graphs with links from defining words to defined words, they reveal a latent structure. Recursively removing all words that are reachable by definition but that do not define any further words reduces the dictionary to a Kernel of about 10%. This is still not the smallest number of words that can define all the rest. About 75% of the Kernel turns out to be its Core, a Strongly Connected Subset of words with a definitional path to and from any pair of its words and no word's definition depending on a word outside the set. But the Core cannot define all the rest of the dictionary. The 25% of the Kernel surrounding the Core consists of small strongly connected subsets of words: the Satellites. The size of the smallest set of words that can define all the rest (the graph's…
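The Core and Satellites are described in terms of strongly connected components (SCCs). The Python sketch below finds the SCCs of a given set of Kernel words with Kosaraju's algorithm, then takes as the Core the largest component whose definitions use no Kernel word outside it, following the abstract's wording; every other component is treated as a Satellite. The function names, the toy data, and this exact selection rule are assumptions, and the paper's own procedure may differ.

```python
from collections.abc import Mapping, Sequence


def strongly_connected_components(nodes: Sequence[str], succ: Mapping[str, list[str]]) -> list[set[str]]:
    """Kosaraju's algorithm, written iteratively to avoid deep recursion."""
    # Pass 1: depth-first search on the graph, recording nodes in finishing order.
    order: list[str] = []
    visited: set[str] = set()
    for root in nodes:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(succ[root]))]
        while stack:
            node, children = stack[-1]
            child = next((c for c in children if c not in visited), None)
            if child is None:
                stack.pop()
                order.append(node)
            else:
                visited.add(child)
                stack.append((child, iter(succ[child])))
    # Pass 2: search the reversed graph in decreasing finishing order;
    # each search tree is one strongly connected component.
    pred: dict[str, list[str]] = {v: [] for v in nodes}
    for u in nodes:
        for v in succ[u]:
            pred[v].append(u)
    components: list[set[str]] = []
    assigned: set[str] = set()
    for root in reversed(order):
        if root in assigned:
            continue
        component = {root}
        assigned.add(root)
        todo = [root]
        while todo:
            u = todo.pop()
            for v in pred[u]:
                if v not in assigned:
                    assigned.add(v)
                    component.add(v)
                    todo.append(v)
        components.append(component)
    return components


def core_and_satellites(definitions: Mapping[str, Sequence[str]],
                        kernel_words: set[str]) -> tuple[set[str], list[set[str]]]:
    nodes = sorted(kernel_words)
    # Edge u -> v when u occurs in the definition of v (defining word -> defined word).
    succ: dict[str, list[str]] = {w: [] for w in nodes}
    for v in nodes:
        for u in definitions.get(v, ()):
            if u in kernel_words:
                succ[u].append(v)
    components = strongly_connected_components(nodes, succ)
    # "Closed" components: no definition inside them uses a Kernel word outside them.
    closed = [c for c in components
              if all(u in c for v in c for u in definitions.get(v, ()) if u in kernel_words)]
    core = max(closed, key=len)  # assumption: the largest closed component is the Core
    satellites = [c for c in components if c is not core]
    return core, satellites


# Hypothetical toy dictionary; its Kernel is {"good", "bad", "not", "no"}.
toy = {
    "good": ["not", "bad"],
    "bad": ["not", "good"],
    "not": ["no"],
    "no": ["not"],
    "kind": ["good"],
    "cruel": ["not", "kind"],
}
core, satellites = core_and_satellites(toy, {"good", "bad", "not", "no"})
print(core)        # {"no", "not"} (set print order may vary)
print(satellites)  # one Satellite: {"good", "bad"}
```

The traversal is iterative because real dictionaries have tens of thousands of words, and a recursive SCC implementation can exceed Python's default recursion limit on graphs of that size.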
Taxonomy
Topics: Lexicography and Language Studies · Natural Language Processing Techniques · Advanced Text Analysis Techniques
