Identifying missing dictionary entries with frequency-conserving context models
Jake Ryland Williams, Eric M. Clark, James P. Bagrow, Christopher M. Danforth, and Peter Sheridan Dodds

TL;DR
This paper introduces a frequency-conserving context model that identifies missing dictionary entries from phrase data, with the aim of expanding lexical coverage.
Contribution
The work presents a novel frequency-conserving phrase model applied to Wiktionary data, enabling effective detection of missing lexical entries for dictionary expansion.
Findings
Successfully identified numerous candidate missing entries
Developed a new lexical extraction technique
Enhanced dictionary coverage with minimal false positives
Abstract
In an effort to better understand meaning from natural language texts, we explore methods aimed at organizing lexical objects into contexts. A number of these methods for organization fall into a family defined by word ordering. Unlike demographic or spatial partitions of data, these collocation models are of special importance for their universal applicability. While we are interested here in text and have framed our treatment appropriately, our work is potentially applicable to other areas of research (e.g., speech, genomics, and mobility patterns) where one has ordered categorical data (e.g., sounds, genes, and locations). Our approach focuses on the phrase (whether word or larger) as the primary meaning-bearing lexical unit and object of study. To do so, we employ our previously developed framework for generating word-conserving phrase-frequency data. Upon training our model with…
