Identifying missing dictionary entries with frequency-conserving context models

Jake Ryland Williams, Eric M. Clark, James P. Bagrow, Christopher M. Danforth, and Peter Sheridan Dodds

arXiv:1503.02120·cs.CL·July 30, 2015

TL;DR

This paper introduces a frequency-conserving context model that uses phrase data to identify missing dictionary entries, with the aim of improving lexical coverage.

Contribution

The work presents a novel frequency-conserving phrase model applied to Wiktionary data, enabling effective detection of missing lexical entries for dictionary expansion.

Findings

01. Successfully identified numerous candidate missing entries

02. Developed a new lexical extraction technique

03. Enhanced dictionary coverage with minimal false positives

Abstract

In an effort to better understand meaning from natural language texts, we explore methods aimed at organizing lexical objects into contexts. A number of these methods for organization fall into a family defined by word ordering. Unlike demographic or spatial partitions of data, these collocation models are of special importance for their universal applicability. While we are interested here in text and have framed our treatment appropriately, our work is potentially applicable to other areas of research (e.g., speech, genomics, and mobility patterns) where one has ordered categorical data (e.g., sounds, genes, and locations). Our approach focuses on the phrase (whether word or larger) as the primary meaning-bearing lexical unit and object of study. To do so, we employ our previously developed framework for generating word-conserving phrase-frequency data. Upon training our model with…
