FASTSUBS: An Efficient and Exact Procedure for Finding the Most Likely Lexical Substitutes Based on an N-gram Language Model

Deniz Yuret

arXiv:1205.5407·cs.CL·September 4, 2012

TL;DR

This paper introduces FASTSUBS, an exact and efficient search algorithm that finds the K most likely lexical substitutes for a word in a sentence under an n-gram language model, with computation that is sub-linear in both K and the vocabulary size V.

Contribution

The paper presents a new search algorithm, FASTSUBS, that is guaranteed to find the K most likely lexical substitutes for a given word in a sentence based on an n-gram language model.

Findings

01

The computation is sub-linear in both the number of substitutes K and the vocabulary size V.

02

An implementation and a dataset with the top 100 substitutes of each token in the WSJ section of the Penn Treebank are publicly available.

03

Addresses the computational cost that has made large-scale lexical substitution experiments difficult.

Abstract

Lexical substitutes have found use in areas such as paraphrasing, text simplification, machine translation, word sense disambiguation, and part-of-speech induction. However, the computational complexity of accurately identifying the most likely substitutes for a word has made large-scale experiments difficult. In this paper I introduce a new search algorithm, FASTSUBS, that is guaranteed to find the K most likely lexical substitutes for a given word in a sentence based on an n-gram language model. The computation is sub-linear in both K and the vocabulary size V. An implementation of the algorithm and a dataset with the top 100 substitutes of each token in the WSJ section of the Penn Treebank are available at http://goo.gl/jzKH0.
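
To make the task concrete, the sketch below shows the naive baseline that an exact top-K search has to improve on: score every vocabulary word in the target position and keep the K best. This is not the FASTSUBS algorithm. It is a brute-force illustration whose cost is linear in V, and it assumes a toy bigram model with add-one smoothing. The function names (`train_bigram`, `log_prob`, `top_k_substitutes`) and the tiny corpus are hypothetical and do not come from the paper or its released code.

```python
import heapq
import math
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences (toy model)."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def log_prob(word, prev, unigrams, bigrams):
    """Add-one smoothed log P(word | prev); the smoothing is an assumption."""
    vocab_size = len(unigrams)
    return math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))

def top_k_substitutes(sentence, position, k, unigrams, bigrams):
    """Brute force: score every vocabulary word at `position`, keep the top k.

    With a bigram model, only two factors depend on the substitute w:
    P(w | left neighbor) and P(right neighbor | w).
    """
    tokens = ["<s>"] + sentence + ["</s>"]
    i = position + 1  # shift by one for the <s> marker
    left, right = tokens[i - 1], tokens[i + 1]
    candidates = [w for w in unigrams if w not in ("<s>", "</s>")]

    def score(w):
        return (log_prob(w, left, unigrams, bigrams)
                + log_prob(right, w, unigrams, bigrams))

    # This loop touches all V candidates, the cost an exact sub-linear
    # search is meant to avoid.
    return heapq.nlargest(k, candidates, key=score)

# Hypothetical toy corpus, for illustration only.
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
    ["a", "dog", "ran"],
]
unigrams, bigrams = train_bigram(corpus)
best = top_k_substitutes(["the", "bird", "sat"], 1, 2, unigrams, bigrams)
```

Here `best` holds the two highest-scoring replacements for "bird" in "the bird sat" under the toy model. A higher-order model would add more context-dependent factors to `score`, but the brute-force structure and its linear dependence on V would stay the same.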

Code & Models

Repositories

denizyuret/fastsubs-googlecode
