TL;DR
This paper introduces FASTSUBS, a search algorithm that is guaranteed to find the K most likely lexical substitutes for a word in a sentence under an n-gram language model, at a cost sub-linear in both K and the vocabulary size V.
Contribution
The paper presents a novel search algorithm, FASTSUBS, that is guaranteed to find the top K lexical substitutes for a given word in a sentence, based on an n-gram language model.
Findings
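The computation is sub-linear in both K and the vocabulary size V.
An implementation and a dataset with the top 100 substitutes of each token in the WSJ section of the Penn Treebank are publicly available.
The reduced cost is aimed at making large-scale lexical substitution experiments practical.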
Abstract
Lexical substitutes have found use in areas such as paraphrasing, text simplification, machine translation, word sense disambiguation, and part-of-speech induction. However, the computational complexity of accurately identifying the most likely substitutes for a word has made large-scale experiments difficult. In this paper I introduce a new search algorithm, FASTSUBS, that is guaranteed to find the K most likely lexical substitutes for a given word in a sentence based on an n-gram language model. The computation is sub-linear in both K and the vocabulary size V. An implementation of the algorithm and a dataset with the top 100 substitutes of each token in the WSJ section of the Penn Treebank are available at http://goo.gl/jzKH0.
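To make the cost concrete, the following is a minimal sketch of the brute-force baseline that an algorithm like FASTSUBS is meant to improve on. It is not the FASTSUBS algorithm itself. It scores every vocabulary word in the target slot and keeps the K best, so its cost grows linearly with V. The toy bigram model with add-one smoothing, the tiny corpus, and all function names are illustrative assumptions, not taken from the paper.

```python
import heapq
import math
from collections import Counter

BOUNDARIES = ("<s>", "</s>")  # hypothetical sentence boundary markers


def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(word, prev, unigrams, bigrams):
    """Add-one smoothed log P(word | prev)."""
    vocab_size = len(unigrams)
    return math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))


def top_k_substitutes(sentence, position, k, unigrams, bigrams):
    """Brute force: score all V candidates for one slot, keep the k best."""
    tokens = ["<s>"] + sentence + ["</s>"]
    i = position + 1  # shift for the leading <s>
    left, right = tokens[i - 1], tokens[i + 1]
    candidates = [w for w in unigrams if w not in BOUNDARIES]
    # Under a bigram model, only the two bigrams touching the slot depend on w.
    scored = (
        (log_prob(w, left, unigrams, bigrams) + log_prob(right, w, unigrams, bigrams), w)
        for w in candidates
    )
    return [w for _, w in heapq.nlargest(k, scored)]


corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "cat", "ran"],
    ["a", "dog", "ran"],
]
unigrams, bigrams = train_bigram(corpus)
# Substitutes for the middle word of "the bird sat"
result = top_k_substitutes(["the", "bird", "sat"], 1, 2, unigrams, bigrams)
print(result)  # ['cat', 'dog']
```

Every call to `top_k_substitutes` touches all V candidates, which is the per-token cost that becomes prohibitive when substitutes are needed for every token of a large corpus.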
