Swords: A Benchmark for Lexical Substitution with Improved Data Coverage and Quality
Mina Lee, Chris Donahue, Robin Jia, Alexander Iyabor, Percy Liang

TL;DR
This paper introduces Swords, a lexical substitution benchmark with higher coverage and quality than previous benchmarks. It was built by framing data collection as a classification problem: humans judge whether candidate substitutes are appropriate in context instead of recalling substitutes from memory.
Contribution
The paper presents a new benchmark for lexical substitution that improves data coverage and quality. A context-free thesaurus proposes candidate substitutes, and human annotators classify each candidate as appropriate or not in the given context.
Findings
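4.1x more substitutes per target word than the previous largest benchmark, at the same level of quality
Substitutes are 1.5x more appropriate (based on human judgement) than the previous largest benchmark's, for the same number of substitutes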
Higher coverage and quality in lexical substitution data
Abstract
We release a new benchmark for lexical substitution, the task of finding appropriate substitutes for a target word in a context. To assist humans with writing, lexical substitution systems can suggest words that humans cannot easily think of. However, existing benchmarks depend on human recall as the only source of data, and therefore lack coverage of the substitutes that would be most helpful to humans. Furthermore, annotators often provide substitutes of low quality, which are not actually appropriate in the given context. We collect higher-coverage and higher-quality data by framing lexical substitution as a classification problem, guided by the intuition that it is easier for humans to judge the appropriateness of candidate substitutes than conjure them from memory. To this end, we use a context-free thesaurus to produce candidates and rely on human judgement to determine contextual…
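The abstract's core idea can be sketched in a few lines: a context-free thesaurus proposes candidates, and humans classify each candidate in context. This is a minimal illustration under stated assumptions, not the authors' pipeline. The toy thesaurus, the simulated annotator votes, the helper names, and the majority-vote `threshold` of 0.5 are all hypothetical. The abstract, as excerpted here, does not say how judgments are aggregated.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical toy thesaurus: context-free, so it lists every sense's synonyms.
TOY_THESAURUS: dict[str, list[str]] = {
    "bright": ["brilliant", "shiny", "intelligent", "cheerful", "vivid"],
}


@dataclass
class Instance:
    context: str
    target: str
    # candidate -> binary human judgments (True = appropriate in this context)
    judgments: dict[str, list[bool]] = field(default_factory=dict)


def generate_candidates(target: str, thesaurus: dict[str, list[str]]) -> list[str]:
    """Propose candidates from the thesaurus without looking at the context."""
    return list(thesaurus.get(target, []))


def label_substitutes(instance: Instance, threshold: float = 0.5) -> list[str]:
    """Keep candidates whose share of positive judgments meets the threshold.

    The 0.5 majority threshold is an assumption for illustration only.
    """
    accepted = []
    for cand, votes in instance.judgments.items():
        if votes and sum(votes) / len(votes) >= threshold:
            accepted.append(cand)
    return accepted


# Usage with made-up annotator votes (three hypothetical annotators per candidate).
inst = Instance(context="She is a bright student.", target="bright")
simulated_votes = {
    "brilliant": [True, True, False],
    "shiny": [False, False, False],
    "intelligent": [True, True, True],
    "cheerful": [False, True, False],
    "vivid": [False, False, False],
}
for cand in generate_candidates(inst.target, TOY_THESAURUS):
    inst.judgments[cand] = simulated_votes[cand]

print(label_substitutes(inst))  # ['brilliant', 'intelligent']
```

The design choice the sketch mirrors is the division of labor. Recall is delegated to a resource that over-generates, including senses like "shiny" that do not fit this context. Humans then do only the easier judgment step, which filters those senses out.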
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
