Swords: A Benchmark for Lexical Substitution with Improved Data Coverage and Quality

Mina Lee; Chris Donahue; Robin Jia; Alexander Iyabor; Percy Liang

arXiv:2106.04102·cs.CL·June 15, 2021·1 cites

TL;DR

This paper introduces Swords, a lexical substitution benchmark with higher coverage and quality than earlier benchmarks. The data is collected by framing the task as a classification problem: candidate substitutes come from a context-free thesaurus, and human judgments decide which of them are appropriate in context.

Contribution

A benchmark for lexical substitution that improves data coverage and quality by having humans judge candidate substitutes instead of recalling them from memory.

Findings

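01. 4.1x more substitutes per target word than the previous largest benchmark, at the same level of quality

02. Substitutes are 1.5x more appropriate, based on human judgment, for the same number of substitutes

03. Higher coverage and quality than lexical substitution data collected from human recall alone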

Abstract

We release a new benchmark for lexical substitution, the task of finding appropriate substitutes for a target word in a context. To assist humans with writing, lexical substitution systems can suggest words that humans cannot easily think of. However, existing benchmarks depend on human recall as the only source of data, and therefore lack coverage of the substitutes that would be most helpful to humans. Furthermore, annotators often provide substitutes of low quality, which are not actually appropriate in the given context. We collect higher-coverage and higher-quality data by framing lexical substitution as a classification problem, guided by the intuition that it is easier for humans to judge the appropriateness of candidate substitutes than conjure them from memory. To this end, we use a context-free thesaurus to produce candidates and rely on human judgement to determine contextual…
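The collection procedure described in the abstract (thesaurus candidates first, human judgment second) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy thesaurus, the `Judgment` record, the boolean votes, and the 0.5 acceptance threshold are hypothetical and are not taken from the paper or from the p-lambda/swords repository.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical context-free thesaurus: maps a target word to candidate
# substitutes without looking at any sentence.
TOY_THESAURUS: Dict[str, List[str]] = {
    "bright": ["shiny", "smart", "vivid", "luminous"],
}


@dataclass
class Judgment:
    """Votes from hypothetical annotators on one candidate in one context."""
    candidate: str
    votes: List[bool]  # True means "appropriate in this context"


def generate_candidates(target: str, thesaurus: Dict[str, List[str]]) -> List[str]:
    """Step 1: produce candidates from the thesaurus, ignoring context."""
    return list(thesaurus.get(target, []))


def label_candidates(judgments: List[Judgment], threshold: float = 0.5) -> Dict[str, bool]:
    """Step 2: accept a candidate when the share of positive votes
    reaches the (assumed) threshold."""
    labels = {}
    for j in judgments:
        score = sum(j.votes) / len(j.votes) if j.votes else 0.0
        labels[j.candidate] = score >= threshold
    return labels


context = "She is a bright student."
candidates = generate_candidates("bright", TOY_THESAURUS)

# Made-up votes from three annotators per candidate, for illustration only.
toy_votes = {
    "shiny": [False, False, False],
    "smart": [True, True, True],
    "vivid": [False, False, True],
    "luminous": [False, False, False],
}
judgments = [Judgment(c, toy_votes[c]) for c in candidates]
labels = label_candidates(judgments)
accepted = [c for c, ok in labels.items() if ok]
print(accepted)  # ['smart']
```

The point of the sketch is the division of labor: the thesaurus supplies breadth without regard to context, and annotators only have to judge each candidate in context, which is the classification framing the abstract describes.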

Code & Models

Repositories

p-lambda/swords (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification