WinoGrande: An Adversarial Winograd Schema Challenge at Scale

Keisuke Sakaguchi; Ronan Le Bras; Chandra Bhagavatula; Yejin Choi

arXiv:1907.10641 · cs.CL · November 25, 2019 · 84 cites

TL;DR

WinoGrande is a large-scale, carefully constructed dataset designed to evaluate and improve the robustness of commonsense reasoning models, revealing that current models still lag behind human performance and highlighting the importance of bias reduction.

Contribution

The paper introduces WinoGrande, a new large-scale dataset with bias reduction techniques, to better assess and advance machine commonsense reasoning capabilities.

Findings

01 State-of-the-art models achieve 59.4-79.1% accuracy on WinoGrande.

02 Models perform significantly below human accuracy of 94.0%.

03 WinoGrande improves transfer learning and exposes overestimations of model capabilities.
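To make the reported numbers concrete, the short Python sketch below shows how accuracy on a two-choice benchmark could be scored and how far the reported model range sits below the human figure. The function names and the "1"/"2" answer labels are assumptions chosen here for illustration; they are not taken from the paper or its released code. Only the three percentages come from the findings above.

```python
# Figures reported in the findings above (percent accuracy).
MODEL_ACCURACY_RANGE = (59.4, 79.1)
HUMAN_ACCURACY = 94.0


def accuracy(predictions, gold):
    """Return the percentage of problems whose predicted option matches gold.

    Both arguments are equal-length sequences of option labels. The labels
    "1" and "2" used in the example are a hypothetical convention.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty set of problems")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


def gap_to_human(model_accuracy, human_accuracy=HUMAN_ACCURACY):
    """Return the shortfall relative to human accuracy, in percentage points."""
    return round(human_accuracy - model_accuracy, 1)


# Shortfall for the weakest and strongest reported models.
gaps = tuple(gap_to_human(a) for a in MODEL_ACCURACY_RANGE)

# Toy scoring example with four made-up problems (three answered correctly).
toy_score = accuracy(["1", "2", "1", "2"], ["1", "2", "2", "2"])

print(gaps)
print(toy_score)
```

Under these assumptions, the strongest reported model is still about 15 percentage points below human accuracy, and the weakest is about 35 points below.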

Abstract

The Winograd Schema Challenge (WSC) (Levesque, Davis, and Morgenstern 2011), a benchmark for commonsense reasoning, is a set of 273 expert-crafted pronoun resolution problems originally designed to be unsolvable for statistical models that rely on selectional preferences or word associations. However, recent advances in neural language models have already reached around 90% accuracy on variants of WSC. This raises an important question whether these models have truly acquired robust commonsense capabilities or whether they rely on spurious biases in the datasets that lead to an overestimation of the true capabilities of machine commonsense. To investigate this question, we introduce WinoGrande, a large-scale dataset of 44k problems, inspired by the original WSC design, but adjusted to improve both the scale and the hardness of the dataset. The key steps of the dataset construction…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Adversarial Robustness in Machine Learning