Precise Task Formalization Matters in Winograd Schema Evaluations

Haokun Liu; William Huang; Dhara A. Mungra; Samuel R. Bowman

arXiv:2010.04043 · cs.CL · October 9, 2020

TL;DR

Much of the recent improvement in Winograd Schema Challenge performance appears to come from changes in task formalization and evaluation setup rather than from genuine advances in reasoning ability, which points to a need for more structured benchmarks.

Contribution

The paper demonstrates how task formalization significantly influences Winograd Schema performance and advocates for standardized evaluation procedures to ensure fair comparisons.

Findings

1. Framing the task as multiple choice improves performance by 2-6 points.
2. Reusing the pretrained language modeling head reduces sensitivity to hyperparameters.
3. Changes in task formalization, rather than gains in model reasoning ability, are hypothesized to account for much of the recent improvement.

Abstract

Performance on the Winograd Schema Challenge (WSC), a respected English commonsense reasoning benchmark, recently rocketed from chance accuracy to 89% on the SuperGLUE leaderboard, with relatively little corroborating evidence of a correspondingly large improvement in reasoning ability. We hypothesize that much of this improvement comes from recent changes in task formalization (the combination of input specification, loss function, and reuse of pretrained parameters) by users of the dataset, rather than improvements in the pretrained model's reasoning ability. We perform an ablation on two Winograd Schema datasets that interpolates between the formalizations used before and after this surge, and find (i) framing the task as multiple choice improves performance by 2-6 points and (ii) several additional techniques, including the reuse of a pretrained language modeling head, can…
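
To make the distinction between formalizations concrete, here is a minimal sketch contrasting a pointwise (per-candidate yes/no) loss with a multiple-choice (softmax over candidates) loss. It is an illustration only, not the paper's implementation: the function names and the candidate scores are hypothetical, and in practice the scores would come from a pretrained model. The example sentence is the classic Winograd schema "The trophy doesn't fit in the suitcase because it is too big", with candidates "the trophy" (correct) and "the suitcase".

```python
import math


def pointwise_loss(scores, labels):
    """Pointwise formalization (hypothetical helper): each candidate is an
    independent binary decision, trained with sigmoid cross-entropy."""
    total = 0.0
    for score, label in zip(scores, labels):
        p = 1.0 / (1.0 + math.exp(-score))
        total += -(label * math.log(p) + (1 - label) * math.log(1.0 - p))
    return total / len(scores)


def pointwise_predictions(scores):
    """Accept every candidate whose score clears the decision threshold."""
    return [score > 0.0 for score in scores]


def multiple_choice_loss(scores, correct_index):
    """Multiple-choice formalization (hypothetical helper): candidates
    compete through a softmax, trained with cross-entropy on the correct one."""
    max_score = max(scores)
    log_z = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
    return log_z - scores[correct_index]


def multiple_choice_prediction(scores):
    """Pick the single highest-scoring candidate."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Made-up scores for ["the trophy", "the suitcase"]; not model outputs.
candidate_scores = [1.2, 0.9]
accepted = pointwise_predictions(candidate_scores)
chosen = multiple_choice_prediction(candidate_scores)
```

With these made-up scores, the pointwise view accepts both candidates and so gets one of its two decisions wrong, while the multiple-choice view only has to rank the correct candidate above the other. The same underlying scores can therefore yield different reported accuracy depending on the formalization.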

Code & Models

Repositories

nyu-mll/wsc-formalizations (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)