Precise Task Formalization Matters in Winograd Schema Evaluations
Haokun Liu, William Huang, Dhara A. Mungra, Samuel R. Bowman

TL;DR
Much of the recent improvement in Winograd Schema Challenge performance appears to come from changes in task formalization (input specification, loss function, and reuse of pretrained parameters) rather than from better reasoning in the pretrained models, which highlights the need for more structured benchmarks.
Contribution
The paper shows, through an ablation on two Winograd Schema datasets, how strongly task formalization affects measured performance, and advocates for standardized evaluation procedures to ensure fair comparisons.
Findings
Framing the task as multiple choice improves accuracy by 2-6 points.
Reusing pretrained language modeling heads reduces sensitivity to hyperparameters.
Much of the recent WSC improvement is attributed to changes in task formalization rather than to gains in the pretrained model's reasoning ability.
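To make the difference between formalizations concrete, here is a minimal sketch that contrasts a pointwise framing (each candidate referent is accepted or rejected on its own) with a multiple-choice framing (candidates compete and the best one wins). It is an illustration under stated assumptions, not the authors' implementation: the function names, the threshold, the example candidates, and the scores are all hypothetical.

```python
import math

def pointwise_predictions(scores, threshold=0.0):
    """Pointwise framing: accept or reject each candidate independently."""
    return [score > threshold for score in scores]

def multiple_choice_prediction(scores):
    """Multiple-choice framing: return the index of the best-scoring candidate."""
    return max(range(len(scores)), key=lambda i: scores[i])

def binary_cross_entropy(score, label):
    """Pointwise loss for one candidate; label is 1 (correct) or 0 (incorrect)."""
    prob = 1.0 / (1.0 + math.exp(-score))
    return -math.log(prob) if label == 1 else -math.log(1.0 - prob)

def softmax_cross_entropy(scores, correct_index):
    """Multiple-choice loss: candidates compete through a softmax."""
    peak = max(scores)  # subtract the max for numerical stability
    log_normalizer = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return log_normalizer - scores[correct_index]

# Hypothetical candidate referents for an ambiguous pronoun.
candidates = ["the trophy", "the suitcase"]
# Hypothetical model scores (logits), not results from the paper.
scores = [-0.3, -1.2]

print(pointwise_predictions(scores))                    # [False, False]
print(candidates[multiple_choice_prediction(scores)])   # the trophy
```

With these made-up scores the pointwise framing rejects both candidates, because neither clears the threshold, while the multiple-choice framing still selects the higher-scoring one. The same model outputs can therefore yield different accuracy depending only on how the task is posed.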
Abstract
Performance on the Winograd Schema Challenge (WSC), a respected English commonsense reasoning benchmark, recently rocketed from chance accuracy to 89% on the SuperGLUE leaderboard, with relatively little corroborating evidence of a correspondingly large improvement in reasoning ability. We hypothesize that much of this improvement comes from recent changes in task formalization (the combination of input specification, loss function, and reuse of pretrained parameters) by users of the dataset, rather than improvements in the pretrained model's reasoning ability. We perform an ablation on two Winograd Schema datasets that interpolates between the formalizations used before and after this surge, and find (i) framing the task as multiple choice improves performance by 2-6 points and (ii) several additional techniques, including the reuse of a pretrained language modeling head, can…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)
