Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema

Yanai Elazar; Hongming Zhang; Yoav Goldberg; Dan Roth

arXiv:2104.08161·cs.CL·October 14, 2021·1 cites

TL;DR

This paper critically examines the Winograd Schema benchmarks, revealing that recent performance gains are largely due to artifacts and supervision rather than true commonsense reasoning, and proposes improved evaluation methods.

Contribution

It introduces a more robust evaluation framework for WS, identifies artifacts in existing benchmarks, and demonstrates that current models lack genuine commonsense reasoning in zero-shot settings.

Findings

01. The current WS evaluation method is sub-optimal

02. Popular language models perform at chance level in a strict zero-shot setting

03. Apparent progress is mainly due to benchmark artifacts and supervised training

Abstract

The Winograd Schema (WS) has been proposed as a test for measuring commonsense capabilities of models. Recently, pre-trained language model-based approaches have boosted performance on some WS benchmarks but the source of improvement is still not clear. This paper suggests that the apparent progress on WS may not necessarily reflect progress in commonsense reasoning. To support this claim, we first show that the current evaluation method of WS is sub-optimal and propose a modification that uses twin sentences for evaluation. We also propose two new baselines that indicate the existence of artifacts in WS benchmarks. We then develop a method for evaluating WS-like sentences in a zero-shot setting to account for the commonsense reasoning abilities acquired during the pretraining and observe that popular language models perform randomly in this setting when using our more strict…
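The twin-sentence evaluation mentioned in the abstract can be illustrated with a minimal sketch. It assumes one plausible reading of the idea: a twin pair earns credit only when both of its sentences are resolved correctly, so a model that always picks the same candidate cannot score well by luck. The field names (`pair_id`, `gold`, `pred`), the function names, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def instance_accuracy(examples):
    """Fraction of individual sentences answered correctly."""
    return sum(ex["pred"] == ex["gold"] for ex in examples) / len(examples)


def twin_pair_accuracy(examples):
    """Fraction of twin pairs in which every sentence is answered correctly.

    Assumption: sentences sharing a `pair_id` are twins, and a pair
    counts as correct only if all of its sentences are correct.
    """
    groups = defaultdict(list)
    for ex in examples:
        groups[ex["pair_id"]].append(ex["pred"] == ex["gold"])
    return sum(all(flags) for flags in groups.values()) / len(groups)


# Hypothetical toy predictions: in pair 0 the model picks the same
# candidate for both twins, so it gets one of the two sentences right.
examples = [
    {"pair_id": 0, "gold": "trophy", "pred": "trophy"},
    {"pair_id": 0, "gold": "suitcase", "pred": "trophy"},
    {"pair_id": 1, "gold": "councilmen", "pred": "councilmen"},
    {"pair_id": 1, "gold": "demonstrators", "pred": "demonstrators"},
]

print(instance_accuracy(examples))   # per-sentence score
print(twin_pair_accuracy(examples))  # stricter per-pair score
```

On this toy data the per-sentence score is 0.75 while the per-pair score is 0.5, which shows how scoring sentences independently can overstate what a model has actually resolved.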

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI