Experience and Prediction: A Metric of Hardness for a Novel Litmus Test
Nicos Isaak, Loizos Michael

TL;DR
This paper introduces a machine learning-based system to accurately and efficiently predict the perceived difficulty of Winograd schemas, enhancing the ability to differentiate schemas based on human-like hardness levels.
Contribution
The paper presents a novel ML system using random forest and deep learning approaches to assess Winograd schema hardness more accurately and rapidly than previous methods.
Findings
The ML system outperforms earlier approaches in accuracy and speed.
Human performance varies significantly across different schemas.
Large-scale experiments reveal patterns in schema difficulty.
Abstract
In the last decade, the Winograd Schema Challenge (WSC) has become a central aspect of the research community as a novel litmus test. Consequently, the WSC has spurred research interest because it can be seen as the means to understand human behavior. In this regard, the development of new techniques has made possible the usage of Winograd schemas in various fields, such as the design of novel forms of CAPTCHAs. Work from the literature that established a baseline for human adult performance on the WSC has shown that not all schemas are the same, meaning that they could potentially be categorized according to their perceived hardness for humans. In this regard, this hardness-metric could be used in future challenges or in the WSC CAPTCHA service to differentiate between Winograd schemas. Recent work of ours has shown that this could be achieved via the design of an…
