Concept-Reversed Winograd Schema Challenge: Evaluating and Improving Robust Reasoning in Large Language Models via Abstraction
Kaiqiao Han, Tianqing Fang, Zhaowei Wang, Yangqiu Song, Mark Steedman

TL;DR
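This paper introduces the Concept-Reversed Winograd Schema Challenge (CR-WSC), a dataset that tests the robustness of LLM reasoning by swapping in concepts more strongly associated with the wrong answer, and proposes Abstraction-of-Thought (AoT), a prompting method that uses conceptual abstraction to improve reasoning consistency and robustness.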
Contribution
The paper presents CR-WSC for assessing reasoning robustness and introduces AoT, a novel prompting technique to improve LLM reasoning reliability.
Findings
LLM performance drops significantly on CR-WSC after concept reversal, even though the rationale needed to answer stays the same
The AoT prompting method improves LLM robustness and reasoning consistency
Experiments demonstrate enhanced reasoning accuracy with AoT
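The findings above compare model behaviour on original and concept-reversed items. A minimal sketch of how such a comparison could be scored follows. The function names, the pairing of items by index, and the "both items in a pair answered correctly" notion of consistency are assumptions made for illustration and may differ from the paper's metrics; the labels are made-up toy values, not results.

```python
def accuracy(predictions, gold):
    """Fraction of items where the predicted referent matches the gold referent."""
    if not gold or len(predictions) != len(gold):
        raise ValueError("predictions and gold must be non-empty and equally long")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def pair_consistency(orig_preds, rev_preds, orig_gold, rev_gold):
    """Fraction of (original, reversed) pairs where BOTH items are answered correctly.

    Hypothetical metric: item i of the original set is assumed to be paired
    with item i of the concept-reversed set.
    """
    lengths = {len(orig_preds), len(rev_preds), len(orig_gold), len(rev_gold)}
    if len(lengths) != 1 or not orig_gold:
        raise ValueError("all inputs must be non-empty and equally long")
    both_correct = sum(
        po == go and pr == gr
        for po, pr, go, gr in zip(orig_preds, rev_preds, orig_gold, rev_gold)
    )
    return both_correct / len(orig_gold)


# Toy labels only (not data from the paper).
orig_gold = ["A", "B", "A", "B"]
orig_preds = ["A", "B", "A", "A"]
rev_gold = ["A", "B", "A", "B"]
rev_preds = ["B", "B", "A", "A"]

drop = accuracy(orig_preds, orig_gold) - accuracy(rev_preds, rev_gold)
consistency = pair_consistency(orig_preds, rev_preds, orig_gold, rev_gold)
```

Scoring pairs jointly, rather than each set separately, separates a model that reasons the same way on both versions from one that is right on the original only because of surface associations.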
Abstract
While Large Language Models (LLMs) have showcased remarkable proficiency in reasoning, there is still a concern about hallucinations and unreliable reasoning issues due to semantic associations and superficial logical chains. To evaluate the extent to which LLMs perform robust reasoning instead of relying on superficial logical chains, we propose a new evaluation dataset, the Concept-Reversed Winograd Schema Challenge (CR-WSC), based on the famous Winograd Schema Challenge (WSC) dataset. By simply reversing the concepts to those that are more associated with the wrong answer, we find that the performance of LLMs drops significantly despite the rationale of reasoning remaining the same. Furthermore, we propose Abstraction-of-Thought (AoT), a novel prompt method for recovering adversarial cases to normal cases using conceptual abstraction to improve LLMs' robustness and consistency in…
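To make the idea of conceptual abstraction concrete, here is a minimal sketch that replaces the candidate referents in a Winograd-style sentence with neutral placeholders before the question is put to a model. This is an illustrative assumption, not the authors' AoT prompt: the helper names, the "Entity A/B" placeholders, and the prompt wording are all hypothetical, and no model is called.

```python
import re


def abstract_entities(sentence, entities):
    """Replace each candidate referent with a neutral placeholder (Entity A, Entity B, ...).

    Returns the abstracted sentence and a placeholder -> original mention mapping.
    """
    mapping = {}
    abstracted = sentence
    for i, entity in enumerate(entities):
        placeholder = f"Entity {chr(ord('A') + i)}"
        mapping[placeholder] = entity
        abstracted = re.sub(re.escape(entity), placeholder, abstracted, flags=re.IGNORECASE)
    return abstracted, mapping


def build_prompt(sentence, pronoun, entities):
    """Build a hypothetical abstraction-style prompt for pronoun resolution."""
    abstracted, mapping = abstract_entities(sentence, entities)
    options = " or ".join(mapping)  # iterates placeholders in insertion order
    prompt = (
        f"Sentence: {abstracted}\n"
        f'Question: In the sentence above, does "{pronoun}" refer to {options}?\n'
        "Answer with the placeholder only."
    )
    return prompt, mapping


def resolve(answer, mapping):
    """Map a placeholder answer back to the original mention (None if unrecognised)."""
    return mapping.get(answer.strip())


# Classic WSC item, used here only to exercise the helpers.
sentence = "The city councilmen refused the demonstrators a permit because they feared violence."
entities = ["The city councilmen", "the demonstrators"]
prompt, mapping = build_prompt(sentence, "they", entities)
```

Hiding the surface concepts behind placeholders is one simple way to weaken the word-level associations that concept reversal exploits. The method described in the abstract may abstract to richer concepts than bare placeholders.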
Taxonomy
Topics: Semantic Web and Ontologies · Bayesian Modeling and Causal Inference · AI-based Problem Solving and Planning
