Concept-Reversed Winograd Schema Challenge: Evaluating and Improving Robust Reasoning in Large Language Models via Abstraction

Kaiqiao Han; Tianqing Fang; Zhaowei Wang; Yangqiu Song; Mark Steedman

arXiv:2410.12040·cs.CL·October 17, 2024


TL;DR

This paper introduces CR-WSC, a dataset that tests the robustness of LLM reasoning by reversing the concepts in Winograd schemas, and proposes Abstraction-of-Thought (AoT), a prompting method intended to make LLM reasoning more consistent and robust.

Contribution

The paper presents CR-WSC for assessing reasoning robustness and introduces AoT, a novel prompting technique to improve LLM reasoning reliability.

Findings

01. LLM performance drops significantly on CR-WSC, where concepts are reversed toward the wrong answer while the underlying reasoning stays the same.

02. The AoT prompting method improves LLM robustness and reasoning consistency.

03. Experiments show improved reasoning accuracy when AoT is used.

Abstract

While Large Language Models (LLMs) have showcased remarkable proficiency in reasoning, there is still a concern about hallucinations and unreliable reasoning issues due to semantic associations and superficial logical chains. To evaluate the extent to which LLMs perform robust reasoning instead of relying on superficial logical chains, we propose a new evaluation dataset, the Concept-Reversed Winograd Schema Challenge (CR-WSC), based on the famous Winograd Schema Challenge (WSC) dataset. By simply reversing the concepts to those that are more associated with the wrong answer, we find that the performance of LLMs drops significantly despite the rationale of reasoning remaining the same. Furthermore, we propose Abstraction-of-Thought (AoT), a novel prompt method for recovering adversarial cases to normal cases using conceptual abstraction to improve LLMs' robustness and consistency in…
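The two ideas in the abstract, concept reversal and abstraction-based prompting, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `WinogradItem` class, the `reverse_concepts` and `abstraction_prompt` helpers, the hand-picked replacement concepts, and the prompt wording are hypothetical and are not taken from the paper, the CR-WSC data, or the authors' repository. The example sentence is a classic Winograd schema; the reversed variant is invented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WinogradItem:
    """A Winograd-style item: a sentence, a pronoun, and two candidate referents."""
    sentence: str
    pronoun: str
    candidates: tuple[str, str]
    answer: str  # the correct referent; one of `candidates`


def reverse_concepts(item: WinogradItem, replacements: dict[str, str]) -> WinogradItem:
    """Swap each candidate for a hand-picked replacement concept (hypothetical helper).

    The sentence structure, and so the reasoning needed, is unchanged;
    only the entity names change.
    """
    sentence = item.sentence
    for old, new in replacements.items():
        sentence = sentence.replace(old, new)
    first, second = (replacements.get(c, c) for c in item.candidates)
    answer = replacements.get(item.answer, item.answer)
    return WinogradItem(sentence, item.pronoun, (first, second), answer)


def abstraction_prompt(item: WinogradItem, abstractions: dict[str, str]) -> str:
    """Build a prompt that restates the sentence with abstract placeholders.

    Assumption: the abstractions are supplied by hand here; a real pipeline
    might ask the model to produce them.
    """
    abstract_sentence = item.sentence
    for entity, label in abstractions.items():
        abstract_sentence = abstract_sentence.replace(entity, label)
    options = " or ".join(abstractions.get(c, c) for c in item.candidates)
    return (
        f"Original sentence: {item.sentence}\n"
        f"Abstracted sentence: {abstract_sentence}\n"
        f'Question: In the abstracted sentence, does "{item.pronoun}" refer to {options}?'
    )


# A classic Winograd schema; the correct referent is the party that refuses.
original = WinogradItem(
    sentence="The city councilmen refused the demonstrators a permit because they feared violence.",
    pronoun="they",
    candidates=("city councilmen", "demonstrators"),
    answer="city councilmen",
)

# Invented reversal: the wrong candidate is now more associated with "feared violence".
reversed_item = reverse_concepts(
    original, {"city councilmen": "soldiers", "demonstrators": "pacifists"}
)

prompt = abstraction_prompt(
    reversed_item, {"soldiers": "[GROUP_A]", "pacifists": "[GROUP_B]"}
)
print(prompt)
```

In this sketch the reversed item keeps the same answer position (the refusing party), so a model that relies on surface associations rather than the sentence structure would be expected to err more often on it, and the placeholder version removes those associations before the question is asked.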

Code & Models

Repositories

HKUST-KnowComp/Adv-WSC (official)

Videos

Concept-Reversed Winograd Schema Challenge: Evaluating and Improving Robust Reasoning in Large Language Models via Abstraction · Underline

Taxonomy

Topics: Semantic Web and Ontologies · Bayesian Modeling and Causal Inference · AI-based Problem Solving and Planning