A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning

Ruixin Hong; Hongming Zhang; Xinyu Pang; Dong Yu; Changshui Zhang

arXiv:2311.07954 · cs.AI · March 26, 2024 · 1 citation

TL;DR

This paper evaluates the self-verification abilities of large language models in logical reasoning, revealing their struggles with identifying fallacies and highlighting the need for improved self-assessment techniques.

Contribution

It introduces FALLACIES, a dataset of 232 types of reasoning fallacies organized in a hierarchical taxonomy, and uses it to analyze how well LLMs detect logical errors.

Findings

01

LLMs often fail to accurately identify reasoning fallacies.

02

Existing models struggle to guarantee the validity of their self-verification.

03

The study offers insights for enhancing self-verification methods in LLMs.

Abstract

Logical reasoning has been an ongoing pursuit in the field of AI. Despite significant advancements made by large language models (LLMs), they still struggle with complex logical reasoning problems. To enhance reasoning performance, one promising direction is scalable oversight, which requires LLMs to identify their own errors and then improve by themselves. Various self-verification methods have been proposed in pursuit of this goal. Nevertheless, whether existing models understand their own errors well is still under investigation. In this paper, we take a closer look at the self-verification abilities of LLMs in the context of logical reasoning, focusing on their ability to identify logical fallacies accurately. We introduce a dataset, FALLACIES, containing 232 types of reasoning fallacies categorized in a hierarchical taxonomy. By conducting exhaustive experiments on FALLACIES, we…

Code & Models

Repositories

raising-hrx/fallacies (Official)

Videos

A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning · Underline

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification