LLMs cannot find reasoning errors, but can correct them given the error location

Gladys Tyen; Hassan Mansoor; Victor Cărbune; Peter Chen; Tony Mak

arXiv:2311.08516·cs.AI·June 5, 2024·6 cites

TL;DR

This paper demonstrates that large language models struggle to identify reasoning errors but can effectively correct them when provided with the error location, highlighting a key limitation and potential solution for improving LLM reasoning.

Contribution

The study benchmarks LLMs' mistake-finding abilities, shows correction improves with known error locations, and introduces a classifier for locating mistakes without ground truth labels.

Findings

01. LLMs have difficulty finding logical mistakes in reasoning tasks.

02. Providing error location information significantly improves correction performance.

03. A small classifier trained on out-of-domain data outperforms prompting large models in mistake detection.

Abstract

While self-correction has shown promise in improving LLM outputs in terms of style and quality (e.g. Chen et al., 2023b; Madaan et al., 2023), recent attempts to self-correct logical or reasoning errors often cause correct answers to become incorrect, resulting in worse performances overall (Huang et al., 2023). In this paper, we show that poor self-correction performance stems from LLMs' inability to find logical mistakes, rather than their ability to correct a known mistake. Firstly, we benchmark several state-of-the-art LLMs on their mistake-finding ability and demonstrate that they generally struggle with the task, even in highly objective, unambiguous cases. Secondly, we test the correction abilities of LLMs -- separately from mistake finding -- using a backtracking setup that feeds ground truth mistake location information to the model. We show that this boosts downstream task…
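
The backtracking setup mentioned in the abstract can be illustrated with a minimal sketch. The abstract only says that ground truth mistake location information is fed to the model, so everything else here is an assumption made for illustration: the trace is a list of step strings, `generate_step` is a hypothetical stand-in for a model call, and the retry limit and step cap are arbitrary. This is not the authors' implementation.

```python
from typing import Callable, List, Optional

# Hypothetical model interface (an assumption, not from the paper): given the
# question and the steps kept so far, return the next reasoning step, or None
# when the trace is complete.
StepGenerator = Callable[[str, List[str]], Optional[str]]


def backtrack(
    question: str,
    steps: List[str],
    mistake_index: int,
    generate_step: StepGenerator,
    max_resamples: int = 8,
    max_steps: int = 32,
) -> List[str]:
    """Regenerate a reasoning trace from a known mistake location onwards."""
    if not 0 <= mistake_index < len(steps):
        raise ValueError("mistake_index must point at a step in the trace")

    kept = list(steps[:mistake_index])  # steps before the mistake are trusted
    rejected = steps[mistake_index]     # the step flagged as the mistake

    # Resample the flagged step until the generator proposes something different.
    replacement = None
    for _ in range(max_resamples):
        candidate = generate_step(question, kept)
        if candidate is not None and candidate != rejected:
            replacement = candidate
            break
    if replacement is None:
        return list(steps)  # no alternative found; leave the trace unchanged

    kept.append(replacement)

    # Continue the trace from the replaced step until the generator stops.
    while len(kept) < max_steps:
        next_step = generate_step(question, kept)
        if next_step is None:
            break
        kept.append(next_step)
    return kept


def make_toy_generator() -> StepGenerator:
    """Deterministic toy generator used only to exercise backtrack()."""
    correct = ["3 * 4 = 12", "2 + 12 = 14", "Answer: 14"]
    calls = {"n": 0}

    def generate_step(question: str, prefix: List[str]) -> Optional[str]:
        calls["n"] += 1
        if calls["n"] == 1:
            return "2 + 12 = 15"  # first sample repeats the original mistake
        if len(prefix) >= len(correct):
            return None
        return correct[len(prefix)]

    return generate_step


original_trace = ["3 * 4 = 12", "2 + 12 = 15", "Answer: 15"]
corrected_trace = backtrack("2 + 3 * 4", original_trace, 1, make_toy_generator())
```

In the toy run, the step at index 1 is flagged, the first resample repeats the same mistake and is discarded, and the rest of the trace is regenerated from the replacement step. In a real setting `generate_step` would wrap an LLM call, and the mistake index would come from ground truth labels or from a separate mistake-finding model.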

Code & Models

Repositories

whgtyen/big-bench-mistake (official)

Taxonomy

Topics: Software Engineering Research · Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI)