Can LLMs Detect Their Own Hallucinations?
Sora Kadotani, Kosuke Nishida, Kyosuke Nishida

TL;DR
This paper explores whether large language models can identify their own factual errors, proposing a framework and method that enable GPT-3.5 Turbo to detect over half of its hallucinations using Chain-of-Thought reasoning.
Contribution
It introduces a novel framework and classification method using Chain-of-Thought to assess and improve LLMs' ability to detect their own hallucinations.
Findings
GPT-3.5 Turbo with CoT detected 58.2% of hallucinations
LLMs can detect hallucinations if their parameters contain sufficient knowledge
Proposed framework effectively estimates LLMs' hallucination detection capability
Abstract
Large language models (LLMs) can generate fluent responses, but sometimes hallucinate facts. In this paper, we investigate whether LLMs can detect their own hallucinations. We formulate hallucination detection as a sentence classification task. We propose a framework for estimating LLMs' capability of hallucination detection and a classification method using Chain-of-Thought (CoT) to extract knowledge from their parameters. The experimental results indicated that GPT-3.5 Turbo with CoT detected 58.2% of its own hallucinations. We concluded that LLMs with CoT can detect hallucinations if sufficient knowledge is contained in their parameters.
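To make the classification framing concrete, here is a minimal Python sketch of how a detection rate like the one reported could be computed. It is an illustration under stated assumptions, not the authors' implementation: the prompt wording, the "Verdict:" output convention, and the names COT_PROMPT, parse_verdict, detection_rate, and ask_model are all hypothetical, and ask_model stands in for whatever function sends a prompt to the model and returns its text.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical CoT-style prompt: ask the model to recall what it knows
# before giving a verdict. The paper's actual prompt is not shown here.
COT_PROMPT = (
    "Sentence: {sentence}\n"
    "First, write down what you know about the subject of the sentence.\n"
    "Then finish with one line: 'Verdict: correct' or 'Verdict: hallucinated'."
)


def parse_verdict(model_output: str) -> bool:
    """Return True if the last 'Verdict:' line flags a hallucination."""
    for line in reversed(model_output.strip().splitlines()):
        lowered = line.strip().lower()
        if lowered.startswith("verdict:"):
            return "hallucinated" in lowered
    return False  # no verdict line found: treat the sentence as not flagged


def detection_rate(
    examples: Iterable[Tuple[str, bool]],
    ask_model: Callable[[str], str],
) -> float:
    """Fraction of hallucinated sentences that the model flags.

    examples: (sentence, is_hallucinated) pairs with gold labels.
    ask_model: assumed callable mapping a prompt string to model text.
    """
    hallucinated = [sentence for sentence, is_h in examples if is_h]
    if not hallucinated:
        return 0.0
    detected = sum(
        parse_verdict(ask_model(COT_PROMPT.format(sentence=sentence)))
        for sentence in hallucinated
    )
    return detected / len(hallucinated)
```

Passing the model call in as a function keeps the scoring logic independent of any particular API client, so the same code can be exercised with a stub before being pointed at a real model.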
Taxonomy
Topics: Mental Health via Writing · Misinformation and Its Impacts · Benford’s Law and Fraud Detection
