Can LLMs Detect Their Own Hallucinations?

Sora Kadotani; Kosuke Nishida; Kyosuke Nishida

arXiv:2511.11087·cs.CL·November 17, 2025

TL;DR

This paper explores whether large language models can identify their own factual errors, proposing a framework and method that enable GPT-3.5 Turbo to detect over half of its hallucinations using Chain-of-Thought reasoning.

Contribution

It introduces a novel framework and classification method using Chain-of-Thought to assess and improve LLMs' ability to detect their own hallucinations.

Findings

01 GPT-3.5 Turbo with CoT detected 58.2% of its own hallucinations.

02 LLMs with CoT can detect hallucinations if their parameters contain sufficient knowledge.

03 The proposed framework estimates LLMs' hallucination detection capability.

Abstract

Large language models (LLMs) can generate fluent responses, but sometimes hallucinate facts. In this paper, we investigate whether LLMs can detect their own hallucinations. We formulate hallucination detection as a classification task of a sentence. We propose a framework for estimating LLMs' capability of hallucination detection and a classification method using Chain-of-Thought (CoT) to extract knowledge from their parameters. The experimental results indicated that GPT-3.5 Turbo with CoT detected 58.2% of its own hallucinations. We concluded that LLMs with CoT can detect hallucinations if sufficient knowledge is contained in their parameters.
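The abstract frames detection as sentence-level classification with a CoT step that draws out the model's own knowledge before it judges. The sketch below illustrates that general shape only. The prompt wording, the `Label:` output convention, the helper names, and the toy `stub_model` are all assumptions made for illustration and are not taken from the paper. It also reads "detected 58.2% of its own hallucinations" as the fraction of hallucinated sentences that get flagged, which is an interpretation rather than the paper's stated metric definition.

```python
from typing import Callable, Iterable, Tuple


def build_cot_prompt(sentence: str) -> str:
    """Hypothetical CoT prompt: recall knowledge first, then give a label."""
    return (
        "Sentence: " + sentence + "\n"
        "First, write down what you know about the subject of this sentence.\n"
        "Then answer on the last line with 'Label: factual' "
        "or 'Label: hallucinated'."
    )


def parse_label(response: str) -> bool:
    """Return True if the last 'Label:' line in the response says hallucinated."""
    for line in reversed(response.strip().splitlines()):
        text = line.strip().lower()
        if text.startswith("label:"):
            return "hallucinated" in text
    return False  # no label line found: count as not detected


def detection_rate(
    examples: Iterable[Tuple[str, bool]],
    model: Callable[[str], str],
) -> float:
    """Fraction of hallucinated sentences that the model flags as hallucinated."""
    hallucinated = [s for s, is_hallucinated in examples if is_hallucinated]
    if not hallucinated:
        return 0.0
    detected = sum(parse_label(model(build_cot_prompt(s))) for s in hallucinated)
    return detected / len(hallucinated)


def stub_model(prompt: str) -> str:
    """Toy stand-in for an LLM call; it flags any sentence that mentions Mars."""
    verdict = "hallucinated" if "Mars" in prompt else "factual"
    return "Reasoning: (omitted)\nLabel: " + verdict


# Made-up sentences paired with a flag saying whether each is a hallucination.
examples = [
    ("The Eiffel Tower is on Mars.", True),
    ("Paris is in France.", False),
    ("Tokyo is the capital of Brazil.", True),
]

rate = detection_rate(examples, stub_model)
print(f"Detected {rate:.1%} of hallucinated sentences")
```

To try this against a real model, `stub_model` would be replaced with a function that sends the prompt to an LLM and returns its text response. The parsing step may need to be loosened if the model does not follow the label convention.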

Taxonomy

Topics: Mental Health via Writing · Misinformation and Its Impacts · Benford’s Law and Fraud Detection