Hallucination Detection and Evaluation of Large Language Model

Chenggong Zhang; Haopeng Wang; Hexi Meng

arXiv:2512.22416·cs.CL·April 10, 2026

TL;DR

This paper integrates HHEM, a lightweight classification-based hallucination detection model, into LLM evaluation, improving evaluation speed and accuracy while analyzing model stability and hallucination patterns.

Contribution

We integrate HHEM, a lightweight classification-based framework that operates independently of LLM-based judgments, reducing evaluation time from 8 hours to 10 minutes and enhancing hallucination detection accuracy across various LLMs.

Findings

01. HHEM reduces evaluation time from 8 hours to 10 minutes.

02. HHEM with non-fabrication checking achieves 82.2% accuracy and 78.9% TPR.

03. Larger models (7B-9B) tend to hallucinate less, but intermediate models are more unstable.

Abstract

Hallucinations in Large Language Models (LLMs) pose a significant challenge, generating misleading or unverifiable content that undermines trust and reliability. Existing evaluation methods, such as KnowHalu, employ multi-stage verification but suffer from high computational costs. To address this, we integrate the Hughes Hallucination Evaluation Model (HHEM), a lightweight classification-based framework that operates independently of LLM-based judgments, significantly improving efficiency while maintaining high detection accuracy. We conduct a comparative analysis of hallucination detection methods across various LLMs, evaluating True Positive Rate (TPR), True Negative Rate (TNR), and Accuracy on question-answering (QA) and summarization tasks. Our results show that HHEM reduces evaluation time from 8 hours to 10 minutes, while HHEM with non-fabrication checking achieves the highest…
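
The three metrics named in the abstract can be illustrated with a minimal sketch. It assumes a binary labeling in which 1 marks a hallucinated output and 0 a faithful one; that convention, the `detection_metrics` helper, and the toy labels are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DetectionMetrics:
    tpr: float       # share of hallucinated samples that were flagged
    tnr: float       # share of faithful samples that were passed
    accuracy: float  # share of all samples classified correctly


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> DetectionMetrics:
    """Compute TPR, TNR, and accuracy for binary hallucination labels.

    Assumed convention (not taken from the paper): 1 = hallucinated, 0 = faithful.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # hallucination caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # faithful output passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # faithful output flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # hallucination missed

    # Rates are undefined when a class is absent; report NaN in that case.
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    tnr = tn / (tn + fp) if (tn + fp) else float("nan")
    accuracy = (tp + tn) / len(pairs)
    return DetectionMetrics(tpr, tnr, accuracy)


# Toy labels, invented for illustration only (not results from the paper).
gold = [1, 1, 1, 0, 0, 0, 0, 1]
pred = [1, 0, 1, 0, 0, 1, 0, 1]
metrics = detection_metrics(gold, pred)
print(metrics)
```

Reporting TPR and TNR alongside accuracy matters when hallucinated and faithful samples are imbalanced, since a detector that flags nothing can still score a high accuracy while its TPR is zero.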
