Scalable Token-Level Hallucination Detection in Large Language Models
Rui Min, Tianyu Pang, Chao Du, Minhao Cheng, Yi R. Fung

TL;DR
This paper introduces TokenHD, a scalable token-level hallucination detection method for large language models, improving detection accuracy and generalization without relying on step segmentation.
Contribution
The paper presents TokenHD, a novel pipeline with a scalable data engine and importance-weighted training for effective token-level hallucination detection in LLMs.
Findings
A 0.6B-parameter detector outperforms much larger models such as QwQ-32B.
Detection performance improves as the detector scales from 0.6B to 8B parameters.
The detector generalizes well across diverse scenarios.
Abstract
Large language models (LLMs) have demonstrated remarkable capabilities, but they still frequently produce hallucinations. These hallucinations are difficult to detect in reasoning-intensive tasks, where the content appears coherent but contains errors like logical flaws and unreliable intermediate results. While step-level analysis is commonly used to detect internal hallucinations, it suffers from limited granularity and poor scalability due to its reliance on step segmentation. To address these limitations, we propose TokenHD, a holistic pipeline for training token-level hallucination detectors. Specifically, TokenHD consists of a scalable data engine for synthesizing large-scale hallucination annotations along with a training recipe featuring an importance-weighted strategy for robust model training. To systematically assess the detection performance, we also provide a rigorous…
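The abstract mentions an importance-weighted training strategy but does not give its form here. As a rough illustration only, the sketch below shows one common way to weight a token-level binary classification loss so that rare hallucinated tokens are not swamped by the many correct ones. The function names `inverse_frequency_weights` and `weighted_token_bce`, and the inverse-frequency weighting itself, are assumptions made for this example and are not taken from the paper.

```python
import math


def inverse_frequency_weights(labels):
    """Hypothetical weighting: each class gets half of the total weight.

    labels -- per-token annotations, 1 = hallucinated, 0 = not hallucinated.
    Falls back to uniform weights when only one class is present.
    """
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        return [1.0] * n
    w_pos = n / (2.0 * n_pos)
    w_neg = n / (2.0 * n_neg)
    return [w_pos if y == 1 else w_neg for y in labels]


def weighted_token_bce(probs, labels, weights, eps=1e-12):
    """Importance-weighted binary cross-entropy, normalized by total weight.

    probs   -- predicted hallucination probability for each token
    labels  -- per-token annotations, 1 = hallucinated, 0 = not hallucinated
    weights -- non-negative importance weight for each token
    """
    if not (len(probs) == len(labels) == len(weights)):
        raise ValueError("probs, labels and weights must have equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    loss = 0.0
    for p, y, w in zip(probs, labels, weights):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        loss += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return loss / total_weight


# Toy response of six tokens in which only the fifth is annotated as an error.
labels = [0, 0, 0, 0, 1, 0]
probs = [0.05, 0.10, 0.02, 0.20, 0.30, 0.05]
weights = inverse_frequency_weights(labels)
print(weighted_token_bce(probs, labels, weights))
```

With uniform weights this reduces to the ordinary mean cross-entropy over tokens. With the weighting above, the single missed hallucinated token contributes as much to the loss as the five correct tokens combined, which is the general motivation for weighting in heavily imbalanced token-level labels.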
