Token-Guard: Towards Token-Level Hallucination Control via Self-Checking Decoding
Yifan Zhu, Huiqiang Rong, Haoran Luo

TL;DR
Token-Guard introduces a token-level hallucination control method using self-checking decoding, internal verification, and latent space evaluation to significantly reduce hallucinations in large language models.
Contribution
It presents a novel, scalable, and modular decoding-based approach for explicit hallucination control at the token level in LLMs.
Findings
Reduces hallucinations significantly on HALU datasets
Improves generation accuracy
Offers a scalable, modular solution for reliable outputs
Abstract
Large Language Models (LLMs) often hallucinate, generating content inconsistent with the input. Retrieval-Augmented Generation (RAG) and Reinforcement Learning from Human Feedback (RLHF) can mitigate hallucinations but require resource-intensive retrieval or large-scale fine-tuning. Decoding-based methods are lighter yet lack explicit hallucination control. To address this, we present Token-Guard, a token-level hallucination control method based on self-checking decoding. Token-Guard performs internal verification at each reasoning step to detect hallucinated tokens before they propagate. Candidate fragments are further evaluated in a latent space with explicit hallucination risk scoring, while iterative pruning and regeneration dynamically correct detected errors. Experiments on HALU datasets show Token-Guard substantially reduces hallucinations and improves generation accuracy,…
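The loop the abstract describes (score candidates, prune risky ones, regenerate when nothing survives) can be illustrated with a toy sketch. This is not the authors' implementation: the risk score here is an assumed stand-in based on cosine similarity to a context vector, and the names `hallucination_risk`, `guarded_decode`, `toy_propose`, `risk_threshold`, and `max_retries` are hypothetical, as are the threshold values.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Candidate = Tuple[str, Vector]  # (token text, latent vector); toy representation


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def hallucination_risk(candidate: Vector, context: Vector) -> float:
    """ASSUMPTION: risk in [0, 1], low when the candidate aligns with the context.

    The paper's actual scoring function is not given in this summary.
    """
    return (1.0 - cosine_similarity(candidate, context)) / 2.0


def guarded_decode(
    propose: Callable[[int, int], List[Candidate]],
    context: Vector,
    steps: int,
    risk_threshold: float = 0.25,  # hypothetical value
    max_retries: int = 3,          # hypothetical value
) -> List[str]:
    """Accept the lowest-risk surviving candidate per step, regenerating if none survive."""
    accepted: List[str] = []
    for step in range(steps):
        for attempt in range(max_retries):
            scored = [
                (token, hallucination_risk(vec, context))
                for token, vec in propose(step, attempt)
            ]
            # Pruning: drop every candidate whose risk exceeds the threshold.
            kept = [(token, risk) for token, risk in scored if risk <= risk_threshold]
            if kept:
                accepted.append(min(kept, key=lambda pair: pair[1])[0])
                break
            # Nothing survived: the next attempt asks the proposer to regenerate.
        else:
            accepted.append("<abstain>")  # all retries exhausted
    return accepted


def toy_propose(step: int, attempt: int) -> List[Candidate]:
    """Stand-in for a language model's candidate generator (illustration only)."""
    if step == 0:
        return [("Paris", [1.0, 0.1]), ("Mars", [-1.0, 0.0])]
    if step == 1:
        # The first attempt is off-context; regeneration yields a better candidate.
        return [("moon", [0.0, 1.0])] if attempt == 0 else [("France", [0.9, 0.2])]
    return [("xyz", [-1.0, 0.0])]


print(guarded_decode(toy_propose, context=[1.0, 0.0], steps=3))
```

In this toy run the second step only succeeds after one regeneration, and the third step abstains because no candidate ever falls under the threshold. A real system would replace `toy_propose` with the model's sampler and the cosine stand-in with whatever scoring the method defines.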
Peer Reviews
Decision: ICLR 2026 Poster
1. The paper defines a well-structured three-stage pipeline, incorporating token-level, segment-level, and global iteration. 2. The experiments span six benchmarks, demonstrating remarkable performance compared with other baselines. 3. The method solves a meaningful task. It detects and mitigates hallucinations at the token level, at the segment level, and globally, preventing hallucination propagation. 4. The case study in Table 4 explicitly demonstrates token-wise correction, supporting the claim
1. The paper claims that current decoding methods lack a token-level hallucination-checking mechanism. But layer-contrasting methods (e.g., DoLa, contrastive decoding) already mitigate hallucinations at the token level and achieve strong results. Relevant baselines are not discussed or compared. 2. In the latent token environment initialization, the method requires initializing the accepted tokens a_j, but no details are provided. 3. Some datasets (e.g., RAGTruth, PubMedQA) show near-ze
1) The paper tackles a pertinent research avenue in autoregressive LLMs, namely that of controlled decoding to mitigate hallucinations in generated responses. While the proposed method is a little complex, the paper is well-written and presents each individual component in a fairly clear and lucid manner. 2) Token-Guard is seen to achieve significant improvements in Exact Match and F1 scores over several decoding methods including Chain-of-Thought and Tree-of-Thought, and is demonstrated using st
1) While Token-Guard is seen to be quite effective, the overall method is considerably complex and involves a large set of hyperparameter choices. This suggests that the method may not be very practical in several realistic settings. This also raises the question of how these hyperparameters can be set. For instance, in Appendix G, some sets are shown, but could the authors clarify if the F1 scores shown are for the final test data, or a hold-out validation set? Furthermore, could the authors c
1. The paper proposes a clear and practical framework for controlling hallucinations at the token level using self-checking decoding, segment-level verification, and global iteration. This multi-stage design offers a novel angle on decoding reliability. 2. The experiments cover six benchmark datasets and two LLM backbones, demonstrating consistent and measurable improvements in both factual accuracy and fluency. 3. The ablation analysis is detailed and informative, helping isolate the contri
1. The method increases computation time and output length, sometimes approaching or exceeding heavy decoding frameworks such as Tree-of-Thought. The claim of being lightweight should be more carefully qualified. 2. The method relies on many fixed hyperparameters, but the paper provides no sensitivity or stability analysis to show robustness across different values. 3. The definition of latent hallucination scoring is intuitive but not empirically verified; it is unclear whether cosine simil
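For readers unfamiliar with the Exact Match and F1 scores the reviews refer to, the sketch below shows one common way to compute them for short answers. It assumes the widely used SQuAD-style convention (lowercasing, stripping punctuation and English articles, token-overlap F1); the paper's own normalization is not stated in this summary and may differ. The function names are illustrative.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, remove punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # By convention, two empty answers match; one empty answer scores zero.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower.", "eiffel tower"))
print(round(token_f1("Paris France", "Paris"), 3))
```

Exact Match rewards only fully correct answers, while token F1 gives partial credit, which is why the two are usually reported together.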
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Ferroelectric and Negative Capacitance Devices
