Distinguishing Ignorance from Error in LLM Hallucinations
Adi Simhi, Jonathan Herzig, Idan Szpektor, Yonatan Belinkov

TL;DR
This paper distinguishes two types of hallucinations in large language models: those arising from a knowledge gap, and those where the model has the required knowledge but still answers incorrectly. It shows that separating the two can improve hallucination mitigation strategies.
Contribution
It introduces a classification of hallucinations into HK- (the model lacks the knowledge) and HK+ (the model has the knowledge but answers incorrectly), demonstrates that HK+ is prevalent, and shows the benefits of model-specific detection datasets.
Findings
HK+ hallucinations are common across models and datasets.
Distinguishing hallucination types improves mitigation.
Different models hallucinate on different examples.
Abstract
Large language models (LLMs) are susceptible to hallucinations -- factually incorrect outputs -- leading to a large body of work on detecting and mitigating such cases. We argue that it is important to distinguish between two types of hallucinations: ones where the model does not hold the correct answer in its parameters, which we term HK-, and ones where the model answers incorrectly despite having the required knowledge, termed HK+. We first find that HK+ hallucinations are prevalent and occur across models and datasets. Then, we demonstrate that distinguishing between these two cases is beneficial for mitigating hallucinations. Importantly, we show that different models hallucinate on different examples, which motivates constructing model-specific hallucination datasets for training detectors. Overall, our findings draw attention to classifying types of hallucinations and provide…
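To make the HK-/HK+ distinction concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' procedure. The substring correctness check, the `threshold` for deciding that a model "holds" an answer, and the split between plain "probe" prompts and a "target" prompt (for instance, one with incorrect in-context demonstrations, as the reviews below describe) are all hypothetical choices. Because every label depends on one model's own answers, running it on different models' outputs yields different splits, which is the intuition behind model-specific detection datasets.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def is_correct(answer: str, gold: str) -> bool:
    # Hypothetical correctness check: case-insensitive substring match.
    return gold.strip().lower() in answer.strip().lower()


def knows_answer(probe_answers: List[str], gold: str, threshold: float = 1.0) -> bool:
    # Assumption: the model "holds" the knowledge if at least `threshold`
    # of its answers to plain, unperturbed prompts are correct.
    if not probe_answers:
        return False
    hits = sum(is_correct(a, gold) for a in probe_answers)
    return hits / len(probe_answers) >= threshold


def classify(probe_answers: List[str], target_answer: str, gold: str,
             threshold: float = 1.0) -> Optional[str]:
    # Returns None when the target-setting answer is correct (no hallucination).
    if is_correct(target_answer, gold):
        return None
    # Wrong answer: HK+ if the model otherwise knows the answer, else HK-.
    return "HK+" if knows_answer(probe_answers, gold, threshold) else "HK-"


def build_model_specific_dataset(records: List[dict],
                                 threshold: float = 1.0) -> Dict[str, List[str]]:
    # `records` must all come from ONE model, so the split is model-specific.
    # Hypothetical keys: question, gold, probe_answers, target_answer.
    dataset: Dict[str, List[str]] = defaultdict(list)
    for r in records:
        label = classify(r["probe_answers"], r["target_answer"], r["gold"], threshold)
        dataset[label or "correct"].append(r["question"])
    return dict(dataset)


if __name__ == "__main__":
    # Two hypothetical models give the same wrong target answer,
    # but only model A answers correctly on every plain probe.
    q = "What is the capital of Australia?"
    model_a = [{"question": q, "gold": "Canberra",
                "probe_answers": ["Canberra", "Canberra"], "target_answer": "Sydney"}]
    model_b = [{"question": q, "gold": "Canberra",
                "probe_answers": ["Sydney", "Canberra"], "target_answer": "Sydney"}]
    print(build_model_specific_dataset(model_a))  # {'HK+': ['What is the capital of Australia?']}
    print(build_model_specific_dataset(model_b))  # {'HK-': ['What is the capital of Australia?']}
```

The same question lands in HK+ for one hypothetical model and HK- for the other, so a detector trained on one model's split would not transfer directly to the other's. Lowering `threshold` makes "knowing" easier to satisfy and shifts examples from HK- to HK+.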
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
- Most existing methods tend to misuse the term "hallucination" and conflate these two cases, so this paper could be impactful in opening up a new direction.
- The paper proposes an effective framework to generate good-shot and bad-shot examples.
- There are quite a few unexplained and confusing terms, such as "snowball effect" and "high knowledge".
- It would be great if the authors could add a discussion section on how to leverage this categorization for hallucination mitigation.
- The experiment section lacks qualitative analysis that would make the results more insightful.
- Section 5.3, preemptive detection of hallucination, is an interesting idea, but the description is very vague and would be difficult to reproduce.
1. This study intriguingly explores the resilience of knowledge in large language models (LLMs) when faced with incorrect demonstrations.
2. The findings indicate that probing based on the proposed WACK dataset successfully identifies knowledge within LLMs that is susceptible to being misled by incorrect demonstrations.
1. What is the value of analyzing the types of hallucinations, namely HK- (hallucination caused by lack of knowledge) and HK+ (hallucination despite knowledge)? The scenario involving HK+ seems impractical; in real-world situations, users are unlikely to provide numerous incorrect inputs to elicit the correct answer. A scenario where incorrect information subtly influences the dialogue might be more realistic.
2. The approach to detecting factual accuracy and HK+ is strange. If a model is prompte
- Dividing hallucination into two types can be useful, as suggested by the authors. Each hallucination type (HK+, HK-) can be mitigated using different methods (e.g., HK- can be addressed with external knowledge).
- Providing datasets that include a new type of hallucination (HK+) can benefit the NLP community.
If the points below can be clarified through additional responses from the authors, I would be glad to adjust my evaluation accordingly.
- The justification for categorizing hallucination types is weak. The paper does not sufficiently verify whether HK+ and HK- are truly mitigated by different methods (e.g., the performance improvement with external knowledge for HK- may be greater than for HK+). Additionally, the analysis supporting the claim that HK+ and HK- have distinct characteristics is
- The presentation is good. The writing is well-structured and easy to follow.
- Hallucination detection in LLMs is important and timely. Compared with existing model-specific hallucination datasets, the proposed method can generate samples with a specific type of hallucination (i.e., the model is confused by the context despite having the relevant parametric knowledge), which enables more fine-grained hallucination detection.
- The motivation behind the prompting approach used in the Alice-Bob setting is unclear.
- The synthetic hallucination dataset is model-specific, and the data creation process is sensitive to the in-context hallucination samples in use. It is not clear how such datasets can be applied for evaluation in real-world scenarios, since we cannot compare different models or different hallucination mitigation approaches on them.
* This paper proposes a novel categorization and dataset, WACK, to study hallucinations based on knowledge availability.
* The experiments are well-designed and comprehensive, employing multiple models and tasks.
* The results highlight the importance of model-specific datasets and pave the way for future work on targeted mitigation of hallucinations.
* This paper primarily investigates two categories of reasons behind LLM hallucinations. However, the phenomenon of hallucination has been well studied, and the contribution of this work is limited. The authors should further build on their findings to better alleviate hallucinations.
* The study is limited to three models in the 7B-9B range, raising questions about generalizability to larger models and commercial LLMs, e.g., the GPT series and Claude.
* While the synthetic setups (Bad-shots and Alice
