DHI: Leveraging Diverse Hallucination Induction for Enhanced Contrastive Factuality Control in Large Language Models
Jiani Guo, Xiangke Zeng, Jie Wu, Zuchao Li

TL;DR
This paper introduces DHI, a training framework in which an "Evil LLM" learns to produce a wider range of hallucinations without pre-annotated hallucination data. These induced hallucinations then guide contrastive decoding, reducing factual errors in the output of large language models.
Contribution
DHI enables the Evil LLM to generate diverse hallucinations through a novel loss function and attention masking, advancing contrastive factuality control without relying on pre-labeled hallucination datasets.
Findings
DHI significantly outperforms existing contrastive decoding methods on multiple benchmarks.
The approach increases the diversity of the hallucinations the Evil LLM induces while maintaining the factual accuracy of the decoded output.
Empirical results demonstrate improved factuality and robustness in LLM outputs.
Abstract
Large language models (LLMs) frequently produce inaccurate or fabricated information, known as "hallucinations," which compromises their reliability. Existing approaches often train an "Evil LLM" to deliberately generate hallucinations on curated datasets, using these induced hallucinations to guide contrastive decoding against a reliable "positive model" for hallucination mitigation. However, this strategy is limited by the narrow diversity of hallucinations induced, as Evil LLMs trained on specific error types tend to reproduce only these particular patterns, thereby restricting their overall effectiveness. To address these limitations, we propose DHI (Diverse Hallucination Induction), a novel training framework that enables the Evil LLM to generate a broader range of hallucination types without relying on pre-annotated hallucination data. DHI employs a modified loss function that…
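The contrastive decoding step described in the abstract can be sketched as follows. This is a minimal illustration of the generic technique (positive-model log-probabilities penalized by Evil-LLM log-probabilities, with a plausibility cutoff), not DHI's own formulation, which the truncated abstract does not spell out. The function names, the `alpha` and `beta` parameters, and the scoring rule `(1 + alpha) * log p_pos - alpha * log p_evil` are assumptions made for illustration.

```python
import math


def log_softmax(logits):
    """Convert raw logits to log-probabilities in a numerically stable way."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def contrastive_next_token(pos_logits, evil_logits, alpha=1.0, beta=0.1):
    """Pick the next token index by contrasting two models (hypothetical helper).

    pos_logits:  next-token logits from the reliable "positive model".
    evil_logits: next-token logits from the hallucination-prone "Evil LLM".
    alpha:       assumed contrast strength; 0 reduces to greedy decoding.
    beta:        assumed plausibility threshold; tokens whose positive-model
                 probability is below beta * (max probability) are skipped.
    """
    pos_lp = log_softmax(pos_logits)
    evil_lp = log_softmax(evil_logits)
    cutoff = max(pos_lp) + math.log(beta)

    best_idx, best_score = None, -math.inf
    for i, (lp, le) in enumerate(zip(pos_lp, evil_lp)):
        if lp < cutoff:
            continue  # implausible under the positive model, never selected
        # Reward tokens the positive model likes and the Evil LLM dislikes.
        score = (1 + alpha) * lp - alpha * le
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx


if __name__ == "__main__":
    # Toy vocabulary of three tokens; the Evil LLM strongly prefers token 0.
    pos = [2.0, 1.9, -5.0]
    evil = [3.0, 0.0, 0.0]
    print("greedy:", contrastive_next_token(pos, evil, alpha=0.0))
    print("contrastive:", contrastive_next_token(pos, evil, alpha=1.0))
```

In this toy setup the positive model barely prefers token 0, but the Evil LLM prefers it strongly, so the contrastive score shifts the choice to token 1. The plausibility cutoff keeps a token that the positive model considers very unlikely from winning merely because the Evil LLM dislikes it even more.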
Taxonomy
Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Mental Health via Writing
