DHI: Leveraging Diverse Hallucination Induction for Enhanced Contrastive Factuality Control in Large Language Models

Jiani Guo; Xiangke Zeng; Jie Wu; Zuchao Li

arXiv:2601.01156 · cs.CL · January 6, 2026

TL;DR

This paper introduces DHI, a training framework that teaches an "Evil LLM" to produce a wider range of hallucination types without pre-annotated hallucination data. Contrasting a reliable positive model against this Evil LLM during decoding then reduces factual errors in the generated text.

Contribution

DHI enables the Evil LLM to generate diverse hallucinations through a novel loss function and attention masking, advancing contrastive factuality control without relying on pre-labeled hallucination datasets.

Findings

01. DHI significantly outperforms existing contrastive decoding methods on multiple benchmarks.

02. The approach effectively increases hallucination diversity while maintaining factual accuracy.

03. Empirical results demonstrate improved factuality and robustness in LLM outputs.

Abstract

Large language models (LLMs) frequently produce inaccurate or fabricated information, known as "hallucinations," which compromises their reliability. Existing approaches often train an "Evil LLM" to deliberately generate hallucinations on curated datasets, using these induced hallucinations to guide contrastive decoding against a reliable "positive model" for hallucination mitigation. However, this strategy is limited by the narrow diversity of hallucinations induced, as Evil LLMs trained on specific error types tend to reproduce only these particular patterns, thereby restricting their overall effectiveness. To address these limitations, we propose DHI (Diverse Hallucination Induction), a novel training framework that enables the Evil LLM to generate a broader range of hallucination types without relying on pre-annotated hallucination data. DHI employs a modified loss function that…
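
The contrastive decoding step mentioned in the abstract can be illustrated with a minimal sketch. The scoring rule below, (1 + alpha) * log p_positive - alpha * log p_evil with a plausibility cutoff, is a formulation commonly seen in the contrastive decoding literature and is an assumption here, not something the abstract confirms as DHI's rule. The function names and the parameters `alpha` and `beta` are hypothetical, and the logits are toy values rather than model outputs.

```python
import math


def log_softmax(logits):
    """Convert raw logits to log-probabilities in a numerically stable way."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def contrastive_scores(positive_logits, evil_logits, alpha=1.0, beta=0.1):
    """Score each token by contrasting the positive model with the Evil LLM.

    Assumed rule (not taken from the paper):
        score = (1 + alpha) * log p_positive - alpha * log p_evil
    Tokens whose positive-model probability is below beta times the
    positive model's top probability are masked out with -inf.
    """
    pos = log_softmax(positive_logits)
    evil = log_softmax(evil_logits)
    cutoff = max(pos) + math.log(beta)  # plausibility threshold in log space
    scores = []
    for p, e in zip(pos, evil):
        if p >= cutoff:
            scores.append((1 + alpha) * p - alpha * e)
        else:
            scores.append(float("-inf"))
    return scores


def pick_token(positive_logits, evil_logits, alpha=1.0, beta=0.1):
    """Greedy choice: index of the highest contrastive score."""
    scores = contrastive_scores(positive_logits, evil_logits, alpha, beta)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy three-token vocabulary: token 0 is favoured by both models (the Evil
# LLM strongly prefers it), token 1 is plausible but disliked by the Evil
# LLM, and token 2 is implausible under the positive model.
positive_logits = [2.0, 1.8, -3.0]
evil_logits = [3.0, 0.0, 0.0]

greedy = max(range(3), key=positive_logits.__getitem__)
chosen = pick_token(positive_logits, evil_logits)
print(greedy, chosen)
```

In this toy case the positive model alone would pick token 0, while the contrastive score shifts the choice to token 1 because the Evil LLM assigns token 0 a high probability. The cutoff keeps token 2 from being selected merely because the Evil LLM dislikes it. The abstract's point about diversity fits this picture: the contrast can only penalise error patterns that the Evil LLM actually reproduces.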

Taxonomy

Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Mental Health via Writing