Teaching with Lies: Curriculum DPO on Synthetic Negatives for Hallucination Detection

Shrey Pandit; Ashwin Vinod; Liu Leqi; Ying Ding

arXiv:2505.17558 · cs.CL · May 26, 2025

TL;DR

This paper introduces HaluCheck, models trained with curriculum DPO on synthetic hallucinated negatives, which improves LLM hallucination detection and robustness across benchmarks.

Contribution

It presents a novel curriculum DPO method with synthetic negatives, improving hallucination detection in LLMs beyond existing techniques.

Findings

1. Up to 24% improvement on challenging benchmarks.
2. Enhanced zero-shot robustness over larger models.
3. Effective use of synthetic hallucination negatives.

Abstract

Aligning large language models (LLMs) to accurately detect hallucinations remains a significant challenge due to the sophisticated nature of hallucinated text. Recognizing that hallucinated samples typically exhibit higher deceptive quality than traditional negative samples, we use these carefully engineered hallucinations as negative examples in the DPO alignment procedure. Our method incorporates a curriculum learning strategy, gradually transitioning the training from easier samples, identified based on the greatest reduction in probability scores from independent fact-checking models, to progressively harder ones. This structured difficulty scaling ensures stable and incremental learning. Experimental evaluation demonstrates that our HaluCheck models, trained with a curriculum DPO approach and high-quality negative samples, significantly improve model performance across various…
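The curriculum described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `PreferencePair` fields, the way the score reduction is computed (grounded score minus hallucinated score), the equal-sized stage split, and the `beta` default are all hypothetical choices made for the example. The loss function is the standard per-pair DPO objective written over sequence log-probabilities.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """One DPO training pair (hypothetical structure for this sketch)."""
    prompt: str
    chosen: str            # grounded response
    rejected: str          # synthetic hallucinated response
    score_chosen: float    # fact-checker probability for the grounded response
    score_rejected: float  # fact-checker probability for the hallucination


def score_drop(pair: PreferencePair) -> float:
    # Assumption: "reduction in probability score" is measured as the gap
    # between the grounded and the hallucinated response.
    return pair.score_chosen - pair.score_rejected


def curriculum_stages(pairs, num_stages):
    """Order pairs from easy (largest drop) to hard and split into stages."""
    if not pairs:
        return []
    ordered = sorted(pairs, key=score_drop, reverse=True)
    stage_size = math.ceil(len(ordered) / num_stages)
    return [ordered[i:i + stage_size] for i in range(0, len(ordered), stage_size)]


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one pair: -log(sigmoid(margin))."""
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # Numerically stable softplus(-margin), equal to -log(sigmoid(margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Toy data: three pairs with different fact-checker score drops.
pairs = [
    PreferencePair("q1", "a1", "h1", 0.95, 0.05),  # large drop: easy
    PreferencePair("q2", "a2", "h2", 0.60, 0.50),  # small drop: hard
    PreferencePair("q3", "a3", "h3", 0.80, 0.30),  # medium drop
]
stages = curriculum_stages(pairs, num_stages=3)
```

Training would then iterate over `stages` in order, so the policy sees the most easily detected hallucinations first and the most deceptive ones last. When the policy and the reference model assign identical log-probabilities, the margin is zero and the per-pair loss equals log 2.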

Taxonomy

Methods: Direct Preference Optimization