Controlled Automatic Task-Specific Synthetic Data Generation for Hallucination Detection

Yong Xie; Karan Aggarwal; Aitzaz Ahmad; Stephen Lau

arXiv:2410.12278 · cs.CV · January 12, 2026

Open Access

TL;DR

This paper introduces a two-step synthetic data generation method for hallucination detection, improving detector robustness and generalization across tasks and generators.

Contribution

It proposes a novel task-specific synthetic data generation pipeline with pattern guidance and style alignment, enhancing hallucination detector training.

Findings

01. Synthetic datasets improve hallucination detector accuracy.

02. Detectors trained on synthetic data outperform in-context learning (ICL)-based detectors by 32%.

03. The method generalizes well across different tasks and generators.

Abstract

We present a novel approach to automatically generate non-trivial task-specific synthetic datasets for hallucination detection. Our approach features a two-step generation-selection pipeline that uses hallucination pattern guidance and language style alignment during generation. Hallucination pattern guidance leverages the most important task-specific hallucination patterns, while language style alignment aligns the style of the synthetic dataset with benchmark text. To obtain robust supervised detectors from synthetic datasets, we also adopt a data mixture strategy to improve performance robustness and generalization. Our results on three datasets show that our generated hallucination text is more closely aligned with non-hallucinated text than that of baselines, and that it trains hallucination detectors with better generalization. Our hallucination detectors trained on synthetic datasets outperform…
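
The generation-selection idea and the data mixture described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the pattern names, the `toy_generator` stand-in for an LLM call, the Jaccard token-overlap selection score, and the choice to mix data across several generators are all hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical pattern names; the paper's task-specific patterns are not listed here.
PATTERNS = ["wrong_entity", "unsupported_detail"]

Generator = Callable[[str, str, int], str]


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; an assumed stand-in for the real selection score."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def generate_candidates(reference: str, generator: Generator,
                        patterns: List[str], n_per_pattern: int = 2) -> List[str]:
    """Step 1 (generation): request hallucinated rewrites, guided by one pattern at a time."""
    return [generator(reference, p, i) for p in patterns for i in range(n_per_pattern)]


def select_candidate(reference: str, candidates: List[str]) -> str:
    """Step 2 (selection): keep the candidate closest to the reference, excluding exact copies."""
    distinct = [c for c in candidates if c != reference]
    if not distinct:
        raise ValueError("no usable candidate")
    return max(distinct, key=lambda c: jaccard(reference, c))


def build_mixture(references: List[str], generators: Dict[str, Generator],
                  patterns: List[str], seed: int = 0) -> List[Tuple[str, int]]:
    """Mix examples from several generators; label 1 = hallucinated, 0 = faithful."""
    data = [(r, 0) for r in references]
    for gen in generators.values():
        for r in references:
            candidates = generate_candidates(r, gen, patterns)
            data.append((select_candidate(r, candidates), 1))
    random.Random(seed).shuffle(data)
    return data


def toy_generator(suffix: str) -> Generator:
    """Toy stand-in for an LLM call: appends a made-up detail to the reference."""
    def gen(reference: str, pattern: str, i: int) -> str:
        return f"{reference} {suffix} {pattern} {i}"
    return gen
```

The resulting list of `(text, label)` pairs could then be fed to any supervised classifier. Selecting the candidate most similar to the faithful text is one plausible way to keep examples non-trivial; the actual selection criterion may differ.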

Taxonomy

Topics: Mental Health Research Topics · Functional Brain Connectivity Studies · Anomaly Detection Techniques and Applications