Tone Matters: The Impact of Linguistic Tone on Hallucination in VLMs
Weihao Hong, Zhiyuan Jiang, Bingyu Shen, Xinlei Guan, Yangyi Feng, Meng Xu, Boyang Li

TL;DR
This paper examines how different prompt styles influence hallucination behaviors in vision-language models, revealing that increased prompt coercion does not always lead to more hallucinations and highlighting model-specific limitations.
Contribution
Introduces Ghost-100, a synthetic dataset for controlled analysis of hallucinations, and a structured framework to evaluate prompt pressure effects on VLMs.
Findings
Hallucination rates vary non-monotonically with prompt intensity.
Models are more sensitive to semantic hostility than to structural coercion.
Current safety measures are more effective against semantic threats than against structural coercion.
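To make the first finding concrete, here is a minimal sketch of how a non-monotonic trend could be checked once each model response has been labeled as hallucinated or not. It is not the paper's evaluation code: the integer intensity levels, the record format, and the helper names `hallucination_rate_by_level` and `is_monotonic` are assumptions made for illustration, and the records are invented toy data rather than results from Ghost-100.

```python
from collections import defaultdict


def hallucination_rate_by_level(records):
    """Return {intensity_level: hallucination rate}.

    `records` is an iterable of (intensity_level, hallucinated) pairs.
    This format is an assumption for the sketch, not the paper's schema.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for level, hallucinated in records:
        totals[level] += 1
        hits[level] += int(hallucinated)
    return {level: hits[level] / totals[level] for level in sorted(totals)}


def is_monotonic(values):
    """True if the sequence never decreases or never increases."""
    pairs = list(zip(values, values[1:]))
    non_decreasing = all(a <= b for a, b in pairs)
    non_increasing = all(a >= b for a, b in pairs)
    return non_decreasing or non_increasing


# Invented toy labels: (prompt intensity level, did the model hallucinate?)
records = [
    (1, False), (1, False),
    (2, True), (2, False),
    (3, True), (3, True),
    (4, True), (4, False),
]

rates = hallucination_rate_by_level(records)
for level, rate in rates.items():
    print(f"level {level}: {rate:.2f}")
print("monotonic:", is_monotonic(list(rates.values())))
```

In this toy data the rate rises through level 3 and then falls at level 4, so the check reports a non-monotonic trend, which is the shape of behavior the finding describes.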
Abstract
Vision-Language Models (VLMs) are increasingly used in safety-critical applications that require reliable visual grounding. However, these models often hallucinate details that are not present in the image to satisfy user prompts. While recent datasets and benchmarks have been introduced to evaluate systematic hallucinations in VLMs, many hallucination behaviors remain insufficiently characterized. In particular, prior work primarily focuses on object presence or absence, leaving it unclear how prompt phrasing and structural constraints can systematically induce hallucinations. In this paper, we investigate how different forms of prompt pressure influence hallucination behavior. We introduce Ghost-100, a procedurally generated dataset of synthetic scenes in which key visual details are deliberately removed, enabling controlled analysis of absence-based hallucinations. Using a structured…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Face Recognition and Perception · Multimodal Machine Learning Applications
