VOPE: Revisiting Hallucination of Vision-Language Models in Voluntary Imagination Task
Xingming Long, Jie Zhang, Shiguang Shan, Xilin Chen

TL;DR
This paper introduces VOPE, a new evaluation method for assessing hallucinations in vision-language models during voluntary imagination tasks, revealing high hallucination rates and limited mitigation effectiveness.
Contribution
The paper proposes VOPE, a novel presence evaluation method specifically designed for voluntary imagination tasks in LVLMs, highlighting the need for new hallucination mitigation strategies.
Findings
Most LVLMs hallucinate heavily during voluntary imagination.
Performance in presence evaluation is poor on imagined objects.
Existing mitigation methods have limited effect in these tasks.
Abstract
Most research on hallucinations in Large Vision-Language Models (LVLMs) focuses on factual description tasks that prohibit any output absent from the image. However, little attention has been paid to hallucinations in voluntary imagination tasks, e.g., story writing, where the models are expected to generate novel content beyond the given image. In these tasks, it is inappropriate to simply regard such imagined novel content as hallucinations. To address this limitation, we introduce Voluntary-imagined Object Presence Evaluation (VOPE), a novel method to assess LVLMs' hallucinations in voluntary imagination tasks via presence evaluation. Specifically, VOPE poses recheck-based questions to evaluate how an LVLM interprets the presence of the imagined objects in its own response. The consistency between the model's interpretation and the object's presence in the image is then used to…
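The presence-evaluation idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract is truncated before it states how consistency is scored, so the simple match rate below is an assumption, and the names RecheckResult and consistency_rate are hypothetical. The sketch assumes that, for each object mentioned in a model's response, we already know whether the object is in the image and how the model answered a recheck question about its presence.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecheckResult:
    """One object from the model's own response (hypothetical structure)."""
    obj: str                  # object mentioned in the generated text
    in_image: bool            # ground truth: is the object actually in the image?
    model_says_present: bool  # model's answer to the recheck question


def consistency_rate(results: List[RecheckResult]) -> float:
    """Fraction of objects whose recheck answer matches the ground truth.

    Assumed scoring rule for illustration only; returns 0.0 for an empty list.
    """
    if not results:
        return 0.0
    matches = sum(r.in_image == r.model_says_present for r in results)
    return matches / len(results)


# Toy data, invented purely for illustration.
results = [
    RecheckResult("dog", in_image=True, model_says_present=True),
    RecheckResult("dragon", in_image=False, model_says_present=False),  # imagined and recognised as such
    RecheckResult("castle", in_image=False, model_says_present=True),   # imagined but claimed to be in the image
]
print(consistency_rate(results))
```

In this toy example the third object is the interesting case: the model introduced it as imagined content and then, on recheck, claimed it was present in the image. That kind of inconsistency is what the abstract describes as the signal of interest.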
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Hallucinations in medical conditions · Face Recognition and Perception
