Watch Closely: Mitigating Object Hallucinations in Large Vision-Language Models with Disentangled Decoding
Ruiqi Ma, Yu Yan, Chunhong Zhang, Minghao Yin, XinChao Liu, Zhihong Jin, Zheng Hu

TL;DR
This paper introduces a novel decoding method called Hallucination Disentangled Decoding (HDD) that reduces object hallucinations in large vision-language models by disentangling visual and linguistic hallucinations without requiring additional training.
Contribution
The paper presents HDD, a training-free approach that improves object recognition in LVLMs by disentangling visual and language hallucinations through image segmentation and augmentation.
Findings
Reduces object hallucinations in LVLMs
Enhances visual recognition performance
Requires no additional training
Abstract
Large Vision-Language Models (LVLMs) bridge the gap between visual and linguistic modalities, demonstrating strong potential across a variety of domains. However, despite significant progress, LVLMs still suffer from severe hallucination issues in object recognition tasks. These models often fail to accurately identify certain objects, leading to text generation that appears fluent but does not correspond to the visual content, which can have serious consequences in real-world applications. Recently, several methods have been proposed to alleviate LVLM hallucinations, but most focus solely on reducing hallucinations in the language modality. To mitigate hallucinations in both the language and visual modalities, we introduce the Hallucination Disentangled Decoding (HDD) method, which requires no training. HDD enhances the original image by segmenting it and selecting images that augment the…
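The abstract is cut off before it describes the decoding rule, so the sketch below is not the paper's formula. It is a generic illustration of how a training-free decoding step might combine next-token scores from several views of the same input. The function name `combine_logits`, the weights `alpha` and `beta`, and the use of a text-only pass to estimate the language prior are all assumptions made for illustration.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over a 1-D array of logits."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - np.max(logits)
    return shifted - np.log(np.sum(np.exp(shifted)))


def combine_logits(full_image, segment_views, text_only, alpha=1.0, beta=0.5):
    """Hypothetical combination rule (an assumption, not taken from the paper).

    full_image:    next-token logits conditioned on the original image
    segment_views: list of logit arrays, one per selected image segment
    text_only:     logits with no usable visual input (language-prior estimate)
    alpha:         how strongly the language prior is subtracted
    beta:          how much weight the segment views receive
    """
    full = log_softmax(full_image)
    prior = log_softmax(text_only)
    if len(segment_views) > 0:
        # Average the per-segment log-probabilities.
        segments = np.mean([log_softmax(s) for s in segment_views], axis=0)
    else:
        # With no segments, fall back to the original image alone.
        segments = full
    # Visual evidence: blend of the original image and its segments.
    visual = (1.0 - beta) * full + beta * segments
    # Contrastive step: boost tokens supported by the image, penalize
    # tokens that are likely from the text context alone.
    return (1.0 + alpha) * visual - alpha * prior
```

In this sketch, setting `alpha=0` and passing no segments reduces to ordinary decoding on the original image, which makes the two adjustments easy to ablate separately.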
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis
