Watch Closely: Mitigating Object Hallucinations in Large Vision-Language Models with Disentangled Decoding

Ruiqi Ma; Yu Yan; Chunhong Zhang; Minghao Yin; XinChao Liu; Zhihong Jin; Zheng Hu

arXiv:2512.19070·cs.CV·December 23, 2025

Open Access

TL;DR

This paper introduces a novel decoding method called Hallucination Disentangled Decoding (HDD) that reduces object hallucinations in large vision-language models by disentangling visual and linguistic hallucinations without requiring additional training.

Contribution

The paper presents HDD, a training-free approach that improves object recognition in LVLMs by disentangling visual and language hallucinations through image segmentation and augmentation.

Findings

01

Reduces object hallucinations in LVLMs

02

Enhances visual recognition performance

03

Requires no additional training

Abstract

Large Vision-Language Models (LVLMs) bridge the gap between visual and linguistic modalities and show strong potential across a variety of domains. However, despite significant progress, LVLMs still suffer from severe hallucinations in object recognition tasks. These models often fail to identify certain objects accurately, producing text that reads fluently but does not correspond to the visual content, which can have serious consequences in real-world applications. Several methods have recently been proposed to alleviate LVLM hallucinations, but most focus solely on reducing hallucinations in the language modality. To mitigate hallucinations in both the language and visual modalities, we introduce Hallucination Disentangled Decoding (HDD), a method that requires no training. HDD enhances the original image by segmenting it and selecting images that augment the…
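The truncated abstract does not give HDD's scoring rule, so the sketch below does not reproduce it. For orientation only, it shows a generic member of the same family (training-free, applied at decoding time): contrastive decoding in the style of Visual Contrastive Decoding (VCD). VCD scores each candidate token as (1 + α)·logit(real image) − α·logit(distorted image) and keeps only tokens that are plausible under the real-image pass. Everything here is an illustrative assumption: the function names, the greedy token choice (VCD itself samples), the α and β defaults, and the toy logits.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a plain list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_next_token(
    logits_clean: Sequence[float],
    logits_contrast: Sequence[float],
    alpha: float = 1.0,
    beta: float = 0.1,
) -> int:
    """Greedily pick the next token with VCD-style contrastive decoding.

    Illustrative sketch only, NOT the HDD method.
    logits_clean:    next-token logits given the real image and prompt.
    logits_contrast: logits given a distorted image and the same prompt,
                     which lean more heavily on the language prior.
    alpha:           contrast strength (alpha = 0 recovers plain greedy decoding).
    beta:            plausibility cutoff relative to the top clean probability.
    """
    probs_clean = softmax(logits_clean)
    cutoff = beta * max(probs_clean)
    best_token, best_score = -1, -math.inf
    for token, (lc, lx) in enumerate(zip(logits_clean, logits_contrast)):
        if probs_clean[token] < cutoff:
            continue  # adaptive plausibility: skip tokens the clean pass finds unlikely
        score = (1 + alpha) * lc - alpha * lx
        if score > best_score:
            best_token, best_score = token, score
    return best_token


if __name__ == "__main__":
    # Toy vocabulary: 0 = "cat", 1 = "dog", 2 = "table" (hypothetical values).
    clean = [2.0, 1.9, -3.0]     # real image: "cat" and "dog" nearly tied
    contrast = [2.5, 0.5, -3.0]  # distorted image: prior strongly favors "cat"
    print(contrastive_next_token(clean, contrast))             # → 1
    print(contrastive_next_token(clean, contrast, alpha=0.0))  # → 0
```

In this toy case, plain greedy decoding picks token 0, which the distorted-image pass also favors. Contrasting the two passes shifts the choice to token 1, whose score depends more on the real image. The plausibility cutoff prevents very unlikely tokens from winning merely because the contrast pass scores them even lower.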

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Digital Media Forensic Detection · Generative Adversarial Networks and Image Synthesis