Energy-Guided Decoding for Object Hallucination Mitigation

Xixi Liu; Ailin Deng; Christopher Zach

arXiv:2507.07731·cs.CV·July 11, 2025

Open Access

TL;DR

This paper introduces an energy-based decoding method for large vision-language models that effectively reduces object hallucination bias and improves accuracy in visual question answering tasks.

Contribution

The paper proposes a simple energy-based decoding approach that dynamically selects hidden states to mitigate hallucination bias in VLMs, outperforming existing methods.

Findings

01 Reduces yes-ratio bias by 8.81% on average

02 Improves accuracy by 4.82% over greedy decoding

03 Enhances performance across three VQA benchmarks (POPE, MME, and MMVP)

Abstract

Mitigating object hallucination in large vision-language models (LVLMs) is critical to their safe deployment. Existing methods are either restricted to specific decoding methods, demand sophisticated modifications to visual inputs, or rely on knowledge from external models. In this work, we first reveal the phenomenon that VLMs exhibit significant imbalance in the "Yes" ratio (i.e., the fraction of "Yes" answers among the total number of questions) across three different visual question answering (VQA) datasets. Furthermore, we propose an energy-based decoding method, which dynamically selects the hidden states from the layer with minimal energy score. It is simple yet effective in reducing the bias for the yes ratio while boosting performance across three benchmarks (POPE, MME, and MMVP). Our method consistently improves accuracy and F1 score on three VQA datasets across three…
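
The two quantities named in the abstract, the yes ratio and the minimal-energy layer selection, can be illustrated with a small sketch. This is not the authors' implementation. It assumes the energy score is the negative log-sum-exp of a layer's logits (the usual definition in energy-based models), and it assumes each layer's hidden state has already been projected to vocabulary logits. The function names and the toy numbers are hypothetical.

```python
import math


def energy(logits):
    """Energy score of one logit vector.

    Assumption: energy = -logsumexp(logits); the abstract does not
    spell out the formula.
    """
    m = max(logits)  # subtract the max for numerical stability
    return -(m + math.log(sum(math.exp(z - m) for z in logits)))


def select_min_energy_layer(per_layer_logits):
    """Return (layer_index, logits) for the layer with minimal energy.

    `per_layer_logits` is a hypothetical input: one logit vector per
    candidate layer for the current decoding step.
    """
    energies = [energy(logits) for logits in per_layer_logits]
    best = min(range(len(energies)), key=energies.__getitem__)
    return best, per_layer_logits[best]


def yes_ratio(answers):
    """Fraction of answers that begin with 'yes' among all questions."""
    if not answers:
        raise ValueError("answers must be non-empty")
    yes_count = sum(a.strip().lower().startswith("yes") for a in answers)
    return yes_count / len(answers)


# Toy example: three candidate layers over a three-token vocabulary.
toy_logits = [
    [0.1, 0.2, 0.0],
    [2.0, -1.0, 0.5],
    [5.0, 0.0, -2.0],
]
layer, chosen = select_min_energy_layer(toy_logits)
next_token = max(range(len(chosen)), key=chosen.__getitem__)
```

In this toy input the third layer has the largest log-sum-exp and therefore the lowest energy, so its logits are the ones used to pick the next token. On a balanced yes/no benchmark, a `yes_ratio` far from 0.5 would indicate the kind of bias the abstract describes.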

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning