HELPD: Mitigating Hallucination of LVLMs by Hierarchical Feedback Learning with Vision-enhanced Penalty Decoding
Fan Yuan, Chi Qin, Xiaogang Xu, Piji Li

TL;DR
The paper introduces HELPD, a hierarchical feedback learning framework with vision-enhanced penalty decoding, significantly reducing hallucinations in large vision-language models while improving output quality.
Contribution
It presents a novel hierarchical feedback mechanism and penalty decoding method that effectively mitigates multimodal hallucination in LVLMs with minimal training.
Findings
Reduces hallucination by over 15% across benchmarks.
Improves text generation quality in LVLMs.
Seamlessly integrates with existing LVLMs.
Abstract
Large Vision-Language Models (LVLMs) have shown remarkable performance on many visual-language tasks. However, these models still suffer from multimodal hallucination, that is, the generation of objects or content that contradicts the image. Many existing works detect hallucination by directly judging whether an object exists in an image, overlooking the association between the object and the semantics. To address this issue, we propose Hierarchical Feedback Learning with Vision-enhanced Penalty Decoding (HELPD). This framework incorporates hallucination feedback at both the object level and the sentence-semantic level. Remarkably, even with a marginal degree of training, this approach can alleviate over 15% of hallucination. In addition, HELPD penalizes the output logits according to the image attention window, so that decoding is not overly influenced by the text generated so far. HELPD can be seamlessly integrated with…
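The abstract's idea of penalizing output scores according to an image attention window can be illustrated with a small sketch. This is not the paper's formula, which the truncated abstract does not give. It assumes a hypothetical penalty equal to one minus the average share of attention that the most recent generated tokens place on image positions, scaled by a hypothetical weight `alpha`. The function names `vision_penalty` and `penalize_scores` are illustrative only.

```python
from typing import List, Sequence


def vision_penalty(attn_window: Sequence[Sequence[float]],
                   image_mask: Sequence[bool]) -> float:
    """Return a penalty in [0, 1] that grows as attention leaves the image.

    attn_window: attention rows for the last few generated tokens,
                 one row per token, one weight per context position.
    image_mask:  True where a context position holds an image token.
    NOTE: this definition is an assumption made for illustration.
    """
    if not attn_window:
        raise ValueError("attn_window must contain at least one row")
    shares: List[float] = []
    for row in attn_window:
        total = sum(row)
        on_image = sum(a for a, is_img in zip(row, image_mask) if is_img)
        # Share of this token's attention mass that lands on image tokens.
        shares.append(on_image / total if total > 0 else 0.0)
    return 1.0 - sum(shares) / len(shares)


def penalize_scores(scores: Sequence[float],
                    penalties: Sequence[float],
                    alpha: float = 1.0) -> List[float]:
    """Subtract the scaled penalty from each candidate's score."""
    return [s - alpha * p for s, p in zip(scores, penalties)]


# Two candidate continuations over a context of 2 image + 2 text positions.
mask = [True, True, False, False]
grounded = [[0.6, 0.2, 0.1, 0.1], [0.5, 0.3, 0.1, 0.1]]  # attends to image
drifting = [[0.1, 0.1, 0.4, 0.4], [0.0, 0.2, 0.4, 0.4]]  # attends to text

penalties = [vision_penalty(grounded, mask), vision_penalty(drifting, mask)]
adjusted = penalize_scores([-1.0, -0.9], penalties, alpha=1.0)
best = max(range(len(adjusted)), key=adjusted.__getitem__)
```

In this toy setup the second candidate starts with the higher raw score, but it relies mostly on previously generated text, so the penalty moves the preference to the first, image-grounded candidate.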
Taxonomy
Topics: Hallucinations in medical conditions · Schizophrenia research and treatment · Functional Brain Connectivity Studies
Methods: Softmax · Attention Is All You Need
