HELPD: Mitigating Hallucination of LVLMs by Hierarchical Feedback Learning with Vision-enhanced Penalty Decoding

Fan Yuan; Chi Qin; Xiaogang Xu; Piji Li

arXiv:2409.20429 · cs.CL · October 1, 2024

TL;DR

The paper introduces HELPD, a hierarchical feedback learning framework with vision-enhanced penalty decoding that reduces hallucination in large vision-language models (LVLMs) by over 15% while also improving output quality.

Contribution

It presents a novel hierarchical feedback mechanism and penalty decoding method that effectively mitigates multimodal hallucination in LVLMs with minimal training.
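As the abstract describes, hallucination feedback is given at two levels: whether each mentioned object is actually in the image, and whether the sentence as a whole is semantically faithful. The hypothetical Python sketch below shows one way two such signals could be computed and combined into a single score. It is not the HELPD implementation. The object check is a comparison against an annotated object list in the spirit of the CHAIR metric. The sentence check is a crude bag-of-words cosine similarity standing in for a real semantic scorer. The names `object_feedback`, `sentence_feedback`, `hierarchical_feedback` and the weights `w_obj`/`w_sent` are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Iterable


def object_feedback(mentioned_objects: Iterable[str], image_objects: Iterable[str]) -> float:
    """Object-level signal: fraction of mentioned objects that really appear in the image.

    Returns 1.0 when nothing is mentioned, since no object can then be hallucinated.
    """
    mentioned = set(mentioned_objects)
    if not mentioned:
        return 1.0
    grounded = mentioned & set(image_objects)
    return len(grounded) / len(mentioned)


def sentence_feedback(generated: str, reference: str) -> float:
    """Sentence-level signal: bag-of-words cosine similarity to a reference caption.

    A crude stand-in for a learned semantic-similarity model (an assumption,
    not the paper's scorer).
    """
    a = Counter(generated.lower().split())
    b = Counter(reference.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hierarchical_feedback(mentioned_objects, image_objects, generated, reference,
                          w_obj: float = 0.5, w_sent: float = 0.5) -> float:
    """Hypothetical weighted combination of both levels (higher = less hallucination)."""
    return (w_obj * object_feedback(mentioned_objects, image_objects)
            + w_sent * sentence_feedback(generated, reference))
```

For example, `hierarchical_feedback(["dog", "frisbee", "car"], ["dog", "frisbee", "grass"], "a dog catches a frisbee near a car", "a dog catches a frisbee on the grass")` gives the object level a score of 2/3, because "car" is not in the image's object list. In a real system, the sentence-level scorer would be a learned semantic model rather than word overlap.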

Findings

01. Reduces hallucination by over 15% across benchmarks.
02. Improves text generation quality in LVLMs.
03. Seamlessly integrates with existing LVLMs.

Abstract

Large Vision-Language Models (LVLMs) have shown remarkable performance on many visual-language tasks. However, these models still suffer from multimodal hallucination, that is, the generation of objects or content that contradicts the image. Many existing works detect hallucination by directly judging whether an object exists in an image, overlooking the association between objects and semantics. To address this issue, we propose Hierarchical Feedback Learning with Vision-enhanced Penalty Decoding (HELPD). This framework incorporates hallucination feedback at both the object level and the sentence-semantic level. Remarkably, even with only a marginal amount of training, this approach can alleviate hallucination by over 15%. Simultaneously, HELPD penalizes the output logits according to the image attention window so that decoding is not overly affected by the generated text. HELPD can be seamlessly integrated with…
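The abstract states that HELPD penalizes output logits according to an image attention window, so that decoding is not dominated by previously generated text. The Python sketch below illustrates that general idea only. The exact penalty used in the paper and in the F-Yuan303/HELPD repository may differ.

The sketch rests on several assumptions:

- Each candidate (for example, a beam hypothesis) comes with attention weights over a context laid out as image tokens followed by generated text tokens.
- The penalty follows a simple rule: subtract `alpha * (1 - share)` from the candidate's logit.
- `share` is the attention mass on image tokens relative to the mass on image tokens plus the most recent `window` text tokens.

The functions `image_share`, `penalized_logits`, `softmax` and the parameters `alpha` and `window` are hypothetical.

```python
import math
from typing import List, Sequence


def image_share(attn: Sequence[float], num_image_tokens: int, window: int) -> float:
    """Fraction of attention mass on image tokens.

    `attn` covers the context laid out as [image tokens..., generated text tokens...].
    Only the last `window` text tokens are counted, mimicking a local attention
    window (a hypothetical simplification).
    """
    image_mass = sum(attn[:num_image_tokens])
    text = attn[num_image_tokens:]
    # Guard: text[-0:] would return the whole list, so handle window <= 0 explicitly.
    text_mass = sum(text[-window:]) if window > 0 else 0.0
    total = image_mass + text_mass
    return image_mass / total if total > 0 else 0.0


def penalized_logits(logits: Sequence[float], attns: Sequence[Sequence[float]],
                     num_image_tokens: int, window: int, alpha: float = 1.0) -> List[float]:
    """Lower each candidate's logit the less it attends to the image (hypothetical rule)."""
    return [
        z - alpha * (1.0 - image_share(a, num_image_tokens, window))
        for z, a in zip(logits, attns)
    ]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over candidate logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


if __name__ == "__main__":
    # Two candidates over a context of 2 image tokens followed by 2 text tokens.
    logits = [2.0, 1.8]
    attns = [
        [0.1, 0.1, 0.4, 0.4],  # candidate 0 leans on generated text
        [0.4, 0.4, 0.1, 0.1],  # candidate 1 leans on the image
    ]
    adjusted = penalized_logits(logits, attns, num_image_tokens=2, window=2)
    print(adjusted, softmax(adjusted))
```

With these numbers, candidate 0 starts ahead (2.0 vs 1.8). However, only 20% of its windowed attention is on the image, so it is penalized by 0.8 and ends at 1.2. Candidate 1 is penalized by only 0.2 and ends at 1.6, so the image-grounded candidate wins. A larger `alpha` strengthens the visual bias, and `window` controls how many recent text tokens count against the image share.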

Code & Models

Repositories

F-Yuan303/HELPD (PyTorch, official)

Videos

HELPD: Mitigating Hallucination of LVLMs by Hierarchical Feedback Learning with Vision-enhanced Penalty Decoding (Underline)

Taxonomy

Topics: Hallucinations in medical conditions · Schizophrenia research and treatment · Functional Brain Connectivity Studies

Methods: Softmax · Attention Is All You Need