NoiseBoost: Alleviating Hallucination with Noise Perturbation for Multimodal Large Language Models

Kai Wu; Boyuan Jiang; Zhengkai Jiang; Qingdong He; Donghao Luo; Shengzhi Wang; Qingwen Liu; Chengjie Wang

arXiv:2405.20081·cs.CV·June 3, 2024

TL;DR

NoiseBoost is a simple yet effective method that reduces hallucinations in multimodal large language models by using noise perturbations to balance attention between visual and linguistic information, improving accuracy and enabling semi-supervised learning.

Contribution

The paper introduces NoiseBoost, a novel noise perturbation technique that alleviates hallucinations in MLLMs and enables semi-supervised learning, with consistent performance improvements across training strategies.

Findings

01. Improves dense-caption accuracy by 8.1% under human evaluation.

02. Enables semi-supervised learning using unlabeled data.

03. Achieves comparable results with half the data.

Abstract

Multimodal large language models (MLLMs) contribute a powerful mechanism to understanding visual information building on large language models. However, MLLMs are notorious for suffering from hallucinations, especially when generating lengthy, detailed descriptions for images. Our analysis reveals that hallucinations stem from the inherent summarization mechanism of large language models, leading to excessive dependence on linguistic tokens while neglecting vision information. In this paper, we propose NoiseBoost, a broadly applicable and simple method for alleviating hallucinations for MLLMs through the integration of noise feature perturbations. Noise perturbation acts as a regularizer, facilitating a balanced distribution of attention weights among visual and linguistic tokens. Despite its simplicity, NoiseBoost consistently enhances the performance of MLLMs across common training…
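
The noise feature perturbation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the choice of zero-mean Gaussian noise, the `noise_scale` parameter, the function names, and the point at which noise is injected (on visual token features, just before they are concatenated with text embeddings) are all assumptions made for illustration. The idea shown is only that visual features are perturbed during training and left untouched at inference.

```python
import random


def perturb_visual_tokens(visual_tokens, noise_scale=0.1, rng=None):
    """Return a copy of the visual token features with zero-mean Gaussian
    noise added to every element. Gaussian noise and `noise_scale` are
    illustrative assumptions, not values taken from the paper."""
    rng = rng or random.Random()
    return [[x + rng.gauss(0.0, noise_scale) for x in token]
            for token in visual_tokens]


def build_input_sequence(visual_tokens, text_tokens,
                         noise_scale=0.1, training=True, rng=None):
    """Concatenate visual and text token features into one sequence.
    Noise is applied to the visual tokens only, and only in training mode
    (hypothetical placement of the perturbation)."""
    if training and noise_scale > 0:
        visual_tokens = perturb_visual_tokens(visual_tokens, noise_scale, rng)
    return list(visual_tokens) + list(text_tokens)


# Toy features: two visual tokens and one text token, three dimensions each.
visual = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]
text = [[0.1, 0.2, 0.3]]

train_seq = build_input_sequence(visual, text, noise_scale=0.1,
                                 rng=random.Random(0))
eval_seq = build_input_sequence(visual, text, training=False)
```

In this sketch the text tokens pass through unchanged, so any regularizing effect comes only from the perturbed visual features; a real model would apply the same idea to tensors inside its training loop.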

Code & Models

Repositories

KaiWU5/NoiseBoost (official)

Taxonomy

Topics: Big Data and Digital Economy · COVID-19 diagnosis using AI · Machine Learning in Healthcare