Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective

Zihao Yue; Liang Zhang; Qin Jin

arXiv:2402.14545 · cs.CL · May 30, 2024 · 1 citation

TL;DR

This paper investigates how large multimodal models decide when to stop generating content and proposes methods to reduce hallucinations by improving EOS decision-making and filtering training data.

Contribution

It introduces a new perspective on multimodal hallucinations focusing on EOS decision processes and offers two effective mitigation strategies without extra data.

Findings

1. Improved hallucination reduction using new training objectives
2. Effective data filtering strategy to prevent hallucinations
3. Models better assess visual content to decide generation termination

Abstract

Large Multimodal Models (LMMs) often suffer from multimodal hallucinations, wherein they may create content that is not present in the visual inputs. In this paper, we explore a new angle of this issue: overly detailed training data hinders the model's ability to timely terminate generation, leading to continued outputs beyond visual perception limits. By investigating how the model decides to terminate generation with EOS, the special end-of-sentence token, we find that the model assesses the completeness of the entire sequence by comparing the generated text with the image. This observation suggests that the model possesses an inherent potential of making proper EOS decisions based on its visual perception to avoid overly lengthy outputs. To take advantage of such potential, we explore two methods to mitigate multimodal hallucinations: a training objective that enables the model to…
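To make the notion of an "EOS decision" concrete, the following is a minimal sketch of how a decoder's tendency to stop can be read off a single decoding step: the softmax probability assigned to the EOS token, and whether greedy decoding would stop there. The function names, the four-token vocabulary, and the logit values are hypothetical illustrations, not taken from the paper or its repository, and this is not the authors' training objective or filtering method.

```python
import math

def eos_probability(logits, eos_id):
    """Softmax probability assigned to the EOS token at one decoding step."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[eos_id] / sum(exps)

def should_stop(logits, eos_id):
    """Under greedy decoding, generation ends when EOS is the argmax token."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return best == eos_id

# Toy step: 4-token vocabulary, index 3 assumed to be EOS (hypothetical values).
step_logits = [2.0, 0.5, 0.1, 1.0]
p_eos = eos_probability(step_logits, eos_id=3)
stop = should_stop(step_logits, eos_id=3)
```

Tracking a quantity like `p_eos` across decoding steps is one simple way to observe whether a model's inclination to terminate rises as the generated text covers more of the image.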

Code & Models

Repositories

yuezih/less-is-more (PyTorch, official)

Videos

Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective

Taxonomy

Topics: Hallucinations in medical conditions · Mental Health and Psychiatry