OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

Qidong Huang; Xiaoyi Dong; Pan Zhang; Bin Wang; Conghui He; Jiaqi Wang; Dahua Lin; Weiming Zhang; Nenghai Yu

arXiv:2311.17911 · cs.CV · March 13, 2024 · 5 cites

TL;DR

OPERA is a decoding strategy for multi-modal large language models that reduces hallucinations by penalizing over-trust in certain tokens and reallocating token choices based on retrospection, without extra training or data.

Contribution

It introduces a novel decoding method combining an over-trust penalty and retrospection-allocation to mitigate hallucinations in MLLMs without additional data or training.

Findings

01 Significantly reduces hallucinations across various MLLMs.

02 Effective without additional training or external knowledge.

03 Proven to be generalizable and efficient.

Abstract

Hallucination, a pervasive challenge for multi-modal large language models (MLLMs), has significantly impeded their real-world use in settings that demand precise judgment. Existing methods mitigate this issue either by training with specifically designed data or by performing inference with external knowledge from other sources, both of which incur additional costs. In this paper, we present OPERA, a novel MLLM decoding method grounded in an Over-trust Penalty and a Retrospection-Allocation strategy, serving as a nearly free lunch to alleviate the hallucination issue without additional data, knowledge, or training. Our approach begins with an interesting observation: most hallucinations are closely tied to the knowledge aggregation patterns manifested in the self-attention matrix, i.e., MLLMs tend to generate new tokens by focusing on a few summary tokens rather than on all the previous tokens. Such…
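The aggregation pattern described in the abstract can be illustrated with a toy scoring rule. The following is a minimal sketch and not the authors' implementation: the text above does not give the penalty formula, so the column-wise product, the `scale` factor, the weight `alpha`, and both function names are assumptions chosen for illustration. It scores how strongly recent tokens all attend to one earlier "summary" token, then subtracts that score from each candidate's log-probability. The retrospection-allocation step is not sketched.

```python
def overtrust_penalty(window, scale=2.0):
    """Score attention concentration in a local causal window.

    `window` is a k x k lower-triangular list of lists: window[i][j] is the
    attention weight that recent token i places on recent token j (j <= i).
    Assumption for illustration: the score of column j is the product of the
    scaled weights below the diagonal entry, and the penalty is the largest
    column score. A column that every later token attends to strongly
    yields a large value.
    """
    k = len(window)
    best = 0.0
    for j in range(k):
        column_score = 1.0
        for i in range(j, k):
            column_score *= scale * window[i][j]
        best = max(best, column_score)
    return best


def penalized_scores(log_probs, penalties, alpha=0.1):
    """Subtract a weighted penalty from each candidate's log-probability."""
    return [lp - alpha * pen for lp, pen in zip(log_probs, penalties)]


# Hypothetical attention windows for two candidate continuations.
concentrated = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.8, 0.1, 0.1],
]  # every row leans on column 0
diffuse = [
    [0.3, 0.0, 0.0],
    [0.2, 0.3, 0.0],
    [0.1, 0.2, 0.3],
]  # attention is spread out

penalties = [overtrust_penalty(concentrated), overtrust_penalty(diffuse)]
scores = penalized_scores([-1.0, -1.2], penalties)
best_candidate = max(range(len(scores)), key=scores.__getitem__)
print(penalties, scores, best_candidate)
```

In this toy setup the candidate with the higher raw log-probability is also the one whose attention collapses onto a single column, so the penalty moves the other candidate ahead of it. In a real decoder the window would be taken from the model's self-attention maps during beam search, and the hyperparameters would need tuning.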

Taxonomy

Topics: Big Data and Digital Economy · Machine Learning in Healthcare · Advanced Graph Neural Networks