Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding

Feilong Tang; Chengzhi Liu; Zhongxing Xu; Ming Hu; Zelin Peng; Zhiwei Yang; Jionglong Su; Minquan Lin; Yifan Peng; Xuelian Cheng; Imran Razzak; Zongyuan Ge

arXiv:2505.16652 · cs.CV · June 10, 2025

TL;DR

This paper introduces FarSight, a causal decoding strategy that reduces hallucinations in multimodal large language models by optimizing token interaction and attention propagation, improving accuracy in visual question answering.

Contribution

The paper proposes a novel plug-and-play causal mask-based decoding method, FarSight, to mitigate hallucinations by enhancing token interaction and attention control in MLLMs.

Findings

01. Significant reduction in hallucinations across multiple benchmarks.

02. Improved performance in both image and video question answering tasks.

03. Effective token propagation and attention management demonstrated.

Abstract

Recent advancements in multimodal large language models (MLLMs) have significantly improved performance in visual question answering. However, they often suffer from hallucinations. In this work, hallucinations are categorized into two main types: initial hallucinations and snowball hallucinations. We argue that adequate contextual information can be extracted directly from the token interaction process. Inspired by causal inference in the decoding strategy, we propose to leverage causal masks to establish information propagation between multimodal tokens. The hypothesis is that insufficient interaction between those tokens may lead the model to rely on outlier tokens, overlooking dense and rich contextual cues. Therefore, we propose to intervene in the propagation process by tackling outlier tokens to enhance in-context inference. With this goal, we present FarSight, a versatile…
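For readers unfamiliar with the causal mask the abstract builds on, the following is a minimal sketch of standard causal (lower-triangular) masked softmax attention in plain Python. It is a generic illustration only, not FarSight itself, whose details are not given on this page. The function names and the toy token vectors are hypothetical and chosen for illustration.

```python
import math

def causal_mask(n):
    """Lower-triangular mask: position i may attend to positions 0..i only."""
    return [[j <= i for j in range(n)] for i in range(n)]

def masked_softmax(scores, mask):
    """Row-wise softmax restricted to the positions the mask allows."""
    out = []
    for row, allowed in zip(scores, mask):
        kept = [s for s, a in zip(row, allowed) if a]
        peak = max(kept)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) if a else 0.0 for s, a in zip(row, allowed)]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def causal_attention(queries, keys):
    """Scaled dot-product attention weights under a causal mask."""
    d = len(queries[0])
    scores = [
        [sum(q * k for q, k in zip(qv, kv)) / math.sqrt(d) for kv in keys]
        for qv in queries
    ]
    return masked_softmax(scores, causal_mask(len(queries)))

# Toy sequence of three 2-dimensional token vectors (illustrative values only).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = causal_attention(tokens, tokens)

for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` sums to 1, and every entry above the diagonal is exactly zero, so a token never receives information from later positions. Under standard causal decoding this upper-triangular region carries no attention, which is the propagation structure the abstract proposes to intervene in.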

Taxonomy

Methods: Softmax · Attention Is All You Need · Causal inference