Causal Evidence for Attention Head Imbalance in Modality Conflict Hallucination

Jinrui Jiang; Zhangtai Wu; Zhen Wu; Xinyu Dai

arXiv:2605.19250 · cs.AI · May 20, 2026

TL;DR

This paper investigates the internal mechanisms behind modality-conflict hallucination in multimodal large language models (MLLMs). It identifies attention heads that drive or resist the failure and proposes a causal intervention to reduce it.

Contribution

The paper presents a head-level causal analysis of MLLMs that reveals attention heads with opposing roles in hallucination, and proposes MACI, a targeted inference-time intervention that reduces hallucination.

Findings

01. Attention heads fall into two groups with opposing causal roles in hallucination: driving and resisting.

02. Driving effects are spread across many heads and carry greater aggregate weight; resisting effects concentrate in a few high-importance heads.

03. MACI reduces hallucination with a favorable trade-off on multiple benchmarks.

Abstract

Modality-conflict hallucination occurs when multimodal large language models (MLLMs) prioritize erroneous textual premises over contradictory visual evidence. To understand why visual evidence fails to prevail during generation, we take a mechanistic perspective and examine which internal components drive or resist this failure. We perform head-level causal analysis using path patching across five open-source MLLMs and identify two groups of attention heads with opposing causal roles: hallucination-driving heads and hallucination-resisting heads. We find a consistent asymmetry: driving effects are more broadly distributed and carry greater aggregate weight, whereas resisting effects concentrate in a small number of high-importance heads. Ablation experiments further confirm that these groups exert opposing effects during generation: distributed driving influence and localized resistance…
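
To make the idea of head-level causal analysis concrete, here is a minimal toy sketch in Python. It is not the authors' method: it uses plain per-head activation patching (a simpler relative of path patching) on a hypothetical scalar model, and every name, weight, and input value below is invented for illustration rather than taken from the paper. Each head's activation from a conflict-free run is patched into the conflicting run, and the change in a hallucination score decides whether the head is labelled driving or resisting.

```python
import math

def head_outputs(inputs, head_weights):
    """Toy 'attention heads': each head emits one scalar from the input features."""
    return [sum(w * x for w, x in zip(ws, inputs)) for ws in head_weights]

def hallucination_score(head_outs, readout):
    """Toy metric: higher means the model leans toward the erroneous text premise."""
    return math.tanh(sum(r * h for r, h in zip(readout, head_outs)))

def patch_effects(clean_in, conflict_in, head_weights, readout):
    """Patch each head's clean-run output into the conflict run, one head at a time.

    Returns the change in hallucination score caused by each single-head patch.
    """
    clean = head_outputs(clean_in, head_weights)
    conflict = head_outputs(conflict_in, head_weights)
    base = hallucination_score(conflict, readout)
    effects = []
    for h in range(len(head_weights)):
        patched = list(conflict)
        patched[h] = clean[h]  # swap in the clean activation for head h only
        effects.append(hallucination_score(patched, readout) - base)
    return effects

def split_heads(effects, tol=1e-9):
    """Negative effect: the head was pushing toward hallucination (driving).
    Positive effect: the head was holding hallucination back (resisting)."""
    driving = [h for h, e in enumerate(effects) if e < -tol]
    resisting = [h for h, e in enumerate(effects) if e > tol]
    return driving, resisting

# Hypothetical inputs: [visual evidence, erroneous textual premise].
clean_in = [1.0, 0.0]     # no conflicting premise
conflict_in = [1.0, 1.0]  # premise contradicts the image

# Hypothetical weights: three weak heads follow the premise, one strong head opposes it.
head_weights = [[0.0, 0.5], [0.0, 0.4], [0.0, 0.3], [0.0, -1.0]]
readout = [1.0, 1.0, 1.0, 1.0]

effects = patch_effects(clean_in, conflict_in, head_weights, readout)
driving, resisting = split_heads(effects)
print("driving heads:", driving)
print("resisting heads:", resisting)
```

The toy weights were chosen so that several weak heads jointly outweigh one strong opposing head, which loosely mirrors the asymmetry the abstract describes. In a real MLLM the same loop would run over cached attention-head activations, and path patching would additionally restrict the effect to specific downstream paths.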
