Causal Evidence for Attention Head Imbalance in Modality Conflict Hallucination
Jinrui Jiang, Zhangtai Wu, Zhen Wu, Xinyu Dai

TL;DR
This paper investigates the internal mechanisms behind modality-conflict hallucination in multimodal large language models (MLLMs). It identifies the attention heads that causally drive or resist the failure and proposes a causal intervention, MACI, to reduce hallucinations.
Contribution
It introduces a causal analysis of attention heads in MLLMs, revealing opposing roles in hallucination, and proposes MACI, a targeted intervention that significantly reduces hallucinations during inference.
Findings
Identified attention head groups with opposing causal roles in hallucination.
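Hallucination-driving effects are broadly distributed across heads and carry greater aggregate weight; hallucination-resisting effects concentrate in a small number of high-importance heads.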
MACI reduces hallucination with a favorable trade-off on multiple benchmarks.
Abstract
Modality-conflict hallucination occurs when multimodal large language models (MLLMs) prioritize erroneous textual premises over contradictory visual evidence. To understand why visual evidence fails to prevail during generation, we take a mechanistic perspective and examine which internal components drive or resist this failure. We perform head-level causal analysis using path patching across five open-source MLLMs and identify two groups of attention heads with opposing causal roles: hallucination-driving heads and hallucination-resisting heads. We find a consistent asymmetry: driving effects are more broadly distributed and carry greater aggregate weight, whereas resisting effects concentrate in a small number of high-importance heads. Ablation experiments further confirm that these groups exert opposing effects during generation: distributed driving influence and localized resistance…
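As a rough illustration of the head-level causal analysis described in the abstract, the sketch below scores each head by swapping its output from a conflict run with its output from a clean run and measuring how a hallucination score changes. It is a toy under stated assumptions, not the paper's procedure: it uses plain activation patching on a linear readout rather than path patching, and the names `head_effects`, `split_heads`, `clean_heads`, `conflict_heads`, and `readout` are hypothetical.

```python
import numpy as np

def head_effects(clean_heads, conflict_heads, readout):
    """Per-head effect on a toy hallucination score (assumed setup).

    clean_heads, conflict_heads: arrays of shape (H, d) holding each head's
        output in a run without and with the erroneous textual premise.
    readout: array of shape (d,); its dot product with the summed head
        outputs stands in for logit(hallucinated) - logit(grounded).
    """
    base = conflict_heads.sum(axis=0) @ readout
    effects = np.empty(len(conflict_heads))
    for h in range(len(conflict_heads)):
        patched = conflict_heads.copy()
        patched[h] = clean_heads[h]  # replace one head with its clean output
        # Positive value: restoring this head lowers the hallucination score.
        effects[h] = base - patched.sum(axis=0) @ readout
    return effects

def split_heads(effects):
    """Indices of heads with positive (driving) and negative (resisting) effect."""
    driving = np.flatnonzero(effects > 0)
    resisting = np.flatnonzero(effects < 0)
    return driving, resisting

# Hypothetical three-head example with a 2-dimensional head output.
clean = np.zeros((3, 2))
conflict = np.array([[0.5, 0.0], [0.3, 0.0], [-1.0, 0.0]])
readout = np.array([1.0, 0.0])

effects = head_effects(clean, conflict, readout)
driving, resisting = split_heads(effects)
```

In this toy example heads 0 and 1 come out as driving and head 2 as resisting. Comparing `effects[driving].sum()` with the magnitude of individual resisting heads is one simple way to look at the distributed-versus-concentrated asymmetry the abstract reports.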
