Causal Tracing of Object Representations in Large Vision Language Models: Mechanistic Interpretability and Hallucination Mitigation

Qiming Li; Zekai Ye; Xiaocheng Feng; Weihong Zhong; Weitao Ma; Xiachong Feng

arXiv:2511.05923 · cs.CV · November 20, 2025

TL;DR

This paper introduces a causal analysis framework for large vision-language models, identifies key mechanisms of visual object perception, and proposes an intervention technique that reduces hallucination without sacrificing performance.

Contribution

The paper presents Fine-grained Cross-modal Causal Tracing (FCCT), a causal tracing method covering all tokens and model components, and proposes IRI, a training-free intervention technique that enhances perception and reduces hallucinations.

Findings

01 MHSAs of last tokens in middle layers are crucial for cross-modal aggregation

02 FFNs show a hierarchical progression in visual object representation

03 IRI improves hallucination mitigation while maintaining model performance

Abstract

Despite the remarkable advancements of Large Vision-Language Models (LVLMs), their mechanistic interpretability remains underexplored. Existing analyses are insufficiently comprehensive: they do not jointly cover visual and textual tokens, model components, and the full range of layers. This limitation restricts actionable insights for improving the faithfulness of model output and for developing downstream tasks such as hallucination mitigation. To address this limitation, we introduce the Fine-grained Cross-modal Causal Tracing (FCCT) framework, which systematically quantifies the causal effects on visual object perception. FCCT conducts fine-grained analysis covering the full range of visual and textual tokens and three core model components (multi-head self-attention (MHSA), feed-forward networks (FFNs), and hidden states) across all decoder layers. Our analysis is the first to…
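To make the idea of causal tracing concrete, the following is a minimal sketch of the generic clean / corrupted / restored procedure on a toy decoder. It is not the authors' FCCT implementation: the toy model, the Gaussian corruption of visual-token embeddings, and the names `make_model`, `run`, and `causal_trace` are all assumptions made for illustration, and only hidden states are traced here (not MHSA or FFN outputs separately).

```python
import math
import random


def make_model(n_layers, dim, seed=0):
    """Hypothetical toy decoder: one random mixing matrix per layer plus a readout."""
    rng = random.Random(seed)
    weights = [[[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
               for _ in range(n_layers)]
    readout = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    return weights, readout


def run(model, embeddings, patch=None):
    """Run the toy decoder and return (states, prob).

    states[l][t] is the hidden state of token t after l layers (l = 0 is the
    embedding). `patch` maps (layer, token) to a vector that overwrites the
    hidden state at that site before later layers read it.
    """
    weights, readout = model
    dim = len(readout)
    hidden = [list(v) for v in embeddings]
    states = []
    for layer in range(len(weights) + 1):
        if layer > 0:
            w = weights[layer - 1]
            new_hidden = []
            for t in range(len(hidden)):
                # Causal mixing: each token sees the mean of positions <= t.
                ctx = [sum(hidden[s][k] for s in range(t + 1)) / (t + 1)
                       for k in range(dim)]
                upd = [math.tanh(sum(w[i][k] * ctx[k] for k in range(dim)))
                       for i in range(dim)]
                new_hidden.append([hidden[t][i] + upd[i] for i in range(dim)])
            hidden = new_hidden
        for (l, t), vec in (patch or {}).items():
            if l == layer:
                hidden[t] = list(vec)
        states.append([list(v) for v in hidden])
    # The "answer probability" is read from the last token's final state.
    logit = sum(r * x for r, x in zip(readout, hidden[-1]))
    return states, 1.0 / (1.0 + math.exp(-logit))


def causal_trace(model, clean_emb, corrupt_emb):
    """Restore one clean hidden state at a time in the corrupted run and
    record how much of the output probability is recovered."""
    clean_states, p_clean = run(model, clean_emb)
    _, p_corrupt = run(model, corrupt_emb)
    effects = {}
    for layer in range(len(clean_states)):
        for token in range(len(clean_emb)):
            site = {(layer, token): clean_states[layer][token]}
            _, p_restored = run(model, corrupt_emb, site)
            effects[(layer, token)] = p_restored - p_corrupt
    return p_clean, p_corrupt, effects


# Toy setup: the first N_VISUAL tokens stand in for image tokens.
DIM, N_LAYERS, N_VISUAL, N_TEXT = 4, 3, 3, 2
rng = random.Random(1)
model = make_model(N_LAYERS, DIM)
clean = [[rng.uniform(-1.0, 1.0) for _ in range(DIM)]
         for _ in range(N_VISUAL + N_TEXT)]
# Corrupt only the visual tokens with Gaussian noise (an assumed corruption).
corrupt = [[x + rng.gauss(0.0, 1.0) for x in v] if t < N_VISUAL else list(v)
           for t, v in enumerate(clean)]

p_clean, p_corrupt, effects = causal_trace(model, clean, corrupt)
for (layer, token), effect in sorted(effects.items()):
    print(f"layer={layer} token={token} recovered={effect:+.4f}")
```

In this sketch, a site with a large recovered value is one whose clean hidden state carries information the output depends on. Two sanity checks follow from the construction: restoring the last token at the final layer recovers the full clean-minus-corrupted gap, and restoring an uncorrupted text embedding at layer 0 recovers nothing.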

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning