A Comprehensive Information-Decomposition Analysis of Large Vision-Language Models

Lixin Xiu; Xufang Luo; Hideki Nakayama

arXiv:2603.29676·cs.LG·April 1, 2026

TL;DR

This paper introduces an information-decomposition framework for analyzing large vision-language models (LVLMs). The framework reveals how these models reach decisions and which strategies they rely on, beyond what accuracy metrics alone can show.

Contribution

It develops a scalable, model-agnostic pipeline that uses partial information decomposition (PID) to profile LVLMs' information dynamics along three dimensions: breadth (across models and tasks), depth (across layers), and time (across training).

Findings

1. Identifies two main task regimes: synergy-driven and knowledge-driven.
2. Discovers two contrasting strategies at the model-family level: fusion-centric and language-centric.
3. Uncovers a three-phase pattern in layer-wise processing and identifies visual instruction tuning as the key stage for learning fusion.

Abstract

Large vision-language models (LVLMs) achieve impressive performance, yet their internal decision-making processes remain opaque, making it difficult to determine whether their success stems from true multimodal fusion or from reliance on unimodal priors. To address this attribution gap, we introduce a novel framework that uses partial information decomposition (PID) to quantitatively measure the "information spectrum" of LVLMs, decomposing a model's decision-relevant information into redundant, unique, and synergistic components. By adapting a scalable estimator to modern LVLM outputs, our model-agnostic pipeline profiles 26 LVLMs on four datasets across three dimensions: breadth (cross-model and cross-task), depth (layer-wise information dynamics), and time (learning dynamics across training). Our analysis reveals two key results: (i) two task regimes (synergy-driven vs. knowledge-driven) and…
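
To make the three PID components concrete, here is a minimal sketch that computes an exact decomposition for a small discrete joint distribution over two sources X1 and X2 (think image and text) and a target Y (think the model's decision). It assumes the Williams-Beer I_min redundancy measure, chosen here purely for illustration; it is not the scalable estimator the paper adapts, and all function names (`marginal`, `mutual_info`, `specific_info`, `pid`) are hypothetical.

```python
import math
from collections import defaultdict

# Outcomes are tuples (x1, x2, y); a joint distribution is {outcome: probability}.

def marginal(joint, idx):
    """Sum the joint distribution onto the variables at positions `idx`."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in idx)] += p
    return out

def mutual_info(joint, src_idx, tgt_idx=(2,)):
    """I(S; Y) in bits, with S and Y given as index tuples into the outcome."""
    p_s, p_t = marginal(joint, src_idx), marginal(joint, tgt_idx)
    p_st = marginal(joint, src_idx + tgt_idx)
    mi = 0.0
    for st, p in p_st.items():
        if p > 0:
            s, t = st[:len(src_idx)], st[len(src_idx):]
            mi += p * math.log2(p / (p_s[s] * p_t[t]))
    return mi

def specific_info(joint, src, y):
    """Williams-Beer specific information I(Y=y; X_src) in bits."""
    p_y = marginal(joint, (2,))[(y,)]
    p_x = marginal(joint, (src,))
    p_xy = marginal(joint, (src, 2))
    total = 0.0
    for (x, yy), p in p_xy.items():
        if yy == y and p > 0:
            # p(x|y) * log2(p(y|x) / p(y))
            total += (p / p_y) * math.log2(p / (p_x[(x,)] * p_y))
    return total

def pid(joint):
    """Split I(X1, X2; Y) into redundancy, two unique terms, and synergy."""
    p_y = marginal(joint, (2,))
    # I_min redundancy: expected minimum specific information over the sources.
    red = sum(p * min(specific_info(joint, 0, y), specific_info(joint, 1, y))
              for (y,), p in p_y.items() if p > 0)
    i1, i2 = mutual_info(joint, (0,)), mutual_info(joint, (1,))
    i12 = mutual_info(joint, (0, 1))
    u1, u2 = i1 - red, i2 - red
    syn = i12 - red - u1 - u2
    return {"redundancy": red, "unique_1": u1, "unique_2": u2, "synergy": syn}

bits = [0, 1]
xor = {(a, b, a ^ b): 0.25 for a in bits for b in bits}   # Y needs both inputs
copy = {(a, a, a): 0.5 for a in bits}                      # both inputs carry Y
unique = {(a, b, a): 0.25 for a in bits for b in bits}     # only X1 carries Y

for name, dist in [("xor", xor), ("copy", copy), ("unique", unique)]:
    print(name, pid(dist))
```

XOR is the textbook case of pure synergy: neither input alone says anything about Y, yet together they determine it (1 bit of synergy). The copy case is pure redundancy, and the last case carries only unique information from X1. Exact computation like this only works for tiny discrete alphabets, which is why a scalable estimator is needed for real LVLM outputs.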

Code & Models

Repositories

RiiShin/pid-lvlm-analysis (GitHub)

Videos

A Comprehensive Information-Decomposition Analysis of Large Vision-Language Models (SlidesLive)