From Redundancy to Relevance: Information Flow in LVLMs Across Reasoning Tasks

Xiaofeng Zhang; Yihao Quan; Chen Shen; Xiaosong Yuan; Shaotian Yan; Liang Xie; Wenxiao Wang; Chaochen Gu; Hao Tang; Jieping Ye

arXiv:2406.06579·cs.CL·October 18, 2024·2 cites

TL;DR

This paper analyzes how information flows through large vision-language models during reasoning tasks, revealing layer-wise convergence patterns and the impact of visual features on model performance.

Contribution

It integrates attention analysis with LLaVA-CAM, a gradient-based method, to study visual information flow in LVLMs.

Findings

01 Information flow converges in shallow layers

02 Deeper layers show diversified information processing

03 Flow patterns vary with context and task

Abstract

Large Vision Language Models (LVLMs) achieve strong performance on visual-language reasoning tasks; however, their black-box nature hinders in-depth research on the reasoning mechanism. Because all images must be converted into image tokens to fit the input format of large language models (LLMs) alongside natural language prompts, sequential visual representation is essential to the performance of LVLMs, and information flow analysis can be an effective tool for determining interactions between these representations. In this paper, we propose integrating attention analysis with LLaVA-CAM. Concretely, attention scores highlight relevant regions during forward propagation, while LLaVA-CAM captures gradient changes through backward propagation, revealing key image features. By exploring the information flow from the perspective of visual representation contribution, we…
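The abstract pairs a forward signal (attention scores) with a backward signal (gradients). As a rough illustration of that idea, the sketch below applies the generic Grad-CAM weighting to per-token features and then fuses the result with attention. It is not the authors' LLaVA-CAM implementation: the function names, the input layout (one feature vector per image token), and the product-and-normalize fusion rule are all assumptions made for this example.

```python
# Generic Grad-CAM-style relevance over image tokens (illustrative only;
# not the authors' LLaVA-CAM implementation).

def grad_cam_token_relevance(activations, gradients):
    """Score each image token from its activations and the gradients of a
    target output with respect to those activations.

    Both inputs are lists shaped num_tokens x num_channels (assumed layout).
    """
    num_tokens = len(activations)
    num_channels = len(activations[0])
    # Channel weights: gradient averaged over all tokens, as in Grad-CAM.
    weights = [
        sum(gradients[t][c] for t in range(num_tokens)) / num_tokens
        for c in range(num_channels)
    ]
    # Weighted sum of activations per token, clipped at zero (ReLU).
    return [
        max(0.0, sum(weights[c] * activations[t][c] for c in range(num_channels)))
        for t in range(num_tokens)
    ]


def combine_with_attention(attention, relevance):
    """Hypothetical fusion rule: element-wise product of forward attention
    and backward relevance, normalized to sum to 1 when possible."""
    fused = [a * r for a, r in zip(attention, relevance)]
    total = sum(fused)
    return [f / total for f in fused] if total > 0 else fused
```

In this sketch a token ranks highly only when it both receives attention in the forward pass and has features that push the target output up in the backward pass. The source does not say whether the paper fuses the two signals this way.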

Code & Models

Repositories

zhangbaijin/From-Redundancy-to-Relevance
jax · Official

Videos

From Redundancy to Relevance: Information Flow in LVLMs Across Reasoning Tasks · underline

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques