Understanding Information Storage and Transfer in Multi-modal Large Language Models

Samyadeep Basu; Martin Grayson; Cecily Morrison; Besmira Nushi; Soheil Feizi; Daniela Massiceti

arXiv:2406.04236·cs.CV·June 7, 2024

TL;DR

This paper investigates how multi-modal large language models process and transfer information, revealing early-layer reliance on MLP and self-attention blocks and identifying key visual tokens responsible for information flow.

Contribution

It introduces a causal information tracing method for multi-modal models and a new VQA-Constraints dataset, advancing understanding of information mechanisms in MLLMs.
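The general idea behind causal tracing can be illustrated with a toy example: run a model on a clean input and on a corrupted input, then restore one block's clean output during the corrupted run and measure how much of the final output changes. The sketch below is a minimal illustration of that generic recipe on a made-up scalar residual network. It is not the paper's method, which this listing does not describe in detail, and the names `run`, `causal_trace`, and the weights are hypothetical.

```python
import math


def run(x, weights, patch=None):
    """Run a toy residual network on a scalar input x.

    Each layer adds tanh(w * h) to the residual stream h.
    patch: optional (layer_index, value) that overrides that layer's
    block output (hypothetical helper for illustration only).
    Returns (final output, list of per-layer block outputs).
    """
    h = x
    block_outputs = []
    for i, w in enumerate(weights):
        out = math.tanh(w * h)
        if patch is not None and patch[0] == i:
            out = patch[1]  # restore the value recorded elsewhere
        block_outputs.append(out)
        h = h + out
    return h, block_outputs


def causal_trace(clean_x, corrupted_x, weights):
    """Measure how much restoring each block's clean output changes
    the output of the corrupted run."""
    clean_y, clean_blocks = run(clean_x, weights)
    corrupted_y, _ = run(corrupted_x, weights)
    effects = []
    for i in range(len(weights)):
        restored_y, _ = run(corrupted_x, weights, patch=(i, clean_blocks[i]))
        effects.append(restored_y - corrupted_y)
    return clean_y, corrupted_y, effects


if __name__ == "__main__":
    clean_y, corrupted_y, effects = causal_trace(1.0, 0.0, [2.0, 0.1, 0.1])
    for layer, effect in enumerate(effects):
        print(f"layer {layer}: effect of restoring block = {effect:.4f}")
```

In this toy setup the block with the largest effect is the one the output depends on most. In a real MLLM the blocks would be MLP or self-attention layers and the measured quantity would typically be the probability of the correct answer; those details are assumptions here, not taken from the listing.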

Findings

01

MLLMs rely on early layers' MLP and self-attention blocks for information storage.

02

A small subset of visual tokens transfers information from images to causal blocks.

03

MultEdit can correct errors and add information by targeting causal blocks.

Abstract

Understanding the mechanisms of information storage and transfer in Transformer-based models is important for driving model understanding progress. Recent work has studied these mechanisms for Large Language Models (LLMs), revealing insights on how information is stored in a model's parameters and how information flows to and from these parameters in response to specific prompts. However, these studies have not yet been extended to Multi-modal Large Language Models (MLLMs). Given their expanding capabilities and real-world use, we start by studying one aspect of these models -- how MLLMs process information in a factual visual question answering task. We use a constraint-based formulation which views a visual question as having a set of visual or textual constraints that the model's generated answer must satisfy to be correct (e.g. What movie directed by the director in this photo has…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Sparse Evolutionary Training